How Does AI Evaluate Korean-English Translation? On the Correlation Between ChatGPT and Human Evaluation and the Characteristics of ChatGPT Evaluation

English title

The Potential of ChatGPT as a Translation Evaluator: Characteristics and Comparisons to Human Evaluation.

Lee, Sun-Woo (이선우); Lee, Sang-Bin (이상빈)

Abstract

In this study, we carried out a series of experiments to explore how ChatGPT (version 4o) evaluated Korean-English translations. Using two datasets of human translations (n=57) and two datasets of post-edited translations (n=56), all drawn from Lee and Lee (2021), we adopted two evaluation approaches with strict prompt control. In Experiment A, ChatGPT rated the four datasets freely on a five-point scale without specific criteria. In Experiment B, which was conducted concurrently with Experiment A, ChatGPT rated the same datasets using a prescribed, criterion-referenced five-point scale. To assess intra-rater reliability, we repeated both experiments one month later. This study yielded both quantitative and qualitative findings, including the following: (1) ChatGPT’s average scores differed significantly from those of human raters; (2) correlations between human and ChatGPT scores ranged from ‘moderate’ to ‘strong’; (3) the use of the prescribed rating scale improved ChatGPT’s reliability as a rater; (4) ChatGPT exhibited very low intra-rater reliability; and (5) ChatGPT’s self-justifications for its ratings varied in quality, often failing to identify obvious errors.

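Table of contents

1. Introduction
2. Literature review
2.1. ChatGPT and foreign-language writing/translation
2.2. Studies using ChatGPT as an evaluation tool
3. Research method
4. Analysis results
4.1. Quantitative analysis
4.1.1. Comparison of means
4.1.2. Inter-rater correlation
4.1.3. Intra-rater reliability
4.2. Qualitative analysis
4.2.1. Characteristics related to rating justifications
4.2.2. Differences in evaluation by scale
4.2.3. Examples of ChatGPT evaluation
5. Conclusion
References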

Author information

  • Lee, Sun-Woo (이선우). Hankuk University of Foreign Studies
  • Lee, Sang-Bin (이상빈). Hankuk University of Foreign Studies
