Source Information
The Potential of ChatGPT as a Translation Evaluator: Characteristics and Comparisons to Human Evaluation.
Abstract (English)
In this study, we carried out a series of experiments to explore how ChatGPT (version 4o) evaluated Korean-English translations. Using two datasets of human translations (n=57) and two datasets of post-edited translations (n=56), all drawn from Lee and Lee (2021), we adopted two evaluation approaches with strict prompt control. In Experiment A, ChatGPT rated the four datasets freely on a five-point scale without specific criteria. In Experiment B, which was conducted concurrently with Experiment A, ChatGPT rated the same datasets using a prescribed, criterion-referenced five-point scale. To assess intra-rater reliability, we repeated both experiments one month later. This study yielded both quantitative and qualitative findings, including the following: (1) ChatGPT’s average scores differed significantly from those of human raters; (2) correlations between human and ChatGPT scores ranged from ‘moderate’ to ‘strong’; (3) the use of the prescribed rating scale improved ChatGPT’s reliability as a rater; (4) ChatGPT exhibited very low intra-rater reliability; and (5) ChatGPT’s self-justifications for its ratings varied in quality, often failing to identify obvious errors.
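The abstract outlines a rating protocol (GPT-4o scoring translations on a five-point scale, with and without a prescribed rubric, then comparing the scores against human raters). A minimal sketch of what such a pipeline could look like is given below, assuming the OpenAI Python client and SciPy; the prompt wording, rubric text, function names, and sample scores are all illustrative and are not taken from the paper.

```python
# Illustrative sketch of a GPT-4o translation-rating pipeline; prompt text,
# rubric, and data are hypothetical, not the authors' actual materials.
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "Rate the English translation of the Korean source on a 5-point scale:\n"
    "5 = accurate and fluent; 4 = minor errors; 3 = noticeable errors;\n"
    "2 = frequent errors; 1 = largely inaccurate. Reply with the number only."
)

def rate_translation(source: str, translation: str, use_rubric: bool) -> int:
    """Ask GPT-4o for a 1-5 rating, with a prescribed rubric or with free
    criteria (roughly mirroring Experiments B and A, respectively)."""
    criteria = RUBRIC if use_rubric else (
        "Rate the English translation of the Korean source from 1 (worst) "
        "to 5 (best). Reply with the number only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # limits run-to-run variation, relevant to intra-rater reliability
        messages=[
            {"role": "system", "content": criteria},
            {"role": "user", "content": f"Source: {source}\nTranslation: {translation}"},
        ],
    )
    return int(resp.choices[0].message.content.strip())

# Hypothetical parallel score lists for one dataset (ChatGPT vs. human raters).
gpt_scores = [4, 3, 5, 2, 4]
human_scores = [5, 3, 4, 2, 3]
rho, p = spearmanr(gpt_scores, human_scores)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

Repeating the same calls after an interval and correlating the two rounds of ChatGPT scores with each other would give a comparable check of intra-rater reliability, which the study found to be very low.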
Table of Contents
1. Introduction
2. Review of Previous Research
2.1. ChatGPT and Foreign-Language Writing/Translation
2.2. Studies Using ChatGPT as an Evaluation Tool
3. Research Methods
4. Analysis Results
4.1. Quantitative Analysis
4.1.1. Comparison of Means
4.1.2. Inter-rater Correlations
4.1.3. Intra-rater Reliability
4.2. Qualitative Analysis
4.2.1. Characteristics of Rating Justifications
4.2.2. Differences in Ratings by Scale
4.2.3. Examples of ChatGPT Evaluations
5. Conclusion
References
