Source Information
Automatic Evaluation of Human Translation using BERT
Abstract
English
BERT, one of the most recent language models in NLP, has numerous advantages over other language models. It ‘understands’ human language at the subword level and can therefore handle rare and out-of-vocabulary words. It also takes syntax and context into account when processing text. We applied “BERTscore”, an evaluation metric built on the BERT model, to the evaluation of human translation (HT). A total of 120 translated texts were evaluated with five metrics (BLEU, METEOR, emBLEU, emMETEOR, and BERTscore), and the results were compared with professional translators’ evaluations. The comparison examines the validity and reliability of these metrics, particularly BERTscore, with a view to their future application in HT evaluation. BERTscore performed stably, taking first place among the metrics on scores and third on ranks. The validity of the word2vec-based metrics, especially emBLEU, was somewhat disappointing, probably owing to the domain difference between the training corpus and the test corpus.
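Table of Contents
1. Introduction
2. BERTscore and Other Automatic Evaluation Models
2.1. Baseline Models – BLEU, METEOR
2.2. word2vec-Based Models – emBLEU, emMETEOR
2.3. BERT-Based Model – BERTscore
3. Comparison with Human Evaluation
4. Experiment
4.1. Building the Experimental Corpus
4.2. Evaluation and Analysis
5. Results and Discussion
5.1. Human Evaluation
5.2. Automatic Evaluation
5.3. Comparison of Human and Automatic Evaluation
6. Summary and Conclusion
References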
