A Study on Quality Verification of AI-HUB Chinese Artificial Intelligence Learning Data: Focusing on the ‘Broadcast Content Korean-Chinese Translation Parallel Corpus Data’

고권태

동북아시아문화학회 (The Association of North-East Asian Cultures), 동북아 문화연구 (Journal of North-East Asian Cultures), No. 85, December 2025, pp. 363-382

doi:10.17949/jneac.1.85.202512.017

Cited by: 0 (data provided by Naver Academic)

Abstract

This study proposes a systematic methodology for evaluating Korean-Chinese translation quality using AI-HUB's ‘Broadcasting Content Korean-Chinese Translation Parallel Corpus Data’. Although machine translation technology has advanced significantly, research on the characteristics of Korean-Chinese translation remains insufficient, particularly for domains such as broadcasting content that are rich in colloquial expressions and cultural context. The study selected 10,000 sentences by stratified sampling from 1.2 million sentences and evaluated translation quality with the BLEU, METEOR, and TER metrics. The analysis showed that translation quality varied significantly by genre and sentence length. Educational programs achieved the highest BLEU score, 0.467, while reality variety shows recorded the lowest, 0.371. Translation quality declined sharply as sentence length increased, from 0.518 for short sentences to 0.287 for long sentences. Error analysis identified mistranslated colloquial expressions (32%), demonstrative errors (21%), and literal translations of idioms (18%) as the major challenges. The methodology is reproducible because it relies on publicly available resources (AI-HUB data, the Naver Papago API, Python/NLTK). The findings suggest that educational content is translated more reliably than entertainment programs, which require careful post-editing. This study is significant as the first systematic evaluation of Korean-Chinese translation quality in the broadcasting content domain, and it provides specific directions for improving the translation of colloquial expressions and cultural context, which are characteristic difficulties of Chinese-language translation.
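The abstract does not state how the strata for the 10,000-sentence sample were defined. The following is a minimal sketch of proportional stratified sampling, assuming the strata are genre combined with a sentence-length bucket, since those are the two factors the abstract reports on. The record keys (`genre`, `source`), the character-count thresholds, and the function names are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def length_bucket(sentence, short_max=10, long_min=30):
    """Assign a length stratum by character count.

    The thresholds are illustrative assumptions; the abstract does not
    give the boundaries between short and long sentences.
    """
    n = len(sentence)
    if n <= short_max:
        return "short"
    if n >= long_min:
        return "long"
    return "medium"


def stratified_sample(records, sample_size, seed=0):
    """Draw a proportional sample stratified by (genre, length bucket).

    records: list of dicts with the hypothetical keys "genre" and "source".
    Because each stratum's quota is rounded, the total drawn can differ
    slightly from sample_size.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["genre"], length_bucket(rec["source"]))].append(rec)

    total = len(records)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation, with at least one item per stratum.
        quota = max(1, round(sample_size * len(members) / total))
        quota = min(quota, len(members))
        sample.extend(rng.sample(members, quota))
    return sample
```

Proportional allocation keeps each genre and length band represented in the sample at roughly its share of the full corpus, so per-stratum scores can be compared without one large stratum dominating the result.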

Author information

  • 고권태
