Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning

최맹식; Kim Hark Soo

Journal of the Korea Information Processing Society (정보처리학회 논문지)

Abbreviation: KTSDE

2011, vol. 18, no. 1, cumulative issue no. 136, pp. 45-50 (6 pages)

UCI : G704-B00564.2011.18.1.006

Publisher: Korea Information Processing Society (한국정보처리학회)

Research field: Engineering

최맹식 ¹, 김학수 (Kim Hark Soo) ²

¹ Kangwon National University (강원대학교)

² Kangwon National University (강원대학교)

Abstract

Unknown-morpheme errors in Korean morphological analysis fall into two types. In the first, the morphological analyzer fails to return any morpheme sequence at all; in the second, it returns an incorrect combination of known morphemes. Most previous unknown-morpheme estimation techniques have focused only on the first type. This paper proposes an unknown-morpheme estimation method that can handle both types. The proposed method uses an SVM (Support Vector Machine) to detect Eojeols (Korean spacing units) that may contain unknown-morpheme errors. Then, using CRFs (Conditional Random Fields), it segments the detected Eojeols into morphemes and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed the conventional method based on longest matching of functional words. The experimental results indicate that the second type of error must be handled in order to improve the performance of Korean morphological analysis.
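To make the two approaches named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes (1) that the conventional baseline strips the longest matching functional-word suffix from an Eojeol and treats the remainder as the unknown-morpheme candidate, and (2) that a CRF-style sequence labeler emits one `B-`/`I-` tag per syllable, which is then decoded into (morpheme, POS) pairs. The particle list, the function names, and the tag names (`NNP`, `JKB`, borrowed from the Sejong tagset) are all illustrative assumptions; the abstract does not specify the features, label scheme, or tagset actually used.

```python
from typing import List, Tuple

# Hypothetical, tiny list of functional words (particles); not from the paper.
FUNCTIONAL_WORDS = ["에서는", "에서", "은", "는", "이", "가", "을", "를"]


def longest_match_split(eojeol: str) -> Tuple[str, str]:
    """Baseline idea: strip the longest functional-word suffix.

    Returns (stem, suffix); the stem is the unknown-morpheme candidate.
    If no suffix matches, the whole eojeol is returned as the stem.
    """
    for suffix in sorted(FUNCTIONAL_WORDS, key=len, reverse=True):
        # Require a non-empty stem so the whole eojeol is never consumed.
        if eojeol.endswith(suffix) and len(eojeol) > len(suffix):
            return eojeol[: -len(suffix)], suffix
    return eojeol, ""


def decode_syllable_tags(eojeol: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-syllable B-/I- tags into (morpheme, POS) pairs.

    The tag layout ("B-NNP", "I-NNP", ...) is an assumed encoding of what a
    sequence labeler such as a CRF might output for each syllable.
    """
    if len(eojeol) != len(tags):
        raise ValueError("one tag per syllable is required")
    morphemes: List[Tuple[str, str]] = []
    for syllable, tag in zip(eojeol, tags):
        prefix, pos = tag.split("-", 1)
        # Start a new morpheme on "B", or when an "I" tag does not continue
        # the POS of the previous morpheme (defensive handling of bad input).
        if prefix == "B" or not morphemes or morphemes[-1][1] != pos:
            morphemes.append((syllable, pos))
        else:
            surface, _ = morphemes[-1]
            morphemes[-1] = (surface + syllable, pos)
    return morphemes
```

For example, `longest_match_split("구글에서는")` yields `("구글", "에서는")`, and `decode_syllable_tags("구글에서", ["B-NNP", "I-NNP", "B-JKB", "I-JKB"])` yields `[("구글", "NNP"), ("에서", "JKB")]`. The suffix-stripping baseline only applies when a functional word is found at the end of the Eojeol, whereas the per-syllable labeling view can place morpheme boundaries anywhere, which is one plausible reason a sequence labeler can also address incorrect combinations of known morphemes.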

Original Korean title: 기계학습에 기반한 한국어 미등록 형태소 인식 및 품사 태깅

Number of papers in KCI that cite this paper: 5.

References (8)