Article Information
Abstract
English
This paper discriminates among the multiple senses of several ambiguous Korean words using Bayesian inference, which rests on conditional probability as it is standardly defined in mathematics. A POS-tagged Korean corpus of 8.1 million words served as the source of linguistic information for disambiguation. In a disambiguation experiment on 13 words (9 nouns and 4 verbs), carried out with a program implementing the Bayesian-inference algorithm, overall precision reached 81.5% (25981/31874), with 83.5% (12546/15030) for nouns and 79.8% (13435/16844) for verbs. During the experiment, several parameters were varied to find the optimal conditions for the method. The focus was on the smoothing value that is substituted for a zero co-occurrence frequency of a word in the context; this value was varied from 0.9 to 0.0001, and, contrary to general expectations, a smoothing value of 0.1 yielded the highest precision. Beyond the machine procedure and its promising result, the paper also illustrates in detail how the individual words of corpus sentences are treated under Bayesian inference, thereby clarifying how the method works.
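The procedure summarized in the abstract can be illustrated with a minimal sketch. It assumes a naive Bayes formulation in which context words are treated as conditionally independent given the sense, and in which a zero co-occurrence count is replaced by a fixed smoothing value, as the abstract describes. The function names (`train`, `disambiguate`), the log-space scoring, the normalization by total co-occurrence count, and the toy English-glossed data are illustrative assumptions, not the paper's actual program or corpus.

```python
import math
from collections import Counter, defaultdict


def train(tagged_examples):
    """Count sense frequencies and sense/context-word co-occurrences.

    tagged_examples: iterable of (sense, context_words) pairs (hypothetical format).
    """
    sense_counts = Counter()
    cooc = defaultdict(Counter)
    for sense, context in tagged_examples:
        sense_counts[sense] += 1
        cooc[sense].update(context)
    return sense_counts, cooc


def disambiguate(context, sense_counts, cooc, smoothing=0.1):
    """Return the sense with the highest log posterior score.

    A zero co-occurrence frequency is replaced by `smoothing`
    (0.1 is the value the abstract reports as giving the best precision).
    """
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        denom = sum(cooc[sense].values())  # assumed normalizer
        score = math.log(n / total)        # log prior P(sense)
        for word in context:
            freq = cooc[sense][word]
            if freq == 0:
                freq = smoothing           # substitute for a zero count
            score += math.log(freq / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy sense-tagged data for one ambiguous word (invented for illustration).
EXAMPLES = [
    ("ship", ["sea", "sail", "port"]),
    ("ship", ["sea", "captain"]),
    ("pear", ["eat", "sweet", "fruit"]),
    ("pear", ["fruit", "tree"]),
]

sense_counts, cooc = train(EXAMPLES)
print(disambiguate(["sail", "sea"], sense_counts, cooc))
```

Changing the `smoothing` argument over a range such as 0.9 down to 0.0001 and recomputing precision on held-out sentences would mirror the kind of parameter variation the abstract mentions, under the same assumptions.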
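Table of Contents
1. Introduction
2. Bayesian Conditional Probability and Linguistic Data
2.1. Conditional Probability and Bayesian Inference
2.2. Bayesian Inference and Language Processing
3. Word Sense Disambiguation by Bayesian Inference
4. Evaluation of the Experiment
5. Conclusion
References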