Abstract


In this paper I review two approaches to Korean collocations, theoretical and statistical, and propose a statistical model best suited to detecting collocation constructions. Purely theory-based approaches struggle to define collocations and to separate them from free constructions and idioms. Statistical approaches, on the other hand, have been criticised because they focus largely on raw frequencies, so that non-collocational constructions are ranked highly. I adopt the log likelihood ratio as the statistical methodology, together with a sub-data model that takes specific syntactic and grammatical structures into account, such as ‘subject-predicate’ and ‘object-predicate’ constructions. The log likelihood ratio is applied not to the whole corpus but to this sub-data model, which mitigates both the low frequencies of content words and the noise among candidate collocates. The analysis shows that the ratio is far better suited to Korean collocations than other statistical methods. The approach thus avoids the vagueness of theoretical accounts of Korean collocations and yields statistically well-founded collocation constructions.
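For illustration, the log likelihood ratio for a candidate word pair is standardly computed from a 2x2 contingency table of co-occurrence counts (this is a generic sketch of the statistic, not the paper's exact implementation; the counts here would be drawn from pairs within a specific grammatical relation such as ‘object-predicate’ rather than from the whole corpus):

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of pair counts.

    k11: pair (w1, w2) observed together
    k12: w1 observed without w2
    k21: w2 observed without w1
    k22: neither w1 nor w2
    """
    n = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    g2 = 0.0
    # Sum O * ln(O / E) over the four cells; E is the count
    # expected under independence of w1 and w2.
    for o, r, c in [(k11, row1, col1), (k12, row1, col2),
                    (k21, row2, col1), (k22, row2, col2)]:
        e = r * c / n
        if o > 0:
            g2 += o * math.log(o / e)
    return 2.0 * g2

# A pair occurring no more often than chance scores near 0;
# a strongly associated pair scores high and would be ranked
# as a collocation candidate.
independent = log_likelihood_ratio(10, 10, 10, 10)
associated = log_likelihood_ratio(100, 1, 1, 100)
```

Ranking pairs by this score favours genuinely associated words even when their absolute frequencies are low, which is why it behaves better than raw-frequency ranking on sparse content words.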


Keywords

log likelihood ratio, collocation, statistics, specific grammatical relations