Thesaurus-Based Semantic Smoothing in Language Modeling for Chinese Document Retrieval

Liqi Gao; Ting Liu; Ru Chen; Yu Zhang

Korean Language Information Society (한국어정보학회), Korean Language Information Science (한국어정보학), Vol. 8, No. 1, June 2006, pp. 58-63

Abstract

Language modeling for information retrieval (IR), proposed a few years ago, has attracted
attention and has effectively improved the performance of IR systems compared with classic
models and approaches. Smoothing in parameter estimation is one of the main problems in
implementing language models, and effective smoothing methods enhance the performance of an
IR system. Semantic smoothing, which brings some knowledge of the language into language
modeling, has been developed recently. This paper presents a modification to a smoothing
approach in the general language model, combined with translation modeling, that takes
synonyms in documents and in the collection into account, with the aim of semantic smoothing
and improved performance in Chinese document retrieval. The synonym knowledge comes from
Tongyici Cilin (Extended), a well-known thesaurus in Chinese NLP. A comparison shows that the
semantically smoothed approach brings an improvement of approximately 1.33% on average.

Keywords

Language Model for Information Retrieval; Semantic Smoothing; Chinese Thesaurus

Author information

Liqi Gao, Information Retrieval Laboratory, School of Computer Science & Technology, Harbin Institute of Technology
Ting Liu, Information Retrieval Laboratory, School of Computer Science & Technology, Harbin Institute of Technology
Ru Chen, Information Retrieval Laboratory, School of Computer Science & Technology, Harbin Institute of Technology
Yu Zhang, Information Retrieval Laboratory, School of Computer Science & Technology, Harbin Institute of Technology
