Document Classification Using N-gram and Word Semantic Similarity

Mei-ying Ren; Sinjae Kang

Source Information

보안공학연구지원센터(IJUNESST) International Journal of u- and e- Service, Science and Technology Vol.8 No.8 2015.08 pp.111-118

Abstract

English

This paper reports two series of experiments. The first compares language-dependent and language-independent features; as basic features, bi-grams contributed most in the Korean experiments and uni-grams in the Chinese experiments. The second uses Korean WordNet, a Korean lexical semantic network, to improve Korean document classification. Language-independent features appear to yield better and more stable performance, and using Korean WordNet improved the performance of Korean text classification.
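
To make the n-gram features named in the abstract concrete, here is a minimal sketch of extracting bi-gram and uni-gram counts from text. It assumes the n-grams are taken at the character level with whitespace ignored, which the abstract does not state. The function names `char_ngrams` and `ngram_features` and the sample strings are hypothetical and do not come from the paper.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams of `text`.

    Assumption (not from the paper): whitespace is dropped first, so
    n-grams may span word boundaries.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(text: str, n: int) -> Counter:
    """Count n-gram occurrences, giving a bag-of-n-grams feature vector."""
    return Counter(char_ngrams(text, n))


# Hypothetical sample inputs: bi-grams for Korean, uni-grams for Chinese.
korean_features = ngram_features("문서 분류", 2)
chinese_features = ngram_features("文档分类", 1)
```

Counts of this kind could then be passed to any standard text classifier. Whether the paper weights or filters them before classification is not stated in the abstract.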

Abstract
1. Introduction
2. Related Research
  2.1. Statistical Research
  2.2. WordNet Based Research
3. Document Classification
  3.1. Data
  3.2. Document Classification
  3.3. Tools and Evaluation Measure
4. Experiments
  4.1. Korean and Chinese Document Classification Using Basic Feature Sets
  4.2. Korean Document Classification Using Korean WordNet
5. Conclusion
ACKNOWLEDGEMENTS
References

Keywords

Document Classification
Feature Selection
N-gram
Korean WordNet

Author Information

Mei-ying Ren, Dept. of Computer & Information Engineering, Daegu University, Republic of Korea
Sinjae Kang, School of Computer & Information Technology, Daegu University, Republic of Korea
