Document Classification Using N-gram and Word Semantic Similarity

Abstract

English

This paper reports two series of experiments. The first compares language-dependent and language-independent features: bi-grams in the Korean experiments and uni-grams in the Chinese experiments contributed most as basic features. The second uses Korean WordNet, a Korean lexical semantic network, to improve Korean document classification. Language-independent features appear to give better and more stable performance, and using Korean WordNet improved the performance of Korean text classification.
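As a rough illustration of the uni-gram and bi-gram features mentioned in the abstract, the sketch below extracts overlapping n-grams from a string and counts them. It assumes character-level n-grams with whitespace removed, which the abstract does not specify; the function names `char_ngrams` and `ngram_features` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams of `text`.

    Assumption: whitespace is dropped before windowing, so n-grams
    may span word boundaries. The paper may tokenize differently.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(text: str, n: int) -> Counter:
    """Count n-gram occurrences, giving a bag-of-n-grams feature vector."""
    return Counter(char_ngrams(text, n))


if __name__ == "__main__":
    # Bi-grams for a Korean string, uni-grams for a Chinese string.
    print(char_ngrams("문서 분류", 2))
    print(char_ngrams("文本分类", 1))
```

Because the windowing only depends on character positions, the same function serves both languages; only the value of `n` changes.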

Table of Contents

Abstract
 1. Introduction
 2. Related Research
  2.1. Statistical Research
  2.2. WordNet Based Research
 3. Document Classification
  3.1. Data
  3.2. Document Classification
  3.3. Tools and Evaluation Measure
 4. Experiments
  4.1. Korean and Chinese Document Classification Using Basic Feature Sets
  4.2. Korean Document Classification Using Korean WordNet
 5. Conclusion
 ACKNOWLEDGEMENTS
 References

Author Information

  • Mei-ying Ren Dept. of Computer & Information Engineering, Daegu University, Republic of Korea
  • Sinjae Kang School of Computer & Information Technology, Daegu University, Republic of Korea
