The KNN based Uyghur Text Classification and its Performance Analysis

Abstract

English

This paper addresses the automatic classification of large-scale Uyghur text collected from the Internet. We design the functional block structure of a Uyghur text classification system, choose the KNN algorithm as the classification engine, and implement the system in C#. In the preprocessing stage, we exploit the lexical characteristics of the Uyghur language by introducing stem extraction, which greatly reduces the overall feature dimensionality. We report classification results on a large-scale corpus of more than 3000 documents belonging to 10 categories, together with results for different numbers of features selected using the χ² statistic. The results show that only 3% to 5% of the full high-dimensional feature set is crucial to higher classification accuracy. How to determine which features are best, and how to further reduce the dimensionality of the feature space, are interesting issues for future work.
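
The pipeline summarized in the abstract (χ² feature selection followed by KNN classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's system was written in C#, while the sketch below uses Python, and every function name here is hypothetical. It also assumes cosine similarity as the KNN distance measure and the maximum χ² score over categories as the per-term ranking, neither of which is stated in the abstract. The toy English tokens stand in for stemmed Uyghur terms.

```python
import math
from collections import Counter

def chi_square(docs, labels, term, category):
    """Chi-square score of `term` for `category` from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1          # term present, document in category
        elif has_term:
            b += 1          # term present, document in another category
        elif in_cat:
            c += 1          # term absent, document in category
        else:
            d += 1          # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score (max over categories,
    an assumption of this sketch). Ties are broken alphabetically."""
    vocab = sorted({t for tokens in docs for t in tokens})
    cats = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]

def vectorize(tokens, features):
    """Term-frequency vector restricted to the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training vectors most similar to `query`."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy corpus: six tokenized documents in two categories (illustrative only).
docs = [
    ["goal", "team", "match"],
    ["team", "score", "goal"],
    ["match", "player", "the"],
    ["bank", "market", "the"],
    ["market", "stock", "bank"],
    ["stock", "price", "the"],
]
labels = ["sports"] * 3 + ["finance"] * 3

features = select_features(docs, labels, k=6)
train_vecs = [vectorize(tokens, features) for tokens in docs]
query = vectorize(["goal", "match", "the"], features)
print(knn_predict(train_vecs, labels, query, k=3))  # prints: sports
```

In this toy corpus the uninformative token "the" appears in both categories, receives a low χ² score, and is dropped by the selection step, which mirrors on a small scale the abstract's observation that a small fraction of the features carries most of the discriminative information.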

Table of Contents

Abstract
 1. Introduction
 2. Uyghur Text Preprocessing
  2.1. Uyghur Text Features
  2.2. Uyghur Text Preprocessing
  2.3. Feature Selection
 3. Text Categorization Algorithm
 4. Text Categorization Experiments and Analysis
  4.1. Data Sets
  4.2. Evaluation Parameters
  4.3. Experimental Results Analysis
 5. Conclusions
 Acknowledgements
 References

Author Information

  • Palidan Tuerxun: School of information and technology, Northwestern University, Xi’an, China; School of Software, Xinjiang University, Urumqi, Xinjiang, China
  • Fang Dingyi: School of information and technology, Northwestern University, Xi’an, China
  • Askar Hamdulla: School of Software, Xinjiang University, Urumqi, Xinjiang, China
