The KNN based Uyghur Text Classification and its Performance Analysis

Palidan Tuerxun; Fang Dingyi; Askar Hamdulla

Article Information

Security Engineering Research Support Center (IJHIT), International Journal of Hybrid Information Technology, Vol. 8, No. 3, March 2015, pp. 63-72

Abstract

English

This paper addresses the automatic classification of large-scale Uyghur text collected from the Internet. We design the functional block structure of a Uyghur text classification system, choose the KNN algorithm as the classification engine, and implement the system in C#. In the preprocessing stage, we exploit the lexical characteristics of the Uyghur language by introducing stem extraction, which greatly reduces the overall feature dimensionality. We report classification results on a large-scale corpus of more than 3,000 documents belonging to 10 categories, together with results for different numbers of features selected using the χ² statistic. The results show that only 3% to 5% of the full high-dimensional feature set is crucial to achieving higher classification accuracy. How to identify those best features, and how to reduce the dimensionality of the feature space further, are interesting issues for future work.
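The pipeline summarized in the abstract (χ² feature selection followed by KNN classification) can be illustrated with a minimal sketch. This is not the authors' C# implementation: the function names, the use of cosine similarity, the similarity-weighted vote, taking the maximum χ² over categories, and the toy English tokens standing in for extracted Uyghur stems are all assumptions made for illustration.

```python
import math
from collections import Counter


def chi2_scores(docs, labels):
    """Score each term by its chi-square statistic, maximized over categories.

    Taking the maximum over categories is one common convention, assumed here.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    class_sizes = Counter(labels)
    scores = {}
    for term in set().union(*doc_sets):
        best = 0.0
        for cat, n_cat in class_sizes.items():
            # 2x2 contingency table of document counts for (term, category)
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cat)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cat)
            c = n_cat - a          # term absent, document in category
            d = n - n_cat - b      # term absent, document in another category
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi2_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def vectorize(doc, features):
    """Term-frequency vector restricted to the selected features."""
    keep = set(features)
    return Counter(t for t in doc if t in keep)


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k):
    """Similarity-weighted vote among the k most similar training documents."""
    sims = sorted(
        ((cosine(query_vec, v), y) for v, y in zip(train_vecs, train_labels)),
        key=lambda pair: -pair[0],
    )
    votes = Counter()
    for sim, label in sims[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: each document is a list of already-stemmed tokens.
docs = [
    ["match", "team", "goal", "and"],
    ["team", "coach", "goal", "the"],
    ["market", "bank", "price", "and"],
    ["bank", "price", "stock", "the"],
]
labels = ["sports", "sports", "economy", "economy"]

features = select_features(docs, labels, k=4)
train_vecs = [vectorize(d, features) for d in docs]
query = vectorize(["goal", "team", "the", "price"], features)
predicted = knn_predict(train_vecs, labels, query, k=3)
```

In this toy corpus the tokens shared evenly by both categories ("and", "the") receive a χ² score of zero and are dropped, which mirrors the abstract's observation that only a small fraction of the features carries most of the discriminative information.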

Abstract
1. Introduction
2. Uyghur Text Preprocessing
  2.1. Uyghur Text Features
  2.2. Uyghur Text Preprocessing
  2.3. Feature Selection
3. Text Categorization Algorithm
4. Text Categorization Experiments and Analysis
  4.1. Data Sets
  4.2. Evaluation Parameters
  4.3. Experimental Results Analysis
5. Conclusions
Acknowledgements
References

Keywords

Uyghur
Text classification
KNN
Stop words

Author Information

Palidan Tuerxun: School of information and technology, Northwestern University, Xi’an, China; School of Software, Xinjiang University, Urumqi, Xinjiang, China
Fang Dingyi: School of information and technology, Northwestern University, Xi’an, China
Askar Hamdulla: School of Software, Xinjiang University, Urumqi, Xinjiang, China
