Source Information
Abstract
English
Text classification (TC) is a classic research topic in computer applications. In this paper, we first examine the distance metrics widely used in TC problems (such as Euclidean distance) and find that they may not be appropriate for highly skewed data sets such as those found in text categorization. We therefore propose a novel method that learns evidence from multiple distance metrics. The evidence learned from these metrics is combined using Dempster-Shafer (DS) theory to improve the effectiveness of a kNN-based text classifier, because the neighbors computed for a given query pattern may come from heterogeneous neighborhood sources and usually have different influence on predicting the class label. The ensemble of distance metrics is tested on three standard benchmark data sets. Finally, we demonstrate the robustness of the proposed approach through a series of experiments.
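To make the idea of combining evidence from several distance metrics concrete, the following is a minimal sketch, not the paper's actual formulation, which the abstract does not spell out. It assumes each metric's k nearest neighbors are turned into a simple mass function (mass on single class labels plus a residual mass on the whole frame, representing ignorance) and that these mass functions are merged with Dempster's rule of combination. The names `THETA`, `knn_mass`, `dempster_combine`, `predict`, the `discount` parameter, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

THETA = "THETA"  # hypothetical marker for the whole frame of discernment (ignorance)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_mass(query, train, metric, k=3, discount=0.9):
    """Turn the k nearest neighbors under one metric into a mass function.

    Assumption: each class gets mass proportional to its share of the k
    neighbors, scaled by `discount`; the remainder goes to THETA.
    """
    neighbors = sorted(train, key=lambda item: metric(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    mass = {label: discount * count / k for label, count in votes.items()}
    mass[THETA] = 1.0 - discount
    return mass


def dempster_combine(m1, m2):
    """Dempster's rule for mass functions whose focal sets are singletons or THETA."""
    combined = {}
    conflict = 0.0
    for a, weight_a in m1.items():
        for b, weight_b in m2.items():
            if a == THETA:
                focal = b            # THETA intersected with b is b
            elif b == THETA or a == b:
                focal = a            # a intersected with THETA (or with itself) is a
            else:
                conflict += weight_a * weight_b  # two different singletons: empty set
                continue
            combined[focal] = combined.get(focal, 0.0) + weight_a * weight_b
    norm = 1.0 - conflict
    if norm <= 0.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {focal: weight / norm for focal, weight in combined.items()}


def predict(query, train, metrics, k=3):
    """Combine the evidence from every metric and return the best class and masses."""
    masses = [knn_mass(query, train, metric, k) for metric in metrics]
    combined = masses[0]
    for mass in masses[1:]:
        combined = dempster_combine(combined, mass)
    classes = {label: w for label, w in combined.items() if label != THETA}
    return max(classes, key=classes.get), combined


# Toy term-weight vectors (made-up data, for illustration only).
train = [
    ([1.0, 0.0], "sports"),
    ([0.9, 0.1], "sports"),
    ([0.8, 0.2], "sports"),
    ([0.0, 1.0], "politics"),
    ([0.1, 0.9], "politics"),
]
label, combined = predict([0.95, 0.05], train, [euclidean, cosine_distance])
print(label, combined)
```

In this sketch, two metrics that agree reinforce each other (the combined mass on the shared class exceeds either individual mass), while disagreement is absorbed into the conflict term and normalized away. A more faithful implementation would likely weight each neighbor by its distance rather than counting votes equally.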
Table of Contents
1. Introduction
2. Overview of Text Classification (TC)
2.1. Problem description
2.2. Text representation
2.3. kNN text classification algorithm
3. Text Classification based on Evidence Theory
4. Experimental Results
5. Conclusions
References
