Article Information
Abstract
English
A novel approach to feature selection is proposed for data spaces defined over continuous features. The approach obtains a subset of features that can discriminate the class labels of objects with a discriminant ability better than or equivalent to that of the original feature set, thereby improving both the learning performance and the intelligibility of the classification model. According to the spatial distribution of objects and their class labels, the data space is partitioned into subspaces, each with a clear edge and a single class label. These labelled subspaces are then projected onto each continuous feature. For each subspace, a measurement of each feature is estimated against all other subspace-projected features by means of statistical significance. A matrix of these measurements, indexed by subspace and feature, is constructed, and the subspace-projected features are ranked in descending order of their discriminant ability in the matrix. The resulting feature subset is then determined incrementally by evaluating a gain function of the discriminant ability defined over the best-so-far feature subset. Comprehensive experiments on data sets from the UCI Repository demonstrate that subspace-based feature ranking and feature selection improve both the effectiveness and the efficiency of classification on continuous features.
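The rank-then-select pipeline described above can be illustrated with a minimal sketch. It is not the FSFSF algorithm of the paper, whose details the abstract does not give. Three simplifying assumptions are made: class labels stand in for the labelled subspaces, an absolute Welch t-statistic stands in for the statistical significance measure, and the training accuracy of a nearest-centroid rule stands in for the gain function. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples (assumed stand-in measure)."""
    denom = sqrt(pvariance(a) / len(a) + pvariance(b) / len(b))
    if denom == 0:
        return 0.0 if mean(a) == mean(b) else float("inf")
    return abs(mean(a) - mean(b)) / denom


def discriminant_matrix(X, y):
    """One row per class label (stand-in for a subspace), one column per feature."""
    labels = sorted(set(y))
    matrix = []
    for label in labels:
        row = []
        for j in range(len(X[0])):
            inside = [x[j] for x, lab in zip(X, y) if lab == label]
            outside = [x[j] for x, lab in zip(X, y) if lab != label]
            row.append(welch_t(inside, outside))
        matrix.append(row)
    return matrix


def rank_features(matrix):
    """Feature indices in descending order of their summed column score."""
    n_features = len(matrix[0])
    scores = [sum(row[j] for row in matrix) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def subset_score(X, y, features):
    """Assumed gain proxy: training accuracy of a nearest-centroid rule."""
    labels = sorted(set(y))
    centroids = {}
    for lab in labels:
        rows = [x for x, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(r[j] for r in rows) for j in features]

    def distance(x, lab):
        return sum((x[j] - m) ** 2 for j, m in zip(features, centroids[lab]))

    hits = sum(min(labels, key=lambda c: distance(x, c)) == lab
               for x, lab in zip(X, y))
    return hits / len(y)


def select_features(X, y, min_gain=1e-9):
    """Walk the ranking and keep a feature only if it improves the score."""
    ranking = rank_features(discriminant_matrix(X, y))
    selected, best = [], 0.0
    for j in ranking:
        score = subset_score(X, y, selected + [j])
        if score - best > min_gain:
            selected.append(j)
            best = score
    return ranking, selected
```

Calling `select_features(X, y)` with `X` as a list of numeric rows and `y` as the matching list of class labels returns the full ranking and the incrementally chosen subset. A feature that adds no gain over the best-so-far subset is skipped, which mirrors the incremental determination described in the abstract.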
Table of Contents
1. Introduction
2. Basic Concepts
2.1. Information Model and Feature Selection
2.2. Distribution Center and Radius
3. Feature Selection on Continuous Features
3.1. Covers and Its Optimization
3.2. Matrix of Feature Discriminant Ability
3.3. Feature Ranking
3.4. Gains of Discriminant Ability of Feature Subset
3.5. FSFSF Algorithm
4. Experiments
4.1. Using CoverSet as Classifier
4.2. Effectiveness of Feature Ranking
4.3. Effectiveness of Feature Selection
5. Conclusions
Acknowledgment
References