Source Information
Abstract
English
The support vector machine (SVM) is biased towards the majority class when a dataset is class-imbalanced, and the bias is even larger for high-dimensional data. To improve the classification accuracy of SVM on high-dimensional imbalanced data, we combine the signal-to-noise ratio (SNR) with an under-sampling technique based on K-means. We first apply SNR to feature selection to reduce the number of features, and then address the data imbalance using K-means-based under-sampling. To verify the feasibility of the proposed strategy, we use metrics such as the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Comparing results before and after the process, the AUC value increased by 4% to 16%. The experimental results show that our strategy is feasible and effective.
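The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: the SNR score uses the common definition |mean_pos - mean_neg| / (std_pos + std_neg), and the under-sampling step replaces the majority class with K-means centroids, with K set to the minority class size. All function names are hypothetical, not the authors' code.

```python
import random
from statistics import mean, pstdev


def snr_scores(X, y):
    """Per-feature SNR (assumed form): |mean_pos - mean_neg| / (std_pos + std_neg)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        denom = pstdev(p) + pstdev(n)
        # A feature that is constant within both classes gets score 0 here.
        scores.append(abs(mean(p) - mean(n)) / denom if denom > 0 else 0.0)
    return scores


def select_top_features(X, y, m):
    """Return the indices of the m features with the highest SNR score."""
    scores = snr_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:m])


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; requires k <= len(points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [mean(col) for col in zip(*members)]
    return centroids


def undersample_majority(X, y, majority_label=0, seed=0):
    """Replace majority samples with K centroids, K = minority class size (assumption)."""
    minority = [(row, lab) for row, lab in zip(X, y) if lab != majority_label]
    majority = [row for row, lab in zip(X, y) if lab == majority_label]
    centroids = kmeans(majority, len(minority), seed=seed)
    X_bal = [row for row, _ in minority] + centroids
    y_bal = [lab for _, lab in minority] + [majority_label] * len(centroids)
    return X_bal, y_bal
```

In a full pipeline, the selected feature indices would be used to reduce `X` before under-sampling, and an SVM would then be trained on the balanced data and evaluated with ROC/AUC; those steps are omitted here.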
Table of Contents
1. Introduction
2. Methodology
2.1. Feature Selection
2.2. Under-sampling based on K-means
2.3. Support Vector Machine
3. The Results and Analysis of Experiment
3.1. Datasets
3.2. Evaluation Index of Experiment
3.3. Evaluation Index of Experiment
4. Conclusion
Acknowledgements
References
