SVM Classification for High-dimensional Imbalanced Data based on SNR and Under-sampling

Abstract

English

Support vector machines (SVMs) are biased toward the majority class when a dataset is class-imbalanced, and the bias is even larger for high-dimensional data. To improve the classification accuracy of SVM on high-dimensional imbalanced data, we combine the signal-to-noise ratio (SNR) with an under-sampling technique based on K-means. We first apply SNR to feature selection to reduce the number of features, and then address the class imbalance using under-sampling based on K-means. To verify the feasibility of the proposed strategy, we use metrics such as the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Compared with the results before the process, the AUC value after it increased by 4% to 16%. The experimental results show that our strategy is feasible and effective.
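The two preprocessing steps named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the SNR formula or the details of the K-means step, so the sketch assumes the common per-feature definition SNR = |mean_pos - mean_neg| / (std_pos + std_neg), and assumes that the majority class is clustered into as many groups as there are minority samples, keeping the real majority sample nearest each centroid. All function names are hypothetical, label 1 is assumed to be the minority class, and label 0 the majority class.

```python
import random
from statistics import mean, pstdev


def snr_scores(X, y):
    """Score each feature by an assumed SNR definition:
    |mean_pos - mean_neg| / (std_pos + std_neg)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        denom = pstdev(p) + pstdev(n)
        # A feature that is constant within both classes gets score 0.
        scores.append(abs(mean(p) - mean(n)) / denom if denom > 0 else 0.0)
    return scores


def select_top_features(X, y, k):
    """Return the indices of the k features with the highest SNR score."""
    scores = snr_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [mean(col) for col in zip(*members)]
    return centroids


def kmeans_undersample(X, y, seed=0):
    """Shrink the majority class (label 0) to the minority class size.

    Assumption: one cluster per minority sample, and each cluster is
    represented by the real majority sample closest to its centroid.
    """
    minority = [row for row, label in zip(X, y) if label == 1]
    majority = [row for row, label in zip(X, y) if label == 0]
    centroids = kmeans(majority, len(minority), seed=seed)
    kept = [min(majority, key=lambda p: sq_dist(p, c)) for c in centroids]
    X_bal = minority + kept
    y_bal = [1] * len(minority) + [0] * len(kept)
    return X_bal, y_bal
```

In a full pipeline one would keep only the columns returned by `select_top_features`, balance the reduced data with `kmeans_undersample`, and then train an SVM and compute the AUC with a library of choice; those last two steps are omitted here.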

Table of Contents

Abstract
 1. Introduction
 2. Methodology
  2.1. Feature Selection
  2.2. Under-sampling based on K-means
  2.3. Support Vector Machine
 3. The Results and Analysis of Experiment
  3.1. Datasets
  3.2. Evaluation Index of Experiment
  3.3. Evaluation Index of Experiment
 4. Conclusion
 Acknowledgements
 References

Author Information

  • Li Peng: School of Software, Harbin University of Science and Technology, 150080 Harbin, China; School of Computer Science and Technology, Harbin University of Science and Technology, 150080 Harbin, China
  • Bi Ting-ting: School of Computer Science and Technology, Harbin University of Science and Technology, 150080 Harbin, China
  • Liu Yang: School of Computer Science and Technology, Harbin University of Science and Technology, 150080 Harbin, China

