Heterogeneous Ensemble of Classifiers from Under-Sampled and Over-Sampled Data for Imbalanced Data

Article information

Abstract

English

Data imbalance is a common problem that causes serious difficulties in the machine learning process. Sampling is one of the effective methods for addressing it. Over-sampling increases the number of instances, so on imbalanced data it is applied to the minority class. Under-sampling reduces the number of instances and is usually applied to the majority class. We apply both under-sampling and over-sampling to imbalanced data to generate sampled data sets. From these sampled data sets and the original data set, we construct a heterogeneous ensemble of classifiers using five different learning algorithms. Experimental results on an intrusion detection data set, used as an example of imbalanced data, show that our approach is effective.
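
The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say which sampling variants, which five algorithms, or which combination rule are used, so this sketch assumes random sampling on a two-class problem, two toy learners (nearest centroid and 1-nearest-neighbour on a single numeric feature) as stand-ins, and plain majority voting. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def over_sample(rows, labels, minority, rng):
    """Randomly duplicate minority rows until both classes have equal size (assumed variant)."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_count = len(labels) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(majority_count - len(minority_idx))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def under_sample(rows, labels, minority, rng):
    """Randomly drop majority rows until both classes have equal size (assumed variant)."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    kept = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    idx = sorted(minority_idx + kept)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fit_nearest_centroid(rows, labels):
    """Toy learner: predict the class whose mean feature value is closest."""
    totals = {}
    for x, y in zip(rows, labels):
        s, n = totals.get(y, (0.0, 0))
        totals[y] = (s + x, n + 1)
    means = {y: s / n for y, (s, n) in totals.items()}
    return lambda row: min(means, key=lambda y: abs(row - means[y]))


def fit_one_nn(rows, labels):
    """Toy learner: predict the label of the single closest training row."""
    pairs = list(zip(rows, labels))
    return lambda row: min(pairs, key=lambda p: abs(row - p[0]))[1]


def build_ensemble(learners, datasets):
    """Train every learner on every data set (original plus sampled versions)."""
    return [fit(rows, labels) for fit in learners for rows, labels in datasets]


def predict(ensemble, row):
    """Combine member predictions by majority vote (assumed combination rule)."""
    votes = Counter(member(row) for member in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced data: 8 "normal" rows, 2 "attack" rows.
rng = random.Random(0)
rows = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 5.0, 5.2]
labels = ["normal"] * 8 + ["attack"] * 2

over = over_sample(rows, labels, "attack", rng)
under = under_sample(rows, labels, "attack", rng)
ensemble = build_ensemble(
    [fit_nearest_centroid, fit_one_nn],
    [(rows, labels), over, under],
)
print(predict(ensemble, 5.1), predict(ensemble, 0.45))
```

With two learners and three data sets the sketch yields six ensemble members. Replacing the toy learners with five real algorithms would give fifteen members under the same scheme, assuming every algorithm is trained on every data set.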

Table of contents

Abstract
1. Introduction
2. Data imbalance problem
2.1 Over-sampling
2.2 Under-sampling
2.3 Fractional instances
2.4 Cost for misclassification
2.5 Imbalance-aware learning algorithms
3. Heterogeneous ensemble
3.1 Bagging
3.2 Boosting
3.3 Arcing
3.4 Stacking
4. Our approach
5. Experiments
5.1 Datasets
5.2 Experimental Results
6. Conclusion
Acknowledgement
References

Author information

  • Dae-Ki Kang, Dept. of Computer Engineering, Dongseo University, Busan, South Korea
  • Min-gyu Han, ICT Convergence Program, Hansung University, Seoul, South Korea
