Article Information
Abstract
English
Data imbalance is a common problem that causes serious difficulties in the machine learning process. Sampling is one of the effective methods for addressing it. Over-sampling increases the number of instances, so on imbalanced data it is applied to the minority instances. Under-sampling reduces the number of instances and is usually performed on the majority data. We apply both under-sampling and over-sampling to imbalanced data to generate sampled data sets. From these sampled data sets and the original data set, we construct a heterogeneous ensemble of classifiers, using five different algorithms in the ensemble. Experimental results on an intrusion detection data set, used as an example of imbalanced data, show that our approach is effective.
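The two sampling steps and the idea of combining several classifiers can be illustrated with a minimal sketch. The abstract does not state which sampling variants or which combination rule the paper uses, so the sketch assumes plain random over-sampling, random under-sampling, and plurality voting. The function names and the toy labels "normal" and "attack" are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, minority, target, seed=0):
    """Duplicate randomly chosen minority instances until `target` of them exist.

    Assumption: simple random duplication, not the paper's confirmed method.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    missing = max(0, target - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(missing)]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_under_sample(rows, labels, majority, target, seed=0):
    """Keep only `target` randomly chosen majority instances.

    Assumption: simple random removal, not the paper's confirmed method.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    kept = set(rng.sample(majority_idx, min(target, len(majority_idx))))
    keep = [i for i, y in enumerate(labels) if y != majority or i in kept]
    return [rows[i] for i in keep], [labels[i] for i in keep]


def majority_vote(predictions):
    """Combine one prediction per ensemble member by plurality vote (assumed rule)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy imbalanced data: 8 "normal" instances and 2 "attack" instances.
rows = [[i] for i in range(10)]
labels = ["normal"] * 8 + ["attack"] * 2

over_rows, over_labels = random_over_sample(rows, labels, "attack", target=8)
under_rows, under_labels = random_under_sample(rows, labels, "normal", target=2)

# Each of the three data sets (original, over-sampled, under-sampled) could
# then train classifiers whose predictions are combined by majority_vote.
```

In this sketch the over-sampled set contains 8 instances of each class and the under-sampled set contains 2 of each, while the original set keeps its 8-to-2 ratio.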
Table of Contents
1. Introduction
2. Data imbalance problem
2.1 Over-sampling
2.2 Under-sampling
2.3 Fractional instances
2.4 Cost for misclassification
2.5 Imbalance-aware learning algorithms
3. Heterogeneous ensemble
3.1 Bagging
3.2 Boosting
3.3 Arcing
3.4 Stacking
4. Our approach
5. Experiments
5.1 Datasets
5.2 Experimental Results
6. Conclusion
Acknowledgement
References