Article Information
Abstract
English
Class imbalance is a critical problem in many real-world application domains of machine learning. When examples of one class in a training data set vastly outnumber examples of the other class(es), traditional data mining algorithms tend to create suboptimal classification models. Researchers have rigorously studied several techniques to alleviate class imbalance, including resampling algorithms and feature selection approaches. In this paper, we present a new hybrid feature selection algorithm, dubbed Class Imbalance Learning using Intelligent Under Sampling (CILIUS), for learning from skewed training data. The algorithm provides a simpler and faster alternative by using C4.5 as the base algorithm. We conduct experiments on four UCI data sets from various application domains, with five learning algorithms for comparison and five evaluation metrics. Experimental results show that our method achieves higher Area Under the ROC Curve (AUC), F-measure, precision, and TP rate values, along with low TN rate values, than many existing class imbalance learning methods.
Table of Contents
1. Introduction
2. Data Balancing
3. Class Imbalance Learning using Intelligent Under-Sampling
3.1. Preparation of the Subsets
3.2. Influential Feature Subset Detection
3.3. Choosing Feature Class Label Noise Ranges
3.4. Forming the Balance Dataset
4. Dataset Details
4.1. Datasets
4.2. Performance Evaluation Criteria
5. Experimental Settings
5.1. Algorithms and Parameters
6. Results
7. Conclusion
Acknowledgements
References