Source Information
Abstract
English
The Korea Acute Myocardial Infarction Registry (KAMIR) dataset has been under construction at 41 Primary PCI Centers in Korea since November 2005. Many KAMIR studies have relied on statistical analysis: Student's t-test, the χ²-test, and multivariate logistic regression. These approaches have three problems: the features tested are selected by domain experts according to the analysis conditions, degrees of importance for the features cannot be obtained, and the huge numbers of features and instances involved incur a high computational load and low processing speed. We therefore considered feature selection methods based on the Gini-Index to predict the major features and reduce the dimension of the feature space. Unfortunately, only a few studies on Gini-Index based nominal feature selection have been completed so far, and problems in extracting representative features remain for 1) datasets with unbalanced classes, 2) instances having almost all of the features of the dataset, and 3) instances having non-null values for almost all features. For such datasets, the selected features are not discriminative for each class. To solve these problems and obtain good representative features for each class, we introduce a novel Gini-Index feature selection algorithm for nominal datasets. We tested the algorithm on the prediction of major features of AMI patients from the KAMIR. The results show that it reports the degree of importance of each feature as a Gini value and selects the major features for given conditions without help from experts.
Table of Contents
1. Introduction
2. Existing Gini-Index for Text Feature Selection
3. Novel Gini-Index Algorithm for Nominal Dataset
3.1. Reformulated Gini-Index Expressions
3.2. Novel Nominal Gini-Index Algorithm for Feature Selection
4. Experiments and Evaluation with KAMIR
4.1. KAMIR Dataset
4.2. Experiments and Evaluations
4.3. Experimental Results
5. Conclusions
Acknowledgements
References