Article Information
Abstract
English
A crucial step in cancer diagnosis is selecting a small number of informative genes for accurate classification. This problem has become a major focus in the data mining of gene expression profiles. Many conventional classification methods perform very poorly, especially on data covering a large number of cancer types. Here, we propose a new approach for gene selection and multi-cancer classification based on step-by-step improvement of classification performance (SSiCP). The SSiCP gene selection algorithms were evaluated on the NCI60 and GCM benchmark data sets, achieving accuracies of 96.6% and 95.5%, respectively, in 10-fold cross-validation. Furthermore, SSiCP outperformed recently published algorithms when applied to two additional multi-cancer data sets. Computational evidence indicated that SSiCP can effectively avoid overfitting. Compared with various other gene selection algorithms, SSiCP is simple to implement, and many of the genes it selects are shown to be closely related to cancers.
Table of Contents
1. Introduction
2. Materials and Methods
2.1. Data Sets
2.2. Gene Pre-selection
2.3. RFE: Recursive Feature Elimination
2.4. Feature Selection Methodology
2.5. Over-fitting Evaluation of SSiCP Algorithm
2.6. Confirmation of Classification Algorithm in the Second Step of Feature Selection
2.7. Parameter Selection on Weka
3. Results
3.1. Initial Noise Removal and Comparison of Classification Algorithms
3.2. Gene Selection based on Step-by-step Improvement of Classification Performance
3.3. Comparison of Computational Results using Four Data Sets
3.4. Overfitting Evaluation
4. Discussion
5. Conclusion
Acknowledgements
Disclosure
References