Source Information
Abstract
English
The quality of a protein structure can be determined from its physical and chemical properties, and has therefore been used to distinguish native or native-like structures from other predicted structures. In this study, machine learning classification models are explored with six physical and chemical properties to classify the root mean square deviation (RMSD) of a protein structure in the absence of its true native state; each protein structure lies in the 0 Å to 6 Å RMSD range. The physical and chemical properties used in this paper are total surface area, Euclidean distance, total empirical energy, secondary structure penalty, residue length, and pair number. The data set contains a total of 24,294 decoys with 4,919 native structures. The artificial bee colony algorithm is used to determine feature importance, and K-fold cross-validation is used to measure the robustness of the best classification model. The results show that the random forest method outperforms the other machine learning models in classifying predicted protein structures, with a sensitivity of 0.72 and an accuracy of 70.33% on the testing data set. The data set used in the study is available at http://bit.ly/RMSD-Classification-DS.
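The evaluation quantities named in the abstract (K-fold splitting, sensitivity, accuracy) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `k_fold_indices` and `sensitivity_and_accuracy` are hypothetical, the folds are contiguous and unshuffled, and a binary labelling (1 = positive class) is assumed purely for illustration, since the abstract does not state how the RMSD classes are encoded.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    Assumption: no shuffling or stratification; the first n_samples % k
    folds receive one extra sample so every index is tested exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def sensitivity_and_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (sensitivity, accuracy) for binary labels.

    sensitivity = TP / (TP + FN); accuracy = correct / total.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs)
    return sensitivity, accuracy
```

In practice, a classifier would be trained on each `train` index set and scored on the matching `test` set, and the per-fold sensitivity and accuracy would then be averaged to gauge robustness.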
Table of Contents
1. Introduction
2. Materials and Methods
A. Data Transformation
B. Feature Measurement
3. Methodology
A. Artificial Bee Colony (ABC)
B. Feature Importance using ABC
C. Machine Learning Methods
4. Model Evaluation
5. Experimental Results
6. Conclusion
References