Abstract
Cordycepin is the principal bioactive compound produced by Cordyceps militaris and has diverse pharmacological properties. However, cordycepin production is highly sensitive to cultivation conditions, which makes yields highly variable and process optimization difficult. In this study, an interpretable machine learning framework was established to predict cordycepin production by C. militaris cultivated on Pinus densiflora sawdust. Three key cultivation parameters (input weight, growth weight, and particle size) were quantified in submerged mycelial culture, and cordycepin content was measured by high-performance liquid chromatography (HPLC). Four predictive models (random forest, support vector machine, XGBoost, and artificial neural network) were optimized by randomized hyperparameter search and evaluated by internal validation and by Tropsha's external quantitative structure-activity relationship (QSAR) validation criteria. XGBoost achieved the highest validation accuracy (root mean square error, RMSE = 42.67 μg/mL), whereas random forest showed the most reliable external performance (R² = 0.898). Shapley additive explanations (SHAP) showed that input weight had the strongest influence on cordycepin production, followed by growth weight and particle size, and revealed distinct nonlinear and interaction-driven effects among the cultivation variables. Kernel density estimation and dependence analyses confirmed multimodal production regimes associated with substrate loading and particle size. Finally, the best-performing model was deployed through a Streamlit-based graphical user interface (GUI) that predicts cordycepin concentration in real time with a 95% confidence interval. Together, these results demonstrate the utility of interpretable AI-driven modeling for elucidating complex biological responses and provide a practical decision-support tool for optimizing cordycepin production in fungal biotechnology.
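The abstract reports RMSE for internal validation and Tropsha's criteria for external validation. The sketch below shows how those statistics can be computed from observed and predicted test-set values. It assumes one commonly cited formulation of the Golbraikh-Tropsha criteria (R² > 0.6, (R² − R0²)/R² < 0.1 or the primed variant, and 0.85 ≤ k ≤ 1.15 or 0.85 ≤ k′ ≤ 1.15); the study's exact definitions and thresholds may differ, and the training-set q² > 0.5 check is omitted. The helper names `rmse` and `tropsha_checks` and the sample data are hypothetical, not taken from the study.

```python
"""Sketch: RMSE plus Golbraikh-Tropsha external-validation checks.

ASSUMPTION: one commonly cited formulation of the criteria, with observed
values y and predicted values y_hat. The study's exact definitions may differ.
Sample data in the __main__ block are hypothetical, not study measurements.
"""
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error, in the units of y (e.g. ug/mL)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def tropsha_checks(y: Sequence[float], y_hat: Sequence[float]) -> dict:
    """External-validation statistics for an observed-vs-predicted test set."""
    n = len(y)
    mean_y = sum(y) / n
    mean_p = sum(y_hat) / n
    s_yp = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, y_hat))
    s_yy = sum((a - mean_y) ** 2 for a in y)
    s_pp = sum((b - mean_p) ** 2 for b in y_hat)
    # R^2 taken as the squared Pearson correlation of observed vs predicted.
    r2 = s_yp ** 2 / (s_yy * s_pp)

    # Slopes of the regressions through the origin: y ~ k*y_hat, y_hat ~ k'*y.
    dot = sum(a * b for a, b in zip(y, y_hat))
    k = dot / sum(b * b for b in y_hat)
    k_prime = dot / sum(a * a for a in y)

    # Coefficients of determination of those through-origin fits.
    r0_sq = 1.0 - sum((a - k * b) ** 2 for a, b in zip(y, y_hat)) / s_yy
    r0_prime_sq = 1.0 - sum((b - k_prime * a) ** 2 for a, b in zip(y, y_hat)) / s_pp

    # Either the unprimed or the primed variant may satisfy each condition.
    ratio_ok = r2 > 0 and ((r2 - r0_sq) / r2 < 0.1 or (r2 - r0_prime_sq) / r2 < 0.1)
    slope_ok = 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15
    return {
        "r2": r2,
        "k": k,
        "k_prime": k_prime,
        "r0_sq": r0_sq,
        "r0_prime_sq": r0_prime_sq,
        "passes": r2 > 0.6 and ratio_ok and slope_ok,
    }


if __name__ == "__main__":
    # Hypothetical observed vs. predicted cordycepin concentrations (ug/mL).
    observed = [120.0, 250.0, 310.0, 180.0, 400.0]
    predicted = [130.0, 240.0, 300.0, 200.0, 390.0]
    print(f"RMSE = {rmse(observed, predicted):.2f} ug/mL")
    for name, value in tropsha_checks(observed, predicted).items():
        print(name, value)
```

The slope and through-origin checks matter because R² alone can mislead. A model whose predictions are exactly reversed relative to the observations still has R² = 1 as a squared correlation, but it fails both the k and R0² conditions.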
Contents
INTRODUCTION
MATERIALS AND METHODS
Preparation of submerged culture media
Determination of mycelium dry weight
Determination of cordycepin
Dataset preparation
Data preprocessing
Model training procedures
Hyperparameter optimization
SHAP-based model interpretation
Model evaluation and validation
Graphical user interface (GUI) implementation
RESULTS AND DISCUSSION
Descriptor screening using correlation analysis
Dependence on individual variables
Influence of cultivation parameters on cordycepin content
Training dynamics and performance convergence of predictive models
Hyperparameter optimization and model interpretation using SHAP analysis
Impact of descriptors on model output (SHAP values)
QSAR model validation and evaluation
GUI development for cordycepin content prediction
ACKNOWLEDGEMENTS
REFERENCES
