Article Information
Abstract
English
Diabetes mellitus is a significant global health challenge that requires accurate early prediction and transparent clinical decision-making tools. Although traditional machine learning models achieve high predictive accuracy, their "black-box" nature limits clinical adoption because they lack interpretability. We developed an ensemble model that combines Random Forest, XGBoost, and Logistic Regression through soft voting, trained on the Pima Indians Diabetes Dataset. Data preprocessing included feature standardization and the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance. Model explanations were generated with LIME and SHAP and then passed to GPT-3.5-turbo, which produced natural language clinical interpretations of individual patient predictions. This hybrid approach narrows the gap between machine learning accuracy and clinical interpretability. By providing both accurate predictions and comprehensible explanations, the framework shows potential for real-world clinical deployment, with the aim of supporting evidence-based diabetes care and improving patient outcomes. The core contribution of this study is not merely improved prediction accuracy but a novel explainable framework that integrates XAI techniques with large language models to generate natural language clinical interpretations that both healthcare professionals and patients can readily understand.
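To make the soft voting step concrete, the following is a minimal sketch and not the authors' implementation. It assumes each base model has already produced a positive-class probability per patient. The function name `soft_vote`, the `weights` and `threshold` parameters, and the probability values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def soft_vote(
    model_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Average per-model positive-class probabilities, then threshold them.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    names = list(model_probs)
    n_samples = len(model_probs[names[0]])
    if any(len(model_probs[name]) != n_samples for name in names):
        raise ValueError("all models must score the same number of samples")

    # Use equal weights unless the caller supplies them.
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)

    # Weighted mean of the positive-class probability for each sample.
    averaged = [
        sum(weights[name] * model_probs[name][i] for name in names) / total
        for i in range(n_samples)
    ]
    # Predict the positive class when the averaged probability reaches the threshold.
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Made-up probabilities of diabetes for three patients (assumed values, not study data).
example_probs = {
    "random_forest": [0.80, 0.30, 0.55],
    "xgboost": [0.90, 0.20, 0.40],
    "logistic_regression": [0.70, 0.40, 0.45],
}
averaged, labels = soft_vote(example_probs)
```

In this sketch the third patient is scored above 0.5 by one model but below it by the other two, and the averaged probability decides the label. Averaging probabilities rather than counting votes lets a confident model outweigh two uncertain ones, which is the usual motivation for soft over hard voting.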
Table of Contents
1. INTRODUCTION
1.1 MACHINE LEARNING IN DIABETES PREDICTION
1.2 THE INTERPRETABILITY CHALLENGE
1.3 EXPLAINABLE AI IN HEALTHCARE
1.4 LARGE LANGUAGE MODELS
1.5 CONTRIBUTION
2. METHODS
2.1 DATASET DESCRIPTION
2.2 DATASET VALIDATION AND CONTEMPORARY RELEVANCE
2.3 DATA PREPROCESSING
2.4 ENSEMBLE MODEL DESCRIPTION
2.5 MODEL TRAINING AND EVALUATION
2.6 EXPLAINABILITY ANALYSIS
2.7 LARGE LANGUAGE MODEL INTEGRATION
2.8 STATISTICAL ANALYSIS
3. RESULTS
3.1 DATASET CHARACTERISTICS
3.2 MODEL PERFORMANCE EVALUATION
3.3 FEATURE IMPORTANCE ANALYSIS
3.4 INDIVIDUAL CASE ANALYSIS
3.5 COMPUTATIONAL PERFORMANCE
4. DISCUSSION
4.1 PRINCIPAL FINDINGS
4.2 CLINICAL SIGNIFICANCE AND IMPACT
4.3 TECHNICAL INNOVATION AND METHODOLOGICAL CONTRIBUTIONS
4.4 COMPARISON WITH PREVIOUS STUDIES
4.5 LIMITATIONS AND CONSTRAINTS
5. CONCLUSIONS AND FUTURE WORKS
ACKNOWLEDGEMENT
References
