Human-Machine Interaction Technology (HIT)

Enhanced Explainable AI Framework for Diabetes Prediction

Abstract

Diabetes mellitus represents a significant global health challenge requiring accurate early prediction and transparent clinical decision-making tools. While traditional machine learning models achieve high predictive accuracy, their "black-box" nature limits clinical adoption due to a lack of interpretability. We developed an ensemble model combining Random Forest, XGBoost, and Logistic Regression using soft voting classification on the Pima Indians Diabetes Dataset. Data preprocessing included feature standardization and the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance. Model explanations were generated using LIME and SHAP, and these explanations were subsequently processed by GPT-3.5-turbo to produce natural language clinical interpretations for individual patient predictions. Our hybrid approach successfully bridges the gap between machine learning accuracy and clinical interpretability. The framework demonstrates significant potential for real-world clinical deployment by providing both accurate predictions and comprehensible explanations, thereby supporting evidence-based diabetes care and improving patient outcomes. The core contribution of this study is not merely improved prediction accuracy, but a novel explainable framework that integrates XAI techniques with large language models to generate natural language clinical interpretations that are easily understood by both healthcare professionals and patients.
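
The soft voting step named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not give model weights or code, so equal weights are assumed, the function name `soft_vote` is hypothetical, and the probabilities below are invented purely for illustration. Soft voting averages each model's predicted class probabilities and picks the class with the highest average.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Average per-model class probabilities; return (predicted class, averages)."""
    if not model_probs:
        raise ValueError("at least one model is required")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must report the same number of classes")
    if weights is None:
        # Assumption: equal weights, since the abstract does not specify any.
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")

    total = sum(weights)
    averaged = [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Invented probabilities [P(no diabetes), P(diabetes)] for a single patient.
example = {
    "random_forest": [0.55, 0.45],
    "xgboost": [0.52, 0.48],
    "logistic_regression": [0.10, 0.90],
}
label, probs = soft_vote(list(example.values()))
print(label, [round(p, 2) for p in probs])
```

In this invented case two of the three models lean toward class 0, so a majority (hard) vote would predict no diabetes, while the averaged probabilities favor class 1 because one model is far more confident. That sensitivity to confidence is the usual reason to prefer soft voting when the base models produce reasonably calibrated probabilities.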

Table of Contents

Abstract
1. INTRODUCTION
1.1 MACHINE LEARNING IN DIABETES PREDICTION
1.2 THE INTERPRETABILITY CHALLENGE
1.3 EXPLAINABLE AI IN HEALTHCARE
1.4 LARGE LANGUAGE MODELS
1.5 CONTRIBUTION
2. METHODS
2.1 DATASET DESCRIPTION
2.2 DATASET VALIDATION AND CONTEMPORARY RELEVANCE
2.3 DATA PREPROCESSING
2.4 ENSEMBLE MODEL DESCRIPTION
2.5 MODEL TRAINING AND EVALUATION
2.6 EXPLAINABILITY ANALYSIS
2.7 LARGE LANGUAGE MODEL INTEGRATION
2.8 STATISTICAL ANALYSIS
3. RESULTS
3.1 DATASET CHARACTERISTICS
3.2 MODEL PERFORMANCE EVALUATION
3.3 FEATURE IMPORTANCE ANALYSIS
3.4 INDIVIDUAL CASE ANALYSIS
3.5 COMPUTATIONAL PERFORMANCE
4. DISCUSSION
4.1 PRINCIPAL FINDINGS
4.2 CLINICAL SIGNIFICANCE AND IMPACT
4.3 TECHNICAL INNOVATION AND METHODOLOGICAL CONTRIBUTIONS
4.4 COMPARISON WITH PREVIOUS STUDIES
4.5 LIMITATIONS AND CONSTRAINTS
5. CONCLUSIONS AND FUTURE WORKS
ACKNOWLEDGEMENT
References

Author Information

  • ByungJoo Kim, Professor, Department of EE, Youngsan University, Korea
