

Human-Machine Interaction Technology (HIT)

A Study on Diabetes Management System Based on Logistic Regression and Random Forest



This study introduces a two-step machine learning approach to diabetes diagnosis that combines the probabilistic predictions of Logistic Regression with the classification strength of Random Forest. Diabetes is a chronic disease affecting millions of people worldwide, and precise, early detection is needed to mitigate long-term complications. Traditional diagnostic methods are effective, but they often require invasive testing and may not fully exploit the patterns present in patient data. To address this gap, the proposed method first uses Logistic Regression to estimate the likelihood that a patient has diabetes, and then uses Random Forest to classify individuals as diabetic, pre-diabetic, or nondiabetic based on the computed probabilities. The method thus draws on the strengths of both algorithms: Logistic Regression's ability to estimate fine-grained probabilities and Random Forest's robustness in classification. Applied to a comprehensive diabetes dataset, the model shows a marked improvement in diagnostic precision, with better performance metrics than the other machine learning approaches considered. These findings suggest that integrating different machine learning models can improve clinical decision-making, and they point to a promising route toward the early and accurate diagnosis of diabetes and potentially other complex diseases.
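The two-step idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn is available, substitutes synthetic data from make_classification for the diabetes dataset, treats "diabetic" as the positive class for the Logistic Regression stage, and passes only the resulting probability to the Random Forest stage. The label coding (0, 1, 2), the hyperparameters, and the use of out-of-fold probabilities are all choices made for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in for a diabetes dataset (assumption):
# 0 = nondiabetic, 1 = pre-diabetic, 2 = diabetic.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 1: logistic regression estimates P(diabetic) as a binary problem.
y_train_binary = (y_train == 2).astype(int)
log_reg = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities, so stage 2 is not trained on probabilities
# produced by a model that has already seen the same rows.
p_train = cross_val_predict(
    log_reg, X_train, y_train_binary, cv=5, method="predict_proba"
)[:, 1]

# Refit on the full training split to score the held-out data.
log_reg.fit(X_train, y_train_binary)
p_test = log_reg.predict_proba(X_test)[:, 1]

# Stage 2: random forest maps the probability to one of three categories.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(p_train.reshape(-1, 1), y_train)
y_pred = forest.predict(p_test.reshape(-1, 1))

accuracy = float(np.mean(y_pred == y_test))
print(f"three-class accuracy on held-out data: {accuracy:.3f}")
```

A variant of this sketch could pass the original features to the Random Forest alongside the probability; the abstract does not say which inputs the second stage receives, so the single-feature version here is only one possible reading.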


1. Introduction
2. Literature Review
3. Proposed model
3.1 Logistic regression
3.2 Random Forest
3.3 Proposed model
3.4 Data
3.5 Cross validation
4. Experiment
4.1 Evaluation metrics
4.2 Experimental result
5. Conclusion


  • ByungJoo Kim, Professor, Department of Electrical and Electronics Engineering, Youngsan University, Korea


