Improving Accuracy of Machine Learning-Based Prediction Model for Heart Disease Classification Using Information Gain and DBSCAN

Norma Latif Fitriyani; Muhammad Syafrudin; Ganjar Alfian

Source Information

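Korea Society of Management Information Systems (KMIS), KMIS Regular Conference, 2022 Spring Joint Conference of Management Information Systems Societies, June 2022, pp. 506-509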

Abstract

Improving the accuracy of classification models has become a major research objective in various fields. Selecting important features and removing outliers from a dataset are two effective ways to improve model accuracy. Information Gain is a feature selection method that can be used to identify the important features of a dataset: it selects the variables that maximize the information gain, that is, the variables whose values most reduce the entropy of the class label and therefore best split the dataset into groups for effective classification. Besides selecting important features, removing outliers is also necessary to improve the accuracy of a classification model. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a powerful outlier removal method that can identify, with high accuracy, clusters of arbitrary shape and size in large databases corrupted by noise; points that fall in no dense cluster are labeled as noise and can be removed as outliers. Therefore, in this study, we propose improving the accuracy of a heart disease classification model by applying Information Gain and DBSCAN together with various machine learning algorithms. A publicly available heart disease dataset (Cleveland) is used to build the classification model. The results showed that applying Information Gain increased the accuracy of the models built with the Gaussian Naïve Bayes, Logistic Regression, Multi-Layer Perceptron, Support Vector Machine, Decision Tree, Random Forest, and Extreme Gradient Boosting algorithms by 1.31% on average. Accuracy also increased when DBSCAN was applied after Information Gain, with an improvement of around 0.62%.
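The two preprocessing steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names, the toy data, and the DBSCAN parameters (eps, min_samples) are hypothetical. Features are assumed to be discrete (continuous Cleveland attributes such as age or cholesterol would first need binning), Information Gain is computed as IG(Y; X) = H(Y) - Σ_v P(X = v) · H(Y | X = v), and DBSCAN uses plain Euclidean distance on whatever scale the caller supplies (features are usually standardized first). Rows that DBSCAN assigns to no cluster (noise) are treated as outliers and dropped.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(columns, labels, k):
    """Names of the k columns with the highest information gain (ties keep input order)."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def dbscan_noise(points, eps, min_samples):
    """Brute-force DBSCAN; returns a list of booleans, True where a point is noise."""
    def neighbors(i):
        # The eps-neighborhood includes the point itself, so it counts toward min_samples.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return [lab == NOISE for lab in labels]


def remove_outliers(rows, labels, eps, min_samples):
    """Drop rows that DBSCAN marks as noise, keeping the class labels aligned."""
    noise = dbscan_noise(rows, eps, min_samples)
    kept = [(r, y) for r, y, is_noise in zip(rows, labels, noise) if not is_noise]
    return [r for r, _ in kept], [y for _, y in kept]


if __name__ == "__main__":
    # Toy, hypothetical data; not the Cleveland dataset.
    y = [1, 1, 0, 0, 1, 0]
    cols = {"informative": [3, 3, 0, 0, 3, 0], "noisy": [1, 0, 1, 0, 1, 0]}
    print(select_top_k(cols, y, 1))  # ['informative']
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(dbscan_noise(points, eps=1.5, min_samples=3))  # [False, False, False, False, True]
```

In the order the abstract describes, one would first keep the columns returned by select_top_k and then pass those rows to remove_outliers before training each classifier; the classifiers themselves are left out of the sketch. For real datasets, scikit-learn's DBSCAN (which labels noise points -1) is a more efficient choice than the brute-force neighbor search used here.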

Author Information

Norma Latif Fitriyani, Department of Data Science, Sejong University
Muhammad Syafrudin, Department of Artificial Intelligence, Sejong University
Ganjar Alfian, Department of Electrical Engineering and Informatics, Vocational College, Universitas Gadjah Mada
