Comparison of Data Reconstruction Methods for Missing Value Imputation
(결측값 대체를 위한 데이터 재현 기법 비교)

Cheongho Kim (김청호); Kee-Hoon Kang (강기훈)

Technology Convergence (TC)

국제문화기술진흥원 (International Promotion Agency of Culture Technology), The Journal of the Convergence on Culture Technology (JCCT), Vol. 10, No. 1, January 2024, pp. 603-608. KCI-indexed.

Times cited: 0 (data provided by Naver Academic)

Abstract

English

Nonresponse and missing values arise from sample dropout and from respondents declining to answer survey items. They can lead to information loss and biased inference, so missing values need to be replaced with appropriate values. In this paper, we compare several imputation methods: mean imputation, linear regression, random forest, K-nearest neighbor, and the deep-learning-based autoencoder and denoising autoencoder. We describe each method and compare them on continuous simulated data and on real data. The results confirm that, in most cases, random forest imputation and denoising autoencoder imputation perform better than the other methods.

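Korean (translated)

Nonresponse and missing values arise from sample attrition, avoidance of survey answers, and similar causes, and they create a risk of information loss and biased inference; in such cases, missing values need to be imputed, that is, replaced with appropriate values. This paper compares methods that have been proposed for imputing missing values: mean imputation, multiple regression imputation, random forest imputation, K-nearest neighbor imputation, and the deep-learning-based autoencoder imputation and denoising autoencoder imputation. We describe these methods and compare them by applying them to continuous simulated data and to real data. The comparison confirms that, in most cases, the multiple-imputation-based random forest imputation method and the denoising autoencoder imputation method performed well.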
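
The abstract names the compared methods but gives no implementation details on this page. As a rough illustration of the kind of comparison it describes, the sketch below imputes a single continuous variable under missing-completely-at-random deletion using mean imputation and K-nearest neighbor imputation, then scores each by RMSE on the deleted entries. The simulation design, sample size, missing rate, `k`, and all function names are assumptions made for this example, not the authors' settings, and restricting missingness to one column is a simplification.

```python
import math
import random

def simulate(n, seed=0):
    """Simulate n rows of three correlated continuous variables (hypothetical design)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 0.5 * x1 + rng.gauss(0.0, 0.5)
        y = 2.0 * x1 - x2 + rng.gauss(0.0, 0.1)
        rows.append([x1, x2, y])
    return rows

def mcar_mask(n, rate, seed=1):
    """MCAR: each row's last value is marked missing with probability `rate`."""
    rng = random.Random(seed)
    return [rng.random() < rate for _ in range(n)]

def mean_impute(rows, missing):
    """Replace each missing last-column value with the mean of the observed ones."""
    observed = [r[-1] for r, m in zip(rows, missing) if not m]
    mean = sum(observed) / len(observed)
    return [mean if m else r[-1] for r, m in zip(rows, missing)]

def knn_impute(rows, missing, k=5):
    """Replace each missing value with the average over the k nearest complete rows."""
    donors = [r for r, m in zip(rows, missing) if not m]
    imputed = []
    for r, m in zip(rows, missing):
        if not m:
            imputed.append(r[-1])
            continue
        # Euclidean distance on the fully observed predictor columns only
        nearest = sorted(donors, key=lambda d: math.dist(d[:-1], r[:-1]))[:k]
        imputed.append(sum(d[-1] for d in nearest) / len(nearest))
    return imputed

def rmse(truth, imputed, missing):
    """Root mean squared error, evaluated only on the entries that were deleted."""
    errs = [(t[-1] - v) ** 2 for t, v, m in zip(truth, imputed, missing) if m]
    return math.sqrt(sum(errs) / len(errs))

rows = simulate(300)
missing = mcar_mask(len(rows), rate=0.2)
rmse_mean = rmse(rows, mean_impute(rows, missing), missing)
rmse_knn = rmse(rows, knn_impute(rows, missing), missing)
print(f"mean imputation RMSE: {rmse_mean:.3f}")
print(f"KNN imputation RMSE:  {rmse_knn:.3f}")
```

Because the target variable here is strongly related to the observed predictors, a neighbor-based method should recover it more closely than the column mean does. The random forest and autoencoder variants in the abstract would slot into the same evaluation loop as additional imputation functions.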

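Contents

Abstract (Korean)
Abstract
Ⅰ. Introduction
Ⅱ. Missing Value Imputation Methods
1. Mean imputation
2. Multiple regression imputation
3. Predictive mean matching imputation
4. Random forest imputation
5. K-nearest neighbor imputation
6. Autoencoder imputation
7. Denoising autoencoder imputation
Ⅲ. Simulation
1. Overview
2. Missing completely at random (MCAR) data
3. Missing at random (MAR) data
Ⅳ. Comparison Using Real Data
Ⅴ. Conclusion
References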

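Author information

Cheongho Kim (김청호). Associate member; master's degree, Department of Statistics, Hankuk University of Foreign Studies
Kee-Hoon Kang (강기훈). Regular member; professor, Department of Statistics, Hankuk University of Foreign Studies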
