Probability Estimation Method for Imputing Missing Values in Data Expansion Technique

이종찬 (Jong Chan Lee)

Korea Convergence Society, Journal of the Korea Convergence Society, Vol. 12, No. 11 (November 2021), pp. 91-97. KCI-indexed.

Abstract

This paper applies a data expansion technique, originally designed for the rule refinement problem, to the handling of incomplete data. In this technique, each event can carry a weight indicating its importance, and each variable can be expressed as a probability value. Because the key problem addressed here is to find the probability that best approximates a missing value and to substitute that probability for the missing value, three different algorithms are used to estimate the probability of each missing value, and the results are stored in this data-structure format. To evaluate each resulting probability structure, an SVM classifier is trained to classify each information domain, and its output is compared with the original information to measure how closely the two agree. The three imputation algorithms share the same data structure but differ in their approach, so they are expected to serve different purposes depending on the application field.
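The abstract does not name the three estimation algorithms, so the following is only a minimal sketch of the data structure it describes, assuming categorical attributes. Each event is stored as a weight plus one probability vector per attribute: an observed value becomes a one-hot vector, and a missing value is replaced by an estimated distribution. The estimator used here (weighted relative frequency of the observed values of that attribute) is a simple illustrative choice and not necessarily one of the paper's three algorithms; the names `expand`, `estimate_distributions`, `flatten`, and the `None` missing-value marker are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

MISSING = None  # hypothetical marker for a missing value

Event = Sequence[Optional[str]]
Expanded = Tuple[float, List[List[float]]]


def estimate_distributions(events: Sequence[Event], weights: Sequence[float],
                           domains: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Weighted relative frequency of each observed value, per attribute (illustrative estimator)."""
    dists = []
    for j, domain in enumerate(domains):
        counts = {v: 0.0 for v in domain}
        for row, w in zip(events, weights):
            if row[j] is not MISSING:
                counts[row[j]] += w  # raises KeyError if the value is outside the domain
        total = sum(counts.values())
        if total == 0:
            # Attribute never observed: fall back to a uniform distribution.
            dists.append({v: 1.0 / len(domain) for v in domain})
        else:
            dists.append({v: c / total for v, c in counts.items()})
    return dists


def expand(events: Sequence[Event], weights: Sequence[float],
           domains: Sequence[Sequence[str]]) -> List[Expanded]:
    """Store each event as (weight, one probability vector per attribute)."""
    dists = estimate_distributions(events, weights, domains)
    expanded = []
    for row, w in zip(events, weights):
        vectors = []
        for j, domain in enumerate(domains):
            if row[j] is MISSING:
                # Missing value: substitute the estimated distribution.
                vectors.append([dists[j][v] for v in domain])
            else:
                # Observed value: one-hot vector (probability 1 for that value).
                vectors.append([1.0 if v == row[j] else 0.0 for v in domain])
        expanded.append((w, vectors))
    return expanded


def flatten(expanded: List[Expanded]) -> Tuple[List[List[float]], List[float]]:
    """Concatenate each event's vectors into one feature row; keep the weights separately."""
    X = [[p for vec in vecs for p in vec] for _, vecs in expanded]
    w = [weight for weight, _ in expanded]
    return X, w


# Toy example: two categorical attributes; the third event is weighted twice as heavily.
domains = [["low", "high"], ["yes", "no"]]
events = [["low", "yes"], ["high", "no"], ["low", None], [None, "yes"]]
weights = [1.0, 1.0, 2.0, 1.0]

expanded = expand(events, weights, domains)
X, w = flatten(expanded)
for weight, vecs in expanded:
    print(weight, vecs)
```

In this toy data the fourth event's missing first attribute becomes `[0.75, 0.25]`, because the weighted counts of observed values are low = 1 + 2 = 3 and high = 1. The flattened rows and weights could, for instance, be passed as `X` and `sample_weight` to scikit-learn's `SVC.fit` when reproducing an SVM-based comparison; whether the paper uses event weights in this way is not stated in the abstract.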

Author Information

Jong Chan Lee (이종찬), Professor, Department of Computer Engineering, Chungwoon University
