불균형 정형 데이터를 위한 SMOTE와 변형 CycleGAN 기반 하이브리드 오버샘플링 기법

노정담; 최병구

불균형 정형 데이터를 위한 SMOTE와 변형 CycleGAN 기반 하이브리드 오버샘플링 기법

원문정보

A Hybrid Oversampling Technique for Imbalanced Structured Data based on SMOTE and Adapted CycleGAN

노정담, 최병구

한국경영정보학회 경영정보학연구 제24권 제4호 2022.11 pp.97-118 KCI 등재

피인용수 : 0건 (자료제공 : 네이버학술정보)

초록

영어

As generative adversarial network (GAN) based oversampling techniques have achieved impressive results in class imbalance of unstructured dataset such as image, many studies have begun to apply it to solving the problem of imbalance in structured dataset. However, these studies have failed to reflect the characteristics of structured data due to changing the data structure into an unstructured data format. In order to overcome the limitation, this study adapted CycleGAN to reflect the characteristics of structured data, and proposed hybridization of synthetic minority oversampling technique (SMOTE) and the adapted CycleGAN. In particular, this study tried to overcome the limitations of existing studies by using a one-dimensional convolutional neural network unlike previous studies that used two-dimensional convolutional neural network. Oversampling based on the method proposed have been experimented using various datasets and compared the performance of the method with existing oversampling methods such as SMOTE and adaptive synthetic sampling (ADASYN). The results indicated the proposed hybrid oversampling method showed superior performance compared to the existing methods when data have more dimensions or higher degree of imbalance. This study implied that the classification performance of oversampling structured data can be improved using the proposed hybrid oversampling method that considers the characteristic of structured data.

한국어

이미지와 같은 비정형 데이터의 불균형 클래스 문제 해결에 있어 생산적 적대 신경망(generative adversarial network)에 기반한 오버샘플링 기법의 우수성이 알려짐에 따라 다양한 연구들이 이를 정형 데이터의 불균형 문제 해결에도 적용하기 시작하였다. 그러나 이러한 연구들은 데이터의 형태를 비정형 데이터 구조로 변경함으로써 정형 데이터의 특징을 정확하게 반영하지 못한다는 점이 문제로 지적되고 있다. 본 연구에서는 이를 해결하기 위해 순환 생산적 적대 신경망(cycle GAN)을 정형 데이터의 구조에 맞게 재구성하고 이를 SMOTE(synthetic minority oversampling technique) 기법과 결합한 하이브리드 오버샘플링 기법을 제안하였다. 특히 기존 연구와 달리 생산적 적대 신경망을 구성함에 있어 1차원 합성곱 신경망(1D-convolutional neural network)을 사용함으로써 기존 연구의 한계를 극복하고자 하였다. 본 연구에서 제안한 기법의 성능 비교를 위해 불균형 정형 데이터를 기반으로 오버샘플링을 진행하고 그 결과를 SMOTE, ADASYN(adaptive synthetic sampling) 등과 같은 기존 기법과 비교하였다. 비교 결과 차원이 많을수록, 불균형 정도가 심할수록 제안된 모형이 우수한 성능을 보이는 것으로 나타났다. 본 연구는 기존 연구와 달리 정형 데이터의 구조를 유지하면서 소수 클래스의 특징을 반영한 오버샘플링을 통해 분류의 성능을 향상시켰다는 점에서 의의가 있다.

요약
Ⅰ. 서론
Ⅱ. 선행 연구
2.1 오버샘플링 기법
2.2 생산적 적대 신경망(GAN)을 활용한 정형 데이터 생성 연구
2.3 순환 생산적 적대 신경망(CycleGAN)
Ⅲ. 연구 방법
3.1 연구 모형
3.2 SDOCGAN의 구조
Ⅳ. 실험 및 연구 결과
4.1 데이터 셋
4.2 실험 설정
4.3 실험 결과
4.4 추가 실험
Ⅴ. 결론
5.1 연구의 시사점
5.2 한계점 및 향후 연구 방향
참고문헌
Abstract

키워드

저자정보

노정담 Jung-Dam Noh. Afreeca TV VOD 데이터 팀 주니어
최병구 Byounggu Choi. 국민대학교 경영대학 AI빅데이터융합경영학과 교수

참고문헌

자료제공 : 네이버학술정보

함께 이용한 논문

※ 기관로그인 시 무료 이용이 가능합니다.

5,800원

0개의 논문이 장바구니에 담겼습니다.

earticle