An Automated Data Set Construction Technique Based on Color Tracking for Machine Learning

Laon Park; So-jung Ko; Seong Baeg Kim

Source information

Automated Data Set Generation Scheme based on Color-tracking for Machine Learning

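Laon Park, So-jung Ko, Seong Baeg Kim

Korean Institute of Next Generation Computing, The Journal of Korean Institute of Next Generation Computing, Vol.17 No.2, 2021.04, pp.19-30, KCI-indexed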

Abstract

English

Building a data set is one of the most important steps in solving a problem with machine learning. Most existing approaches either use commands that inflate the data by slightly altering a small number of images during training, or crawl images from Google Image Search. However, these approaches often fail to guarantee the reliability of the images as data, and deep learning training in particular requires a reliable data set. This study therefore designs and implements a novel tool for building data sets whose reliability can be ensured. We propose a way to build large data sets with little effort and test the reliability of the data sets built with it. As an example, we build a flower image data set and apply it to an ‘image search flower dictionary’. The results show that users can build the data sets needed for deep learning training more easily and cost-effectively, and that training on the generated data set is reliable and efficient. In addition, the idea proposed in this study can serve as a basis for producing large image data sets even when the characteristics of the target objects change.

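Korean (English translation)

One of the most important parts of solving a problem through machine learning is building a data set. Existing data set construction methods mostly either use commands that inflate the data by altering a small number of images during training, or crawl images from ‘google image search’. However, with these methods the reliability of the images as data is frequently not guaranteed. Deep learning training, among machine learning methods, requires a data set whose reliability is guaranteed. Accordingly, this study designs and implements a tool for building data sets whose reliability can be guaranteed. We consider a way to build a large data set with little effort and test the reliability of the data set built in that way. As an example for assuring reliability, we build a flower image data set and apply it to an ‘image search flower dictionary’. The results show that users can build the data sets needed for deep learning training more easily and economically, and that training with the built data set is reliable and efficient. Moreover, if the idea of this study is used as a basis, large image data sets can be produced even when the characteristics of the target for which the data set is built change.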
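
The abstract does not spell out the tracking algorithm, so the following is only a minimal sketch of what color-tracking-based sample extraction from video frames might look like. It assumes frames are already decoded into nested lists of RGB tuples, and the names `track_color`, `crop`, `build_samples`, and the `tolerance` parameter are hypothetical, not taken from the paper.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each 0-255
Frame = List[List[Pixel]]         # frame[row][col]
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def track_color(frame: Frame, target: Pixel, tolerance: int) -> Optional[Box]:
    """Return the bounding box of all pixels close to `target`, or None."""
    rows, cols = [], []
    for r, row in enumerate(frame):
        for c, pixel in enumerate(row):
            # Assumption: a pixel matches if every channel is within `tolerance`.
            if all(abs(p - t) <= tolerance for p, t in zip(pixel, target)):
                rows.append(r)
                cols.append(c)
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the boxed region out of the frame."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


def build_samples(frames: List[Frame], target: Pixel, tolerance: int) -> List[Frame]:
    """Collect one cropped sample per frame in which the target color is found."""
    samples = []
    for frame in frames:
        box = track_color(frame, target, tolerance)
        if box is not None:
            samples.append(crop(frame, box))
    return samples
```

Under these assumptions, each frame of an input video that contains the tracked color contributes one cropped image, and frames without it are skipped. A real tool would more likely decode video and threshold in a color space such as HSV with an image library.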

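Summary
Abstract
1. Introduction
1.1 Problems with existing data set construction techniques
1.2 Related research trends and how this study differs
2. Automated data set construction technique and tool
2.1 Video object tracking algorithm
2.2 Automated data set construction tool and user interface
3. Data set validation
3.1 Validation with respect to input video length and angle variables
3.2 Deep learning neural network design and application
4. Conclusions and suggestions
References
About the authors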

Keywords

deep learning
data set
training tool
flower image

Author information

Laon Park, Korea National University of Education
So-jung Ko, High School Attached to the College of Education, Jeju National University
Seong Baeg Kim, Jeju National University
