Effective Fusion of Transformer and Convolutional Neural Networks on Limited Medical Datasets

Chan-Young Choi; Sang-Woong Lee

Article information

Effective Fusion of Transformer and Convolutional Neural Networks on Limited Medical Datasets

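Korean Institute of Next Generation Computing, Journal of the Korean Institute of Next Generation Computing (한국차세대컴퓨팅학회 논문지), Vol. 19, No. 5, October 2023, pp. 20-31. KCI-indexed.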

Abstract

English

Transformer models are increasingly used in artificial intelligence research. Because transformer architectures can train more weights than convolutional neural networks, they serve as the base architecture in many studies. However, transformers require large datasets to train, which makes them difficult to apply in medical fields where available data are limited. This study therefore proposes a fusion architecture that can train transformers even on limited medical datasets. The fusion architecture combines a transformer encoder with a convolutional neural network decoder, which allows the transformer encoder to converge stably during training and to learn both global and local image features. In addition, a loss function combining three loss functions is used for fast and stable training of image segmentation. The superiority and stability of the proposed architecture are validated on two colonoscopy polyp datasets. Beyond medical image segmentation tasks such as polyp segmentation, the fusion architecture is expected to be useful in areas such as medical image analysis and disease detection in medical images.

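Korean (translated into English)

Transformer models have recently become the predominant choice in artificial intelligence research. Because the transformer architecture can train more weights than convolutional neural networks, it is used as the base architecture in a wide range of studies. However, transformers have the limitation that they must be trained on large-scale data. This makes the transformer architecture difficult to apply in the medical field, where data are limited. This study therefore proposes a fusion architecture that can train a transformer even with limited medical datasets. By combining a transformer encoder with a convolutional neural network decoder, the fusion architecture lets the transformer encoder converge stably during training and learn both global and local features in an image. In addition, a loss function that combines three loss functions is used for fast and stable image segmentation training. Two colonoscopy polyp datasets are used to verify the superiority and stability of the proposed architecture. Beyond medical image segmentation such as polyp segmentation, the fusion architecture is expected to be applicable in many areas, including medical image analysis and disease detection in medical images.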
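The abstract mentions a loss function that combines three loss functions but does not name them. As a rough illustration only, the sketch below assumes a weighted sum of binary cross-entropy, Dice, and IoU losses, a combination that is common in polyp segmentation work; the function name `combined_segmentation_loss`, the three chosen terms, and the equal default weights are all hypothetical and may differ from what the paper actually uses.

```python
import math


def combined_segmentation_loss(probs, targets, weights=(1.0, 1.0, 1.0), eps=1e-7):
    """Weighted sum of BCE, Dice, and IoU losses over flattened pixels.

    probs:   predicted foreground probabilities in [0, 1]
    targets: ground-truth labels (0 or 1), same length as probs
    weights: (w_bce, w_dice, w_iou); equal weights are an assumption
    """
    if len(probs) != len(targets) or len(probs) == 0:
        raise ValueError("probs and targets must be non-empty and equal in length")

    n = len(probs)

    # Binary cross-entropy, with probabilities clamped to avoid log(0).
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= n

    # Soft overlap statistics shared by the Dice and IoU terms.
    intersection = sum(p * t for p, t in zip(probs, targets))
    pred_sum = sum(probs)
    target_sum = sum(targets)

    # Dice loss: 1 - 2|A∩B| / (|A| + |B|)
    dice = 1.0 - (2.0 * intersection + eps) / (pred_sum + target_sum + eps)

    # IoU (Jaccard) loss: 1 - |A∩B| / |A∪B|
    union = pred_sum + target_sum - intersection
    iou = 1.0 - (intersection + eps) / (union + eps)

    w_bce, w_dice, w_iou = weights
    return w_bce * bce + w_dice * dice + w_iou * iou
```

In this sketch the cross-entropy term gives a per-pixel gradient, while the two overlap terms score the predicted region as a whole, which is one plausible reading of why combining terms could make segmentation training faster and more stable.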

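Summary (Korean abstract)
Abstract
1. Introduction
2. Related Work
2.1 Computer Vision Research Based on Convolutional Neural Networks
2.2 Computer Vision Research Based on Transformers
2.3 Hybrid Architectures
3. Fusion Architecture of Transformer and Convolutional Neural Network
3.1 Transformer Encoder
3.2 Convolutional Neural Network Decoder
3.3 Loss Function
4. Polyp Segmentation Experiments
4.1 Experimental Environment
4.2 Evaluation Metrics
4.3 Experimental Results
5. Conclusion
Acknowledgement
References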

Author information

Chan-Young Choi (최찬영). Master's student, Department of AI·Software, Gachon University
Sang-Woong Lee (이상웅). Professor, Department of AI·Software, Gachon University
