An Extension of Voice-Based Early Dementia Diagnosis Research: Limitations of Mel-Spectrogram–Centered Approaches and a Linguistic Feature–Based Multimodal Analysis

Kwanjong Chang (장관종)

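국제차세대융합기술학회 (International Next-generation Convergence Technology Association), 차세대융합기술학회논문지 (Journal of Next-generation Convergence Technology Association), Vol. 10, No. 3, March 2026, pp. 740-754, KCI-listed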


Abstract

As population aging accelerates and early dementia screening grows in importance, voice-based artificial intelligence is gaining attention as a non-invasive, low-cost alternative. A previous study converted voice signals into Mel-spectrograms and applied them to models such as CNN and ViT, but classification accuracy remained at approximately 61–62%, indicating a structural limitation in capturing dementia-specific cognitive decline from acoustic features alone. To overcome this limitation, this study proposes a multimodal analysis approach that adds linguistic features extracted from transcribed text, using the ADRESS-2020 dataset. In the experiments, just 14 linguistic variables, including lexical diversity, syntactic complexity, and semantic coherence, achieved a cross-validation accuracy of 76.8%, substantially outperforming the acoustic-based models of the previous study. In terms of performance relative to the number of features, linguistic features scored far higher than high-dimensional deep learning features, demonstrating that the qualitative design of features matters in small-scale medical data settings. Combining acoustic and linguistic features yielded the most balanced results in terms of stability and variance, but when all features were combined into a high-dimensional set, performance declined because of the curse of dimensionality. In model comparisons, logistic regression showed the best generalization performance, suggesting that simple, interpretable models are more practical in real clinical settings. This study empirically shows that multimodal analysis centered on linguistic features is a core strategy for improving the accuracy and reliability of early dementia screening.
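The abstract names lexical diversity and syntactic complexity among the 14 linguistic variables and compares feature sets by performance relative to feature count, but it does not give the exact definitions. The following is a minimal sketch under stated assumptions: type-token ratio stands in for lexical diversity, mean sentence length stands in for syntactic complexity, and efficiency is taken to be accuracy divided by the number of features. The function names and the sample sentence are hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Assumed lexical-diversity measure: unique tokens / total tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_sentence_length(text):
    """Assumed syntactic-complexity proxy: average number of tokens per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def accuracy_per_feature(accuracy, n_features):
    """Assumed efficiency measure: accuracy divided by feature count."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return accuracy / n_features


# Made-up transcript fragment, used only to exercise the functions.
sample = "The boy is on the stool. The stool is falling."
features = {
    "ttr": type_token_ratio(sample),
    "mean_sentence_length": mean_sentence_length(sample),
}

# Figures from the abstract: 76.8% cross-validation accuracy with 14 variables.
linguistic_efficiency = accuracy_per_feature(0.768, 14)
```

For the sample sentence this gives a type-token ratio of 0.6 and a mean sentence length of 5.0 tokens. Under the assumed definition, the linguistic feature set scores roughly 0.055 accuracy per feature; the same function could be applied to any other feature set once its accuracy and dimensionality are known.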


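Summary
Abstract
Ⅰ. Introduction
Ⅱ. Theoretical Background and Review of Prior Research
2.1 Linguistic and Vocal Characteristics of Dementia and Early Diagnosis
2.2 Major Trends in Voice-Based Dementia Diagnosis Research
2.3 The Need for Multimodal Approaches to Dementia Diagnosis
2.4 Model Selection Issues in Small-Scale Medical Data Settings
2.5 Theoretical Positioning of This Study
Ⅲ. Research Design and Analysis Methods
3.1 Dataset Composition
3.2 Data Preprocessing
3.3 Feature Extraction Methods
3.4 Analysis Procedure
3.5 Evaluation Metrics and Cross-Validation Strategy
3.6 Section Summary
Ⅳ. Experimental Results and Analysis
4.1 Descriptive Statistics and Distribution Visualization of the Dataset
4.2 Classification Performance by Feature Type
4.3 Classification Performance by Model Type
4.4 Interpretation of MMSE Score Prediction (Regression Model) Performance
4.5 Statistical Significance Testing Between Models
Ⅴ. Discussion and Conclusion
5.1 Discussion of the Findings
5.2 Clinical Application Scenarios and Practical Implications
5.3 Academic Implications and Limitations of the Study
5.4 Future Research Directions
5.5 Conclusion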
REFERENCES
Appendix

Keywords

Voice-based dementia diagnosis
Linguistic features
Multimodal analysis
Early screening
Machine learning

Author information

Kwanjong Chang (장관종). Visiting Professor, Department of Convergence Engineering, Graduate School of Venture, Hoseo University
