A Study on the Model of Emotional Synthesis Using TTS System (TTS 시스템을 이용한 감정 합성 모델에 관한 연구)

Eun-Joung Yoo; Seung-Jung Shin

Article information

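International Next-generation Convergence Technology Association (국제차세대융합기술학회), Journal of Next-generation Convergence Technology Association (차세대융합기술학회논문지), Vol. 4, No. 4, August 2020, pp. 374-380. KCI candidate journal.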


Abstract

English

With the development of deep learning technology, speech synthesis is moving beyond simple TTS toward quality comparable to audiobook narration. However, research still focuses only on synthesizing a specific speaker's voice in a natural voice style, and it still requires a large amount of training data. This study therefore proposes an emotional synthesis model that uses only a small amount of training data to express emotions in a synthesized voice. The system is composed of two models, Speech Synthesis and Voice Conversion. Because emotions are composed of rather complex elements and each person may express them differently, the results of this study alone do not show that emotions are perfectly expressed. However, the system presented in this study suggests that more expressive speech synthesis is possible by styling synthesized speech with only a little training data.

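Korean (translated into English)

With advances in deep learning, speech synthesis technology is being developed that goes beyond simple TTS and approaches the quality of audiobooks. However, research still concentrates on synthesizing the voice of a specific speaker in a reading style with restrained emotion, and it still requires a large amount of training data. This study proposes an emotional synthesis model that expresses emotion in synthesized speech using only a small amount of training data. To synthesize speech that expresses three emotions, the system is composed of two models, speech synthesis (Speech Synthesis) and voice conversion (Voice Conversion). Because emotion consists of rather complex elements and people may express it in different ways, the experimental results of this study alone cannot be regarded as speech synthesis in which emotion is perfectly expressed. However, the system presented here allows synthesized speech to be styled in various ways with only a very small amount of training data, which suggests that more expressive speech synthesis is possible.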

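Summary
Abstract
Ⅰ. Introduction
Ⅱ. Related Research Trends and Technologies
2.1 Seq2Seq Model
2.2 Attention Mechanism Model
2.3 Tacotron Model
2.4 GANs Model
Ⅲ. Research Method and Procedure
3.1 System Configuration
3.2 Experimental Data
3.3 Experimental Model
Ⅳ. Research Analysis and Results
4.1 Spectrogram Comparison
4.2 Objective Experiment Comparison
Ⅴ. Conclusion
REFERENCES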


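Author information

Eun-Joung Yoo (유은정). Student, Department of IT Convergence Electronic Engineering, Hansei University
Seung-Jung Shin (신승중). Professor, Department of ICT Convergence, Hansei University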
