


Comparison of Neural Vocoders for Speech Quality Improvement in CNN-based Speech Dereverberation

Chanjun Chun, Wooyeol Choi

Reverberation degrades speech quality and impairs speech intelligibility. This degradation can also complicate the analysis of speech signals and the scientific investigations that depend on it. Moreover, because reverberant speech lowers speech recognition performance, dereverberation is widely employed as a preprocessing step. In this paper, we compare the performance of various neural vocoders in a dereverberation technique based on a convolutional neural network (CNN). The U-Net architecture was used for dereverberation, and WaveGlow, MelGAN, and Griffin-Lim were employed as vocoders. A vocoder receives speech features as input and reconstructs a time-domain speech signal from them. In particular, recent neural vocoders take a mel-spectrogram as the input feature and can reconstruct high-quality speech signals. To compare the vocoders, we measured the perceptual evaluation of speech quality (PESQ) and confirmed that all PESQ scores were higher than those of the unprocessed reverberant signals.
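To make the vocoder's role concrete, the following is a minimal NumPy sketch of Griffin-Lim phase reconstruction, the one non-neural vocoder named above. It is an illustration under stated assumptions, not the configuration used in the paper: the function names, the FFT size (512), the hop size (128), and the iteration count are all hypothetical choices, and the sketch works on a linear magnitude spectrogram rather than a mel-spectrogram to stay short.

```python
import numpy as np

# Illustrative sketch only: n_fft, hop, and n_iter are assumed values,
# not settings reported in the paper.

def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with a periodic Hann window.
    Returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Least-squares inverse STFT: windowed overlap-add divided by the
    summed squared window (floored to avoid division by zero)."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        signal[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return signal / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a time-domain signal from a magnitude spectrogram by
    alternating between the time domain and the STFT domain."""
    rng = np.random.default_rng(seed)
    # Start from random phase, since the magnitude carries no phase.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        # Keep the target magnitude, adopt the re-estimated phase.
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)
```

In a pipeline like the one described, the magnitude passed to `griffin_lim` would come from the dereverberation network's output rather than from a clean signal. Neural vocoders such as WaveGlow and MelGAN replace this iterative phase estimation with a trained model.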


Ⅰ. Introduction
Ⅱ. Speech Dereverberation Based on Convolutional Neural Networks
Ⅲ. Performance Evaluation
Ⅳ. Conclusion
Ⅴ. Acknowledgments
Ⅵ. References


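  • Chanjun Chun (전찬준). Professor, Department of Computer Engineering, Chosun University
  • Wooyeol Choi (최우열). Professor, Department of Computer Engineering, Chosun University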


