Article Information
Deep Learning Models for Electroglottography and Voice conversion to Text using Transfer Learning
Abstract (English)
In this paper, we present a comparative study on the performance of deep learning models for converting electroglottography (EGG) and voice signals to text using transfer learning. To this end, we deployed a range of deep learning models, namely ResNet101, MobileNetV2, and GoogLeNet, for text recognition from electroglottography and voice signals, respectively. First, the short-time Fourier transform (STFT) is used to generate spectrograms from the time-series signals (EGG, voice). The spectrogram images are resized to meet the input requirements of the pre-trained models (ImageNet weights). Subsequently, rigorous experiments were performed with various input combinations: EGG, voice, and hybrid (EGG and voice). In addition, we studied the impact of healthy and pathological signals using the SVD dataset. As expected, the accuracies on healthy voice signals were significantly higher than on pathological signals. We analyzed the performance of each model under two conditions (healthy and mixed). ResNet101 outperforms the other models in terms of generalizability, as its accuracies were significantly higher in all three scenarios. The highest accuracies of ResNet101 for voice signals in the healthy and mixed conditions are 98.10% and 88.57%, respectively.
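The preprocessing pipeline described in the abstract (STFT of a time-series signal, log-magnitude spectrogram, resize to the input size expected by ImageNet-pretrained CNNs) can be sketched as below. This is a minimal illustration, not the authors' implementation: the sampling rate, FFT window length, resize method, and the 224x224 target size are assumptions not stated in the abstract.

```python
import numpy as np
from scipy.signal import stft

def signal_to_spectrogram(signal, fs=16000, nperseg=512, out_size=(224, 224)):
    """Convert a 1-D time-series signal (EGG or voice) into a log-magnitude
    spectrogram image sized for an ImageNet-pretrained CNN (assumed 224x224)."""
    _, _, Z = stft(signal, fs=fs, nperseg=nperseg)
    spec = np.log1p(np.abs(Z))  # log compression of the STFT magnitude
    # nearest-neighbour resize to the assumed target input size
    rows = np.linspace(0, spec.shape[0] - 1, out_size[0]).astype(int)
    cols = np.linspace(0, spec.shape[1] - 1, out_size[1]).astype(int)
    img = spec[np.ix_(rows, cols)]
    # replicate to 3 channels, since ImageNet-pretrained models expect RGB input
    return np.stack([img] * 3, axis=-1)

# example: one second of a synthetic 200 Hz tone standing in for a voice signal
t = np.linspace(0, 1, 16000, endpoint=False)
image = signal_to_spectrogram(np.sin(2 * np.pi * 200 * t))
print(image.shape)  # (224, 224, 3)
```

The resulting array can then be fed to any of the pre-trained backbones the paper compares (ResNet101, MobileNetV2, GoogLeNet) after the framework-specific normalization.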
Table of Contents
1. Introduction
2. Related works
3. Proposed methodology
3.1. Transfer learning
3.2. Experiment setup
4. Experimental results
5. Conclusions and future work
Acknowledgement
References