earticle


Poster Session III: All Areas of Next-Generation Computing Technology


Publication Information

Deep Learning Models for Electroglottography and Voice Conversion to Text Using Transfer Learning

Sana Javed, Shan Ullah, Deok-Hwan Kim


Abstract

In this paper, we present a comparative study of the performance of deep learning models for electroglottography (EGG) and voice conversion to text using transfer learning. To this end, we deployed a range of deep learning models, such as ResNet101, MobileNetV2, and GoogLeNet, for text recognition from electroglottography and voice signals. First, the short-time Fourier transform (STFT) is used to generate spectrograms from the time-series signals (EGG, voice). The spectrogram images are resized to meet the input requirements of the pre-trained models (ImageNet weights). Subsequently, rigorous experiments were performed with various combinations of EGG, voice, and hybrid (EGG and voice) inputs. In addition, we studied the impact of healthy and pathological signals using the SVD dataset. As expected, the accuracies for healthy voice signals were significantly higher than those for pathological signals. We analyzed the performance of each model under two combinations (healthy and mixed). ResNet101 outperforms the other models in terms of generalizability, as its accuracies were significantly higher in all three scenarios. The highest accuracies of ResNet101 for voice signals in the healthy and mixed scenarios are 98.10% and 88.57%, respectively.
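The preprocessing pipeline described in the abstract (STFT on the time-series signal, magnitude spectrogram, then resizing to the fixed input size of ImageNet-pretrained CNNs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, hop size, sampling rate, 224×224 target size, and the synthetic 120 Hz test tone are all assumptions for the example.

```python
import numpy as np

def stft_spectrogram(signal, n_fft=512, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    Frames the signal with a Hann window and takes the real FFT of
    each frame; returns an array of shape (freq_bins, n_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbour resize to the 224x224 input size commonly
    expected by ImageNet-pretrained models such as ResNet101."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

# Hypothetical 1-second voice/EGG-like signal sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 120 * t)      # 120 Hz glottal-like tone

spec = stft_spectrogram(sig)           # (257, 122) spectrogram
img = resize_nearest(np.log1p(spec))   # log-scale, then resize
print(img.shape)                       # (224, 224)
```

In practice the resized spectrogram would be replicated across three channels and normalized before being fed to a pretrained network; the fine-tuning step itself is omitted here.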

Table of Contents

Abstract
1. Introduction
2. Related works
3. Proposed methodology
3.1. Transfer learning
3.2. Experiment setup
4. Experimental result
5. Conclusions and future work
Acknowledgement
References

Author Information

  • Sana Javed, Department of Electrical and Computer Engineering, Inha University
  • Shan Ullah, Department of Electrical and Computer Engineering, Inha University
  • Deok-Hwan Kim, Department of Electrical and Computer Engineering, Inha University
