earticle


Poster Session III: All Areas of Next-Generation Computing Technology


Publication Information

Deep Learning Models for Electroglottography and Voice Conversion to Text Using Transfer Learning

Sana Javed, Shan Ullah, Deok-Hwan Kim


Abstract

In this paper, we present a comparative study of the performance of deep learning models for electroglottography (EGG) and voice conversion to text using transfer learning. To this end, we deployed a range of deep learning models, such as ResNet101, MobileNetV2, and GoogLeNet, for text recognition from electroglottography and voice signals. First, the short-time Fourier transform (STFT) is used to generate spectrograms from the time-series signals (EGG, voice). The spectrogram images are resized to meet the input requirements of the pre-trained models (ImageNet weights). Subsequently, rigorous experiments were performed with various combinations of EGG, voice, and hybrid (EGG and voice) inputs. In addition, we studied the impact of healthy and pathological signals using the SVD dataset. As expected, the accuracies for healthy voice signals were significantly higher than those for pathological signals. We analyzed the performance of each model under two combinations (healthy and mixed). ResNet101 outperforms the other models in terms of generalizability, as its accuracies were significantly higher in all three scenarios. The highest accuracies of ResNet101 for voice signals in the healthy and mixed scenarios are 98.10% and 88.57%, respectively.
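The preprocessing pipeline described in the abstract (STFT on the time-series signal, magnitude spectrogram, then resizing to the fixed input size of ImageNet-pretrained CNNs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, hop size, sampling rate, 224×224 target size, and the synthetic 120 Hz test tone are all assumptions for the example.

```python
import numpy as np

def stft_spectrogram(signal, n_fft=512, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    Frames the signal with a Hann window and takes the real FFT of
    each frame; returns an array of shape (freq_bins, n_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbour resize to the 224x224 input size commonly
    expected by ImageNet-pretrained models such as ResNet101."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

# Hypothetical 1-second voice/EGG-like signal sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 120 * t)      # 120 Hz glottal-like tone

spec = stft_spectrogram(sig)           # (257, 122) spectrogram
img = resize_nearest(np.log1p(spec))   # log-scale, then resize
print(img.shape)                       # (224, 224)
```

In practice the resized spectrogram would be replicated across three channels and normalized before being fed to a pretrained network; the fine-tuning step itself is omitted here.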

Table of Contents

Abstract
1. Introduction
2. Related works
3. Proposed methodology
3.1. Transfer learning
3.2. Experiment setup
4. Experimental result
5. Conclusions and future work
Acknowledgement
References

Author Information

  • Sana Javed, Department of Electrical and Computer Engineering, Inha University
  • Shan Ullah, Department of Electrical and Computer Engineering, Inha University
  • Deok-Hwan Kim, Department of Electrical and Computer Engineering, Inha University
