Session I : AI Deep Learning

Utterance-Level Speech Emotion Recognition using Parallel Convolutional Neural Network with Self- Attention Module

Abstract

Automated speech emotion recognition (SER) through efficient long-term temporal context modeling is a challenging task in the digital audio signal processing domain. Conventionally, a recurrent neural network (RNN) is employed to incorporate the temporal dependencies of a sequence and to investigate the relationships among sequences and features. In this study, we design a parallel convolutional neural network (PCNN) for SER using a squeeze-and-excitation network (SEnet) with a self-attention module. Additionally, we adopt a residual learning strategy in both modules, SEnet and self-attention, which further improves the performance of the network. Our proposed SER system takes a speech spectrogram as input and extracts utterance-level discrete features using the PCNN model. We experimentally evaluated the proposed system on a standard speech corpus, the interactive emotional dyadic motion capture (IEMOCAP) database. The prediction results reveal the significance and robustness of the proposed PCNN system, which obtained a high recognition rate of 72.01%, surpassing state-of-the-art (SOTA) methods.
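The SEnet module with residual learning described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of the standard squeeze-and-excitation pattern (channel-wise global pooling, a bottleneck gate, and a residual skip); the actual PCNN layer layout, weight shapes, and reduction ratio are assumptions, not details taken from the paper.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation block with a residual connection (illustrative only).

    x  : (C, H, W) feature map, e.g. from a convolution over a spectrogram
    w1 : (C // r, C) bottleneck weights (r = reduction ratio)
    w2 : (C, C // r) expansion weights
    """
    # Squeeze: global average pooling collapses each channel to one value
    z = x.mean(axis=(1, 2))                      # (C,)
    # Excite: bottleneck MLP, ReLU then sigmoid, yields per-channel gates
    s = np.maximum(0.0, w1 @ z)                  # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))       # (C,) in (0, 1)
    # Recalibrate channels by the gates, then add the residual skip path
    return x * gate[:, None, None] + x
```

With zero-initialized weights every gate is sigmoid(0) = 0.5, so the output is simply 1.5x; trained weights would instead learn to emphasize emotionally salient channels while the residual path preserves the original features.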

Table of Contents

Abstract
I. INTRODUCTION
II. PROPOSED PCNN-BASED SER SYSTEM
A. SEnet Module
B. Self-Attention
III. RESULTS & DISCUSSION
IV. CONCLUSION & FUTURE DIRECTION
ACKNOWLEDGEMENT
REFERENCES

Authors

  • Mustaqeem Interaction Technology Laboratory, Department of Software, Sejong University
  • Muhammad Ishaq Interaction Technology Laboratory, Department of Software, Sejong University
  • Guiyoung Son Interaction Technology Laboratory, Department of Software, Sejong University
  • Soonil Kwon Interaction Technology Laboratory, Department of Software, Sejong University

