Abstract
Speech Emotion Recognition (SER) is a hot research topic in the field of Human Computer Interaction (HCI). In this paper, we recognize three emotional states: happy, sad, and neutral. The explored features include energy, pitch, linear prediction cepstrum coefficients (LPCC), mel-frequency cepstrum coefficients (MFCC), and mel energy spectrum dynamic coefficients (MEDC). A German corpus (the Berlin Database of Emotional Speech) and a self-built Chinese emotional database are used to train the Support Vector Machine (SVM) classifier. Finally, the results for different feature combinations and different databases are compared and explained. The overall experimental results reveal that the feature combination MFCC+MEDC+Energy achieves the highest accuracy on both the Chinese emotional database (91.3%) and the Berlin emotional database (95.1%).
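The pipeline described in the abstract can be illustrated with a minimal sketch: extract utterance-level statistics from a frame-based acoustic feature (here, short-time log energy only), then train an SVM classifier on three emotion classes. This is not the paper's implementation; the synthetic signals, the frame parameters, and the restriction to energy features are assumptions made purely for illustration, and the corpora named above are replaced by random data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def frame_log_energy(signal, frame_len=400, hop=160):
    """Short-time log energy per frame (frame_len/hop chosen arbitrarily)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.log(np.sum(f ** 2) + 1e-10) for f in frames])


# Synthetic "utterances": three hypothetical emotion classes that differ
# only in signal amplitude, standing in for real emotional speech corpora.
rng = np.random.default_rng(0)
X, y = [], []
for label, amp in [(0, 0.3), (1, 0.6), (2, 1.0)]:   # e.g. sad / neutral / happy
    for _ in range(30):
        sig = amp * rng.standard_normal(16000)       # 1 s at 16 kHz
        e = frame_log_energy(sig)
        # Utterance-level feature vector: summary statistics over frames.
        X.append([e.mean(), e.std(), e.max(), e.min()])
        y.append(label)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.3, random_state=0, stratify=y)

# RBF-kernel SVM, as in the paper's classifier choice (hyperparameters assumed).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"test accuracy: {acc:.3f}")
```

In the paper's actual system, the four-number energy summary would be replaced by the full MFCC+MEDC+Energy feature vector computed from real speech.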
Table of Contents
1. Introduction
2. Speech Database
3. Speech Emotion Recognition System
4. Feature Extraction
4.1. Energy and Related Features
4.2. Pitch and Related Features
4.3. Linear Prediction Cepstrum Coefficients (LPCC)
4.4. Mel-Frequency Cepstrum Coefficients (MFCC)
4.5. Mel Energy Spectrum Dynamic coefficients (MEDC)
5. Experiment and Results
5.1. SVM Classification Algorithm
5.2. Training Models
5.3. Experimental Results
6. Conclusion and Future Works
References