A Multi Level Data Fusion Approach for Speaker Identification on Telephone Speech

Imen Trabelsi; Dorra Ben Ayed


Publication Information

Security Engineering Research Support Center (IJSIP), International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 6, No. 2, April 2013, pp. 33-42

Citations: 0 (data provided by Naver Academic Information)

Abstract


Many speaker identification systems perform well on clean speech but degrade under noisy audio conditions. To address this problem, we investigate the use of complementary information at different levels to compute a combined match score for the unknown speaker. We examine two supervised machine learning approaches: support vector machines (SVM) and naïve Bayes (NB). We define two feature vector sets, based on mel-frequency cepstral coefficients (MFCC) and relative spectral perceptual linear predictive coefficients (RASTA-PLP). Each feature set is modeled using a Gaussian mixture model (GMM). Several ways of combining these information sources yield significant improvements in a text-independent speaker identification task on the very large, telephone-degraded NTIMIT database.
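As a rough illustration of what a combined match score at the score level could look like, the sketch below fuses per-speaker scores from two feature streams. It is not the authors' method: the abstract does not state the combination rule, so the z-score normalization, the weighted sum, the `weight` parameter, the function names, and the toy log-likelihood values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Rescale one stream's per-speaker scores to zero mean and unit variance."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 if all scores are identical
    return {spk: (s - mu) / sigma for spk, s in scores.items()}


def fuse_scores(mfcc_scores, plp_scores, weight=0.5):
    """Weighted sum of the normalized scores (assumed fusion rule, not from the paper)."""
    a = z_normalize(mfcc_scores)
    b = z_normalize(plp_scores)
    return {spk: weight * a[spk] + (1.0 - weight) * b[spk] for spk in a}


def identify(mfcc_scores, plp_scores, weight=0.5):
    """Return the speaker whose fused score is highest."""
    fused = fuse_scores(mfcc_scores, plp_scores, weight)
    return max(fused, key=fused.get)


# Hypothetical GMM log-likelihoods for one test utterance (made-up numbers).
mfcc_scores = {"spk_a": -41.0, "spk_b": -39.5, "spk_c": -44.0}
plp_scores = {"spk_a": -30.0, "spk_b": -33.0, "spk_c": -36.0}

best = identify(mfcc_scores, plp_scores)
```

In this toy data the MFCC stream alone would rank `spk_b` first, while the equally weighted fusion selects `spk_a`, which shows how a second feature stream can change the decision.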

Abstract
1. Introduction
2. Speaker Modeling
3. Classifiers
  3.1. Naïve Bayes (NB)
  3.2. Support Vector Machines (SVM)
4. Multi Level Data Fusion
  4.1. Feature Level Fusion
  4.2. Scores Level Fusion
5. Experiments and Results
  5.1. Corpus Description
  5.2. Feature Extraction
  5.3. Baseline System
  5.4. Results
6. Conclusion
References


Author Information

Imen Trabelsi: Electrical Engineering Department, National Engineering School of Tunis; Signal, Image and Pattern Recognition Research Unit
Dorra Ben Ayed: Electrical Engineering Department, National Engineering School of Tunis; Signal, Image and Pattern Recognition Research Unit
