A Robust Front-End Processor combining Mel Frequency Cepstral Coefficient and Sub-band Spectral Centroid Histogram methods for Automatic Speech Recognition

R. Thangarajan; A.M. Natarajan

Source information

Security Engineering Research Support Center (IJSIP), International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 2, no. 2, June 2009, pp. 67-74

Abstract (English)

Environmental robustness is an important area of research in speech recognition. Mismatch between trained speech models and the actual speech to be recognized, caused by factors such as background noise, can severely degrade the accuracy of recognizers based on commonly used features such as mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). It is well understood that previous auditory-based feature extraction methods perform extremely well in terms of robustness because of the dominant-frequency information present in them, but these methods suffer from high computational cost. Another method, sub-band spectral centroid histograms (SSCH), integrates dominant-frequency information with sub-band power information. It is based on sub-band spectral centroids (SSC), which are closely related to spectral peaks for both clean and noisy speech. Since SSC can be computed efficiently from a short-term estimate of the speech power spectrum, the SSCH method is quite robust to additive background noise at a lower computational cost. The MFCC method has been noted to outperform the SSCH method on clean speech; on speech with additive noise, however, MFCC performance degrades substantially. In this paper, both MFCC and SSCH feature extraction have been implemented in Carnegie Mellon University (CMU) Sphinx 4.0 and trained and tested on the AN4 database for clean and noisy speech. Finally, a robust speech recognizer is suggested that automatically employs either MFCC or SSCH feature extraction based on the variance of the short-term power of the input utterance.
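The two computations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Sphinx implementation. The function names, the frame length and hop, the rectangular sub-band window, the exponent `gamma`, the threshold, and above all the direction of the decision (low power variance treated as noisy speech) are assumptions made for illustration; the abstract states only that the choice depends on the variance of short-term power.

```python
from statistics import pvariance


def short_term_power(samples, frame_len=400, hop=160):
    """Mean squared amplitude of each frame (a short-term power estimate)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def subband_spectral_centroid(power_spectrum, freqs, lo, hi, gamma=1.0):
    """Power-weighted mean frequency of the bins with lo <= f < hi.

    Uses a rectangular sub-band window (an assumption); gamma compresses
    or expands the dynamic range of the power spectrum.
    """
    num = 0.0
    den = 0.0
    for f, p in zip(freqs, power_spectrum):
        if lo <= f < hi:
            weight = p ** gamma
            num += f * weight
            den += weight
    # With no energy in the band, fall back to the band centre.
    return num / den if den > 0 else 0.5 * (lo + hi)


def choose_front_end(samples, threshold, frame_len=400, hop=160):
    """Pick a feature extractor from the variance of short-term power.

    ASSUMPTION: variance below the threshold is taken to indicate noisy
    speech (noise fills the pauses and flattens the power contour), so
    SSCH is chosen; otherwise MFCC. The abstract does not state the
    direction or the threshold value.
    """
    powers = short_term_power(samples, frame_len, hop)
    if not powers:
        raise ValueError("utterance is shorter than one frame")
    return "SSCH" if pvariance(powers) < threshold else "MFCC"
```

In practice the threshold would have to be tuned on held-out clean and noisy utterances, and the power spectrum passed to `subband_spectral_centroid` would come from a windowed FFT of each frame.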

Abstract
1. Introduction
  1.1. Use of dominant frequency information using time-domain analysis
  1.2. Use of dominant frequency information using short-term power spectrum
2. MFCC and SSCH features
  2.1. MFCC Feature extraction
  2.2. SSCH Feature Extraction
3. Combination of MFCC and SSCH feature extraction methods
  3.1. Identification of noisy speech from clean speech
4. Recognition tasks
  4.1. Preparation of Database for recognition tasks
  4.2. Evaluation and Results
5. Conclusion and Future work
References

Author information

R. Thangarajan, Assistant Professor, Department of Information Technology, Kongu Engineering College
A.M. Natarajan, Chief Executive and Professor, Bannari Amman Institute of Technology
