Cepstral Feature Normalization Methods Using Pole Filtering and Scale Normalization for Robust Speech Recognition

  • Received : 2015.06.12
  • Accepted : 2015.07.16
  • Published : 2015.07.31

Abstract

In this paper, the pole filtering concept is applied to Mel-frequency cepstral coefficient (MFCC) feature vectors within the conventional cepstral mean normalization (CMN) and cepstral mean and variance normalization (CMVN) frameworks. In addition, the performance of cepstral mean and scale normalization (CMSN), which uses scale normalization instead of variance normalization, is evaluated in speech recognition experiments in noisy environments. Because CMN and CMVN are usually performed on a per-utterance basis, reliable estimation of the mean and variance is not guaranteed for short utterances. Applying pole filtering and scale normalization to the feature normalization process alleviates this problem. Experimental results on the Aurora 2 database show that the feature normalization method combining pole filtering and scale normalization yields the largest performance improvement.

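As a rough illustration of the per-utterance normalization the abstract refers to, the following sketch applies conventional CMN and CMVN to a matrix of cepstral features. It assumes the features are stored as a NumPy array of shape (frames, coefficients); the function names `cmn` and `cmvn`, the `eps` guard, and the synthetic data are illustrative assumptions, not taken from the paper. Pole filtering and CMSN are not implemented here, since the abstract does not give their definitions.

```python
import numpy as np


def cmn(features):
    """Cepstral mean normalization: subtract the per-utterance mean
    of each cepstral coefficient (hypothetical helper name)."""
    return features - features.mean(axis=0, keepdims=True)


def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization: subtract the
    per-utterance mean and divide by the per-utterance standard
    deviation. `eps` is an assumed guard against division by zero."""
    centered = features - features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return centered / (std + eps)


# Synthetic stand-in for MFCC features: 500 frames, 13 coefficients.
rng = np.random.default_rng(0)
long_utt = rng.normal(loc=3.0, scale=2.0, size=(500, 13))
short_utt = long_utt[:10]  # a short utterance drawn from the same source

# Statistics estimated from few frames deviate more from the underlying
# values, which is the reliability problem the abstract describes.
long_mean_error = np.abs(long_utt.mean(axis=0) - 3.0).mean()
short_mean_error = np.abs(short_utt.mean(axis=0) - 3.0).mean()
print(long_mean_error, short_mean_error)
```

After normalization each coefficient has zero mean (CMN) or zero mean and approximately unit variance (CMVN) within the utterance, whatever its length; what changes with utterance length is how well those statistics reflect the underlying channel and noise conditions.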