PCA-based Variational Model Composition Method for Robust Speech Recognition with Time-Varying Background Noise

PCA-based Variational Model Composition Technique for Speech Recognition Robust to Time-Varying Noise

  • Kim, Wooil (School of Computer Science and Engineering, Incheon National University)
  • Received : 2013.10.21
  • Accepted : 2013.11.20
  • Published : 2013.12.31

Abstract

This paper proposes an effective feature compensation method to improve speech recognition performance under time-varying background noise. The proposed method employs principal component analysis (PCA) to improve the variational model composition method, and is used to generate multiple environmental models for the PCGMM-based feature compensation scheme. Experimental results show that the proposed scheme improves speech recognition accuracy more effectively than conventional front-end methods across various SNR conditions of background music. It achieves an average relative improvement of 12.14% in word error rate (WER) compared to the previous variational model composition method.

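This paper proposes an effective feature compensation technique for speech recognition that is robust to noise environments that change over time. The proposed technique introduces PCA to improve the model accuracy of the existing variational model composition technique. The proposed technique is applied to PCGMM-based feature compensation, which uses multiple models. Experimental results demonstrate that the proposed PCA-based variational model composition technique outperforms conventional front-end techniques at improving speech recognition performance across various SNR conditions in a background music environment. The proposed model composition technique yields an average relative improvement of 12.14% in recognition performance in the background music environment compared with the existing variational model composition method.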
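
The abstract reports its headline result as an average relative improvement in WER. As a minimal sketch of how such a figure is commonly computed, assuming the usual definition (baseline WER minus proposed WER, divided by baseline WER, averaged over test conditions), the following Python snippet may help. The function names and all WER values are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def relative_wer_improvement(baseline_wer: float, proposed_wer: float) -> float:
    """Relative WER improvement in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


def average_relative_improvement(
    baseline_wers: Sequence[float], proposed_wers: Sequence[float]
) -> float:
    """Mean of the per-condition relative improvements (e.g. one per SNR level)."""
    if len(baseline_wers) != len(proposed_wers) or not baseline_wers:
        raise ValueError("need two non-empty sequences of equal length")
    gains = [
        relative_wer_improvement(b, p)
        for b, p in zip(baseline_wers, proposed_wers)
    ]
    return sum(gains) / len(gains)


# Hypothetical WERs (%) for three SNR conditions; NOT values from the paper.
baseline = [40.0, 25.0, 10.0]
proposed = [36.0, 22.0, 8.5]
print(f"{average_relative_improvement(baseline, proposed):.2f}%")
```

Averaging per-condition relative gains, as done here, is only one convention; a paper may instead compute the relative gain from WERs that were averaged first, and the two generally give different numbers.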
