PCA-based Variational Model Composition Method for Robust Speech Recognition with Time-Varying Background Noise

Korean title (translated): PCA-based Variational Model Generation Technique for Speech Recognition Robust to Time-Varying Noise

  • Kim, Wooil (School of Computer Science and Engineering, Incheon National University)
  • Received : 2013.10.21
  • Accepted : 2013.11.20
  • Published : 2013.12.31

Abstract

This paper proposes an effective feature compensation method to improve speech recognition performance under time-varying background noise conditions. The proposed method applies principal component analysis to improve the variational model composition method, and is used to generate multiple environmental models for the PCGMM-based feature compensation scheme. Experimental results show that the proposed scheme improves speech recognition accuracy more effectively than conventional front-end methods across various SNR conditions of background music. It achieves an average relative improvement of 12.14% in WER over the previous variational model composition method.
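The relative WER improvement quoted above is conventionally computed as the reduction in WER divided by the baseline WER. The following is a minimal sketch of that arithmetic, not the paper's evaluation code. The function names and the WER values are hypothetical, and it assumes the average is taken over per-condition relative improvements, which the abstract does not specify.

```python
def relative_wer_improvement(baseline_wer: float, proposed_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


def average_relative_improvement(pairs):
    """Mean of the per-condition relative improvements (an assumed averaging rule)."""
    values = [relative_wer_improvement(b, p) for b, p in pairs]
    return sum(values) / len(values)


# Hypothetical (baseline WER %, proposed WER %) pairs, one per SNR condition.
# These numbers are illustrative only and are not results from the paper.
hypothetical_pairs = [(40.0, 36.0), (20.0, 17.0), (10.0, 9.0)]
print(round(average_relative_improvement(hypothetical_pairs), 2))  # 11.67
```

Averaging the raw WERs first and then taking a single relative improvement would generally give a different figure, so the averaging rule matters when comparing reported numbers.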

Acknowledgement

Supported by: Incheon National University
