Voice Recognition Based on Adaptive MFCC and Neural Network

Voice Recognition Method Based on Adaptive MFCC and Neural Network (Korean title)

  • Received : 2010.03.08
  • Accepted : 2010.04.08
  • Published : 2010.06.30

Abstract

In this paper, we propose an enhanced voice recognition algorithm using adaptive MFCC (Mel Frequency Cepstral Coefficients) and a neural network. Although extracting voice data from the raw signal is essential for improving the recognition rate, conventional algorithms tend to degrade the voice data when they eliminate noise within specific frequency bands. Unlike conventional MFCC, the proposed algorithm assigns larger weights to specified frequency regions and uses a non-overlapping filterbank, improving the recognition rate without degrading the voice data. Simulation results show that the proposed algorithm outperforms conventional MFCC because it is robust to variations in the environment.
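The abstract names two changes to conventional MFCC (which uses overlapping triangular mel filters): region-specific weighting and a non-overlapping filterbank. The paper summary does not give the band edges, weight values, or where the weights are applied, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes rectangular, contiguous, non-overlapping bands spaced evenly on the mel scale; it assumes the weights multiply the log band energies before the DCT; and the helper `region_weights`, the 300 to 3400 Hz region, and the 1.5 boost are hypothetical choices for illustration. A naive DFT is used so the example needs only the standard library.

```python
import math

def hz_to_mel(f):
    """Common mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power of a Hamming-windowed frame at bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(w * math.cos(2 * math.pi * k * t / n) for t, w in enumerate(win))
        im = sum(w * math.sin(2 * math.pi * k * t / n) for t, w in enumerate(win))
        spec.append(re * re + im * im)
    return spec

def nonoverlapping_mel_bands(n_bands, n_fft, sr):
    """ASSUMPTION: contiguous rectangular bands, evenly spaced in mel, sharing no FFT bins.
    Returns half-open (lo_bin, hi_bin) pairs and each band's centre frequency in Hz."""
    n_bins = n_fft // 2 + 1
    assert 0 < n_bands <= n_bins
    top = hz_to_mel(sr / 2.0)
    edges = [round(mel_to_hz(top * i / n_bands) * n_fft / sr) for i in range(n_bands + 1)]
    edges[0], edges[-1] = 0, n_bins
    for i in range(1, n_bands + 1):  # guarantee every band holds at least one bin
        edges[i] = max(edges[i], edges[i - 1] + 1)
    if edges[-1] > n_bins:
        raise ValueError("too many bands for this FFT size")
    bands = list(zip(edges[:-1], edges[1:]))
    centres = [0.5 * (lo + hi - 1) * sr / n_fft for lo, hi in bands]
    return bands, centres

def region_weights(centres, lo_hz, hi_hz, boost):
    """HYPOTHETICAL weighting: bands centred in [lo_hz, hi_hz] get `boost`, others 1.0."""
    return [boost if lo_hz <= c <= hi_hz else 1.0 for c in centres]

def weighted_mfcc(frame, sr, n_bands=20, n_ceps=12, weights=None):
    """Cepstrum from weighted log energies of non-overlapping mel bands."""
    n_fft = len(frame)
    spec = power_spectrum(frame)
    bands, _ = nonoverlapping_mel_bands(n_bands, n_fft, sr)
    if weights is None:
        weights = [1.0] * n_bands
    # ASSUMPTION: each band's log energy is scaled by its weight
    log_e = [w * math.log(sum(spec[lo:hi]) + 1e-10) for w, (lo, hi) in zip(weights, bands)]
    # DCT-II turns the weighted log energies into cepstral coefficients
    return [sum(le * math.cos(math.pi * k * (b + 0.5) / n_bands) for b, le in enumerate(log_e))
            for k in range(n_ceps)]
```

For example, a 256-sample frame of a 1 kHz tone sampled at 8 kHz can be processed with `w = region_weights(centres, 300.0, 3400.0, 1.5)` followed by `weighted_mfcc(frame, 8000, weights=w)`, giving 12 coefficients. Because the bands share no bins, boosting one region changes only the log energies of the bands inside it, which is one plausible reading of "bigger weights to some specified frequency regions".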

