Speech and Noise Recognition System by Neural Network

  • Received : 2010.06.25
  • Accepted : 2010.08.05
  • Published : 2010.08.31

Abstract

This paper proposes a speech and noise recognition system that uses a neural network to detect the speech and noise sections frame by frame. The proposed network is a layered neural network trained by the back-propagation algorithm. For each frame, the power spectrum obtained by the fast Fourier transform and the linear predictive coefficients are used as the input to the network, and the network is trained on these features. The network can therefore be trained using clean speech (speech with no superimposed noise) and noise. The performance of the proposed recognition system was evaluated in terms of recognition rate using various speech signals and white, printer, road, and car noises. In the experiments, the recognition rates were 92% or more for such speech and noise, even when the training data and the evaluation data were different.
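
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the network size, frame length, LPC order, or feature scaling, so the single hidden layer of 6 units, the 32-sample frames, the LPC order of 6, and the log compression of the power spectrum are all assumptions. The `toy_frame` generator is a hypothetical stand-in (a harmonic tone for "speech", white Gaussian noise for "noise") and does not reproduce the paper's data. A direct DFT is used so the sketch needs only the standard library; an FFT would give the same spectrum values.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """|X[k]|^2 / N for k = 0..N/2, via a direct DFT (an FFT yields the same values)."""
    n = len(frame)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(frame))) ** 2 / n
            for k in range(n // 2 + 1)]

def lpc(frame, order):
    """Prediction-error coefficients a[1..order] (autocorrelation method, Levinson-Durbin)."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0] + 1e-12                      # guard against an all-zero frame
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[j] * r[m - j] for j in range(1, m))) / err
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err = max(err * (1.0 - k * k), 1e-12)
    return a[1:]

def features(frame, order=6):
    """Per-frame input vector: log power spectrum (assumed scaling) followed by the LPC values."""
    return [math.log10(p + 1e-6) for p in power_spectrum(frame)] + lpc(frame, order)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))

class MLP:
    """One hidden layer, one sigmoid output; the last weight in each row is the bias."""
    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.2, 0.2) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.2, 0.2) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in self.w1]
        y = sigmoid(self.w2[-1] + sum(wi * hi for wi, hi in zip(self.w2, h)))
        return h, y

    def train_step(self, x, target, lr):
        """One back-propagation update for the squared error 0.5 * (y - target)^2."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        for j, hj in enumerate(h):
            delta_h = delta_out * self.w2[j] * hj * (1.0 - hj)   # uses w2[j] before its update
            self.w2[j] -= lr * delta_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
            self.w1[j][-1] -= lr * delta_h
        self.w2[-1] -= lr * delta_out

def toy_frame(is_speech, rng, n=32):
    """Hypothetical data: a harmonic tone for 'speech', white Gaussian noise for 'noise'."""
    if is_speech:
        f0, phase = rng.uniform(1.5, 3.5), rng.uniform(0.0, 2.0 * math.pi)
        return [sum(math.sin(2.0 * math.pi * f0 * h * i / n + h * phase) / h for h in (1, 2, 3))
                + rng.gauss(0.0, 0.05) for i in range(n)]
    sigma = rng.uniform(0.5, 1.0)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def demo(seed=0, n_train=200, n_test=100, epochs=40, lr=0.1):
    """Train on one set of frames and return the accuracy on a separate set."""
    rng = random.Random(seed)

    def dataset(count):
        labels = [i % 2 for i in range(count)]               # 1 = speech, 0 = noise
        return [(features(toy_frame(t == 1, rng)), float(t)) for t in labels]

    train, test = dataset(n_train), dataset(n_test)          # different training and evaluation frames
    net = MLP(len(train[0][0]), 6, rng)
    for _ in range(epochs):
        rng.shuffle(train)
        for x, t in train:
            net.train_step(x, t, lr)
    correct = sum((net.forward(x)[1] > 0.5) == (t == 1.0) for x, t in test)
    return correct / len(test)

if __name__ == "__main__":
    print(f"held-out frame accuracy: {demo():.2f}")
```

The sketch keeps the training and evaluation frames separate, mirroring the evaluation condition stated in the abstract. Any accuracy it prints describes only the toy data and says nothing about the reported 92% figure.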
