Speech Recognition Accuracy Measure using Deep Neural Network for Effective Evaluation of Speech Recognition Performance

Korean title (translated): A Deep Neural Network-Based Speech Recognition Performance Measure for Effective Speech Recognition Evaluation

  • Ji, Seung-eun (Department of Computer Science & Engineering, Incheon National University)
  • Kim, Wooil (Department of Computer Science & Engineering, Incheon National University)
  • Received : 2017.07.26
  • Accepted : 2017.10.03
  • Published : 2017.12.31

Abstract

This paper describes algorithms for extracting speech measures used to evaluate a speech database, and presents a method for generating a speech quality measure using a DNN (Deep Neural Network). In our previous study, to produce an effective speech quality measure, we proposed a combination of various speech measures that are highly correlated with WER (Word Error Rate). The combined measure predicts speech recognition performance more effectively than any single speech measure alone. In this paper, we describe the method of extracting a measure using a DNN, and we replace one of the combined measures, the GMM (Gaussian Mixture Model) score used in the previous study, with a DNN score. The combination with the DNN score shows a higher correlation with WER than the combination with the GMM score.

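Translated from the Korean abstract: This paper describes several algorithms for extracting speech measures used to evaluate a speech database, and proposes a method for generating a new speech performance measure based on a deep neural network. In our previous study, to produce an effective speech recognition performance measure, we generated a new measure by combining several speech measures that are highly correlated with the word error rate (WER), a representative indicator of speech recognition performance. Across various noise environments, the generated measure showed a higher correlation with WER than each speech measure used alone, demonstrating that it is effective for predicting speech recognition performance. This paper describes a speech measure extraction method based on a deep neural network, and confirms that replacing the GMM (Gaussian Mixture Model) acoustic model probability used in the earlier combination with a probability obtained through deep neural network training yields a higher correlation with WER.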
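The evaluation idea in the abstract, comparing how strongly a combined measure correlates with WER against each measure alone, can be illustrated with a small sketch. The abstract does not state how the measures are combined, so the linear least-squares combination below is an assumption, as are all names (`measure_a`, `measure_b`, `fit_linear_combination`) and the synthetic data, which do not come from the paper.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def fit_linear_combination(measures, wer):
    """Least-squares weights (last entry is the bias) mapping measures to WER.

    Assumption: a linear combination; the paper's actual rule may differ.
    """
    design = np.column_stack([measures, np.ones(len(measures))])
    weights, *_ = np.linalg.lstsq(design, wer, rcond=None)
    return weights


def combine(measures, weights):
    """Apply the fitted weights to produce the combined measure."""
    design = np.column_stack([measures, np.ones(len(measures))])
    return design @ weights


# Synthetic data for illustration only (not results from the paper).
rng = np.random.default_rng(0)
n = 200
measure_a = rng.normal(size=n)  # hypothetical stand-in, e.g. an acoustic-model score
measure_b = rng.normal(size=n)  # hypothetical stand-in, e.g. a signal-quality measure
wer = 30.0 - 5.0 * measure_a - 3.0 * measure_b + rng.normal(scale=2.0, size=n)

measures = np.column_stack([measure_a, measure_b])
weights = fit_linear_combination(measures, wer)
combined = combine(measures, weights)

corr_a = abs(pearson(measure_a, wer))
corr_b = abs(pearson(measure_b, wer))
corr_combined = abs(pearson(combined, wer))

print(f"|corr| measure A alone: {corr_a:.3f}")
print(f"|corr| measure B alone: {corr_b:.3f}")
print(f"|corr| combined:        {corr_combined:.3f}")
```

Because the weights are fitted and scored on the same data, the combined correlation here cannot fall below that of either single measure; a fair comparison would score the combination on held-out utterances.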
