Comparison of HMM models and various cepstral coefficients for Korean whispered speech recognition

은닉 마코프 모델과 켑스트럴 계수들에 따른 한국어 속삭임의 인식 비교

  • Park, Chan-Eung (Information and Communications Course, Induck Institute of Technology)
  • Published : 2006.06.25

Abstract

Whispered speech is increasingly common with the spread of mobile phones, and so is the need for whispered speech recognition. To find a suitable recognition system for whispered speech, several feature vectors commonly used in speech recognition were applied to three sets of HMMs: normal speech models, whispered speech models, and integrated models combining normal and whispered speech. The recognition tests show that whispered speech recognized with normal speech models yields rates too low for practical use, whereas separate whispered speech models give the highest rates, at least 85%. Integrated models of normal and whispered speech achieve acceptable recognition rates, but further study is needed to improve them. MFCC and PLCC feature vectors give similarly high recognition rates when applied to separate whispered speech models, while PLCC gives the best results when applied to the integrated models.

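In this paper, to recognize whispered speech, whose use is growing in mobile environments, feature vectors widely used in speech recognition were tested with hidden Markov models on normal speech models, whispered speech models, and integrated normal and whispered speech models, and the results were analyzed to find the most suitable recognition system. The recognition tests showed that recognizing whispered speech with normal speech models gives recognition rates too low to be practical, while using separate whispered speech models gave the highest recognition rate, 85% or more. The 'normal speech + whispered speech' models showed a somewhat lower recognition rate, but their feasibility was confirmed. As for feature vectors, MFCC and PLCC gave almost equally high recognition rates with the whispered speech models, but with the 'normal speech + whispered speech' models, PLCC gave the best results for whispered speech recognition.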
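
The comparison described in the abstract comes down to scoring each test utterance against a set of word HMMs, picking the most likely word, and counting correct decisions per model set. The following is a minimal sketch of that procedure, not the paper's implementation. It assumes discrete observation symbols in place of the MFCC or PLCC feature vectors (the abstract does not describe the HMM observation densities), and the model parameters, word labels, and test sequences in `MODELS` and `TEST_SET` are hypothetical toy values, not data from the study.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(model, observations):
    """log P(observations | model), computed with the forward algorithm."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    n = len(start)
    # Initialisation: start probability times emission of the first symbol.
    alpha = [safe_log(start[i]) + safe_log(emit[i][observations[0]])
             for i in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for symbol in observations[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def recognize(models, observations):
    """Return the word whose HMM gives the highest likelihood."""
    return max(models,
               key=lambda w: forward_log_likelihood(models[w], observations))


def recognition_rate(models, test_set):
    """Percentage of (label, observations) pairs recognized correctly."""
    correct = sum(recognize(models, obs) == label for label, obs in test_set)
    return 100.0 * correct / len(test_set)


# Hypothetical two-state left-to-right word models over three symbols.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "word_a": {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT,
               "emit": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]},
    "word_b": {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT,
               "emit": [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]},
}

# Hypothetical labelled test utterances (already quantized to symbols).
TEST_SET = [("word_a", [0, 0, 1, 1]), ("word_b", [2, 2, 0, 0])]

if __name__ == "__main__":
    print(f"recognition rate: {recognition_rate(MODELS, TEST_SET):.1f}%")
```

Under these assumptions, comparing normal, whispered, and integrated model sets would amount to calling `recognition_rate` once per model set on the same whispered test utterances.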
