
Modified Mel Frequency Cepstral Coefficient for Korean Children's Speech Recognition

한국어 유아 음성인식을 위한 수정된 Mel 주파수 캡스트럼

  • Received : 2012.12.31
  • Accepted : 2013.01.18
  • Published : 2013.03.28

Abstract

This paper proposes a new feature extraction algorithm to improve the recognition of Korean children's speech. The proposed algorithm combines three methods. First, it applies vocal tract length normalization to compensate the acoustic features, because children's vocal tracts are shorter than adults'. Second, it uses a uniform bandwidth, because the energy of children's voices is concentrated in the high-frequency regions of the spectrum. Finally, it applies a smoothing filter to make the speech recognizer robust in real environments. This paper shows that the new feature extraction algorithm improves children's speech recognition performance.
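The three steps named in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the linear warping factor `alpha`, the triangular filter shape, the 3-tap smoothing kernel applied across filter-bank channels, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def uniform_filterbank(n_filters, n_fft, sample_rate, alpha=1.0):
    """Triangular filters of equal bandwidth in Hz (instead of mel spacing).

    alpha is an assumed linear VTLN factor that scales the filter edges;
    the paper's actual warping function is not given in the abstract.
    """
    nyquist = sample_rate / 2.0
    # n_filters + 2 edge points, evenly spaced on the linear frequency axis.
    edges = np.linspace(0.0, nyquist, n_filters + 2)
    # Assumed VTLN step: linear warp of the edges, kept inside [0, Nyquist].
    edges = np.clip(edges * alpha, 0.0, nyquist)
    bin_freqs = np.linspace(0.0, nyquist, n_fft // 2 + 1)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / max(mid - lo, 1e-12)
        falling = (hi - bin_freqs) / max(hi - mid, 1e-12)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def smooth(log_energies, kernel=(0.25, 0.5, 0.25)):
    """Assumed smoothing step: weighted moving average across channels."""
    padded = np.pad(log_energies, 1, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = x.size
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x


def modified_mfcc(frame, sample_rate, n_filters=20, n_coeffs=12, alpha=1.0):
    """Cepstral features for one frame: warp, uniform bank, smooth, DCT."""
    n_fft = frame.size
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    bank = uniform_filterbank(n_filters, n_fft, sample_rate, alpha)
    log_energies = np.log(bank @ power + 1e-10)
    return dct_ii(smooth(log_energies), n_coeffs)


# Usage: one 25 ms frame of a 1 kHz tone sampled at 16 kHz.
frame = np.sin(2 * np.pi * 1000 * np.arange(400) / 16000)
coeffs = modified_mfcc(frame, 16000, alpha=0.9)
```

Spacing the filter edges linearly gives high frequencies the same resolution as low ones, which is the motivation the abstract states for a uniform bandwidth; a conventional mel filter bank would instead widen the filters as frequency increases.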

Keywords

Speech Recognition; Speech Database; Speech Interface; MFCC
