A New Feature for Speech Segments Extraction with Hidden Markov Models

숨은마코프모형을 이용하는 음성구간 추출을 위한 특징벡터

  • Published : 2008.03.30

Abstract

In this paper we propose a new feature, average power, for speech segment extraction with hidden Markov models; it is based on the mel frequencies of speech signals. The average power is compared with the mel frequency cepstral coefficients (MFCC) and the power coefficient. To compare the performance of the three types of features, speech data are collected for words containing plosives, which are generally known to be hard to detect. Experiments show that the average power is more accurate and efficient than MFCC and the power coefficient for speech segment extraction in environments with various noise levels.

본 논문에서는 숨은마코프모형을 사용하여 음성구간을 추출하는 경우에 사용되는 새로운 특징벡터인 평균파워를 제안하고, 이를 멜주파수 켑스트럴 계수(mel frequency cepstral coefficients, MFCC)와 파워계수와 비교한다. 이들 세 가지 특징벡터의 수행력을 비교하기 위하여 일반적으로 추출이 상대적으로 어렵다고 알려진 파열음을 가진 단어에 대한 음성 데이터를 수집하여 실험한다. 다양한 수준의 잡음이 있는 환경에서 음성구간을 추출하는 경우 MFCC나 파워계수에 비해 평균파워가 더 정확하고 효율적임을 실험을 통해 보인다.
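
The abstract does not give the formula for the average power, so the following is only a minimal sketch of one plausible reading: for each frame, the outputs of a triangular mel filterbank are averaged and expressed in decibels, giving a one-dimensional feature per frame. The function names, the frame and filterbank sizes, the Hamming window and the decibel scaling are all assumptions made for illustration, not the authors' definition, and the hidden Markov model stage is not shown.

```python
import cmath
import math


def hz_to_mel(f):
    # Common mel-scale formula; one of several variants in use.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-windowed power spectrum for bins 0..N/2 (naive DFT, short frames only)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2 / n)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [f * n_fft / sample_rate for f in edges_hz]  # fractional bin positions
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def average_power(signal, sample_rate=8000, frame_len=64, hop=32, num_filters=8):
    """Assumed feature: mean mel filterbank energy per frame, in dB."""
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        mean_energy = sum(energies) / num_filters
        feats.append(10.0 * math.log10(mean_energy + 1e-12))  # floor avoids log(0)
    return feats


# Toy input: 256 samples of silence followed by 256 samples of a 1 kHz tone.
signal = [0.0] * 256 + [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
feats = average_power(signal)
```

In this toy input the feature stays at its floor during the silent frames and rises sharply once the tone begins. A scalar track of this kind is what a speech/non-speech model would take as its observation sequence.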
