Channel-attentive MFCC for Improved Recognition of Partially Corrupted Speech

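  • Hoon-Young Cho (조훈영), Division of Computer Science, Department of Electrical Engineering and Computer Science, KAIST;
  • Sang-Mun Chi (지상문), Department of Computer Science, Kyungsung University;
  • Yung-Hwan Oh (오영환), Division of Computer Science, Department of Electrical Engineering and Computer Science, KAIST
  • Published: 2003.05.01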

Abstract

We propose a channel-attentive Mel-frequency cepstral coefficient (CAMFCC) extraction method to improve speech recognition performance on speech in which part of the frequency range is corrupted considerably more severely than the rest. During the filter bank analysis step of conventional MFCC extraction, the method estimates the reliability of each frequency channel. It then introduces channel weights both into the MFCC computation and into the HMM output probability calculation at the decoding step, so that more reliable channels receive higher weights and contribute more to recognition. In recognition experiments on the TIDIGITS database corrupted by various frequency-selective noises, the proposed CAMFCC method made effective use of the speech information in the less corrupted frequency regions and improved recognition performance by 11.2% on average over a multi-band speech recognition system, an approach known to perform well under frequency-selective noise.
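The abstract does not give the weighting formulas, so the following is only a minimal sketch of the general idea of channel weighting at the filter bank step, under stated assumptions. Each channel's reliability is taken to be a Wiener-like gain computed from an assumed per-channel noise energy estimate, and that weight scales the channel's log energy before the DCT that yields cepstral coefficients. This is not the authors' CAMFCC formulation. The functions `channel_weights`, `dct_ii` and `weighted_cepstrum` and the constant `ENERGY_FLOOR` are hypothetical, and the decoding-side weighting of the HMM output probability is not shown.

```python
import math

# Hypothetical sketch: NOT the CAMFCC formulas from the paper (the abstract
# does not give them). It only illustrates per-channel weighting of
# filter-bank log energies before the DCT that produces cepstral coefficients.

ENERGY_FLOOR = 1e-10  # assumed floor that avoids log(0) and division by zero


def channel_weights(noisy_energy, noise_energy):
    """Assumed reliability weight per filter-bank channel.

    Each weight is a Wiener-like gain, clean / (clean + noise), where the
    clean energy is estimated by subtracting an assumed noise estimate.
    Channels with a higher estimated SNR get weights closer to 1, and
    heavily corrupted channels get weights close to 0.
    """
    weights = []
    for e, n in zip(noisy_energy, noise_energy):
        clean_est = max(e - n, ENERGY_FLOOR)
        noise_est = max(n, ENERGY_FLOOR)
        weights.append(clean_est / (clean_est + noise_est))
    return weights


def dct_ii(x, num_ceps):
    """Unnormalized DCT-II: c_k = sum_j x_j * cos(pi * k * (j + 0.5) / J)."""
    J = len(x)
    return [
        sum(x[j] * math.cos(math.pi * k * (j + 0.5) / J) for j in range(J))
        for k in range(num_ceps)
    ]


def weighted_cepstrum(filterbank_energy, weights, num_ceps=13):
    """Scale each channel's log energy by its weight, then apply the DCT.

    With all weights equal to 1 this reduces to the plain DCT of the log
    filter-bank energies, as in conventional MFCC extraction.
    """
    log_e = [math.log(max(e, ENERGY_FLOOR)) for e in filterbank_energy]
    return dct_ii([w * le for w, le in zip(weights, log_e)], num_ceps)


if __name__ == "__main__":
    noisy = [10.0, 10.0, 2.0]  # toy filter-bank energies for 3 channels
    noise = [1.0, 1.0, 1.9]    # assumed noise estimate; channel 3 is swamped
    w = channel_weights(noisy, noise)
    print([round(x, 3) for x in w])  # [0.9, 0.9, 0.05]
    print(weighted_cepstrum(noisy, w, num_ceps=3))
```

Weighting the log energies (rather than the linear energies) pulls an unreliable channel's contribution toward zero in every cepstral coefficient. Whether CAMFCC applies its weights this way, and how the weights enter the HMM output probability, is not stated in the abstract.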
