A Phase-related Feature Extraction Method for Robust Speaker Verification

Korean title: A Phase-Based Feature Extraction Technique for Speaker Verification Robust to Adverse Environments

  • 권철홍 (Department of Information and Communication Engineering, Daejeon University)
  • Received : 2010.01.13
  • Accepted : 2010.01.29
  • Published : 2010.03.31

Abstract

Additive noise and channel distortion strongly degrade the performance of speaker verification systems because they distort the speech features. This distortion causes a mismatch between the training and recognition conditions, so acoustic models trained on clean speech do not accurately model noisy, channel-distorted speech. This paper presents a phase-related feature extraction method to improve the robustness of speaker verification systems. The instantaneous frequency is computed from the phase of the speech signal, and features are derived from the histogram of the instantaneous frequency. Experimental results show that the proposed technique offers significant improvements over standard techniques in both clean and adverse test environments.

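The performance of a speaker verification system degrades sharply when the training and recognition environments differ. This mismatch arises from various kinds of noise and from differing channel conditions. To improve the robustness of speaker verification systems, this paper proposes a feature extraction technique based on the phase of the speech signal. The method computes the instantaneous frequency from the phase of the speech signal, pools the instantaneous frequencies within each frequency band into a histogram, and extracts feature coefficients from that histogram. Applying these feature parameters improves speaker verification performance not only in quiet conditions but also in noisy and channel-distorted environments.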
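
The phase-to-histogram idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the instantaneous frequency is estimated, so this sketch assumes one common approach: form the analytic signal with a DFT-based Hilbert transform and take the phase difference between consecutive samples. The function names, the 8 kHz sample rate, the single 0 to 4000 Hz band, and the 8-bin histogram are all hypothetical choices made for illustration, and the band-splitting filterbank is omitted. It is not the paper's implementation.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real frame via a direct DFT (assumed method).

    Negative-frequency bins are zeroed and positive ones doubled, which is
    the usual discrete Hilbert-transform construction. The O(n^2) DFT is
    only meant for short illustrative frames.
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    weights = [0.0] * n
    weights[0] = 1.0  # keep DC unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0  # keep the Nyquist bin unchanged
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0  # double the positive frequencies
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_frequency(x, sample_rate):
    """Instantaneous frequency in Hz from sample-to-sample phase differences."""
    z = analytic_signal(x)
    freqs = []
    for prev, cur in zip(z, z[1:]):
        # Phase of cur * conj(prev) is the phase increment, wrapped to (-pi, pi].
        dphi = cmath.phase(cur * prev.conjugate())
        freqs.append(dphi * sample_rate / (2.0 * math.pi))
    return freqs


def if_histogram(freqs, f_lo, f_hi, n_bins):
    """Normalized histogram of instantaneous frequencies inside [f_lo, f_hi)."""
    counts = [0] * n_bins
    width = (f_hi - f_lo) / n_bins
    for f in freqs:
        if f_lo <= f < f_hi:
            counts[int((f - f_lo) / width)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Hypothetical test frame: a 1250 Hz tone, 256 samples at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.cos(2.0 * math.pi * 1250.0 * t / SAMPLE_RATE) for t in range(256)]
freqs = instantaneous_frequency(tone, SAMPLE_RATE)
hist = if_histogram(freqs, f_lo=0.0, f_hi=4000.0, n_bins=8)
```

For the pure tone used here, every instantaneous-frequency estimate sits at 1250 Hz, so the histogram mass falls entirely in the bin covering 1000 to 1500 Hz. Real speech would first be split into bands, with one such histogram per band feeding the feature vector.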
