Signal Subspace-based Voice Activity Detection Using Generalized Gaussian Distribution

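  • Y. S. Um (School of Electronics and Computer Engineering, Chonnam National University) ;
  • J. H. Chang (School of Electronic and Electrical Engineering, Hanyang University) ;
  • D. K. Kim (School of Electronics and Computer Engineering, Chonnam National University)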
  • Received : 2012.09.14
  • Accepted : 2012.10.19
  • Published : 2013.03.31

Abstract

In this paper, we propose an improved voice activity detection (VAD) algorithm that uses statistical models in the signal subspace domain. An uncorrelated signal subspace is generated with the embedded prewhitening (EP) technique, and the statistical characteristics of noisy speech and noise are investigated in this domain. Based on these characteristics, we propose a new statistical VAD method that uses the generalized Gaussian distribution (GGD). Experimental results show that the proposed GGD-based approach outperforms the Gaussian-based signal subspace method under simulated conditions with SNRs of 0-15 dB.
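
As a rough illustration of the kind of decision rule a GGD-based statistical VAD can use, the sketch below evaluates a frame-level log-likelihood ratio between a "noisy speech" GGD and a "noise" GGD. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the zero-mean default, the `(alpha, beta)` parameter values, and the threshold are all hypothetical, and the input is assumed to be a frame of coefficients that has already been projected into the uncorrelated signal subspace. The density used is the standard GGD form, f(x) = beta / (2 alpha Gamma(1/beta)) exp(-(|x - mu| / alpha)^beta), which reduces to a Gaussian for beta = 2 and to a Laplacian for beta = 1.

```python
import math


def ggd_logpdf(x, alpha, beta, mu=0.0):
    """Log-density of a generalized Gaussian distribution.

    alpha is the scale and beta the shape parameter; beta = 2 gives a
    Gaussian with variance alpha**2 / 2, and beta = 1 gives a Laplacian.
    """
    return (math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
            - (abs(x - mu) / alpha) ** beta)


def ggd_log_likelihood_ratio(coeffs, speech_params, noise_params):
    """Mean log-likelihood ratio of one frame of subspace coefficients.

    speech_params and noise_params are hypothetical (alpha, beta) pairs for
    the noisy-speech and noise-only models; coefficients are treated as
    independent, which is an assumption of this sketch.
    """
    total = 0.0
    for c in coeffs:
        total += ggd_logpdf(c, *speech_params) - ggd_logpdf(c, *noise_params)
    return total / len(coeffs)


def is_speech(coeffs, speech_params, noise_params, threshold=0.0):
    """Declare speech when the mean log-likelihood ratio exceeds the threshold."""
    return ggd_log_likelihood_ratio(coeffs, speech_params, noise_params) > threshold


if __name__ == "__main__":
    # Hypothetical models: heavy-tailed (Laplacian-like) noisy speech,
    # narrow Gaussian noise. Real parameters would be estimated from data.
    speech_model = (2.0, 1.0)
    noise_model = (0.5, 2.0)
    print(is_speech([3.0, -2.5, 4.0], speech_model, noise_model))
    print(is_speech([0.1, -0.05, 0.2], speech_model, noise_model))
```

Keeping the shape parameter `beta` free is what separates this from a purely Gaussian model: heavier-tailed coefficient distributions can be matched with `beta` below 2, while `beta = 2` recovers the Gaussian case.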
