Statistical Voice Activity Detector Based on Signal Subspace Model

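  • 류광춘 (Department of Electronics and Computer Engineering, Chonnam National University)
  • 김동국 (Department of Electronics and Computer Engineering, Chonnam National University)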
  • Published : 2008.10.31

Abstract

Voice activity detectors (VADs) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) based on statistical models is derived in the discrete Fourier transform (DFT) domain, and speech or noise is then decided by comparing the value of this expression with a threshold. This paper presents a new statistical VAD method based on a signal subspace approach. Probabilistic principal component analysis (PPCA) is employed to obtain a signal subspace model that incorporates a probabilistic model of the noisy signal into the signal subspace method. The proposed approach provides a novel decision rule based on the LRT in the signal subspace domain. Experimental results show that the proposed VAD method based on the signal subspace model outperforms methods based on the widely used Gaussian distribution in the DFT domain.

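The voice activity detector (VAD) is a key technique in mobile communication and speech signal processing. Conventional voice activity detection performs a likelihood ratio test (LRT) based on a statistical model in the discrete Fourier transform (DFT) domain, and the resulting value is compared with a threshold to decide whether speech is present. This paper proposes a new statistical voice activity detection method based on the signal subspace. Probabilistic principal component analysis (PPCA) is used to obtain a probabilistic model of the noisy signal within the signal subspace method. The proposed method applies a decision rule based on the LRT in the signal subspace domain. Voice activity detection experiments show that the VAD based on the signal subspace model gives better detection results than the Gaussian VAD that operates in the frequency domain.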
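The decision rule outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's derivation: it assumes zero-mean frames, white noise whose variance equals the PPCA residual variance, and a Gaussian speech-plus-noise model obtained from the closed-form maximum-likelihood PPCA fit. The function names `fit_ppca`, `subspace_llr`, and `is_speech`, and the default threshold, are hypothetical.

```python
import numpy as np


def fit_ppca(frames, q):
    """Closed-form maximum-likelihood PPCA fit on the rows of `frames`.

    Assumes zero-mean frames (an assumption of this sketch). Returns the q
    leading eigenvectors, their eigenvalues, and the residual variance sigma2.
    """
    cov = frames.T @ frames / frames.shape[0]   # sample covariance, no centering
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    sigma2 = float(np.mean(eigvals[q:]))        # mean of the discarded eigenvalues
    return eigvecs[:, :q], eigvals[:q], sigma2


def subspace_llr(frame, basis, eigvals, sigma2):
    """Log-likelihood ratio of 'speech plus noise' against 'noise only'.

    H0: frame ~ N(0, sigma2 * I)
    H1: frame ~ N(0, C), with C having eigenvalues `eigvals` along `basis`
        and sigma2 in every other direction.
    Under these assumptions only the q subspace coordinates contribute.
    """
    y = basis.T @ frame                                # coordinates in the signal subspace
    gamma = y ** 2 / sigma2                            # per-component a posteriori SNR
    xi = np.maximum(eigvals / sigma2 - 1.0, 1e-12)     # per-component a priori SNR
    return float(0.5 * np.sum(gamma * xi / (1.0 + xi) - np.log1p(xi)))


def is_speech(frame, basis, eigvals, sigma2, threshold=0.0):
    """Declare speech when the log-likelihood ratio exceeds the threshold."""
    return bool(subspace_llr(frame, basis, eigvals, sigma2) > threshold)
```

In this sketch, `fit_ppca` would be run on frames known to contain speech, and `is_speech` would then be applied frame by frame. The per-component terms have the same form as the Gaussian LRT in the DFT domain, with subspace coordinates taking the place of DFT coefficients; in practice the threshold would need tuning for the operating conditions.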
