Discriminative Weight Training for a Statistical Model-Based Voice Activity Detection

통계적 모델 기반의 음성 검출기를 위한 변별적 가중치 학습

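  • 강상익 (School of Electronic and Electrical Engineering, Inha University)
  • 조규행 (School of Electronic and Electrical Engineering, Inha University)
  • 박승섭 (School of Electrical Engineering and Computer Science, Seoul National University)
  • 장준혁 (School of Electronic and Electrical Engineering, Inha University)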
  • Published : 2007.07.31

Abstract

In this paper, we apply discriminative weight training to statistical model-based voice activity detection (VAD). In our approach, the VAD decision rule is expressed as the geometric mean of optimally weighted likelihood ratios (LRs), with the weights obtained by a minimum classification error (MCE) method. Unlike previous work, which weights all frequency bins equally, the approach assigns a different weight to each frequency bin, which is considered more realistic. Experimental results show that the proposed approach is effective for statistical model-based VAD using the LR test.
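
As a rough illustration of the decision rule described above, the weighted geometric mean of the per-bin LRs can be written in the log domain as follows. The notation is assumed here rather than taken from the paper: $\Lambda_k$ is the LR of frequency bin $k$, $w_k$ its weight, $M$ the number of bins, $\eta$ the decision threshold, and $H_1$/$H_0$ the speech-present/speech-absent hypotheses.

```latex
\log \Lambda \;=\; \sum_{k=1}^{M} w_k \,\log \Lambda_k
\;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta,
\qquad \sum_{k=1}^{M} w_k = 1, \quad w_k \ge 0
```

Under this notation, setting $w_k = 1/M$ for every bin recovers the equally weighted geometric mean; the MCE training would instead choose the $w_k$ to reduce classification errors on labeled data.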

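In this paper, we propose an optimized likelihood ratio test (LRT) based on discriminative weight training to improve the performance of a voice activity detector built on a statistical model of speech. We first analyze the conventional statistical model-based voice activity detector and then, building on it, introduce the minimum classification error (MCE) method to derive an LR-based decision rule that assigns a different weight to each frequency channel. The proposed algorithm is compared in non-stationary noise environments with the conventional voice activity detector based on an equally weighted geometric mean, and it shows superior performance.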
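
The idea of learning per-channel weights with an MCE-style criterion can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the softmax parameterization that keeps the weights positive and summing to one, the sigmoid loss, and all hyperparameters (`threshold`, `steps`, `lr`, `beta`) are illustrative choices. The per-bin log LR uses the familiar Gaussian-model form with a posteriori SNR `gamma` and a priori SNR `xi`.

```python
import numpy as np

def log_lr(gamma, xi):
    """Per-bin log likelihood ratio under a Gaussian statistical model.
    gamma: a posteriori SNR, xi: a priori SNR (arrays of equal shape)."""
    return gamma * xi / (1.0 + xi) - np.log1p(xi)

def decision_score(log_lrs, weights):
    """Log of the weighted geometric mean of LRs: sum_k w_k * log LR_k.
    log_lrs has shape (frames, bins); weights has shape (bins,)."""
    return log_lrs @ weights

def train_weights(log_lrs, labels, threshold=0.0, steps=500, lr=0.5, beta=1.0):
    """MCE-style sketch: gradient descent on a smoothed (sigmoid) error count.
    labels: 1 for speech frames, 0 for non-speech frames (hypothetical format)."""
    n_frames, n_bins = log_lrs.shape
    theta = np.zeros(n_bins)                  # softmax parameters; equal weights at start
    sign = np.where(labels == 1, -1.0, 1.0)   # makes d > 0 mean "misclassified"
    for _ in range(steps):
        w = np.exp(theta - theta.max())
        w /= w.sum()
        d = sign * (log_lrs @ w - threshold)      # misclassification measure
        loss = 1.0 / (1.0 + np.exp(-beta * d))    # smoothed 0/1 loss per frame
        dl_dd = beta * loss * (1.0 - loss)
        grad_w = (dl_dd * sign) @ log_lrs / n_frames
        grad_theta = w * (grad_w - w @ grad_w)    # chain rule through the softmax
        theta -= lr * grad_theta
    w = np.exp(theta - theta.max())
    return w / w.sum()
```

With labeled frames available, `train_weights` returns one weight per frequency channel, and `decision_score(log_lrs, weights) > threshold` gives the speech/non-speech decision. Passing `np.full(n_bins, 1.0 / n_bins)` as the weights reproduces the equally weighted geometric-mean baseline for comparison.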

References

  1. Y. Ephraim and D. Malah, 'Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,' IEEE Trans. Acoustics, Speech, Sig. Process., ASSP-32 (6) 1109-1121, Dec. 1984
  2. J. Sohn and W. Sung, 'A voice activity detector employing soft decision based noise spectrum adaptation,' Proc. Int. Conf. Acoustics, Speech, and Sig. Process., 1, 365-368, May 1998
  3. J. Sohn, N. S. Kim, and W. Sung, 'A statistical model-based voice activity detection,' IEEE Sig. Process. Lett., 6 (1) 1-3, Jan. 1999
  4. Y. D. Cho and A. Kondoz, 'Analysis and improvement of a statistical model-based voice activity detector,' IEEE Sig. Process. Lett., 8 (10) 276-278, Oct. 2001 https://doi.org/10.1109/97.957270
  5. J.-H. Chang, J. W. Shin, and N. S. Kim, 'Voice activity detector employing generalised Gaussian distribution,' Electron. Lett., 40 (24) 1561-1563, Nov. 2004 https://doi.org/10.1049/el:20047090
  6. J.-H. Chang, N. S. Kim, and S. K. Mitra, 'Voice activity detection based on multiple statistical models,' IEEE Trans. Sig. Process., 54 (6) 1965-1976, June 2006 https://doi.org/10.1109/TSP.2006.874403
  7. Y. C. Lee and S. S. Ahn, 'Statistical model-based VAD algorithm with wavelet transform,' IEICE Trans. Fundamentals, E89-A (6) 1594-1600, June 2006 https://doi.org/10.1093/ietfec/e89-a.6.1594
  8. J. Ramirez, J. M. Gorriz, J. C. Segura, C. G. Puntonet, and A. J. Rubio, 'Speech/non-speech discrimination based on contextual information integrated bispectrum LRT,' IEEE Sig. Process. Lett., 13 (8) 497-500, Aug. 2006 https://doi.org/10.1109/LSP.2006.873147
  9. B.-H. Juang, W. Chou, and C.-H. Lee, 'Minimum classification error rate methods for speech recognition,' IEEE Trans. Speech Audio Processing, 5 (3) 257-265, May 1997 https://doi.org/10.1109/89.568732
  10. Y. Kida and T. Kawahara, 'Voice activity detection based on optimally weighted combination of multiple features,' Interspeech, 2621-2624, Sep. 2005