Robust Voice Activity Detection Using the Spectral Peaks of Vowel Sounds

  • Yoo, In-Chul (Speech Information Processing Laboratory, Department of Computer and Communication Engineering, Korea University) ;
  • Yook, Dong-Suk (Speech Information Processing Laboratory, Department of Computer and Communication Engineering, Korea University)
  • Received : 2009.03.05
  • Reviewed : 2009.06.15
  • Published : 2009.08.30

Abstract

This letter proposes the use of vowel sound detection for voice activity detection. Vowels have distinctive spectral peaks, which are likely to remain higher than the surrounding spectrum even after severe corruption. Therefore, a method that detects the spectral peaks of vowel sounds in corrupted signals can detect voice activity even in low signal-to-noise ratio (SNR) conditions. Experimental results indicate that the proposed algorithm performs reliably under various noise types and low SNR conditions. The method is suitable for mobile environments, where the characteristics of the noise may not be known in advance.
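
The idea in the abstract, that vowel peaks stay above their spectral surroundings under noise, can be illustrated with a minimal sketch. This is not the algorithm from the letter, whose details are not given here. It is a simple stand-in that counts FFT bins which are local maxima and exceed the mean of their neighbors by a fixed ratio. All names (`spectral_peak_count`, `is_voiced_frame`, `synthetic_vowel_frame`) and all parameter values (neighborhood width, ratio, minimum peak count, test signal) are assumptions chosen for illustration.

```python
import numpy as np


def spectral_peak_count(frame, sample_rate, neighborhood_hz=100.0, min_ratio=4.0):
    """Count spectral bins that stand out from their surroundings.

    Hypothetical illustration, not the published method: a bin counts as a
    peak if it is a local maximum and its magnitude exceeds `min_ratio`
    times the mean magnitude of the bins within `neighborhood_hz` of it.
    """
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    bin_hz = sample_rate / len(frame)
    half_width = max(1, int(round(neighborhood_hz / bin_hz)))

    count = 0
    for k in range(1, len(spectrum) - 1):
        # Skip bins that are not local maxima.
        if spectrum[k] <= spectrum[k - 1] or spectrum[k] < spectrum[k + 1]:
            continue
        lo = max(0, k - half_width)
        hi = min(len(spectrum), k + half_width + 1)
        # Neighborhood excludes the candidate bin itself.
        neighbors = np.concatenate((spectrum[lo:k], spectrum[k + 1:hi]))
        if spectrum[k] > min_ratio * neighbors.mean():
            count += 1
    return count


def is_voiced_frame(frame, sample_rate, min_peaks=3):
    """Flag a frame as speech-like when it has enough prominent peaks (assumed rule)."""
    return spectral_peak_count(frame, sample_rate) >= min_peaks


def synthetic_vowel_frame(rng, sample_rate=8000, length=512, f0=150.0,
                          harmonics=8, noise_std=0.6):
    """Build a crude vowel-like test frame: equal-amplitude harmonics plus white noise."""
    t = np.arange(length) / sample_rate
    signal = sum(np.sin(2 * np.pi * f0 * h * t) for h in range(1, harmonics + 1))
    return signal + rng.normal(0.0, noise_std, size=length)
```

With the default settings the synthetic frame has an SNR of roughly 10 dB. A real detector would need thresholds tuned on recorded speech and would have to handle frames where consonants or non-stationary noise dominate.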

References

  1. J. Sohn, N.S. Kim, and W. Sung, “A Statistical Model-Based Voice Activity Detection,” IEEE Signal Process. Lett., vol. 6, no. 1, 1999, pp. 1-3.
  2. A. Davis, S. Nordholm, and R. Togneri, “Statistical Voice Activity Detection Using Low-Variance Spectrum Estimation and an Adaptive Threshold,” IEEE Trans. Audio, Speech, and Language Process., vol. 14, no. 2, 2006, pp. 412-424. https://doi.org/10.1109/TSA.2005.855842
  3. L.F. Lamel et al., “An Improved Endpoint Detector for Isolated Word Recognition,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 4, 1981, pp. 777-785.
  4. J.L. Shen, J.W. Hung, and L.S. Lee, “Robust Entropy-Based Endpoint Detection for Speech Recognition in Noisy Environments,” Proc. Int. Conf. Spoken Language Process., paper 0232, 1998.
  5. ITU-T, A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to ITU-T V.70, ITU-T Rec. G.729 Annex B, 1996.
  6. ETSI, Digital Cellular Telecommunications System (Phase 2+); Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) Speech Traffic Channels, GSM 06.94 v7.1.1 (ETSI EN 301 708), 1999.
  7. I.D. Lee, H.P. Stern, and S.A. Mahmoud, “A Voice Activity Detection Algorithm for Communication Systems with Dynamically Varying Background Acoustic Noises,” Proc. Veh. Technol. Conf., vol. 2, 1998, pp. 1214-1218.
  8. I.C. Yoo and D. Yook, “Automatic Sound Recognition for the Hearing Impaired,” IEEE Trans. Consum. Electron., vol. 54, no. 4, 2008, pp. 2029-2036. https://doi.org/10.1109/TCE.2008.4711269

Cited by

  1. Speaker localization in noisy environments using steered response voice power vol.61, pp.1, 2009, https://doi.org/10.1109/tce.2015.7064118
  2. Formant-Based Robust Voice Activity Detection vol.23, pp.12, 2009, https://doi.org/10.1109/taslp.2015.2476762
  3. Harmonic-Based Robust Voice Activity Detection for Enhanced Low SNR Noisy Speech Recognition System vol.E99-A, pp.11, 2016, https://doi.org/10.1587/transfun.e99.a.1928
  4. Voice Activity Detection Using an Adaptive Context Attention Model vol.25, pp.8, 2009, https://doi.org/10.1109/lsp.2018.2811740
  5. Efficient harmonic peak detection of vowel sounds for enhanced voice activity detection vol.12, pp.8, 2009, https://doi.org/10.1049/iet-spr.2017.0553