Speech synthesis using acoustic Doppler signal

Speech synthesis using ultrasonic Doppler signals

Lee, Ki-Seung

  • Received : 2015.10.08
  • Accepted : 2015.12.10
  • Published : 2016.03.31


In this paper, a method of synthesizing speech signals from 40 kHz ultrasonic signals reflected from the articulatory muscles is introduced and its performance is evaluated. When ultrasound signals are radiated toward the face during articulation, Doppler effects caused by the movements of the lips, jaw, and chin are observed: signals whose frequencies differ from that of the transmitted signal appear in the received signal. These ADS (Acoustic Doppler Signals) were used to estimate the speech parameters in this study. Prior to synthesizing the speech signal, a quantitative correlation analysis between the ADS and the speech signals was carried out for each frequency bin; the results validated the feasibility of ADS-based speech synthesis. ADS-to-speech transformation was achieved by joint Gaussian mixture model-based conversion rules. The experimental results from five subjects showed that filter bank energies and LPC (Linear Predictive Coefficient) cepstrum coefficients are the optimal features for the ADS and the speech signals, respectively. In a subjective evaluation, where the synthesized speech signals were obtained using excitation sources extracted from the original speech signals, the ADS-to-speech conversion method yielded an average recognition rate of 72.2 %.
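The ADS-to-speech transformation mentioned above uses joint Gaussian mixture model-based conversion rules, i.e., a GMM is trained on stacked source/target feature vectors and the target features are predicted by the minimum mean-square-error (MMSE) conditional mean. A minimal sketch of this style of joint-GMM regression, assuming scikit-learn and SciPy are available, is shown below; the function names, dimensions, and mixture count are illustrative placeholders, not the paper's actual configuration.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def fit_joint_gmm(x, y, n_components=4, seed=0):
    """Fit a GMM on stacked [x; y] frames (x: source features, y: target)."""
    z = np.hstack([x, y])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full",
                           random_state=seed).fit(z)


def convert(gmm, x):
    """MMSE conversion: y_hat = sum_m P(m|x) (mu_y + Syx Sxx^-1 (x - mu_x))."""
    dx = x.shape[1]
    mu_x, mu_y = gmm.means_[:, :dx], gmm.means_[:, dx:]
    Sxx = gmm.covariances_[:, :dx, :dx]   # source block of each covariance
    Syx = gmm.covariances_[:, dx:, :dx]   # cross-covariance block
    # Posterior P(m | x) under the x-marginal of the joint GMM
    log_post = np.stack(
        [np.log(gmm.weights_[m])
         + multivariate_normal.logpdf(x, mu_x[m], Sxx[m])
         for m in range(gmm.n_components)], axis=1)
    post = np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))
    # Weighted mixture of per-component linear regressions
    y_hat = np.zeros((x.shape[0], mu_y.shape[1]))
    for m in range(gmm.n_components):
        A = Syx[m] @ np.linalg.inv(Sxx[m])
        y_hat += post[:, [m]] * (mu_y[m] + (x - mu_x[m]) @ A.T)
    return y_hat
```

In the paper's setting, `x` would hold the ADS filter bank energies and `y` the LPC cepstrum coefficients of the parallel speech frames; the same conversion machinery applies to any paired feature streams.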


Speech synthesis; Ultrasonic Doppler signals; Silent speech interface; Voice conversion




Supported by : National Research Foundation of Korea