Enhanced Maximum Voiced Frequency Estimation Scheme for HTS Using Two-Band Excitation Model

  • Park, Jihoon (Department of Research, Center for Integrated Smart Sensors)
  • Hahn, Minsoo (Department of Electrical Engineering, KAIST)
  • Received: 2015.02.09
  • Reviewed: 2015.07.08
  • Published: 2015.12.01

Abstract

In a hidden Markov model-based speech synthesis system (HTS) that uses a two-band excitation model, the maximum voiced frequency (MVF) is the most important excitation parameter because the quality of the synthetic speech depends on it. This paper proposes an enhanced MVF estimation scheme based on a peak-picking method. In the proposed scheme, both local peaks and peak lobes are picked from the spectrum of the linear predictive residual signal. The average of the normalized distances of the local peaks and peak lobes is then calculated and used as a feature to estimate the MVF. Results of both objective and subjective tests show that the proposed scheme improves synthetic speech quality over a conventional scheme, on a mobile device as well as in a PC environment.
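The abstract does not define how peaks, peak lobes, or the normalized distance are computed, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It assumes the F0 of the frame is already known, takes the "normalized distance" to be the distance from each local spectral peak to the nearest multiple of F0 divided by F0/2, and does not model peak lobes at all. The function names (`lp_residual`, `local_peaks`, `peak_distance_feature`) and all parameter values are hypothetical.

```python
import numpy as np

def lp_residual(frame, order=16):
    """LP residual of one frame (autocorrelation method; order is an assumed value)."""
    w = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order.
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    # Toeplitz normal equations R a = r[1:], with a small ridge term for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * r[0] * np.eye(order), r[1 : order + 1])
    # Apply the prediction-error filter A(z) = 1 - sum_k a_k z^-k.
    return np.convolve(w, np.concatenate(([1.0], -a)))[: len(w)]

def local_peaks(mag):
    """Indices of bins greater than the left neighbor and not smaller than the right."""
    return np.flatnonzero((mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])) + 1

def peak_distance_feature(frame, fs, f0, n_fft=2048):
    """Peak frequencies and their assumed normalized distance to the nearest harmonic."""
    mag = np.abs(np.fft.rfft(lp_residual(frame), n_fft))
    freqs = local_peaks(mag) * fs / n_fft
    freqs = freqs[freqs >= f0 / 2]           # ignore peaks closer to DC than to F0
    nearest = np.round(freqs / f0) * f0      # nearest harmonic of F0
    dist = np.abs(freqs - nearest) / (f0 / 2)  # 0 = on a harmonic, 1 = midway
    return freqs, dist

# Synthetic frame: harmonics of 200 Hz up to 3 kHz plus weak noise.
fs, f0, n = 16000, 200.0, 400
t = np.arange(n) / fs
rng = np.random.default_rng(0)
frame = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 16))
frame = frame + 0.05 * rng.standard_normal(n)
freqs, dist = peak_distance_feature(frame, fs, f0)
print("mean normalized distance:", dist.mean())
```

Under these assumptions, peaks in a voiced band should sit close to harmonics (small distances), while peaks in a noise-dominated band should be spread more evenly, so an average of the distances over a frequency range could serve as a voicing cue. How the paper maps such a feature to an MVF value is not stated in the abstract.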
