Performance Comparison and Duration Model Improvement of Speaker Adaptation Methods in HMM-based Korean Speech Synthesis

Lee, Hea-Min; Kim, Hyung-Soon

doi:10.13064/KSSS.2012.4.3.111

Phonetics and Speech Sciences (말소리와 음성과학)

Vol. 4, No. 3 / pp. 111-117 / 2012 / pISSN 2005-8063 / eISSN 2586-5854

Korean Society of Speech Sciences (한국음성학회)


HMM 기반 한국어 음성합성에서의 화자적응 방식 성능비교 및 지속시간 모델 개선


Hea-Min Lee (Pusan National University); Hyung-Soon Kim (Pusan National University)

Received: 2012.07.25
Reviewed: 2012.09.21
Published: 2012.09.30

https://doi.org/10.13064/KSSS.2012.4.3.111


Abstract

In this paper, we compare the performance of several speaker adaptation methods for an HMM-based Korean speech synthesis system when only a small amount of adaptation data is available. According to objective and subjective evaluations, a hybrid of constrained structural maximum a posteriori linear regression (CSMAPLR) and maximum a posteriori (MAP) adaptation outperforms the other methods when only five minutes of adaptation data are available for the target speaker. During the objective evaluation, we find that the duration models are adapted to the target speaker less well than the spectral envelope and pitch models. To alleviate this problem, we propose a duration rectification method and a duration interpolation method. Both the objective and subjective evaluations show that incorporating the two proposed methods into the conventional speaker adaptation method effectively improves duration model adaptation.
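As a rough illustration of the MAP step named in the abstract, the sketch below applies the textbook MAP update to a single Gaussian mean. It is not the authors' implementation: the function name `map_adapt_mean`, the prior weight `tau`, and its default value are assumptions made for this example. In a hybrid scheme, the prior mean would presumably be the mean already transformed by CSMAPLR; that is also an assumption here.

```python
import numpy as np


def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Textbook MAP update of one Gaussian mean (illustrative sketch only).

    prior_mean: (D,) mean before MAP adaptation (hypothetically, the
                average-voice mean after a linear-regression transform).
    frames:     (T, D) adaptation feature vectors.
    posteriors: (T,) state occupancy probabilities for this Gaussian.
    tau:        prior weight; the default is an arbitrary assumed value.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)

    occupancy = posteriors.sum()        # total soft count of adaptation frames
    weighted_sum = posteriors @ frames  # occupancy-weighted sum of frames, shape (D,)

    # With little data the result stays near the prior mean; with more data
    # it moves toward the weighted sample mean of the adaptation frames.
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)


if __name__ == "__main__":
    prior = np.zeros(2)
    data = np.tile([2.0, 4.0], (10, 1))  # ten identical toy frames
    print(map_adapt_mean(prior, data, np.ones(10), tau=10.0))
```

The interpolation between the prior and the data is what makes MAP attractive for small adaptation sets: Gaussians that see few or no frames simply keep their prior means.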

