Development of articulatory estimation model using deep neural network

You, Heejo; Yang, Hyungwon; Kang, Jaekoo; Cho, Youngsun; Hwang, Sung Hah; Hong, Yeonjung; Cho, Yejin; Kim, Seohyun; Nam, Hosung

doi:10.13064/KSSS.2016.8.3.031

말소리와 음성과학 (Phonetics and Speech Sciences)

Volume 8, Issue 3 / Pages 31-38 / 2016 / 2005-8063 (pISSN) / 2586-5854 (eISSN)

한국음성학회 (Korean Society of Speech Sciences)

Korean title: 심층신경망을 이용한 조음 예측 모형 개발

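Heejo You (Department of Psychology, Korea University);
Hyungwon Yang (Department of English Language and Literature, Korea University);
Jaekoo Kang (Department of English Language and Literature, Korea University);
Youngsun Cho (Department of English Language and Literature, Korea University);
Sung Hah Hwang (Department of English Language and Literature, Korea University);
Yeonjung Hong (Department of English Language and Literature, Korea University);
Yejin Cho (Department of English Language and Literature, Korea University);
Seohyun Kim (Department of English Language and Literature, Korea University);
Hosung Nam (Korea University)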

Received: 2016.05.30
Reviewed: 2016.09.20
Published: 2016.09.30

https://doi.org/10.13064/KSSS.2016.8.3.031

Abstract

Speech inversion (acoustic-to-articulatory mapping) is a non-trivial problem, despite its importance, because the mapping is highly non-linear and non-unique. This study compared the performance of a Deep Neural Network (DNN) with that of a traditional Artificial Neural Network (ANN) on this problem. The Wisconsin X-ray Microbeam Database was used, with the acoustic signal as the model input and the articulatory pellet information as the model output. Results showed that the performance of the ANN deteriorated as the number of hidden layers increased. In contrast, the DNN showed a lower and more stable RMS error even with up to 10 hidden layers, suggesting that the DNN learns the acoustic-to-articulatory inversion mapping more efficiently than the ANN.
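The RMS measure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact evaluation procedure, so the frame layout (one list of pellet coordinates per frame), the function name `rms_error`, and the toy values below are assumptions made only for illustration, not the authors' implementation.

```python
import math


def rms_error(predicted, measured):
    """Root-mean-square error over all coordinates of all frames.

    Each argument is a list of frames; each frame is a list of pellet
    coordinates (a hypothetical layout, assumed for this sketch).
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    count = 0
    for p_frame, m_frame in zip(predicted, measured):
        if len(p_frame) != len(m_frame):
            raise ValueError("frames must have the same number of coordinates")
        for p, m in zip(p_frame, m_frame):
            total += (p - m) ** 2  # accumulate squared differences
            count += 1
    return math.sqrt(total / count)


# Toy values (not data from the paper): two frames, two coordinates each.
predicted = [[1.0, 2.0], [3.0, 4.0]]
measured = [[1.0, 2.0], [3.0, 6.0]]
print(rms_error(predicted, measured))  # prints 1.0
```

A lower value means the estimated pellet positions lie closer to the measured ones, which is the sense in which the abstract compares the two network types.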
