Automatic pronunciation assessment of English produced by Korean learners using articulatory features

  • Received : 2016.11.03
  • Accepted : 2016.12.07
  • Published : 2016.12.31

Abstract

This paper proposes articulatory features as novel predictors for automatic pronunciation assessment of English produced by Korean learners. Based on distinctive feature theory, in which phonemes are represented as sets of articulatory/phonetic properties, we propose articulatory Goodness-Of-Pronunciation (aGOP) features defined over the corresponding articulatory attributes, such as nasal, sonorant, and anterior. An English speech corpus spoken by Korean learners is used for assessment modeling. In our system, learners' speech is force-aligned and recognized using acoustic models derived from the WSJ corpus (native North American speech) and a pronunciation model derived from the CMU pronouncing dictionary. To compute the aGOP features, articulatory models are trained for the corresponding articulatory attributes. In addition to the proposed features, features from four categories (RATE, SEGMENT, SILENCE, and GOP) are applied as a baseline. To enhance assessment modeling performance and investigate the weights of the salient features, relevant features are extracted using Best Subset Selection (BSS). The results show that the proposed model using aGOP features outperforms the baseline. In addition, analysis of the relevant features extracted by BSS reveals that the selected aGOP features represent the salient pronunciation variations of Korean learners of English. The results are also expected to be effective for automatic pronunciation error detection.
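The Best Subset Selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ordinary least squares as the assessment model and adjusted R² as the selection criterion, neither of which the abstract specifies. The feature names and the synthetic data are hypothetical and serve only to make the example runnable.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y):
    """Fit least squares with an intercept and return adjusted R^2."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def best_subset(X, y, names, max_size=None):
    """Score every non-empty feature subset up to max_size; return the best."""
    n_features = X.shape[1]
    max_size = n_features if max_size is None else max_size
    best_score, best_cols = -np.inf, ()
    for size in range(1, max_size + 1):
        for cols in combinations(range(n_features), size):
            score = adjusted_r2(X[:, list(cols)], y)
            if score > best_score:
                best_score, best_cols = score, cols
    return [names[c] for c in best_cols], best_score


# Hypothetical feature names, one per column of X (assumption, not from the paper).
names = ["RATE_speech_rate", "SILENCE_count", "aGOP_nasal", "GOP_mean"]

# Synthetic data: the target depends only on columns 0 and 2, plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

selected, score = best_subset(X, y, names)
print(selected, round(score, 3))
```

The search is exhaustive, so it evaluates 2^p - 1 subsets for p features; this is only practical for small feature sets, or with a cap such as `max_size`.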
