Acknowledgement
This work was supported by the Hankuk University of Foreign Studies Research Fund and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2020R1A2C1013162).
References
- AIHub. (2022a). Korean voice data for educational Asian language users. Retrieved from https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=71479
- AIHub. (2022b). Korean voice data from native Chinese and Japanese speakers for educational purposes. Retrieved from https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=71490
- AIHub. (2022c). Korean speech data from native European speakers for educational purposes. Retrieved from https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=71489
- AIHub. (2022d). Korean speech data from native English speakers for educational purposes. Retrieved from https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=71469
- AIHub. (2022e). News script and anchor voice data. Retrieved from https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=71557
- Baevski, A., Zhou, H., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33, 12449-12460.
- Golowich, S. E., & Sun, D. X. (1998, October). A support vector/hidden Markov model approach to phoneme recognition. Proceedings of the ASA Statistical Computing Section (pp. 125-130). Dallas, TX.
- Jang, J. S., Lim, B. Y., & Kwon, H. Y. (2023). Multimodal learning model for detecting pronunciation error segments of children's and foreigners' speech data. KIISE Transactions on Computing Practices, 29(8), 396-401. https://doi.org/10.5626/KTCP.2023.29.8.396
- Kannadaguli, P., & Bhat, V. (2015, March). A comparison of Gaussian mixture modeling (GMM) and hidden Markov modeling (HMM) based approaches for automatic phoneme recognition in Kannada. Proceedings of 2015 International Conference on Signal Processing and Communication (ICSC) (pp. 425-430). Noida, India.
- Kim, E. (2006). A study on the diagnosis & evaluation for pronunciation errors of Korean language learners. Korean Language Education, 17(1), 71-99.
- Kim, E., Jeon, J. J., Seo, H., & Kim, H. (2022a). Automatic pronunciation assessment using self-supervised speech representation learning. arXiv, https://doi.org/10.48550/arXiv.2204.03863
- Kim, J., & Kang, P. (2021). K-wav2vec 2.0: Automatic speech recognition based on joint decoding of graphemes and syllables. arXiv, https://doi.org/10.48550/arXiv.2110.05172
- Kim, S. Y., Min, H., & Choi, H. W. (2022b). A strategic design and construction of a non-native voice data set of Korean speech for AI model training. Journal of Linguistics Science, 100, 63-88. https://doi.org/10.21296/jls.2022.3.100.63
- Korzekwa, D., Lorenzo-Trueba, J., Zaporowski, S., Calamaro, S., Drugman, T., & Kostek, B. (2021, June). Mispronunciation detection in non-native (L2) English with uncertainty modeling. Proceedings of ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8135-8139). Toronto, Canada.
- Leung, W. K., Liu, X., & Meng, H. (2019, May). CNN-RNN-CTC based end-to-end mispronunciation detection and diagnosis. Proceedings of ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8132-8136). Brighton, UK.
- Lin, B., & Wang, L. (2023, October-November). Multi-accent pronunciation assessment based on domain adversarial training. Proceedings of 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (pp. 2424-2428). Taipei, Taiwan.
- Oh, C., Kim, C., & Park, K. (2023). Building robust Korean speech recognition model by fine-tuning large pretrained model. Phonetics and Speech Sciences, 15(3), 75-82. https://doi.org/10.13064/KSSS.2023.15.3.075
- Park, K. (2019). g2pK: g2p module for Korean [Computer program]. Retrieved from https://github.com/Kyubyong/g2pk
- Peng, L., Fu, K., Lin, B., Ke, D., & Zhang, J. (2021, August-September). A study on fine-tuning wav2vec2.0 model for the task of mispronunciation detection and diagnosis. Proceedings of Interspeech 2021 (pp. 4448-4452). Brno, Czechia.
- Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023, July). Robust speech recognition via large-scale weak supervision. Proceedings of the 40th International Conference on Machine Learning (ICML) (pp. 28492-28518). Honolulu, HI.
- Ravanelli, M., Parcollet, T., & Bengio, Y. (2019, May). The PyTorch-Kaldi speech recognition toolkit. Proceedings of ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6465-6469). Brighton, UK.
- Ryu, H., Hong, H., Kim, S., & Chung, M. (2016, December). Automatic pronunciation assessment of Korean spoken by L2 learners using best feature set selection. Proceedings of 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). Jeju, Korea.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS). Long Beach, CA.
- Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Soplin, N. E. Y., ... Ochiai, T. (2018). ESPnet: End-to-end speech processing toolkit. arXiv. https://doi.org/10.48550/arXiv.1804.00015
- Xu, Q., Baevski, A., & Auli, M. (2021). Simple and effective zero-shot cross-lingual phoneme recognition. arXiv. https://doi.org/10.48550/arXiv.2109.11680
- Yang, S. H., & Chung, M. (2014). Prediction of Chinese learners' Korean pronunciation variations based on contrastive analysis. Proceedings of the Annual Conference on Human and Language Technology (pp. 206-210).
- Yang, M., Hirschi, K., Looney, S. D., Kang, O., & Hansen, J. H. L. (2022). Improving mispronunciation detection with wav2vec2-based momentum pseudo-labeling for accentedness and intelligibility assessment. arXiv. https://doi.org/10.48550/arXiv.2203.15937
- Zahran, A. I., Fahmy, A. A., Wassif, K. T., & Bayomi, H. (2023). Fine-tuning self-supervised learning models for end-to-end pronunciation scoring. IEEE Access, 11, 112650-112663. https://doi.org/10.1109/ACCESS.2023.3317236