References
- Bang, J. U., Choi, M. Y., Kim, S. H., & Kwon, O. W. (2017, August). Improving speech recognizers by refining broadcast data with inaccurate subtitle timestamps. Proceedings of the Interspeech 2017 (pp. 2929-2933). Stockholm, Sweden.
- Bang, J. U., Choi, M. Y., Kim, S. H., & Kwon, O. W. (2019, September). Extending an acoustic data-driven phone set for spontaneous speech recognition. Proceedings of the Interspeech 2019 (pp. 4405-4409). Graz, Austria.
- Chung, Y. A., Wu, C. C., Shen, C. H., Lee, H. Y., & Lee, L. S. (2016, September). Audio word2vec: Unsupervised learning of audio segment representations using sequence-to-sequence autoencoder. Proceedings of the Interspeech 2016 (pp. 410-415). San Francisco, CA.
- Hain, T. (2005). Implicit modelling of pronunciation variation in automatic speech recognition. Speech Communication, 46(2), 171-188. https://doi.org/10.1016/j.specom.2005.03.008
- Killer, M., Stüker, S., & Schultz, T. (2003, September). Grapheme based speech recognition. Proceedings of the Eurospeech 2003 (pp. 3141-3144). Geneva, Switzerland.
- Lamel, L., Gauvain, J. L., & Adda, G. (2002). Lightly supervised and unsupervised acoustic model training. Computer Speech and Language, 16(1), 115-129. https://doi.org/10.1006/csla.2001.0186
- Lee, K. N., & Chung, M. (2003, January). Modeling cross-morpheme pronunciation variations for Korean large vocabulary continuous speech recognition. Proceedings of the Eurospeech 2003 (pp. 261-264). Geneva, Switzerland.
- MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281-297). Berkeley, CA.
- Mitra, V., Vergyri, D., & Franco, H. (2016, September). Unsupervised learning of acoustic units using autoencoders and Kohonen nets. Proceedings of the Interspeech 2016 (pp. 1300-1304). San Francisco, CA.
- Nakamura, M., Iwano, K., & Furui, S. (2008). Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance. Computer Speech and Language, 22(2), 171-184. https://doi.org/10.1016/j.csl.2007.07.003
- Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., ... Vesely, K. (2011, December). The Kaldi speech recognition toolkit. Proceedings of the IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (ASRU). Waikoloa, HI.
- Sainath, T. N., Prabhavalkar, R., Kumar, S., Lee, S., Kannan, A., Rybach, D., Schogol, V., ... Chiu, C. C. (2018, April). No need for a lexicon? Evaluating the value of the pronunciation lexica in end-to-end models. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5859-5863). Calgary, Canada.
- Sak, H., Senior, A., & Beaufays, F. (2014, September). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. Proceedings of the Interspeech 2014 (pp. 338-342). Singapore.
- Stolcke, A. (2002, September). SRILM: An extensible language modeling toolkit. Proceedings of the Interspeech 2002 (pp. 901-904). Denver, CO.
- Young, S. J., Odell, J. J., & Woodland, P. C. (1994, March). Tree-based state tying for high accuracy acoustic modelling. Proceedings of the Workshop on Human Language Technology (pp. 307-312). Plainsboro, NJ.