Spoken-to-written text conversion for enhancement of Korean-English readability and machine translation

HyunJung Choi, Muyeol Choi, Seonhui Kim, Yohan Lim, Minkyu Lee, Seung Yun, Donghyun Kim, Sang Hun Kim

DOI: https://doi.org/10.4218/etrij.2023-0354

ETRI Journal, Volume 46, Issue 1, pp. 127-136, 2024
pISSN 1225-6463 / eISSN 2233-7326

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Affiliations
HyunJung Choi, Seonhui Kim, Yohan Lim: Department of Artificial Intelligence, University of Science and Technology
Muyeol Choi, Minkyu Lee, Seung Yun, Donghyun Kim, Sang Hun Kim: Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute

Received: 2023.08.27
Accepted: 2023.12.20
Published: 2024.02.20

Abstract

The Korean language has written (formal) and spoken (phonetic) forms whose usage differs, which can cause confusion, especially for numbers and embedded Western words and phrases. This makes it difficult to build automatic Korean speech recognition models, because they require a training dataset with complete and consistent transcriptions. Such datasets are frequently constructed from broadcast audio and its accompanying transcriptions, which do not follow a discrete, rule-based mapping between spoken and written forms. Furthermore, these mismatches grow over time as tacit transcription policies change. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that improves the automatic conversion of numbers and Western phrases and thereby improves the performance of automatic translation models.
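To make the spoken/written mismatch for numbers concrete, the minimal sketch below converts a spoken Sino-Korean numeral such as 삼백이십오 ("three hundred twenty-five") into its written digit form, 325. It is a hand-written, rule-based illustration only, not the data-driven model described in the abstract. The function and table names are hypothetical, and the sketch assumes the numeral span has already been isolated. It does not handle native Korean numerals such as 하나 (one) and 둘 (two), counters, or Western words.

```python
# Hypothetical rule-based converter for illustration; not the paper's model.
DIGITS = {"영": 0, "공": 0, "일": 1, "이": 2, "삼": 3, "사": 4,
          "오": 5, "육": 6, "칠": 7, "팔": 8, "구": 9}
SMALL_UNITS = {"십": 10, "백": 100, "천": 1000}        # scale a single digit
LARGE_UNITS = {"만": 10**4, "억": 10**8, "조": 10**12}  # scale a whole group


def sino_korean_to_int(spoken: str) -> int:
    """Convert a spoken Sino-Korean numeral (e.g. "삼백이십오") to an int."""
    total = 0       # sum of groups already closed by 만/억/조
    section = 0     # value of the current group below the next large unit
    digit = None    # digit read but not yet attached to a unit
    for ch in spoken:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 십 or 백 implies a leading 1 (십 = 10).
            section += (1 if digit is None else digit) * SMALL_UNITS[ch]
            digit = None
        elif ch in LARGE_UNITS:
            section += 0 if digit is None else digit
            # A bare large unit such as 만 also implies a leading 1.
            total += (1 if section == 0 else section) * LARGE_UNITS[ch]
            section, digit = 0, None
        else:
            raise ValueError(f"not a Sino-Korean numeral character: {ch!r}")
    return total + section + (0 if digit is None else digit)


for spoken in ["삼백이십오", "이천이십사", "십오만", "일억이천만"]:
    print(spoken, "->", sino_korean_to_int(spoken))
```

Even this simple rule depends on the numeral span being identified first: 이 is the numeral 2 in 이천 but the demonstrative "this" in 이것, an example of the context dependence that hand-written rules must resolve.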

Acknowledgement

This study was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean Government (23ZS1100, Core Technology Research for Self-improving Integrated Artificial Intelligence Systems).
