
A Study of Fine Tuning Pre-Trained Korean BERT for Question Answering Performance Development

A Study on Improving Korean Machine Reading Comprehension Performance through Transfer Learning of a Pre-Trained Korean BERT

  • Lee, Chi-Hoon (AI Research Institute, T3Q Co., Ltd.) ;
  • Lee, Yeon-Ji (AI Research Institute, T3Q Co., Ltd.) ;
  • Lee, Dong-Hee (School of Business Administration, Kookmin University)
  • Received : 2020.09.02
  • Accepted : 2020.10.18
  • Published : 2020.10.31

Abstract

Language models such as BERT have become a key component of deep learning-based natural language processing. Pre-training a transformer-based language model is computationally expensive, since these models consist of deep and wide attention-based layers and require a huge amount of training data. Hence, it has become standard practice to fine-tune large pre-trained language models released by Google or other organizations that can afford the required resources. There are various techniques for fine-tuning such language models, and this paper examines three of them: data augmentation, hyperparameter tuning, and partially reconstructing the neural network. For data augmentation, we use no-answer augmentation and back-translation. Useful combinations of hyperparameters are also identified through a series of experiments. Finally, we add GRU and LSTM layers on top of the pre-trained BERT model to boost performance. We fine-tune a pre-trained Korean language model using the methods above and raise the F1 score from the baseline to 89.66. Moreover, several unsuccessful attempts provide important lessons and point to promising directions for future work.
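Since the abstract gives no implementation details, the following is a minimal sketch of the third technique it mentions: attaching a recurrent (GRU or LSTM) head on top of a pre-trained Korean BERT encoder for extractive question answering on KorQuAD-style data. The checkpoint name `klue/bert-base`, the module names, and the hidden sizes are illustrative assumptions, not the authors' actual configuration.

```python
# Sketch only (not the authors' exact implementation): a pre-trained Korean BERT
# encoder followed by a bidirectional GRU and a linear layer that predicts the
# start/end positions of the answer span, as in KorQuAD-style extractive QA.
import torch
import torch.nn as nn
from transformers import AutoModel


class BertRnnForQA(nn.Module):
    def __init__(self, model_name: str = "klue/bert-base", rnn_hidden: int = 256):
        super().__init__()
        # "klue/bert-base" is an assumed placeholder for a pre-trained Korean BERT.
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # Swap nn.GRU for nn.LSTM to obtain the LSTM variant mentioned in the abstract.
        self.rnn = nn.GRU(hidden, rnn_hidden, batch_first=True, bidirectional=True)
        self.qa_outputs = nn.Linear(2 * rnn_hidden, 2)  # start logit, end logit

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        encoded = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        ).last_hidden_state                      # (batch, seq_len, hidden)
        rnn_out, _ = self.rnn(encoded)           # (batch, seq_len, 2 * rnn_hidden)
        logits = self.qa_outputs(rnn_out)        # (batch, seq_len, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```

During fine-tuning, the start and end logits would typically be trained with a cross-entropy loss against the gold span positions, with the added recurrent head and the BERT encoder updated jointly.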

Keywords
