Multi-task learning with contextual hierarchical attention for Korean coreference resolution

  • Received : 2021.08.25
  • Accepted : 2022.08.29
  • Published : 2023.02.20

Abstract

Coreference resolution is a discourse-analysis task that links the headwords in a document that refer to the same entity. We propose a pointer network-based coreference resolution model for Korean that uses multi-task learning (MTL) with a hierarchical attention mechanism. Because Korean is a head-final language, heads are easy to identify. Given an input headword, our model learns the distribution over the positions of mentions referring to the same entity and uses a pointer network to resolve coreference accordingly. Because the input is an entire document, the input sequence is very long; the core idea is therefore to learn word- and sentence-level distributions in parallel with MTL over a shared representation, which mitigates the long-sequence problem. The proposed method generates contextual word representations using pre-trained Korean language models. Under the same experimental conditions, our model achieved a CoNLL F1 score roughly 1.8% higher than that of previous work without a hierarchical structure.
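The following is a minimal, self-contained PyTorch sketch of the idea the abstract describes, not the authors' implementation: a shared encoder (a small BiGRU standing in for the pre-trained Korean language model) feeds two pointer-style attention heads trained in parallel, a word-level head that scores token positions of a coreferent mention and a sentence-level head that scores the sentence containing it. All module names, dimensions, and the loss weight alpha are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalPointerMTL(nn.Module):
    """Toy MTL pointer model: shared encoder, word- and sentence-level heads."""

    def __init__(self, vocab_size=1000, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared representation over the whole document (a stand-in here
        # for a pre-trained Korean language model).
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True,
                              bidirectional=True)
        # One additive-attention scorer per task.
        self.word_attn = nn.Linear(2 * hid_dim, 1)
        self.sent_attn = nn.Linear(2 * hid_dim, 1)

    def forward(self, tokens, sent_ids, head_pos):
        # tokens:   (B, T) token ids for the document
        # sent_ids: (B, T) sentence index of each token
        # head_pos: (B,)   position of the input headword
        h, _ = self.encoder(self.embed(tokens))            # (B, T, 2H)
        query = h[torch.arange(h.size(0)), head_pos]       # (B, 2H)
        fused = torch.tanh(h + query.unsqueeze(1))         # (B, T, 2H)

        # Word-level pointer distribution over token positions.
        word_logits = self.word_attn(fused).squeeze(-1)    # (B, T)

        # Sentence-level distribution: mean-pool token scores per sentence.
        sent_scores = self.sent_attn(fused).squeeze(-1)    # (B, T)
        n_sents = int(sent_ids.max()) + 1
        zeros = torch.zeros(tokens.size(0), n_sents)
        sent_sum = zeros.scatter_add(1, sent_ids, sent_scores)
        sent_cnt = zeros.scatter_add(1, sent_ids, torch.ones_like(sent_scores))
        sent_logits = sent_sum / sent_cnt.clamp(min=1.0)   # (B, S)
        return word_logits, sent_logits

def mtl_loss(word_logits, sent_logits, word_tgt, sent_tgt, alpha=0.5):
    # Joint objective: weighted sum of the two tasks' cross-entropy losses.
    return (alpha * F.cross_entropy(word_logits, word_tgt)
            + (1.0 - alpha) * F.cross_entropy(sent_logits, sent_tgt))

if __name__ == "__main__":
    B, T = 2, 12
    model = HierarchicalPointerMTL()
    tokens = torch.randint(0, 1000, (B, T))
    sent_ids = torch.tensor([[0] * 6 + [1] * 6] * B)   # two 6-token sentences
    head_pos = torch.tensor([3, 9])
    w, s = model(tokens, sent_ids, head_pos)
    loss = mtl_loss(w, s, torch.tensor([5, 1]), torch.tensor([0, 1]))
    loss.backward()
    print(w.shape, s.shape, float(loss))

Because both heads backpropagate through the same encoder output, the long document sequence is supervised at two granularities at once, which is the MTL shared-representation idea the abstract refers to.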
