The Ability of L2 LSTM Language Models to Learn the Filler-Gap Dependency

  • Kim, Euhee (Dept. of Computer Science & Engineering, Shinhan University)
  • Received : 2020.10.06
  • Accepted : 2020.11.14
  • Published : 2020.11.30

Abstract

In this paper, we investigate the correlation between the amount of English input that Korean learners of English (L2ers) are exposed to and their sentence processing patterns by examining what Long Short-Term Memory (LSTM) language models (LMs) can learn about an implicit syntactic relationship, the filler-gap dependency. The filler-gap dependency is the relationship between a (wh-)filler, a wh-phrase such as 'what' or 'who' that appears overtly in clause-peripheral position, and its gap in clause-internal position, an invisible, empty syntactic position that must be associated with the filler for proper interpretation. To model L2ers' English learning, we build LSTM LMs that learn a subset of the known restrictions on the filler-gap dependency from the English sentences in an L2 corpus that L2ers can potentially encounter in their English learning. Examining the LSTM LMs' behavior on controlled sentences constructed around the filler-gap dependency, we characterize L2ers' sentence processing with the information-theoretic metric of surprisal, which quantifies violations of the filler-gap dependency in terms of wh-licensing interaction effects. Furthermore, comparing the L2ers' LMs with a native speakers' LM on the processing of the filler-gap dependency, we show that both LMs track the abstract syntactic structure involved in the dependency, and we show with linear mixed-effects regression models that there are nonetheless significant differences between them in processing this dependency.
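The two quantities the abstract relies on can be stated compactly. Surprisal is the standard information-theoretic measure of how unexpected a word is given its left context, and the wh-licensing interaction is one common way of turning surprisal differences in a filler x gap factorial design into a single diagnostic for the filler-gap dependency; the exact contrast coding below is a conventional formulation and an assumption on our part, not a quotation from the paper:

S(w_i) = -\log_2 P(w_i \mid w_1, \ldots, w_{i-1})

\text{wh-licensing interaction} = \big[ S(-\text{filler}, +\text{gap}) - S(+\text{filler}, +\text{gap}) \big] - \big[ S(-\text{filler}, -\text{gap}) - S(+\text{filler}, -\text{gap}) \big]

where S(\pm\text{filler}, \pm\text{gap}) is the surprisal summed over a fixed critical region in each of the four conditions. A language model that has learned the dependency should assign lower surprisal to a gap when a filler is present and higher surprisal to a gap that appears without a filler, yielding a non-zero interaction.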

This paper examines the filler-gap dependency, an implicit syntactic relationship that a Long Short-Term Memory (LSTM) network acquires while learning English, in order to identify the correlation between the amount of English sentence input and Korean English learners' (L2ers') sentence processing patterns. To this end, we first built LSTM language models (LSTM LMs). These models were trained with deep learning on English sentences from an L2 corpus that L2ers could potentially learn from during their English learning. Next, using these language models, we examined sentence processing patterns by computing the wh-licensing interaction effect, that is, the degree of surprisal, the information-theoretic measure of information content, for English sentences that violate the filler-gap dependency structure. In addition, by comparing the L2ers' language model with a corresponding native speakers' language model, we not only showed that both language models can track the abstract syntactic structure inherent in the filler-gap dependency during sentence processing, but also established, using linear mixed-effects regression models, that there is a statistically significant difference between the native speakers' language model and the L2ers' language model in processing this dependency, the central research topic of this paper.
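To make the analysis pipeline described in the abstract concrete, the following is a minimal, hypothetical Python sketch of how per-condition surprisals could be turned into a wh-licensing interaction and submitted to a linear mixed-effects regression. The toy data, variable names, contrast coding, and model formula are illustrative assumptions only; they are not the paper's actual data, corpus, or code.

    # Hypothetical sketch: wh-licensing interaction from per-condition surprisals,
    # followed by a linear mixed-effects regression. Toy data for illustration only.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)

    rows = []
    for group in ["L2", "native"]:          # LM trained on L2 corpus vs. native corpus
        for item in range(24):              # experimental items (sentence frames)
            for filler in [0, 1]:           # wh-filler present (1) or absent (0)
                for gap in [0, 1]:          # gap present (1) or absent (0)
                    # surprisal at the critical region; toy values only
                    s = 10 + 2 * filler + 3 * gap - 4 * filler * gap + rng.normal(0, 1)
                    rows.append(dict(group=group, item=item,
                                     filler=filler, gap=gap, surprisal=s))
    df = pd.DataFrame(rows)

    # wh-licensing interaction per item and group:
    # [S(-filler,+gap) - S(+filler,+gap)] - [S(-filler,-gap) - S(+filler,-gap)]
    wide = df.pivot_table(index=["group", "item"], columns=["filler", "gap"],
                          values="surprisal")
    licensing = (wide[(0, 1)] - wide[(1, 1)]) - (wide[(0, 0)] - wide[(1, 0)])
    print(licensing.groupby(level="group").mean())

    # Mixed-effects regression: fixed effects for filler, gap, group and their
    # interactions, with a random intercept for each item.
    model = smf.mixedlm("surprisal ~ filler * gap * group", data=df, groups=df["item"])
    print(model.fit().summary())

Under this coding, a reliable filler x gap x group interaction would correspond to the group difference in processing the filler-gap dependency that the abstract reports.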

