S2-Net: Machine reading comprehension with SRU-based self-matching networks

  • Received: 2017.11.15
  • Reviewed: 2018.12.03
  • Published: 2019.06.03


Machine reading comprehension is the task of understanding a given context and finding the correct response within it. The simple recurrent unit (SRU) solves the vanishing gradient problem of recurrent neural networks (RNNs) with neural gates, as the gated recurrent unit (GRU) and long short-term memory (LSTM) do; in addition, it removes the previous hidden state from its gate computations, which makes it faster than GRU and LSTM. The self-matching network used in R-Net can have an effect similar to coreference resolution, because it computes attention weights over its own RNN sequence and can thereby gather context information with a similar meaning. In this paper, we construct a dataset for Korean machine reading comprehension and propose an S2-Net model that adds a self-matching layer to an encoder RNN built from multilayer SRUs. Experimental results show that the proposed S2-Net model achieves 68.82% EM and 81.25% F1 as a single model and 70.81% EM and 82.48% F1 as an ensemble on the Korean machine reading comprehension test dataset, and 71.30% EM and 80.37% F1 (single) and 73.29% EM and 81.54% F1 (ensemble) on the SQuAD dev dataset.
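The two components the abstract names can be sketched in a few lines. The NumPy snippet below is a minimal illustration, not the authors' implementation: `sru_forward` follows the published SRU recurrence, where the gates depend only on the current input (never on the previous hidden state), so all matrix products can be computed in parallel across time steps; `self_match` is a simplified dot-product stand-in for R-Net's additive self-matching attention. All weight names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sru_forward(x_seq, W, Wf, bf, Wr, br):
    """Single-layer SRU over x_seq of shape (T, d).

    Unlike GRU/LSTM, the forget gate f_t and reset gate r_t are
    computed from x_t alone, so the three matrix products below
    are batched over all time steps; only the cheap elementwise
    recurrence on the cell state c is sequential.
    """
    T, d = x_seq.shape
    x_tilde = x_seq @ W.T            # candidate states, all steps at once
    f = sigmoid(x_seq @ Wf.T + bf)   # forget gates
    r = sigmoid(x_seq @ Wr.T + br)   # reset (highway) gates
    c = np.zeros(d)
    h_seq = np.zeros((T, d))
    for t in range(T):               # lightweight elementwise loop
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        h_seq[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x_seq[t]
    return h_seq

def self_match(H, Wq, Wk):
    """Each position attends over the whole sequence H (T, d), so
    tokens with similar meaning (e.g. a pronoun and its antecedent)
    can share context -- the coreference-like effect the abstract
    describes. Simplified dot-product form, not R-Net's exact
    additive attention."""
    d = H.shape[1]
    scores = (H @ Wq) @ (H @ Wk).T / np.sqrt(d)  # (T, T)
    return softmax(scores, axis=-1) @ H          # re-weighted contexts
```

In the proposed architecture, the encoder output of `sru_forward` would feed `self_match`, whose output is again passed through SRU layers before answer-span prediction.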


Research project: Development of Knowledge Evolutionary WiseQA Platform Technology for Human Knowledge Augmented Services

Funding agency: Institute for Information & Communications Technology Promotion (IITP)


References

  1. P. Rajpurkar et al., SQuAD: 100,000+ questions for machine comprehension of text, arXiv preprint arXiv:1606.05250, 2016.
  2. F. Hill et al., The Goldilocks principle: Reading children's books with explicit memory representations, arXiv preprint arXiv:1511.02301, 2015.
  3. T. Nguyen et al., MS MARCO: A human generated machine reading comprehension dataset, arXiv preprint arXiv:1611.09268, 2016.
  4. D. Chen et al., Reading Wikipedia to answer open-domain questions, arXiv preprint arXiv:1704.00051, 2017.
  5. D. Weissenborn, G. Wiese, and L. Seiffe, Making neural QA as simple as possible but not simpler, in Proc. 21st Conf. Comput. Nat. Lang. Learning (CoNLL 2017), Vancouver, Canada, 2017.
  6. W. Wang et al., Gated self-matching networks for reading comprehension and question answering, in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, Vancouver, Canada, July 2017, pp. 189-198.
  7. Y. Cui et al., Attention-over-attention neural networks for reading comprehension, arXiv preprint arXiv:1607.04423, 2016.
  8. M. Seo et al., Bidirectional attention flow for machine comprehension, arXiv preprint arXiv:1611.01603, 2016.
  9. S. Wang and J. Jiang, Machine comprehension using match-LSTM and answer pointer, arXiv preprint arXiv:1608.07905, 2016.
  10. O. Vinyals, M. Fortunato, and N. Jaitly, Pointer networks, in Adv. Neural Inform. Process. Syst., Montreal, Canada, 2015, pp. 2674-2682.
  11. D. Bahdanau et al., Neural machine translation by jointly learning to align and translate, Proc. ICLR '15, arXiv:1409.0473, 2015.
  12. K. Cho et al., Learning phrase representation using RNN encoder-decoder for statistical machine translation, in Proc. EMNLP '14, Doha, Qatar, Oct. 25-29, 2014.
  13. S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997), no. 8, 1735-1780.
  14. T. Lei and Y. Zhang, Training RNNs as fast as CNNs, arXiv preprint arXiv:1709.02755, 2017.
  15. C. Lee, J. Kim, and J. Kim, Korean dependency parsing using deep learning, in Proc. KIISE HCLT, 2014, pp. 87-91 (in Korean).
  16. Y. Kim, Convolutional neural networks for sentence classification, in Proc. EMNLP '14, Doha, Qatar, Oct. 25-29, 2014.
  17. D. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, 2014.
  18. K. Lee et al., Learning recurrent span representations for extractive question answering, arXiv:1611.01436, 2017.
  19. Z. Wang et al., Multi-perspective context matching for machine comprehension, arXiv:1612.04211, 2016.
  20. Z. Chen et al., Smarnet: Teaching machines to read and comprehend like human, arXiv:1710.02772, 2017.
  21. J. Pennington, R. Socher, and C. Manning, GloVe: Global vectors for word representation, in Proc. EMNLP '14, Doha, Qatar, Oct. 25-29, 2014, pp. 1532-1543.
  22. M. E. Peters et al., Deep contextualized word representations, in Proc. Int. Conf. Learning Representations, 2018.