Subword Neural Language Generation with Unlikelihood Training

  • Received : 2020.02.27
  • Accepted : 2020.03.10
  • Published : 2020.05.31

Abstract

A neural language model is commonly trained with a likelihood loss so that it learns to model sequences of human text. By utilizing the model's next-token output probabilities, state-of-the-art results have been achieved in various language generation tasks, e.g., text summarization, dialogue response generation, and open-ended text generation. A well-known problem of such models is that their outputs tend to be monotonous and boring, yet only a few solutions have been proposed to address it. Several decoding techniques have been proposed to suppress repetitive tokens at generation time. Unlikelihood training approaches the problem at training time by penalizing the probabilities of candidate tokens that have already been seen in previous steps. While the method successfully yields less repetitive generated text, it has a large memory footprint because training requires a large vocabulary. We effectively reduce the memory footprint by encoding words as sequences of subword units. Finally, we report results competitive with token-level unlikelihood training on several automatic evaluations, compared to the previous work.
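For readers unfamiliar with the mechanism, the sketch below illustrates token-level unlikelihood training in the spirit of Welleck et al. (reference 2): the usual likelihood (cross-entropy) loss is combined with a term that penalizes the probability of any token that already appeared earlier in the sequence. This is only an illustrative PyTorch sketch, not the authors' implementation; the function name, the `alpha` weight, and the padding handling are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def unlikelihood_loss(logits, targets, padding_idx=0, alpha=1.0):
    """Token-level likelihood + unlikelihood loss (illustrative sketch).

    logits:  (batch, seq_len, vocab_size) model outputs before softmax
    targets: (batch, seq_len) gold next-token ids
    """
    vocab_size = logits.size(-1)

    # Standard maximum-likelihood (cross-entropy) term.
    mle = F.cross_entropy(
        logits.reshape(-1, vocab_size),
        targets.reshape(-1),
        ignore_index=padding_idx,
    )

    probs = F.softmax(logits, dim=-1)          # (B, T, V)
    _, seq_len, _ = probs.size()

    # Negative candidates at step t: every token that already appeared
    # as a target at an earlier step of the same sequence.
    prev_mask = torch.zeros_like(probs)        # (B, T, V)
    for t in range(1, seq_len):
        prev_mask[:, t].scatter_(1, targets[:, :t], 1.0)

    # Never penalize the current gold token or the padding id.
    prev_mask.scatter_(2, targets.unsqueeze(-1), 0.0)
    prev_mask[..., padding_idx] = 0.0

    # Unlikelihood term: -log(1 - p(candidate)) over the negative candidates.
    ul = -torch.log((1.0 - probs).clamp(min=1e-6)) * prev_mask
    ul = ul.sum() / prev_mask.sum().clamp(min=1.0)

    return mle + alpha * ul
```

In the setting described in the abstract, the vocabulary would index subword units (e.g., BPE tokens) rather than full words, which is what keeps the output softmax, and hence the memory footprint of this loss, small.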

References

  1. Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi, The Curious Case of Neural Text Degeneration, in Proc. of International Conference on Learning Representations, 2020.
  2. Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, and Jason Weston, Neural Text Generation with Unlikelihood Training, in Proc. of International Conference on Learning Representations, 2020.
  3. OpenAI, Language Models are Unsupervised Multitask Learners, https://openai.com/blog/better-language-models/
  4. Alexander M. Rush, Yin-Wen Chang, and Michael Collins, Optimal Beam Search for Machine Translation, in Proc. of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 210-221, Oct. 18-21, 2013.
  5. Angela Fan, Mike Lewis, and Yann Dauphin, Hierarchical Neural Story Generation, in Proc. of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pp. 889-898, July 15-20, 2018. DOI: https://doi.org/10.18653/v1/P18-1082
  6. Liang Huang, Kai Zhao, and Mingbo Ma, When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size), in Proc. of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2134-2139, September 7-11, 2017. DOI: https://doi.org/10.18653/v1/D17-1227
  7. Rico Sennrich, Barry Haddow, and Alexandra Birch, Neural Machine Translation of Rare Words with Subword Units, in Proc. of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715-1725, August, 2016. DOI: https://doi.org/10.18653/v1/P16-1162
  8. Taku Kudo, Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, in Proc. of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pp. 66-75, July 15-20, 2018. DOI: https://doi.org/10.18653/v1/P18-1007
  9. Rohan Chitnis and John DeNero, Variable-Length Word Encodings for Neural Translation Models, in Proc. of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2088-2093, September 17-21, 2015. DOI: https://doi.org/10.18653/v1/D15-1249
  10. Philip Gage, A New Algorithm for Data Compression, C Users Journal, Vol. 12, No. 2, pp. 23-38, 1994. DOI: https://dl.acm.org/doi/10.5555/177910.177914
  11. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin, Attention is All You Need, in Proc. of the 2017 Advances in Neural Information Processing Systems, pp. 5998-6008, 2017. DOI: https://dl.acm.org/doi/10.5555/3295222.3295349
  12. Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher, Pointer sentinel mixture models, in Proc. of International Conference on Learning Representations, 2017.