Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model

• Journal title : Phonetics and Speech Sciences
• Volume 7, Issue 4, 2015, pp. 3-8
• Publisher : The Korean Society of Speech Sciences
• DOI : 10.13064/KSSS.2015.7.4.003
Authors
Kim, Kwang-Ho; Lee, Donghyun; Lim, Minkyu; Kim, Ji-Hwan;

Abstract
In this paper, we investigate an input dimension reduction method using continuous word vectors in a deep neural network language model. In the proposed method, continuous word vectors were generated with Google's Word2Vec from a large training corpus so as to satisfy the distributional hypothesis. Discrete 1-of-$|V|$ coding word vectors were then replaced with their corresponding continuous word vectors. In our implementation, the input dimension was successfully reduced from 20,000 to 600 when a tri-gram language model was used with a vocabulary of 20,000 words. The total training time was reduced from 30 days to 14 days on the Wall Street Journal training corpus (corpus length: 37M words).
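The substitution the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the 300-dimensional embedding size, the random stand-in embedding matrix, and the assumption that a tri-gram model feeds two context words (2 × 300 = 600 inputs) are all assumptions made here for concreteness; the paper does not specify its architecture beyond the 20,000-to-600 reduction.

```python
import numpy as np

# Hypothetical setup: vocabulary of 20,000 words, 300-dim word vectors.
# In the paper these vectors come from Word2Vec; here a random matrix
# stands in so the sketch is self-contained.
V, D = 20_000, 300
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((V, D))

def one_hot(word_id: int, vocab_size: int = V) -> np.ndarray:
    """Discrete 1-of-|V| coding: a sparse vocab_size-dim vector."""
    v = np.zeros(vocab_size)
    v[word_id] = 1.0
    return v

def dnn_input(context_ids: list[int]) -> np.ndarray:
    """Dense DNN-LM input: concatenated continuous vectors of the
    context words, replacing the concatenated one-hot vectors."""
    return np.concatenate([embeddings[i] for i in context_ids])

# A tri-gram LM conditions on two context words.
context = [42, 7]
x_discrete = np.concatenate([one_hot(i) for i in context])  # 40,000 dims
x_continuous = dnn_input(context)                           # 600 dims
print(x_discrete.shape, x_continuous.shape)  # (40000,) (600,)

# The embedding lookup is exactly the one-hot vector times the
# embedding matrix, so no information path changes, only the dimension:
assert np.allclose(one_hot(42) @ embeddings, embeddings[42])
```

Because a one-hot vector multiplied by the embedding matrix is just a row lookup, the dense input layer can be seen as folding a fixed linear projection into the input, which is what shrinks the network's first layer and, per the abstract, roughly halves training time.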
Keywords
deep neural network;language model;continuous word vector;input dimension reduction;
Language
Korean
References
1.
Bengio, Y., Ducharme, R., Vincent, P. and Jauvin, C. (2003). A neural probabilistic language model, Journal of Machine Learning Research, Vol. 3, 1137-1155.

2.
Bengio, Y. (2009). Learning deep architectures for AI, Journal of Foundations and Trends in Machine Learning, Vol. 2, No. 1, 1-127.

3.
Schwenk, H. & Gauvain, J. (2005). Training neural network language models on very large corpora, in Proc. Empirical Methods in Natural Language Processing, 201-208.

4.
Arisoy, E., Sainath, T., Kingsbury, B. and Ramabhadran, B. (2012). Deep neural network language models, in Proc. NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, 20-28.

5.
Turney, P. & Pantel, P. (2010). From frequency to meaning: vector space models of semantics, Journal of Artificial Intelligence Research, Vol. 37, No. 1, 141-188.

6.
Schutze, H. & Pedersen, J. (1995). Information retrieval based on word sense, in Proc. Symposium on Document Analysis and Information Retrieval, 161-175.

7.
Rubenstein, H. & Goodenough, J. (1965). Contextual correlates of synonymy, Communications of the ACM, Vol. 8, No. 10, 627-633.

8.
Bruni, E., Boleda, G., Baroni, M. and Tran, N. (2012). Distributional semantics in technicolor, in Proc. 50th Annual Meeting of the Association for Computational Linguistics, 136-145.

9.