- Volume 16 Issue 10
DOI QR Code
Semantic Extention Search for Documents Using the Word2vec
Word2vec을 활용한 문서의 의미 확장 검색방법
- Received : 2016.09.27
- Accepted : 2016.10.10
- Published : 2016.10.28
Conventional way to search documents is keyword-based queries using vector space model, like tf-idf. Searching process of documents which is based on keywords can make some problems. it cannot recogize the difference of lexically different but semantically same words. This paper studies a scheme of document search based on document queries. In particular, it uses centrality vectors, instead of tf-idf vectors, to represent query documents, combined with the Word2vec method to capture the semantic similarity in contained words. This scheme improves the performance of document search and provides a way to find documents not only lexically, but semantically close to a query document.
Semantic Search;Document Feature Vector;Vector Space Model;Word2vec
Supported by : 한국철도기술연구원
- S. Brin and L. Page, "The Anatomy of a Large-scale Hypertextual Web Search Engine," Computer Networks and ISDN Systems, Vol.33, pp.107-117, 1998.
- T. Mikolov, K. Chen, G. Corrado, and J, Dean "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
- T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and theier compositionality," Advances in neural information processing systems, 2013.
- Yoshua Bengio, New distributed probabilistic language models. Dept. IRO, University de Montreal, Montreal, QC, Canada, Tech. Rep, 1215, 2002.
- Yoshua Bengio and Samy Bengio, "Modeling high-dimensional discrete data with multi-layer neural networks," In NIPS, Vol.99, pp.400-406, 1999.
- Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin, "A neural probabilistic language model," The Journal of Machine Learning Research, Vol.3, pp.1137-1155, 2003.
- Yoshua Bengio and Jean-Sebastien Senecal, et al. Quick training of probabilistic neural nets by importance sampling, In AISTATS Conference, 2003.
- Gerard Salton, Anita Wong, and Chung-Shu Yang, "A vector space model for automatic indexing," Communication of the ACM, Vol.18, No.11, pp.613-620, 1975. https://doi.org/10.1145/361219.361220
- David Dubin, The most inuential paper gerard salton never wrote, 2004.
- Ronan Collobert and Jason Weston, A unied architecture for natural language processing: Deep neural networks with multitask learning, In Proceedings of the 25th international conference on Machine learning, pp.160-167, ACM, 2008.