Latent Keyphrase Extraction Using Deep Belief Networks

Jo, Taemin; Lee, Jee-Hyong

  • Received : 2015.08.21
  • Accepted : 2015.09.24
  • Published : 2015.09.25


Automatic keyphrase extraction is widely regarded as an important task. Most previous studies select keyphrases only from the body of the input document and therefore overlook latent keyphrases, which do not appear in the document. The few existing methods for latent keyphrase extraction have structural limitations. Although latent keyphrases do not appear in a document, they can still play an important role in text mining: they link meaningful concepts or contents of documents, and they are useful for short texts, such as social network service posts, which rarely contain explicit keyphrases. In this paper, we propose a new approach that selects qualified latent keyphrases for input documents and overcomes some of these structural limitations by using deep belief networks in a supervised manner. The main idea is to capture the intrinsic representations of documents and use them to extract eligible latent keyphrases. Our experimental results show that the proposed method successfully extracts latent keyphrases.


Latent keyphrase; Deep belief networks; Weighted cost function; Keyphrase extraction




Grant : Development of a regional business strategy analysis and customized video promotion creation SW platform to support digital small business owners