Latent Keyphrase Extraction Using Deep Belief Networks

Jo, Taemin; Lee, Jee-Hyong

  • Received: 2015.08.21
  • Accepted: 2015.09.24
  • Published: 2015.09.25

Abstract

Automatic keyphrase extraction is an important task. Most previous studies focused only on selecting keyphrases that occur in the body of the input document, and therefore overlooked latent keyphrases, which do not appear in the document. The few existing studies on latent keyphrase extraction have structural limitations. Although latent keyphrases do not appear in documents, they can still play an important role in text mining because they link meaningful concepts or contents of documents, and they are useful for short texts such as social network service posts, which rarely contain explicit keyphrases. In this paper, we propose a new approach that selects qualified latent keyphrases for input documents and overcomes some of these structural limitations by using deep belief networks in a supervised manner. The main idea is to capture the intrinsic representations of documents and use them to extract eligible latent keyphrases. Our experimental results show that the proposed method successfully extracts latent keyphrases.
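The idea of scoring keyphrases from a learned document representation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network architecture or the exact form of the weighted cost function, so the single sigmoid layer, the `pos_weight` parameter, and the function names below are all assumptions made for illustration. In an actual deep belief network, the hidden layers would typically be pre-trained as stacked restricted Boltzmann machines before supervised fine-tuning.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def layer(inputs, weights, biases):
    # One sigmoid layer; weights[j][i] connects input i to unit j.
    # Hypothetical stand-in for one hidden layer of a deep network.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def weighted_cross_entropy(predicted, target, pos_weight):
    # Assumed form of a weighted cost: errors on true keyphrases (target = 1)
    # count pos_weight times as much as errors on non-keyphrases, because
    # true keyphrases are rare among all candidate phrases.
    eps = 1e-12
    total = 0.0
    for p, t in zip(predicted, target):
        total -= pos_weight * t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(target)

def rank_latent_keyphrases(scores, vocabulary, present_terms, top_k):
    # Keep only phrases that do NOT occur in the document (latent ones),
    # then return the top_k by score.
    latent = [(s, phrase) for s, phrase in zip(scores, vocabulary)
              if phrase not in present_terms]
    latent.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in latent[:top_k]]
```

Under these assumptions, a document would be encoded as a term vector, passed through one or more calls to `layer` to obtain a score per candidate phrase, and `rank_latent_keyphrases` would then discard phrases already present in the text.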

Keywords

Latent keyphrase; Deep belief networks; Weighted cost function; Keyphrase extraction

Acknowledgement

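Grant: Development of a software platform for regional business strategy analysis and customized promotional video creation to support digital small business owners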