A Study on Research Trends of Graph-Based Text Representations for Text Mining

텍스트 마이닝을 위한 그래프 기반 텍스트 표현 모델의 연구 동향

  • 장재영 (Department of Computer Engineering, Hansung University)
  • Received : 2013.09.24
  • Accepted : 2013.10.11
  • Published : 2013.10.31

Abstract

Text mining is a research area that extracts high-quality hidden information, such as patterns, trends, and distributions, by analyzing unstructured text. Because text mining assumes unstructured data, a text must first be represented with a simplified model before it can be analyzed. The most widely used model so far is the vector space model (VSM), in which a text is represented as a bag of words. Recently, however, many studies have adopted graph-based text models in order to also represent the semantic relationships between words. In this paper, we survey the graph-based text representation models proposed in existing text mining research and describe their characteristics. We also discuss future directions for graph-based text mining.
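To make the contrast between the two representations concrete, here is a minimal sketch, not taken from the paper, that builds both a bag-of-words vector (as in the VSM) and a simple word co-occurrence graph for two sentences containing the same words in a different order. It assumes a naive lowercase/whitespace tokenizer, undirected edges between words that occur within a window of 2 tokens (i.e. adjacent words), and edge-set Jaccard overlap as a deliberately simple graph similarity. All function names (`bag_of_words`, `cooccurrence_graph`, `edge_jaccard`) and the window size are hypothetical choices made for illustration.

```python
from __future__ import annotations

import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed minimal tokenizer: lowercase + whitespace split (no stemming or stop-word removal).
    return text.lower().split()


def bag_of_words(tokens: list[str]) -> Counter:
    # VSM-style representation: term -> frequency. Word order and word-word relations are lost.
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cooccurrence_graph(tokens: list[str], window: int = 2) -> Counter:
    # Graph-style representation (hypothetical variant): undirected, frequency-weighted edges
    # between distinct words that appear within `window` consecutive tokens of each other.
    edges: Counter = Counter()
    for i, u in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if u != v:
                edges[frozenset((u, v))] += 1
    return edges


def edge_jaccard(g1: Counter, g2: Counter) -> float:
    # A deliberately simple graph similarity: overlap of the (unweighted) edge sets.
    e1, e2 = set(g1), set(g2)
    return len(e1 & e2) / len(e1 | e2) if e1 | e2 else 1.0


if __name__ == "__main__":
    d1 = tokenize("the cat sat on the mat")
    d2 = tokenize("the mat sat on the cat")
    print(cosine(bag_of_words(d1), bag_of_words(d2)))  # 1.0: identical bags of words
    print(edge_jaccard(cooccurrence_graph(d1), cooccurrence_graph(d2)))  # about 0.667: 4 of 6 edges shared
```

The two sentences are indistinguishable in the bag-of-words model, while the graph keeps the information that "cat" is linked to "sat" in one text and "mat" is linked to "sat" in the other. The window size, the undirected edges, and the Jaccard comparison are illustrative assumptions only; graph-based text models can instead define nodes as phrases, concepts, or sentences and use other edge types and weights.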
