Automated Development of Rank-Based Concept Hierarchical Structures using Wikipedia Links

  • Received : 2015.08.18
  • Accepted : 2015.10.17
  • Published : 2015.11.30

Abstract

Hierarchical concept trees are widely used as a key data structure for indexing huge amounts of textual data. This paper proposes a generality rank-based method that automatically builds hierarchical concept structures from Wikipedia data. The method regards each Wikipedia article as a concept and generates hierarchical relationships among these concepts. To estimate the generality of a concept, we devised a ranking function that relies mainly on the number of hyperlinks among Wikipedia articles. The ranking function is then used to compute probabilistic subsumption among concepts, which makes the generated hierarchical structures relatively more stable. Finally, the set of concept pairs with a hierarchical relationship is visualized as a directed acyclic graph (DAG). An empirical analysis using the concept hierarchy of the Open Directory Project shows that the proposed method outperforms a representative baseline method and can automatically extract concept hierarchies with high accuracy.
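The abstract does not give the ranking function or the subsumption test, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that a concept's generality is its number of inbound links, and that a more general concept subsumes a less general one when it appears in at least a fraction `threshold` of the articles where the other appears. The function name `build_hierarchy`, the threshold value 0.8, and the toy link data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_hierarchy(links, threshold=0.8):
    """Return (parent, child) edges for a toy concept hierarchy.

    links maps an article title to the set of titles it links to.
    Assumption: generality rank = number of inbound links.
    """
    # occurrences[c]: articles in which concept c appears, taken here to be
    # the article c itself plus every article that links to c.
    occurrences = defaultdict(set)
    for source, targets in links.items():
        occurrences[source].add(source)
        for target in targets:
            occurrences[target].add(target)
            occurrences[target].add(source)

    # Inbound-link count (the article itself is excluded).
    rank = {c: len(docs) - 1 for c, docs in occurrences.items()}

    edges = set()
    for parent in occurrences:
        for child in occurrences:
            # Edges only run from a higher (rank, title) key to a lower one.
            # Because that key is a total order, no cycle can form.
            if (rank[parent], parent) <= (rank[child], child):
                continue
            shared = occurrences[parent] & occurrences[child]
            # Estimated P(parent | child) from co-occurrence counts.
            if len(shared) / len(occurrences[child]) >= threshold:
                edges.add((parent, child))
    return edges


# Toy data, invented for illustration only.
links = {
    "Science": set(),
    "Physics": {"Science"},
    "Quantum mechanics": {"Science", "Physics"},
    "Optics": {"Science", "Physics"},
    "Laser": {"Science", "Physics", "Optics"},
}
edges = build_hierarchy(links)
```

On this toy input the result places "Science" above "Physics" and "Optics" above "Laser", while "Optics" and "Quantum mechanics" stay unrelated. The edge set also contains transitive pairs such as "Science" to "Laser"; a drawing of the DAG would normally prune those first.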

Keywords

Text Mining; Information Retrieval; Concept Hierarchy; Indexing; Concept; Wikipedia; DAG

Acknowledgement

Supported by : National Research Foundation of Korea