An Effect of Semantic Relatedness on Entity Disambiguation: Using Korean Wikipedia

An Analysis of the Effect of Using Semantic Relatedness in Entity Disambiguation: Using Korean Wikipedia

  • Kang, In-Su (School of Computer Science & Engineering, College of Engineering, Kyungsung University)
  • 강인수 (경성대학교 공과대학 컴퓨터공학부)
  • Received : 2014.11.05
  • Accepted : 2015.03.10
  • Published : 2015.04.25

Abstract

Entity linking is the task of linking entity name mentions in text to the corresponding entities in a knowledge base. Because the same mention may refer to different entities depending on its context, entity linking must deal with entity disambiguation. Most recent work on entity disambiguation focuses on semantic relatedness between entities and attempts to integrate it with entity prior probabilities and term co-occurrence. To the best of my knowledge, however, it is hard to find studies that analyze and report the pure effect of semantic relatedness on entity disambiguation. Through experiments on a Korean Wikipedia data set, this article empirically evaluates entity disambiguation approaches based on semantic relatedness from three aspects: (1) the differences among semantic relatedness measures such as NGD, PMI, Jaccard, Dice, and Simpson; (2) the influence of ambiguity within the set of co-occurring entity mentions; and (3) the difference between individual and collective disambiguation approaches.

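Entity linking is the task of connecting entity mentions that appear in text to entries in a knowledge base such as Wikipedia. Because different entities can share the same mention, entity linking needs to resolve the ambiguity of entity mentions. Recent research on entity disambiguation is dominated by attempts that center on the semantic relatedness of co-occurring entities and combine it with entity prior probabilities, co-occurring term information, and similar evidence. Despite this active use of semantic relatedness, however, it is hard to find studies that analyze and present the pure effect of semantic-relatedness-based methods on entity disambiguation. This study presents the results of an experimental evaluation, on Korean Wikipedia data, of semantic-relatedness-based entity disambiguation methods from three perspectives: differences among semantic relatedness measures such as NGD, PMI, Jaccard, Dice, and Simpson; differences in the degree of ambiguity within the set of co-occurring entities; and differences between individual and collective disambiguation.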
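
The abstract names the five relatedness measures and the two disambiguation styles but does not give formulas or algorithms, so the following is only a minimal sketch under stated assumptions, not the article's method. It assumes each entity is represented by the set of article ids that link to it, uses the standard textbook definitions of the measures, and reads "individual" as scoring each mention on its own and "collective" as a brute-force joint choice. The mentions, entity names, and link sets are hypothetical toy data.

```python
import math
from itertools import product

# Assumption: an entity is represented by the set of article ids linking to it.

def jaccard(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def dice(a, b):
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def simpson(a, b):
    """Overlap coefficient: shared links over the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

def pmi(a, b, n_total):
    """Pointwise mutual information (log base 2); -inf for disjoint sets."""
    both = len(a & b)
    if both == 0:
        return float("-inf")
    return math.log2(both * n_total / (len(a) * len(b)))

def ngd(a, b, n_total):
    """Normalized Google Distance: a distance, so smaller means more related."""
    both = len(a & b)
    if both == 0:
        return float("inf")
    hi, lo = max(len(a), len(b)), min(len(a), len(b))
    denom = math.log(n_total) - math.log(lo)
    return (math.log(hi) - math.log(both)) / denom if denom else 0.0

def disambiguate_individually(candidates, inlinks, rel):
    """Choose each mention's entity separately, scoring a candidate by its
    summed relatedness to all candidate entities of the other mentions."""
    chosen = {}
    for mention, cands in candidates.items():
        context = [e for m, cs in candidates.items() if m != mention for e in cs]
        chosen[mention] = max(
            cands,
            key=lambda c: sum(rel(inlinks[c], inlinks[e]) for e in context),
        )
    return chosen

def disambiguate_collectively(candidates, inlinks, rel):
    """Choose one entity per mention jointly, maximising the summed pairwise
    relatedness of the whole assignment (exhaustive search, toy sizes only)."""
    mentions = list(candidates)

    def coherence(assignment):
        return sum(
            rel(inlinks[x], inlinks[y])
            for i, x in enumerate(assignment)
            for y in assignment[i + 1:]
        )

    best = max(product(*(candidates[m] for m in mentions)), key=coherence)
    return dict(zip(mentions, best))

# Hypothetical toy data: two ambiguous mentions and their candidate entities.
candidates = {
    "Apple": ["Apple_Inc", "Apple_(fruit)"],
    "Jobs": ["Steve_Jobs", "Jobs_(film)"],
}
inlinks = {
    "Apple_Inc": {1, 2, 3, 4},
    "Apple_(fruit)": {5, 6, 7},
    "Steve_Jobs": {2, 3, 4, 8},
    "Jobs_(film)": {4, 9},
}

individual = disambiguate_individually(candidates, inlinks, jaccard)
collective = disambiguate_collectively(candidates, inlinks, jaccard)
```

On this toy input both strategies select `Apple_Inc` and `Steve_Jobs`. Measures that need the collection size can be plugged in through a small wrapper, for example `lambda a, b: pmi(a, b, n_total)`, and because NGD is a distance it would have to be negated or otherwise inverted before being used as a relatedness score.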
