Features for Author Disambiguation

A Comparison of Features for Author Identification

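  • 강인수 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 이승우 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 정한민 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 김평 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 구희관 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 이미경 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 성원경 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 박동인 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • Published: 2008.02.28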


Persons and their names stand in a many-to-many relationship: one person may have several names, and different persons may share the same name. Synonymous names (several names for one person) can severely degrade the recall of person search, while homonymous names (one name shared by several persons) can degrade its precision. This study examines the characteristics of features used to resolve homonymous author names appearing in citation data. As disambiguation features, previous work has employed citation-internal features, such as co-authorship, article titles, and publication titles, as well as citation-external features, such as e-mail addresses, affiliations, and Web evidence. To the best of our knowledge, however, no previous study has examined how each of these features influences author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale Korean test set.
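As a rough illustration of the citation-internal features named in the abstract, the sketch below scores two citations that share an ambiguous author name on co-authorship, article title, and publication title. The abstract does not describe how the study computes or combines its features, so everything here is an assumption made for illustration: the `Citation` record layout, the use of Jaccard overlap, the exact-match venue test, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record; field names are assumptions."""
    author: str            # the ambiguous (homonymous) author name
    coauthors: frozenset   # names of the other authors on the paper
    title: str             # title of the article
    venue: str             # title of the publication (journal or conference)


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_scores(x, y):
    """Per-feature similarity between two citations sharing an author name."""
    return {
        "coauthorship": jaccard(x.coauthors, y.coauthors),
        "article_title": jaccard(set(x.title.lower().split()),
                                 set(y.title.lower().split())),
        "publication_title": 1.0 if x.venue.lower() == y.venue.lower() else 0.0,
    }


# Made-up sample records, not data from the study.
a = Citation("Kim, J.", frozenset({"Lee, S.", "Park, D."}),
             "Author name disambiguation in citations", "JCDL")
b = Citation("Kim, J.", frozenset({"Lee, S."}),
             "Name disambiguation for digital libraries", "JCDL")

scores = feature_scores(a, b)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Keeping one score per feature, rather than a single combined score, makes it possible to switch features on and off one at a time, which is the kind of per-feature comparison the abstract describes.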


