A Large-scale Test Set for Author Disambiguation

Construction of a Large-Scale Test Set for Author Identification

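  • Kang In-su (강인수), School of Computer and Information, Kyungsung University
  • Kim Pyung (김평), Information Technology Research Lab, Korea Institute of Science and Technology Information (KISTI)
  • Lee Seung-woo (이승우), Information Technology Research Lab, KISTI
  • Jung Han-min (정한민), Information Technology Research Lab, KISTI
  • Ryu Beom-jong (류범종), Information Technology Research Lab, KISTI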
  • Published : 2009.11.28


To move beyond article-oriented search and offer author-oriented search, the namesake problem in author names must be solved. Author disambiguation, proposed as a solution, assigns identifiers of real individuals to author name entities. Although recent state-of-the-art approaches to author disambiguation report performance above 90%, few academic information services have adopted author-resolution functions. This paper describes a large-scale test set for author disambiguation, created by KISTI to foster research on author resolution. The results of such research can be applied to academic information systems to improve their services. The test set was constructed from DBLP data through web searches and manual inspection. It currently consists of 881 author names, 41,673 author name entities, and 6,921 person identifiers.
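As a rough illustration of how such a test set might be used, the sketch below stores gold person identifiers for a few author name entities and scores a system's grouping against them. The record layout, the toy names and identifiers, and the choice of pairwise F1 as the measure are all assumptions made for illustration; the abstract names neither a file format nor an evaluation metric. The last two lines compute the average sizes implied by the reported counts, roughly 47 name entities and about 8 persons per author name.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (author_name, entity_id, gold_person_id).
# The field layout and values are illustrative assumptions, not the
# actual format of the KISTI test set.
RECORDS = [
    ("J. Smith", "e1", "p1"),
    ("J. Smith", "e2", "p1"),
    ("J. Smith", "e3", "p2"),
    ("W. Chen", "e4", "p3"),
    ("W. Chen", "e5", "p4"),
]


def same_person_pairs(labels):
    """Return the set of entity pairs that share a label."""
    groups = defaultdict(list)
    for entity_id, label in labels.items():
        groups[label].append(entity_id)
    pairs = set()
    for members in groups.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_f1(gold, predicted):
    """Pairwise F1 between gold and predicted labels for one author name."""
    gold_pairs = same_person_pairs(gold)
    pred_pairs = same_person_pairs(predicted)
    if not gold_pairs and not pred_pairs:
        return 1.0
    correct = len(gold_pairs & pred_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Group gold labels by ambiguous author name.
gold_by_name = defaultdict(dict)
for name, entity_id, person_id in RECORDS:
    gold_by_name[name][entity_id] = person_id

# A hypothetical system output that merges every "J. Smith" entity
# into a single cluster.
predicted_smith = {"e1": "c1", "e2": "c1", "e3": "c1"}
smith_f1 = pairwise_f1(gold_by_name["J. Smith"], predicted_smith)

# Average sizes implied by the counts reported in the abstract.
entities_per_name = 41673 / 881
persons_per_name = 6921 / 881
```

Scoring each author name separately mirrors the structure of the test set, where every ambiguous name has its own group of name entities to be resolved into persons.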


Author Disambiguation; Test Set for Author Disambiguation; Test Set Construction


