The Impact of Name Ambiguity on Properties of Coauthorship Networks
 Title & Authors
Kim, Jinseok; Kim, Heejun; Diesner, Jana
 Abstract
Initial-based disambiguation of author names is a common data pre-processing step in bibliometrics. It is widely accepted that this procedure can introduce errors into network data and into any subsequent analytical results. What is not sufficiently understood is the precise impact of this step on the data and findings. We present an empirical answer to this question by comparing two commonly used initial-based disambiguation methods against a reasonable proxy for ground-truth data. We use DBLP, a database covering major journals and conferences in computer science and information science, as a source. We find that initial-based disambiguation induces strong distortions in network metrics at the graph and node levels: authors become embedded in ties for which there is no empirical support, which inflates their apparent sphere of influence and diversity of involvement. Consequently, networks generated with initial-based disambiguation are more coherent and interconnected than the actual underlying networks, and individual authors appear to be more productive and more strongly embedded than they actually are.
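The merging effect described in the abstract can be illustrated with a minimal Python sketch. It assumes the two methods are a first-initial key (surname plus first given-name initial) and an all-initials key (surname plus all given-name initials); the abstract does not name the methods, so this reading is an assumption. The author names, the function names, and the three toy papers are hypothetical and are not drawn from DBLP or from the study.

```python
from itertools import combinations


def first_initial_key(name):
    """Assumed method 1: surname plus first given-name initial."""
    surname, _, given = name.partition(",")
    given = given.strip()
    initial = given[0].lower() if given else ""
    return f"{surname.strip().lower()}, {initial}"


def all_initials_key(name):
    """Assumed method 2: surname plus the initials of all given names."""
    surname, _, given = name.partition(",")
    parts = given.replace("-", " ").split()
    initials = "".join(part[0].lower() for part in parts)
    return f"{surname.strip().lower()}, {initials}"


def coauthor_edges(papers, key=lambda name: name):
    """Build an undirected coauthorship edge set; `key` maps a name to a node id."""
    edges = set()
    for authors in papers:
        nodes = sorted({key(author) for author in authors})
        edges.update(combinations(nodes, 2))
    return edges


def degrees(edges):
    """Count the number of distinct coauthors per node."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


# Hypothetical toy records: three different people who share the surname Kim.
papers = [
    ["Kim, Jisoo", "Diaz, Ana"],
    ["Kim, Junho", "Lee, Min Ho"],
    ["Kim, Jae Won", "Park, Sun"],
]

true_degrees = degrees(coauthor_edges(papers))
first_degrees = degrees(coauthor_edges(papers, first_initial_key))
all_degrees = degrees(coauthor_edges(papers, all_initials_key))

print(len(true_degrees), len(first_degrees), len(all_degrees))
print(first_degrees["kim, j"], all_degrees["kim, j"])
```

In this toy data the full names yield six authors with one coauthor each. The first-initial key collapses the three Kims into a single node with three coauthors, and the all-initials key still merges two of them. This is the kind of unsupported tie and inflated embeddedness that the abstract reports, shown here only at toy scale.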
 Keywords
bibliometrics; name ambiguity; initial-based disambiguation; coauthorship networks; collaboration networks
 Language
English