Anonymizing Graphs Against Weight-based Attacks with Community Preservation

  • Li, Yidong (School of Computer Science, University of Adelaide)
  • Shen, Hong (School of Computer Science, University of Adelaide)
  • Received : 2011.02.01
  • Accepted : 2011.03.20
  • Published : 2011.09.30

Abstract

The increasing popularity of graph data, such as social and online communities, has opened a prolific research area in knowledge discovery and data mining. As more real-world graphs are released publicly, there is growing concern about privacy breaches for the entities involved. An adversary may re-identify individuals in a published graph by using its topological structure and/or basic graph properties as background knowledge. Most previous studies of such identity disclosure attacks, however, concentrate on preserving privacy in simple graph data only. In this paper, we consider the identity disclosure problem in weighted graphs. The motivation is that a weighted graph can carry much more unique information than its simple version, which makes disclosure easier. We first formalize a general anonymization model to deal with weight-based attacks. We then discuss two concrete attacks based on the weight properties of a graph: the sum and the set of adjacent weights for each vertex. We also propose a complete solution to the weight anonymization problem that protects a graph against both attacks. In addition, we investigate the impact of the proposed methods on community detection, a very popular application in graph mining. Our approaches are efficient and practical, and have been validated by extensive experiments on both synthetic and real-world datasets.
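
To make the two weight-based attacks concrete, the following minimal sketch computes, for each vertex of a toy weighted graph, its degree, the sum of its adjacent edge weights, and the collection of its adjacent edge weights, and then lists the vertices whose value is shared by fewer than k vertices. This is an illustration only, not the paper's anonymization model or algorithm: the function names, the k-anonymity-style test, the representation of the weight "set" as a sorted tuple, and the example graph are all assumptions made here.

```python
from collections import Counter


def vertex_signatures(edges):
    """Collect, per vertex, the weights of its adjacent edges.

    `edges` is an iterable of (u, v, weight) triples for an undirected graph.
    Returns three dicts keyed by vertex: degree, weight sum, and the
    adjacent weights as a sorted tuple (an assumed stand-in for the
    "set of adjacent weights").
    """
    adjacent = {}
    for u, v, w in edges:
        adjacent.setdefault(u, []).append(w)
        adjacent.setdefault(v, []).append(w)
    degrees = {v: len(ws) for v, ws in adjacent.items()}
    sums = {v: sum(ws) for v, ws in adjacent.items()}
    bags = {v: tuple(sorted(ws)) for v, ws in adjacent.items()}
    return degrees, sums, bags


def exposed_vertices(signature, k=2):
    """Vertices whose signature value is shared by fewer than k vertices."""
    counts = Counter(signature.values())
    return sorted(v for v, s in signature.items() if counts[s] < k)


# Hypothetical 4-cycle: every vertex has degree 2, but the weights differ.
edges = [("a", "b", 1), ("b", "c", 4), ("c", "d", 2), ("d", "a", 3)]
degrees, sums, bags = vertex_signatures(edges)

print("exposed by degree:    ", exposed_vertices(degrees))
print("exposed by weight sum:", exposed_vertices(sums))
print("exposed by weight set:", exposed_vertices(bags))
```

In this toy graph no vertex is exposed by degree alone, two vertices are exposed by their weight sum, and all four are exposed by their adjacent weights, which is the sense in which a weighted graph carries more identifying information than its simple version.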
