Secure Multi-Party Computation of Correlation Coefficients

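  • 홍선경 (Department of Computer Science, Kangwon National University)
  • 김상필 (Department of Computer Science, Kangwon National University)
  • 임효상 (Computer and Telecommunications Engineering Division, Yonsei University)
  • 문양세 (Department of Computer Science, Kangwon National University)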
  • Received : 2014.07.17
  • Accepted : 2014.08.25
  • Published : 2014.10.15

Abstract

In this paper, we address the problem of securely computing Pearson correlation coefficients and Spearman's rank correlation coefficients in a distributed environment while each data provider preserves the privacy of its own data. Data mining or data analysis in a distributed environment normally requires data providers (data owners) to share their original data with each other. However, the original data often contain very sensitive information, so data providers are reluctant to disclose them. We formally define secure correlation computation (SCC) as the problem of computing correlation coefficients in a distributed computing environment while preserving the data privacy of multiple data providers, i.e., without disclosing their sensitive data. We then present SCC solutions for the Pearson and Spearman's rank correlation coefficients that use a random-matrix-based secure scalar product. We state the correctness and security of the proposed solutions as theorems and prove them formally. We also show empirically that the execution time of the proposed solutions is practical for real applications.
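Why a scalar product is enough can be seen from a small sketch. It is not the protocol proposed in the paper. It only illustrates the standard identity that the Pearson coefficient equals the scalar product of two vectors that each party has centered and scaled to unit length locally, and that Spearman's coefficient is the Pearson coefficient of the ranks. The function names are hypothetical, and `scalar_product` is a plain, insecure placeholder standing in for whatever secure scalar product protocol is used.

```python
import math


def normalize(values):
    """Center a vector and scale it to unit length (done locally by one party)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return [c / norm for c in centered]


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def scalar_product(u, v):
    """Placeholder (assumption): a plain dot product, NOT secure.

    In an SCC setting this single step would be replaced by a secure
    scalar product protocol, so neither party sees the other's vector.
    """
    return sum(a * b for a, b in zip(u, v))


def pearson(x, y):
    # Each party normalizes its own vector; only the product is joint.
    return scalar_product(normalize(x), normalize(y))


def spearman(x, y):
    # Each party ranks its own data locally, then proceeds as for Pearson.
    return pearson(average_ranks(x), average_ranks(y))
```

In this sketch, all preprocessing (centering, scaling, ranking) touches only one party's data, so the only step that would need a secure protocol is the final scalar product.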

Acknowledgement

Supported by: National Research Foundation of Korea
