Secure Multiparty Computation of Principal Component Analysis
  • Journal title : Journal of KIISE
  • Volume 42, Issue 7,  2015, pp.919-928
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2015.42.7.919
 Title & Authors
Secure Multiparty Computation of Principal Component Analysis
Kim, Sang-Pil; Lee, Sanghun; Gil, Myeong-Seon; Moon, Yang-Sae; Won, Hee-Sun
 
 Abstract
In recent years, much research has addressed privacy-preserving data mining (PPDM) on large volumes of data. In this paper, we propose a PPDM solution based on principal component analysis (PCA), which can be widely used to compute correlations among sensitive data sets. The conventional way to compute PCA is to collect all the data spread across multiple nodes into a single node before starting the computation; however, this approach discloses the sensitive data of individual nodes, involves a large amount of computation, and incurs large communication overhead. To solve this problem, we present an efficient method that securely computes PCA without collecting all the data. The proposed method shares only limited information among individual nodes, yet obtains the same result as the original PCA. In addition, we present a dimensionality reduction technique for the proposed method and use it to improve the performance of secure similar document detection. Finally, through various experiments, we show that the proposed method works effectively and efficiently on large amounts of multi-dimensional data.
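The abstract does not say which information the nodes exchange, so the following is only a minimal sketch of the general idea that PCA can match the pooled result without moving raw rows. It assumes horizontally partitioned data (every node holds different rows over the same columns) and that each node shares just its row count, column sums, and sum of outer products. All function names are hypothetical, the statistics are sent in the clear, and no secure-computation layer is included, so this is not the authors' protocol.

```python
# Hypothetical sketch: PCA from per-node summary statistics.
# Assumption: horizontally partitioned data; no cryptographic protection.

def local_stats(rows):
    """Return (count, column sums, sum of outer products) for one node."""
    d = len(rows[0])
    sums = [0.0] * d
    outer = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            sums[i] += r[i]
            for j in range(d):
                outer[i][j] += r[i] * r[j]
    return len(rows), sums, outer


def merge_stats(stats):
    """Add the per-node statistics element-wise."""
    d = len(stats[0][1])
    n = 0
    sums = [0.0] * d
    outer = [[0.0] * d for _ in range(d)]
    for n_k, s_k, o_k in stats:
        n += n_k
        for i in range(d):
            sums[i] += s_k[i]
            for j in range(d):
                outer[i][j] += o_k[i][j]
    return n, sums, outer


def covariance(n, sums, outer):
    """Sample covariance: (sum of x x^T - n * mean mean^T) / (n - 1)."""
    d = len(sums)
    mean = [v / n for v in sums]
    return [[(outer[i][j] - n * mean[i] * mean[j]) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_component(cov, iters=500):
    """Leading eigenvector by power iteration (unit length).

    Starts from the all-ones vector, which is adequate for this toy data
    but is not a robust choice in general.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Toy data held by two nodes (made up for illustration).
node_a = [[2.0, 1.0], [3.0, 4.0], [5.0, 6.0]]
node_b = [[1.0, 0.0], [4.0, 3.0]]

merged = merge_stats([local_stats(node_a), local_stats(node_b)])
cov_distributed = covariance(*merged)
cov_pooled = covariance(*local_stats(node_a + node_b))
pc1 = top_component(cov_distributed)
```

Because the covariance matrix depends on the data only through these sums, `cov_distributed` and `cov_pooled` agree, and so do the principal components derived from them. A privacy-preserving protocol would additionally have to protect the exchanged statistics, for example with a secure sum, which this sketch omits.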
 Keywords
privacy-preserving data mining; principal component analysis; multiparty computation; secure similar document detection
 Language
Korean
 Cited by
1.
Secure principal component analysis in multiple distributed nodes, Security and Communication Networks, 2016, 9, 14, 2348