Big Data Analysis Using Principal Component Analysis
 Title & Authors
Big Data Analysis Using Principal Component Analysis
Lee, Seung-Joo
 Abstract
In a big data environment, we need a new approach to data analysis, because the characteristics of big data, such as volume, variety, and velocity, make it possible to analyze the entire data set when making inferences about a population. Traditional statistical methods, however, focus on small data sets, namely random samples drawn from the population. Classical statistical analyses are therefore not well suited to big data analysis. To solve this problem, we propose an approach to efficient big data analysis. In this paper, we consider big data analysis using principal component analysis, a popular method in multivariate statistics. To verify the performance of our approach, we carry out diverse simulation studies.
 Keywords
Big Data; Principal Component Analysis; Eigenvalue; Big Data Analysis; Statistical Analysis
 Language
Korean