Hierarchically penalized sparse principal component analysis

Principal component analysis using a hierarchical penalty function

  • Received : 2016.11.16
  • Accepted : 2017.01.25
  • Published : 2017.02.28

Abstract

Principal component analysis (PCA) describes the variation of multivariate data in terms of a set of uncorrelated variables. Since each principal component is a linear combination of all variables and the loadings are typically non-zero, the derived principal components are difficult to interpret. Sparse principal component analysis (SPCA) uses the elastic net penalty to produce sparse loadings. When the variables are structured in groups, it is desirable to select variables in a grouped manner. In this paper, we propose a new PCA method that improves variable selection when the variables are grouped: it selects important groups and also removes unimportant variables within the selected groups. To incorporate the group information into model fitting, we replace the elastic net penalty in SPCA with a hierarchical lasso penalty. Real data analyses demonstrate the performance and usefulness of the proposed method.
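The general idea of sparsifying loadings can be sketched with a rank-one regularized low-rank approximation in the spirit of Shen and Huang (2008), cited below: alternate between updating the score vector and soft-thresholding the loading vector. The function name, simulated data, and penalty value are illustrative choices, not the paper's implementation.

```python
import numpy as np

def sparse_pc_rank1(X, lam=0.1, n_iter=100):
    """Rank-one sparse PCA via regularized low-rank approximation.

    Alternates between updating the score vector u and
    soft-thresholding the loading vector v (a lasso-type penalty),
    following the general scheme of Shen and Huang (2008).
    """
    X = np.asarray(X, dtype=float)
    # initialize with the leading singular pair
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0] * s[0]
    for _ in range(n_iter):
        # soft-threshold X'u to drive small loadings to exactly zero
        z = X.T @ u
        v = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
        if np.allclose(v, 0):
            break
        u = X @ v
        u /= np.linalg.norm(u)
    nv = np.linalg.norm(v)
    return v / nv if nv > 0 else v  # unit-norm sparse loading vector

rng = np.random.default_rng(0)
# two correlated signal variables plus three pure-noise variables
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 2)),
               rng.normal(size=(200, 3))])
X -= X.mean(axis=0)
v = sparse_pc_rank1(X, lam=5.0)
print(np.round(v, 3))  # large loadings on the first two variables, zeros on the noise
```

With an ordinary PCA all five loadings would be non-zero; the thresholding step removes the noise variables from the leading component entirely, which is what makes the component interpretable.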

Principal component analysis (PCA) is a standard technique for reducing the dimension of correlated multivariate data and is widely used in multivariate analysis. However, because each principal component is a linear combination of all variables, the results are difficult to interpret. Sparse PCA (SPCA) uses an elastic net penalty to produce modified principal components with sparser loadings, but it cannot exploit any group structure among the variables. In this study, we improve on SPCA and propose a new PCA method that, when the data are grouped, selects significant groups while simultaneously removing unnecessary variables within the selected groups. To incorporate the group and within-group variable structure into model fitting, we replace the elastic net penalty of sparse PCA with a hierarchical penalty. Real data analyses demonstrate the performance and usefulness of the proposed method.
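The "select groups, then prune variables within groups" behavior of a hierarchical penalty can be illustrated with the proximal operator of a sparse-group penalty, a close relative of the hierarchical lasso of Wang et al. (2009). The function name, toy data, and tuning values below are hypothetical; this is a sketch of the selection mechanism, not the paper's fitting algorithm.

```python
import numpy as np

def sparse_group_prox(z, groups, lam_group, lam_within):
    """Proximal operator of a sparse-group penalty.

    Elementwise soft-thresholding removes unimportant variables
    within each group; group-wise soft-thresholding then shrinks
    each group's remaining coefficients, possibly to zero as a
    whole. Together these mimic hierarchical (group + within-group)
    selection.
    """
    z = np.asarray(z, dtype=float)
    beta = np.zeros_like(z)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        # within-group step: elementwise soft-threshold
        s = np.sign(z[idx]) * np.maximum(np.abs(z[idx]) - lam_within, 0.0)
        # group-level step: shrink the whole group, possibly to zero
        norm = np.linalg.norm(s)
        if norm > lam_group:
            beta[idx] = (1.0 - lam_group / norm) * s
    return beta

z = np.array([3.0, 2.5, 0.2,   # group 1: strong, with one weak variable
              0.3, 0.2, 0.1])  # group 2: weak overall
groups = np.array([1, 1, 1, 2, 2, 2])
print(sparse_group_prox(z, groups, lam_group=1.0, lam_within=0.5))
# group 2 drops out entirely; the weak variable in group 1 is also zeroed
```

A plain lasso would treat all six coordinates symmetrically and could leave stray non-zeros inside the weak group; the two-level penalty is what removes whole groups and, at the same time, unneeded variables inside the retained groups.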

References

  1. Bernard, A., Guinot, C., and Saporta, G. (2012). Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis. In Proceedings of 20th International Conference on Computational Statistics (pp. 99-106).
  2. Gemperline, P. J., Miller, K. H., West, T. L., Weinstein, J. E., Hamilton, J. C., and Bray, J. T. (1992). Principal component analysis, trace elements, and blue crab shell disease, Analytical Chemistry, 64, 523-531. https://doi.org/10.1021/ac00029a014
  3. Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55-67. https://doi.org/10.1080/00401706.1970.10488634
  4. Kang, J., Bang, S., and Jhun, M. (2016). Hierarchically penalized quantile regression, Journal of Statistical Computation and Simulation, 86, 340-356. https://doi.org/10.1080/00949655.2015.1014038
  5. Rose, A. K. and Spiegel, M. M. (2011). Cross-country causes and consequences of the crisis: an update, European Economic Review, 55, 309-324. https://doi.org/10.1016/j.euroecorev.2010.12.006
  6. Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, 99, 1015-1034. https://doi.org/10.1016/j.jmva.2007.06.007
  7. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society Series B (Methodological), 58, 267-288.
  8. Wang, S., Nan, B., Zhou, N., and Zhu, J. (2009). Hierarchically penalized Cox regression with grouped variables, Biometrika, 96, 307-322. https://doi.org/10.1093/biomet/asp016
  9. Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society Series B (Statistical Methodology), 68, 49-67. https://doi.org/10.1111/j.1467-9868.2005.00532.x
  10. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B (Statistical Methodology), 67, 301-320.
  11. Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15, 265-286. https://doi.org/10.1198/106186006X113430