Hierarchically penalized sparse principal component analysis

(In Korean: Principal component analysis using a hierarchical penalty function)

  • Received : 2016.11.16
  • Accepted : 2017.01.25
  • Published : 2017.02.28

Abstract

Principal component analysis (PCA) describes the variation of multivariate data in terms of a set of uncorrelated variables. Because each principal component is a linear combination of all variables and its loadings are typically nonzero, the derived components can be difficult to interpret. Sparse principal component analysis (SPCA) addresses this by imposing an elastic net penalty that produces sparse loadings. When the variables form natural groups, it is desirable to select variables in a grouped manner. In this paper, we propose a new PCA method that improves variable selection when variables are grouped: it not only selects important groups but also removes unimportant variables within the identified groups. To incorporate the group information into model fitting, we replace the elastic net penalty in SPCA with a hierarchical lasso penalty. Real data analyses demonstrate the performance and usefulness of the proposed method.
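The penalized-loading idea that SPCA builds on can be sketched briefly. The following is a minimal illustration in the spirit of the regularized low-rank approximation of Shen and Huang (2008), using a plain lasso (soft-thresholding) step for simplicity; it is not the authors' hierarchically penalized method, and the function names, toy data, and penalty level `lam` are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, lam):
    # Elementwise soft-thresholding: proximal operator of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_pc_rank1(X, lam=1.0, n_iter=100):
    """Rank-1 sparse PCA sketch via alternating regularized updates
    (in the spirit of Shen and Huang, 2008). X is a centered (n, p)
    matrix; returns a unit-norm sparse loading vector of length p."""
    # Initialize with the leading right singular vector of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    v = Vt[0]
    for _ in range(n_iter):
        u = X @ v
        u /= np.linalg.norm(u)            # unit-norm score vector
        v = soft_threshold(X.T @ u, lam)  # L1-penalized loading update
        nv = np.linalg.norm(v)
        if nv == 0.0:                     # penalty killed every loading
            break
        v /= nv
    return v

# Toy data (hypothetical): only the first 3 of 10 variables share a signal.
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 1))
X = np.hstack([scores + 0.1 * rng.normal(size=(200, 3)),
               rng.normal(size=(200, 7))])
X -= X.mean(axis=0)

v = sparse_pc_rank1(X, lam=4.0)
```

With the penalty active, the loadings of the seven pure-noise variables are thresholded exactly to zero while the three signal variables retain large loadings, which is the sparsity property the elastic net penalty delivers in SPCA and that the hierarchical lasso penalty extends to grouped variables.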

Acknowledgement

Supported by: Hwarangdae Research Institute, National Research Foundation of Korea

References

  1. Bernard, A., Guinot, C., and Saporta, G. (2012). Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis. In Proceedings of 20th International Conference on Computational Statistics (pp. 99-106).
  2. Gemperline, P. J., Miller, K. H., West, T. L., Weinstein, J. E., Hamilton, J. C., and Bray, J. T. (1992). Principal component analysis, trace elements, and blue crab shell disease, Analytical Chemistry, 64, 523-531. https://doi.org/10.1021/ac00029a014
  3. Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55-67. https://doi.org/10.1080/00401706.1970.10488634
  4. Kang, J., Bang, S., and Jhun, M. (2016). Hierarchically penalized quantile regression, Journal of Statistical Computation and Simulation, 86, 340-356. https://doi.org/10.1080/00949655.2015.1014038
  5. Rose, A. K. and Spiegel, M. M. (2011). Cross-country causes and consequences of the crisis: an update, European Economic Review, 55, 309-324. https://doi.org/10.1016/j.euroecorev.2010.12.006
  6. Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, 99, 1015-1034. https://doi.org/10.1016/j.jmva.2007.06.007
  7. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society Series B (Methodological), 58, 267-288.
  8. Wang, S., Nan, B., Zhou, N., and Zhu, J. (2009). Hierarchically penalized Cox regression with grouped variables, Biometrika, 96, 307-322. https://doi.org/10.1093/biomet/asp016
  9. Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society Series B (Statistical Methodology), 68, 49-67. https://doi.org/10.1111/j.1467-9868.2005.00532.x
  10. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B (Statistical Methodology), 67, 301-320.
  11. Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15, 265-286. https://doi.org/10.1198/106186006X113430