Hierarchically penalized sparse principal component analysis

Kang, Jongkyeong; Park, Jaeshin; Bang, Sungwan

doi:10.5351/KJAS.2017.30.1.135

The Korean Journal of Applied Statistics (응용통계연구)

Volume 30, Issue 1 / Pages 135-145 / 2017 / 1225-066X (pISSN) / 2383-5818 (eISSN)

The Korean Statistical Society (한국통계학회)

Principal component analysis using a hierarchical penalty function (계층적 벌점함수를 이용한 주성분분석)

Kang, Jongkyeong (Department of Mathematics, Korea Military Academy) ;
Park, Jaeshin (Department of Mathematics, Korea Military Academy) ;
Bang, Sungwan (Department of Mathematics, Korea Military Academy)

강종경 (육군사관학교 수학과) ;
박재신 (육군사관학교 수학과) ;
방성완 (육군사관학교 수학과)

Received : 2016.11.16
Accepted : 2017.01.25
Published : 2017.02.28

https://doi.org/10.5351/KJAS.2017.30.1.135

Abstract

Principal component analysis (PCA) describes the variation of multivariate data in terms of a set of uncorrelated variables. Since each principal component is a linear combination of all variables and the loadings are typically non-zero, it is difficult to interpret the derived principal components. Sparse principal component analysis (SPCA) is a specialized technique using the elastic net penalty function to produce sparse loadings in principal component analysis. When data are structured by groups of variables, it is desirable to select variables in a grouped manner. In this paper, we propose a new PCA method to improve variable selection performance when variables are grouped, which not only selects important groups but also removes unimportant variables within identified groups. To incorporate group information into model fitting, we consider a hierarchical lasso penalty instead of the elastic net penalty in SPCA. Real data analyses demonstrate the performance and usefulness of the proposed method.

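Principal component analysis (PCA) is a representative technique for reducing the dimension of correlated multivariate data and is used in many multivariate analyses. However, because each principal component is a linear combination of all variables, the results are difficult to interpret. The sparse PCA (SPCA) method uses an elastic net penalty function to produce modified principal components with sparser loadings, but it cannot exploit the group structure of the variables. In this study, we improve on existing SPCA and propose a new principal component analysis method that, when the data are grouped, selects the important groups while also removing unnecessary variables within those groups. To use the structure of groups and of variables within groups in model fitting, we consider a hierarchical penalty function in place of the elastic net penalty in sparse PCA. Analyses of real data demonstrate the performance and usefulness of the proposed method.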
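
The contrast drawn in the abstract between the elastic net penalty and a hierarchical penalty can be illustrated with a small sketch. The abstract does not state the exact criterion, so the code below assumes the group-wise form in which each group contributes the square root of the L1 norm of its coefficients, a form associated with hierarchically penalized regression (Wang et al., 2009). The function names `pca_loadings`, `elastic_net_penalty`, and `hierarchical_penalty`, the tuning parameters, and the toy vectors are all hypothetical and chosen only for illustration; this is not the authors' fitting algorithm.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """Loadings of ordinary PCA, taken from the SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components].T  # shape: (n_variables, n_components)


def elastic_net_penalty(beta, lam1=1.0, lam2=1.0):
    """lam1 * ||beta||_1 + lam2 * ||beta||_2^2; ignores any group labels."""
    beta = np.asarray(beta, dtype=float)
    return lam1 * np.abs(beta).sum() + lam2 * (beta ** 2).sum()


def hierarchical_penalty(beta, groups, lam=1.0):
    """Assumed form: lam * sum over groups of sqrt(L1 norm of the group)."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    return lam * sum(
        np.sqrt(np.abs(beta[groups == g]).sum()) for g in np.unique(groups)
    )


# Ordinary PCA: every variable typically gets a non-zero loading.
rng = np.random.default_rng(0)
loadings = pca_loadings(rng.normal(size=(50, 4)))

# Two loading vectors with identical L1 and L2 norms but different
# group patterns (variables 0-1 form group 0, variables 2-3 form group 1).
groups = [0, 0, 1, 1]
within_one_group = [1.0, 3.0, 0.0, 0.0]
across_two_groups = [1.0, 0.0, 3.0, 0.0]

en_within = elastic_net_penalty(within_one_group)
en_across = elastic_net_penalty(across_two_groups)
hp_within = hierarchical_penalty(within_one_group, groups)
hp_across = hierarchical_penalty(across_two_groups, groups)
```

Under these assumptions the elastic net assigns the same penalty to both vectors, while the hierarchical form penalizes the vector spread across two groups more heavily than the one concentrated in a single group, which is the mechanism that encourages whole groups to drop out while still allowing zeros inside a retained group.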

References

Bernard, A., Guinot, C., and Saporta, G. (2012). Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis. In Proceedings of 20th International Conference on Computational Statistics (pp. 99-106).
Gemperline, P. J., Miller, K. H., West, T. L., Weinstein, J. E., Hamilton, J. C., and Bray, J. T. (1992). Principal component analysis, trace elements, and blue crab shell disease, Analytical Chemistry, 64, 523-531. https://doi.org/10.1021/ac00029a014
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55-67. https://doi.org/10.1080/00401706.1970.10488634
Kang, J., Bang, S., and Jhun, M. (2016). Hierarchically penalized quantile regression, Journal of Statistical Computation and Simulation, 86, 340-356. https://doi.org/10.1080/00949655.2015.1014038
Rose, A. K. and Spiegel, M. M. (2011). Cross-country causes and consequences of the crisis: an update, European Economic Review, 55, 309-324. https://doi.org/10.1016/j.euroecorev.2010.12.006
Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, 99, 1015-1034. https://doi.org/10.1016/j.jmva.2007.06.007
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society Series B (Methodological), 58, 267-288.
Wang, S., Nan, B., Zhou, N., and Zhu, J. (2009). Hierarchically penalized Cox regression with grouped variables, Biometrika, 96, 307-322. https://doi.org/10.1093/biomet/asp016
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society Series B (Statistical Methodology), 68, 49-67. https://doi.org/10.1111/j.1467-9868.2005.00532.x
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B (Statistical Methodology), 67, 301-320.
Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15, 265-286. https://doi.org/10.1198/106186006X113430
