A Criterion for the Selection of Principal Components in the Robust Principal Component Regression

- Journal title : Communications for Statistical Applications and Methods
- Volume 18, Issue 6, 2011, pp.761-770
- Publisher : The Korean Statistical Society
- DOI : 10.5351/CKSS.2011.18.6.761

Title & Authors

Kim, Bu-Yong

Abstract

Robust principal components regression is proposed to deal with multicollinearity and outliers simultaneously. A key step in the method is the selection of an optimal set of principal components. Instead of the eigenvalues of the sample covariance matrix, the selection criterion developed here is based on the condition index of the minimum volume ellipsoid estimator, which is highly robust against leverage points. In addition, least trimmed squares estimation is employed to cope with regression outliers. Monte Carlo simulation results indicate that the proposed criterion outperforms existing ones.
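
To make the idea of a condition-index-based selection rule concrete, here is a minimal Python sketch. It is not the paper's procedure: it assumes a positive definite scatter matrix is already available (in the paper this role is played by the minimum volume ellipsoid estimate, which is not computed here), it uses the common definition of the condition index as the square root of the ratio of the largest eigenvalue to each eigenvalue, and the cutoff of 30 is only a conventional rule of thumb chosen for illustration. The function names `condition_indices` and `select_components` are hypothetical.

```python
import numpy as np

def condition_indices(scatter):
    """Eigen-decompose a symmetric positive definite scatter matrix.

    Returns eigenvalues in descending order, the matching eigenvectors
    (as columns), and the condition indices sqrt(lambda_max / lambda_j).
    """
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    indices = np.sqrt(eigvals[0] / eigvals)
    return eigvals, eigvecs, indices

def select_components(scatter, max_index=30.0):
    """Keep the components whose condition index does not exceed max_index.

    max_index=30 is an assumed illustrative cutoff, not the paper's criterion.
    """
    _, eigvecs, indices = condition_indices(scatter)
    keep = indices <= max_index
    return eigvecs[:, keep], indices

# Toy scatter matrix with one near-degenerate direction (illustrative data only).
scatter = np.diag([4.0, 1.0, 0.0001])
loadings, indices = select_components(scatter)
print(indices)         # condition index of each component
print(loadings.shape)  # (number of variables, number of retained components)
```

In a full robust principal components regression, the retained loadings would be used to form component scores, and the response would then be regressed on those scores with a robust estimator such as least trimmed squares; that step is omitted from this sketch.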

Keywords

Multicollinearity; outlier; robust principal components regression; minimum volume ellipsoid estimator; condition index; least trimmed squares estimation

Language

Korean
