Principal Components Logistic Regression based on Robust Estimation

- Journal title : Korean Journal of Applied Statistics
- Volume 22, Issue 3, 2009, pp.531-539
- Publisher : The Korean Statistical Society
- DOI : 10.5351/KJAS.2009.22.3.531

Title & Authors

Kim, Bu-Yong; Kahng, Myung-Wook; Jang, Hea-Won

Abstract

Logistic regression is widely used as a data mining technique in customer relationship management. The maximum likelihood estimator has highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. We therefore propose robust principal components logistic regression to deal with both the multicollinearity and outlier problems. A procedure based on the condition index is suggested for selecting principal components: when a condition index is larger than the cutoff value obtained from a model constructed on the basis of conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ a robust estimation algorithm that dampens the effect of outliers by applying appropriate weights and factors to the leverage points and vertical outliers identified by the V-mask type criterion. Monte Carlo simulation results indicate that the proposed procedure yields a higher rate of correct classification than the existing method.
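The component-selection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several parts are assumptions made for illustration only: the function names (`condition_indices`, `fit_logistic_irls`, `pc_logistic`) are hypothetical; the cutoff of 30 is a common rule-of-thumb value, not the conjoint-analysis-based cutoff of the paper; and the fit is ordinary maximum likelihood via Newton/IRLS, so the robust weighting and the V-mask type criterion are omitted entirely.

```python
import numpy as np


def condition_indices(X):
    """Standardize X and return (Z, eigenvectors, condition indices).

    Condition index j is sqrt(lambda_max / lambda_j), computed from the
    eigenvalues of the correlation matrix of the regressors.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 1e-12, None)    # guard against tiny negatives
    return Z, eigvecs, np.sqrt(eigvals[0] / eigvals)


def fit_logistic_irls(A, y, n_iter=50, tol=1e-8):
    """Ordinary (non-robust) ML logistic fit by Newton/IRLS.

    A must already contain an intercept column.
    """
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        w = p * (1.0 - p)
        step = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def pc_logistic(X, y, cutoff=30.0):
    """Principal components logistic regression with condition-index selection.

    cutoff=30 is an assumed rule-of-thumb value, not the paper's cutoff.
    Returns the intercept, the coefficients on the standardized regressors,
    the condition indices, and the mask of retained components.
    """
    Z, V, ci = condition_indices(X)
    keep = ci <= cutoff                        # drop components with large index
    scores = Z @ V[:, keep]
    A = np.column_stack([np.ones(len(y)), scores])
    gamma = fit_logistic_irls(A, y)
    beta_std = V[:, keep] @ gamma[1:]          # map back to standardized regressors
    return gamma[0], beta_std, ci, keep
```

Because the dropped components correspond to near-linear dependencies among the regressors, the back-transformed coefficients are constrained to the subspace spanned by the retained eigenvectors, which is what stabilizes their variance. A robust variant would replace `fit_logistic_irls` with a weighted fit.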

Keywords

Data mining; multicollinearity; outlier; principal components logistic regression; robust estimation

Language

Korean

References

1.

Aguilera, A. M., Escabias, M. and Valderrama, M. J. (2006). Using principal components for estimating logistic regression with high-dimensional multicollinear data, Computational Statistics & Data Analysis, 50, 1905-1924

2.

Carroll, R. J. and Pederson, S. (1993). On robustness in the logistic regression model, Journal of the Royal Statistical Society, Series B, 55, 693-706

3.

Copas, J. B. (1988). Binary regression models for contaminated data, Journal of the Royal Statistical Society, Series B, 50, 225-265

4.

Croux, C. and Haesbroeck, G. (2003). Implementing the Bianco and Yohai estimator for logistic regression, Computational Statistics & Data Analysis, 44, 273-295

5.

Hadi, A. S. (1994). A modification of a method for the detection of outliers in multivariate samples, Journal of the Royal Statistical Society, Series B, 56, 393-396

6.

Hardin, J. and Rocke, D. M. (2004). Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator, Computational Statistics & Data Analysis, 44, 625-638

7.

Kim, B. Y. (2005). V-mask type criterion for identification of outliers in logistic regression, The Korean Communications in Statistics, 12, 625-634

8.

Kim, B. Y. and Kahng, M. W. (2008). Principal components regression in logistic model, The Korean Journal of Applied Statistics, 21, 571-580

9.

Kim, B. Y., Kahng, M. W. and Choi, M. A. (2007). Algorithm for the robust estimation in logistic regression, The Korean Journal of Applied Statistics, 20, 551-559

10.

Kordzakhia, N., Mishra, G. D. and Reiersolmoen, L. (2001). Robust estimation in the logistic regression model, Journal of Statistical Planning and Inference, 98, 211-223

11.

Mason, R. L. and Gunst, R. F. (1985). Selecting principal components in regression, Statistics & Probability Letters, 3, 299-301

12.

Rousseeuw, P. J. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator, Technometrics, 41, 212-223

13.

Rousseeuw, P. J. and Leroy, A. M. (2003). Robust Regression and Outlier Detection, John Wiley & Sons, New York