Fast robust variable selection using VIF regression in large datasets

Seo, Han Son

doi:10.5351/KJAS.2018.31.4.463

The Korean Journal of Applied Statistics (응용통계연구)

Volume 31, Issue 4 / Pages 463-473 / 2018 / 1225-066X (pISSN) / 2383-5818 (eISSN)

The Korean Statistical Society (한국통계학회)



Seo, Han Son (Department of Applied Statistics, Konkuk University)


Received : 2018.05.02
Accepted : 2018.06.11
Published : 2018.08.31

https://doi.org/10.5351/KJAS.2018.31.4.463

Abstract

This article considers variable selection algorithms for linear regression models fitted to large datasets. Many algorithms have been proposed with a focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. To improve robustness, a robust criterion based on a weighted estimator has been proposed, as has a robust VIF regression. This article proposes a fast and robust variable selection method that detects and removes potential outliers and then applies VIF regression. A simulation study and an analysis of a dataset compare the proposed method with other methods.
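The two-stage idea in the abstract (screen out potential outliers, then select variables in a single streamwise pass) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the function names `flag_outliers` and `streamwise_select`, the MAD-based residual screen, the fixed t-statistic cutoff, and the simulated data. In particular, the sketch refits ordinary least squares at each step, whereas VIF regression as published relies on a faster approximation of the test statistic and an adaptive testing threshold.

```python
import numpy as np


def flag_outliers(X, y, cutoff=3.0):
    """Flag rows whose OLS residual lies far from the median, in MAD units.

    Stand-in screening rule (assumption), not the article's detection procedure.
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = resid - np.median(resid)
    scale = 1.4826 * np.median(np.abs(centered))  # MAD, consistent for normal errors
    return np.abs(centered) > cutoff * scale


def streamwise_select(X, y, t_cut=3.5):
    """Visit each column once, in order, and keep it if its t-statistic,
    given the columns kept so far, exceeds a fixed cutoff (simplification)."""
    n, p = X.shape
    selected = []
    for j in range(p):
        A = np.column_stack([np.ones(n), X[:, selected + [j]]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        sigma2 = resid @ resid / (n - A.shape[1])
        cov = sigma2 * np.linalg.inv(A.T @ A)
        t_stat = coef[-1] / np.sqrt(cov[-1, -1])  # candidate is the last column
        if abs(t_stat) > t_cut:
            selected.append(j)
    return selected


# Simulated example: 3 informative predictors out of 15, 5% contaminated responses.
rng = np.random.default_rng(0)
n, p = 1000, 15
X = rng.normal(size=(n, p))
y = 2.0 * X[:, :3].sum(axis=1) + rng.normal(size=n)
y[:50] += 30.0  # vertical outliers in the first 50 rows

outlier = flag_outliers(X, y)
selected = streamwise_select(X[~outlier], y[~outlier])
print("rows flagged:", int(outlier.sum()))
print("selected columns:", selected)
```

Because each candidate is examined once and never revisited, the cost of the selection pass grows linearly in the number of candidates, which is the property that makes streamwise approaches attractive for large datasets. The screening step matters because a least-squares fit on contaminated data can distort the test statistics that drive selection.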



