A two-step approach for variable selection in linear regression with measurement error

  • Received : 2018.10.16
  • Accepted : 2019.01.07
  • Published : 2019.01.31

Abstract

Identifying informative variables is important in high-dimensional data analysis; however, the task becomes challenging when covariates are contaminated by measurement error, because the error biases the coefficient estimates. In this article, we present a two-step approach for variable selection in the presence of measurement error. In the first step, we select important variables directly from the contaminated covariates as if there were no measurement error. In the second step, we apply orthogonal regression to obtain unbiased estimates of the regression coefficients of the variables identified in the first step. In addition, we propose a modification of the two-step approach to further enhance variable selection performance. Various simulation studies demonstrate the promising performance of the proposed method.
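The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the selector used in the first step, so the lasso (fitted here by a small coordinate-descent routine) is an assumption. The second step uses the classical total least squares form of orthogonal regression, which assumes that the measurement error in every covariate and the error in the response share one variance. The function names `lasso_cd` and `orthogonal_regression`, the penalty level `lam`, and all simulation settings are hypothetical choices made for illustration only.

```python
import numpy as np


def lasso_cd(W, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent for (1/2n)||y - Wb||^2 + lam*||b||_1."""
    n, p = W.shape
    b = np.zeros(p)
    col_ms = (W ** 2).sum(axis=0) / n      # mean square of each column
    r = y.copy()                           # current residual y - W b
    for _ in range(n_iter):
        for j in range(p):
            r += W[:, j] * b[j]            # remove coordinate j from the fit
            rho = W[:, j] @ r / n          # covariance with the partial residual
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ms[j]
            r -= W[:, j] * b[j]            # put the updated coordinate back
    return b


def orthogonal_regression(W, y):
    """Total least squares fit, assuming equal error variance in W and y."""
    Z = np.column_stack([W, y])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    v = Vt[-1]                             # direction of smallest singular value
    return -v[:-1] / v[-1]                 # Z v ~ 0 implies v is proportional to (beta, -1)


# Simulated data (assumed settings): 3 informative covariates out of 8.
rng = np.random.default_rng(0)
n, p, sigma = 2000, 8, 0.5
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(n, p))                    # true, unobserved covariates
W = X + sigma * rng.normal(size=(n, p))        # observed, contaminated covariates
y = X @ beta_true + sigma * rng.normal(size=n)
W = W - W.mean(axis=0)                         # center so no intercept is needed
y = y - y.mean()

# Step 1: select variables from W as if there were no measurement error.
selected = np.flatnonzero(lasso_cd(W, y, lam=0.3))

# Step 2: refit the selected variables with orthogonal regression.
beta_tls = orthogonal_regression(W[:, selected], y)

# Naive least squares on the same variables, for comparison (attenuated).
beta_naive = np.linalg.lstsq(W[:, selected], y, rcond=None)[0]

print("selected:", selected)
print("orthogonal regression:", beta_tls)
print("naive least squares:  ", beta_naive)
```

In this setup the naive refit is expected to shrink toward zero because the noise in `W` inflates the covariate variance, while the orthogonal regression refit accounts for that noise under the equal-variance assumption. When the error variances differ, the columns would need to be rescaled by the known variance ratio before the singular value decomposition.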

Keywords
