Variable selection in Poisson HGLMs using h-likelihood

Ha, Il Do;Cho, Geon-Ho

  • Received : 2015.07.23
  • Reviewed : 2015.09.17
  • Published : 2015.11.30

Abstract

Selecting the relevant variables for a statistical model is an important step in regression analysis. Recently, variable selection methods based on a penalized likelihood have been widely studied in various regression models. The main advantage of these methods is that they select important variables and estimate the regression coefficients of the covariates simultaneously. In this paper, we propose a simple procedure based on a penalized h-likelihood (HL) for variable selection in Poisson hierarchical generalized linear models (HGLMs) for correlated count data. For this purpose, we consider three penalty functions (LASSO, SCAD and HL) and derive the corresponding variable-selection procedures. The proposed method is illustrated with a practical example.
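As a rough illustration of two of the penalty functions named in the abstract, the following sketch evaluates the LASSO and SCAD penalties for a single coefficient. It is not the authors' implementation: the function names are hypothetical, the SCAD form and the default a = 3.7 are the standard ones from Fan and Li (2001), and the HL penalty is omitted because its form is not given here.

```python
def lasso_penalty(beta, lam):
    """LASSO (L1) penalty for one coefficient: lam * |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for one coefficient (standard form; a > 2 is assumed).

    Linear like LASSO near zero, quadratic in the middle range,
    and constant for large |beta|, so large effects are not over-shrunk.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def total_penalty(betas, lam, penalty=scad_penalty):
    """Sum of the chosen penalty over all regression coefficients."""
    return sum(penalty(b, lam) for b in betas)
```

In a penalized-likelihood procedure, a penalty term of this kind is subtracted from the likelihood-type criterion (here the h-likelihood), and the tuning parameter `lam` controls how strongly small coefficients are shrunk towards zero.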

Keywords

LASSO;penalized h-likelihood;Poisson HGLMs;SCAD;variable selection

Cited by

  1. ML estimation using Poisson HGLM approach in semi-parametric frailty models vol.27, pp.5, 2016, https://doi.org/10.7465/jkdi.2015.26.6.1513
  2. Joint HGLM approach for repeated measures and survival data vol.27, pp.4, 2016, https://doi.org/10.7465/jkdi.2015.26.6.1513

Project information

Research project host institution : Pukyong National University