- Volume 26 Issue 6
Selecting relevant variables is an important step in building a regression model. Variable selection methods based on a penalized likelihood have recently been studied widely for various regression models. Their main advantage is that they select important variables and estimate the regression coefficients of the covariates simultaneously. In this paper, we propose a simple procedure based on a penalized h-likelihood (HL) for variable selection in Poisson hierarchical generalized linear models (HGLMs) for correlated count data. To this end, we consider three penalty functions (LASSO, SCAD and HL) and derive the corresponding variable-selection procedures. The proposed method is illustrated using a practical example.
LASSO;penalized h-likelihood;Poisson HGLMs;SCAD;variable selection
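As a rough illustration of two of the penalty functions named in the abstract, the sketch below codes the LASSO penalty and the SCAD penalty in their standard textbook forms (SCAD with the conventional constant a = 3.7). It is not the paper's procedure: the function names, the `penalized_objective` helper, and the generic form "log-likelihood minus n times the summed penalty" are assumptions made for illustration only, and the HL penalty is omitted because its form is not given here.

```python
def lasso_penalty(beta, lam):
    """LASSO (L1) penalty: lam * |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in its standard piecewise form (requires a > 2).

    Linear like LASSO near zero, quadratic in the middle region,
    and constant for large |beta|, so large coefficients are not
    shrunk further.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def penalized_objective(loglik, betas, lam, penalty, n=1):
    """Hypothetical helper: loglik - n * sum of penalties over coefficients.

    `loglik` is an already-computed (h-)log-likelihood value; how it is
    obtained for a Poisson HGLM is outside the scope of this sketch.
    """
    return loglik - n * sum(penalty(b, lam) for b in betas)


if __name__ == "__main__":
    coefficients = [0.05, -0.8, 2.5]
    for name, pen in (("LASSO", lasso_penalty), ("SCAD", scad_penalty)):
        value = penalized_objective(-10.0, coefficients, lam=0.5, penalty=pen)
        print(name, value)
```

The two penalties agree for small coefficients and differ for large ones: LASSO keeps growing linearly, while SCAD levels off, which is why SCAD penalizes large effects less heavily.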
Research project host institution: Pukyong National University