Analysis of multi-center bladder cancer survival data using variable-selection method of multi-level frailty models

다수준 프레일티모형 변수선택법을 이용한 다기관 방광암 생존자료분석

  • Kim, Bohyeon (김보현) (Clinical Trial Center, Busan Paik Hospital of Inje University) ;
  • Ha, Il Do (하일도) (Department of Statistics, Pukyong National University) ;
  • Lee, Donghwan (이동환) (Department of Statistics, Ewha Womans University)
  • Received : 2016.02.04
  • Accepted : 2016.03.21
  • Published : 2016.03.31

Abstract

Selecting relevant variables is an important step in fitting regression models for survival analysis. In this paper, we introduce a penalized variable-selection procedure for multi-level frailty models, built on the "frailtyHL" R package (Ha et al., 2012). Model estimation is based on the penalized hierarchical likelihood, and three penalty functions (LASSO, SCAD and HL) are considered. The proposed methods are illustrated with multi-country/multi-center bladder cancer survival data from a clinical trial conducted by the EORTC (European Organization for Research and Treatment of Cancer) in Belgium. We compare the results of the three variable-selection methods and discuss their relative advantages and disadvantages. In particular, the data analysis showed that the SCAD and HL methods select important variables better than the LASSO method.
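
As a rough illustration of how two of the penalties named in the abstract differ, the following minimal Python sketch evaluates the LASSO penalty and the SCAD penalty in its standard form from Fan and Li (2001). The function names, the tuning value `lam`, and the default `a = 3.7` are illustrative assumptions and are not taken from the paper or from the frailtyHL package. The HL penalty is omitted because its form is more involved.

```python
def lasso_penalty(beta, lam):
    """LASSO (L1) penalty: grows linearly with |beta| without bound."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in the standard form of Fan and Li (2001).

    Matches the LASSO penalty for |beta| <= lam, bends quadratically
    for lam < |beta| <= a * lam, and is constant beyond a * lam, so
    large coefficients are not shrunk further.
    """
    t = abs(beta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    lam = 1.0  # hypothetical tuning parameter, chosen only for display
    for beta in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(beta, lasso_penalty(beta, lam), scad_penalty(beta, lam))
```

The flat tail of the SCAD penalty is what separates it from the LASSO penalty here: large coefficients incur the same penalty regardless of size, whereas the LASSO penalty keeps increasing with the coefficient.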

References

  1. Androulakis, E., Koukouvinos, C. and Vonta, F. (2012). Estimation and variable selection via frailty models with penalized likelihood. Statistics in Medicine, 31, 2223-2239. https://doi.org/10.1002/sim.5325
  2. Cox, D. R. (1972). Regression models and life tables (with Discussion). Journal of the Royal Statistical Society, Series B, 34, 187-220.
  3. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360. https://doi.org/10.1198/016214501753382273
  4. Fan, J. and Li, R. (2002). Variable selection for Cox's proportional hazards model and frailty model. The Annals of Statistics, 30, 74-99. https://doi.org/10.1214/aos/1015362185
  5. Goldstein, H. (1995). Multilevel statistical models, Arnold, London.
  6. Ha, I. D. and Cho, G. H. (2012). H-likelihood approach for variable selection in gamma frailty models. Journal of the Korean Data & Information Science Society, 23, 199-207. https://doi.org/10.7465/jkdi.2012.23.1.199
  7. Ha, I. D. and Lee, Y. (2003). Estimating frailty models via Poisson hierarchical generalized linear models. Journal of Computational and Graphical Statistics, 12, 663-681. https://doi.org/10.1198/1061860032256
  8. Ha, I. D., Lee, Y. and Song, J. K. (2001). Hierarchical likelihood approach for frailty models. Biometrika, 88, 233-243. https://doi.org/10.1093/biomet/88.1.233
  9. Ha, I. D., Lee, Y. and MacKenzie, G. (2007). Model selection for multi-component frailty models. Statistics in Medicine, 26, 4790-4807. https://doi.org/10.1002/sim.2879
  10. Ha, I. D. and Noh, M. (2013). A visualizing method for investigating individual frailties using frailtyHL R-package. Journal of the Korean Data & Information Science Society, 24, 931-940. https://doi.org/10.7465/jkdi.2013.24.4.931
  11. Ha, I. D., Noh, M. and Lee, Y. (2012). frailtyHL: A package for fitting frailty models with h-likelihood. The R Journal, 4, 28-36.
  12. Ha, I. D., Pan, J., Oh, S. and Lee, Y. (2014). Variable selection in general frailty models using penalized h-likelihood. Journal of Computational and Graphical Statistics, 23, 1044-1060. https://doi.org/10.1080/10618600.2013.842489
  13. Ha, I. D., Sylvester, R., Legrand, C. and MacKenzie, G. (2011). Frailty modelling for survival data from multi-centre clinical trials. Statistics in Medicine, 30, 2144-2159. https://doi.org/10.1002/sim.4250
  14. Lee, S. (2015). A note on standardization in penalized regressions. Journal of the Korean Data & Information Science Society, 26, 505-516. https://doi.org/10.7465/jkdi.2015.26.2.505
  15. Lee, W., Lee, D., Lee, Y. and Pawitan, Y. (2011). Sparse canonical covariance analysis for high-throughput data. Statistical Applications in Genetics and Molecular Biology, 10, 1-24.
  16. Lee, Y. and Nelder, J. A. (1996). Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society B, 58, 619-678.
  17. Lee, Y., Nelder, J. A. and Pawitan, Y. (2006). Generalised linear models with random effects: Unified analysis via h-likelihood, Chapman and Hall, London.
  18. Lee, Y. and Oh, H. S. (2014). A new sparse variable selection via random-effect model. Journal of Multivariate Analysis, 125, 89-99. https://doi.org/10.1016/j.jmva.2013.11.016
  19. Oddens, J., Brausi, M., Sylvester, R., Bono, A., Beek, C.V.D., Andel, G.V., Gontero, P., Hoeltl, W., Turkeri, L., Marreaud, S., Collette, S. and Oosterlinck, W. (2013). Final results of an EORTC-GU cancers group randomized study of maintenance Bacillus Calmette-Guerin in intermediate- and high-risk Ta, T1 papillary carcinoma of the urinary bladder: One-third dose versus full dose and 1 year versus 3 years of maintenance. European Urology, 63, 462-472. https://doi.org/10.1016/j.eururo.2012.10.039
  20. Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society B, 58, 267-288.
  21. Tibshirani, R. (1997). The LASSO method for variable selection in the Cox model. Statistics in Medicine, 16, 385-395. https://doi.org/10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3
  22. Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society A, 135, 370-384. https://doi.org/10.2307/2344614
  23. Wang, H., Li, R. and Tsai, C. L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94, 553-568. https://doi.org/10.1093/biomet/asm053
  24. Yau, K. K. W. (2001). Multilevel models for survival analysis with random effects. Biometrics, 57, 96-102. https://doi.org/10.1111/j.0006-341X.2001.00096.x

Cited by

  1. Case study: Selection of weather variables affecting the number of pneumonia inpatients at Daegu Fatima Hospital, vol.28, pp.1, 2016, https://doi.org/10.7465/jkdi.2017.28.1.131