Estimation and variable selection in censored regression model with smoothly clipped absolute deviation penalty

  • Shim, Jooyong (Department of Statistics, Institute of Statistical Information, Inje University) ;
  • Bae, Jongsig (Department of Mathematics, Sungkyunkwan University) ;
  • Seok, Kyungha (Department of Statistics, Institute of Statistical Information, Inje University)
  • Received : 2016.09.28
  • Reviewed : 2016.11.23
  • Published : 2016.11.30

Abstract

The smoothly clipped absolute deviation (SCAD) penalty is known to satisfy the desirable properties of a penalty function, such as unbiasedness, sparsity, and continuity. In this paper, we address regression function estimation and variable selection in a SCAD-penalized censored regression model. We use the local linear approximation and the iteratively reweighted least squares algorithm to optimize the SCAD-penalized log-likelihood function. The proposed method performs variable selection and regression function estimation efficiently. A generalized cross-validation function is presented for model selection. Applications of the proposed method are illustrated with simulated data and a real example.
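
As a rough illustration of the penalty named in the abstract, the sketch below evaluates the standard SCAD penalty and its derivative for a single coefficient. The abstract does not state the exact formulation used in the paper, so the piecewise form, the default a = 3.7, and the function names scad_penalty, scad_derivative, and lla_weights are assumptions made for this example. The derivative is included because a local linear approximation typically replaces the penalty with a weighted absolute-value penalty whose weights are these derivative values at the current estimate.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for one coefficient (assumes lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero, which produces sparsity.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region that keeps the penalty continuous.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients are not shrunk further.
    return lam * lam * (a + 1) / 2


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def lla_weights(beta, lam, a=3.7):
    """Hypothetical helper: per-coefficient weights for one local linear
    approximation step, evaluated at the current estimate `beta`."""
    return [scad_derivative(b, lam, a) for b in beta]


if __name__ == "__main__":
    current = [0.0, 0.5, 2.0, 5.0]  # illustrative values, not from the paper
    print(lla_weights(current, lam=1.0))
```

The weights fall to zero for coefficients larger than a times lam, which is how this form of the penalty avoids biasing large coefficients while still shrinking small ones toward zero.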
