A Comparative Study of Microarray Data with Survival Times Based on Several Missing Mechanism

Kim Jee-Yun, Hwang Jin-Soo, Kim Seong-Sun

Published: 2006.04.01


One of the most widely used methods for handling missing values in microarray data is kNN (k-nearest neighbors) imputation. Li and Gui (2004) recently proposed the so-called PCR (partial Cox regression) method, which efficiently handles censored survival times together with microarray data and uses kNN imputation for the missing expression values. In this article, we aim to show that the way missing values are treated ultimately affects the subsequent statistical analysis.
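
The kNN idea mentioned above can be illustrated with a short Python sketch in the spirit of Troyanskaya et al. (2001): each missing entry of a gene is filled with a weighted average of the same array's values taken from the k genes whose expression profiles are most similar over the arrays observed in both. This is a minimal sketch under stated assumptions, not the implementation used in this article or by Li and Gui (2004). The names `knn_impute` and `_distance`, the default k = 3, the inverse-distance weights, and the Euclidean distance rescaled by the number of co-observed arrays are all assumptions made for illustration.

```python
import math
from typing import List, Optional

# Rows = genes, columns = arrays/samples; NaN marks a missing expression value.
Matrix = List[List[float]]


def _distance(a: List[float], b: List[float]) -> Optional[float]:
    """Euclidean distance over columns observed in both rows, rescaled by the
    number of such columns (an assumption of this sketch). None if no overlap."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b)
             if not (math.isnan(x) or math.isnan(y))]
    if not diffs:
        return None
    return math.sqrt(sum(diffs) / len(diffs))


def knn_impute(data: Matrix, k: int = 3, eps: float = 1e-8) -> Matrix:
    """Fill each missing entry with an inverse-distance-weighted mean of the
    values that the k nearest genes have in the same column (hypothetical helper)."""
    result = [row[:] for row in data]  # copy; neighbours are searched in the original data only
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Candidate neighbours: other genes whose value in column j is observed.
            candidates = []
            for g, other in enumerate(data):
                if g == i or math.isnan(other[j]):
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            if not candidates:
                continue  # nothing to borrow from; leave the entry missing
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            weights = [1.0 / (d + eps) for d, _ in nearest]
            result[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return result


nan = float("nan")
expr = [
    [1.0, 2.0, nan, 4.0],   # gene with one missing value in array 3
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [5.0, 1.0, 0.0, 9.0],
]
filled = knn_impute(expr, k=2)
print(round(filled[0][2], 3))  # prints 3.0
```

Because neighbours are always taken from the original matrix, values imputed earlier are not reused when filling later entries; Kim, Kim and Yi (2004) study the effect of reusing imputed data, which this sketch deliberately does not do.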




  1. Bishop, C.M. (1999). Variational principal components. In IEE Conference Publication on Artificial Neural Networks, 509-514
  2. Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, Vol. 32, 407-499
  3. Gui, J. and Li, H. (2004). Penalized Cox Regression Analysis in the High-Dimensional and Low-Sample Size Settings with Applications to Microarray Gene Expression Data. Center for Bioinformatics & Molecular Biostatistics
  4. Kim, H., Golub, G.H. and Park, H. (2005). Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics, Vol. 21, 187-198
  5. Kim, K.Y., Kim, B.J. and Yi, G.S. (2004). Reuse of imputed data in microarray analysis increases imputation efficiency. BMC Bioinformatics, Vol. 5, 160
  6. Li, H. and Gui, J. (2004). Partial Cox regression analysis for high-dimensional microarray gene expression data. Bioinformatics, Vol. 20, i208-i215
  7. Li, H. and Luan, Y. (2003). Kernel Cox regression models for linking gene expression profiles to censored survival data. Pacific Symposium on Biocomputing, 65-76
  8. Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K. and Ishii, S. (2003). A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, Vol. 19, 2088-2096
  9. Rosenwald, A., Wright, G., Chan, W.C., Connors, J.M., Campo, E., Fisher, R.I., Gascoyne, R.D., Muller-Hermelink, H.K., Smeland, E.B. and Staudt, L.M. (2002). The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England Journal of Medicine, Vol. 346, 1937-1947
  10. Rubin, D.B. (1977). Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association, Vol. 72, 538-543
  11. Segal, M.R. (2005). Microarray gene expression data with linked survival phenotypes: diffuse large-B-cell lymphoma revisited. Center for Bioinformatics & Molecular Biostatistics
  12. Tibshirani, R. (1997). The Lasso method for variable selection in the Cox model. Statistics in Medicine, Vol. 16, 385-395
  13. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, Vol. 67, 301-320
  14. Bo, T.H., Dysvik, B. and Jonassen, I. (2004). LSimpute: accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Research, Vol. 32, No. 3, e34
  15. Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R.B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, Vol. 17, 520-525
  16. Hastie, T., Alter, O., Sherlock, G., Eisen, M., Tibshirani, R., Botstein, D. and Brown, P. (1999). Imputation of missing values in DNA microarrays. Technical report, Stanford University Statistics Department
  17. Park, P.J., Tian, L. and Kohane, I.S. (2002). Linking gene expression data with patient survival times using partial least squares. Bioinformatics, Vol. 18, S120-S127