Performance Comparison of Classification Methods with the Combinations of the Imputation and Gene Selection Methods

Kim, Dong-Uk; Nam, Jin-Hyun; Hong, Kyung-Ha

  • Received : 2011.11
  • Accepted : 2011.12
  • Published : 2011.12.31


Gene expression data are obtained through many experimental stages, and errors introduced along the way can produce missing values. Because such data are of the so-called 'small n, large p' type, with far fewer samples than genes, a subset of genes has to be selected before statistical analyses such as classification. Imputation and gene selection are therefore important steps in microarray data analysis. In the literature, imputation, gene selection and classification have mostly been studied separately, although in practice they are applied in sequence. We therefore compare the performance of classification methods after imputation and gene selection methods have been applied to microarray data. Numerical simulations are carried out to evaluate the classification methods under various combinations of imputation and gene selection methods.
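The sequential nature of the three steps can be illustrated with a minimal sketch. The abstract does not name the specific methods that were compared, so everything here is an assumption chosen for brevity: gene-wise mean imputation, ranking genes by an absolute Welch t-statistic, a nearest-centroid classifier, and leave-one-out cross-validation on synthetic two-class data. All function names are hypothetical. Samples are stored as rows and genes as columns, and imputation and gene selection are refit inside each fold so that the held-out sample does not influence them.

```python
import math
import random

def fit_gene_means(X):
    """Mean of the observed (non-None) values for each gene (column)."""
    means = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def apply_means(X, means):
    """Replace each missing value (None) with the stored mean of its gene."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

def select_genes(X, y, k):
    """Indices of the k genes with the largest absolute Welch t-statistic."""
    def t_stat(j):
        a = [row[j] for row, c in zip(X, y) if c == 0]
        b = [row[j] for row, c in zip(X, y) if c == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
        vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
        se = math.sqrt(va / len(a) + vb / len(b))
        return abs(ma - mb) / se if se > 0 else 0.0
    return sorted(range(len(X[0])), key=t_stat, reverse=True)[:k]

def nearest_centroid_predict(X_train, y_train, x, genes):
    """Assign x to the class whose centroid (on the selected genes) is closest."""
    centroids = {}
    for c in set(y_train):
        rows = [r for r, lab in zip(X_train, y_train) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in genes]
    return min(
        centroids,
        key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(genes, centroids[c])),
    )

def loocv_error(X, y, k):
    """Leave-one-out error; imputation and selection use training samples only."""
    errors = 0
    for i in range(len(X)):
        train_X = [r for t, r in enumerate(X) if t != i]
        train_y = [c for t, c in enumerate(y) if t != i]
        means = fit_gene_means(train_X)            # step 1: imputation
        train_imp = apply_means(train_X, means)
        test_imp = apply_means([X[i]], means)[0]
        genes = select_genes(train_imp, train_y, k)  # step 2: gene selection
        pred = nearest_centroid_predict(train_imp, train_y, test_imp, genes)  # step 3
        errors += pred != y[i]
    return errors / len(X)

def make_toy_data(n_per_class=10, n_genes=50, n_informative=5,
                  missing_rate=0.05, seed=0):
    """Synthetic 'small n, large p' data with values missing at random (assumed setup)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = []
            for j in range(n_genes):
                shift = 3.0 if (label == 1 and j < n_informative) else 0.0
                value = rng.gauss(shift, 1.0)
                row.append(None if rng.random() < missing_rate else value)
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
err = loocv_error(X, y, k=5)
print(f"leave-one-out error rate: {err:.2f}")
```

Swapping in a different imputation rule, ranking criterion or classifier only requires replacing the corresponding function, which is how one would enumerate the combinations that such a comparison evaluates.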


Gene expression; imputation; gene selection; classification

