Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach

  • Oh, Hee-Seok (Department of Statistics, Seoul National University) ;
  • Jang, Dong-Ik (Department of Statistics, Seoul National University) ;
  • Oh, Seung-Yoon (Interdisciplinary Program in Bioinformatics, Seoul National University) ;
  • Kim, Hee-Bal (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Received : 2010.03.17
  • Accepted : 2010.05.31
  • Published : 2010.06.30

Abstract

The most common type of microarray experiment has a simple design that compares data obtained from two different groups or conditions. A typical method for identifying differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test. The t-test estimates the population variance for each gene from the sample variance of its expression levels. The empirical Bayes approach improves on the t-statistic by not giving a high rank to genes merely because they have a small sample variance, but it rests on the same basic assumption as the ordinary t-test, namely equality of variances across experimental groups. Both the t-test and the empirical Bayes approach also assume normal, unimodal distributions, and they suffer from low statistical power when microarray data violate that assumption. We propose a method that addresses these problems by being robust to outliers and skewed data while maintaining the advantages of the classical t-test and modified t-statistics. The method transforms the data to fit the normality assumption, which increases the statistical power of these statistics for identifying DEGs.
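To make the sensitivity of the t-statistic to outliers concrete, the following is a minimal sketch in Python, not the transformation proposed in the paper. It computes the pooled-variance Student's t-statistic for one gene, then recomputes it after a simple Huber-style clipping of each group around its median. The function names `pooled_t` and `huber_clip`, the tuning constant `c = 1.345`, and the expression values are all illustrative assumptions.

```python
import numpy as np

def pooled_t(x, y):
    """Student's two-sample t-statistic using the pooled (equal-variance) estimate."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / nx + 1 / ny))

def huber_clip(v, c=1.345):
    """Clip values lying more than c robust standard deviations from the median.

    Hypothetical stand-in for a robust transformation; the scale is the
    median absolute deviation (MAD) rescaled to match the normal SD.
    """
    v = np.asarray(v, float)
    med = np.median(v)
    mad = 1.4826 * np.median(np.abs(v - med))
    if mad == 0:
        return v.copy()
    return np.clip(v, med - c * mad, med + c * mad)

# Made-up log-expression values for a single gene, five arrays per group.
control = [5.0, 5.2, 4.9, 5.1, 5.0]
treated = [6.0, 6.1, 5.9, 6.2, 12.0]  # the last value is an outlier

t_raw = pooled_t(treated, control)
t_robust = pooled_t(huber_clip(treated), huber_clip(control))
print(f"t on raw data:     {t_raw:.2f}")
print(f"t on clipped data: {t_robust:.2f}")
```

In this toy example the single outlier inflates the sample variance of the treated group, so the raw t-statistic is small even though four of the five treated arrays sit well above the controls. Clipping restores a large statistic. The paper's own method may differ substantially from this clipping rule.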
