- Volume 2 Issue 2
Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach
- Oh, Hee-Seok (Department of Statistics, Seoul National University)
- Jang, Dong-Ik (Department of Statistics, Seoul National University)
- Oh, Seung-Yoon (Interdisciplinary Program in Bioinformatics, Seoul National University)
- Kim, Hee-Bal (Interdisciplinary Program in Bioinformatics, Seoul National University)
- Received : 2010.03.17
- Accepted : 2010.05.31
- Published : 2010.06.30
The most common type of microarray experiment has a simple design in which expression data are obtained from two different groups or conditions. A typical method for identifying differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test. The t-test estimates the population variance for a gene simply by the sample variance of its expression levels. The empirical Bayes approach improves on the t-statistic by not ranking genes highly merely because they have a small sample variance, but it rests on the same basic assumption as the ordinary t-test: equality of variances across experimental groups. Both the t-test and the empirical Bayes approach also assume that the data follow a normal, unimodal distribution, and they suffer from low statistical power when microarray data violate this assumption. We propose a method that addresses these problems and is robust to outliers and skewed data, while maintaining the advantages of the classical t-test and modified t-statistics. The method transforms the data to better fit the normality assumption, which increases the statistical power of these statistics for identifying DEGs.
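To make the variance problem concrete, the following is a minimal sketch, not the authors' method, of how a single outlier inflates the pooled sample variance and deflates the classical equal-variance t-statistic for one gene. The expression values are hypothetical, and the Huber-type clipping step (median and rescaled MAD with cutoff `c = 1.5`) is an assumption chosen only as a generic stand-in for a robust data transformation.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Classical two-sample Student's t-statistic with a pooled (equal) variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))


def huber_clip(values, c=1.5):
    """Pull values farther than c robust-scale units from the median back to that bound.

    Illustrative assumption: a Huber-type clipping, not the transformation
    proposed in the paper.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # MAD rescaled to match the normal standard deviation
    if scale == 0:
        return list(values)
    lo, hi = center - c * scale, center + c * scale
    return [min(max(v, lo), hi) for v in values]


# Hypothetical log-expression values for one gene in two conditions;
# the last control value is an outlier.
control = [7.9, 8.1, 8.0, 8.2, 7.8, 12.5]
treated = [9.0, 9.2, 8.9, 9.1, 9.3, 9.0]

t_raw = pooled_t_statistic(control, treated)
t_robust = pooled_t_statistic(huber_clip(control), huber_clip(treated))

print(f"t-statistic on raw data:     {t_raw:.2f}")
print(f"t-statistic on clipped data: {t_robust:.2f}")
```

In this toy example the outlier dominates the pooled variance, so the raw t-statistic is small in magnitude even though the two groups are otherwise well separated; after clipping, the same statistic is much larger in magnitude. This only illustrates the sensitivity that motivates a robust approach.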
Supported by : Korea Research Foundation