How are Bayesian and Non-Parametric Methods Doing a Great Job in RNA-Seq Differential Expression Analysis? : A Review

  • Oh, Sunghee (Department of Veterinary Medicine, Jeju National University)
  • Received : 2015.02.23
  • Accepted : 2015.03.19
  • Published : 2015.03.31


In its short history, RNA-seq has become established as a revolutionary tool for directly decoding genome-wide expression profiles, covering differential expression at the gene, transcript, isoform, and exon level as well as genetic and genomic mutations. RNA-seq has been rapidly replacing array-based platforms, offering advantages such as the identification of alternative splicing and allele-specific expression. The distinguishing characteristics of high-throughput, large-scale RNA-seq expression profiles lie in expression levels measured as read counts, the correlation structure among samples and genes, a number of genes far larger than the sample size, different sampling rates, and unavoidable systematic RNA-seq biases. In this study, we comprehensively review how robust Bayesian and non-parametric methods perform better than classical statistical approaches by explicitly incorporating these intrinsic RNA-seq-specific features through more flexible and appropriate assumptions and distributions in practice.


