Multiple Testing in Genomic Sequences Using Hamming Distance
Kang, Moonsu
High-dimensional categorical data models with small sample sizes have not been used extensively for genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, because a multivariate analysis of variance is not feasible in this setting. The family-wise error rate (FWER) is not an appropriate criterion when thousands of genes are tested simultaneously in a multiple testing procedure. The false discovery rate (FDR) is a better criterion than the FWER in such multiple testing problems. Data from the 2002-2003 SARS epidemic show that a conventional FDR procedure and a proposed test statistic based on a pseudo-marginal approach with Hamming distance perform better.
Keywords: Pseudo-marginal approach; false discovery rate; Hamming distance; genomic sequence
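To make the abstract's terms concrete, the following is a minimal Python sketch and not the paper's method. It shows the Hamming distance between two equal-length sequences, a Bonferroni rule as one common FWER-controlling procedure, and the Benjamini-Hochberg step-up rule as one conventional FDR procedure. The function names, the example sequences, and the p-values are illustrative assumptions. The pseudo-marginal test statistic itself is not reproduced here because the abstract does not define it.

```python
from typing import List, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """FWER control: reject hypothesis i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """FDR control (Benjamini-Hochberg step-up rule).

    Find the largest rank k with p_(k) <= k * alpha / m and reject
    the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical nucleotide sequences, not taken from the SARS data.
    print(hamming_distance("ACGTAC", "ACCTAG"))

    # Hypothetical per-position p-values, for illustration only.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(sum(bonferroni(p)), sum(benjamini_hochberg(p)))
```

On these made-up p-values the Bonferroni rule rejects one hypothesis and the Benjamini-Hochberg rule rejects two, which illustrates why FWER control becomes conservative as the number of simultaneous tests grows.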