Multiple Testing in Genomic Sequences Using Hamming Distance

  • Kang, Moonsu (Department of Information Statistics, Gangneung-Wonju National University)
  • Received : 2012.08.29
  • Accepted : 2012.11.15
  • Published : 2012.11.30

Abstract

High-dimensional categorical data models with small sample sizes have not been used extensively for genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a large number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, because a multivariate analysis of variance is not feasible in this setting. The family-wise error rate (FWER) is not appropriate for testing thousands of genes simultaneously in a multiple testing procedure; the false discovery rate (FDR) is better suited than the FWER to such problems. An application to data from the 2002-2003 SARS epidemic shows that a conventional FDR procedure combined with the proposed test statistic, which is based on a pseudo-marginal approach with Hamming distance, performs better.
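The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's pseudo-marginal test statistic, which the abstract does not specify. It only shows the Hamming distance between two aligned sequences and the step-up FDR procedure of Benjamini and Hochberg (1995), alongside a Bonferroni FWER rule for comparison. The function names, example sequences, and p-values are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def bonferroni(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """FWER control: reject H_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """FDR control (step-up): find the largest rank k with
    p_(k) <= k * alpha / m and reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical aligned sequences (illustration only).
distance = hamming_distance("ACGTAC", "ACCTAG")

# Hypothetical per-position p-values (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_fwer = sum(bonferroni(pvals))
n_fdr = sum(benjamini_hochberg(pvals))
print(distance, n_fwer, n_fdr)
```

On these made-up p-values the Bonferroni rule rejects one hypothesis and the Benjamini-Hochberg procedure rejects two. The FDR procedure never rejects fewer hypotheses than Bonferroni at the same level, which is why it is usually preferred when thousands of positions are tested at once.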
