Multiple Testing in Genomic Sequences Using Hamming Distance

Kang, Moonsu

doi:10.5351/CKSS.2012.19.6.899

Communications for Statistical Applications and Methods

Volume 19, Issue 6, Pages 899-904, 2012
pISSN 2287-7843, eISSN 2383-4757

The Korean Statistical Society (한국통계학회)


Kang, Moonsu (Department of Information Statistics, Gangneung-Wonju National University)

Received : 2012.08.29
Accepted : 2012.11.15
Published : 2012.11.30


Abstract

High-dimensional categorical data models with small sample sizes have not been used extensively for genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a large number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, because a multivariate analysis of variance is not feasible in this setting. The family-wise error rate (FWER) is not an appropriate criterion when thousands of genes are tested simultaneously. The false discovery rate (FDR) is better suited than the FWER to such multiple testing problems. Data from the 2002-2003 SARS epidemic show that a conventional FDR procedure combined with the proposed test statistic, which is based on a pseudo-marginal approach with Hamming distance, performs better.
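
As a rough illustration of the two ingredients named in the abstract, the following Python sketch computes a Hamming distance between two equal-length sequences and applies the Benjamini-Hochberg step-up rule to a list of p-values. It is a minimal sketch under stated assumptions, not the paper's pseudo-marginal test statistic: the function names `hamming_distance` and `benjamini_hochberg`, the example sequences, and the p-values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up rule at level q.

    Returns one flag per hypothesis (in the input order); True means rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * q / m (0 if no rank qualifies).
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical nucleotide sequences, for illustration only.
    print(hamming_distance("ACGTAC", "ACGAAT"))
    # Hypothetical per-position p-values, for illustration only.
    print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5], q=0.05))
```

The step-up rule rejects every hypothesis up to the largest rank that meets its threshold, so a p-value may be rejected even when it exceeds its own rank threshold, provided a larger p-value further down the ordering passes.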

