Multiple testing and its applications in high-dimension

Applications of multiple testing to high-dimensional data

  • Received : 2013.07.25
  • Accepted : 2013.09.09
  • Published : 2013.09.30

Abstract

The power of modern technology is opening a new era of big data. The size of these datasets gives us the opportunity to answer many open scientific questions, but it also presents interesting challenges. High-dimensional data, such as microarray data, are common in big data. In this paper, we give an overview of recent developments in multiple testing, including global and simultaneous testing, and their applications to high-dimensional data.

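Advances in modern science and technology have ushered in the era of big data. Big data offers answers to many scientific questions, but it also confronts us with new challenges. High-dimensional data, such as microarray data, are one of the types commonly found in big data. This paper introduces global testing and simultaneous testing, both widely used in the analysis of high-dimensional data, and their applications.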
