Reproducibility and Sample Size in High-Dimensional Data

  • Received : 2010.08
  • Accepted : 2010.08
  • Published : 2010.12.31

Abstract

Many methods have been developed for determining sample sizes in clinical trials, and most clinical trial organizations, in Korea and abroad, calculate sample sizes with them. In contrast, research on the sample sizes needed for experiments using microarray chips remains insufficient, and the available methods are not widely used. The objective of this paper is to provide a guideline for determining the sample size of a microarray experiment by using information on the reproducibility of real microarray data. Reproducibility is measured separately for five methods of detecting differential expression: fold change, two-sample t-test, Wilcoxon rank-sum test, SAM, and LPE. Gene expression values are standardized with both MAS5 and RMA, and the concordance of the top 20 and top 100 genes is compared as the number of replicates varies. The proposed approach adds this more realistic information to an existing sample size method, so that the sample size required for an experiment can be calculated in finer detail.
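The top-k concordance idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the data are simulated rather than real microarray measurements, only two of the five methods are shown (fold change and a t statistic, here in its Welch unequal-variance form), and every function name and parameter value (`simulate`, `top_k`, `concordance`, `reproducibility`, 500 genes, 50 shifted genes, a shift of 1.5) is a hypothetical choice made for illustration. Two independent simulated experiments are ranked by the same method, and the fraction of genes shared by their top-k lists is reported.

```python
import random
import statistics


def log_fold_change(a, b):
    """Difference of group means; values are assumed to be on the log2 scale."""
    return statistics.fmean(b) - statistics.fmean(a)


def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch form)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(b) - statistics.fmean(a)) / se


def top_k(control, treated, score, k):
    """Indices of the k genes with the largest absolute score."""
    scores = [abs(score(c, t)) for c, t in zip(control, treated)]
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(order[:k])


def concordance(set_a, set_b):
    """Fraction of genes shared by two equally sized top-k lists."""
    return len(set_a & set_b) / len(set_a)


def simulate(n_genes, n_rep, n_de, shift, rng):
    """Hypothetical data: the first n_de genes are shifted in the treated group."""
    control = [[rng.gauss(8.0, 1.0) for _ in range(n_rep)] for _ in range(n_genes)]
    treated = [
        [rng.gauss(8.0 + (shift if g < n_de else 0.0), 1.0) for _ in range(n_rep)]
        for g in range(n_genes)
    ]
    return control, treated


def reproducibility(n_rep, k, seed, n_genes=500, n_de=50, shift=1.5):
    """Top-k concordance between two independent simulated experiments."""
    rng = random.Random(seed)
    first = simulate(n_genes, n_rep, n_de, shift, rng)
    second = simulate(n_genes, n_rep, n_de, shift, rng)
    result = {}
    for name, score in (("fold change", log_fold_change), ("Welch t", welch_t)):
        result[name] = concordance(top_k(*first, score, k), top_k(*second, score, k))
    return result


if __name__ == "__main__":
    # Compare top-20 concordance as the number of replicates per group grows.
    for n_rep in (3, 5, 10):
        print(n_rep, reproducibility(n_rep, k=20, seed=2010))
```

Running the script prints one concordance value per method for each replicate count. With real data, the two simulated experiments would be replaced by disjoint subsets of the available arrays, which is an assumption about how such a comparison could be set up rather than a description of the paper's design.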
