A Penalized Spline Based Method for Detecting the DNA Copy Number Alteration in an Array-CGH Experiment

Kim, Byung-Soo; Kim, Sang-Cheol

  • Published : 2009.02.28

Abstract

The purpose of statistical analysis of array-CGH data is to divide the whole genome into regions of equal copy number, to quantify the copy number in each region, and finally to evaluate whether it differs significantly from two. Several statistical procedures have been proposed, including circular binary segmentation and GLAD, a Gaussian-based local regression that detects breakpoints by estimating a piecewise constant function. In this note we propose a penalized spline regression together with its simultaneous confidence band (SCB) to evaluate the statistical significance of regions of genetic gain or loss. A region in which the simultaneous confidence band stays entirely above 0 or entirely below 0 can be considered a region of genetic gain or loss, respectively. We compare the performance of the SCB procedure with the GLAD and hidden Markov model approaches through a simulation study in which the data were generated from an independence model and from AR(1) and AR(2) models, the latter two reflecting the spatial dependence of array-CGH data. We found that the SCB method is more sensitive in detecting low-level copy number alterations.
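
The SCB idea can be illustrated with a minimal sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the truncated-line spline basis, the number of knots, the hand-picked smoothing parameter `lam` (a mixed-model fit would normally estimate it), the simulated gain of 1.0, the AR(1) settings, and the function names `ar1_noise` and `pspline_scb`. The band is computed as if the errors were independent, so under AR(1) noise its coverage is only approximate.

```python
import numpy as np

def ar1_noise(n, phi, sigma, rng):
    """Stationary AR(1) noise: e[t] = phi * e[t-1] + innovation (sd = sigma)."""
    e = np.empty(n)
    e[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        e[t] = phi * e[t - 1] + rng.normal(0.0, sigma)
    return e

def pspline_scb(x, y, n_knots=20, lam=0.01, alpha=0.05, n_sim=2000, seed=0):
    """Penalized linear spline fit with a simulation-based simultaneous band.

    Illustrative only: lam is fixed by hand and errors are treated as
    independent when the band is computed.
    """
    rng = np.random.default_rng(seed)
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    # Design matrix: intercept, slope, truncated lines (x - knot)_+
    C = np.column_stack([np.ones_like(x), x,
                         np.maximum(x[:, None] - knots, 0.0)])
    # Ridge penalty on the knot coefficients only
    D = np.diag(np.r_[0.0, 0.0, np.ones(n_knots)])
    R = np.linalg.cholesky(C.T @ C + lam * D)
    B = np.linalg.solve(R, C.T).T        # B @ B.T is the smoother matrix S
    fit = B @ (B.T @ y)
    # Residual degrees of freedom: n - 2 tr(S) + tr(S S^T)
    G = B.T @ B
    df_res = len(y) - 2.0 * np.sum(B * B) + np.sum(G * G)
    sigma = np.sqrt(np.sum((y - fit) ** 2) / df_res)
    norm = np.linalg.norm(B, axis=1)
    se = sigma * norm                    # pointwise standard error of the fit
    # Critical value: (1 - alpha) quantile of the maximum standardized
    # deviation over all positions (sigma cancels in the ratio)
    Z = rng.standard_normal((n_sim, B.shape[1]))
    max_dev = np.max(np.abs(Z @ B.T) / norm, axis=1)
    crit = np.quantile(max_dev, 1.0 - alpha)
    return fit, fit - crit * se, fit + crit * se

# Simulated log2 ratios along one chromosome arm: 0 = two copies (assumed scale)
rng = np.random.default_rng(1)
n = 300
x = np.linspace(0.0, 1.0, n)
truth = np.where((x >= 0.4) & (x <= 0.6), 1.0, 0.0)   # one region of gain
y = truth + ar1_noise(n, phi=0.3, sigma=0.15, rng=rng)

fit, lower, upper = pspline_scb(x, y)
gain = lower > 0.0    # whole band above 0: candidate gain
loss = upper < 0.0    # whole band below 0: candidate loss
print("positions flagged as gain:", int(gain.sum()))
print("positions flagged as loss:", int(loss.sum()))
```

Because the critical value comes from the maximum deviation over all positions rather than from a pointwise normal quantile, the band is wider than a pointwise interval, which is what allows a whole region to be declared a gain or loss at once.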

Keywords

DNA copy number alteration; gastric cancer; penalized spline; simultaneous confidence band
