A Penalized Spline Based Method for Detecting the DNA Copy Number Alteration in an Array-CGH Experiment

Kim, Byung-Soo; Kim, Sang-Cheol

  • Published : 2009.02.28


The purpose of statistical analysis of array-CGH data is to divide the whole genome into regions of equal copy number, to quantify the copy number in each region, and finally to evaluate whether that copy number differs significantly from two. Several statistical procedures have been proposed, including circular binary segmentation and GLAD, a Gaussian-based local regression that detects break points by estimating a piecewise constant function. In this note we propose a penalized spline regression with a simultaneous confidence band (SCB) to evaluate the statistical significance of regions of genetic gain or loss. A region over which the simultaneous confidence band stays entirely above 0 can be considered a region of genetic gain, and a region over which it stays entirely below 0 a region of genetic loss. We compare the performance of the SCB procedure with the GLAD and hidden Markov model approaches in a simulation study in which the data were generated from an independence model and from AR(1) and AR(2) models, the latter two reflecting the spatial dependence of array-CGH data. We found that the SCB method is more sensitive in detecting low-level copy number alterations.
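The idea of calling gain or loss from a simultaneous confidence band can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not stated in the abstract: the response is a log-ratio centered at 0, the spline uses a linear truncated-power basis, the smoothing parameter `lam` is fixed by hand rather than estimated, the band is obtained by simulation from a mixed-model-style covariance that treats the errors as independent, and all function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_ar1_profile(n=300, gain=(100, 160), shift=0.8, phi=0.3, sigma=0.15, seed=0):
    """Toy log-ratio profile: baseline 0, one gained segment, AR(1) noise (all values assumed)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(n)
    mean[gain[0]:gain[1]] = shift
    eps = rng.normal(0.0, sigma, n)
    noise = np.empty(n)
    noise[0] = eps[0]
    for i in range(1, n):
        noise[i] = phi * noise[i - 1] + eps[i]  # AR(1) recursion
    return np.arange(n, dtype=float), mean + noise

def pspline_scb(x, y, n_knots=30, lam=1e-4, level=0.95, n_sim=2000, seed=1):
    """Penalized linear spline fit with a simulation-based simultaneous confidence band."""
    rng = np.random.default_rng(seed)
    xs = (x - x.min()) / (x.max() - x.min())  # rescale positions to [0, 1]
    knots = np.quantile(xs, np.linspace(0, 1, n_knots + 2)[1:-1])
    # Design matrix: intercept, slope, and truncated lines (xs - knot)_+
    C = np.column_stack([np.ones_like(xs), xs,
                         np.maximum(xs[:, None] - knots[None, :], 0.0)])
    D = np.diag([0.0, 0.0] + [1.0] * n_knots)  # penalize only the knot coefficients
    CtC = C.T @ C
    A = np.linalg.inv(CtC + lam * D)
    A = (A + A.T) / 2.0                        # enforce exact symmetry
    fit = C @ (A @ (C.T @ y))
    df = np.trace(A @ CtC)                     # effective degrees of freedom
    sigma2 = np.sum((y - fit) ** 2) / (len(y) - df)
    cov = sigma2 * A                           # assumed covariance of the coefficient error
    se = np.sqrt(np.sum((C @ cov) * C, axis=1))
    # Simulate coefficient errors and take the maximum standardized deviation over all positions
    L = np.linalg.cholesky(cov)
    draws = rng.standard_normal((n_sim, cov.shape[0])) @ L.T
    max_abs = np.max(np.abs(draws @ C.T) / se, axis=1)
    crit = np.quantile(max_abs, level)         # simultaneous critical value
    return fit, fit - crit * se, fit + crit * se

x, y = simulate_ar1_profile()
fit, lower, upper = pspline_scb(x, y)
gain_mask = lower > 0   # band entirely above 0: candidate gain
loss_mask = upper < 0   # band entirely below 0: candidate loss
print("positions called gain:", int(gain_mask.sum()))
print("positions called loss:", int(loss_mask.sum()))
```

Because the band in this sketch ignores the serial correlation of the noise, it will tend to be somewhat too narrow for AR(1) or AR(2) data; a simulation study of the kind described above is one way to assess how much that matters in practice.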


DNA copy number alteration; gastric cancer; penalized spline; simultaneous confidence band

