Binary Segmentation Procedure for Detecting Change Points in a DNA Sequence
Yang Tae Young; Kim Jeongjin
It is of interest to locate homogeneous segments within a DNA sequence. Suppose that the DNA sequence consists of segments within which the observations follow the same residue frequency distribution and between which the distributions differ. In this setting, change points correspond to the end points of these segments. This article explores the use of a binary segmentation procedure for detecting the change points in a DNA sequence. The change points are determined using a sequence of nested hypothesis tests of whether a change point exists. At each test, we compare the no-change-point model with the single-change-point model using the Bayesian information criterion. The method thus circumvents the computational complexity one would normally face in problems with an unknown number of change points. We illustrate the procedure by analyzing the genome of bacteriophage lambda.
Keywords: Bayesian information criterion; bacteriophage lambda; binary segmentation procedure
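
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each segment is modeled as an independent multinomial over the four residues A, C, G and T; the single-change-point model is charged one extra parameter for the location of the change point; the BIC penalty uses the length of the segment currently being tested; and the function names and the `min_len` parameter are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed residue alphabet


def log_lik(counts):
    """Maximized multinomial log-likelihood of one homogeneous segment."""
    n = sum(counts.values())
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)


def best_split(seq, min_len):
    """Compare the no-change model with the best single-change model.

    Returns (bic_gain, t), where t splits seq into seq[:t] and seq[t:],
    or None if seq is too short to split. A positive bic_gain means BIC
    prefers the single-change-point model.
    """
    n = len(seq)
    if n < 2 * min_len:
        return None
    k = len(ALPHABET) - 1  # free frequencies per segment
    total = Counter(seq)
    bic0 = -2 * log_lik(total) + k * math.log(n)

    left = Counter(seq[:min_len - 1])
    best = None
    for t in range(min_len, n - min_len + 1):
        left[seq[t - 1]] += 1   # left now counts seq[:t]
        right = total - left    # counts of seq[t:]
        ll1 = log_lik(left) + log_lik(right)
        # Assumption: 2k frequencies plus 1 parameter for the location.
        bic1 = -2 * ll1 + (2 * k + 1) * math.log(n)
        if best is None or bic1 < best[0]:
            best = (bic1, t)
    bic1, t = best
    return bic0 - bic1, t


def binary_segmentation(seq, min_len=50):
    """Recursively split seq wherever BIC favors a change point."""
    change_points = []
    stack = [(0, len(seq))]
    while stack:
        lo, hi = stack.pop()
        result = best_split(seq[lo:hi], min_len)
        if result is None:
            continue
        gain, t = result
        if gain > 0:
            change_points.append(lo + t)
            stack.append((lo, lo + t))   # test the left part next
            stack.append((lo + t, hi))   # and the right part
    return sorted(change_points)
```

On a synthetic sequence such as `"AT" * 150 + "GC" * 150`, `binary_segmentation` returns `[300]`, the position where the residue frequencies change. Each test involves only one candidate change point, so the search never has to enumerate all configurations of an unknown number of change points, which is the computational saving the abstract refers to.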