A Statistical Analysis of SNPs, In-Dels, and Their Flanking Sequences in Human Genomic Regions

  • Shin, Seung-Wook (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Kim, Young-Joo (Functional Genomics Research Center, Korea Research Institute of Bioscience and Biotechnology)
  • Kim, Byung-Dong (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Published: 2007.06.30


With growing interest in SNPs and mutational hot spots for disease traits, it is becoming more important to define and understand the relationship between SNPs and their flanking sequences. Studying the effects of flanking sequences on SNPs requires statistical approaches that assess bias in SNP data. In this study we mainly applied Markov chain models to SNP sequences, particularly those located in intronic regions, and to in-del data. All of the sequences examined showed a significant tendency to generate particular SNP types. Most sequences flanking SNPs had lower complexity than average sequences, and some of them were associated with microsatellites. Moreover, many Alu repeats were found in the flanking sequences. We observed an elevated frequency of single-base-pair repeat-like sequences, mirror repeats, and palindromes in the SNP flanking sequence data. Alu repeats are hypothesized to be associated with C-to-T transition mutations or A-to-I RNA editing. The in-del data in particular revealed an association between in-del changes and sequence features such as palindromes or mirror repeats. These results indicate that the mechanism that induces in-dels is probably very different from the one responsible for other SNPs. From a statistical perspective, frequent DNA lesions in some regions probably affect the occurrence of SNPs.
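The abstract does not give the exact statistics used, so the following is only a minimal sketch of one common way to apply a Markov chain to flanking sequences: the expected count of a word under a maximal-order Markov model is estimated from the counts of its overlapping sub-words, and the result is compared with the observed count. The function names (`count_words`, `expected_count`, `is_palindrome`, `is_mirror_repeat`) and the toy sequences are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Translation table for the DNA complement (uppercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_words(seqs, k):
    """Count overlapping words of length k across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def expected_count(word, seqs):
    """Estimate the expected count of `word` under a Markov chain of
    order len(word) - 2, using N(prefix) * N(suffix) / N(core)."""
    m = len(word)
    if m < 3:
        raise ValueError("word must have at least 3 bases")
    longer = count_words(seqs, m - 1)
    core = count_words(seqs, m - 2)[word[1:-1]]
    if core == 0:
        return 0.0
    return longer[word[:-1]] * longer[word[1:]] / core


def is_palindrome(word):
    """True if the word equals its own reverse complement."""
    return word == word.translate(COMPLEMENT)[::-1]


def is_mirror_repeat(word):
    """True if the word reads the same in both directions on one strand."""
    return word == word[::-1]
```

A word whose observed count, `count_words(seqs, len(word))[word]`, is far above or below `expected_count(word, seqs)` is a candidate for over- or under-representation; a proper analysis would also need a variance estimate to turn the difference into a significance score, which this sketch omits.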


Keywords: single nucleotide polymorphisms; SNPs; intron; Markov chain

