A Statistical Analysis of SNPs, In-Dels, and Their Flanking Sequences in Human Genomic Regions

  • Shin, Seung-Wook (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Kim, Young-Joo (Functional Genomics Research Center, Korea Research Institute of Bioscience and Biotechnology)
  • Kim, Byung-Dong (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Published: 2007.06.30

Abstract

With growing interest in SNPs and mutational hot spots underlying disease traits, it is becoming more important to define and understand the relationship between SNPs and their flanking sequences. Studying the effects of flanking sequences on SNPs requires statistical approaches that can assess bias in SNP data. In this study we mainly applied Markov chain models to SNP sequences, particularly those located in intronic regions, and to in-del data. All of the sequences examined showed a significant tendency to generate particular SNP types. Most sequences flanking SNPs had lower complexity than average sequences, and some were associated with microsatellites. Many Alu repeats were also found in the flanking sequences; Alu repeats are hypothesized to be associated with C-to-T transition mutations or A-to-I RNA editing. We observed an elevated frequency of single-base-pair repeat-like sequences, mirror repeats, and palindromes in the SNP flanking sequence data. The in-del data in particular revealed an association between in-dels and specific sequence features such as palindromes or mirror repeats. The results indicate that the mechanism that induces in-dels is probably very different from the one responsible for other SNPs. From a statistical perspective, frequent DNA lesions in some regions probably affect the occurrence of SNPs.
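The abstract does not spell out the computations, so the following is only a minimal Python sketch of the kinds of checks it describes, not the authors' implementation. It assumes a palindrome means a sequence equal to its own reverse complement, a mirror repeat means a sequence equal to its own reverse, and the Markov chain model is the common first-order estimator of an expected word count. All function names here are hypothetical.

```python
# Assumed illustration only: the abstract does not give the authors' code.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    seq = seq.upper()
    return seq == seq.translate(COMPLEMENT)[::-1]


def is_mirror_repeat(seq):
    """True if the sequence reads the same forwards and backwards."""
    seq = seq.upper()
    return seq == seq[::-1]


def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count_markov1(seq, word):
    """Expected count of word under a first-order Markov model fitted to seq.

    Uses the estimator: product of the dinucleotide counts along the word,
    divided by the product of the counts of its interior letters.
    """
    if len(word) < 2:
        raise ValueError("word must contain at least two letters")
    numerator = 1.0
    for i in range(len(word) - 1):
        numerator *= count_overlapping(seq, word[i:i + 2])
    denominator = 1.0
    for i in range(1, len(word) - 1):
        denominator *= count_overlapping(seq, word[i])
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    flank = "ACGTACGT"  # toy flanking sequence, not real data
    print(is_palindrome("GAATTC"), is_mirror_repeat("AGGGA"))
    print(count_overlapping(flank, "ACG"), expected_count_markov1(flank, "ACG"))
```

Comparing the observed count of a word with its expected count under such a model is one common way to flag over- or under-represented motifs in flanking sequences; a real analysis would also need a variance estimate to judge significance.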
