Detection and Prediction of Alternative Splicing with One-leaf One-node Tree

  • Minseo Park (박민서), Department of Computer Science, University of Massachusetts
  • Received : 2010.08.23
  • Accepted : 2010.10.25
  • Published : 2010.10.28

Abstract

Alternative splicing is an important process in gene expression, and it is associated with mutations and disease. Most studies detect alternatively spliced genes with expressed sequence tags (ESTs). However, relying on ESTs has weaknesses. EST libraries are often not clearly organized or annotated, so erroneous ESTs may be selected, and it is difficult to predict whether alternative splicing occurs in genes for which no ESTs are available. To address these issues and improve the detection and prediction of alternative splicing, we propose the One-leaf One-node Tree Algorithm, which uses pre-mRNAs instead of ESTs. The algorithm takes codons (nucleotide triplets) as attributes for each chromosome of Arabidopsis thaliana. The resulting decision tree shows that alternatively spliced and normally spliced genes have different triplet-nucleotide patterns in each chromosome. Based on these patterns, alternative splicing can also be predicted for unlabeled genes.
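
The abstract states only that nucleotide triplets serve as attributes; it does not say how they are extracted. As an illustration, a minimal Python sketch of one plausible encoding is given below. The function name `triplet_features`, the use of non-overlapping triplets read from a fixed frame, and the normalization to relative frequencies are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"
# All 64 possible nucleotide triplets, in a fixed order.
TRIPLETS = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]


def triplet_features(pre_mrna, frame=0):
    """Return the relative frequency of each triplet in a pre-mRNA sequence.

    Hypothetical helper: triplets are read without overlap starting at
    `frame`; the paper's actual attribute encoding may differ.
    """
    # Accept DNA-style input by mapping T to U.
    seq = pre_mrna.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Triplets with ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return {t: 0.0 for t in TRIPLETS}
    return {t: counts[t] / total for t in TRIPLETS}


features = triplet_features("AUGGCCAUGUAA")
print(features["AUG"], features["GCC"], features["UAA"])
```

A vector like this, computed per gene and grouped by chromosome, could then be passed to any decision tree learner; the One-leaf One-node Tree itself is not reproduced here because the abstract does not specify its construction.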

Keywords

Alternative Splicing; One-leaf One-node Tree Algorithm; pre-mRNA
