Detection and Prediction of Alternative Splicing with One-leaf One-node Tree

  • Minseo Park (Department of Computer Science, University of Massachusetts)
  • Received : 2010.08.23
  • Accepted : 2010.10.25
  • Published : 2010.10.28


Alternative splicing is an important process in gene expression, and it can lead to mutations and diseases. Most studies detect alternatively spliced genes with ESTs (Expressed Sequence Tags). However, relying on ESTs has weaknesses for predicting alternative splicing. ESTs are stored in libraries that are often not clearly organized or annotated, so erroneous ESTs may be selected. It is also difficult to predict whether alternative splicing occurs in genes for which no ESTs are available. To address these issues and to improve the quality of detection and prediction of alternative splicing, we propose the One-leaf One-node Tree Algorithm, which uses pre-mRNAs. The algorithm takes codons (nucleotide triplets) as attributes and is applied to each chromosome of Arabidopsis thaliana. The resulting decision tree shows that alternative and normal splicing have different patterns of triplet nucleotides in each chromosome. Based on these patterns, alternative splicing can also be predicted for unlabeled genes.
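As a rough illustration of what "codons as attributes" could look like in practice, the sketch below turns one pre-mRNA sequence into a vector of 64 triplet frequencies. It is not the One-leaf One-node Tree Algorithm itself, whose details the abstract does not give. The function name `codon_features`, the use of non-overlapping triplets in a fixed reading frame, and the normalization to relative frequencies are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# All 64 possible nucleotide triplets, in a fixed order (AAA, AAC, ..., TTT).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_features(seq, frame=0):
    """Return 64 relative triplet frequencies for one sequence.

    Assumption: triplets are read without overlap, starting at `frame`.
    Triplets containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    total = sum(counts[c] for c in CODONS)
    return [counts[c] / total if total else 0.0 for c in CODONS]


# Hypothetical toy sequence, not taken from Arabidopsis thaliana.
features = codon_features("ATGATGCCC")
```

Vectors like `features`, computed per gene and grouped by chromosome, could then serve as the attribute table for a decision tree learner that separates alternatively spliced genes from normally spliced ones.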


Alternative Splicing; One-leaf One-node Tree Algorithm; pre-mRNA

