Benchmarking of BioPerl, Perl, BioJava, Java, BioPython, and Python for Primitive Bioinformatics Tasks and Choosing a Suitable Language
 Title & Authors
Ryu, Tae-Wan
 Abstract
Many programming languages have recently emerged for developing bioinformatics applications. In addition to the traditional languages, languages from open source projects such as BioPerl, BioPython, and BioJava have become popular because they provide specialized tools for biological data processing and are easy to use. However, it has not been well studied which of these languages is most suitable for a given bioinformatics task, or which factors should be considered when choosing a language for a project. Like many other application projects, bioinformatics projects involve various types of tasks, so characterizing every aspect of a project in order to choose a language is a challenge. Most projects, however, require some common, primitive tasks such as file I/O, text processing, and basic computation for counting, translation, statistics, etc. This paper presents benchmarking results for six popular languages, Perl, BioPerl, Python, BioPython, Java, and BioJava, on several common and simple bioinformatics tasks. The experimental results for each language are compared using quantitative evaluation metrics: execution time, memory usage, and source code size. Other qualitative factors that affect the success of a project, including writeability, readability, portability, scalability, and maintainability, are also discussed. The results of this research can help developers choose an appropriate language for developing bioinformatics applications.
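To make the kind of measurement described in the abstract concrete, the following is a minimal sketch in plain Python of timing two primitive tasks (base counting and translation) and recording peak memory. It is not the paper's benchmark code: the function names `count_bases`, `translate`, and `benchmark` are hypothetical, the codon table is the standard genetic code, and `tracemalloc` only tracks memory allocated by the Python interpreter, so the figures are indicative rather than comparable to the paper's results.

```python
import time
import tracemalloc
from collections import Counter

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_bases(seq):
    """Count occurrences of each character in a sequence (hypothetical helper)."""
    return Counter(seq)


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def benchmark(task, *args):
    """Run task(*args) once; return (result, elapsed seconds, peak bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = task(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Synthetic input used only for illustration; a real run would read a file.
    sequence = "ATGGCCTAA" * 100_000
    for task in (count_bases, translate):
        _, seconds, peak_bytes = benchmark(task, sequence)
        print(f"{task.__name__}: {seconds:.4f} s, peak {peak_bytes / 1024:.1f} KiB")
```

A single timed run like this is noisy; repeating each task several times and reporting the median would give steadier numbers, and source code size, the third metric named above, would be measured separately from the source files.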
 Keywords
Programming language comparison; BioPerl; BioJava; BioPython
 Language
English