Algorithm for Predicting Functionally Equivalent Proteins from BLAST and HMMER Searches

  • Yu, Dong Su (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
  • Lee, Dae-Hee (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
  • Kim, Seong Keun (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
  • Lee, Choong Hoon (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
  • Song, Ju Yeon (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)
  • Kong, Eun Bae (Department of Computer Science and Engineering, Chungnam National University)
  • Kim, Jihyun F. (Systems and Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology)

  • Received : 2012.03.21
  • Accepted : 2012.04.01
  • Published : 2012.08.28

Abstract

Searching large databases for homologous proteins is a common way to predict biologically significant attributes, such as function, from protein sequences. BLAST and HMMER in particular are widely used across biological fields. However, sequence-homologous proteins identified by BLAST and proteins sharing the same domains predicted by HMMER are not always functionally equivalent, even when their sequences align with high similarity. Accurate assignment of functionally equivalent proteins from aligned sequences therefore remains a challenge in bioinformatics. We have developed the FEP-BH algorithm to predict functionally equivalent proteins from protein-protein pairs identified by BLAST and from protein-domain pairs predicted by HMMER. When evaluated against the domain classes of the Pfam-A seed database, FEP-BH achieved 71.53% accuracy, whereas BLAST and HMMER achieved 57.72% and 36.62%, respectively. We expect the FEP-BH algorithm to be effective in predicting functionally equivalent proteins from BLAST and HMMER outputs, and to be useful to biologists who want to identify functionally equivalent proteins among sequence-homologous proteins.
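To make the two kinds of input concrete, the following is a minimal sketch of how protein-protein pairs from BLAST and protein-domain pairs from HMMER might be joined before any prediction step. It is not the FEP-BH algorithm, whose internals are not described here. It assumes BLAST tabular output (`-outfmt 6`, whose third and eleventh columns are percent identity and E-value) and assumes the HMMER results have already been reduced to (protein, domain) tuples. The function names, the `max_evalue` cutoff, and the "identical domain set" flag are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def parse_blast_tabular(lines):
    """Parse BLAST -outfmt 6 rows into (query, subject, pident, evalue) tuples."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append((fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits


def domains_by_protein(domain_pairs):
    """Group (protein, domain) pairs into a protein -> set of domains map."""
    grouped = defaultdict(set)
    for protein, domain in domain_pairs:
        grouped[protein].add(domain)
    return grouped


def candidate_pairs(blast_hits, domain_pairs, max_evalue=1e-5):
    """Join BLAST hits with domain assignments.

    Returns (query, subject, same_domains) tuples, where same_domains is True
    only if both proteins have a non-empty, identical domain set. The cutoff
    and the flag are illustrative assumptions, not the published method.
    """
    doms = domains_by_protein(domain_pairs)
    result = []
    for query, subject, _pident, evalue in blast_hits:
        if query == subject or evalue > max_evalue:
            continue  # drop self-hits and weak hits
        q_doms = doms.get(query, set())
        s_doms = doms.get(subject, set())
        result.append((query, subject, bool(q_doms) and q_doms == s_doms))
    return result
```

A pair flagged `False` here is sequence-homologous by BLAST but does not share the same domain set, which is the kind of disagreement the abstract describes between the two tools.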
