Automatic Orthologous-Protein-Clustering from Multiple Complete-Genomes by the Best Reciprocal BLAST Hits

A Technique for Automatically Clustering Orthologous Protein Groups from Multiple Completely Sequenced Genomes Using the Best Reciprocal BLAST Hits between Genomes

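  • 김선신 (Department of Computer Science, Graduate School, Chungbuk National University) ;
  • 이충세 (Department of Computer Science, Chungbuk National University) ;
  • 류근호 (Department of Computer Science, Chungbuk National University)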
  • Published : 2006.04.01

Abstract

Although the number of completely sequenced genomes has grown quickly in recent years, methods that predict protein function by homology across these genomes have not been used sufficiently. Constructing OPCs (Orthologous Protein Clusters) from the best reciprocal BLAST hits among multiple complete genomes has proven to be a successful technique, but building OPCs by hand is time-consuming. Here we propose an automatic method that clusters OPs (Orthologous Proteins) from multiple complete genomes. The method extends INPARANOID, an automatic program that detects OPs between two complete genomes. We also prove all possible clustering cases mathematically.

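Although the number of completely sequenced genomes has been rising rapidly in recent years, methods for predicting protein function by homology have not been studied sufficiently. Building OPCs (Orthologous Protein Clusters) from multiple completely sequenced genomes using the best reciprocal BLAST hits between genomes has been studied successfully. However, constructing OPCs by hand takes a great deal of time and effort. In this paper we present an automated method for clustering OPs (Orthologous Proteins) from multiple completely sequenced genomes, and we mathematically prove the validity of that clustering.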
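
The idea of clustering by best reciprocal BLAST hits can be illustrated with a minimal sketch. It is not the authors' algorithm, which extends INPARANOID and is not described in detail on this page. The sketch assumes that BLAST scores are already available as `(query, subject, score)` tuples, that protein identifiers carry a hypothetical `genome:protein` prefix, and that reciprocal best-hit pairs are merged by simple single-linkage (union-find) grouping. All function names are illustrative.

```python
from collections import defaultdict


def genome_of(protein_id):
    """Return the genome prefix of an ID such as 'A:a1' (hypothetical ID scheme)."""
    return protein_id.split(":", 1)[0]


def best_hits(hits):
    """Map (query, subject_genome) to the highest-scoring subject in that genome."""
    best = {}  # (query, subject_genome) -> (score, subject)
    for query, subject, score in hits:
        subject_genome = genome_of(subject)
        if genome_of(query) == subject_genome:
            continue  # only hits between different genomes are considered
        key = (query, subject_genome)
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)
    return {key: subject for key, (_, subject) in best.items()}


def reciprocal_best_hits(hits):
    """Return the set of protein pairs that are each other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for (query, _), subject in best.items():
        if best.get((subject, genome_of(query))) == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def cluster(pairs):
    """Group proteins linked by reciprocal best hits (single linkage, an assumption)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())


# Toy scores for three genomes A, B and C (invented for illustration only).
hits = [
    ("A:a1", "B:b1", 200), ("A:a1", "B:b2", 90),
    ("B:b1", "A:a1", 198), ("B:b2", "A:a1", 85),
    ("B:b1", "C:c1", 150), ("C:c1", "B:b1", 152),
    ("A:a1", "C:c1", 120), ("C:c1", "A:a1", 119),
    ("A:a2", "B:b2", 300), ("B:b2", "A:a2", 290),
]

clusters = cluster(reciprocal_best_hits(hits))
print(clusters)
```

With these toy scores, `a1`, `b1` and `c1` are pairwise reciprocal best hits and form one group, while `a2` and `b2` form a second. Single linkage is the simplest possible merging rule; stricter schemes, for example requiring consistency among all genome pairs, would change which groups are produced.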
