Directional adjacency-score function for protein fold recognition

  • Heo, Mu-Young (Creative Research Initiatives Center for Proteome Biophysics, Department of Physics, Pusan National University)
  • Cheon, Moo-Kyung (Creative Research Initiatives Center for Proteome Biophysics, Department of Physics, Pusan National University)
  • Kim, Suhk-Mann (Department of Chemistry, Pusan National University)
  • Chung, Kwang-Hoon (Creative Research Initiatives Center for Proteome Biophysics, Department of Physics, Pusan National University)
  • Chang, Ik-Soo (Creative Research Initiatives Center for Proteome Biophysics, Department of Physics, Pusan National University)
  • Published : 2009.06.30


Introduction: Designing a protein score function that stabilizes the native structures of many proteins simultaneously is a challenge. Coarse-grained descriptions of proteins used to construct pairwise-contact score functions usually ignore the backbone directionality of protein structures. We propose a new two-body score function that stabilizes the native states of all 1,006 training proteins simultaneously. It differs from the usual pairwise-contact functions in that it considers the two adjacent amino acids at the two ends of each peptide bond, together with the backbone directionality from the N-terminus to the C-terminus. Each score is the propensity for a directional alignment of two adjacent amino acids in their local environments.

Results and Discussion: We constructed a directional adjacency-score function using 1,006 training proteins with less than 30% sequence homology, which include representatives of all the different protein classes. After the local environments of amino acids were classified into 9 categories according to three secondary structures and three kinds of hydrophobicity, the 32,400 adjacency scores of amino acids were determined by perceptron learning and protein threading. These scores simultaneously stabilize the native folds of all 1,006 training proteins. When the parameters were tested on 382 new, distinct proteins with less than 90% sequence homology, the native fold was correctly recognized for 371 (97.1%) of them. Using the same parameters, we also show that the retro sequences of the SH3 domain, the B domain of Staphylococcal protein A, and the B1 domain of Streptococcal protein G could not be stabilized to fold, in agreement with experimental evidence.
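The parameter count in the abstract is consistent with treating each residue as one of 20 amino acids × 9 environments = 180 states, so that an ordered pair of adjacent residues has 180 × 180 = 32,400 possible scores. The Python sketch below illustrates that bookkeeping and a plain perceptron update on a native/decoy pair. It is a minimal illustration, not the authors' implementation: the names `state_index`, `pair_counts`, `score`, and `perceptron_step`, the integer encoding of amino acids and environments, the margin and learning rate, and the convention that a lower score is more favorable are all assumptions made here.

```python
from collections import Counter

N_AA = 20                # amino acid types
N_ENV = 9                # 3 secondary structures x 3 hydrophobicity classes
N_STATE = N_AA * N_ENV   # 180 (amino acid, environment) states per residue

def state_index(aa, env):
    """Map (amino acid 0..19, environment 0..8) to one state 0..179 (assumed encoding)."""
    return aa * N_ENV + env

def pair_counts(seq, envs):
    """Count ordered adjacent pairs (residue i -> residue i+1, N- to C-terminus)."""
    states = [state_index(a, e) for a, e in zip(seq, envs)]
    return Counter(zip(states[:-1], states[1:]))

def score(weights, seq, envs):
    """Sum the directional scores over all peptide bonds; unseen pairs score 0."""
    counts = pair_counts(seq, envs)
    return sum(weights.get(pair, 0.0) * n for pair, n in counts.items())

def perceptron_step(weights, seq, native_envs, decoy_envs, margin=1.0, rate=0.1):
    """Update weights in place if the native is not better than the decoy by `margin`.

    Assumption: lower score = more favorable. Returns True if an update was made.
    """
    gap = score(weights, seq, decoy_envs) - score(weights, seq, native_envs)
    if gap >= margin:
        return False
    # Raise the score of pairs seen in the decoy, lower it for pairs in the native.
    for pair, n in pair_counts(seq, decoy_envs).items():
        weights[pair] = weights.get(pair, 0.0) + rate * n
    for pair, n in pair_counts(seq, native_envs).items():
        weights[pair] = weights.get(pair, 0.0) - rate * n
    return True

# Toy example with made-up labels: one sequence on its native and on a decoy.
seq = [0, 5, 7, 5]
native_envs = [0, 1, 1, 2]
decoy_envs = [3, 3, 4, 8]

weights = {}
while perceptron_step(weights, seq, native_envs, decoy_envs):
    pass  # repeat until the native beats the decoy by the margin

# Because pairs are ordered, reading the chain backwards generally scores differently.
forward = score(weights, seq, native_envs)
retro = score(weights, seq[::-1], native_envs[::-1])
```

Keying the weights by ordered pairs is what makes the function directional: the entry for (a, b) is independent of the entry for (b, a), so a retro sequence is not guaranteed the same score as the original.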

