Prediction of Protein-Protein Interaction Sites Based on 3D Surface Patches Using SVM

  • Sung Hee Park (박성희), Department of Bioinformatics, Soongsil University
  • Received : 2011.11.08
  • Accepted : 2012.02.27
  • Published : 2012.02.29

Abstract

Prediction of protein interaction sites for monomer structures can reduce the search space for protein docking. It is also regarded as very significant for predicting the unknown functions of proteins from interacting proteins whose functions are known. On the other hand, prediction of interaction sites has been limited by the scarcity of complex structures: most of the important protein-protein interactions are weak and transient, so they do not form complexes stable enough for experimental structure determination by crystallization or even NMR. This work reports the calculation of 3D surface patches of complex structures and their properties, together with a machine learning approach that uses a support vector machine (SVM) to build a predictive model classifying 3D surface patches as interaction or non-interaction sites. To overcome the classification problems caused by class-imbalanced data, we employed an under-sampling technique. Nine properties of the patches were calculated from amino acid composition and secondary structure elements. With 10-fold cross validation, the predictive model built with SVM achieved an accuracy of 92.7% in classifying 3D patches from 147 complexes as interaction or non-interaction sites.
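The abstract does not define how a surface patch is built or which nine properties were used. One common way to define a patch is a central surface residue plus the surface residues within a fixed distance of it, with per-patch features computed as fractions over the patch members. The sketch below assumes that definition. The `Residue` record, the 5% relative-accessibility cutoff, the 10 Å radius, the hydrophobic residue set, and the four example features are all hypothetical choices for illustration, not the paper's nine properties.

```python
import math
from dataclasses import dataclass

# Hypothetical residue record (assumption, not from the paper): one representative
# 3D point per residue, relative solvent accessibility in [0, 1], and a
# DSSP-style secondary structure code ('H' helix, 'E' strand, anything else = coil).
@dataclass
class Residue:
    name: str    # one-letter amino acid code
    xyz: tuple   # (x, y, z) coordinates in angstroms
    rsa: float   # relative solvent accessibility
    ss: str      # secondary structure code

# Assumed hydrophobic set, chosen only for illustration.
HYDROPHOBIC = set("AVLIMFWC")

def surface_residues(residues, rsa_cutoff=0.05):
    """Keep residues whose relative accessibility exceeds the (assumed) cutoff."""
    return [r for r in residues if r.rsa > rsa_cutoff]

def surface_patch(center, surface, radius=10.0):
    """Central surface residue plus every surface residue within `radius` of it."""
    return [r for r in surface if math.dist(center.xyz, r.xyz) <= radius]

def patch_features(patch):
    """Illustrative composition and secondary-structure fractions of one patch."""
    n = len(patch)
    return {
        "hydrophobic": sum(r.name in HYDROPHOBIC for r in patch) / n,
        "helix": sum(r.ss == "H" for r in patch) / n,
        "strand": sum(r.ss == "E" for r in patch) / n,
        "coil": sum(r.ss not in ("H", "E") for r in patch) / n,
    }
```

In this sketch every surface residue serves once as a patch center, so a protein yields one feature vector per surface residue; the radius controls how much surface each vector summarizes.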

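Predicting the interaction sites of monomeric proteins plays an important role in inferring the function of a protein of unknown function from the proteins it interacts with, and in reducing the search space for protein docking. However, most protein interactions are weak, transient interactions within the cell, which makes it difficult to determine their 3D crystal structures experimentally; as a result, only a limited amount of 3D complex data is available. In this paper, we compute 3D patches of monomeric proteins, extract patch properties for the interaction and non-interaction sites of complexes with known structures, and, based on these, develop a prediction model using the Support Vector Machine (SVM) classification technique. To address the class imbalance of the target classes, we use an under-sampling technique. A total of nine patch properties are extracted from secondary structure elements and amino acid composition. We evaluated the performance of several classification models on 147 protein complexes using 10-fold cross validation. Among the models evaluated, SVM showed the highest accuracy, 92.7%, and we used it to develop the classification model.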
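The abstract states that under-sampling, an SVM, and 10-fold cross validation were used, but not which sampling variant, kernel, or solver. The sketch below assumes random under-sampling of the majority class and swaps in a linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss, with the bias absorbed into a constant feature. The function names and the parameters `lam`, `epochs`, `k`, and `seed` are hypothetical; labels are assumed to be +1 for interaction patches and -1 for non-interaction patches.

```python
import random

def undersample(X, y, seed=0):
    """Random under-sampling: drop majority-class rows until both classes match."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    A constant 1.0 feature is appended so the bias is learned as an ordinary weight.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward y * x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def predict(w, x):
    """Sign of the linear score, using the same appended constant feature."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def cross_validate(X, y, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [order[f::k] for f in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], seed=seed)
        correct = sum(predict(w, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k
```

Usage: given patch feature vectors `X` and labels `y`, call `X_bal, y_bal = undersample(X, y)` and then `cross_validate(X_bal, y_bal, k=10)`. Because the sketch under-samples before splitting, every test fold is balanced as well, so the resulting accuracy describes a 50/50 class mix rather than the natural proportion of interaction to non-interaction patches.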
