Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences

  • Chi, Sang-mun (Department of Computer Science and Engineering, Kyungsung University)
  • Received : 2016.05.30
  • Accepted : 2016.06.17
  • Published : 2016.09.30

Abstract

Protein secondary structure is important for studying the evolution, structure, and function of proteins, which play crucial roles in most biological processes. This paper aims to extract secondary structure information effectively from a large protein structure database in order to predict the secondary structure of a query protein sequence. To find more remote homologs of a query sequence in the database, we used PSI-BLAST, which performs gapped, iterative searches using profiles built from sequences homologous to the query. The secondary structures of the homologous sequences are combined into the prediction with weights that reflect their relative degree of similarity to the query sequence. When homologous sequences were used together with a neural network predictor, accuracy exceeded that of current state-of-the-art techniques, reaching a Q3 accuracy of 92.28% and a Q8 accuracy of 88.79%.
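The weighted combination described in the abstract can be illustrated with a minimal sketch. It assumes each homolog found by the search has already been aligned to the query and annotated with its known secondary structure, and that a single similarity weight per homolog is available. The function names, the `Hit` layout, the use of `-` for unaligned positions, and the fallback state are all hypothetical; the abstract does not specify how the weights are derived or how the result is merged with the neural network predictor.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical layout: (similarity weight, homolog states aligned to the query).
# The aligned string has one character per query position; '-' means the
# homolog has no residue aligned at that position.
Hit = Tuple[float, str]


def weighted_profile(
    query_len: int, hits: Sequence[Hit], states: str = "HEC"
) -> List[Optional[Dict[str, float]]]:
    """Accumulate homolog states per query position, weighted by similarity."""
    totals = [{s: 0.0 for s in states} for _ in range(query_len)]
    for weight, aligned in hits:
        if len(aligned) != query_len:
            raise ValueError("aligned string must cover every query position")
        for pos, state in enumerate(aligned):
            if state in totals[pos]:  # skips '-' and unknown symbols
                totals[pos][state] += weight
    profile: List[Optional[Dict[str, float]]] = []
    for counts in totals:
        mass = sum(counts.values())
        # None marks a position that no homolog covers.
        profile.append({s: v / mass for s, v in counts.items()} if mass > 0 else None)
    return profile


def predict(profile: List[Optional[Dict[str, float]]], fallback: str = "C") -> str:
    """Pick the highest-weighted state; use `fallback` where nothing is aligned."""
    return "".join(max(p, key=p.get) if p is not None else fallback for p in profile)


if __name__ == "__main__":
    hits = [(0.9, "HHHH-C"), (0.3, "EEHHCC"), (0.1, "EE--CC")]
    print(predict(weighted_profile(6, hits)))
```

In this sketch a close homolog (weight 0.9) outvotes two more distant ones at positions where they disagree. Positions with no aligned homolog are left as `None` so that another predictor, such as a neural network, could supply them; the constant `fallback` state is only a placeholder for that step.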

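Because proteins play critical roles in most biological processes, much research is devoted to understanding protein evolution, structure, and function, and protein secondary structure is important basic information for this research. This study seeks to extract secondary structure information effectively from large-scale protein structure data in order to predict the secondary structure of an unknown protein sequence. To find sequences in the protein structure data that are homologous to the query sequence as broadly as possible, we used PSI-BLAST, which builds its search profile from sequences similar to the query and supports iterative searches that allow gaps. The secondary structures of the homologous proteins contributed to the prediction with weights determined by the strength of their homology to the query sequence. In prediction experiments that classify secondary structure into three and eight classes, using homologous sequences together with a neural network achieved accuracies of 93.28% and 88.79%, respectively, an improvement over existing methods.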
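Q3 and Q8 are per-residue accuracies: the percentage of residues whose predicted state matches the observed state under a three-class or eight-class alphabet. The sketch below shows one way to compute both. The eight-to-three reduction used here (H, G, I to helix; E, B to strand; everything else to coil) is a common convention and is an assumption, as is writing the unstructured DSSP state as `-`; the paper may use a different mapping.

```python
# Assumed reduction from the eight DSSP states to three classes.
DSSP_8_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # helices
    "E": "E", "B": "E",            # strands and bridges
    "T": "C", "S": "C", "-": "C",  # turns, bends, and everything else
}


def reduce_to_3(states8: str) -> str:
    """Map an eight-state string onto the three-state alphabet."""
    return "".join(DSSP_8_TO_3[s] for s in states8)


def q_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted and observed states agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    observed8 = "HHHHGGEET-"
    predicted8 = "HHHHHHEES-"
    print("Q8:", q_accuracy(predicted8, observed8))
    print("Q3:", q_accuracy(reduce_to_3(predicted8), reduce_to_3(observed8)))
```

Because several eight-state classes collapse into one three-state class, a prediction that confuses, for example, G with H counts as an error for Q8 but not for Q3, which is why Q3 is never lower than Q8 for the same prediction under this mapping.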
