A Performance Comparison of Protein Profiles for the Prediction of Protein Secondary Structures

  • Chi, Sang-Mun (Department of Computer Science, Kyungsung University)
  • Received : 2017.08.24
  • Accepted : 2017.09.28
  • Published : 2018.01.31

Abstract

Protein secondary structure is important information for studying the evolution, structure, and function of proteins. Recently, deep learning methods have been actively applied to predicting secondary structure from protein sequence information alone. The input features most widely used in these methods are protein profiles derived from protein sequences. In this paper, to obtain effective protein profiles, we constructed profiles using the protein sequence search methods PSI-BLAST and HHblits. We varied the similarity threshold that determines which homologous sequences are used to construct a profile, and the number of iterations in which homologous sequence information is added to the profile. The resulting profiles were used as inputs to convolutional neural networks and recurrent neural networks to predict secondary structure. Profiles constructed by adding evolutionary information only once were effective.
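
To make the notion of a protein profile concrete, the following is a minimal sketch of turning a set of aligned homologous sequences into a per-position amino-acid frequency matrix. It is an illustration only and not the procedure used in the paper: PSI-BLAST and HHblits apply sequence weighting, background-aware pseudocounts, and (for PSI-BLAST) log-odds scoring. The function name `frequency_profile`, the uniform pseudocount, and the toy alignment are all assumptions introduced here.

```python
# Toy profile construction: per-column amino-acid frequencies from an alignment.
# Assumption: a simplified stand-in for PSI-BLAST/HHblits profiles, not their algorithm.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed column order


def frequency_profile(aligned, pseudocount=1.0):
    """Return a list of L rows, each holding 20 frequencies that sum to 1.

    `aligned` is a list of equal-length strings; '-' marks a gap.
    `pseudocount` is a hypothetical uniform prior added to every residue count.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            residue = seq[col]
            if residue in counts:  # gaps and non-standard residues are skipped
                counts[residue] += 1.0
        total = sum(counts.values())
        profile.append([counts[aa] / total for aa in AMINO_ACIDS])
    return profile


# Hypothetical alignment: the query plus three homologs found by a sequence search.
alignment = ["MKVLA", "MKILA", "MRVLG", "MK-LA"]
profile = frequency_profile(alignment)
```

Each row of `profile` would then serve as the 20-dimensional input feature for one residue. In this simplified picture, a looser similarity threshold admits more sequences into `alignment`, and each additional search iteration rebuilds the alignment from the previous profile.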
