Protein Disorder Prediction Using Multilayer Perceptrons

  • Received : 2013.08.16
  • Accepted : 2013.12.06
  • Published : 2013.12.28

Abstract

The "protein folding problem" is regarded as one of the "Grand Challenges" of computer science, and predicting disordered regions of proteins is an important part of it. Because machine learning models learn from examples, they can be trained to predict protein disorder. Among the many machine learning models available, we investigate the multilayer perceptron (MLP) as a predictor of protein disorder. The investigation covers MLPs with a single hidden layer, MLPs with multiple hidden layers, and hierarchical structures of MLPs. The target node cost function, which is designed for imbalanced data, is used as the training criterion for the MLPs. Based on the results, we argue that MLPs should have deep architectures to improve the performance of protein disorder prediction.
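As a rough illustration of the setup described in the abstract, the following sketch scores one residue with a shallow and a deeper MLP and evaluates a loss that penalizes errors on the minority (disordered) class more heavily. Everything in it is an assumption made for illustration: the sliding-window one-hot encoding, the layer sizes, the function names, and the class-weighted cross-entropy, which stands in for the target node cost function and is not that function.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(seq, center, half_width=2):
    """One-hot encode the residues in a window around `center` (zero-padded at the ends)."""
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            one_hot[AMINO_ACIDS.index(seq[pos])] = 1.0
        features.extend(one_hot)
    return features

def init_mlp(layer_sizes, seed=0):
    """Small random weights; layer_sizes = [n_in, hidden..., 1]. Each row is weights plus a bias."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(mlp, x):
    """Sigmoid units in every layer; returns a score in (0, 1) for the central residue."""
    for layer in mlp:
        x = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + row[-1])))
             for row in layer]
    return x[0]

def weighted_cross_entropy(p, label, minority_weight=5.0):
    """Cross-entropy with the disordered class (label 1) up-weighted. Illustrative stand-in only."""
    eps = 1e-12
    if label == 1:
        return -minority_weight * math.log(p + eps)
    return -math.log(1.0 - p + eps)

seq = "MKTAYIAKQR"                       # hypothetical sequence
x = encode_window(seq, center=4)         # 5 residues x 20 amino acids = 100 inputs
shallow = init_mlp([len(x), 8, 1])       # single hidden layer
deep = init_mlp([len(x), 8, 8, 1])       # multiple hidden layers
p_shallow, p_deep = forward(shallow, x), forward(deep, x)
```

The networks here are untrained, so the scores carry no meaning; the sketch only shows how depth enters through `layer_sizes` and how a class weight changes the penalty for a missed disordered residue.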

Keywords

Protein Disorder Prediction; Multilayer Perceptron; Error Function; Hierarchical Structure
