Improved Deep Learning Algorithm

  • Received : 2018.11.15
  • Accepted : 2018.12.27
  • Published : 2018.12.31

Abstract

Training a very large deep neural network can be painfully slow and prone to overfitting, and much research has been devoted to overcoming these problems. In this paper, a deep neural network that combines early stopping with the Adam optimizer is presented. This form of deep network is useful for handling big data because it automatically stops training before overfitting occurs. Its generalization ability is also better than that of a plain deep neural network model.
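
The combination described above amounts to wrapping an Adam-trained network in a standard early-stopping loop that monitors validation loss. The sketch below is a minimal illustration in PyTorch, assuming a synthetic regression task, an arbitrary three-layer architecture, and a patience of 10 epochs; none of these specific choices come from the paper itself.

```python
# Minimal sketch: Adam-based training with early stopping on validation
# loss. The data, architecture, and patience value are illustrative
# assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data, split into training and validation sets.
X = torch.randn(1000, 20)
y = X[:, :5].sum(dim=1, keepdim=True) + 0.1 * torch.randn(1000, 1)
X_train, y_train = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

best_val = float("inf")
best_state = None
patience, bad_epochs = 10, 0  # stop after 10 epochs without improvement

for epoch in range(500):
    # One full-batch gradient step with Adam.
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    # Evaluate on held-out data to detect the onset of overfitting.
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # validation loss stopped improving
            print(f"early stop at epoch {epoch}, best val loss {best_val:.4f}")
            break

model.load_state_dict(best_state)  # restore the best weights seen
```

Restoring the best-so-far weights after the loop, rather than keeping the final ones, is what lets the procedure halt training before overfitting degrades generalization.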

Acknowledgement

Supported by: Youngsan University
