A study on complexity of deep learning model

  • Kim, Dongha (Department of Statistics, Seoul National University) ;
  • Baek, Gyuseung (Department of Statistics, Seoul National University) ;
  • Kim, Yongdai (Department of Statistics, Seoul National University)
  • Received : 2017.10.31
  • Accepted : 2017.11.23
  • Published : 2017.11.30

Abstract

Deep learning has achieved excellent performance in areas such as image and speech recognition, where conventional machine learning techniques had struggled, and this success has driven an explosive growth of research. While most current work concentrates on models and parameter-estimation methods that perform well in practice, theoretical study of deep learning is also proceeding. In this paper, we locate a key to the success of deep learning in the rich and efficient expressiveness of deep network functions, and we survey and analyze the theoretical studies related to this expressiveness.
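
The expressiveness claim can be made concrete with a classic construction from the literature surveyed here (e.g., Montufar et al., 2014; Eldan and Shamir, 2016): a "hat" (tent) map is exactly representable by a single ReLU layer with two hidden units, and composing it once per layer doubles the number of linear pieces, so depth buys pieces exponentially while width buys them only linearly. The sketch below is illustrative only and not taken from the paper; the names (`hat`, `deep_hat`) and the numerical piece-counting are our own choices.

```python
import numpy as np

def hat(x):
    # Tent map on [0, 1], exactly a one-hidden-layer ReLU net
    # with two units: h(x) = 2*relu(x) - 4*relu(x - 1/2),
    # which equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2.0 * np.maximum(x, 0.0) - 4.0 * np.maximum(x - 0.5, 0.0)

def deep_hat(x, depth):
    # Composing the hat map `depth` times corresponds to a ReLU
    # network with `depth` hidden layers of width 2; the result is
    # a sawtooth with 2**depth linear pieces on [0, 1].
    for _ in range(depth):
        x = hat(x)
    return x

# Count linear pieces by counting sign alternations of the slope:
# the sawtooth's slopes alternate between +2**d and -2**d, so each
# interior breakpoint contributes exactly one sign change.
xs = np.linspace(0.0, 1.0, 200001)
for depth in range(1, 7):
    signs = np.sign(np.diff(deep_hat(xs, depth)))
    signs = signs[signs != 0]  # drop numerically flat steps
    pieces = 1 + np.count_nonzero(signs[1:] != signs[:-1])
    print(f"depth={depth}: {pieces} pieces (2**depth = {2 ** depth})")
```

Running this prints 2, 4, 8, ... pieces as depth grows, whereas a network with a single hidden layer needs on the order of 2**depth units to produce the same number of pieces.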

Acknowledgement

Supported by: Samsung Science and Technology Foundation (삼성미래기술육성재단)

References

  1. Chung, J., Gulcehre, C., Cho, K. and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  2. Clevert, D., Unterthiner, T. and Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
  3. Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2, 303-314. https://doi.org/10.1007/BF02551274
  4. Eldan, R. and Shamir, O. (2016). The power of depth for feedforward neural networks. Conference on Learning Theory, 907-940.
  5. He, K., Zhang, X., Ren, S. and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision, 1026-1034.
  6. He, K., Zhang, X., Ren, S. and Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778.
  7. Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504-507. https://doi.org/10.1126/science.1127647
  8. Hinton, G. E., Osindero, S. and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554. https://doi.org/10.1162/neco.2006.18.7.1527
  9. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
  10. Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
  11. Hornik, K., Stinchcombe, M. and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359-366. https://doi.org/10.1016/0893-6080(89)90020-8
  12. Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
  13. Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  14. Krizhevsky, A., Sutskever, I. and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 1097-1105.
  15. Larochelle, H., Erhan, D., Courville, A., Bergstra, J. and Bengio, Y. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. Proceedings of the 24th International Conference on Machine Learning, 473-480.
  16. LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278-2324. https://doi.org/10.1109/5.726791
  17. Lee, K. J., Lee, H. J. and Oh, K. J. (2015). Using fuzzy-neural network to predict hedge fund survival. Journal of the Korean Data & Information Science Society, 26, 1189-1198. https://doi.org/10.7465/jkdi.2015.26.6.1189
  18. Lee, W. (2017). A deep learning analysis of the KOSPI’s directions. Journal of the Korean Data & Information Science Society, 28, 287-295. https://doi.org/10.7465/jkdi.2017.28.2.287
  19. Lee, W. and Chun, H. (2016). A deep learning analysis of the Chinese Yuan’s volatility in the onshore and offshore markets. Journal of the Korean Data & Information Science Society, 27, 327-335. https://doi.org/10.7465/jkdi.2016.27.2.327
  20. Maas, A. L., Hannun, A. Y. and Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. Proceedings of the 30th International Conference on Machine Learning, 30.
  21. Mikolov, T., Karafiát, M., Burget, L., Černocký, J. and Khudanpur, S. (2010). Recurrent neural network based language model. Interspeech, 2.
  22. Miotto, R., Wang, F., Wang, S., Jiang, X. and Dudley, J. T. (2017). Deep learning for healthcare: review, opportunities and challenges. Briefings in Bioinformatics.
  23. Montufar, G. F., Pascanu, R., Cho, K. and Bengio, Y. (2014). On the number of linear regions of deep neural networks. Advances in Neural Information Processing Systems, 2924-2932.
  24. Nair, V. and Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning, 807-814.
  25. Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
  26. Pascanu, R., Montufar, G. and Bengio, Y. (2013). On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv preprint arXiv:1312.6098.
  27. Raghu, M., Poole, B., Kleinberg, J., Ganguli, S. and Sohl-Dickstein, J. (2016). On the expressive power of deep neural networks. arXiv preprint arXiv:1606.05336.
  28. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M. and others. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-489. https://doi.org/10.1038/nature16961
  29. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A. and others. (2017). Mastering the game of Go without human knowledge. Nature, 550, 354-359. https://doi.org/10.1038/nature24270
  30. Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. Colorado University at Boulder Department of Computer Science.
  31. Sutskever, I., Vinyals, O. and Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 3104-3112.
  32. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-9.
  33. Tieleman, T. and Hinton, G. (2012). Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. Coursera: Neural Networks for Machine Learning, 4.
  34. Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701.

Cited by

  1. Finding the Optimal Data Classification Method Using LDA and QDA Discriminant Analysis, vol.13, pp.4, 2020, https://doi.org/10.13160/ricns.2020.13.4.132