Introduction to convolutional neural network using Keras; an understanding from a statistician

  • Received : 2019.07.25
  • Accepted : 2019.10.21
  • Published : 2019.11.30

Abstract

Deep learning is a family of machine learning methods that extract features from large data sets using non-linear transformations. It is now commonly used for supervised learning in many fields. In particular, the Convolutional Neural Network (CNN) has been the leading technique for image classification since 2012. For users who consider deep learning models for real-world applications, Keras is a popular API for neural networks written in Python that can also be used in R. We examine the parameter estimation procedures of deep neural networks and the structure of CNN models, from basics to advanced techniques. We also identify some crucial steps in CNN design that can improve image classification performance on the CIFAR10 dataset using Keras. We found that several stacks of convolutional layers combined with batch normalization could improve prediction performance. We also compared image classification performance with other machine learning methods, including K-Nearest Neighbors (K-NN), Random Forest, and XGBoost, on both the MNIST and CIFAR10 datasets.
