
A Deep Neural Network Model Based on a Mutation Operator

An Efficient Deep Neural Network Model Based on a Mutation Operator

  • Seungho Jeon (Dept. of Information Security, Korea University)
  • Jongsub Moon (Dept. of Electronics and Information Engineering, Korea University)
  • Received : 2017.09.01
  • Accepted : 2017.10.19
  • Published : 2017.12.31

Abstract

A Deep Neural Network (DNN) is a large layered neural network composed of many layers of non-linear units. Deep learning, as represented by DNNs, has been applied very successfully in a variety of applications. However, past research has identified many issues in DNNs, among which generalization is the best-known problem. A recent technique, Dropout, addresses this problem with considerable success. Dropout also acts as noise, so it helps the network learn robust features during training, as in the Denoising AutoEncoder. However, because Dropout requires a large amount of computation, training takes a long time. Moreover, since Dropout keeps changing the inter-layer representations during training, the learning rate must be kept small, which lengthens training further. In this paper, we use a mutation operation to reduce computation and improve generalization performance compared with Dropout. We also compare the proposed method experimentally with Dropout and show that our method is superior.

A deep neural network is a large neural network built by stacking many layers of nodes. Deep learning, represented by deep neural networks, is achieving remarkable results in many application areas today. However, years of research have identified a variety of problems with deep neural networks. Generalization is one of the most widely known of these, and Dropout, a recent research result, solved it with some success. Dropout acts like noise, enabling the network to learn data representations that are robust to noise; this effect has been demonstrated in research related to autoencoders. However, Dropout has two drawbacks: its frequent random-number and probability operations lengthen the network's training time, and it greatly changes the data distribution of each layer, forcing the use of a small learning rate. In this paper, we present a model that uses a mutation operation to achieve performance equal to or better than Dropout with comparatively few operations, and we show experimentally that the proposed method matches the performance of Dropout while improving the training-time problem.
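Both abstracts contrast Dropout's per-unit random sampling with a cheaper mutation-style perturbation. The sketch below is a minimal NumPy illustration of that contrast only, not the paper's actual algorithm: dropout_mask draws one random number per hidden unit, while the hypothetical mutate zeroes just k randomly chosen units per layer, so it needs far fewer random draws. The function names, the zeroing behaviour, and the parameter k are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(h, p=0.5):
    # Standard (inverted) dropout: one Bernoulli draw per unit,
    # rescaled so the expected activation is unchanged.
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

def mutate(h, k=4):
    # Hypothetical mutation operator: zero out only k randomly chosen
    # units per layer, so it needs k random draws instead of one per
    # unit. The operator actually used in the paper may differ.
    out = h.copy()
    idx = rng.choice(h.shape[1], size=k, replace=False)
    out[:, idx] = 0.0
    return out

h = rng.standard_normal((32, 256))  # batch of 32 hidden vectors, 256 units
print(dropout_mask(h).shape)  # (32, 256)
print(mutate(h).shape)        # (32, 256)
```

Under these assumptions, each forward pass needs only k random indices per layer instead of one Bernoulli draw per unit, which is where the claimed reduction in random-number and probability operations would come from.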
