Adaptive stochastic gradient method under two mixing heterogeneous models

Modified gradient descent method under two heterogeneous mixture models

  • Received : 2017.10.31
  • Accepted : 2017.11.14
  • Published : 2017.11.30

Abstract

Online learning is the process of obtaining the solution of a given objective function as data accumulate in real time or in batch units. The stochastic gradient descent method is one of the most widely used methods for online learning. It is not only easy to implement, but its solution also has well-studied properties under the assumption that the data-generating model is homogeneous. However, stochastic gradient descent can severely mislead online learning when this homogeneity is violated. We assume that the observations arise from two heterogeneous generating models and propose a new stochastic gradient method that mitigates the problems caused by the heterogeneity. We introduce a robust mini-batch optimization method based on statistical tests and investigate the convergence radius of the solution under the proposed method. Moreover, the theoretical results are confirmed by numerical simulations.

Online learning refers to computing the solution of a given objective function in a setting where data accumulate in real time or in batch units. Among online learning algorithms, mini-batch stochastic gradient descent is one of the most widely used. It is not only easy to implement, but the properties of its solution are also well studied under the assumption that the data follow a homogeneous distribution. However, when the data contain outliers or a given batch is stochastically heterogeneous, the solution produced by stochastic gradient descent can be severely biased. In this study, we propose a modified gradient descent algorithm that performs online learning effectively on data containing such abnormal batches, and we establish the convergence of the solution computed by the algorithm. Furthermore, we demonstrate the theoretical properties of the proposed method through a simple simulation study.
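The abstract's key idea is to screen each mini-batch with a statistical test before applying a gradient update, so that batches from the abnormal generating model do not bias the solution. The Python sketch below illustrates one way such a screen could look; the function robust_minibatch_sgd, the z-test with threshold z_crit, the warm-up rule, and the least-squares objective are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

def robust_minibatch_sgd(X, y, lr=0.05, batch_size=32, z_crit=3.0, epochs=5):
    # Least-squares SGD that screens each mini-batch with a z-test.
    # Illustrative sketch: a batch whose mean loss deviates too far from
    # the running loss statistics is treated as coming from the abnormal
    # generating model, and its gradient update is skipped.
    n, p = X.shape
    w = np.zeros(p)
    mean_loss, var_loss, count = 0.0, 0.0, 0   # running batch-loss statistics
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            resid = Xb @ w - yb
            loss = float(np.mean(resid ** 2))
            if count >= 10:                    # warm up before testing
                z = (loss - mean_loss) / np.sqrt(var_loss + 1e-12)
                if abs(z) > z_crit:
                    continue                   # flagged abnormal: skip update
            count += 1                         # Welford update of mean/variance
            delta = loss - mean_loss
            mean_loss += delta / count
            var_loss += (delta * (loss - mean_loss) - var_loss) / count
            w -= lr * (2.0 / len(yb)) * (Xb.T @ resid)   # plain SGD step
    return w

# Toy data: a homogeneous linear model contaminated by a second,
# high-variance generating model on a minority of observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=2000)
idx = rng.choice(2000, size=100, replace=False)
y[idx] += rng.normal(scale=20.0, size=100)     # abnormal observations
print(robust_minibatch_sgd(X, y))              # close to w_true

Skipping a flagged batch, rather than downweighting it, keeps the update rule identical to plain SGD on the batches that pass the test; this is one simple way to realize the robust mini-batch idea described above.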

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
