Adaptive stochastic gradient method under two mixing heterogeneous models

Modified gradient descent method under two heterogeneous mixture models

  • Received : 2017.10.31
  • Accepted : 2017.11.14
  • Published : 2017.11.30

Abstract

Online learning is the process of obtaining the solution of a given objective function as data accumulate in real time or in batch units. The stochastic gradient descent method is one of the most widely used methods for online learning. It is not only easy to implement, but its solution also has well-studied properties under the assumption that the data-generating model is homogeneous. However, stochastic gradient descent can severely mislead online learning when this homogeneity is violated. We assume that the observations arise from two heterogeneous generating models and propose a new stochastic gradient method that mitigates the problems caused by the heterogeneity. We introduce a robust mini-batch optimization method based on statistical tests and investigate the convergence radius of the solution under the proposed method. Moreover, the theoretical results are confirmed by numerical simulations.

Online learning refers to computing the solution of a given objective function in a setting where data accumulate in real time or in batch units. Among online learning algorithms, mini-batch stochastic gradient descent is one of the most widely used. It is not only easy to implement, but the properties of its solution are also well studied under the assumption that the data follow a homogeneous distribution. However, when the data contain outliers or a given batch is stochastically heterogeneous, the solution produced by stochastic gradient descent can be severely biased. In this study, we propose a modified gradient descent algorithm that performs online learning effectively on data containing such abnormal batches, and we establish the convergence of the solution computed by the algorithm. Furthermore, we demonstrate the theoretical properties of the proposed method through a simple simulation study.
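The abstract's key idea is to screen each mini-batch with a statistical test before applying a gradient update, so that batches from the abnormal generating model do not bias the solution. The Python sketch below illustrates one way such a screen could look; the function robust_minibatch_sgd, the z-test with threshold z_crit, the warm-up rule, and the least-squares objective are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

def robust_minibatch_sgd(X, y, lr=0.05, batch_size=32, z_crit=3.0, epochs=5):
    # Least-squares SGD that screens each mini-batch with a z-test.
    # Illustrative sketch: a batch whose mean loss deviates too far from
    # the running loss statistics is treated as coming from the abnormal
    # generating model, and its gradient update is skipped.
    n, p = X.shape
    w = np.zeros(p)
    mean_loss, var_loss, count = 0.0, 0.0, 0   # running batch-loss statistics
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            resid = Xb @ w - yb
            loss = float(np.mean(resid ** 2))
            if count >= 10:                    # warm up before testing
                z = (loss - mean_loss) / np.sqrt(var_loss + 1e-12)
                if abs(z) > z_crit:
                    continue                   # flagged abnormal: skip update
            count += 1                         # Welford update of mean/variance
            delta = loss - mean_loss
            mean_loss += delta / count
            var_loss += (delta * (loss - mean_loss) - var_loss) / count
            w -= lr * (2.0 / len(yb)) * (Xb.T @ resid)   # plain SGD step
    return w

# Toy data: a homogeneous linear model contaminated by a second,
# high-variance generating model on a minority of observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=2000)
idx = rng.choice(2000, size=100, replace=False)
y[idx] += rng.normal(scale=20.0, size=100)     # abnormal observations
print(robust_minibatch_sgd(X, y))              # close to w_true

Skipping a flagged batch, rather than downweighting it, keeps the update rule identical to plain SGD on the batches that pass the test; this is one simple way to realize the robust mini-batch idea described above.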

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
