A MEMORY EFFICIENT INCREMENTAL GRADIENT METHOD FOR REGULARIZED MINIMIZATION

  • Yun, Sangwoon (Department of Mathematics Education, Sungkyunkwan University)
  • Received : 2015.04.09
  • Published : 2016.03.31

Abstract

In this paper, we propose a new incremental gradient method for solving a regularized minimization problem whose objective is the sum of m smooth functions and a (possibly nonsmooth) convex function. The method uses an adaptive stepsize. Recently proposed incremental gradient methods for regularized minimization require O(mn) storage, where n is the number of variables, and this storage cost is their main drawback. The proposed incremental gradient method, in contrast, requires only O(n) storage.
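
To make the storage comparison concrete, below is a minimal sketch of a generic incremental proximal gradient loop for an l1-regularized sum of m smooth components. It is illustrative only: the l1 regularizer, the helper names, and the diminishing stepsize c/k are assumptions standing in for the paper's adaptive stepsize rule, not the proposed method itself. The point it demonstrates is that each inner step evaluates the gradient of a single component in place, so only O(n) numbers are ever held in memory; by contrast, aggregated schemes that cache all m component gradients need O(mn) storage.

    import numpy as np

    def soft_threshold(v, t):
        # Proximal map of t * ||.||_1 (componentwise soft-thresholding).
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def incremental_prox_gradient(grad_fi, m, x0, lam, n_epochs=50, c=1.0):
        # Sketch: minimize sum_{i=1}^m f_i(x) + lam * ||x||_1 by cycling
        # through the m smooth components. Only the current iterate x and
        # one component gradient g live in memory at a time, so storage
        # is O(n) regardless of m. The stepsize rule c/k is a placeholder,
        # not the adaptive rule of the proposed method.
        x = x0.copy()
        k = 0
        for _ in range(n_epochs):
            for i in range(m):
                k += 1
                alpha = c / k          # illustrative diminishing stepsize
                g = grad_fi(i, x)      # gradient of one component: O(n)
                x = soft_threshold(x - alpha * g, alpha * lam)
        return x

    # Usage: a lasso instance with components f_i(x) = 0.5 * (A[i] @ x - b[i])**2.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 20))
    b = rng.standard_normal(100)
    x_hat = incremental_prox_gradient(lambda i, x: (A[i] @ x - b[i]) * A[i],
                                      m=100, x0=np.zeros(20), lam=0.5)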
