A MEMORY EFFICIENT INCREMENTAL GRADIENT METHOD FOR REGULARIZED MINIMIZATION

Title & Authors
Yun, Sangwoon

Abstract
In this paper, we propose a new incremental gradient method for solving a regularized minimization problem whose objective is the sum of m smooth functions and a (possibly nonsmooth) convex function. The method uses an adaptive stepsize. Recently proposed incremental gradient methods for regularized minimization have the drawback of requiring O(mn) storage, where n is the number of variables. In contrast, the proposed method requires only O(n) storage.
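To make the storage claim concrete, the following is a minimal sketch of a generic incremental proximal gradient loop, not the method proposed in the paper. It assumes a specific instance of the problem class: each smooth term is a least-squares loss on one row of a matrix `A` with nonzero rows, and the convex term is an ℓ1 penalty with weight `lam`. The paper's adaptive stepsize rule is not given in the abstract, so a simple diminishing stepsize is substituted here. All function and variable names are hypothetical. The point of the sketch is only that the loop keeps the iterate and one component gradient, both of length n, rather than a table of m gradients.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, lam, x):
    """F(x) = sum_i 0.5 * (a_i @ x - b_i)**2 + lam * ||x||_1."""
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.abs(x).sum()


def incremental_prox_gradient(A, b, lam, epochs=200):
    """Cyclic incremental proximal gradient sketch (illustrative only).

    Assumption: a diminishing stepsize stands in for the paper's
    adaptive rule, which the abstract does not specify.
    Working storage beyond the data is O(n): the iterate x and a
    single component gradient g.
    """
    m, n = A.shape
    x = np.zeros(n)
    # Base stepsize: inverse of the largest component Lipschitz constant.
    step0 = 1.0 / np.max(np.sum(A * A, axis=1))
    for epoch in range(epochs):
        step = step0 / np.sqrt(1.0 + epoch)
        for i in range(m):
            # Gradient of the i-th smooth term only; no gradient table is kept.
            g = A[i] * (A[i] @ x - b[i])
            # Each inner step applies 1/m of the penalty.
            x = soft_threshold(x - step * g, step * lam / m)
    return x


# Small synthetic example (made-up data).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = incremental_prox_gradient(A, b, lam=0.1)
print(objective(A, b, 0.1, np.zeros(5)), objective(A, b, 0.1, x_hat))
```

By contrast, a variant that caches the most recent gradient of every component would need an array of shape (m, n), which is where an O(mn) storage cost would come from.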
Language
English
References
1.
D. P. Bertsekas, A new class of incremental gradient methods for least squares problems, SIAM J. Optim. 7 (1997), no. 4, 913-926.

2.
D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, MA, 1999.

3.
D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, 1989.

4.
D. Blatt, A. O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim. 18 (2007), no. 1, 29-51.

5.
P. S. Bradley, U. M. Fayyad, and O. L. Mangasarian, Mathematical programming for data mining: formulations and challenges, INFORMS J. Comput. 11 (1999), no. 3, 217-238.

6.
S. Chen, D. Donoho, and M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998), no. 1, 33-61.

7.
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, 2000.

8.
I. Daubechies, M. Defrise, and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57 (2004), no. 11, 1413-1457.

9.
J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Report, Department of Statistics, Stanford University, Stanford, May 2009.

10.
A. A. Gaivoronski, Convergence properties of back-propagation for neural nets via theory of stochastic gradient methods. Part I, Optim. Methods Softw. 4 (1994), 117-134.

11.
L. Grippo, A class of unconstrained minimization methods for neural network training, Optim. Methods Softw. 4 (1994), 135-150.

12.
C.-H. Ho and C.-J. Lin, Large-scale linear support vector regression, J. Mach. Learn. Res. 13 (2012), 3323-3348.

13.
A. Juditsky, G. Lan, A. Nemirovski, and A. Shapiro, Stochastic approximation approach to stochastic programming, SIAM J. Optim. 19 (2009), 1574-1609.

14.
K. Koh, S.-J. Kim, and S. Boyd, An interior-point method for large-scale ℓ1-regularized logistic regression, J. Mach. Learn. Res. 8 (2007), 1519-1555.

15.
S. Lee, H. Lee, P. Abbeel, and A. Ng, Efficient ℓ1-regularized logistic regression, in Proceedings of the 21st National Conference on Artificial Intelligence, 2006.

16.
Z.-Q. Luo and P. Tseng, Analysis of an approximate gradient projection method with applications to the backpropagation algorithm, Optim. Methods Softw. 4 (1994), 85-101.

17.
O. L. Mangasarian and D. R. Musicant, Large scale kernel regression via linear programming, Mach. Learn. 46 (2002), 255-269.

18.
O. L. Mangasarian and M. V. Solodov, Serial and parallel backpropagation convergence via nonmonotone perturbed minimization, Optim. Methods Softw. 4 (1994), 103-116.

19.
Y. Nesterov, Primal-dual subgradient methods for convex problems, Math. Program. 120 (2009), no. 1, 221-259.

20.
R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.

21.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by Rumelhart and McClelland, 318-362, MIT Press, Cambridge, 1986.

22.
S. Sardy and P. Tseng, AMlet, RAMlet, and GAMlet: automatic nonlinear fitting of additive models, robust and generalized, with wavelets, J. Comput. Graph. Statist. 13 (2004), no. 2, 283-309.

23.
R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B 58 (1996), no. 1, 267-288.

24.
P. Tseng, On the rate of convergence of a partially asynchronous gradient projection algorithm, SIAM J. Optim. 1 (1991), no. 4, 603-619.

25.
P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (2009), no. 1-2, 387-423.

26.
P. Tseng and S. Yun, Incrementally updated gradient methods for constrained and regularized optimization, J. Optim. Theory Appl. 160 (2014), no. 3, 832-853.

27.
V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 2000.

28.
L. Wang, Efficient regularized solution path algorithms with applications in machine learning and data mining, Ph.D. thesis, University of Michigan, 2008.

29.
H. White, Learning in artificial neural networks: a statistical perspective, Neural Comput. 1 (1989), 425-464.

30.
H. White, Some asymptotic results for learning in single hidden-layer feedforward network models, J. Amer. Statist. Assoc. 84 (1989), no. 408, 1003-1013.

31.
L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res. 11 (2010), 2543-2596.