A STOCHASTIC VARIANCE REDUCTION METHOD FOR PCA BY AN EXACT PENALTY APPROACH

  • Jung, Yoon Mo (Department of Mathematics, Sungkyunkwan University) ;
  • Lee, Jae Hwa (Applied Algebra and Optimization Research Center, Sungkyunkwan University) ;
  • Yun, Sangwoon (Department of Mathematics Education, Sungkyunkwan University)
  • Received : 2017.09.05
  • Accepted : 2018.03.08
  • Published : 2018.07.31

Abstract

For principal component analysis (PCA) to efficiently analyze large-scale matrices, it is crucial to compute a few leading singular vectors at low computational cost and with modest memory requirements. To compute them in a fast and robust way, we propose a new stochastic method. In particular, we adopt the stochastic variance reduced gradient (SVRG) method [11] to avoid the asymptotically slow convergence of stochastic gradient descent methods. For that purpose, we reformulate the PCA problem as an unconstrained optimization problem using a quadratic penalty. In general, the penalty parameter must be driven to infinity for the two problems to be equivalent; in this case, however, exact penalization is guaranteed by applying the analysis in [24]. We establish the convergence rate of the proposed method to a stationary point, and numerical experiments illustrate the validity and efficiency of the proposed method.
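The abstract does not spell out the reformulation, but a natural quadratic-penalty objective in the spirit of the trace-penalty minimization of [24] is f(X) = (1/n) Σ_i [ -½‖Xᵀz_i‖² + (μ/4)‖XᵀX − I‖_F² ], a finite sum over the data points z_i, to which SVRG [11] applies directly. The sketch below is an illustrative implementation of that idea, not the paper's actual algorithm: the function name, penalty parameter mu, step size eta, and epoch length are assumptions chosen for readability.

```python
import numpy as np

def svrg_penalty_pca(Z, k, mu=10.0, eta=1e-3, outer_iters=30, inner_iters=None, seed=0):
    """Minimal SVRG sketch for a quadratic-penalty PCA objective (illustrative).

    Assumed finite-sum objective, in the spirit of the trace penalty of [24]:
        f(X) = (1/n) * sum_i [ -0.5*||X^T z_i||^2 + (mu/4)*||X^T X - I||_F^2 ],
    where Z is an n-by-d data matrix with rows z_i and X is d-by-k.
    """
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    m = inner_iters or n                               # epoch (inner loop) length
    X = np.linalg.qr(rng.standard_normal((d, k)))[0]   # random orthonormal start

    def grad_i(X, i):
        # Gradient of the i-th component function f_i.
        zi = Z[i]
        return -np.outer(zi, zi @ X) + mu * X @ (X.T @ X - np.eye(k))

    def full_grad(X):
        # Full gradient: -(1/n) Z^T Z X + mu * X (X^T X - I).
        return -(Z.T @ (Z @ X)) / n + mu * X @ (X.T @ X - np.eye(k))

    for _ in range(outer_iters):
        X_snap = X.copy()
        g_snap = full_grad(X_snap)                     # anchor gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient (SVRG update direction).
            v = grad_i(X, i) - grad_i(X_snap, i) + g_snap
            X = X - eta * v
    return X
```

Since a minimizer of the penalized objective is only approximately orthonormal, one would typically post-process the returned X, e.g., by a QR factorization or a Rayleigh-Ritz step, to extract an orthonormal basis of the estimated principal subspace.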

Keywords

References

  1. Z. Allen-Zhu and E. Hazan, Variance Reduction for Faster Non-Convex Optimization, Preprint arXiv:1603.05643, 2016.
  2. Z. Allen-Zhu and Y. Li, LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain, Preprint arXiv:1607.03463v2, 2017.
  3. Z. Allen-Zhu and Y. Yuan, Improved SVRG for Non-Strongly-Convex or Sum-of-NonConvex Objectives, Preprint arXiv:1506.01972v3, 2016.
  4. J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal. 8 (1988), no. 1, 141-148. https://doi.org/10.1093/imanum/8.1.141
  5. L. Bottou, F. E. Curtis, and J. Nocedal, Optimization Methods for Large-Scale Machine Learning, Preprint arXiv:1606.04838v1, 2016.
  6. J. P. Cunningham and Z. Ghahramani, Linear dimensionality reduction: survey, insights, and generalizations, J. Mach. Learn. Res. 16 (2015), 2859-2900.
  7. D. Garber and E. Hazan, Fast and Simple PCA via Convex Optimization, Preprint arXiv:1509.05647v4, 2015.
  8. G. H. Golub and C. F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 2013.
  9. R. A. Horn and C. R. Johnson, Matrix Analysis, second edition, Cambridge University Press, Cambridge, 2013.
  10. B. Jiang, C. Cui, and Y.-H. Dai, Unconstrained optimization models for computing several extreme eigenpairs of real symmetric matrices, Pac. J. Optim. 10 (2014), no. 1, 53-71.
  11. R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (2013), 315-323.
  12. I. T. Jolliffe, Principal Component Analysis, second edition, Springer Series in Statistics, Springer-Verlag, New York, 2002.
  13. H. Kasai, H. Sato, and B. Mishra, Riemannian stochastic variance reduced gradient on Grassmann manifold, Preprint arXiv:1605.07367v3, 2017.
  14. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998), no. 11, 2278-2324. https://doi.org/10.1109/5.726791
  15. X. Liu, Z. Wen, and Y. Zhang, Limited memory block Krylov subspace optimization for computing dominant singular value decompositions, SIAM J. Sci. Comput. 35 (2013), no. 3, A1641-A1668. https://doi.org/10.1137/120871328
  16. S. J. Reddi, A. Hefny, S. Sra, and B. Poczos, Stochastic Variance Reduction for Nonconvex Optimization, Preprint arXiv:1603.06160v2, 2016.
  17. Y. Saad, Numerical methods for large eigenvalue problems, revised edition of the 1992 original, Classics in Applied Mathematics, 66, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2011.
  18. M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Math. Program. 162 (2017), no. 1-2, Ser. A, 83-112. https://doi.org/10.1007/s10107-016-1030-6
  19. S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, J. Mach. Learn. Res. 14 (2013), 567-599.
  20. O. Shamir, A stochastic PCA and SVD algorithm with an exponential convergence rate, in Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 2015.
  21. O. Shamir, Fast stochastic algorithms for SVD and PCA: convergence properties and convexity, Preprint arXiv:1507.08788v1, 2015.
  22. C. Tan, S. Ma, Y.-H. Dai, and Y. Qian, Barzilai-Borwein step size for stochastic gradient descent, Preprint arXiv:1605.04131v2, 2016.
  23. D. S. Watkins, The Matrix Eigenvalue Problem, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2007.
  24. Z. Wen, C. Yang, X. Liu, and Y. Zhang, Trace-penalty minimization for large-scale eigenspace computation, J. Sci. Comput. 66 (2016), no. 3, 1175-1203. https://doi.org/10.1007/s10915-015-0061-0
  25. J. H. Wilkinson, The Algebraic Eigenvalue Problem, Monographs on Numerical Analysis, The Clarendon Press, Oxford University Press, New York, 1988.
  26. Z. Xu and Y. Ke, Stochastic variance reduced Riemannian eigensolver, Preprint arXiv:1605.08233v2, 2016.
  27. H. Zhang, S. J. Reddi, and S. Sra, Fast stochastic optimization on Riemannian manifolds, Preprint arXiv:1605.07147v2, 2017.