ADMM algorithms in statistics and machine learning

  • Choi, Hosik (Department of Applied Statistics, Kyonggi University)
  • Choi, Hyunjip (Department of Applied Statistics, Kyonggi University)
  • Park, Sangun (Department of Management Information System, Kyonggi University)
  • Received : 2017.10.31
  • Accepted : 2017.11.21
  • Published : 2017.11.30

Abstract

In recent years, as demand for data-driven analytical methodologies has grown in various fields, optimization methods capable of supporting them have developed accordingly. In particular, the various constraints that arise in statistics and machine learning problems can be handled by convex optimization. The alternating direction method of multipliers (ADMM) reviewed in this paper deals effectively with linear constraints and, via a consensus formulation, supports parallel computation, which has established it as a general-purpose optimization tool. ADMM is an approximation algorithm that solves a complex original problem by splitting it into subproblems that are easier to optimize than the original and then combining their solutions. It is useful for optimizing non-smooth or composite objective functions, and because algorithms can be constructed systematically on the basis of duality theory and the proximal operator, it is widely used in statistics and machine learning. In this paper, we survey applications of the ADMM algorithm in various fields related to statistics, focusing on two main topics: (1) the strategy for splitting the objective function, explained through the augmented Lagrangian method and the dual problem, and (2) the role of the proximal operator. As applications of the algorithm, we introduce regularization-based methodologies such as penalized function estimation. Empirical results on optimizing the lasso problem with simulated data are also presented.
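To make the two focal points of the review concrete, the standard ADMM setup can be sketched as follows. The notation (f, g, A, B, c, the penalty parameter ρ) is the common textbook form, not notation taken from this paper. ADMM splits the objective into two blocks joined by a linear constraint and alternates one block minimization of the augmented Lagrangian per block, followed by a dual ascent step:

```latex
% Split objective with a linear coupling constraint
\[
\min_{x,\,z}\; f(x) + g(z)
\quad \text{subject to} \quad Ax + Bz = c
\]

% Augmented Lagrangian with penalty parameter \rho > 0
\[
L_\rho(x, z, y) = f(x) + g(z) + y^\top (Ax + Bz - c)
  + \frac{\rho}{2}\,\lVert Ax + Bz - c \rVert_2^2
\]

% One ADMM iteration: two block minimizations, then dual ascent
\begin{align*}
x^{k+1} &= \arg\min_{x}\; L_\rho(x, z^{k}, y^{k}) \\
z^{k+1} &= \arg\min_{z}\; L_\rho(x^{k+1}, z, y^{k}) \\
y^{k+1} &= y^{k} + \rho\,(Ax^{k+1} + Bz^{k+1} - c)
\end{align*}
```

The z-update often reduces to evaluating a proximal operator, which is the second theme of the paper. Its definition, and the closed form for the l1 norm (elementwise soft-thresholding) that drives the lasso example, are:

```latex
% Proximal operator of a convex function g with step size t > 0
\[
\operatorname{prox}_{t g}(v) = \arg\min_{x}
  \Big\{ g(x) + \frac{1}{2t}\,\lVert x - v \rVert_2^2 \Big\},
\qquad
\big(\operatorname{prox}_{t\lambda \lVert\cdot\rVert_1}(v)\big)_j
  = \operatorname{sign}(v_j)\,\max\{\lvert v_j \rvert - t\lambda,\; 0\}
\]
```

Because each block is minimized separately, f and g only need to be tractable in isolation; this is exactly the splitting strategy the abstract refers to.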

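As a worked illustration of the lasso optimization mentioned at the end of the abstract, here is a minimal NumPy sketch of ADMM for the lasso in scaled dual form. This is an independent sketch, not the authors' code; the choices of lambda, rho, the iteration count, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    """ADMM for the lasso: minimize 0.5*||Ax - b||^2 + lam*||z||_1  s.t. x = z."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)  # u is the scaled dual variable
    AtA_rhoI = A.T @ A + rho * np.eye(n)  # reused by every x-update
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: ridge-type linear solve (the smooth part of the split)
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))
        # z-update: proximal operator of the l1 penalty
        z = soft_threshold(x + u, lam / rho)
        # dual update on the consensus constraint x = z
        u = u + x - z
    return z

# Illustrative synthetic data: 3 nonzero coefficients out of 20
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.1 * rng.standard_normal(100)
print(np.round(admm_lasso(A, b, lam=5.0), 2))
```

The pattern matches the splitting strategy sketched above: the smooth least-squares term and the non-smooth l1 penalty are handled in separate, individually easy subproblems and tied together by the dual update.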
