BCDR algorithm for network estimation based on pseudo-likelihood with parallelization using GPU

GPU-parallelized BCDR algorithm for a pseudo-likelihood-based network estimation model

  • Received : 2016.02.25
  • Accepted : 2016.03.23
  • Published : 2016.03.31

Abstract

A graphical model represents conditional dependencies between variables as a graph with nodes and edges. Graphical models are widely used in fields such as physics, economics, and biology to describe complex associations. Conditional dependencies can be estimated from an inverse covariance matrix, in which a zero off-diagonal element denotes conditional independence of the corresponding pair of variables. This paper proposes an efficient BCDR (block coordinate descent with random permutation) algorithm for CONCORD (convex correlation selection method), a method that estimates an inverse covariance matrix based on pseudo-likelihood. The proposed algorithm builds on the existing BCD (block coordinate descent) algorithm and adds a random-permutation update rule and parallel computation on graphics processing units. We conduct numerical studies on two network structures to demonstrate the efficiency of the proposed algorithm for CONCORD in terms of computation time.
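The link between zeros in the inverse covariance matrix and conditional independence can be illustrated with a small numerical sketch. The 3-variable chain below and the helper name `partial_correlations` are hypothetical choices made for illustration and are not taken from the paper. The sketch uses the standard identity that the partial correlation of variables i and j equals -ω_ij / sqrt(ω_ii ω_jj), where ω denotes an entry of the inverse covariance matrix.

```python
import numpy as np

# Hypothetical chain graph X0 - X1 - X2: X0 and X2 are linked only through X1,
# so the (0, 2) entry of the inverse covariance (precision) matrix is zero.
omega = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])

# The implied covariance matrix is dense: X0 and X2 are marginally correlated.
sigma = np.linalg.inv(omega)

def partial_correlations(precision):
    """Partial correlations rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

pcor = partial_correlations(omega)
print("covariance of X0 and X2:", sigma[0, 2])          # nonzero
print("partial correlation of X0 and X2:", pcor[0, 2])  # zero
```

In this example the covariance between X0 and X2 is nonzero even though their partial correlation is zero, which is why the graph is read from the inverse covariance matrix rather than from the covariance matrix itself.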

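A graphical model represents the conditional dependencies among variables as a graph of nodes and edges. To express complex associations among variables, graphical models are applied in a variety of fields, including physics, economics, and biology. Conditional dependencies can be estimated from the inverse of the covariance matrix, because a zero off-diagonal element of the inverse covariance matrix is equivalent to conditional independence of the corresponding two variables. For CONCORD (convex correlation selection method), a pseudo-likelihood-based method that yields a sparse estimate of the inverse covariance matrix, this paper proposes the BCDR (block coordinate descent with random permutation) algorithm, which makes the existing BCD (block coordinate descent) algorithm more efficient for high-dimensional data by using an update rule based on random permutation and the parallel computation of a graphics processing unit. In simulations considering two types of network structure, the efficiency of the proposed algorithm was confirmed by comparing computation times until convergence.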
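The idea of updating coordinates in a randomly permuted order can be sketched on a single CPU as follows. This is a minimal illustration, not the BCDR algorithm of the paper: it updates one entry at a time rather than blocks, it uses no GPU, and it assumes the scaled CONCORD-type objective -Σ_i log ω_ii + (1/2) tr(Ω S Ω) + λ Σ_{i<j} |ω_ij|, where S is the sample covariance matrix. The function names, the penalty value, and the matrix `S` are hypothetical. The closed-form updates are obtained by minimizing this assumed objective exactly in one coordinate with all others held fixed.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def concord_objective(omega, S, lam):
    """Assumed objective: -sum log w_ii + 0.5 tr(W S W) + lam * sum_{i<j} |w_ij|."""
    off = np.abs(omega[np.triu_indices_from(omega, k=1)]).sum()
    return (-np.log(np.diag(omega)).sum()
            + 0.5 * np.trace(omega @ S @ omega) + lam * off)

def concord_cd_random(S, lam, n_sweeps=100, tol=1e-8, seed=0):
    """Coordinate descent in which each sweep visits entries in random order."""
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    omega = np.eye(p)
    coords = [(i, j) for i in range(p) for j in range(i, p)]
    for _ in range(n_sweeps):
        old = omega.copy()
        for k in rng.permutation(len(coords)):  # fresh permutation every sweep
            i, j = coords[k]
            if i == j:
                # Diagonal: positive root of s_ii * w^2 + c * w - 1 = 0.
                c = omega[i] @ S[i] - omega[i, i] * S[i, i]
                omega[i, i] = (-c + np.sqrt(c * c + 4.0 * S[i, i])) / (2.0 * S[i, i])
            else:
                # Off-diagonal: soft-thresholded minimizer of a 1-D quadratic.
                b = (omega[i] @ S[j] - omega[i, j] * S[j, j]
                     + omega[j] @ S[i] - omega[i, j] * S[i, i])
                omega[i, j] = omega[j, i] = soft_threshold(-b, lam) / (S[i, i] + S[j, j])
        if np.max(np.abs(omega - old)) < tol:
            break
    return omega

# Hypothetical covariance matrix of a 3-variable chain graph.
S = np.array([[0.75, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 0.75]])
omega_hat = concord_cd_random(S, lam=0.1)
print(np.round(omega_hat, 3))
```

Because every coordinate update is an exact one-dimensional minimization, the objective cannot increase from one update to the next, whatever order the permutation produces. Only the visiting order is randomized here; a block-wise, GPU-parallel version would need additional design choices that this sketch does not attempt to reproduce.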
