Implementation of Efficient Power Method on CUDA GPU

  • 김정환 (Division of Applied Computer Science, Konkuk University);
  • 김진수 (Division of Applied Computer Science, Konkuk University)
  • Received : 2010.11.19
  • Accepted : 2010.12.27
  • Published : 2011.02.28

Abstract

GPU computing is gaining ground in high-performance applications because it exploits massive parallelism cost-effectively. The power method, which finds the dominant eigenvector of a given matrix, is widely used in applications such as PageRank, which computes the importance of web pages. In this work we parallelize the power method efficiently on a GPU and suggest how its performance can be improved further. The power method consists mainly of matrix-vector products, which are easy to parallelize. However, each iteration must also test the eigenvector for convergence and then scale the vector. These operations incur several GPU kernel calls and data movement between host and GPU memory. We improve the performance of the power method by reducing the number of GPU kernel calls, optimizing thread allocation, and making the convergence test more efficient.
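
The loop the abstract describes can be sketched on the CPU as follows. This is a minimal reference sketch in plain Python, not the authors' GPU implementation: the function names `mat_vec` and `power_method`, the tolerance `tol`, the iteration cap `max_iter`, the all-ones starting vector, and the choice of scaling by the largest-magnitude entry are all assumptions made for illustration. The numbered comments mark the steps that, on a GPU, would likely each map to one or more kernel launches or host-device transfers.

```python
def mat_vec(a, x):
    """Dense matrix-vector product y = A x (the easily parallelized step)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def power_method(a, tol=1e-10, max_iter=1000):
    """Return (eigenvalue estimate, eigenvector estimate, iterations used).

    Hypothetical helper: assumes a square matrix given as a list of rows
    and a dominant eigenvalue that is real and strictly largest in magnitude.
    """
    x = [1.0] * len(a)                   # arbitrary nonzero starting vector
    eigenvalue = 0.0
    iteration = 0
    for iteration in range(1, max_iter + 1):
        y = mat_vec(a, x)                # 1. matrix-vector product
        eigenvalue = max(y, key=abs)     # 2. entry of largest magnitude (a reduction)
        if eigenvalue == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        y = [v / eigenvalue for v in y]  # 3. scale so that entry becomes 1
        # 4. convergence test: largest change in any component (another reduction)
        diff = max(abs(u - v) for u, v in zip(y, x))
        x = y
        if diff < tol:
            break
    return eigenvalue, x, iteration


if __name__ == "__main__":
    lam, vec, iters = power_method([[2.0, 1.0], [1.0, 3.0]])
    print(lam, vec, iters)
```

Steps 2 to 4 are small compared with the matrix-vector product, but each needs a result from the whole vector before the next iteration can start, which is why they are natural targets when trying to cut kernel calls and transfers.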

Acknowledgement

Supported by: Konkuk University
