Implementation of Integrated CPU-GPU for Efficient Uniform Memory Access Method and Verification System

CPU-GPU간 긴밀성을 위한 효율적인 공유메모리 접근 방법과 검증 시스템 구현

  • Received : 2016.02.01
  • Accepted : 2016.03.22
  • Published : 2016.04.30

Abstract

In this paper, we propose a system for the efficient use of memory shared between the CPU and GPU. The system, called Fusion Architecture, ensures consistency of the shared memory and minimizes the cache misses that frequently occur on systems based on Heterogeneous System Architecture or Unified Virtual Memory. It also maximizes performance for memory-intensive jobs by allocating GPU cores efficiently. To compare architectures across various scenarios, we introduce the Fusion Architecture Analyzer, which evaluates OpenMP, OpenCL, CUDA, and the proposed architecture in terms of memory overhead and processing time. The results show that the Fusion Architecture runs benchmarks 55% faster and reduces memory overhead by 220% on average.
