An IPC-based Dynamic Cooperative Thread Array Scheduling Scheme for GPUs

Son, Dong Oh; Kim, Jong Myon; Kim, Cheol Hong

doi:10.9708/jksci.2016.21.2.009

Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지)

Volume 21, Issue 2, Pages 9-16, 2016
ISSN 1598-849X (print), 2383-9945 (online)

Korean Society of Computer Information (한국컴퓨터정보학회)


Son, Dong Oh (School of Electronics and Computer Engineering, Chonnam National University) ;
Kim, Jong Myon (School of Electrical Engineering, University of Ulsan) ;
Kim, Cheol Hong (School of Electronics and Computer Engineering, Chonnam National University)

Received : 2015.08.18
Accepted : 2015.11.23
Published : 2016.02.29

https://doi.org/10.9708/jksci.2016.21.2.009

Abstract

Recently, many research groups have focused on general-purpose GPUs (GPGPUs) to improve the performance of computing systems. GPGPUs use parallel GPU hardware resources to execute general-purpose applications as well as graphics applications, and they can process thousands of threads concurrently through warp scheduling and cooperative thread array (CTA) scheduling. In this paper, we use the traditional CTA scheduler to assign varying numbers of CTAs to the streaming multiprocessors (SMs). According to our simulation results, statically increasing the number of CTAs assigned to each SM does not improve performance. To address this limitation of traditional CTA scheduling, we propose a new dynamic CTA scheduling scheme based on instructions per cycle (IPC). Compared to traditional CTA scheduling, the proposed scheme improves GPU performance by up to 13.1%.
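The abstract does not describe how the dynamic scheme chooses the number of CTAs per SM. Purely as an illustration, the Python sketch below shows one way an IPC-driven CTA cap could work: each SM measures its IPC over a sampling epoch and nudges its resident-CTA limit up or down with a simple hill-climbing rule, and the CTA scheduler only launches a new CTA on an SM that is below its current limit. The epoch-based hill-climbing policy and every name in it (`SMState`, `update_cta_limit`, `can_issue_cta`) are hypothetical assumptions, not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class SMState:
    """Per-SM bookkeeping for a hypothetical IPC-driven CTA limiter (assumed design)."""
    max_ctas: int          # hardware cap from registers, shared memory, and thread slots
    cta_limit: int         # current dynamic cap on resident CTAs (1..max_ctas)
    prev_ipc: float = 0.0  # IPC measured during the previous sampling epoch
    direction: int = 1     # +1 if the last adjustment raised the cap, -1 if it lowered it


def update_cta_limit(sm: SMState, instructions: int, cycles: int) -> int:
    """Adjust the SM's CTA cap at the end of a sampling epoch (hill climbing).

    If IPC did not drop after the previous adjustment, keep moving the cap in the
    same direction; if it dropped, reverse direction. The cap stays in [1, max_ctas].
    """
    if cycles <= 0:
        raise ValueError("epoch must span at least one cycle")
    ipc = instructions / cycles
    if ipc < sm.prev_ipc:
        sm.direction = -sm.direction
    sm.cta_limit = max(1, min(sm.max_ctas, sm.cta_limit + sm.direction))
    sm.prev_ipc = ipc
    return sm.cta_limit


def can_issue_cta(sm: SMState, resident_ctas: int) -> bool:
    """The CTA scheduler launches a new CTA on this SM only while it is below its cap."""
    return resident_ctas < sm.cta_limit


if __name__ == "__main__":
    sm = SMState(max_ctas=8, cta_limit=4)
    # (instructions, cycles) per epoch; these values are made up for illustration.
    for instr, cyc in [(1000, 1000), (1200, 1000), (1100, 1000), (1300, 1000)]:
        print(update_cta_limit(sm, instr, cyc))  # prints 5, 6, 5, 4 on separate lines
```

Hill climbing is only one plausible policy. A real scheduler would also have to pick an epoch length and decide how resident CTAs drain when the cap is lowered, and the scheme evaluated in the paper may work differently.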

