- D. Luebke and G. Humphreys, "How GPUs work," IEEE Computer, Vol. 40, No. 2, pp. 96-100, February 2007. DOI: 10.1109/MC.2007.59
- I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: stream computing on graphics hardware," In Proceedings of Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 777-786, August 2004. DOI: 10.1145/1186562.1015800
- CUDA Programming Guide Version 3.0, available at
- Khronos Group, OpenCL, available at
- ATI Stream SDK, available at
- General-purpose computation on graphics hardware, available at
- J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell, "A survey of general-purpose computation on graphics hardware," Computer Graphics Forum, Vol. 26, No. 1, pp. 21-51, March 2007. DOI: 10.1111/j.1467-8659.2007.01012.x
- Y. Yang, P. Xiang, M. Mantor, and H. Zhou, "CPU-assisted GPGPU on fused CPU-GPU architectures," In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, pp. 1-12, March 2012. DOI: 10.1109/HPCA.2012.6168948
- NVIDIA TITAN X, available at
- X. Zhang and K. K. Parhi, "High-speed VLSI architectures for the AES algorithm," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 957-967, August 2004. DOI: 10.1109/TVLSI.2004.832943
- NVIDIA Co. Ltd., available at
- AMD (Advanced Micro Devices) Inc., available at
- NVIDIA's Next Generation CUDA Compute Architecture: Fermi, available at
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," In Proceedings of International Symposium on Microarchitecture, pp. 407-420, December 2007. DOI: 10.1109/MICRO.2007.12
- J. E. Thornton, "Parallel operation in the Control Data 6600," In AFIPS Proceedings of FJCC, Part 2, Vol. 26, pp. 33-40, 1964.
- M. Lee, S. Song, J. Moon, J. Kim, W. Seo, Y. Cho, and S. Ryu, "Improving GPGPU Resource Utilization Through Alternative Thread Block Scheduling," In Proceedings of the International Symposium on High Performance Computer Architecture, pp. 260-271, June 2014. DOI: 10.1109/HPCA.2014.6835937
- K. M. Abdalla, L. V. Shah, J. F. Duluk, T. J. Purcell, T. Mandal, and G. Hirota, "Scheduling and Execution of Compute Tasks," US Patent US20130185725, 2013.
- H. Choi, D. Son, J. Kim, and C. Kim, "Concurrent warp execution: improving performance of GPU-likely SIMD architecture by increasing resource utilization," The Journal of Supercomputing, Vol. 69, No. 1, pp. 330-356, July 2014. DOI: 10.1007/s11227-014-1155-4
- D. Son, J. Kim, and C. Kim, "An IPC-based Dynamic Cooperative Thread Array Scheduling Scheme for GPUs," Journal of The Korea Society of Computer and Information, Vol. 21, No. 2, pp. 9-16, February 2016. DOI: 10.9708/jksci.2016.21.2.009
- G. Kim, J. Kim, and C. Kim, "Latency Hiding based Warp Scheduling Policy for High Performance GPUs," Journal of The Korea Society of Computer and Information, Vol. 24, No. 4, pp. 1-9, April 2019. DOI: 10.9708/jksci.2019.24.04.001
- A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," In Proceedings of International Symposium on Performance Analysis of Systems and Software, pp. 163-174, April 2009. DOI: 10.1109/ISPASS.2009.4919648
- S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, "McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures," In Proceedings of the International Symposium on Microarchitecture, pp. 469-480, December 2009. DOI: 10.1145/1669112.1669172
- J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, "GPUWattch: Enabling Energy Optimizations in GPGPUs," In Proceedings of the International Symposium on Computer Architecture, pp. 487-498, June 2013. DOI: 10.1145/2485922.2485964
- GTX480 NVIDIA, available at
- M. Abdel-Majeed, D. Wong, and M. Annavaram, "Warped Gates: Gating Aware Scheduling and Power Gating for GPGPUs," In Proceedings of International Symposium on Microarchitecture, pp. 111-122, December 2013. DOI: 10.1145/2540708.2540719