- Volume 11 Issue 6
Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL
Parallelization of a CMMB LDPC Decoder on CPU and GPU Using OpenCL
- Park, Joo-Yul (Hanyang University) ;
- Hong, Jung-Hyun (Hanyang University) ;
- Chung, Ki-Seok (Hanyang University)
- Received : 2016.06.24
- Accepted : 2016.10.04
- Published : 2016.12.31
Open Computing Language (OpenCL) was recently proposed as a framework that supports heterogeneous computing platforms. With an OpenCL framework, digital communication systems can support various protocols in a unified computing environment and achieve both high portability and high performance. This paper presents a parallel software decoder for Low-Density Parity-Check (LDPC) codes for China Multimedia Mobile Broadcasting (CMMB) on a heterogeneous platform. Each step of LDPC decoding has different parallelization characteristics: steps suited to task-level parallelization are executed on the CPU, and steps suited to data-level parallelization are executed on the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding, explicit thread scheduling, loop unrolling, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance by using heterogeneous multi-core processors within a unified computing framework.
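To make the notion of a data-parallel decoding step concrete, the following is a minimal sketch of a check-node update, written in plain Python rather than OpenCL. Several things here are assumptions and not taken from the abstract: the min-sum update rule, the toy (7,4) Hamming parity-check matrix standing in for the CMMB matrix, the channel values, and all function names. The abstract also does not say which decoding steps run on which device, so the mapping suggested in the comments is only illustrative.

```python
# Toy parity-check matrix as adjacency lists: CHECKS[m] lists the bit
# indices taking part in check m. This is the (7,4) Hamming code, used
# only as a small stand-in; it is NOT the CMMB LDPC matrix.
CHECKS = [
    [0, 1, 2, 4],
    [1, 2, 3, 5],
    [0, 2, 3, 6],
]


def check_node_update(msgs_in):
    """Min-sum update for one check node (assumed rule, see lead-in).

    msgs_in[i] is the incoming log-likelihood ratio (LLR) on edge i.
    The outgoing message on edge i takes the product of the signs and
    the minimum magnitude of all *other* incoming messages.
    """
    out = []
    for i in range(len(msgs_in)):
        sign = 1.0
        min_mag = float("inf")
        for j, v in enumerate(msgs_in):
            if j == i:
                continue  # exclude the edge we are sending on
            if v < 0:
                sign = -sign
            min_mag = min(min_mag, abs(v))
        out.append(sign * min_mag)
    return out


def update_all_checks(bit_to_check):
    """Apply the update to every check node.

    Each call is independent of the others, so on a GPU each check
    node could map to its own work-item; a plain loop stands in here.
    """
    return [check_node_update(msgs) for msgs in bit_to_check]


def syndrome_is_zero(hard_bits):
    """Return True when every parity check in CHECKS is satisfied."""
    return all(sum(hard_bits[b] for b in row) % 2 == 0 for row in CHECKS)


if __name__ == "__main__":
    llr = [2.5, -1.0, 3.0, 0.5, 1.5, -2.0, 4.0]  # made-up channel LLRs
    # First iteration: bit-to-check messages are just the channel LLRs.
    bit_to_check = [[llr[b] for b in row] for row in CHECKS]
    for row, msgs in zip(CHECKS, update_all_checks(bit_to_check)):
        print(row, msgs)
```

Because no check node reads another check node's output within the same update, the outer loop in `update_all_checks` is the kind of loop that lends itself to data-level parallelization, while the short fixed-length inner loop is a natural candidate for loop unrolling.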
Supported by : National Research Foundation of Korea