Spark Framework Based on a Heterogenous Pipeline Computing with OpenCL

OpenCL을 활용한 이기종 파이프라인 컴퓨팅 기반 Spark 프레임워크

  • Kim, Daehee (Dept. of Computer Science and Engineering, Konkuk University)
  • Park, Neungsoo (Dept. of Computer Science and Engineering, Konkuk University)
  • Received : 2018.01.04
  • Accepted : 2018.01.10
  • Published : 2018.02.01

Abstract

Apache Spark is a high-performance in-memory computing framework for big-data processing. Recently, general-purpose computing on graphics processing units (GPGPU) has been adopted in the Apache Spark framework to improve performance. Previous Spark-GPGPU frameworks focus on overcoming the implementation difficulties that result from the differences between the GPGPU and Spark computing environments. In this paper, we propose a Spark framework based on heterogeneous pipeline computing with OpenCL to further improve performance. The proposed framework overlaps the CPU's Java-to-native memory copies with CPU-GPU communication (DMA) and GPU kernel computation to hide CPU idle time. In addition, the CPU-GPU communication buffers are implemented as switching dual buffers, which reduce the mapped memory region and thus the memory-mapping overhead. Experimental results show that the proposed framework is up to 2.13 times faster than the previous Spark framework using OpenCL.
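To make the overlap idea concrete, the following is a minimal timing sketch, not the authors' implementation. It assumes the input is split into equal chunks, that the CPU stage (the Java-to-native copy) and the device stage (DMA plus kernel) each handle one chunk at a time, and that a buffer can be refilled only after the chunk that last used it has left the device. The function names and the stage durations are hypothetical and chosen only for illustration.

```python
def sequential_time(n_chunks, t_copy, t_dma, t_kernel):
    """Total time when copy, DMA, and kernel never overlap."""
    return n_chunks * (t_copy + t_dma + t_kernel)


def pipelined_time(n_chunks, t_copy, t_dma, t_kernel, n_buffers=2):
    """Total time when the CPU copy of one chunk overlaps with the
    device work (DMA + kernel) of an earlier chunk.

    Assumed model: chunk i uses buffer i % n_buffers, and that buffer
    becomes free only when chunk i - n_buffers has finished on the device.
    """
    copy_done = [0.0] * n_chunks
    device_done = [0.0] * n_chunks
    for i in range(n_chunks):
        # The CPU can start this copy once the previous copy has finished.
        cpu_free = copy_done[i - 1] if i > 0 else 0.0
        # The target buffer must have been released by its previous user.
        buf_free = device_done[i - n_buffers] if i >= n_buffers else 0.0
        copy_done[i] = max(cpu_free, buf_free) + t_copy
        # The device starts when the data is ready and the device is idle.
        dev_free = device_done[i - 1] if i > 0 else 0.0
        device_done[i] = max(copy_done[i], dev_free) + t_dma + t_kernel
    return device_done[-1] if n_chunks else 0.0


if __name__ == "__main__":
    # Hypothetical durations in arbitrary time units.
    seq = sequential_time(4, t_copy=2, t_dma=1, t_kernel=1)
    pipe = pipelined_time(4, t_copy=2, t_dma=1, t_kernel=1)
    print(f"sequential={seq}, pipelined={pipe}, speedup={seq / pipe:.2f}")
```

Under this model, a single buffer gives no overlap at all, because the CPU has to wait for the device to release it. Adding a second, alternating buffer lets the copy of the next chunk proceed while the device works on the current one. The speedup from these toy numbers is unrelated to the 2.13x figure reported above, which comes from the authors' experiments.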
