TBBench: A Micro-Benchmark Suite for Intel Threading Building Blocks

  • Marowka, Ami (Dept. of Computer Science, Bar-Ilan University)
  • Received : 2011.07.21
  • Accepted : 2012.04.09
  • Published : 2012.06.30

Abstract

Task-based programming is becoming the method of choice for extracting the desired performance from multi-core chips. It expresses a program in terms of lightweight logical tasks rather than heavyweight threads. Intel Threading Building Blocks (TBB) is a C++ template library for task-based parallel programming on multi-core processors. The performance gain it delivers depends to a great extent on the efficiency of its parallel constructs. The overheads incurred by these constructs determine the ability to create large-scale parallel programs, especially in the case of fine-grain parallelism. This paper presents a study of TBB parallelization overheads. For this purpose, we developed a TBB micro-benchmark suite called TBBench. We use TBBench to evaluate the parallelization overheads of TBB on different multi-core machines and with different compilers. We report the relative overheads in detail and analyze the results.
