Optimization of LU-SGS Code for the Acceleration on the Modern Microprocessors

  • Jang, Keun-Jin (Department of Aerospace Engineering, Pusan National University) ;
  • Kim, Jong-Kwan (Department of Aerospace Engineering, Pusan National University) ;
  • Cho, Deok-Rae (Department of Aerospace Engineering, Pusan National University) ;
  • Choi, Jeong-Yeol (Department of Aerospace Engineering, Pusan National University)
  • Received : 2012.12.04
  • Accepted : 2013.03.29
  • Published : 2013.06.30

Abstract

An approach to writing performance-optimized computational code for the latest microprocessors is suggested. The optimization concept, termed localization, maximizes use of the second-level (L2) cache, which is common to all the latest computer systems, and minimizes access to main system memory. In this study, localized optimization of an LU-SGS (Lower-Upper Symmetric Gauss-Seidel) code for solving the fluid dynamic equations was carried out at three different levels and tested on several microprocessor architectures in wide use today. The localized code ran more than twice as fast as the baseline algorithm while producing exactly the same solution on the same computer system.
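The localization idea can be illustrated with a minimal sketch. This is not the LU-SGS code from the study: the 1D three-point stencil, the function names `baseline_update` and `localized_update`, and the `block` parameter are all assumptions chosen for illustration. The baseline stores an intermediate result in a full-size array that is written in one pass and read back in a second pass. The localized variant works through the grid in small blocks, so the intermediate buffer stays small enough to remain in cache. Python will not show the speedup itself; the sketch only shows the restructuring and that both versions return exactly the same values.

```python
def baseline_update(u, dt):
    """Two full passes over the grid with a full-size temporary array."""
    n = len(u)
    res = [0.0] * n  # full-size intermediate: streams through main memory
    for i in range(1, n - 1):
        res[i] = u[i - 1] - 2.0 * u[i] + u[i + 1]
    out = list(u)
    for i in range(1, n - 1):
        out[i] = u[i] + dt * res[i]
    return out


def localized_update(u, dt, block=64):
    """Same arithmetic, done block by block with a small reusable buffer.

    `block` is a hypothetical tuning parameter; in a compiled code it
    would be sized so that the working set fits in the L2 cache.
    """
    n = len(u)
    out = list(u)
    buf = [0.0] * block  # small intermediate, reused for every block
    for start in range(1, n - 1, block):
        stop = min(start + block, n - 1)
        # Pass 1 on this block only: fill the local buffer.
        for i in range(start, stop):
            buf[i - start] = u[i - 1] - 2.0 * u[i] + u[i + 1]
        # Pass 2 on the same block, while the buffer is still hot.
        for i in range(start, stop):
            out[i] = u[i] + dt * buf[i - start]
    return out
```

Each interior point goes through the same floating-point operations in the same order in both functions, so the results can be compared with `==` rather than with a tolerance. This mirrors the claim that the optimized code produces exactly the same solution as the baseline.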
