Core-aware Cache Replacement Policy for Reconfigurable Last Level Cache

  • Son, Dong-Oh (School of Electronics and Computer Engineering, Chonnam National University) ;
  • Choi, Hong-Jun (School of Electronics and Computer Engineering, Chonnam National University) ;
  • Kim, Jong-Myon (School of Computer Engineering and Information Technology, University of Ulsan) ;
  • Kim, Cheol-Hong (School of Electronics and Computer Engineering, Chonnam National University)
  • Received : 2013.06.04
  • Accepted : 2013.09.12
  • Published : 2013.11.29

Abstract

In multi-core processors, the Last Level Cache (LLC) reduces the speed gap between memory and the cores. For this reason, the LLC has a large impact on processor performance. The LLC is composed of a shared cache and private caches. In the computer architecture community, most researchers have focused on management techniques for the shared cache, while management techniques for the private cache have not been widely studied. In a conventional private LLC, memory is statically assigned to each core, which causes serious performance degradation when workloads are not evenly distributed across the cores. To overcome this problem, this paper proposes a replacement policy that manages the private cache of the LLC efficiently. Because the proposed core-aware cache replacement policy can reconfigure the LLC dynamically, the LLC hit rate increases drastically. Moreover, the proposed policy uses 2-bit saturating counters to improve performance. According to our simulation results, the proposed method improves hit rates by 9.23% and reduces access time by 12.85% compared to the conventional method.
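The abstract names 2-bit saturating counters but does not say how the policy uses them. The following is a minimal Python sketch of such a counter, under stated assumptions rather than the authors' implementation. The class name `SaturatingCounter2Bit`, the helper `pick_donor_core`, and the idea of keeping one counter per core and treating the lowest-valued core as the one that can give up cache space are all hypothetical and serve only as illustration.

```python
class SaturatingCounter2Bit:
    """2-bit saturating counter: the value always stays in the range 0..3."""

    MAX_VALUE = 3  # two bits, so the largest value is 0b11

    def __init__(self, value=0):
        if not 0 <= value <= self.MAX_VALUE:
            raise ValueError("value must be in 0..3")
        self.value = value

    def increment(self):
        # Saturate at 3 instead of wrapping around to 0.
        if self.value < self.MAX_VALUE:
            self.value += 1

    def decrement(self):
        # Saturate at 0 instead of wrapping around to 3.
        if self.value > 0:
            self.value -= 1


def pick_donor_core(counters, requesting_core):
    """Hypothetical helper (not from the paper).

    Returns the index of the core, other than the requester, whose counter
    is lowest. Ties go to the lowest core index. Returns None when there is
    no other core to choose from.
    """
    candidates = [c for c in range(len(counters)) if c != requesting_core]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (counters[c].value, c))
```

As a usage example, suppose four cores each hold one counter that is incremented on an LLC miss (again an assumption). If core 0 misses five times, its counter saturates at 3 rather than overflowing, and `pick_donor_core(counters, 0)` then selects among cores 1 to 3 the one with the smallest counter value.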

Acknowledgement

Supported by : Chonnam National University
