Core-aware Cache Replacement Policy for Reconfigurable Last Level Cache

재구성 가능한 라스트 레벨 캐쉬 구조를 위한 코어 인지 캐쉬 교체 기법

  • Son, Dong-Oh (손동오) (School of Electronics and Computer Engineering, Chonnam National University) ;
  • Choi, Hong-Jun (최홍준) (School of Electronics and Computer Engineering, Chonnam National University) ;
  • Kim, Jong-Myon (김종면) (School of Computer Engineering and Information Technology, University of Ulsan) ;
  • Kim, Cheol-Hong (김철홍) (School of Electronics and Computer Engineering, Chonnam National University)
  • Received : 2013.06.04
  • Accepted : 2013.09.12
  • Published : 2013.11.29

Abstract

In multi-core processors, the last-level cache (LLC) is an important hardware resource that narrows the speed gap between the cores and main memory, so how efficiently it is managed has a large impact on processor performance. The LLC consists of a shared cache, which holds data shared among the cores, and a private cache, which holds the data that belongs to each individual core. Most recent research on LLC management has focused on the shared cache, while management techniques for the private cache have received little attention. In a conventional private LLC, an equal region is statically assigned to each core, which causes serious performance degradation when the workload is unevenly distributed across the cores. To overcome this problem, this paper proposes a core-aware cache replacement policy that manages the private cache of the LLC efficiently. Because the proposed policy reconfigures the private cache dynamically, the LLC hit rate increases. The policy also uses 2-bit saturating counters to further improve performance. According to our simulation results, the proposed method improves the hit rate by 9.23% and reduces the LLC access time by 12.85% compared with the conventional replacement method.
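To make the 2-bit saturating counter mentioned in the abstract concrete, the following is a minimal Python sketch. The counter itself is a standard structure: it counts from 0 to 3 and sticks at either end instead of wrapping. How the paper actually applies the counters is not stated in the abstract, so the helper `pick_donor_core`, the per-core "pressure" interpretation, and the increment-on-miss / decrement-on-hit rule are all assumptions made for illustration only, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SaturatingCounter:
    """An n-bit saturating counter; the 2-bit default counts 0..3."""
    bits: int = 2
    value: int = 0

    @property
    def max_value(self) -> int:
        return (1 << self.bits) - 1

    def increment(self) -> None:
        # Stick at the maximum instead of wrapping around to 0.
        self.value = min(self.value + 1, self.max_value)

    def decrement(self) -> None:
        # Stick at 0 instead of going negative.
        self.value = max(self.value - 1, 0)


def pick_donor_core(pressure: List[SaturatingCounter], requester: int) -> int:
    """Hypothetical helper: pick the core whose private region gives up a line.

    Assumption (not from the paper): each core's counter is incremented on
    that core's LLC misses and decremented on its hits, so a low value
    suggests spare capacity. The requester evicts from its own region
    unless some other core shows strictly lower pressure.
    """
    donor = requester
    for core, counter in enumerate(pressure):
        if counter.value < pressure[donor].value:
            donor = core
    return donor


# Usage: core 0 misses five times in a row, cores 2 and 3 miss once each.
pressure = [SaturatingCounter() for _ in range(4)]
for _ in range(5):
    pressure[0].increment()
pressure[2].increment()
pressure[3].increment()

print(pressure[0].value)                        # 3 (saturated, not 5)
print(pick_donor_core(pressure, requester=0))   # 1 (lowest pressure)
```

A 2-bit counter keeps the per-core state very small, and saturation means a short burst of misses or hits cannot overflow the counter or flip the decision erratically. Those properties hold for any use of the counter, regardless of how the proposed policy employs it.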
