Improving Instruction Cache Performance by Dynamic Management of Cache-Image

캐시 이미지의 동적 관리 방법을 이용한 명령어 캐시 성능 개선

  • Hyo-Joong Suh (서효중) (School of Computer Science and Information Engineering, The Catholic University of Korea)
  • Received : 2017.04.25
  • Accepted : 2017.07.17
  • Published : 2017.09.15

Abstract

Burst loading of a pre-created cache-image is an effective way to reduce instruction cache misses in the early stage of program execution. It alleviates both the performance degradation and the energy inefficiency caused by the cold misses concentrated in the instruction cache at program start. However, the method has drawbacks, including software overhead in the compiler and the installer. In addition, a pre-created cache-image can mismatch the actual execution because of the dynamic properties of specific applications. This paper addresses these issues and proposes a cache-image maintenance and recreation policy that manages the cache-image dynamically with hardware assistance. Simulation results show that the proposed method maintains a cache-image of appropriate size and validity.
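
As a rough illustration of why a pre-created cache-image removes start-up cold misses, the following toy model may help. It is a minimal sketch, not the simulator or the policy used in the paper: the direct-mapped organization, the `run` function, the cache parameters, and the start-up trace are all hypothetical and chosen only to keep the example small.

```python
def run(trace, num_sets, line_size, image=None):
    """Simulate a direct-mapped instruction cache; return (misses, final tags).

    Hypothetical toy model: tags[i] holds the tag stored in set i,
    or None if the set is still empty (cold).
    """
    # Start from a pre-created cache-image if one is given, else from a cold cache.
    tags = list(image) if image is not None else [None] * num_sets
    misses = 0
    for addr in trace:
        block = addr // line_size
        index, tag = block % num_sets, block // num_sets
        if tags[index] != tag:
            misses += 1          # cold or conflict miss: fetch the line
            tags[index] = tag
    return misses, tags


# Hypothetical start-up trace: 64 four-byte instructions executed 3 times.
trace = [pc for _ in range(3) for pc in range(0, 256, 4)]

# First run starts cold; its final cache contents serve as the cache-image.
cold_misses, image = run(trace, num_sets=16, line_size=16)

# Second run starts with the image already loaded (the effect of burst loading).
warm_misses, _ = run(trace, num_sets=16, line_size=16, image=image)

print(cold_misses, warm_misses)  # prints "16 0"
```

In this sketch the image is simply the cache contents left by an earlier run. Deciding when such an image has grown stale or too large, which is the maintenance problem the abstract describes, is outside what this example models.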

Acknowledgement

Supported by: National Research Foundation of Korea
