A Level One Cache Organization for Chip-Size Limited Single Processor

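  • 주영관 (Department of Computer Science, Chungbuk National University);
  • 김석일 (School of Electrical and Computer Engineering, Chungbuk National University; Research Center for Ubiquitous Bio-Information Technology)
  • Published: 2005.04.01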

Abstract

This paper analyzes, by simulation, the proper ratio between the size of the demand-fetch cache $L_1$ and the size of the prefetch cache $L_P$, which together make up the space-limited level 1 cache of a single-chip processor, when the sum of the two sizes is held constant. The experiments show that when $L_1$ and $L_P$ total 16 KB, a level 1 cache with a 12 KB $L_1$ and a 4 KB $L_P$, using OBL (one block lookahead) as the prefetch technique and FIFO as the cache replacement policy of $L_P$, performs best. The analysis also shows that when $L_1$ and $L_P$ total 32 KB or more, dynamic filtering is the more advantageous prefetch technique for $L_P$; the best-performing split is a 28 KB $L_1$ with a 4 KB $L_P$ when 32 KB is available, and a 48 KB $L_1$ with a 16 KB $L_P$ when 64 KB is available.
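The kind of partitioning experiment described above can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the simulator or configuration used in the paper: the class name `SplitL1`, the 32-byte block size, the fully associative LRU organization of $L_1$, the prefetch-on-miss trigger for OBL, and the promotion of a block from $L_P$ to $L_1$ on an $L_P$ hit are all hypothetical choices made for illustration.

```python
from collections import OrderedDict, deque


class SplitL1:
    """Toy level 1 cache split into a demand-fetch cache (L1) and a
    prefetch cache (LP). All organizational details are assumptions."""

    def __init__(self, l1_bytes, lp_bytes, block=32):
        self.block = block                  # block size in bytes (assumed)
        self.l1_cap = l1_bytes // block     # L1 capacity in blocks
        self.lp_cap = lp_bytes // block     # LP capacity in blocks
        self.l1 = OrderedDict()             # block number -> None, kept in LRU order
        self.lp = deque()                   # prefetched block numbers, FIFO order
        self.hits = 0
        self.misses = 0

    def _l1_insert(self, b):
        # Insert into L1, evicting the least recently used block if full.
        self.l1[b] = None
        self.l1.move_to_end(b)
        if len(self.l1) > self.l1_cap:
            self.l1.popitem(last=False)

    def _prefetch(self, b):
        # Place block b in LP unless it is already cached; FIFO replacement.
        if self.lp_cap == 0 or b in self.l1 or b in self.lp:
            return
        if len(self.lp) >= self.lp_cap:
            self.lp.popleft()
        self.lp.append(b)

    def access(self, addr):
        b = addr // self.block
        if b in self.l1:
            self.hits += 1
            self.l1.move_to_end(b)
            return
        if b in self.lp:
            # LP hit: count as a hit and promote the block to L1 (assumed policy).
            self.hits += 1
            self.lp.remove(b)
        else:
            self.misses += 1
        self._l1_insert(b)
        # OBL: on an L1 miss, prefetch the next sequential block into LP.
        self._prefetch(b + 1)

    def miss_ratio(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def sweep(trace, total_kb, lp_sizes_kb):
    """Return {lp_kb: miss ratio} for each split of a fixed total size."""
    results = {}
    for lp_kb in lp_sizes_kb:
        cache = SplitL1((total_kb - lp_kb) * 1024, lp_kb * 1024)
        for addr in trace:
            cache.access(addr)
        results[lp_kb] = cache.miss_ratio()
    return results
```

Calling `sweep(trace, 16, [0, 2, 4, 8])` on an address trace compares several $L_1$/$L_P$ splits of a 16 KB budget. The numbers it produces depend entirely on the trace and on the assumptions above, so they should not be read as a reproduction of the reported results.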
