A Level One Cache Organization for Chip-Size Limited Single Processor

Ju YoungKwan; Kim Sukil

doi:10.3745/KIPSTA.2005.12A.2.127

The KIPS Transactions: Part A

Vol. 12A, No. 2 / pp. 127-136 / 2005 / pISSN 1598-2831

Korea Information Processing Society



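Ju YoungKwan (Department of Computer Science, Chungbuk National University);
Kim Sukil (School of Electrical and Computer Engineering, Chungbuk National University; Ubiquitous Bio-Information Technology Research Center)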


Published: 2005.04.01

https://doi.org/10.3745/KIPSTA.2005.12A.2.127

Abstract

This paper experimentally analyzes the appropriate ratio between the sizes of the demand-fetch cache $L_1$ and the prefetch cache $L_P$, which together make up the space-limited level 1 cache of a single-chip processor, when the sum of the two sizes is held constant. The simulation results show that when $L_1$ and $L_P$ total 16 KB, the best-performing level 1 cache organization allocates 12 KB to $L_1$ and 4 KB to $L_P$, with OBL (one-block lookahead) as the prefetch technique and FIFO as the cache replacement policy of $L_P$. The analysis also shows that when the total is 32 KB or more, dynamic filtering is the more advantageous prefetch technique for $L_P$. In that range, the best performance is obtained by splitting the level 1 cache into a 28 KB $L_1$ and a 4 KB $L_P$ when 32 KB is available, and into a 48 KB $L_1$ and a 16 KB $L_P$ when 64 KB is available.
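The kind of experiment summarized in the abstract, sweeping the split of a fixed level 1 budget between $L_1$ and $L_P$, can be illustrated with a small trace-driven sketch. This is not the authors' simulator. The block size, the fully associative LRU model of $L_1$, the always-prefetch variant of OBL, and the choice to leave a block in $L_P$ after it hits there are all assumptions made only for illustration, and the names `DemandCache`, `PrefetchCache`, `miss_rate`, and `sweep` are hypothetical.

```python
from collections import OrderedDict, deque

BLOCK_BYTES = 32  # assumed block size; the abstract does not state one


class DemandCache:
    """Demand-fetch cache L_1, modeled as fully associative LRU (an assumption)."""

    def __init__(self, size_bytes):
        self.capacity = size_bytes // BLOCK_BYTES
        self.blocks = OrderedDict()

    def contains(self, block):
        return block in self.blocks

    def access(self, block):
        """Return True on a hit and refresh the block's LRU position."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def fill(self, block):
        if self.capacity == 0:
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = True


class PrefetchCache:
    """Prefetch cache L_P with FIFO replacement."""

    def __init__(self, size_bytes):
        self.capacity = size_bytes // BLOCK_BYTES
        self.queue = deque()
        self.present = set()

    def contains(self, block):
        return block in self.present

    def fill(self, block):
        if self.capacity == 0 or block in self.present:
            return
        if len(self.queue) >= self.capacity:
            self.present.discard(self.queue.popleft())  # evict the oldest block
        self.queue.append(block)
        self.present.add(block)


def miss_rate(trace, l1_bytes, lp_bytes):
    """Fraction of references that miss in both L_1 and L_P."""
    l1, lp = DemandCache(l1_bytes), PrefetchCache(lp_bytes)
    misses = 0
    for addr in trace:
        block = addr // BLOCK_BYTES
        if not l1.access(block) and not lp.contains(block):
            misses += 1
            l1.fill(block)  # demand fetch goes to L_1
        # OBL, always-prefetch variant (assumed): fetch the next block into L_P
        nxt = block + 1
        if not l1.contains(nxt) and not lp.contains(nxt):
            lp.fill(nxt)
    return misses / len(trace)


def sweep(trace, total_kb, step_kb=4):
    """Try every L_1/L_P split of total_kb in step_kb increments."""
    results = []
    for lp_kb in range(0, total_kb, step_kb):
        l1_kb = total_kb - lp_kb
        rate = miss_rate(trace, l1_kb * 1024, lp_kb * 1024)
        results.append((l1_kb, lp_kb, rate))
    return results
```

Calling `sweep(trace, 16)` on a list of byte addresses returns `(l1_kb, lp_kb, miss_rate)` tuples for the 16/0, 12/4, 8/8, and 4/12 KB splits. Which split wins depends entirely on the trace and on the modeling assumptions above, so the sketch shows the shape of the comparison rather than reproducing the reported results.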


