• Title, Summary, Keyword: Cache-miss

Search Result 89, Processing Time 0.041 seconds

An efficient pipelined architecture for 3D graphics accelerator (3차원 그래픽 가속기의 효율적인 파이프라인 설계)

  • 우현재;정종철;이문기
    • Proceedings of the IEEK Conference
    • /
    • /
    • pp.357-360
    • /
    • 2002
  • This paper is proposed about an efficient pipelined architecture for 3D graphics accelerator to reduce Cache miss ratio. Because cache miss takes a considerable time, about 20∼30 cycle, we reduce cache miss ratio to use pre-fetch. As a result of simulation, we figure out that the miss ratio of cache depends on the size of tile, cache memory and auxiliary cache memory. We can save 6.6% cache miss ratio maximumly.

  • PDF

A Cache Controller to Maximize Effectiveness of Hierarchical Memory Architecture (계층적 메모리 구조의 효과를 극대화하는 캐시 제어기)

  • Uh Bong Yong;Ju Young Kwan;Cheon Joong Nam;Kim Suk Il
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.608-616
    • /
    • 2005
  • A cache architecture is proposed here which evokes prefetch at level 1 cache miss. Existing structures only prefetch at level 2 cache miss. In the proposed cache architecture, level 1 cache miss would select demand fetch block and prefetch block from the level 2 cache and store to level 1 cache and prefetch cache, respectively. According to an experimental analysis using 11 benchmark programs, the hierarchical cache architecture that employs both a level 1 cache prefetcher and a level 2 cache prefetcher obtained a maximum $19\%$ increased performance when compared to the cache architecture that employs only a level 2 cache prefetcher.

Analytical Models of Instruction Fetch on Superscalar Processors

  • Kim, Sun-Mo;Jung, Jin-Ha;Park, Sang-Bang
    • Proceedings of the IEEK Conference
    • /
    • /
    • pp.619-622
    • /
    • 2000
  • This research presents an analytical model to predict the instruction fetch rate on superscalar Processors. The proposed model is also able to analyze the performance relationship between cache miss and branch prediction miss. The proposed model takes into account various kind of architectural parameters such as branch instruction probability, cache miss rate, branch prediction miss rate, and etc.. To prove the correctness of the proposed model, we performed extensive simulations and compared the results with those of the analytical models. Simulation results showed that the pro-posed model can estimate the instruction fetch rate accurately within 10% error in most cases. The model is also able to show the effects of the cache miss and branch prediction miss on the performance of instruction fetch rate, which can provide an valuable information in designing a balanced system.

  • PDF

Cache Architecture Design for the Performance Improvement of OpenRISC Core (OpenRISC 코어의 성능향상을 위한 캐쉬 구조 설계)

  • Jung, Hong-Kyun;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.1
    • /
    • pp.68-75
    • /
    • 2009
  • As the recent performance of microprocessor is improving quickly, the necessity of cache is growing because of the increase of the access time of main memory. Every block of direct-mapped cache maps to one cache line. Although the mapping rule is simple, if different blocks map to one cache line, the miss ratio will be higher than the set-associative cache due to conflicts. In this paper, for the improvement of the direct-mapped cache of OpenRISC, 4-way set-associative cache is proposed. Four blocks of the main memory of the proposed cache map to one cache line so that the miss ratio is less than the direct-mapped cache. Pseudo-LRU Policy, which is one of the Line Replacement Policies, is used for decreasing the number of bits that store LRU value. The OpenRISC core including the 4-way set-associative cache was verified with FPGA emulation. As the result of performance measurement using test program, the performance of the OpenRISC core including the 4-way set-associative cache is higher than the previous one by 50% and the decrease of miss ratio is more than 15%.

Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System (내장형 시스템을 위한 에너지-성능 측면에서 효율적인 2-레벨 데이터 캐쉬 구조의 설계)

  • Lee, Jong-Min;Kim, Soon-Tae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.37 no.5
    • /
    • pp.292-303
    • /
    • 2010
  • On-chip cache memories play an important role in both performance and energy consumption points of view in resource-constrained embedded systems by filtering many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product tailored for the embedded systems. The L1 data cache is small and direct-mapped, and employs a write-through policy. In contrast, the L2 data cache is set-associative and adopts a write-back policy. Consequently, the L1 data cache is accessed in one cycle and is able to provide high cache bandwidth while the L2 data cache is effective in reducing global miss rate. To reduce the penalty of high miss rate caused by the small L1 cache and power consumption of address generation, we propose an ECP(Early Cache hit Predictor) scheme. The ECP predicts if the L1 cache has the requested data using both fast address generation and L1 cache hit prediction. To reduce high energy cost of accessing the L2 data cache due to heavy write-through traffic from the write buffer laid between the two cache levels, we propose a one-way write scheme. From our simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture shows average 3.6% and 50% improvements in overall system performance and the data cache energy consumption.

CC-GiST: A Generalized Framework for Efficiently Implementing Arbitrary Cache-Conscious Search Trees (CC-GiST: 임의의 캐시 인식 검색 트리를 효율적으로 구현하기 위한 일반화된 프레임워크)

  • Loh, Woong-Kee;Kim, Won-Sik;Han, Wook-Shin
    • The KIPS Transactions:PartD
    • /
    • v.14D no.1
    • /
    • pp.21-34
    • /
    • 2007
  • According to recent rapid price drop and capacity growth of main memory, the number of applications on main memory databases is dramatically increasing. Cache miss, which means a phenomenon that the data required by CPU is not resident in cache and is accessed from main memory, is one of the major causes of performance degradation of main memory databases. Several cache-conscious trees have been proposed for reducing cache miss and making the most use of cache in main memory databases. Since each cache-conscious tree has its own unique features, more than one cache-conscious tree can be used in a single application depending on the application's requirement. Moreover, if there is no existing cache-conscious tree that satisfies the application's requirement, we should implement a new cache-conscious tree only for the application's sake. In this paper, we propose the cache-conscious generalized search tree (CC-GiST). The CC-GiST is an extension of the disk-based generalized search tree (GiST) [HNP95] to be tache-conscious, and provides the entire common features and algorithms in the existing cache-conscious trees including pointer compression and key compression techniques. For implementing a cache-conscious tree based on the CC-GiST proposed in this paper, one should implement only a few functions specific to the cache-conscious tree. We show how to implement the most representative cache-conscious trees such as the CSB+-tree, the pkB-tree, and the CR-tree based on the CC-GiST. The CC-GiST eliminates the troublesomeness caused by managing mire than one cache-conscious tree in an application, and provides a framework for efficiently implementing arbitrary cache-conscious trees with new features.

Bitmap-based Prefix Caching for Fast IP Lookup

  • Kim, Jinsoo;Ko, Myeong-Cheol;Nam, Junghyun;Kim, Junghwan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.3
    • /
    • pp.873-889
    • /
    • 2014
  • IP address lookup is very crucial in performance of routers. Several works have been done on prefix caching to enhance the performance of IP address lookup. Since a prefix represents a range of IP addresses, a prefix cache shows better performance than an IP address cache. However, not every prefix is cacheable in itself. In a prefix cache it causes false hit to cache a non-leaf prefix because there is possibly the longer matching prefix in the routing table. Prefix expansion techniques such as complete prefix tree expansion (CPTE) make it possible to cache the non-leaf prefixes as the expanded forms, but it is hard to manage the expanded prefixes. The expanded prefixes sometimes incur a great deal of update overhead in a routing table. We propose a bitmap-based prefix cache (BMCache) to provide low update overhead as well as low cache miss ratio. The proposed scheme does not have any expanded prefixes in the routing table, but it can expand a non-leaf prefix using a bitmap on caching time. The trace-driven simulation shows that BMCache has very low miss ratio in spite of its low update overhead compared to other schemes.

An Implementation of a Memory Operation System Architecture for Memory Latency Penalty Reduction in SIMT Based Stream Processor (Memory Latency Penalty를 개선한 SIMT 기반 Stream Processor의 Memory Operation System Architecture 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.18 no.3
    • /
    • pp.392-397
    • /
    • 2014
  • In this paper, we propose a memory operation system architecture for memory latency penalty reduction in SIMT architecture based stream processor. The proposed architecture applied non-blocking cache architecture to reduce cache miss penalty generated by blocking cache architecture. We verified that the proposed memory operation architecture improve the performance of the stream processor by comparing processing performances of various algorithms. We measured the performance improvement rate that was improved in accordance with the ratio of memory instruction in each algorithm. As a result, we confirmed that the performance of stream processor improves up to minimum 8.2% and maximum 46.5%.

Warp-Based Load/Store Reordering to Improve GPU Time Predictability

  • Huangfu, Yijie;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.11 no.2
    • /
    • pp.58-68
    • /
    • 2017
  • While graphics processing units (GPUs) can be used to improve the performance of real-time embedded applications that require high throughput, it is challenging to estimate the worst-case execution time (WCET) of GPU programs, because modern GPUs are designed for improving the average-case performance rather than time predictability. In this paper, a reordering framework is proposed to regulate the access to the GPU data cache, which helps to improve the accuracy of the estimation of GPU L1 data cache miss rate with low performance overhead. Also, with the improved cache miss rate estimation, tighter WCET estimations can be achieved for GPU programs.

High Performance Data Cache Memory Architecture (고성능 데이터 캐시 메모리 구조)

  • Kim, Hong-Sik;Kim, Cheong-Ghil
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.9 no.4
    • /
    • pp.945-951
    • /
    • 2008
  • In this paper, a new high performance data cache scheme that improves exploitation of both the spatial and temporal locality is proposed. The proposed data cache consists of a hardware prefetch unit and two sub-caches such as a direct-mapped (DM) cache with a large block size and a fully associative buffer with a small block size. Spatial locality is exploited by fetching and storing large blocks into a direct mapped cache, and is enhanced by prefetching a neighboring block when a DM cache hit occurs. Temporal locality is exploited by storing small blocks from the DM cache in the fully associative buffer according to their activity in the DM cache when they are replaced. Experimental results on Spec2000 programs show that the proposed scheme can reduce the average miss ratio by $12.53%\sim23.62%$ and the AMAT by $14.67%\sim18.60%$ compared to the previous schemes such as direct mapped cache, 4-way set associative cache and SMI(selective mode intelligent) cache[8].