On-Chip Debug Architecture for Multicore Processor

  • Park, Hyeong-Bae;Xu, Jing-Zhe;Kim, Kil-Hyun;Park, Ju-Sung
    • ETRI Journal / v.34 no.1 / pp.44-54 / 2012
  • Because of the intrinsic lack of internal-system observability and controllability in highly integrated multicore processors, only very restricted access is available for debugging erroneous chip behavior. Therefore, building an efficient debug function is an important consideration in the design of multicore processors. In this paper, we propose a flexible on-chip debug architecture that embeds special logic supporting the debug functionality in the multicore processor. It is designed to support run-stop-type debug functions that can halt and control the execution of the multicore processor at breakpoint events and inspect the possible causes of any errors. The debug architecture consists of three functional components: the core debug support block, the multicore debug support block, and the debug interface and control block. With this debug infrastructure embedded, the processor cores within the multicore processor can be debugged simultaneously as well as independently. Debug control is performed through a JTAG-based scanning operation. We apply this on-chip debug architecture to build a debugger for a prototype multicore processor and demonstrate the validity and scalability of our approach.

Easily Adaptable On-Chip Debug Architecture for Multicore Processors

  • Xu, Jing-Zhe;Park, Hyeongbae;Jung, Seungpyo;Park, Ju Sung
    • ETRI Journal / v.35 no.2 / pp.301-310 / 2013
  • Multicore processors now attract worldwide interest. As system-on-chip design technology has advanced, observing and controlling a processor core's internal state has become harder, which makes multicore processor debugging difficult and time-consuming. A reliable and efficient debugger is therefore needed to find bugs. In this paper, we propose an on-chip debug architecture for multicore processors that is easily adaptable and flexible. It is based on the JTAG standard and supports monitoring mode debugging, which is different from run-stop mode debugging. Compared with a debug architecture that supports run-stop mode debugging, the proposed architecture is easily applied to a debugger and has a favorable gate count and execution cycle count. To verify the on-chip debug architecture, we apply it to the debugger of a prototype multicore processor and test it by connecting it to a GDB-based software debugger configured for the target processor.

Reevaluating the overhead of data preparation for asymmetric multicore system on graphics processing

  • Pei, Songwen;Zhang, Junge;Jiang, Linhua;Kim, Myoung-Seo;Gaudiot, Jean-Luc
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.7 / pp.3231-3244 / 2016
  • As processor design has been transitioning from homogeneous to heterogeneous multicore processors, the traditional Amdahl's law cannot meet the new challenges of asymmetric multicore systems. To further investigate the factors that affect the Overhead of Data Preparation (ODP) in asymmetric multicore systems, we evaluate an asymmetric CPU-GPU multicore system by measuring the overheads of memory transfer, kernel computation, cache misses, and synchronization. This paper demonstrates that decreasing the overhead of data preparation is a promising approach to improving the overall performance of heterogeneous systems.
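
For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of the classic Amdahl's law speedup, followed by a purely illustrative variant that adds a data-preparation term. The function names, the `odp_fraction` parameter, and the way the overhead enters the denominator are assumptions made here for illustration; the abstract does not give the paper's actual model.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Classic Amdahl's law: speedup = 1 / ((1 - f) + f / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def speedup_with_odp(parallel_fraction: float, workers: int,
                     odp_fraction: float) -> float:
    """Hypothetical variant (an assumption, not the paper's formula).

    odp_fraction is the data-preparation time (memory transfer,
    synchronization, and so on) expressed as a fraction of the original
    single-worker execution time; it is simply added to the denominator.
    """
    if odp_fraction < 0.0:
        raise ValueError("odp_fraction must be non-negative")
    base = 1.0 / amdahl_speedup(parallel_fraction, workers)
    return 1.0 / (base + odp_fraction)


if __name__ == "__main__":
    print(amdahl_speedup(0.9, 10))
    print(speedup_with_odp(0.9, 10, 0.05))
```

Under this assumed model, any non-zero data-preparation overhead lowers the achievable speedup, which is consistent with the abstract's claim that reducing ODP improves overall performance.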

The DRAM Effects on The Performance of Multicore Processors (멀티코어 프로세서의 성능에 대한 DRAM의 영향)

  • Lee, Jongbok
    • The Journal of The Institute of Internet, Broadcasting and Communication / v.17 no.3 / pp.203-208 / 2017
  • DRAM has recently become very important in multicore processors, which are widely used in computers, laptops, tablet PCs, and mobile devices. To keep up with this trend, both industry and academia have actively studied various types of future DRAMs. An accurate DRAM model is therefore required when evaluating multicore processor performance. In this paper, a trace-driven multicore processor simulator that can be coupled with a cycle-accurate DRAM simulator has been developed. Using SPEC 2000 benchmarks as input, the effect of a cycle-accurate DDR3 model on multicore processor performance has been evaluated.

Performance Study of Asymmetric Multicore Processor Architectures (비대칭적 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of The Institute of Internet, Broadcasting and Communication / v.14 no.3 / pp.163-169 / 2014
  • The importance of multicore processor systems has recently been growing rapidly. Multicore processors are classified as either symmetric or asymmetric. Asymmetric multicore processors consist of a high-performance complex core and a number of low-performance simple cores, and are known to be more efficient than symmetric multicore processors. Therefore, the performance impact of various asymmetric multicore processor configurations needs to be studied. Using SPEC 2000 benchmarks as input, trace-driven simulation has been performed for different asymmetric quad-core and octa-core processors, and the results are compared with those of the corresponding symmetric processors.

A Study on Power Dissipation of The Multicore Processor (멀티코어 프로세서의 전력 소비에 대한 연구)

  • Lee, Jongbok
    • The Journal of The Institute of Internet, Broadcasting and Communication / v.17 no.2 / pp.251-256 / 2017
  • Multicore processor systems have recently been widely adopted not only in general-purpose computers but also in embedded systems and mobile devices in order to improve performance. Since power dissipation is a significant issue in multicore processor systems, it must be estimated accurately in the early design stage. In this paper, a fast power analysis tool for a high-performance multicore processor, based on a trace-driven simulator, has been developed. To achieve this, the power dissipation of each hardware unit in each core is summed. Using SPEC 2000 benchmarks as input, trace-driven simulation has been performed to estimate the average power dissipation per instruction.

Performance Analysis of Multicore Out-of-Order Superscalar Processor with Multiple Basic Block Execution (다중블럭을 실행하는 멀티코어 비순차 수퍼스칼라 프로세서의 성능 분석)

  • Lee, Jong Bok
    • Journal of Korea Multimedia Society / v.16 no.2 / pp.198-205 / 2013
  • In this paper, we analyze the performance of a multicore processor architecture whose out-of-order superscalar cores use multiple basic block execution. Using SPEC 2000 benchmarks as input, trace-driven simulation has been performed for out-of-order superscalar processors with window sizes from 32 to 64 and core counts between 1 and 16, executing from 1 to 4 basic blocks. As a result, the multicore out-of-order superscalar processor with 4-basic-block execution achieves a 22.0% average performance increase over the same architecture with single basic block execution.

Performance Study of Multicore Digital Signal Processor Architectures (멀티코어 디지털 신호처리 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of The Institute of Internet, Broadcasting and Communication / v.13 no.4 / pp.171-177 / 2013
  • Due to the demand for high-speed 3D graphics rendering, video file format conversion, compression, encryption, and decryption technologies, the importance of digital signal processor systems is growing rapidly. High-performance digital signal processors are required to satisfy real-time constraints. Therefore, as in general-purpose computer systems, digital signal processors should also be designed with a multicore architecture. Using UTDSP benchmarks as input, trace-driven simulation has been performed and analyzed for 2-core to 16-core digital signal processor architectures, with cores ranging from simple RISC to in-order and out-of-order superscalar processors across various window sizes.

Low-power Filter Cache Design Technique for Multicore Processors (멀티 코어 프로세서를 위한 저전력 필터 캐쉬 설계 기법)

  • Park, Young-Jin;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information / v.14 no.12 / pp.9-16 / 2009
  • Energy consumption as well as performance should be considered when designing modern multicore processors. In this paper, we propose a new design technique that reduces the energy consumption of the instruction cache in multicore processors by using a modified filter cache. The filter cache has been recognized as one of the most energy-efficient design techniques for single-core processors. The energy consumed in the instruction cache accounts for a significant portion of total processor energy consumption, so energy-aware instruction cache design techniques are essential for reducing the energy consumption of a multicore processor. The proposed technique reduces instruction cache energy consumption by reducing the number of accesses to the level-1 instruction cache. We evaluate the proposed design using a simulation infrastructure based on SimpleScalar and CACTI. Simulation results show that the proposed architecture reduces the energy consumption of the instruction cache in multicore processors by up to 3.4% compared to the conventional filter cache architecture. Moreover, the proposed architecture performs better than the conventional filter cache architecture.
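
The access-reduction idea in this abstract can be illustrated with a minimal sketch of a conventional filter cache: a tiny cache placed in front of the level-1 instruction cache, so that only filter-cache misses reach L1. The direct-mapped organization, the function name, and the line-address trace format are assumptions for illustration; this models the conventional scheme, not the paper's modified design, which the abstract does not detail.

```python
def simulate_filter_cache(line_addresses, filter_lines):
    """Count filter-cache hits and the accesses that fall through to L1.

    line_addresses: iterable of instruction cache line numbers (a trace).
    filter_lines:   number of lines in the direct-mapped filter cache.
    Returns (filter_hits, l1_accesses).
    """
    if filter_lines < 1:
        raise ValueError("filter_lines must be at least 1")
    slots = [None] * filter_lines  # one stored line number per slot
    filter_hits = 0
    l1_accesses = 0
    for line in line_addresses:
        index = line % filter_lines
        if slots[index] == line:
            filter_hits += 1       # served by the filter cache, L1 untouched
        else:
            l1_accesses += 1       # miss: fetch from L1 and fill the slot
            slots[index] = line
    return filter_hits, l1_accesses


if __name__ == "__main__":
    trace = [0, 1, 0, 1, 2, 0]
    print(simulate_filter_cache(trace, filter_lines=2))
```

In this toy trace, tight loops that fit in the filter cache hit repeatedly, so fewer accesses reach the larger and more power-hungry L1 instruction cache.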

Counter-Based Approaches for Efficient WCET Analysis of Multicore Processors with Shared Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering / v.7 no.4 / pp.285-299 / 2013
  • To enable hard real-time systems to take advantage of multicore processors, it is crucial to obtain the worst-case execution time (WCET) for programs running on multicore processors. However, this is challenging and complicated due to inter-thread interference from the shared resources in a multicore processor. Recent research used the combined cache conflict graph (CCCG) to model and compute the worst-case inter-thread interference on a shared L2 cache in a multicore processor, which is called the CCCG-based approach in this paper. Although it can compute the WCET safely and accurately, its computational complexity is exponential and prohibitive for a large number of cores. In this paper, we propose three counter-based approaches to significantly reduce the complexity of the multicore WCET analysis, while achieving absolute safety with tightness close to the CCCG-based approach. The basic counter-based approach simply counts the worst-case number of cache line blocks mapped to a cache set of a shared L2 cache from all the concurrent threads, and compares it with the associativity of the cache set to compute the worst-case cache behavior. The enhanced counter-based approach uses techniques to enhance the accuracy of calculating the counters. The hybrid counter-based approach combines the enhanced counter-based approach and the CCCG-based approach to further improve the tightness of analysis without significantly increasing the complexity. Our experiments on a 4-core processor indicate that the enhanced counter-based approach overestimates the WCET by 14% on average compared to the CCCG-based approach, while its average running time is less than 1/380 that of the CCCG-based approach. The hybrid approach reduces the overestimation to only 2.65%, while its running time is less than 1/150 that of the CCCG-based approach on average.
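
The basic counter-based idea described in this abstract can be sketched roughly as follows: count, per shared-cache set, how many cache line blocks from all concurrent threads map to it, then compare that count with the associativity. The function names, the modulo set mapping, the input format, and the "fits" / "may_conflict" labels are assumptions for illustration; the paper's actual analysis is more detailed than this sketch.

```python
from collections import Counter


def count_blocks_per_set(thread_blocks, num_sets):
    """Count blocks mapped to each cache set across all threads.

    thread_blocks: dict mapping a thread id to the block numbers it may
                   access (assumed to be distinct blocks per thread).
    num_sets:      number of sets in the shared cache.
    """
    counters = Counter()
    for blocks in thread_blocks.values():
        for block in set(blocks):          # each block counted once per thread
            counters[block % num_sets] += 1
    return counters


def classify_sets(thread_blocks, num_sets, associativity):
    """Label each used set by comparing its counter with the associativity.

    A set whose counter does not exceed the associativity can hold every
    block mapped to it, so no inter-thread eviction is assumed there;
    otherwise the set is conservatively marked as possibly conflicting.
    """
    counters = count_blocks_per_set(thread_blocks, num_sets)
    return {
        cache_set: "fits" if count <= associativity else "may_conflict"
        for cache_set, count in counters.items()
    }


if __name__ == "__main__":
    blocks = {0: [0, 4, 8], 1: [1, 12]}
    print(classify_sets(blocks, num_sets=4, associativity=2))
```

The appeal of this style of analysis is that it needs only one pass over the blocks per thread, rather than enumerating interleavings, at the cost of being pessimistic for any set that overflows.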