• Title/Summary/Keyword: Hardware Accelerator

Energy Efficient and Low-Cost Server Architecture for Hadoop Storage Appliance

  • Choi, Do Young;Oh, Jung Hwan;Kim, Ji Kwang;Lee, Seung Eun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.12
    • /
    • pp.4648-4663
    • /
    • 2020
  • This paper proposes a Lempel-Ziv 4 (LZ4) compression accelerator optimized for scale-out servers in data centers. To reduce the CPU load caused by compression, we propose an accelerator solution and implement it on a Field Programmable Gate Array (FPGA) as heterogeneous computing. The LZ4 compression hardware accelerator is a fully pipelined architecture and applies 16 dictionaries to enhance parallelism for a high-throughput compressor. Our hardware accelerator is based on a 20-stage pipeline and a dictionary architecture highly customized to the LZ4 compression algorithm and to parallel hardware implementation. The proposed dictionary architecture achieves high throughput by comparing input sequences against multiple dictionaries simultaneously rather than a single dictionary. The experimental results show high throughput from the intensively optimized FPGA design. Additionally, we compare our implementation to CPU implementations of LZ4 to provide insights for FPGA-based data centers. The proposed accelerator achieves a compression throughput of 639 MB/s with fine-grained parallelism suitable for deployment in scale-out servers. This approach enables a low-power Intel Atom processor to realize Hadoop storage alongside the compression accelerator.
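
The following minimal Python sketch illustrates the multi-dictionary matching idea behind the accelerator: candidate 4-byte sequences are hashed into several dictionary banks so that, in hardware, all banks can be searched in the same cycle. The constants and helper names (NUM_DICTS, MIN_MATCH, find_matches) are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of multi-dictionary match search in the spirit of an
# LZ4-style compressor. Constants and helper names are hypothetical; the
# paper's hardware uses 16 dictionaries and a 20-stage pipeline.

NUM_DICTS = 16   # number of parallel dictionary banks
MIN_MATCH = 4    # LZ4 matches are at least 4 bytes long

def bank_of(seq: bytes) -> int:
    """Pick which dictionary bank a 4-byte sequence belongs to."""
    return hash(seq) % NUM_DICTS

def find_matches(data: bytes):
    """Return (position, match_offset, match_length) tuples.

    In hardware the NUM_DICTS lookups happen in the same cycle; here they
    are just separate Python dicts keyed by 4-byte sequences.
    """
    dicts = [dict() for _ in range(NUM_DICTS)]
    matches = []
    pos = 0
    while pos + MIN_MATCH <= len(data):
        seq = data[pos:pos + MIN_MATCH]
        bank = dicts[bank_of(seq)]
        prev = bank.get(seq)
        if prev is not None:
            # Extend the match greedily, as LZ4 does.
            length = MIN_MATCH
            while (pos + length < len(data)
                   and data[prev + length] == data[pos + length]):
                length += 1
            matches.append((pos, pos - prev, length))
            # Record positions inside the match, then skip past it.
            for p in range(pos, pos + length):
                if p + MIN_MATCH <= len(data):
                    s = data[p:p + MIN_MATCH]
                    dicts[bank_of(s)][s] = p
            pos += length
        else:
            bank[seq] = pos
            pos += 1
    return matches

if __name__ == "__main__":
    sample = b"abcabcabcabcxyzxyzxyz" * 4
    print(find_matches(sample)[:5])
```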

FPGA-Based Post-Quantum Cryptography Hardware Accelerator Design using High Level Synthesis (HLS 를 이용한 FPGA 기반 양자내성암호 하드웨어 가속기 설계)

  • Haesung Jung;Hanyoung Lee;Hanho Lee
    • Transactions on Semiconductor Engineering
    • /
    • v.1 no.1
    • /
    • pp.1-8
    • /
    • 2023
  • This paper presents the design and implementation of Crystals-Kyber, a next-generation post-quantum cryptography algorithm, as a hardware accelerator on an FPGA using High-Level Synthesis (HLS). We optimized the Crystals-Kyber algorithm using various directives provided by Vitis HLS, configured the AXI interface, and designed a hardware accelerator that can be implemented on an FPGA. We then used the Vivado tool to design the IP block and implement it on the ZYNQ ZCU106 FPGA. Finally, video was recorded and H.264-compressed with Python code in the PYNQ framework, and video encryption and decryption were accelerated using the Crystals-Kyber hardware accelerator implemented on the FPGA.
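
As a rough illustration of how an HLS-generated accelerator with an AXI interface is typically driven from Python in the PYNQ framework, the hedged sketch below loads a bitstream, allocates DMA-visible buffers, and runs the standard HLS start/done handshake. The bitstream name, IP instance name, register offsets, and buffer sizes are assumptions, not details from the paper; only Overlay, allocate, and the read/write/sync calls are real PYNQ APIs.

```python
# Hypothetical PYNQ host-side sketch for driving an HLS-generated Kyber
# accelerator over AXI. The bitstream name, IP instance name, register
# offsets, and buffer sizes are placeholders, not taken from the paper.
import numpy as np
from pynq import Overlay, allocate

ol = Overlay("kyber_accel.bit")          # load the FPGA bitstream
ip = ol.kyber_accel_0                    # HLS IP instance (name is assumed)

# Physically contiguous buffers visible to the accelerator's AXI master.
msg_buf = allocate(shape=(32,), dtype=np.uint8)   # e.g. a 256-bit message
ct_buf = allocate(shape=(1088,), dtype=np.uint8)  # Kyber-768 ciphertext size

msg_buf[:] = 0                           # placeholder message contents
msg_buf.sync_to_device()

# Standard HLS ap_ctrl_hs handshake: argument registers, then ap_start.
ip.write(0x10, msg_buf.physical_address)  # offsets depend on the HLS design
ip.write(0x18, ct_buf.physical_address)
ip.write(0x00, 0x01)                      # ap_start
while (ip.read(0x00) & 0x2) == 0:         # wait for ap_done
    pass

ct_buf.sync_from_device()
print("ciphertext bytes:", bytes(ct_buf[:16]).hex(), "...")
```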

FPGA based Implementation of FAST and BRIEF algorithm for Object Recognition (객체인식을 위한 FAST와 BRIEF 알고리즘 기반 FPGA 설계)

  • Heo, Hoon;Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.17 no.2
    • /
    • pp.202-207
    • /
    • 2013
  • This paper implements the conventional FAST and BRIEF algorithms in hardware on the Zynq-7000 SoC platform. Previous feature-based hardware accelerators are mostly implemented using the SIFT or SURF algorithm, which requires excessive internal memory and hardware cost. The proposed FAST & BRIEF accelerator reduces internal memory usage by approximately 57% and hardware cost by 70% compared to conventional SIFT or SURF accelerators, and it processes 0.17 pixels per clock.
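
For reference, the same FAST + BRIEF pipeline can be run in software with OpenCV, which is a convenient baseline when validating such an accelerator. This is a hedged sketch: the BRIEF extractor lives in the opencv-contrib package (cv2.xfeatures2d), and the image path and parameter values are placeholders.

```python
# Software reference for the FAST + BRIEF pipeline that the accelerator
# implements in hardware. Requires opencv-contrib-python for the BRIEF
# descriptor (cv2.xfeatures2d); the image path is a placeholder.
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# FAST corner detector: compares each pixel against a ring of 16 neighbours.
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(img, None)

# BRIEF descriptor: 256-bit binary string built from pairwise intensity tests.
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create(bytes=32)
keypoints, descriptors = brief.compute(img, keypoints)

print(f"{len(keypoints)} keypoints, descriptor shape {descriptors.shape}")
# Binary descriptors are matched with Hamming distance, e.g.:
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
```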

Hardware and Software Co-Design Platform for Energy-Efficient FPGA Accelerator Design (에너지 효율적인 FPGA 가속기 설계를 위한 하드웨어 및 소프트웨어 공동 설계 플랫폼)

  • Lee, Dongkyu;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.1
    • /
    • pp.20-26
    • /
    • 2021
  • Recent systems contain hardware and software components together for faster execution and lower power consumption. In conventional hardware/software co-design, the ratio of software to hardware is divided according to the designer's empirical knowledge. To find optimal results, designers iteratively reconfigure accelerators and applications and simulate them. Simulating iteratively while making design changes is time-consuming. In this paper, we propose a hardware and software co-design platform for energy-efficient FPGA accelerator design. The proposed platform makes it easy for designers to find an appropriate hardware ratio by automatically generating application code and hardware code from parameterized accelerator components. The co-design platform, based on the Vitis unified software platform, runs on a server with a Xilinx Alveo U200 FPGA card. As a result of optimizing a multiplication accelerator for two matrices with 1000 rows, execution time was reduced by 90.7% and power consumption by 56.3%.
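
A hedged sketch of what parameterizing the accelerator might look like on the host side follows: candidate configurations are enumerated and ranked with a toy cost model. The parameter names, cost model, and build/measure steps are illustrative assumptions, not the platform's actual interface.

```python
# Illustrative sweep over accelerator parameters, in the spirit of the
# co-design platform described above. The parameter names, cost model,
# and build/measure steps are placeholders, not the platform's real API.
from dataclasses import dataclass
from itertools import product

@dataclass
class AcceleratorConfig:
    tile_size: int        # rows/cols of the matrix tile kept on-chip
    unroll_factor: int    # parallel multiply-accumulate lanes
    use_bram_cache: bool  # buffer operands in BRAM vs. stream from DRAM

def estimated_cost(cfg: AcceleratorConfig, n: int = 1000) -> float:
    """Toy cost model: runtime proxy = work / parallelism + transfer penalty."""
    compute = (n ** 3) / (cfg.tile_size * cfg.unroll_factor)
    transfer = (n ** 2) * (1.0 if cfg.use_bram_cache else 3.0)
    return compute + transfer

candidates = [
    AcceleratorConfig(t, u, b)
    for t, u, b in product([8, 16, 32], [2, 4, 8], [False, True])
]
best = min(candidates, key=estimated_cost)
print("chosen configuration:", best)
# A real flow would emit HLS pragmas / RTL parameters for `best`,
# rebuild the bitstream with Vitis, and measure on the Alveo U200.
```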

Design of Open Vector Graphics Accelerator for Mobile Vector Graphics (모바일 벡터 그래픽을 위한 OpenVG 가속기 설계)

  • Kim, Young-Ouk;Roh, Young-Sup
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.10
    • /
    • pp.1460-1470
    • /
    • 2008
  • As the performance of recent mobile systems increases, vector graphics have been used to represent various types of dynamic menus, mails, and two-dimensional maps. This paper proposes a hardware accelerator for OpenVG, which is widely used for two-dimensional vector graphics. We analyze the OpenVG specification and divide it into several functions suitable for hardware implementation. The proposed hardware accelerator is implemented on a field programmable gate array (FPGA) board using a hardware description language (HDL) and is about four times faster than an Alex processor.
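
One of the functional blocks such an accelerator typically implements is path rasterization. The sketch below performs a minimal even-odd scanline fill of a flattened (polygonal) path; it is purely illustrative and does not reflect the architecture in the paper.

```python
# Minimal scanline fill of a flattened (polygonal) path, illustrating one
# of the pipeline stages an OpenVG accelerator typically implements in
# hardware. Purely illustrative; not the architecture from the paper.
def scanline_fill(vertices, width, height):
    """Even-odd fill of a closed polygon given as (x, y) float vertices."""
    image = [[0] * width for _ in range(height)]
    n = len(vertices)
    for y in range(height):
        yc = y + 0.5                       # sample at pixel centre
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
            if (y0 <= yc < y1) or (y1 <= yc < y0):
                # x where the edge crosses this scanline
                crossings.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        for j in range(0, len(crossings) - 1, 2):
            for x in range(int(crossings[j] + 0.5), int(crossings[j + 1] + 0.5)):
                if 0 <= x < width:
                    image[y][x] = 1
    return image

if __name__ == "__main__":
    triangle = [(2.0, 1.0), (13.0, 3.0), (5.0, 11.0)]
    for row in scanline_fill(triangle, 16, 12):
        print("".join("#" if p else "." for p in row))
```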

Development of BPM System using EPICS (1) (EPICS 를 이용한 BPM시스템 개발 (1))

  • Lee, Eun-H.;Yun, Jong-C.;Lee, Jin-W.;Choi, Jin-H.;Hwang, Jung-Y.;Nam, Sang-H.
    • Proceedings of the KIEE Conference
    • /
    • 2002.07d
    • /
    • pp.2325-2327
    • /
    • 2002
  • The Pohang Accelerator Laboratory (PAL) is migrating the control system used since the Pohang Light Source (PLS) began operation in 1994 to a new environment, the EPICS (Experimental Physics and Industrial Control System) system. The EPICS system is organized in two layers, the IOC (Input/Output Controller) and the OPI (Operator Interface); compared with the existing three-layer structure of MIU (Machine Interface Unit), SCC (Subsystem Computer Control System), and HMI (Human Machine Interface), the SCC layer is eliminated. Communication between the two layers is performed through Channel Access in a client (OPI)/server (IOC) structure. The EPICS system under development has an open architecture: even if the IOC and OPI run on operating systems or hardware different from those used during development, the data of any subsystem IOC can be accessed through the single common component, Channel Access. Within the overall EPICS control system, the BPM (Beam Position Monitoring) and MPS (Magnet Power Supply) systems, which are central to storage ring operation, are being developed using an MVME5100 board (target machine) running VxWorks (operating system) for the IOC and a Sun workstation (host machine) running Solaris (operating system) for the OPI. This paper describes the installation procedures and methods for the IOC and OPI.
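
A hedged OPI-side example of talking to an IOC over Channel Access from Python (pyepics) is shown below. The process-variable names are placeholders, not actual PAL records; only caget, caput, and PV are real pyepics calls.

```python
# Hedged OPI-side sketch: reading beam-position PVs over EPICS Channel
# Access with pyepics. The PV names are placeholders, not actual PAL
# records; only caget/caput/PV are real pyepics calls.
from epics import caget, caput, PV

# Read a (hypothetical) horizontal beam-position record served by an IOC.
x_pos = caget("PLS:BPM01:X")
print("BPM01 x position:", x_pos)

# Subscribe to updates instead of polling.
def on_change(pvname=None, value=None, **kwargs):
    print(f"{pvname} -> {value}")

bpm_y = PV("PLS:BPM01:Y", callback=on_change)

# Write a setpoint to a (hypothetical) magnet power supply record.
caput("PLS:MPS:Q1:CURRENT_SP", 12.5)
```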

A Programmable SoC for Various Image Applications Based on Mobile Devices

  • Lee, Bongkyu
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.3
    • /
    • pp.324-332
    • /
    • 2014
  • This paper presents a programmable System-on-a-Chip (SoC) for various embedded applications that require neural network computations. The system is fully implemented on a Field-Programmable Gate Array (FPGA)-based prototyping platform. The SoC consists of an embedded processor core and a reconfigurable hardware accelerator for neural computations. The performance of the SoC is evaluated using real image-processing applications, such as an optical character recognition (OCR) system.
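
The kind of per-layer computation such a reconfigurable neural accelerator offloads is a matrix-vector multiply followed by a nonlinearity, as in the hedged sketch below. The layer sizes and weights are illustrative (e.g., a flattened 28x28 glyph for OCR), not taken from the paper.

```python
# Sketch of the per-layer computation such a neural accelerator offloads:
# a matrix-vector multiply followed by a nonlinearity. Layer sizes are
# illustrative (e.g. 28x28 character bitmaps for OCR), not from the paper.
import numpy as np

rng = np.random.default_rng(0)

def dense_layer(x, weights, bias):
    """y = f(Wx + b); in the SoC this product is the accelerated kernel."""
    return np.maximum(weights @ x + bias, 0.0)   # ReLU activation

# Tiny 2-layer classifier over a flattened 28x28 glyph.
W1, b1 = rng.standard_normal((128, 784)) * 0.05, np.zeros(128)
W2, b2 = rng.standard_normal((10, 128)) * 0.05, np.zeros(10)

glyph = rng.random(784)                 # stand-in for a character bitmap
hidden = dense_layer(glyph, W1, b1)
logits = W2 @ hidden + b2
print("predicted class:", int(np.argmax(logits)))
```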

A Security SoC embedded with ECDSA Hardware Accelerator (ECDSA 하드웨어 가속기가 내장된 보안 SoC)

  • Jeong, Young-Su;Kim, Min-Ju;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.7
    • /
    • pp.1071-1077
    • /
    • 2022
  • A security SoC that can be used to implement elliptic curve cryptography (ECC)-based public-key infrastructures was designed. The security SoC has an architecture in which a hardware accelerator for the elliptic curve digital signature algorithm (ECDSA) is interfaced with a Cortex-A53 CPU over the AXI4-Lite bus. The ECDSA hardware accelerator, which consists of a high-performance ECC processor, a SHA3 hash core, a true random number generator (TRNG), a modular multiplier, BRAM, and a control FSM, was designed to perform high-performance ECDSA signature generation and verification with minimal CPU control. The security SoC was implemented on a Zynq UltraScale+ MPSoC device for hardware-software co-verification, and it was verified that ECDSA signature generation or verification can be performed about 1,000 times per second at a clock frequency of 150 MHz. The ECDSA hardware accelerator uses 74,630 LUTs, 23,356 flip-flops, 32 kb of BRAM, and 36 DSP blocks.
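
As a software reference for the operations the accelerator performs, the sketch below runs ECDSA key generation, signing, and verification with Python's cryptography package. The curve and digest (P-256, SHA3-256) are illustrative choices and not necessarily the exact parameters of the SoC described above.

```python
# Software reference for the ECDSA operations the accelerator offloads:
# key generation, signature generation, and verification. The curve and
# digest here (P-256, SHA3-256) are illustrative choices, not necessarily
# the exact parameters of the SoC described above.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

private_key = ec.generate_private_key(ec.SECP256R1())
public_key = private_key.public_key()

message = b"firmware image v1.2"
signature = private_key.sign(message, ec.ECDSA(hashes.SHA3_256()))

try:
    public_key.verify(signature, message, ec.ECDSA(hashes.SHA3_256()))
    print("signature valid")
except InvalidSignature:
    print("signature invalid")
```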

A 4K-Capable Hardware Accelerator of Haze Removal Algorithm using Haze-relevant Features

  • Lee, Seungmin;Kang, Bongsoon
    • Journal of information and communication convergence engineering
    • /
    • v.20 no.3
    • /
    • pp.212-218
    • /
    • 2022
  • The performance of vision-based intelligent systems, such as self-driving cars and unmanned aerial vehicles, is subject to weather conditions, notably the frequently encountered haze or fog. As a result, studies on haze removal have garnered increasing interest from academia and industry. This paper presents a 4K-capable hardware implementation of an efficient haze removal algorithm with two improvements. First, the depth-dependent haze distribution is predicted using a linear model of four haze-relevant features, where the model parameters are obtained through maximum likelihood estimation. Second, an approximated quad-decomposition method is adopted to estimate the atmospheric light. Extensive experimental results verify the efficacy of the proposed algorithm against well-known benchmark methods. For real-time processing, this paper also presents a pipelined architecture composed of customized macros such as split multipliers, parallel dividers, and serial dividers. The implementation results demonstrate that the proposed hardware design can handle DCI 4K video at 30.8 frames per second.
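
Haze removal methods, including the one above, invert the atmospheric scattering model I = J*t + A*(1 - t) to recover the scene radiance J. The sketch below does this with simplified stand-ins for the transmission and atmospheric-light estimates; it does not reproduce the paper's four-feature linear model or quad-decomposition method.

```python
# Sketch of the atmospheric scattering model I = J*t + A*(1 - t) that haze
# removal inverts. The transmission and atmospheric-light estimates below
# are simplified stand-ins, not the paper's four-feature linear model or
# quad-decomposition method.
import numpy as np

def estimate_atmospheric_light(hazy: np.ndarray) -> np.ndarray:
    """Take the brightest 0.1% of pixels as a crude estimate of A."""
    flat = hazy.reshape(-1, 3)
    brightness = flat.mean(axis=1)
    idx = np.argsort(brightness)[-max(1, len(flat) // 1000):]
    return flat[idx].mean(axis=0)

def dehaze(hazy: np.ndarray, omega: float = 0.95, t_min: float = 0.1):
    """Recover J = (I - A) / max(t, t_min) + A with a dark-channel-style t."""
    A = estimate_atmospheric_light(hazy)
    dark = (hazy / A).min(axis=2)              # per-pixel dark channel
    t = np.clip(1.0 - omega * dark, t_min, 1.0)
    J = (hazy - A) / t[..., None] + A
    return np.clip(J, 0.0, 1.0)

if __name__ == "__main__":
    hazy = np.random.default_rng(1).random((120, 160, 3))  # stand-in frame
    print("dehazed frame shape:", dehaze(hazy).shape)
```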

Research on the Main Memory Access Count According to the On-Chip Memory Size of an Artificial Neural Network (인공 신경망 가속기 온칩 메모리 크기에 따른 주메모리 접근 횟수 추정에 대한 연구)

  • Cho, Seok-Jae;Park, Sungkyung;Park, Chester Sungchung
    • Journal of IKEEE
    • /
    • v.25 no.1
    • /
    • pp.180-192
    • /
    • 2021
  • One widely used algorithm for image recognition and pattern detection is the convolutional neural network (CNN). To efficiently handle convolution operations, which account for the majority of computations in a CNN, hardware accelerators are used to improve the performance of CNN applications. When using these hardware accelerators, the CNN fetches data from off-chip DRAM, since the massive volume of data makes it difficult to obtain performance improvements from memory inside the accelerator alone. In other words, data communication between off-chip DRAM and memory inside the accelerator has a significant impact on the performance of CNN applications. In this paper, a simulator for the CNN is developed to analyze main-memory (DRAM) accesses with respect to the size of the on-chip memory, or global buffer, inside the CNN accelerator. For AlexNet, one of the CNN architectures, simulations with increasing global buffer size show that a global buffer larger than 100 kB incurs about 0.8x the DRAM access count of a global buffer smaller than 100 kB.
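
The hedged sketch below performs the same kind of back-of-envelope analysis: DRAM traffic for a single convolution layer is estimated as a function of the global-buffer size under a simple output-channel tiling scheme. The dataflow and the AlexNet-like layer shape are simplifying assumptions, not the simulator's actual model.

```python
# Back-of-envelope estimate of DRAM traffic for one convolution layer as a
# function of global-buffer size, in the spirit of the simulator above. The
# output-tiled dataflow and layer shape are simplifying assumptions.
def dram_bytes_per_layer(H, W, C_in, C_out, K, buffer_kb, bytes_per_elem=1):
    """Tile the output channels so weights for one tile fit on chip."""
    buf = buffer_kb * 1024
    weight_bytes_per_oc = K * K * C_in * bytes_per_elem
    oc_per_tile = max(1, min(C_out, buf // weight_bytes_per_oc))
    n_tiles = -(-C_out // oc_per_tile)          # ceiling division
    ifmap = H * W * C_in * bytes_per_elem       # input feature map size
    weights = C_out * weight_bytes_per_oc
    ofmap = H * W * C_out * bytes_per_elem
    # Inputs are re-read once per output tile; weights and outputs once.
    return n_tiles * ifmap + weights + ofmap

# AlexNet-like conv2 shape: 27x27x96 input, 256 output channels, 5x5 kernel.
for kb in (32, 64, 100, 128, 256):
    mb = dram_bytes_per_layer(27, 27, 96, 256, 5, kb) / 1e6
    print(f"global buffer {kb:>3} kB -> ~{mb:5.1f} MB of DRAM traffic")
```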