Search | Korea Science

AB9: A neural processor for inference acceleration

Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
- ETRI Journal / v.42 no.4 / pp.491-504 / 2020
We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various other industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 achieves higher performance and power efficiency than a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 mm × 23 mm. Delivery is expected later this year.
https://doi.org/10.4218/etrij.2020-0134

A 0.9-V human body communication receiver using a dummy electrode and clock phase inversion scheme

Oh, Kwang-Il;Kim, Sung-Eun;Kang, Taewook;Kim, Hyuk;Lim, In-Gi;Park, Mi-Jeong;Lee, Jae-Jin;Park, Hyung-Il
- ETRI Journal / v.44 no.5 / pp.859-874 / 2022
This paper presents a low-power and lightweight human body communication (HBC) receiver with an embedded dummy electrode for improved signal acquisition. The clock data recovery (CDR) circuit in the receiver operates with a low supply voltage and utilizes a clock phase inversion scheme. The receiver is equipped with a main electrode and dummy electrode that strengthen the capacitive-coupled signal at the receiver frontend. The receiver CDR circuit exploits a clock inversion scheme to allow 0.9-V operation while achieving a shorter lock time than at 3.3-V operation. In experiments, a receiver chip fabricated using 130-nm complementary metal-oxide-semiconductor technology was demonstrated to successfully receive the transmitted signal when the transmitter and receiver are placed separately on each hand of the user while consuming only 4.98 mW at a 0.9-V supply voltage.
https://doi.org/10.4218/etrij.2022-0106

A layer-wise frequency scaling for a neural processing unit

Chung, Jaehoon;Kim, HyunMi;Shin, Kyoungseon;Lyuh, Chun-Gi;Cho, Yong Cheol Peter;Han, Jinho;Kwon, Youngsu;Gong, Young-Ho;Chung, Sung Woo
- ETRI Journal / v.44 no.5 / pp.849-858 / 2022
Dynamic voltage and frequency scaling (DVFS) has been widely adopted for runtime power management of various processing units. In neural processing units (NPUs), power management of neural network applications must adjust the frequency and voltage for every layer to account for the power behavior and performance of each layer. Unfortunately, DVFS is unsuitable for layer-wise runtime power management of NPUs because the latency of voltage scaling is long compared with the execution time of each layer. Because frequency scaling is fast enough to keep up with each layer, we propose a layer-wise dynamic frequency scaling (DFS) technique for an NPU. Our proposed DFS selects, for each layer, the highest frequency that stays under the power limit of the NPU. To determine the highest allowable frequency, we build a power model that predicts the power consumption of an NPU based on real measurements on the fabricated NPU. Our evaluation results show that the proposed DFS improves frames per second (FPS) by 33% and saves energy by 14% on average, compared with DVFS.
https://doi.org/10.4218/etrij.2022-0094
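
The selection rule described in the abstract above (run each layer at the highest frequency whose predicted power stays under the limit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear power model, the `Layer` fields, the frequency list, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    dyn_w_per_mhz: float  # hypothetical per-layer dynamic power coefficient (W/MHz)


def predict_power(layer, freq_mhz, static_w=1.0):
    """Hypothetical power model: static power plus a term linear in frequency."""
    return static_w + layer.dyn_w_per_mhz * freq_mhz


def pick_frequency(layer, freqs_mhz, power_limit_w):
    """Return the highest candidate frequency whose predicted power fits the limit."""
    allowed = [f for f in sorted(freqs_mhz)
               if predict_power(layer, f) <= power_limit_w]
    # Fall back to the lowest frequency if even that exceeds the limit.
    return allowed[-1] if allowed else min(freqs_mhz)


# Hypothetical layers: power-hungry layers get a larger coefficient.
layers = [Layer("conv1", 0.010), Layer("conv2", 0.006), Layer("fc", 0.003)]
FREQS_MHZ = [400, 600, 800, 1000]
POWER_LIMIT_W = 6.0

schedule = {layer.name: pick_frequency(layer, FREQS_MHZ, POWER_LIMIT_W)
            for layer in layers}
print(schedule)
```

Because the voltage stays fixed, only the frequency changes between layers, which is what makes per-layer switching fast enough in the scheme the abstract describes.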

An Architecture Model on Artificial Intelligence for Ground Tactical Echelons (지상 전술 제대 인공지능 아키텍처 모델)

Kim, Jun Sung;Park, Sang Chul
- Journal of the Korea Institute of Military Science and Technology / v.25 no.5 / pp.513-521 / 2022
This study deals with an AI architecture model for collecting battlefield data using the tactical C4I system. Based on this model, an artificial staff can be utilized in tactical echelons. In the current structure of the Army's tactical C4I system, servers are operated at brigade level and above and are divided into an active server and a standby server. In this C4I system structure, an AI server must also be installed in each unit and must be switched whenever the C4I server is switched. The tactical C4I system operates a server (DB) for each unit, so data matching is partially delayed or some data are not matched during inter-working between servers. To solve these issues, this study presents an operation concept in which all alternate servers are integrated using virtualization technology and used as a source of data for the AI Meta DB. In doing so, this study provides criteria for the AI architectural model of the ground tactical echelon.
https://doi.org/10.9766/KIMST.2022.25.5.513

An impulse radio (IR) radar SoC for through-the-wall human-detection applications

Park, Piljae;Kim, Sungdo;Koo, Bontae
- ETRI Journal / v.42 no.4 / pp.480-490 / 2020
More than 42 000 fires occur nationwide and cause over 2500 casualties every year. There is a lack of specialized equipment, and rescue operations are conducted with a minimal number of apparatuses. Through-the-wall radars (TTWRs) can improve rescue efficiency, particularly under limited visibility due to smoke, walls, and collapsed debris. To overcome detection challenges and maintain a small form factor, a TTWR system-on-chip (SoC) and its architecture are proposed. Additive reception based on coherent clocks and reconfigurability can fulfill the TTWR demands. A clock-based single-chip impulse radio (IR) radar transceiver with embedded control logic is implemented using a 130-nm complementary metal-oxide-semiconductor process. Clock signals drive the radar operation. Signal-to-noise ratio enhancements are achieved using the repetitive coherent clock schemes. The hand-held prototype radar that uses the TTWR SoC operates in real time, allowing seamless data capture, processing, and display of the target information. The prototype is tested under various pseudo-disaster conditions. The test standards and methods, developed along with the system, are also presented.
https://doi.org/10.4218/etrij.2020-0116

Automated optimization for memory-efficient high-performance deep neural network accelerators

Kim, HyunMi;Lyuh, Chun-Gi;Kwon, Youngsu
- ETRI Journal / v.42 no.4 / pp.505-517 / 2020
The increasing size and complexity of deep neural networks (DNNs) necessitate the development of efficient high-performance accelerators. An efficient memory structure and operating scheme provide an intuitive solution for high-performance accelerators along with dataflow control. Furthermore, the processing of various neural networks (NNs) requires a flexible memory architecture, programmable control scheme, and automated optimizations. We first propose an efficient architecture with flexibility while operating at a high frequency despite the large memory and PE-array sizes. We then improve the efficiency and usability of our architecture by automating the optimization algorithm. The experimental results show that the architecture increases the data reuse; a diagonal write path improves the performance by 1.44× on average across a wide range of NNs. The automated optimizations significantly enhance the performance from 3.8× to 14.79× and further provide usability. Therefore, automating the optimization as well as designing an efficient architecture is critical to realizing high-performance DNN accelerators.
https://doi.org/10.4218/etrij.2020-0125

Synthesis of Gamma Alumina Powder for Catalytic Support from Kaolin (카올린으로부터 촉매담체용 감마알루미나 분말의 합성)

Kang, H.K.;Park, H.C.;Choi, I.S.;Lee, H.;Son, M.M.
- Korean Journal of Materials Research / v.6 no.9 / pp.943-949 / 1996
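The synthesis of γ-Al2O3 powder for catalytic support from an aluminum sulfate solution extracted from kaolin was studied. The aluminum sulfate solution was added dropwise to stirred ethanol to prepare a single-phase Al2(SO4)3·18H2O precipitate, and γ-Al2O3 powder was synthesized by calcining this precipitate. The powder was thermally stable when calcined at 1000 °C for 2 h, but it transformed to α-Al2O3 when calcined at 1200 °C for 2 h. The effect of BaO addition on the thermal stability of γ-Al2O3 was investigated, with the amount added ranging from 1.0 to 6.0 wt% relative to γ-Al2O3. The sample with 4.0 wt% BaO was effective in preventing the transformation of γ-Al2O3 at 1200 °C owing to the formation of a mixed Al2O3-BaO·6Al2O3 (hexa-aluminate) phase. The specific surface area associated with the γ-Al2O3 → α-Al2O3 transformation was examined for the sample with 4.0 wt% BaO and for the sample without BaO. After 2 h at 1200 °C, the specific surface areas of the sample with 4.0 wt% BaO and the sample without BaO remained at 95 m²/g and 50 m²/g, respectively.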

Trends in Hardware Acceleration Techniques for Fully Homomorphic Encryption Operations (완전동형암호 연산 가속 하드웨어 기술 동향)

Park, S.C.;Kim, H.W.;Oh, Y.R.;Na, J.C.
- Electronics and Telecommunications Trends / v.36 no.6 / pp.1-12 / 2021
As the demand for big data and big data-based artificial intelligence (AI) technology increases, so does the need for privacy preservation of sensitive information contained in big data and for high-speed encryption-based AI computation systems. Fully homomorphic encryption (FHE) is a representative encryption technology that preserves the privacy of sensitive data. FHE is being actively investigated primarily because it requires no decryption of the encrypted data anywhere in the data flow: data can be stored, transmitted, combined, and processed in an encrypted state. Moreover, FHE is based on lattice problems, which are considered computationally hard to solve even for a quantum computer. FHE therefore offers a high security level and is receiving considerable attention as a next-generation encryption technology. However, although it can process computations on encrypted data, the slow computation speed caused by the high computational complexity of FHE is an obstacle to practical use. To address this problem, hardware technology that accelerates FHE operations is receiving extensive research attention. This article examines research trends in hardware technology focused on accelerating the operations of representative FHE schemes. In addition, the detailed structures of hardware that accelerates FHE operations are described.
https://doi.org/10.22648/ETRI.2021.J.360601
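
To make "processed in an encrypted state" concrete, the sketch below computes on ciphertexts without decrypting them. It deliberately swaps in the textbook Paillier cryptosystem, which is only additively homomorphic and is not lattice-based, so it is not FHE and is not one of the schemes surveyed in the article. The tiny primes, the function names, and the fixed random seed are illustrative assumptions and offer no real security.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation with small, insecure demo primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m, rng):
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


rng = random.Random(0)  # fixed seed only to make the demo repeatable
pub, priv = keygen()
n_sq = pub[0] ** 2

c1 = encrypt(pub, 20, rng)
c2 = encrypt(pub, 22, rng)

# Multiplying ciphertexts adds the hidden plaintexts.
c_sum = (c1 * c2) % n_sq
# Raising a ciphertext to a constant multiplies the hidden plaintext by it.
c_scaled = pow(c1, 3, n_sq)

print(decrypt(pub, priv, c_sum), decrypt(pub, priv, c_scaled))
```

An FHE scheme additionally supports multiplication of two encrypted values, which is where most of the computational cost that motivates hardware acceleration arises.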

Low Power SoC Design Trends Using EDA Tools (설계툴을 사용한 저전력 SoC 설계 동향)

Park, Nam Jin;Joo, Yu Sang;Na, Jung-Chan
- Electronics and Telecommunications Trends / v.35 no.2 / pp.69-78 / 2020
Small portable devices such as mobile phones and laptops currently show a trend toward high power consumption owing to their high speed and multifunctionality. Low-power SoC design is one of the important factors to consider in extending operating time under limited battery capacity. Popular low-power SoC design techniques include clock gating, multi-threshold voltage, power gating, and multi-voltage design. As semiconductor process nodes shrink, leakage power can surpass dynamic power in total power consumption; therefore, appropriate low-power SoC design techniques must be combined to reduce power consumption and meet the power specifications. This study examines several low-power SoC design trends that reduce the dynamic and static power of SoCs using EDA tools. Low-power SoC design technology can be a competitive advantage, especially in IoT and AI edge environments, where power usage is typically limited.
https://doi.org/10.22648/ETRI.2020.J.350206
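
The techniques listed in the abstract above each target one term of the usual first-order CMOS power estimate, P_total = α·C·V²·f + V·I_leak. The sketch below is a rough illustration of that relationship, not material from the article: the function names and every numeric value are hypothetical.

```python
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """First-order switching power: activity factor * capacitance * V^2 * f."""
    return alpha * c_farads * v_volts ** 2 * f_hz


def leakage_power(v_volts, i_leak_amps):
    """First-order static power: supply voltage * leakage current."""
    return v_volts * i_leak_amps


# Hypothetical block: 1 nF switched capacitance, 1 GHz clock, 1.0 V supply.
baseline = dynamic_power(0.2, 1e-9, 1.0, 1e9) + leakage_power(1.0, 0.05)

# Clock gating lowers the effective activity factor (here 0.2 -> 0.1).
clock_gated = dynamic_power(0.1, 1e-9, 1.0, 1e9) + leakage_power(1.0, 0.05)

# Multi-voltage design lowers V for this block (here 1.0 V -> 0.8 V),
# which reduces the dynamic term quadratically.
low_voltage = dynamic_power(0.2, 1e-9, 0.8, 1e9) + leakage_power(0.8, 0.05)

# Power gating cuts leakage while the block is switched off
# (modelled here as zero leakage and zero activity).
power_gated = dynamic_power(0.0, 1e-9, 1.0, 1e9) + leakage_power(1.0, 0.0)

for label, watts in [("baseline", baseline), ("clock gated", clock_gated),
                     ("low voltage", low_voltage), ("power gated", power_gated)]:
    print(f"{label}: {watts:.3f} W")
```

Multi-threshold-voltage design would appear in this model as a smaller leakage current on non-critical paths; it is omitted to keep the sketch short.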

Artificial Intelligence Applications on Mobile Telecommunication Systems (AI의 이동통신시스템 적용)

Yeh, C.I.;Chang, K.S.;Ko, Y.J.
- Electronics and Telecommunications Trends / v.37 no.4 / pp.60-69 / 2022
So far, artificial intelligence (AI)/machine learning (ML) has produced impressive results in speech recognition, computer vision, and natural language processing. AI/ML has recently begun to show promise as a viable means of improving the performance of 5G mobile telecommunication systems. This paper investigates standardization activities in 3GPP and the O-RAN Alliance regarding AI/ML applications in mobile telecommunication systems. Future trends in AI/ML technologies are also summarized. As an overarching technology in 6G, AI/ML is widely expected to contribute to every part of mobile systems, including the core, RAN, and air interface, in terms of performance enhancement, automation, cost reduction, and energy consumption reduction.
https://doi.org/10.22648/ETRI.2022.J.370407

