Design of Multipliers Optimized for CNN Inference Accelerators

  • Lee, Jae-Woo (Department of Electronic Engineering, Korea National University of Transportation);
  • Lee, Jaesung (Department of Electronic Engineering, Korea National University of Transportation)
  • Received: 2021.08.19
  • Accepted: 2021.09.03
  • Published: 2021.10.31

Abstract

Recently, FPGA-based implementations of AI processors have been studied actively. Deep convolutional neural networks (CNNs), the basic computational structure that AI processors execute, require a very large number of multiplications. Noting that the multiplication coefficients used in CNN inference are all constants, and that an FPGA makes it easy to build a multiplier tailored to a specific coefficient, this paper proposes a method for optimizing such multipliers. The method uses two's-complement representation and the distributive law to minimize the number of 1-valued bits in each multiplication coefficient, thereby reducing the number of stacked adders required. Applied to an actual FPGA implementation of a CNN, the method reduces logic usage by up to 30.2% and propagation delay by up to 22%. When implemented as a dedicated ASIC, the hardware area is reduced by up to 35% and the delay by up to 19.2%.
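The abstract describes the optimization only at a high level. The standard technique matching "two's complement and the distributive law to minimize the number of 1-valued bits" is canonical signed-digit (CSD) recoding, in which a run of consecutive ones is replaced by one addition and one subtraction (e.g. 7 = 0111₂ = 1000₂ − 1). The sketch below is a minimal Python illustration of this recoding and of the resulting shift-add network; the function names and the choice of CSD are assumptions for illustration, not the paper's published code.

```python
def csd_recode(coeff: int) -> list[int]:
    """Recode a non-negative constant into canonical signed-digit (CSD)
    form: a list of digits in {-1, 0, +1}, least-significant first.
    CSD minimizes the number of nonzero digits, and a shift-add constant
    multiplier needs one adder/subtractor per nonzero digit minus one."""
    digits = []
    v = coeff
    while v != 0:
        if v & 1:
            # If the two low bits are '11' (v % 4 == 3), emit -1 so the
            # run of ones collapses into a single +1 above the run;
            # otherwise emit +1. Either way v becomes a multiple of 4,
            # so no two nonzero digits are ever adjacent.
            d = 2 - (v & 3)   # +1 when v % 4 == 1, -1 when v % 4 == 3
            v -= d
        else:
            d = 0
        digits.append(d)
        v >>= 1
    return digits


def shift_add_multiply(x: int, digits: list[int]) -> int:
    """Evaluate x * coeff the way the hardware would: one shifted add
    or subtract per nonzero CSD digit."""
    acc = 0
    for shift, d in enumerate(digits):
        if d:
            acc += d * (x << shift)   # d is +1 (add) or -1 (subtract)
    return acc


if __name__ == "__main__":
    coeff = 0b0111_0111                      # 119: six 1-bits in binary
    digits = csd_recode(coeff)               # [-1,0,0,-1,0,0,0,1] = 128-8-1
    ones = bin(coeff).count("1")
    nonzero = sum(1 for d in digits if d)
    print(f"plain binary: {ones} partial products -> {ones - 1} adders")
    print(f"CSD:          {nonzero} partial products -> {nonzero - 1} adders")
    assert shift_add_multiply(3, digits) == 3 * coeff
```

For the 8-bit coefficient 119 above, a plain binary shift-add multiplier stacks five adders (one per 1-bit beyond the first), while the CSD form 128 − 8 − 1 needs only two adder/subtractor stages. This kind of adder-count reduction is what the paper reports as logic-usage and propagation-delay savings.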

Acknowledgement

This study was supported by a grant from the SME R&D project for start-up and growth-stage companies of the Ministry of SMEs and Startups (S2798837).
