Design of Multipliers Optimized for CNN Inference Accelerators

  • Lee, Jae-Woo (Department of Electronic Engineering, Korea National University of Transportation);
  • Lee, Jaesung (Department of Electronic Engineering, Korea National University of Transportation)
  • Received: 2021.08.19
  • Accepted: 2021.09.03
  • Published: 2021.10.31

Abstract

Recently, FPGA-based implementations of AI processors have been studied actively. Deep convolutional neural networks (CNNs), the basic computational structure that AI processors execute, require a very large number of multiplications. Noting that the multiplication coefficients used in CNN inference are all constants, and that an FPGA makes it easy to build a multiplier tailored to a specific coefficient, this paper proposes a method for optimizing such multipliers. The method uses two's-complement representation and the distributive law to minimize the number of 1-valued bits in each multiplication coefficient, thereby reducing the number of stacked adders required. Applied to an actual FPGA implementation of a CNN, the method reduces logic usage by up to 30.2% and propagation delay by up to 22%. When implemented as a dedicated ASIC, the hardware area is reduced by up to 35% and the delay by up to 19.2%.
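The abstract describes the optimization only at a high level. The standard technique matching "two's complement and the distributive law to minimize the number of 1-valued bits" is canonical signed-digit (CSD) recoding, in which a run of consecutive ones is replaced by one addition and one subtraction (e.g. 7 = 0111₂ = 1000₂ − 1). The sketch below is a minimal Python illustration of this recoding and of the resulting shift-add network; the function names and the choice of CSD are assumptions for illustration, not the paper's published code.

```python
def csd_recode(coeff: int) -> list[int]:
    """Recode a non-negative constant into canonical signed-digit (CSD)
    form: a list of digits in {-1, 0, +1}, least-significant first.
    CSD minimizes the number of nonzero digits, and a shift-add constant
    multiplier needs one adder/subtractor per nonzero digit minus one."""
    digits = []
    v = coeff
    while v != 0:
        if v & 1:
            # If the two low bits are '11' (v % 4 == 3), emit -1 so the
            # run of ones collapses into a single +1 above the run;
            # otherwise emit +1. Either way v becomes a multiple of 4,
            # so no two nonzero digits are ever adjacent.
            d = 2 - (v & 3)   # +1 when v % 4 == 1, -1 when v % 4 == 3
            v -= d
        else:
            d = 0
        digits.append(d)
        v >>= 1
    return digits


def shift_add_multiply(x: int, digits: list[int]) -> int:
    """Evaluate x * coeff the way the hardware would: one shifted add
    or subtract per nonzero CSD digit."""
    acc = 0
    for shift, d in enumerate(digits):
        if d:
            acc += d * (x << shift)   # d is +1 (add) or -1 (subtract)
    return acc


if __name__ == "__main__":
    coeff = 0b0111_0111                      # 119: six 1-bits in binary
    digits = csd_recode(coeff)               # [-1,0,0,-1,0,0,0,1] = 128-8-1
    ones = bin(coeff).count("1")
    nonzero = sum(1 for d in digits if d)
    print(f"plain binary: {ones} partial products -> {ones - 1} adders")
    print(f"CSD:          {nonzero} partial products -> {nonzero - 1} adders")
    assert shift_add_multiply(3, digits) == 3 * coeff
```

For the 8-bit coefficient 119 above, a plain binary shift-add multiplier stacks five adders (one per 1-bit beyond the first), while the CSD form 128 − 8 − 1 needs only two adder/subtractor stages. This kind of adder-count reduction is what the paper reports as logic-usage and propagation-delay savings.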

Acknowledgement

This study was supported by a grant from the SME R&D project for start-up and growth-stage companies of the Ministry of SMEs and Startups (S2798837).
