AB9: A neural processor for inference acceleration

  • Cho, Yong Cheol Peter (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Chung, Jaehoon (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Yang, Jeongmin (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Lyuh, Chun-Gi (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Kim, HyunMi (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Kim, Chan (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Ham, Je-seok (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Choi, Minseok (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Shin, Kyoungseon (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Han, Jinho (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute) ;
  • Kwon, Youngsu (AI Processor Research Team, AI SoC Research Department, Electronics and Telecommunications Research Institute)
  • Received : 2020.04.16
  • Accepted : 2020.07.02
  • Published : 2020.08.18

Abstract

We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is a user-friendly development environment, including a simulator and an implementation flow, that provides a high degree of programmability with a short development time. Along with a 40-TFLOPS STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various industry-standard peripheral intellectual properties. Acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 achieves higher performance and power efficiency than a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 mm × 23 mm. Delivery is expected later this year.
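
The data reuse that a systolic array exploits can be illustrated with a toy model. The sketch below is not AB9's STC and makes no claim about its actual dataflow, array size, or number format. It assumes an output-stationary array with one processing element (PE) per output element, and the function name `systolic_matmul` and its fetch counters are hypothetical. Each operand enters the array once at an edge and is then passed from PE to PE, so the number of memory fetches grows with the matrix sizes rather than with the number of multiply-accumulate operations.

```python
def systolic_matmul(a, b):
    """Toy output-stationary systolic array computing C = A x B.

    Assumption: one PE per output element C[i][j]. Rows of A stream in
    from the left and columns of B from the top, each skewed by one
    cycle per row/column, so PE (i, j) sees A[i][k] and B[k][j] at
    cycle t = i + j + k.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]  # one accumulator per PE

    # The last useful cycle is (m-1) + (n-1) + (k_dim-1).
    cycles = m + n + k_dim - 2
    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j  # operand index reaching PE (i, j) now
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]

    # Each element of A and B is read from memory once at the array edge
    # and then forwarded between neighbouring PEs.
    edge_fetches = m * k_dim + k_dim * n
    # Without reuse, every multiply-accumulate would fetch two operands.
    naive_fetches = 2 * m * n * k_dim
    return acc, cycles, edge_fetches, naive_fetches


if __name__ == "__main__":
    result, cycles, edge, naive = systolic_matmul([[1, 2], [3, 4]],
                                                  [[5, 6], [7, 8]])
    print(result, cycles, edge, naive)
```

In this model the gap between the two fetch counts widens as the matrices grow, which is the kind of reuse a large array of arithmetic units backed by on-chip memory is meant to capture.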

Cited by

  1. Trends in AI Processor Compiler Development, vol.36, pp.2, 2020, https://doi.org/10.22648/etri.2021.j.360204