40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

  • Han, Jinho (AI Processor Research Section, Electronics and Telecommunications Research Institute) ;
  • Choi, Minseok (AI Processor Research Section, Electronics and Telecommunications Research Institute) ;
  • Kwon, Youngsu (AI SoC Research Division, Electronics and Telecommunications Research Institute)
  • Received : 2020.03.28
  • Reviewed : 2020.07.02
  • Published : 2020.08.18

Abstract

The proposed AI processor architecture delivers high throughput for neural network acceleration while reducing the external memory bandwidth that neural network processing requires. To achieve this throughput, the proposed super thread core (STC) contains 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is also proposed for fault-tolerant systems such as the electronics of autonomous cars. A general-purpose processor (GPP) core is integrated with the STC to control it and to process the AI algorithm; the GPP has a self-recovering cache and a dynamic lockstep function. The function-safe design is shown to meet the ASIL D fault-tolerance level of the ISO26262 standard. The entire AI processor was fabricated as a prototype chip in a 28-nm CMOS process. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V, and the measured energy efficiency is 1.3 TOPS/W. The function-safe GPP used for control achieves ISO26262 ASIL-D with a single-point fault-tolerance rate of 99.64%.
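As a rough sanity check on the figures quoted in the abstract, the sketch below recomputes the peak throughput from the core count and clock frequency, and compares the reported fault-tolerance rate with the ASIL targets. Two things here are assumptions, not statements from the abstract: that each nano core completes one multiply-accumulate per cycle, counted as 2 floating-point operations, and that the "single-point fault-tolerance rate" corresponds to the single-point fault metric (SPFM) of ISO 26262-5, whose targets are 90% (ASIL B), 97% (ASIL C), and 99% (ASIL D). The function and variable names are illustrative only.

```python
# Back-of-the-envelope check of the abstract's headline numbers.
# ASSUMPTION: one multiply-accumulate (2 FLOPs) per nano core per cycle.
# ASSUMPTION: the reported rate is comparable to the ISO 26262-5 SPFM.

# SPFM targets in percent, per ISO 26262-5 (ASIL A has no SPFM target).
SPFM_TARGET_PERCENT = {"B": 90.0, "C": 97.0, "D": 99.0}


def peak_tflops(rows, cols, clock_ghz, flops_per_core_per_cycle=2):
    """Peak throughput in TFLOPS for a rows x cols array of cores."""
    cores = rows * cols
    gflops = cores * clock_ghz * flops_per_core_per_cycle
    return gflops / 1000.0


def meets_spfm_target(rate_percent, asil):
    """True if the rate reaches the SPFM target for the given ASIL."""
    return rate_percent >= SPFM_TARGET_PERCENT[asil]


if __name__ == "__main__":
    # 128 x 128 nano cores at 1.2 GHz, as stated in the abstract.
    print(f"peak throughput: {peak_tflops(128, 128, 1.2):.2f} TFLOPS")
    # 99.64% single-point fault-tolerance rate, as stated in the abstract.
    print("meets ASIL D target:", meets_spfm_target(99.64, "D"))
```

Under these assumptions the array works out to roughly 39.3 TFLOPS, which is consistent with the rounded 40 TFLOPS figure, and 99.64% clears the 99% ASIL D target.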
