Automated optimization for memory-efficient high-performance deep neural network accelerators

  • Kim, HyunMi (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Lyuh, Chun-Gi (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Kwon, Youngsu (AI SoC Research Division, Electronics and Telecommunications Research Institute)
  • Received : 2020.03.28
  • Accepted : 2020.07.02
  • Published : 2020.08.18

Abstract

The increasing size and complexity of deep neural networks (DNNs) necessitate the development of efficient high-performance accelerators. An efficient memory structure and operating scheme, together with dataflow control, provide an intuitive basis for such accelerators. Furthermore, processing various neural networks (NNs) requires a flexible memory architecture, a programmable control scheme, and automated optimization. We first propose a flexible, efficient architecture that operates at a high frequency despite its large memory and processing element (PE) array sizes. We then improve the efficiency and usability of this architecture by automating the optimization algorithm. The experimental results show that the architecture increases data reuse; a diagonal write path improves performance by 1.44× on average across a wide range of NNs. The automated optimizations further enhance performance by 3.8× to 14.79× and also improve usability. Therefore, automating the optimization, in addition to designing an efficient architecture, is critical to realizing high-performance DNN accelerators.
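
As a rough illustration of the automated-optimization idea (not the paper's actual algorithm), the sketch below exhaustively searches output-tile sizes for a single convolution layer and keeps the tiling that maximizes multiply-accumulate operations per byte resident on chip, subject to an assumed local buffer capacity. The layer shape, buffer size, word width, and cost model are all illustrative assumptions.

    # Illustrative sketch only: exhaustive search over output-tile sizes for one
    # convolutional layer. The goal is to maximize data reuse (MACs performed per
    # byte held in the on-chip buffer) while the tile's working set fits the
    # assumed buffer capacity. None of the parameters come from the paper.
    from itertools import product

    def tile_search(H, W, C_in, C_out, K, buffer_bytes, word_bytes=2):
        best, best_score = None, -1.0
        for th, tw, tc in product(range(1, H + 1), range(1, W + 1), range(1, C_out + 1)):
            # Working set of one tile: input patch, weights, and output partial sums.
            in_bytes = (th + K - 1) * (tw + K - 1) * C_in * word_bytes
            w_bytes = K * K * C_in * tc * word_bytes
            out_bytes = th * tw * tc * word_bytes
            footprint = in_bytes + w_bytes + out_bytes
            if footprint > buffer_bytes:
                continue  # tile does not fit in the on-chip buffer
            macs = th * tw * tc * K * K * C_in      # work performed per tile
            score = macs / footprint                # reuse per resident byte
            if score > best_score:
                best, best_score = (th, tw, tc), score
        return best, best_score

    # Example: a 56x56x64 feature map, 64 output channels, 3x3 kernels, 128-KiB buffer.
    print(tile_search(56, 56, 64, 64, 3, 128 * 1024))

In practice, the search space and cost model would be tied to the accelerator's actual buffer hierarchy, PE-array dimensions, and dataflow, and the search could be pruned rather than exhaustive.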

Keywords

Cited by

  1. Trends in Artificial Intelligence Processor Compiler Development, vol.36, no.2, 2020, https://doi.org/10.22648/etri.2021.j.360204
  2. Accelerating On-Device Learning with Layer-Wise Processor Selection Method on Unified Memory, vol.21, no.7, 2020, https://doi.org/10.3390/s21072364
  3. Memory Optimization Techniques in Neural Networks: A Review, vol.10, no.6, 2020, https://doi.org/10.35940/ijeat.f2991.0810621