Improving Multi-DNN Computational Performance of Embedded Multicore Processors through a Global Queue

  • Cho, Ho-jin (Department of Applied IT Engineering, Hansung University)
  • Kim, Myung-sun (Department of IT Convergence Engineering, Hansung University)
  • Received : 2020.03.31
  • Accepted : 2020.04.24
  • Published : 2020.06.30

Abstract

Deep neural networks (DNNs) are increasingly used in embedded systems such as robots and autonomous vehicles. Achieving high recognition accuracy has greatly increased their computational complexity, and multiple DNNs often run aperiodically on the same system. The ability to process multiple DNNs in an embedded environment is therefore a crucial issue, and multicore-based platforms are being released in response. However, most DNN models are executed as a batch process. When multiple DNNs run together on a multicore processor, the execution times of the individual DNNs may deviate widely, and the end-to-end execution time of all the DNNs may be long, depending on how the DNNs are allocated to the cores. In this paper, we address these problems with a framework that decomposes each DNN into individual layers and then distributes the layers to the cores through a global queue. In our experiments, the total DNN execution time was reduced by 31%, and when multiple identical DNNs were run, the deviation in execution time was reduced by up to 95.1%.
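
The layer-wise global-queue idea can be illustrated with a minimal Python sketch. This is not the authors' framework: the function name `run_multi_dnn`, the representation of a DNN as a non-empty list of per-layer callables, and the use of Python threads as stand-ins for cores are all assumptions made for illustration. Because of Python's global interpreter lock, the sketch shows only the scheduling structure, not a real multicore speedup.

```python
import queue
import threading


def run_multi_dnn(dnns, inputs, num_workers=4):
    """Run several DNNs layer by layer through one shared (global) queue.

    dnns:   list of models, each a non-empty list of callables (one per layer).
    inputs: list of input values, one per model.
    Both the name and this data layout are hypothetical.
    """
    tasks = queue.Queue()            # the global queue shared by all workers
    results = [None] * len(dnns)

    # Seed the queue with the first layer of every DNN.
    for dnn_id, x in enumerate(inputs):
        tasks.put((dnn_id, 0, x))

    def worker():
        while True:
            item = tasks.get()
            if item is None:         # sentinel: no more work for this worker
                tasks.task_done()
                return
            dnn_id, layer_idx, x = item
            y = dnns[dnn_id][layer_idx](x)
            if layer_idx + 1 < len(dnns[dnn_id]):
                # Layers of one DNN are sequential, so the next layer is
                # enqueued only after the current one has finished.
                tasks.put((dnn_id, layer_idx + 1, y))
            else:
                results[dnn_id] = y  # last layer: store the final output
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    tasks.join()                     # wait until every layer task is done
    for _ in threads:
        tasks.put(None)              # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Toy "models": each layer is just an arithmetic function.
    dnn_a = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    dnn_b = [lambda x: x * x, lambda x: x + 10]
    print(run_multi_dnn([dnn_a, dnn_b], [1, 2], num_workers=2))  # [1, 14]
```

Because any idle worker takes the next available layer from the shared queue, no DNN is bound to a single core for its whole run, which is the property the abstract credits for the shorter total time and smaller deviation.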
