Multi-DNN Acceleration Techniques for Embedded Systems with Tucker Decomposition and Hidden-layer-based Parallel Processing

터커 분해 및 은닉층 병렬처리를 통한 임베디드 시스템의 다중 DNN 가속화 기법

  • Kim, Ji-Min (Department of IT Convergence Engineering, Hansung University) ;
  • Kim, In-Mo (Department of IT Convergence Engineering, Hansung University) ;
  • Kim, Myung-Sun (Department of Applied Artificial Intelligence, Hansung University)
  • Received : 2022.05.06
  • Accepted : 2022.05.27
  • Published : 2022.06.30

Abstract

With the development of deep learning technology, DNNs are increasingly used in embedded systems such as unmanned vehicles, drones, and robots. An autonomous driving system, for example, must run several DNNs simultaneously, each of which delivers high accuracy at the cost of a large amount of computation. However, running multiple DNNs at the same time on an embedded system with relatively low performance increases inference latency; as a result, the action that depends on an inference result may not be taken in time, causing the system to malfunction. To solve this problem, the solution proposed in this paper first reduces the amount of computation by applying Tucker decomposition to DNN models with heavy computation, and then runs the DNN models in parallel as much as possible at the granularity of hidden layers inside the GPU. Experimental results show that DNN inference time decreases by up to 75.6% compared to the case before the proposed technique is applied.
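The first step above, shrinking a layer's computation with Tucker decomposition, can be sketched as a truncated HOSVD of a convolution kernel. This is a minimal illustrative sketch, not the paper's implementation: the function names `tucker2`/`reconstruct`, the rank arguments `r_out`/`r_in`, and the use of NumPy are all assumptions.

```python
import numpy as np

def tucker2(W, r_out, r_in):
    """Tucker-2 decomposition of a 4-D conv kernel W of shape (C_out, C_in, k, k).

    Returns factor matrices U_out (C_out, r_out) and U_in (C_in, r_in) plus a
    core tensor G (r_out, r_in, k, k), computed by truncated HOSVD: the two
    channel modes are compressed, the spatial modes are left intact.
    """
    c_out, c_in, kh, kw = W.shape
    # Mode-0 unfolding: rows indexed by the output-channel axis.
    U0, _, _ = np.linalg.svd(W.reshape(c_out, -1), full_matrices=False)
    U_out = U0[:, :r_out]
    # Mode-1 unfolding: rows indexed by the input-channel axis.
    U1, _, _ = np.linalg.svd(np.moveaxis(W, 1, 0).reshape(c_in, -1),
                             full_matrices=False)
    U_in = U1[:, :r_in]
    # Core: project both channel modes onto the truncated orthonormal bases.
    G = np.einsum('oikl,or,is->rskl', W, U_out, U_in)
    return U_out, U_in, G

def reconstruct(U_out, U_in, G):
    """Expand the factors back into a full (C_out, C_in, k, k) kernel."""
    return np.einsum('rskl,or,is->oikl', G, U_out, U_in)
```

With ranks well below the channel counts, the single k×k convolution is effectively replaced by a 1×1 projection (C_in→r_in), a small k×k convolution (r_in→r_out), and a 1×1 expansion (r_out→C_out), which is where the computation saving comes from.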

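The second step, overlapping several DNNs at hidden-layer granularity, can be illustrated with a CPU-side sketch: each model advances one hidden layer at a time in its own worker, so layers of different models execute concurrently, playing the role that per-model GPU kernel streams play in the paper. The thread-based concurrency, the matmul-plus-ReLU layer shape, and the function names here are illustrative assumptions, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def run_models_in_parallel(models, inputs):
    """Run several DNNs concurrently, one hidden layer at a time.

    `models` is a list of layer lists; each layer is modeled as a weight
    matrix, and the layer computation as matmul + ReLU. Each model advances
    layer by layer in its own worker thread, so hidden layers of different
    models overlap in time instead of the models running back to back.
    """
    def forward(layers, x):
        for W in layers:
            x = np.maximum(W @ x, 0.0)  # one hidden layer: matmul + ReLU
        return x

    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(forward, m, x) for m, x in zip(models, inputs)]
        return [f.result() for f in futures]
```

On the actual GPU the same idea would mean issuing each model's layer kernels on a separate stream so that layers of different DNNs can occupy the device simultaneously.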


Acknowledgement

This research was financially supported by Hansung University.

References

  1. J. Peng, L. Tian, X. Jia, H. Guo, Y. Xu, D. Xie, H. Luo, Y. Shan, Y. Shan, and Y. Wang, "Multi-task ADAS system on FPGA," in Proceedings of the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hsinchu, Taiwan, pp. 171-174, 2019. DOI: 10.1109/AICAS.2019.877615.
  2. J. Chandrasekaran, Y. Lei, R. Kacker, and D. R. Kuhn, "A Combinatorial Approach to Testing Deep Neural Network-based Autonomous Driving Systems," in Proceedings of the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Porto de Galinhas, Brazil, pp. 57-66, 2021. DOI: 10.1109/ICSTW52544.2021.00022.
  3. Jetson AGX Xavier Developer Kit [Internet]. Available: https://developer.nvidia.com/embedded/jetson-agx-xavier-developer-kit.
  4. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 25th Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, pp. 1106-1114, 2012.
  5. F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size," arXiv preprint arXiv:1602.07360, 2016. DOI: 10.48550/arXiv.1602.07360.
  6. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proceedings of the International Conference on Learning Representations, San Diego: CA, USA, pp. 1-14, 2015.
  7. I. Radosavovic, R. P. Kosaraju, R. Girshick, K. He, and P. Dollar, "Designing Network Design Spaces," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle: WA, USA, pp. 10428-10436, 2020.
  8. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "PyTorch: An Imperative Style, High-Performance Deep Learning Library," in Proceedings of the Neural Information Processing Systems (NeurIPS), Vancouver: BC, Canada, pp. 8024-8035, 2019.
  9. ImageNet 1-crop error rates [Internet]. Available: https://pytorch.org/vision/stable/models.html.
  10. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: A System for Large-Scale Machine Learning," in Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), Savannah: GA, USA, pp. 265-283, 2016.
  11. MPS(multi-process service) [Internet]. Available: https://docs.nvidia.com/deploy/mps/index.html.
  12. C. Lim and M. Kim, "ODMDEF: On-Device Multi-DNN Execution Framework Utilizing Adaptive Layer-Allocation on General Purpose Cores and Accelerators," IEEE Access, vol. 9, pp. 85403-85417, Jun. 2021. DOI: 10.1109/ACCESS.2021.308861.
  13. T. G. Kolda and B. W. Bader, "Tensor Decompositions and Applications," SIAM Review, vol. 51, no. 3, pp. 455-500, Sep. 2009. DOI: 10.1137/07070111X.
  14. L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, no. 3, pp. 279-311, Sep. 1966. DOI: 10.1007/BF02289464.
  15. T. Amert, N. Otterness, M. Yang, J. H. Anderson, and F. D. Smith, "GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed," in Proceedings of the IEEE Real-Time Systems Symposium (RTSS), Paris, France, pp. 104-115, 2017. DOI: 10.1109/RTSS.2017.00017.
  16. L. D. Nguyen, D. Lin, Z. Lin, and J. Cao, "Deep CNNs for microscopic image classification by exploiting transfer learning and feature concatenation," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, pp. 1-5, 2018.
  17. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas: NV, USA, pp. 2818-2826, 2016.
  18. A. Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," M.S. thesis, University of Toronto, Toronto: ON, Canada, 2009.