TVM-based Performance Optimization for Image Classification in Embedded Systems

Cheonghwan Hur; Minhae Ye; Ikhee Shin; Daewoo Lee

doi:10.14372/IEMEK.2023.18.3.101

IEMEK Journal of Embedded Systems and Applications

Vol. 18, No. 3 / pp. 101-108 / 2023 / pISSN 1975-5066

Institute of Embedded Engineering of Korea


Original Korean title: 임베디드 시스템에서의 객체 분류를 위한 TVM기반의 성능 최적화 연구

Cheonghwan Hur (RTST); Minhae Ye (RTST); Ikhee Shin (RTST); Daewoo Lee (RTST)

Received: 2023.03.23
Reviewed: 2023.05.18
Published: 2023.06.30

https://doi.org/10.14372/IEMEK.2023.18.3.101


Abstract

Optimizing the performance of deep neural networks on embedded systems is challenging and requires efficient compilers and runtime systems. We propose a TVM-based approach consisting of three steps: quantization, auto-scheduling, and ahead-of-time compilation. The approach reduces the computational complexity of models without significant loss of accuracy and generates optimized code for various hardware platforms. We evaluate it on three representative CNNs using the ImageNet dataset on the NVIDIA Jetson AGX Xavier board and show that it outperforms baseline methods in processing speed.
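
The quantization step named in the abstract can be illustrated with a minimal sketch. It assumes symmetric per-tensor int8 quantization written in plain Python. The abstract does not state which scheme the authors use, and this is neither their pipeline nor TVM's API. The function names `quantize_int8` and `dequantize` and the example weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(x / scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127].
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical example weights, chosen only for illustration.
weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per element is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one signed byte plus a shared scale, which is where the reduction in computation and memory comes from; the rounding error per value stays within half a quantization step (`scale / 2`).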

Keywords

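Funding Information

This work was supported in 2021 by the Korean government (Ministry of Science and ICT) through the 'Autonomous Driving Technology Development Innovation Program' (No. 2021-0-00905, (Sub-project 3) Development of Cloud, Edge, Car 3-Tier Linked Perception/Decision/Control SW and Common SW Platform Technology).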

