
Compression of DNN Integer Weight using Video Encoder

  • Kim, Seunghwan (Department of Computer Education, Sungkyunkwan University) ;
  • Ryu, Eun-Seok (Department of Computer Education, Sungkyunkwan University)
  • Received : 2021.08.19
  • Accepted : 2021.10.18
  • Published : 2021.11.30

Abstract

Recently, various studies have been conducted on running Convolutional Neural Network (CNN) models, which show excellent performance in many fields, on mobile devices. Conventional CNN models are difficult to use on mobile hardware because their weights are large and their computational complexity is high. To address this, several lightweight methods have emerged, including weight quantization, which lowers the bit precision of the weights. Many of these methods achieve high compression ratios with small accuracy loss on various models, but most compressed models rely on a retraining step to recover the lost accuracy. Retraining minimizes the accuracy loss of the compressed model, but it requires a large amount of time and data. After weight quantization, the weights of each layer are represented as an integer matrix, which resembles the form of an image. This paper proposes a method of compressing the integer weight matrix of each layer after weight quantization, in the form of an image, using a video codec. To verify the performance of the proposed method, experiments were conducted on VGG16, ResNet50, and ResNet18 models trained on the ImageNet and Places365 datasets. As a result, an accuracy loss of less than 2% and high compression efficiency were achieved across the models. In addition, a performance comparison with No Fine-tuning Pruning (NFP) and ThiNet, compression methods that omit the retraining step, verified that the proposed method achieves more than twice their compression efficiency.
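The observation above, that quantized weights form a per-layer integer matrix resembling a grayscale image, can be illustrated with a minimal sketch. The asymmetric 8-bit uniform quantizer, the use of PyTorch/torchvision with VGG16 weights, and the helper name quantize_layer_uint8 below are assumptions for illustration only; the abstract does not specify the exact quantization scheme used in the paper.

```python
# Minimal sketch (assumption: asymmetric 8-bit uniform quantization per layer).
# Each layer's quantized weights become a 2-D uint8 matrix that can be
# treated like a grayscale image.
import torch
import torchvision.models as models

def quantize_layer_uint8(w: torch.Tensor):
    """Map a float weight tensor to uint8 values plus (scale, zero_point)."""
    w_min, w_max = w.min(), w.max()
    scale = ((w_max - w_min) / 255.0).clamp(min=1e-8)
    zero_point = int(torch.round(-w_min / scale).clamp(0, 255))
    q = torch.round(w / scale + zero_point).clamp(0, 255).to(torch.uint8)
    return q, scale.item(), zero_point

model = models.vgg16(weights="IMAGENET1K_V1")
for name, module in model.named_modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        q, scale, zp = quantize_layer_uint8(module.weight.data)
        mat = q.reshape(q.shape[0], -1)  # flatten kernels into a 2-D "image"
        print(f"{name}: matrix {tuple(mat.shape)}, scale={scale:.6f}, zp={zp}")
```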

Recently, various lightweight methods for using Convolutional Neural Network (CNN) models on mobile devices have emerged. Weight quantization, which lowers the bit precision of weights, is a lightweight method that allows a model to run with integer arithmetic in mobile environments where GPU acceleration is unavailable. It has already been applied to various models to reduce computational complexity and model size with a small loss of accuracy. Considering the memory size and computing speed of the device, as well as its storage capacity and limited network environment, this paper proposes a method of compressing the integer weights obtained after quantization using a video codec. To verify the performance of the proposed method, experiments were conducted on VGG16, ResNet50, and ResNet18 models trained on the ImageNet and Places365 datasets. As a result, an accuracy loss of less than 2% and high compression efficiency were achieved across the models. In addition, a comparison with similar compression methods verified that the compression efficiency was more than twice as high.
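As a rough illustration of the codec stage, the sketch below treats one layer's uint8 weight matrix as a single grayscale frame and passes it through an external video encoder. The choice of ffmpeg with libx265 (HEVC), the QP value, the raw single-frame layout, and the helper names are assumptions for this sketch; the abstract only states that a video codec is used.

```python
# Illustrative pipeline (assumptions: ffmpeg available on PATH, libx265/HEVC,
# one grayscale frame per layer). Dimensions may need padding to codec-friendly
# sizes in practice.
import subprocess
import numpy as np

def encode_weight_matrix(mat: np.ndarray, qp: int, out_path: str) -> None:
    """Compress a 2-D uint8 matrix as a single lossy HEVC-coded frame."""
    h, w = mat.shape
    cmd = [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "gray", "-s", f"{w}x{h}", "-i", "pipe:0",
        "-c:v", "libx265", "-x265-params", f"qp={qp}",
        out_path,
    ]
    subprocess.run(cmd, input=mat.tobytes(), check=True)

def decode_weight_matrix(path: str, h: int, w: int) -> np.ndarray:
    """Decode the frame back to a uint8 matrix of the original shape."""
    cmd = ["ffmpeg", "-i", path, "-f", "rawvideo", "-pix_fmt", "gray", "pipe:1"]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    return np.frombuffer(raw, dtype=np.uint8).reshape(h, w)

# Example round trip on a placeholder matrix standing in for one quantized
# layer (real quantized weights compress far better than random data).
mat = np.random.randint(0, 256, size=(512, 4608), dtype=np.uint8)
encode_weight_matrix(mat, qp=12, out_path="layer0.hevc")
rec = decode_weight_matrix("layer0.hevc", *mat.shape)
print("mean abs error:", np.abs(rec.astype(int) - mat.astype(int)).mean())
```

In such a pipeline, the encoder's quantization parameter would control the trade-off between the compressed bitstream size and the reconstruction error of the weights, which in turn determines the accuracy loss of the decoded model.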

Keywords

Acknowledgement

This research was supported by the MSIT(Ministry of Science and ICT), Korea, under the ITRC(Information Technology Research Center) support program(IITP-2021-2017-0-01630) supervised by the IITP (Institute for Information & communications Technology Promotion).

References

  1. S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep learning with limited numerical precision," International conference on machine learning. PMLR, 2015, pp. 1737-1746.
  2. B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, "Quantization and training of neural networks for efficient integer-arithmetic-only inference," Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2704-2713.
  3. R. David, J. Duke, A. Jain, V. J. Reddi, N. Jeffries, J. Li, N. Kreeger, I. Nappier, M. Natraj, S. Regev et al., "Tensorflow lite micro: Embedded machine learning on tinyml systems," arXiv preprint arXiv:2010.08678, 2020.
  4. S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding," arXiv preprint arXiv:1510.00149, 2015.
  5. J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, "Quantized convolutional neural networks for mobile devices," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4820-4828.
  6. N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE transactions on Computers, vol. 100, no. 1, pp. 90-93, 1974.
  7. S. Kim, E.-S. Park, M. Ghulam, and E.-S. Ryu, "Compression method for CNN models using DCT," Proceedings of the Korean Society of Broadcast Engineers Conference. The Korean Institute of Broadcast and Media Engineers, 2020, pp. 553-556.
  8. Y. Wang, C. Xu, S. You, D. Tao, and C. Xu, "CNNpack: Packing convolutional neural networks in the frequency domain," NIPS, vol. 1, 2016, p. 3.
  9. H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, "Pruning filters for efficient convnets," arXiv preprint arXiv:1608.08710, 2016.
  10. A. Gholami, S. Kim, Z. Dong, Z. Yao, M. W. Mahoney, and K. Keutzer, "A survey of quantization methods for efficient neural network inference," arXiv preprint arXiv:2103.13630, 2021.
  11. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248-255.
  12. J. Frankle and M. Carbin, "The lottery ticket hypothesis: Finding sparse, trainable neural networks," arXiv preprint arXiv:1803.03635, 2018.
  13. M. Schmidt, G. Fung, and R. Rosales, "Fast optimization methods for L1 regularization: A comparative study and two new approaches," European Conference on Machine Learning. Springer, 2007, pp. 286-297.
  14. Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, "Learning efficient convolutional networks through network slimming," Proceedings of the IEEE international conference on computer vision, 2017, pp. 2736-2744.
  15. S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," International conference on machine learning. PMLR, 2015, pp. 448-456.
  16. S. Ioffe, "Batch renormalization: Towards reducing minibatch dependence in batch-normalized models," arXiv preprint arXiv:1702.03275, 2017.
  17. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
  18. R. Liu, J. Cao, P. Li, W. Sun, Y. Zhang, and Y. Wang, "NFP: A no fine-tuning pruning approach for convolutional neural network compression," 2020 3rd International Conference on Artificial Intelligence and Big Data (ICAIBD). IEEE, 2020, pp. 74-77.
  19. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
  20. G. K. Wallace, "The JPEG still picture compression standard," IEEE transactions on consumer electronics, vol. 38, no. 1, pp. xviii-xxxiv, 1992. https://doi.org/10.1109/30.125072
  21. J. H. Ko, D. Kim, T. Na, J. Kung, and S. Mukhopadhyay, "Adaptive weight compression for memory-efficient neural networks," Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. IEEE, 2017, pp. 199-204.
  22. H. Wu, P. Judd, X. Zhang, M. Isaev, and P. Micikevicius, "Integer quantization for deep learning inference: Principles and empirical evaluation," arXiv preprint arXiv:2004.09602, 2020.
  23. Y. Guo, "A survey on methods and theories of quantized neural networks," arXiv preprint arXiv:1808.04752, 2018.
  24. G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on circuits and systems for video technology, vol. 22, no. 12, pp. 1649-1668, 2012. https://doi.org/10.1109/TCSVT.2012.2221191
  25. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on circuits and systems for video technology, vol. 13, no. 7, pp. 560-576, 2003. https://doi.org/10.1109/TCSVT.2003.815165
  26. B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, "Places: A 10 million image database for scene recognition," IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 6, pp. 1452-1464, 2017. https://doi.org/10.1109/tpami.2017.2723009
  27. J.-H. Luo, J. Wu, and W. Lin, "Thinet: A filter level pruning method for deep neural network compression," Proceedings of the IEEE international conference on computer vision, 2017, pp. 5058-5066.
  28. D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620-636, July 2003. https://doi.org/10.1109/TCSVT.2003.815173
  29. S. Wiedemann, H. Kirchhoffer, S. Matlage, P. Haase, A. Marban, T. Marinc, D. Neumann, A. Osman, D. Marpe, H. Schwarz et al., "DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression," arXiv preprint arXiv:1905.08318, 2019.