Bit Operation Optimization and DNN Application using GPU Acceleration

  • Kim, Sang Hyeok (Dept. of Computer Engineering, Hanbat National University);
  • Lee, Jae Heung (Dept. of Computer Engineering, Hanbat National University)
  • Received : 2019.12.10
  • Accepted : 2019.12.27
  • Published : 2019.12.31

Abstract

In this paper, we propose a method for optimizing bit operations in a software environment and applying them to a DNN (Deep Neural Network). To this end, we propose a packing function for bitwise optimization and a masking matrix multiplication operation for the DNN application. The packing function converts each 32-bit real-valued weight into a 2-bit quantized value through a threshold comparison, so that four 32-bit real values are packed into a single 8-bit value. The masking matrix multiplication is a special operation that multiplies the packed weight values by ordinary input values. Each operation is processed in parallel on a GPU accelerator. In experiments on the HandWritten dataset, the model used about 16 times less memory than a 32-bit DNN model, while its accuracy remained within 1% of the 32-bit model.

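The abstract specifies the packing function's behavior (a threshold comparison maps each 32-bit float to a 2-bit code, four codes per byte) but not the exact encoding or threshold. Below is a minimal CUDA sketch of one plausible reading: a ternary weight in {-1, 0, +1} encoded in 2 bits, where `quantize2bit`, `packWeights`, and the threshold `delta` are all hypothetical names introduced here. Since four 32-bit values (128 bits) collapse into a single byte, this is consistent with the roughly 16x memory saving reported.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical 2-bit encoding for a ternary weight; the abstract only
// states that a threshold comparison maps each 32-bit float to 2 bits.
// Assumed here: 00 -> 0, 01 -> +1, 10 -> -1.
__device__ uint8_t quantize2bit(float w, float delta) {
    if (w >  delta) return 0x1;   // +1
    if (w < -delta) return 0x2;   // -1
    return 0x0;                   //  0
}

// Packing function sketch: one thread per output byte,
// four 2-bit codes per byte. n is assumed divisible by 4.
__global__ void packWeights(const float* w, uint8_t* packed,
                            int n, float delta) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // output byte index
    if (4 * i >= n) return;
    uint8_t b = 0;
    for (int k = 0; k < 4; ++k)
        b |= quantize2bit(w[4 * i + k], delta) << (2 * k);
    packed[i] = b;
}

int main() {
    const int n = 8;
    float *w; uint8_t *p;
    cudaMallocManaged(&w, n * sizeof(float));  // unified memory
    cudaMallocManaged(&p, n / 4);
    float vals[n] = {0.9f, -0.8f, 0.01f, 0.7f, -0.6f, 0.0f, 0.5f, -0.9f};
    for (int i = 0; i < n; ++i) w[i] = vals[i];
    packWeights<<<1, 32>>>(w, p, n, 0.1f);     // 0.1f: assumed threshold
    cudaDeviceSynchronize();
    for (int i = 0; i < n / 4; ++i) printf("byte %d = 0x%02x\n", i, p[i]);
    cudaFree(w); cudaFree(p);
    return 0;
}
```
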
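The masking matrix multiplication is described only as a special operation for multiplying the packed weight values with ordinary input values. A plausible sketch, under the same assumed 2-bit encoding as above, is shown below: each thread unpacks one row's weights with shift-and-mask operations, and the floating-point multiply degenerates to an add, a subtract, or a skip. The kernel name `maskedMatVec` and its layout (row-major, columns divisible by 4) are assumptions, not the paper's definitive implementation.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Masking matrix-vector product sketch: y = W x, with W stored as
// 2-bit ternary codes, four per byte (assumed encoding: 00 -> 0,
// 01 -> +1, 10 -> -1). One thread computes one output row.
__global__ void maskedMatVec(const uint8_t* wPacked, const float* x,
                             float* y, int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    int bytesPerRow = cols / 4;  // cols assumed divisible by 4
    float acc = 0.0f;
    for (int b = 0; b < bytesPerRow; ++b) {
        uint8_t byte = wPacked[r * bytesPerRow + b];
        for (int k = 0; k < 4; ++k) {
            uint8_t code = (byte >> (2 * k)) & 0x3;  // mask one 2-bit weight
            float xv = x[4 * b + k];
            if (code == 0x1)      acc += xv;  // weight +1
            else if (code == 0x2) acc -= xv;  // weight -1
            // code 0x0: weight 0, contributes nothing
        }
    }
    y[r] = acc;
}

int main() {
    // 2 x 4 weight matrix, one packed byte per row.
    // Row 0: +1 -1  0 +1  ->  0b01'00'10'01 = 0x49
    // Row 1:  0 +1 +1 -1  ->  0b10'01'01'00 = 0x94
    uint8_t *w; float *x, *y;
    cudaMallocManaged(&w, 2);
    cudaMallocManaged(&x, 4 * sizeof(float));
    cudaMallocManaged(&y, 2 * sizeof(float));
    w[0] = 0x49; w[1] = 0x94;
    float xv[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    for (int i = 0; i < 4; ++i) x[i] = xv[i];
    maskedMatVec<<<1, 32>>>(w, x, y, 2, 4);
    cudaDeviceSynchronize();
    printf("y = [%f, %f]\n", y[0], y[1]);  // expect [3, 1]
    cudaFree(w); cudaFree(x); cudaFree(y);
    return 0;
}
```

Extracting each 2-bit code with `(byte >> (2 * k)) & 0x3` is presumably where the "masking" in the operation's name comes from; it lets the kernel consume packed weights directly, without ever materializing the 32-bit weight matrix in memory.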
