Bit Operation Optimization and DNN Application using GPU Acceleration

  • Kim, Sang Hyeok (Dept. of Computer Engineering, Hanbat National University);
  • Lee, Jae Heung (Dept. of Computer Engineering, Hanbat National University)
  • Received : 2019.12.10
  • Accepted : 2019.12.27
  • Published : 2019.12.31

Abstract

In this paper, we propose a method for optimizing bit operations in a software environment and applying them to a DNN (Deep Neural Network). To this end, we propose a packing function for bitwise optimization and a masking matrix multiplication operation for the DNN application. The packing function converts each 32-bit real-valued weight into a 2-bit quantized value through a threshold comparison, so that four 32-bit real values are packed into a single 8-bit value. The masking matrix multiplication is a special operation that multiplies the packed weight values by ordinary input values. Each operation is processed in parallel on a GPU accelerator. In experiments on the HandWritten dataset, the model used about 16 times less memory than a 32-bit DNN model, while its accuracy remained within 1% of the 32-bit model.

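The abstract specifies the packing function's behavior (a threshold comparison maps each 32-bit float to a 2-bit code, four codes per byte) but not the exact encoding or threshold. Below is a minimal CUDA sketch of one plausible reading: a ternary weight in {-1, 0, +1} encoded in 2 bits, where `quantize2bit`, `packWeights`, and the threshold `delta` are all hypothetical names introduced here. Since four 32-bit values (128 bits) collapse into a single byte, this is consistent with the roughly 16x memory saving reported.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical 2-bit encoding for a ternary weight; the abstract only
// states that a threshold comparison maps each 32-bit float to 2 bits.
// Assumed here: 00 -> 0, 01 -> +1, 10 -> -1.
__device__ uint8_t quantize2bit(float w, float delta) {
    if (w >  delta) return 0x1;   // +1
    if (w < -delta) return 0x2;   // -1
    return 0x0;                   //  0
}

// Packing function sketch: one thread per output byte,
// four 2-bit codes per byte. n is assumed divisible by 4.
__global__ void packWeights(const float* w, uint8_t* packed,
                            int n, float delta) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // output byte index
    if (4 * i >= n) return;
    uint8_t b = 0;
    for (int k = 0; k < 4; ++k)
        b |= quantize2bit(w[4 * i + k], delta) << (2 * k);
    packed[i] = b;
}

int main() {
    const int n = 8;
    float *w; uint8_t *p;
    cudaMallocManaged(&w, n * sizeof(float));  // unified memory
    cudaMallocManaged(&p, n / 4);
    float vals[n] = {0.9f, -0.8f, 0.01f, 0.7f, -0.6f, 0.0f, 0.5f, -0.9f};
    for (int i = 0; i < n; ++i) w[i] = vals[i];
    packWeights<<<1, 32>>>(w, p, n, 0.1f);     // 0.1f: assumed threshold
    cudaDeviceSynchronize();
    for (int i = 0; i < n / 4; ++i) printf("byte %d = 0x%02x\n", i, p[i]);
    cudaFree(w); cudaFree(p);
    return 0;
}
```
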
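The masking matrix multiplication is described only as a special operation for multiplying the packed weight values with ordinary input values. A plausible sketch, under the same assumed 2-bit encoding as above, is shown below: each thread unpacks one row's weights with shift-and-mask operations, and the floating-point multiply degenerates to an add, a subtract, or a skip. The kernel name `maskedMatVec` and its layout (row-major, columns divisible by 4) are assumptions, not the paper's definitive implementation.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Masking matrix-vector product sketch: y = W x, with W stored as
// 2-bit ternary codes, four per byte (assumed encoding: 00 -> 0,
// 01 -> +1, 10 -> -1). One thread computes one output row.
__global__ void maskedMatVec(const uint8_t* wPacked, const float* x,
                             float* y, int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    int bytesPerRow = cols / 4;  // cols assumed divisible by 4
    float acc = 0.0f;
    for (int b = 0; b < bytesPerRow; ++b) {
        uint8_t byte = wPacked[r * bytesPerRow + b];
        for (int k = 0; k < 4; ++k) {
            uint8_t code = (byte >> (2 * k)) & 0x3;  // mask one 2-bit weight
            float xv = x[4 * b + k];
            if (code == 0x1)      acc += xv;  // weight +1
            else if (code == 0x2) acc -= xv;  // weight -1
            // code 0x0: weight 0, contributes nothing
        }
    }
    y[r] = acc;
}

int main() {
    // 2 x 4 weight matrix, one packed byte per row.
    // Row 0: +1 -1  0 +1  ->  0b01'00'10'01 = 0x49
    // Row 1:  0 +1 +1 -1  ->  0b10'01'01'00 = 0x94
    uint8_t *w; float *x, *y;
    cudaMallocManaged(&w, 2);
    cudaMallocManaged(&x, 4 * sizeof(float));
    cudaMallocManaged(&y, 2 * sizeof(float));
    w[0] = 0x49; w[1] = 0x94;
    float xv[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    for (int i = 0; i < 4; ++i) x[i] = xv[i];
    maskedMatVec<<<1, 32>>>(w, x, y, 2, 4);
    cudaDeviceSynchronize();
    printf("y = [%f, %f]\n", y[0], y[1]);  // expect [3, 1]
    cudaFree(w); cudaFree(x); cudaFree(y);
    return 0;
}
```

Extracting each 2-bit code with `(byte >> (2 * k)) & 0x3` is presumably where the "masking" in the operation's name comes from; it lets the kernel consume packed weights directly, without ever materializing the 32-bit weight matrix in memory.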
