Object Detection Model Using Attention Mechanism

  • Kim, Geun-Sik (Department of Information Convergence Engineering, Pusan National University) ;
  • Bae, Jung-Soo (College of Software Convergence, Dongseo University) ;
  • Cha, Eui-Young (Department of Computer Engineering, Pusan National University)
  • Received : 2020.09.08
  • Accepted : 2020.09.29
  • Published : 2020.12.31

Abstract

With the emergence of convolutional neural networks in machine learning, models for solving image processing problems have advanced rapidly. However, the computing resources these models demand have risen as well, making them difficult to train in an ordinary environment. The attention mechanism was originally proposed to prevent the vanishing-gradient problem in recurrent neural networks, but it can also be used to the benefit of convolutional neural network training. In this paper, we apply an attention mechanism to a convolutional neural network and demonstrate the merits of the proposed method by comparing training time and performance against a baseline. In YOLO-based object detection, the proposed model outperformed a model without the attention mechanism in both training time and accuracy, and in particular we show experimentally that training time can be reduced significantly. These results are also expected to make machine learning more accessible to end users.
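The paper summarized above does not include source code, but the core idea, inserting an attention block into a convolutional backbone so the network learns to reweight its own feature maps, can be illustrated briefly. The following is a minimal sketch of a squeeze-and-excitation channel-attention block (reference 5), one common form such a module takes, attached to a YOLO-style convolutional stage. It is written in PyTorch as a matter of convenience; the class and parameter names are illustrative assumptions, not taken from the authors' implementation.

    # Minimal sketch of a squeeze-and-excitation (SE) channel-attention
    # block (reference 5) added to a convolutional stage. Illustrative
    # only; names and hyperparameters are assumptions, not the paper's.
    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global average over H x W
            self.fc = nn.Sequential(              # excitation: per-channel gate in (0, 1)
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            w = self.fc(self.pool(x).view(b, c))  # (B, C) channel weights
            return x * w.view(b, c, 1, 1)         # reweight the feature maps

    # Example: a YOLO-style convolutional stage with attention appended.
    stage = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.LeakyReLU(0.1),
        SEBlock(64),
    )
    out = stage(torch.randn(1, 3, 416, 416))      # 416 x 416 is a common YOLO input size

In a detector of this kind, blocks like this are typically inserted after convolutional stages in the backbone, leaving the detection head unchanged.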

References

  1. J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," University of Washington, Washington: WA, Technical Report, 2018.
  2. K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," in Proceedings of the International Conference on Machine Learning (ICML), France: FR, pp. 2048-2057, 2015.
  3. H. Nam, J. Ha, and J. Kim, "Dual attention networks for multimodal reasoning and matching," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii: HI, pp. 299-307, 2017.
  4. F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, and X. Tang, "Residual attention network for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii: HI, pp. 3156-3164, 2017.
  5. J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Utah: UT, pp. 7132-7141, 2018.
  6. S. Woo, J. Park, J. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), Germany: DE, pp. 3-19, 2018.
  7. Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, "ECA-Net: Efficient channel attention for deep convolutional neural networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, pp. 11534-11542, 2020.
  8. Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression," in Proceedings of the AAAI Conference on Artificial Intelligence, New York: NY, vol. 34, no. 7, pp. 12993-13000, 2020.
  9. K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, Italy: IT, pp. 2961-2969, 2017.
  10. H. Qassim, A. Verma, and D. Feinzimer, "Compressed residual-VGG16 CNN model for big data places image recognition," in 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Nevada: NV, pp. 169-175, 2018.
  11. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nevada: NV, pp. 770-778, 2016.
  12. G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii: HI, pp. 4700-4708, 2017.
  13. T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, Italy: IT, pp. 2980-2988, 2017.
  14. H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized intersection over union: A metric and a loss for bounding box regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, California: CA, pp. 658-666, 2019.
  15. T. Dozat, "Incorporating Nesterov momentum into Adam," in ICLR 2016 workshop submission, Puerto Rico: PR, 2016.
  16. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proceedings of the 3rd International Conference on Learning Representations (ICLR), California: CA, pp. 1-15, 2015.
  17. A. Mittal, A. Zisserman, and P. Torr, Hand Dataset [Internet]. Available: http://www.robots.ox.ac.uk/~vgg/data/hands/.