Pedestrian and Vehicle Distance Estimation Based on Hard Parameter Sharing

  • Seo, Ji-Won (Department of Information Convergence Engineering, Pusan National University)
  • Cha, Eui-Young (Department of Computer Engineering, Pusan National University)
  • Received: 2022.02.18
  • Reviewed: 2022.03.02
  • Published: 2022.03.31

Abstract

Advances in deep learning have led to the wide use of vision-based tasks such as classification, object detection, and segmentation across many fields. Autonomous driving is one of the major fields that relies on visual data, and much research has gone into combining multiple tasks in a single network. In this paper, we propose a network that predicts an individual depth value for each pedestrian and vehicle on the road. The proposed model is based on YOLOv3 for object detection and Monodepth for depth estimation, and it performs object detection and depth estimation simultaneously using an encoder and decoder built on hard parameter sharing. We also use an attention module to improve the accuracy of both object detection and depth estimation. Depth is estimated from monocular images, and the network is trained with a self-supervised method.
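The idea of hard parameter sharing named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' YOLOv3/Monodepth architecture: the class `HardSharedNet`, the layer sizes, and the helper `object_depth` (which reads a per-object depth as the median of a dense depth map inside a detection box) are all hypothetical choices made for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class HardSharedNet:
    """Toy multi-task network: one shared encoder, two task-specific heads.

    Hypothetical stand-in for a shared backbone with a detection branch
    and a depth branch; linear layers replace the real convolutional ones.
    """

    def __init__(self, in_dim, feat_dim, det_dim, depth_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Shared parameters: every task's output depends on these weights.
        self.w_enc = rng.normal(scale=0.1, size=(in_dim, feat_dim))
        # Task-specific parameters: each head has its own weights.
        self.w_det = rng.normal(scale=0.1, size=(feat_dim, det_dim))
        self.w_depth = rng.normal(scale=0.1, size=(feat_dim, depth_dim))

    def forward(self, x):
        feat = relu(x @ self.w_enc)      # features computed once, then shared
        det_out = feat @ self.w_det      # detection branch
        depth_out = feat @ self.w_depth  # depth branch
        return det_out, depth_out


def object_depth(depth_map, box):
    """Median depth inside an (x1, y1, x2, y2) pixel box.

    Assumed read-out for illustration; the paper's own rule is not
    stated in the abstract.
    """
    x1, y1, x2, y2 = box
    return float(np.median(depth_map[y1:y2, x1:x2]))


# Usage: one forward pass yields both task outputs from the same features.
net = HardSharedNet(in_dim=8, feat_dim=16, det_dim=5, depth_dim=4)
det_out, depth_out = net.forward(np.ones((2, 8)))

# Usage: a 4x6 depth map with a nearer object occupying rows 1-2, cols 2-4.
depth_map = np.full((4, 6), 10.0)
depth_map[1:3, 2:5] = 3.0
print(object_depth(depth_map, (2, 1, 5, 3)))  # prints 3.0
```

Because both heads consume the same `feat`, any change to `w_enc` affects the detection and depth outputs together, which is the defining property of hard parameter sharing. The median is used in `object_depth` only because it is robust to background pixels inside a box.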

Acknowledgement

This work was supported by a 2-Year Research Grant of Pusan National University.
