Survey on Deep Learning-based Panoptic Segmentation Methods

딥 러닝 기반의 팬옵틱 분할 기법 분석

  • Received : 2021.08.20
  • Accepted : 2021.10.06
  • Published : 2021.10.31

Abstract

Panoptic segmentation, now widely used in computer vision applications such as medical image analysis and autonomous driving, helps in understanding an image with a holistic view. It identifies each pixel by assigning it a class ID and a unique instance ID. Specifically, it can distinguish countable 'thing' classes from uncountable 'stuff' classes, and provide pixel-wise results of both semantic prediction and object detection. As a result, it can solve the semantic segmentation and instance segmentation tasks through a single unified model, producing the two different contexts required by the two tasks. The semantic segmentation task focuses on obtaining multi-scale features from a large receptive field without losing low-level features. The instance segmentation task, on the other hand, focuses on separating 'thing' from 'stuff' and on producing representations of the detected objects. With the advances in both segmentation techniques, several panoptic segmentation models have been proposed. Many researchers try to resolve the discrepancies between the results of the two segmentation branches that can arise on object boundaries. In this survey paper, we introduce the concept of panoptic segmentation, categorize the existing methods into two representative approaches, top-down and bottom-up, and explain how each operates. We then analyze the performance of various methods with experimental results.
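The per-pixel (class ID, instance ID) output described above can be sketched with a small example. A common convention packs both IDs into a single integer per pixel as `class_id * label_divisor + instance_id`, with instance ID 0 for 'stuff' classes; the class IDs, divisor value, and toy maps below are illustrative assumptions, not taken from any specific model.

```python
import numpy as np

# Illustrative assumptions: divisor and class IDs are arbitrary here.
LABEL_DIVISOR = 1000
THING_CLASSES = {2}   # e.g. class 2 = 'car' (countable 'thing');
                      # class 1 = 'road' (uncountable 'stuff')

def merge_panoptic(semantic, instance):
    """Combine per-pixel semantic and instance predictions into one map."""
    panoptic = semantic * LABEL_DIVISOR
    for cls in THING_CLASSES:
        mask = semantic == cls
        panoptic[mask] += instance[mask]  # keep instance IDs for 'thing' pixels only
    return panoptic

def decode(panoptic):
    """Recover (class_id, instance_id) from a packed panoptic map."""
    return panoptic // LABEL_DIVISOR, panoptic % LABEL_DIVISOR

# 2x3 toy image: road on the left, two distinct cars on the right.
semantic = np.array([[1, 2, 2],
                     [1, 2, 2]])
instance = np.array([[0, 1, 1],
                     [0, 2, 2]])

panoptic = merge_panoptic(semantic, instance)
classes, instances = decode(panoptic)
```

Packing both IDs into one map makes the discrepancy problem mentioned above concrete: on object boundaries, the semantic branch and the instance branch may disagree about which pixels belong to which segment, and the merge step must resolve that conflict.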

Acknowledgement

This research was supported by Samsung Electronics, the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1C1C1009662), and the Basic Science Research Program through the NRF funded by the Ministry of Education (No. NRF-2020X1A3A1093880).
