
Training Performance Analysis of Semantic Segmentation Deep Learning Model by Progressive Combining Multi-modal Spatial Information Datasets

  • Received : 2022.03.14
  • Accepted : 2022.04.19
  • Published : 2022.04.30

Abstract

In most cases, optical images have been used as training data for DL (Deep Learning) models that perform object detection, recognition, identification, classification, semantic segmentation, and instance segmentation. However, the properties of 3D objects in the real world cannot be fully explored with 2D images alone. One of the major sources of 3D geospatial information is the DSM (Digital Surface Model), and characteristic information derived from the DSM is effective for analyzing 3D terrain features. In particular, man-made objects such as buildings, which have geometrically distinctive shapes, can be described by geometric elements obtained from 3D geospatial data. The background and motivation of this paper are drawn from the concept of the intrinsic image, which is involved in high-level visual information processing. This paper aims to extract buildings after classifying terrain features by training a DL model with DSM-derived information including slope, aspect, and SRIs (Shaded Relief Images). The experiments were carried out on the CNN-based SegNet model using the DSM and label dataset provided by the ISPRS (International Society for Photogrammetry and Remote Sensing). In particular, the experiments focus on combining multi-source information to improve the training performance and synergistic effect of the DL model. The results demonstrate that buildings were effectively classified and extracted by the proposed approach.
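As an illustration of the DSM-derived layers named above, the sketch below computes slope, aspect, and an SRI from a DSM grid with NumPy. It is a minimal sketch under stated assumptions, not the authors' implementation: it uses central-difference gradients instead of Horn's (1981) weighted 3x3 operator, and the function names, default illumination angles, and north-up sign conventions are illustrative assumptions.

    import numpy as np

    def slope_aspect(dsm, cell_size=1.0):
        # Central-difference gradients; a simplification of the weighted
        # 3x3 operator of Horn (1981). Signs assume a north-up raster.
        dz_dy, dz_dx = np.gradient(dsm.astype(np.float64), cell_size)
        slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
        aspect = np.degrees(np.arctan2(-dz_dx, dz_dy)) % 360.0  # clockwise from north
        return slope, aspect

    def shaded_relief(dsm, azimuth_deg=315.0, altitude_deg=45.0, cell_size=1.0):
        # Lambertian hillshade for a single illumination direction.
        dz_dy, dz_dx = np.gradient(dsm.astype(np.float64), cell_size)
        slope_rad = np.arctan(np.hypot(dz_dx, dz_dy))
        aspect_rad = np.arctan2(-dz_dx, dz_dy) % (2.0 * np.pi)
        alt_rad = np.radians(altitude_deg)
        az_rad = np.radians(azimuth_deg)
        sri = (np.sin(alt_rad) * np.cos(slope_rad)
               + np.cos(alt_rad) * np.sin(slope_rad) * np.cos(az_rad - aspect_rad))
        return np.clip(sri, 0.0, 1.0)  # fully shadowed cells clipped to 0

Rendering shaded_relief at several azimuths (e.g., 45, 135, 225, and 315 degrees) produces multi-directional SRIs that can be stacked with slope and aspect as training channels.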

In most cases, optical RGB images are used as training data for deep learning (DL) to perform object detection, recognition, identification, classification, semantic segmentation, and instance segmentation; however, 2D images cannot fully capture the 3D objects of the real world. It is therefore effective to analyze 3D terrain features using the DSM (Digital Surface Model), a representative form of 3D geospatial information, together with the characteristic information inherent in the DSM. Man-made structures with geometrically regular shapes, such as buildings, can be classified and their shapes described using the geometric elements and characteristics obtained from 3D spatial data. This study builds on the intrinsic information that plays an important role in high-level visual information processing: slope and aspect, geometric elements of objects, were derived from the DSM and used to train the DL model together with SRIs (Shaded Relief Images) generated from multiple directions. In the experiments, the DSM and label data from the dataset provided by the ISPRS (International Society for Photogrammetry and Remote Sensing) were used to train SegNet, a convolution-based model developed for semantic segmentation; terrain features were classified, and buildings were extracted from the classification results. In particular, the core of the study is the analysis of the synergistic effects of different combinations of training data on the training performance of the DL model, as sketched below. The results show that the proposed method is effective for building classification and extraction.
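A minimal sketch of the progressive-combination setup follows, assuming the DSM and its derived layers are co-registered 2D arrays held in a dictionary; the combination schedule, the layer names, and build_input_stack are hypothetical names for illustration, not the study's code.

    import numpy as np

    # Hypothetical experiment schedule: each run adds further DSM-derived
    # channels to the input stack, mirroring the progressive combinations.
    COMBINATIONS = [
        ("dsm",),
        ("dsm", "slope"),
        ("dsm", "slope", "aspect"),
        ("dsm", "slope", "aspect", "sri_045", "sri_135", "sri_225", "sri_315"),
    ]

    def build_input_stack(layers, names):
        # Stack co-registered H x W layers into an H x W x C tensor,
        # min-max normalizing each channel to [0, 1] before training.
        channels = []
        for name in names:
            band = layers[name].astype(np.float32)
            lo, hi = float(band.min()), float(band.max())
            channels.append((band - lo) / (hi - lo + 1e-8))
        return np.stack(channels, axis=-1)

For example, build_input_stack(layers, COMBINATIONS[2]) yields the DSM + slope + aspect input of the third run, so each successive experiment isolates the marginal contribution of the newly added channels.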

Acknowledgement

This paper was supported by a Sejong University faculty research fund in 2021. The Vaihingen dataset was provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) (Cramer, 2010): http://www.ifp.uni-stuttgart.de/dgpf/DKEP-Allg.html.

References

  1. Audebert, N., Le Saux, B., and Lefevre, S. (2018), Beyond RGB: very high resolution urban remote sensing with multimodal deep networks, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 140, pp. 20-32. https://doi.org/10.1016/j.isprsjprs.2017.11.011
  2. Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017), SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 12, pp. 2481-2495. https://doi.org/10.1109/TPAMI.2016.2644615
  3. Ballard, D. and Brown, C. (1982), Computer Vision, Prentice-Hall, Inc., Englewood Cliffs, NJ, 523p.
  4. Bronshtein, A. (2017), Train/test split and cross validation in Python, https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6 (last date accessed: 30 August 2020).
  5. Chen, K., Weinmann, M., Gao, X., Yan, M., Hinz, S., Jutzi, B., and Weinmann, M. (2018), Residual shuffling convolutional neural networks for deep semantic image segmentation using multi-modal data, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4-7 June, Riva del Garda, Italy, pp. 65-72.
  6. Cheng, W., Yang, W., Wang, M., Wang, G., and Chen, J. (2019), Context aggregation network for semantic labeling in aerial images, Remote Sensing, Vol. 11, No. 10, pp. 1-19.
  7. Cho, E. and Lee, D.C. (2020), Building detection by convolutional neural network with infrared image, LiDAR data and characteristic information fusion, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 38, No. 6, pp. 635-644. (in Korean with English abstract) https://doi.org/10.7848/KSGPC.2020.38.6.635
  8. Cramer, M. (2010), The DGPF test on digital aerial camera evaluation: Overview and test design, Photogrammetrie, Fernerkundung, Geoinformation, Vol. 2, pp. 73-82. https://doi.org/10.1127/1432-8364/2010/0041
  9. Goodfellow, I., Bengio, Y., and Courville, A. (2016), Deep Learning, The MIT Press, Cambridge, MA, 775p.
  10. Horn, B.K. (1981), Hill shading and the reflectance map, Proceedings of the IEEE, Vol. 69, No. 1, pp. 14-47. https://doi.org/10.1109/PROC.1981.11918
  11. Kim, J. and Bathe, K. (2013), The finite element method enriched by interpolation covers, Computers and Structures, Vol. 116, pp. 35-49. https://doi.org/10.1016/j.compstruc.2012.10.001
  12. Krizhevsky, A., Sutskever, I., and Hinton, G. (2017), ImageNet classification with deep convolutional neural networks, Communications of the ACM, Vol. 60, No. 6, pp. 84-90. https://doi.org/10.1145/3065386
  13. Laurini, R. and Thompson, D. (1998), Fundamentals of Spatial Information Systems, Academic Press, London, 680p.
  14. Lee, D., Cho, E., and Lee, D.C. (2021), Semantic classification of DSM using convolutional neural network based deep learning, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 39, No. 2, pp. 93-96. (in Korean with English abstract) https://doi.org/10.7848/KSGPC.2021.39.2.93
  15. Lee, D., Shin, Y., and Lee, D.C. (2020), Land cover classification using SegNet with slope, aspect and multi-directional shaded relief images derived from digital surface model, Journal of Sensors, Vol. 2020, pp. 1-19.
  16. Lee, D.C., Lee, D.H., and Lee, D. (2019), Determination of building model key points using multidirectional shaded relief images generated from airborne LiDAR data, Journal of Sensors, Vol. 2019, pp. 1-19.
  17. Lemaire, C. (2008), Aspects of the DSM production with high resolution images, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 37, Part B4, pp. 1143-1146.
  18. Lillesand, T., Kiefer, R., and Chipman, J. (2004), Remote Sensing and Image Interpretation - 5th edition, John Wiley & Sons, New York, NY, 763p.
  19. Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L., Fei-Fei, L., Yuille, A., Huang, J., and Murphy, K. (2018), Progressive neural architecture search, Computer Vision - ECCV 2018, pp. 19-35.
  20. Macko, V., Weill, C., Mazzawi, H., and Gonzalvo, J. (2019), Improving neural architecture search image classifiers via ensemble learning, arXiv:1903.06236v1.
  21. Maltezos, E., Doulamis, A., Doulamis, N., and Ioannidis, C. (2019), Building extraction from LiDAR data applying deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters, Vol. 16, No. 1, pp. 155-159. https://doi.org/10.1109/LGRS.2018.2867736
  22. Maune, D., Kopp, S., Crawford, C., and Zervas, C. (2007), Digital Elevation Model Technologies and Applications: The DEM Users Manual - 2nd edition, American Society for Photogrammetry and Remote Sensing, Bethesda, MD, 655p.
  23. McCulloch, W. and Pitts, W. (1943), A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133. https://doi.org/10.1007/BF02478313
  24. Meyer, F. and Beucher, S. (1990), Morphological segmentation, Journal of Visual Communication and Image Representation, Vol. 1, No. 1, pp. 21-46. https://doi.org/10.1016/1047-3203(90)90014-M
  25. Nahhas, F., Shafri, H., Sameen, M., Pradhan, B., and Mansor, S. (2018), Deep learning approach for building detection using LiDAR-orthophoto fusion, Journal of Sensors, Vol. 2018, pp. 1-12.
  26. Pibre, L., Chaumont, M., Subsol, G., Ienco, D., and Derras, M. (2017), How to deal with multi-source data for tree detection based on deep learning, IEEE Global Conference on Signal and Information Processing, pp. 1150-1154.
  27. Prados, E. and Faugeras, O. (2006), Shape from Shading, In: Paragios, N., Chen, Y., and Faugeras, O. (eds.), Handbook of Mathematical Models in Computer Vision, Springer, New York, N.Y., pp. 375-403.
  28. Rottensteiner, F., Sohn, G., Gerke, M., and Wegner, J. (2013), ISPRS test project on urban classification and 3D building reconstruction, ISPRS, http://www2.isprs.org/tl_files/isprs/wg34/docs/ComplexScenes_revision_v4.pdf (last date accessed: 30 July 2020).
  29. Rusu, A., Rabinowitz, N., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., Pascanu, R., and Hadsell, R. (2016), Progressive neural networks, arXiv:1606.04671v3.
  30. Sander, R. (2020), Sparse Data Fusion and Class Imbalance Correction Techniques for Efficient Multi-Class Point Cloud Semantic Segmentation, https://www.researchgate.net/publication/339323048_Sparse_Data_Fusion_and_Class_Imbalance_Correction_Techniques_for_Efficient_Multi-Class_Point_Cloud_Semantic_Segmentation (last date accessed: 25 January 2022).
  31. Mohanty, S.P., Hughes, D.P., and Salathe, M. (2016), Using deep learning for image-based plant disease detection, Frontiers in Plant Science, Vol. 7, pp. 1-10. https://doi.org/10.3389/fpls.2016.01419
  32. Shin, Y.H., Son, K.W., and Lee, D.C. (2022), Semantic segmentation and building extraction from airborne LiDAR data with multiple return using PointNet++, Applied Sciences, Vol. 12, No. 4, pp. 1-20.
  33. Simonyan, K. and Zisserman, A. (2015), Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations, 7-9 May, San Diego, CA, USA, arXiv:1409.1556v6.
  34. Speldekamp, T., Fries, C., Gevaert, C., and Gerke, M. (2015), Automatic semantic labelling of urban areas using a rule-based approach and realized with MeVisLab, https://www.researchgate.net/publication/275639040_Automatic_Semantic_Labelling_of_Urban_Areas_using_a_rule-based_approach_and_realized_with_MeVisLab (last date accessed: 13 August 2020).
  35. Szeliski, R. (2011), Computer Vision: Algorithms and Applications, Springer-Verlag, London, U.K., 812p.
  36. Tao, S. (2019), Deep neural network ensembles, arXiv:1904.05488v2.
  37. Varney, N., Asari, V.K., and Graehling, Q. (2020), DALES: A large-scale aerial LiDAR data set for semantic segmentation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, June 2020.
  38. Wang, X. and Wang, X. (2020), Spatiotemporal fusion of remote sensing image based on deep learning, Journal of Sensors, Vol. 2020, Article ID 8873079, pp. 1-11.
  39. Zhou, K., Ming, D., Lv, X., Fang, J., and Wang, M. (2019), CNN-based land cover classification combining stratified segmentation and fusion of point cloud and very high-spatial resolution remote sensing image data, Remote Sensing, Vol. 11, No. 17, pp. 1-28.