Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei (Key Laboratory of Metallogenic Prediction of Nonferrous Metals and Geological Environment Monitoring, Ministry of Education, School of Geosciences and Info-Physics, Central South University)
  • Yang, Can (Key Laboratory of Metallogenic Prediction of Nonferrous Metals and Geological Environment Monitoring, Ministry of Education, School of Geosciences and Info-Physics, Central South University)
  • Wang, Xiao-Mi (School of Resources and Environmental Science, Hunan Normal University)
  • Received : 2020.10.28
  • Accepted : 2021.02.06
  • Published : 2021.04.10

Abstract

Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs, or conditioning factors, that these models require can, however, reduce computational efficiency and make data collection more difficult. Feature selection addresses this problem by keeping only the most important factors, thereby reducing the number of input variables. Two questions remain open: (1) how do feature selection methods affect the performance of machine learning models, and (2) which feature selection method is most suitable for a given machine learning model? This paper addresses these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (logistic regression, support vector machine, artificial neural network, Gaussian process and random forest (RF)) and six typical feature selection methods from the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, the feature selection methods are used to rank the importance of the conditioning factors and to create feature subsets, from which the 13 FS-ML models are constructed. For each machine learning model, the best-performing FS-ML model is selected according to the area under the curve (AUC) value. Finally, the five optimal FS-ML models obtained in this way are applied to the LSA of the study area. The predictive abilities of the FS-ML models on LSA are verified and compared using the receiver operating characteristic (ROC) curve and statistical indicators such as sensitivity, specificity and accuracy. The results show that different feature selection methods affect the performance of LSA machine learning models differently. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the RF optimized by recursive feature elimination (RFE), and RFE is an optimal method for feature selection.
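
The evaluation step described above (choosing a feature subset by its AUC value, then reporting sensitivity, specificity and accuracy) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy labels and scores, and the 0.5 classification threshold are all assumptions made for illustration, and the AUC is computed with the standard rank-based definition rather than with any particular library.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float = 0.5):
    """Return (TP, TN, FP, FN) for labels 1 = landslide, 0 = non-landslide."""
    tp = tn = fp = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        elif predicted == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def indicators(y_true: Sequence[int], y_score: Sequence[float],
               threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy at an assumed threshold."""
    tp, tn, fp, fn = confusion_counts(y_true, y_score, threshold)
    return {
        "sensitivity": tp / (tp + fn),   # share of landslides detected
        "specificity": tn / (tn + fp),   # share of non-landslides detected
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a random positive
    sample scores higher than a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_by_auc(y_true: Sequence[int],
                scores_by_subset: Dict[str, Sequence[float]]) -> str:
    """Name of the feature subset whose model scores give the highest AUC."""
    return max(scores_by_subset, key=lambda name: auc(y_true, scores_by_subset[name]))


# Hypothetical validation labels and model scores (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0]
scores = {
    "all_factors": [0.9, 0.6, 0.4, 0.7, 0.3, 0.2],
    "subset_a": [0.9, 0.8, 0.6, 0.5, 0.3, 0.2],
}

best = best_by_auc(y_true, scores)
print(best, auc(y_true, scores[best]), indicators(y_true, scores[best]))
```

In this toy setting the model trained on the reduced subset ranks every landslide above every non-landslide, so it is the one selected. With real data the scores would come from each fitted FS-ML model evaluated on held-out samples.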
