Novel two-stage hybrid paradigm combining data pre-processing approaches to predict biochemical oxygen demand concentration

  • Kim, Sungwon (Department of Railroad Construction and Safety Engineering, Dongyang University) ;
  • Seo, Youngmin (Department of Constructional and Environmental Engineering, Kyungpook National University) ;
  • Zakhrouf, Mousaab (URMER Laboratory, Department of Hydraulics, Faculty of Technology, University of Tlemcen) ;
  • Malik, Anurag (Punjab Agricultural University, Regional Research Station)
  • Received : 2021.08.18
  • Accepted : 2021.10.06
  • Published : 2021.12.31

Abstract

Biochemical oxygen demand (BOD) concentration, one of the important water quality indicators, is treated as a monitoring item for the ecological status of lakes and rivers. This investigation employed a novel two-stage hybrid paradigm (i.e., wavelet-based gated recurrent unit, wavelet-based generalized regression neural networks, and wavelet-based random forests) to predict BOD concentration at the Dosan and Hwangji stations, South Korea. These models were assessed against the corresponding independent models (i.e., gated recurrent unit, generalized regression neural networks, and random forests). Diverse water quality and quantity indicators were used to develop the independent and two-stage hybrid models based on several input combinations (i.e., Divisions 1-5). The models were evaluated using three statistical indices: the root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and correlation coefficient (CC). The results indicate that the two-stage hybrid models did not always improve the predictive accuracy of the corresponding independent models. The DWT-RF5 model (RMSE = 0.108 mg/L) provided more accurate predictions of BOD concentration than the other optimal models at Dosan station, and the DWT-GRNN4 model (RMSE = 0.132 mg/L) was the best for predicting BOD concentration at Hwangji station.
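The abstract names the three evaluation indices but does not define them. The following is a minimal sketch of how they are commonly computed, assuming the standard definitions (RMSE as the root of the mean squared error, NSE as one minus the ratio of the error sum of squares to the variance sum of the observations, and CC as the Pearson correlation coefficient). The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the data (e.g. mg/L)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / variance_ss


def cc(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    spread_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    spread_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in predicted))
    return covariance / (spread_obs * spread_pred)


# Hypothetical BOD values in mg/L, for illustration only (not data from the study).
observed_bod = [1.0, 2.0, 3.0, 4.0]
predicted_bod = [1.1, 1.9, 3.2, 3.8]

print("RMSE:", rmse(observed_bod, predicted_bod))
print("NSE: ", nse(observed_bod, predicted_bod))
print("CC:  ", cc(observed_bod, predicted_bod))
```

A lower RMSE and NSE and CC values closer to 1 indicate better agreement, which is how model rankings such as those reported above are usually read.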
