Study on data preprocessing methods for considering snow accumulation and snow melt in dam inflow prediction using machine learning & deep learning models

  • Jo, Youngsik (Department of Civil Engineering, Chungnam National University)
  • Jung, Kwansue (Department of Civil Engineering, Chungnam National University)
  • Received : 2023.11.03
  • Accepted : 2023.12.29
  • Published : 2024.01.31

Abstract

Publicly available data-driven machine learning and deep learning (ML&DL) tools have prompted active research on dam inflow prediction and on ML&DL applications in many other fields. Accurate dam inflow prediction depends not only on improving the model itself but also on preprocessing the data in a way that accounts for the model's characteristics. In particular, existing rainfall records are obtained by melting snowfall with heating equipment, so snowfall is recorded as rainfall. This distorts the relationship between rainfall and inflow in dam basins influenced by snow accumulation and snowmelt, such as Soyang Dam. This study examines how rainfall data should be preprocessed so that ML&DL models can represent two physical phenomena in such basins when predicting daily dam inflow: reduced runoff in winter, when snowfall accumulates as snowpack, and increased runoff in spring, when snowmelt produces inflow despite little or no rain. Three machine learning models (SVM, RF, LGBM) and two deep learning models (LSTM, TCN) were built from combinations of rainfall and inflow series. After hyperparameter tuning and model selection, they achieved high predictive performance, with NSE ranging from 0.842 to 0.894. To generate rainfall data corrected for snow accumulation and snowmelt, a snow accumulation and snowmelt simulation algorithm was developed. Applying the corrected rainfall to the ML&DL models yielded NSE values of 0.841 to 0.896, similar to the performance before the correction. During the snow accumulation and snowmelt periods, however, the models trained on the adjusted rainfall produced predictions closer to the observed inflow. These results show that, when applying data-driven models to basins affected by snow accumulation and snowmelt, it is important to preprocess the input data so that it reflects a physically plausible rainfall-runoff response.
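
The abstract does not describe the snow accumulation algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple temperature-index (degree-day) rule and a daily air temperature series, neither of which is stated in the abstract. The function names and the parameters `t_snow`, `t_melt`, and `melt_factor` are hypothetical. The `nse` function follows the standard Nash-Sutcliffe efficiency definition used to report the scores above.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus (squared error / variance of observations)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def snow_corrected_rainfall(precip_mm, temp_c, t_snow=0.0, t_melt=0.0, melt_factor=3.0):
    """Shift recorded precipitation in time using a hypothetical degree-day rule.

    Assumptions (not taken from the paper):
      - on days with temp <= t_snow, recorded precipitation is held as snowpack;
      - on days with temp > t_melt, up to melt_factor * (temp - t_melt) mm of
        stored snow is released and added to that day's effective rainfall.
    """
    storage = 0.0  # snow water equivalent currently held in the snowpack (mm)
    corrected = np.zeros(len(precip_mm), dtype=float)
    for i, (p, t) in enumerate(zip(precip_mm, temp_c)):
        if t <= t_snow:
            storage += p      # precipitation falls as snow and accumulates
            liquid = 0.0
        else:
            liquid = p        # precipitation falls as rain
        melt = min(storage, melt_factor * max(t - t_melt, 0.0))
        storage -= melt
        corrected[i] = liquid + melt
    return corrected


# Illustrative values only (not data from the study).
precip = [10.0, 5.0, 0.0, 0.0, 2.0]   # recorded daily precipitation (mm)
temp = [-5.0, -2.0, 1.0, 4.0, 6.0]    # daily mean air temperature (deg C)
corrected = snow_corrected_rainfall(precip, temp)
```

In this toy series the 15 mm recorded on the two freezing days is withheld and released on the following warm days, so the corrected series carries effective rainfall on days with no recorded precipitation. A corrected series of this kind would replace the raw rainfall as a model input, and `nse` would then be used to compare predicted and observed inflow.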
