Forecasting daily PM10 concentrations in Seoul using various data mining techniques

  • Choi, Ji-Eun (Department of Statistics, Ewha Womans University)
  • Lee, Hyesun (Department of Statistics, Ewha Womans University)
  • Song, Jongwoo (Department of Statistics, Ewha Womans University)
  • Received : 2017.11.02
  • Accepted : 2017.12.27
  • Published : 2018.03.31

Abstract

Interest in $PM_{10}$ concentrations has increased greatly in Korea because of recent increases in air pollution levels. We therefore consider a model that forecasts the next day's $PM_{10}$ concentration from the main air pollutants, weather information and Beijing $PM_{2.5}$. If the next day's $PM_{10}$ level can be forecast accurately, such forecasts can be useful for policy makers and the public. This paper uses various data mining techniques to forecast the daily mean $PM_{10}$, the daily maximum $PM_{10}$, and the four $PM_{10}$ grades defined by the Ministry of Environment. We use seven models to forecast daily $PM_{10}$: five regression models (linear regression, random forest, gradient boosting, support vector machine and neural network) and two time series models (ARIMA and ARFIMA). The linear regression model performs best in forecasting $PM_{10}$ concentration, and the linear regression and random forest models perform best in forecasting the $PM_{10}$ class. The results also indicate that $PM_{10}$ in Seoul is influenced by Beijing $PM_{2.5}$ and by air pollution from power stations on the west coast.
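The forecasting setup described in the abstract (predict the next day's $PM_{10}$ from today's measurements, then report it both as a concentration and as one of four grades) can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' code or feature set: it assumes just two hypothetical predictors (today's Seoul daily mean $PM_{10}$ and today's Beijing $PM_{2.5}$), fits linear regression by ordinary least squares on synthetic data, and uses grade cut-offs of 30, 80 and 150 µg/m³ assumed from the publicly posted Korean $PM_{10}$ scale rather than taken from the paper. The helper `daily_summary` shows how a daily mean and daily maximum would be derived from hourly readings.

```python
from statistics import mean

# Assumed PM10 grade cut-offs in ug/m3 (24-hour mean), taken from the publicly
# posted Korean scale; the thresholds used in the paper may differ.
GRADE_UPPER_BOUNDS = [(30, "Good"), (80, "Moderate"), (150, "Unhealthy")]


def pm10_grade(conc):
    """Map a daily mean PM10 concentration to one of four grades."""
    for upper, label in GRADE_UPPER_BOUNDS:
        if conc <= upper:
            return label
    return "Very unhealthy"


def daily_summary(hourly):
    """Return the daily mean and daily maximum of hourly PM10 readings."""
    return mean(hourly), max(hourly)


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    xtx = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    """Linear prediction beta[0] + sum_k beta[k] * row[k-1]."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def make_next_day(seoul_pm10, beijing_pm25):
    """Pair day-t predictors with the day-(t+1) Seoul PM10 target."""
    X = [(seoul_pm10[t], beijing_pm25[t]) for t in range(len(seoul_pm10) - 1)]
    y = seoul_pm10[1:]
    return X, y


# Synthetic example: Seoul daily mean PM10 is generated from a known linear rule
# so the fit can be checked. These are NOT data from the paper.
beijing = [40, 90, 60, 120, 75, 50, 100, 30]
seoul = [45.0]
for t in range(len(beijing) - 1):
    seoul.append(10 + 0.5 * seoul[t] + 0.3 * beijing[t])

X, y = make_next_day(seoul, beijing)
beta = fit_ols(X, y)
forecast = predict(beta, (seoul[-1], beijing[-1]))
print("coefficients:", [round(b, 3) for b in beta])
print(f"next-day PM10 forecast: {forecast:.1f} ug/m3 ({pm10_grade(forecast)})")
```

Because the synthetic series is generated exactly from the rule $y_{t+1} = 10 + 0.5\,s_t + 0.3\,b_t$, the fit recovers those coefficients, and the next-day forecast of about 54 µg/m³ falls in the Moderate grade. A class forecast obtained this way (regress, then threshold) is only one option; a classifier such as a random forest can also predict the grade directly.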
