Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi (Korea Rural Economic Institute)
  • Kim, Jaehee (Department of Statistics, Duksung Women's University)
  • Received : 2020.07.14
  • Accepted : 2020.09.26
  • Published : 2020.11.30

Abstract

The development of smart grids has made it easy to collect large amounts of power data. Power consumption tends to follow a number of common patterns, so clustering those patterns is useful when analyzing big power data. In this paper, cluster analysis combines distance functions for time series with clustering algorithms to discover patterns in power consumption data. We use 10 distance measures that take the characteristics of time series data into account. A simulation study compares the distance measures for clustering, and cluster validity measures such as the error rate, similarity index, Dunn index, and silhouette values are calculated and compared. Real power consumption data are then clustered with the five distance measures that performed better than the others in the simulation.
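As a rough illustration of the workflow described in the abstract (a time series distance, a clustering algorithm, and a validity measure), the following Python sketch may help. It is not the authors' implementation. The abstract does not list the ten distance measures, so an autocorrelation-based distance is assumed here purely as an example, and the synthetic sinusoidal "load curves", the average-linkage clustering, and all function names are likewise hypothetical choices made for this sketch.

```python
import math
from itertools import combinations


def acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / denom
            for h in range(1, max_lag + 1)]


def acf_distance(x, y, max_lag=6):
    """Euclidean distance between the ACF vectors of two series
    (assumed example measure, not taken from the abstract)."""
    return math.dist(acf(x, max_lag), acf(y, max_lag))


def agglomerate(dist, k):
    """Average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def dunn_index(dist, clusters):
    """Smallest between-cluster distance over largest cluster diameter."""
    separation = min(dist[i][j]
                     for a, b in combinations(clusters, 2)
                     for i in a for j in b)
    diameter = max(dist[i][j] for c in clusters for i in c for j in c)
    return separation / diameter


# Synthetic data: three phase-shifted series with period 24 (indices 0-2)
# and three with period 8 (indices 3-5).
n = 96
phases = [0.0, 1.0, 2.0]
series = [[math.sin(2 * math.pi * t / period + p) for t in range(n)]
          for period in (24, 8) for p in phases]

dist = [[acf_distance(x, y) for y in series] for x in series]
clusters = agglomerate(dist, 2)
dunn = dunn_index(dist, clusters)
print(sorted(sorted(c) for c in clusters), dunn)
```

The autocorrelation-based distance is largely insensitive to phase shifts, so series with the same periodic structure end up close together even when their peaks occur at different times. Other distance measures can be swapped in by replacing `acf_distance`, and a larger Dunn index indicates more compact, better separated clusters.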
