Classification of Time-Series Data Based on Several Lag Windows

- Journal title : Communications for Statistical Applications and Methods
- Volume 17, Issue 3, 2010, pp.377-390
- Publisher : The Korean Statistical Society
- DOI : 10.5351/CKSS.2010.17.3.377

Authors

Kim, Hee-Young; Park, Man-Sik

Abstract

In time-series analysis, it is often more convenient to work in the frequency domain than in the time domain. The spectral density, which describes the autocorrelation structure of a time-series process, is at the core of frequency-domain analysis. The spectral density can be estimated by the periodogram or by averaging the periodogram over neighboring frequencies with equal or unequal weights. Such estimates are an attractive tool for measuring the similarity between time-series processes. To classify different classes of time-series processes, we employ the metrics based on a smoothed periodogram proposed by Park and Kim (2008). Instead of the modified Daniell window used in Park and Kim (2008), we consider several lag windows with unequal weights, and we evaluate their performance under various simulation scenarios. The simulation results show that the metrics used in this study split the time series into the preassigned clusters better than the raw-periodogram-based metrics proposed by Caiado et al. (2006). Our metrics are then applied to an economic time-series dataset.
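The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the textbook lag-window estimator f(ω) = (1/2π) Σ_{|h|≤M} w(h/M) γ(h) e^{-ihω}, computed from sample autocovariances γ(h) with truncation point M, and offers three standard lag windows (Bartlett, Tukey-Hanning, Parzen). The windows actually studied in the paper are not named in the abstract. Dissimilarity is taken here, as an illustrative assumption, to be the Euclidean distance between variance-normalized smoothed spectra, and series are grouped by average-linkage clustering. All function names, the value M = 16, the AR(1) test data, and the choices of distance and linkage are hypothetical and need not match the metrics of Park and Kim (2008).

```python
import numpy as np


def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag) with divisor n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    return np.array([np.dot(xc[: n - h], xc[h:]) / n for h in range(max_lag + 1)])


# Standard lag windows w(u), supported on |u| <= 1 (zero outside).
def bartlett(u):
    return np.clip(1.0 - np.abs(u), 0.0, None)


def tukey_hanning(u):
    u = np.abs(u)
    return np.where(u <= 1.0, 0.5 * (1.0 + np.cos(np.pi * u)), 0.0)


def parzen(u):
    u = np.abs(u)
    return np.where(u <= 0.5, 1.0 - 6.0 * u**2 + 6.0 * u**3,
                    np.where(u <= 1.0, 2.0 * (1.0 - u) ** 3, 0.0))


WINDOWS = {"bartlett": bartlett, "tukey_hanning": tukey_hanning, "parzen": parzen}


def lag_window_spectrum(x, M, window="parzen"):
    """Lag-window estimate of f at Fourier frequencies 2*pi*j/n, j = 1..n//2."""
    n = len(x)
    gamma = sample_acvf(x, M)
    h = np.arange(1, M + 1)
    w = WINDOWS[window](h / M)
    freqs = 2.0 * np.pi * np.arange(1, n // 2 + 1) / n
    # gamma(-h) = gamma(h), so the sum over |h| <= M reduces to a cosine series.
    f = gamma[0] + 2.0 * (np.cos(np.outer(freqs, h)) @ (w * gamma[1:]))
    return freqs, f / (2.0 * np.pi)


def distance_matrix(series, M, window="parzen"):
    """Assumed metric: Euclidean distance between variance-normalized spectra.

    All series must have the same length so that the frequency grids coincide.
    """
    specs = [lag_window_spectrum(x, M, window)[1] / np.var(x) for x in series]
    k = len(specs)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = np.linalg.norm(specs[i] - specs[j])
    return D


def average_linkage(D, k):
    """Agglomerative clustering with average linkage; returns labels 0..k-1."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # b > a, so deleting b leaves index a valid
        del clusters[b]
    labels = np.empty(len(D), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


def simulate_ar1(phi, n, rng, burn=100):
    """Gaussian AR(1) path x_t = phi * x_{t-1} + e_t, returned after a burn-in."""
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical two-class example: low-frequency vs high-frequency AR(1) series.
    series = ([simulate_ar1(0.9, 256, rng) for _ in range(3)]
              + [simulate_ar1(-0.9, 256, rng) for _ in range(3)])
    for name in WINDOWS:
        D = distance_matrix(series, M=16, window=name)
        print(name, average_linkage(D, 2))
```

Changing `window` swaps the lag window while holding the truncation point fixed, which is the kind of comparison the abstract describes. The Bartlett and Parzen windows have nonnegative spectral windows, so their estimates stay nonnegative; the Tukey-Hanning estimate can dip slightly below zero, which is one reason this sketch compares normalized spectra directly rather than their logarithms.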

Keywords

Clustering; autoregressive model; moving-average model; smoothed periodogram; nonstationary time series

Language

English

References

1.

Baker, F. B. and Hubert, L. J. (1975). Measuring the power of hierarchical cluster analysis, Journal of the American Statistical Association, 70, 31-38.

2.

Bohte, Z., Cepar, D. and Kosmelj, K. (1980). Clustering of time series, In Proceedings of COMPSTAT, 587-593.

3.

Brillinger, D. (1981). Time Series: Data Analysis and Theory, Holden-Day, San Francisco.

4.

Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, Springer-Verlag, New York.

5.

Caiado, J., Crato, N. and Pena, D. (2006). A periodogram-based metric for time series classification, Computational Statistics and Data Analysis, 50, 2668-2684.

6.

Chatfield, C. (1975). The Analysis of Time Series: Theory and Practice, Chapman & Hall, London.

7.

Chen, G., Abraham, B. and Peiris, S. (1994). Lag window estimation of the degree of differencing in fractionally integrated time series models, Journal of Time Series Analysis, 15, 473-487.

8.

Corduas, M. and Piccolo, D. (2008). Time series clustering and classification by the autoregressive metric, Computational Statistics and Data Analysis, 52, 1860-1872.

9.

Cowpertwait, P. S. P. and Cox, T. F. (1992). Clustering population means under heterogeneity of variance with an application to a rainfall time series problem, The Statistician, 41, 113-121.

10.

Galeano, P. and Pena, D. (2000). Multivariate analysis in vector time series, Resenhas, 4, 383-403.

11.

Golay, X., Kollias, S., Stoll, G., Meier, D., Valavanis, A. and Boesiger, P. (1998). A new correlation-based fuzzy logic clustering algorithm for fMRI, Magnetic Resonance in Medicine, 40, 249-260.

12.

Goutte, C., Toft, P., Rostrup, E., Nielsen, F. A. and Hansen, L. K. (1999). On clustering fMRI time series, Neuroimage, 9, 298-310.

13.

Kakizawa, Y., Shumway, R. H. and Taniguchi, M. (1998). Discrimination and clustering for multivariate time series, Journal of the American Statistical Association, 93, 328-340.

14.

Kovacic, Z. J. (1996). Classification of time series with applications to the leading indicator selection, In Proceedings of the Fifth Conference of IFCS, 2, 204-207.

15.

Kullback, S. (1978). Information Theory and Statistics, Peter Smith, Gloucester, Massachusetts.

16.

Kullback, S. and Leibler, R. A. (1951). On information and sufficiency, Annals of Mathematical Statistics, 22, 79-86.

17.

Macchiato, M., La Rotonda, L., Lapenna, V. and Ragosta, M. (1995). Time modelling and spatial clustering of daily ambient temperature: an application in Southern Italy, Environmetrics, 6, 31-53.

19.

Park, M. S. and Kim, H.-Y. (2008). Classification of precipitation data based on smoothed periodogram, The Korean Journal of Applied Statistics, 21, 547-560.

20.

Pena, D. and Poncela, P. (2006). Nonstationary dynamic factor models, Journal of Statistical Planning and Inference, 136, 1237-1257.

21.

Piccolo, D. (1990). A distance measure for classifying ARIMA models, Journal of Time Series Analysis, 11, 153-164.

22.

Priestley, M. B. (1981). Spectral Analysis and Time Series, Academic Press, New York.

23.

R Development Core Team (2006). R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.