Publisher : Korean Data and Information Science Society
DOI : 10.7465/jkdi.2016.27.3.621
Title : Categorical time series clustering: Case study of Korean pro-baseball data
Authors : Pak, Ro Jin
In professional baseball, a team can be consistently weak against one particular opponent. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams by their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique we employ is time series clustering, or more specifically categorical time series clustering. Three methods are considered: (i) a distance-based method, (ii) a genetic sequencing method and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so we recommend drawing conclusions by considering the results of all three methods together.
Keywords : Categorical time series; evolutionary tree; frequency analysis; periodogram; spectral analysis
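
As a rough illustration of two of the ideas named in the abstract (the distance-based and periodogram methods), the following minimal sketch compares win/loss sequences. It is not the paper's implementation: the normalized Hamming distance, the 0/1 indicator coding of wins, the function names and the toy records are all assumptions made for this example, and the sequences are made up rather than real game results.

```python
import cmath
from itertools import combinations


def hamming_distance(a, b):
    """Fraction of positions at which two equal-length categorical sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def periodogram(seq, positive="W"):
    """Periodogram of the mean-centered 0/1 indicator series for one category.

    Returns values at the Fourier frequencies k/n for k = 1 .. n//2.
    """
    n = len(seq)
    x = [1.0 if s == positive else 0.0 for s in seq]
    mean = sum(x) / n
    x = [v - mean for v in x]
    values = []
    for k in range(1, n // 2 + 1):
        dft = sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x))
        values.append(abs(dft) ** 2 / n)
    return values


# Hypothetical records (W = win, L = loss) of three made-up teams against one opponent.
records = {
    "A": "WLWLWLWL",
    "B": "WWLLWWLL",
    "C": "WLWLWLLW",
}

# Pairwise distances; any standard hierarchical clustering could be run on these.
distances = {
    (p, q): hamming_distance(records[p], records[q])
    for p, q in combinations(records, 2)
}

# Frequency-domain summaries; teams with similar periodograms share cyclical patterns.
spectra = {team: periodogram(seq) for team, seq in records.items()}
```

In this toy data, teams A and C differ in only two of eight games, so a distance-based method would group them first, while the periodograms of A and B peak at different frequencies because their wins alternate with different periods.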