Categorical time series clustering: Case study of Korean pro-baseball data

  • Pak, Ro Jin (박노진; Department of Applied Statistics, Dankook University)
  • Received : 2016.03.12
  • Accepted : 2016.05.11
  • Published : 2016.05.31

Abstract

Some professional baseball teams tend to be very weak against one particular opponent. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams based on their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique we employ is time series clustering, or more specifically categorical time series clustering. Three methods are considered: (i) a distance-based method, (ii) a genetic sequencing method, and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so we recommend drawing conclusions by considering the results of all three methods together.
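As an illustration of the distance-based idea, the sketch below clusters win/loss sequences using the simple matching (Hamming) distance and a naive single-linkage merge. It is only a minimal sketch under stated assumptions: the abstract does not say which distance or linkage the paper uses, the opponent labels A, B and C and all game records are invented for the example, and the function names are hypothetical.

```python
from itertools import combinations


def simple_matching_distance(a, b):
    """Fraction of games whose outcomes differ between two equal-length W/L sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(records, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [{name} for name in records]

    def gap(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(
            simple_matching_distance(records[p], records[q])
            for p in c1
            for q in c2
        )

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Hypothetical records of team S against four opponents
# (W = S won, L = S lost). These are made-up data, not the 2015 results.
records = {
    "A": "WWLWWWLW",
    "B": "WWLWWLLW",
    "C": "WLWWWWLW",
    "H": "LLWLLLWL",
}

clusters = single_linkage(records, k=2)
print(clusters)
```

With these invented records, the opponent labelled H ends up alone in its own cluster, which is the kind of pattern the study looks for in the real data.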

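This paper surveys the clustering of categorical time series. Clustering time series differs from ordinary clustering in that time must be taken into account. Although clustering of categorical time series has been studied, it is hard to find a paper, in Korea or abroad, that organizes and summarizes this work. We present several methods for clustering categorical time series and use professional baseball data to compare them. Among professional baseball teams, one team sometimes performs unusually poorly against a particular opponent. Team S, regarded as the strongest team in Korea, is such a case against team H. By clustering team S's 2015 head-to-head records, we try to determine whether the relationship between team S and team H is exceptional. In statistical terms, we cluster time series made up of wins and losses. The analysis shows that the relationship between team S and team H differs noticeably from team S's relationships with the other teams.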
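The periodogram approach can be sketched in a similar way: code each win/loss sequence numerically, compute its periodogram, and compare sequences through the distance between their periodograms. The coding W = 1, L = 0, the Euclidean distance, and the function names below are assumptions made for illustration; the abstract does not specify the paper's exact procedure, and the sequences are invented.

```python
import cmath
import math


def periodogram(seq):
    """Periodogram of a W/L sequence at the Fourier frequencies k/n, k = 1..n//2."""
    x = [1.0 if s == "W" else 0.0 for s in seq]  # assumed coding: W -> 1, L -> 0
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    values = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier transform of the centered series at frequency k/n.
        dft = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        values.append(abs(dft) ** 2 / n)
    return values


def periodogram_distance(a, b):
    """Euclidean distance between the periodograms of two equal-length sequences."""
    pa, pb = periodogram(a), periodogram(b)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))


# Invented sequences: strict alternation, its win/loss mirror image, and long streaks.
alternating = "WLWLWLWL"
flipped = "LWLWLWLW"
streaky = "WWWWLLLL"

print(periodogram(alternating))
print(periodogram_distance(alternating, flipped))
print(periodogram_distance(alternating, streaky))
```

Because the series is centered before the transform, a sequence and its win/loss mirror image have identical periodograms. This distance therefore compares the rhythm of wins and losses rather than who won, which is one reason to read it alongside a distance-based result instead of on its own.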
