Prediction of Key Variables Affecting NBA Playoffs Advancement: Focusing on 3 Points and Turnover Features

  • An, Sehwan (Graduate School of Technology & Innovation Management, Hanyang University) ;
  • Kim, Youngmin (Graduate School of Technology & Innovation Management, Hanyang University)
  • Received : 2022.03.04
  • Accepted : 2022.03.22
  • Published : 2022.03.31

Abstract

This study uses web crawling to collect NBA statistics covering 32 years, from 1990 to 2022. It examines the variables of interest through exploratory data analysis and generates related derived variables. Meaningless variables were removed while cleaning the input data, and correlation analysis, t-tests, and analysis of variance (ANOVA) were performed on the remaining variables. For each variable of interest, the difference in means between teams that advanced to the playoffs and teams that did not was tested. To supplement this, mean differences among three rank-based groups (upper/middle/lower) were re-examined. Only the current season's data was held out as the test set, and the remaining data was split into training and validation sets for 5-fold cross-validation during model training. Overfitting was checked by comparing the cross-validation results with the final results on the test set and confirming that the performance metrics did not differ. Because the raw data is of high quality and satisfies the statistical assumptions, most models performed well despite the small dataset. Beyond using machine learning to predict NBA game results or classify playoff advancement, this study assesses input feature importance to check whether the variables of interest are among the most important features. Visualizing SHAP values overcame the limits of interpreting feature importance results alone and compensated for the inconsistency of importance scores as variables were added or removed. Many of the variables related to three-point shooting and turnovers, which this study designated as variables of interest, were found to be among the key variables affecting NBA playoff advancement.
Like existing research in sports data analysis, this study addresses topics such as game outcomes, playoff advancement, and championship prediction, and compares several machine learning models. It differs, however, in that the features of interest are defined in advance and verified statistically, and these statistical results are then compared with the machine learning results. It is further differentiated from existing studies by presenting explanatory visualizations using SHAP, an explainable AI (XAI) method.
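The two group-comparison tests and the season-based validation split described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a pooled-variance (Student's) two-sample t-test, a one-way ANOVA across the upper/middle/lower groups, row dictionaries with a hypothetical `season` key, and an unshuffled, round-robin assignment of rows to the five folds. It returns only the test statistics; p-values require the t and F distributions, for example via `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal group variances."""
    na, nb = len(a), len(b)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def season_holdout_cv(rows, test_season, k=5):
    """Hold out one season as the test set and build k train/validation splits from the rest.

    `rows` are dicts with a hypothetical "season" key; rows are assigned to folds
    round-robin in their given order (no shuffling), which is an assumption.
    """
    test = [r for r in rows if r["season"] == test_season]
    rest = [r for r in rows if r["season"] != test_season]
    splits = []
    for i in range(k):
        validation = rest[i::k]  # every k-th row, starting at offset i
        training = [r for j, r in enumerate(rest) if j % k != i]
        splits.append((training, validation))
    return test, splits
```

For example, passing hypothetical lists of per-game three-point attempts for playoff and non-playoff teams to `pooled_t_statistic` gives the t statistic for that variable, and passing the three rank groups to `one_way_anova_f` gives the F statistic. With exactly two groups, F equals t squared, which is a quick consistency check between the two tests.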
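SHAP attributes a model's prediction to its input features using Shapley values. The sketch below computes exact Shapley values by brute force, enumerating every feature coalition and replacing "absent" features with a single baseline vector. Both the baseline-replacement value function and the brute-force enumeration are simplifying assumptions: the cost grows exponentially with the number of features, so this only illustrates the quantity that libraries such as `shap` (for example `shap.TreeExplainer` for tree ensembles) estimate efficiently. It is not the procedure used in this study, and `exact_shapley` is a hypothetical helper name.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict(x), using `baseline` values for absent features."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take x's values; all others take the baseline's.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear score f(z) = sum(w_i * z_i) + b, each feature's value reduces to w_i * (x_i - baseline_i), and for any model the values sum to predict(x) - predict(baseline) (the efficiency property). Both facts make the output easy to sanity-check on a toy model with hypothetical features such as three-point attempts or turnovers.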

본 연구는 웹 크롤링을 이용하여 1990년부터 2022년까지 총 32개년에 해당하는 NBA 통계 정보를 획득하고, 탐색적 데이터 분석을 통해 관심 변수를 관찰하고 관련된 파생변수를 생성한다. 입력 데이터에 대한 정제 과정을 거쳐 무의미한 변수들을 제거하고, 남은 변수에 대한 상관관계 분석, t 검정 및 분산분석을 수행하였다. 관심 변수에 대해 플레이오프 진출/미진출 그룹 간 평균의 차이를 검정하였고, 이를 보완하기 위해 순위를 기준으로 하는 3개 집단(상위/중위/하위) 간 평균 차이를 재확인하였다. 입력 데이터 중 올해 시즌 데이터만을 테스트 세트로 활용하였고, 모델 훈련을 위해서는 훈련 세트와 검증 세트를 분할하여 5-fold 교차검증을 수행하였다. 교차검증 결과와 시험 세트를 이용한 최종 분석 결과를 비교하여 성능 지표에서 차이가 없음을 확인함으로써 과적합 문제를 해결하였다. 원시 데이터의 품질 수준이 높고, 통계적 가정을 만족하기 때문에 적은 수준의 데이터 세트임에도 불구하고 대부분 모델에서 좋은 결과를 나타냈다. 본 연구는 단순히 머신러닝을 이용하여 NBA의 경기 결과를 예측하거나 플레이오프 진출 여부만을 분류하는 것에서 그치지 않고, 입력 특성의 중요도를 파악하여 높은 중요도를 갖는 주요 변수에 본 연구의 관심 대상 변수가 포함되는지를 확인하였다. Shap value의 시각화를 통해 특성 중요도의 결과만으로 해석할 수 없었던 한계를 극복하고, 변수의 진입/제거 과정에서 중요도 산출에 일관성이 부족하다는 점을 보완할 수 있었다. 본 연구에서 관심 대상으로 분류했던 3점 및 실책과 관련된 다수의 변수가 미국 프로농구에서의 플레이오프 진출에 영향을 미치는 주요 변수에 포함되는 것으로 나타났다. 본 연구는 기존의 스포츠 데이터 분석 분야에서 다루었던 경기 결과, 플레이오프 및 우승 예측 등의 주제를 포함하고 분석을 위해 여러 머신러닝 모델을 비교 분석했다는 점에서 유사성이 있지만, 사전에 관심 속성을 설정하고, 이를 통계적으로 검증함으로써 머신러닝 분석 결과와 비교하였다는 측면에서 차이가 있다. 또한 XAI 모델 중 하나인 SHAP를 이용하여 설명 가능한 시각화 결과를 제시함으로써 기존 연구와 차별화하였다.
