- Volume 21 Issue 3
We analyzed the salaries of Korean professional basketball and baseball players under the assumption that a player's salary depends on the player's personal records and contribution to the team in the previous year. We made extensive use of data visualization tools to examine the relationships among the variables, to detect outliers, and to perform model diagnostics. We fitted multiple linear regression and regression tree models and used cross-validation to select an optimal model. Rather than entering all variables into the stepwise regression, we examined the relationships among the variables carefully and chose a candidate subset. For basketball players, the important variables were points per game, number of assists, number of free throws made, and career length. For baseball pitchers, they were career length, strikeouts per nine innings, ERA, and number of home runs. For baseball hitters, they were career length, number of hits, and free agent (FA) status.
Professional sports; salary; multiple linear regression; regression tree; model optimization
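The model-selection step described in the abstract (fit regression models on candidate variable subsets, then compare them by cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the column meanings in the comments are hypothetical stand-ins, and the helper names `fit_ols`, `predict`, and `cv_mse` are introduced here for illustration only. It uses ordinary least squares alone; the regression tree is omitted.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, row):
    """Fitted value for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def cv_mse(X, y, cols, k=5):
    """k-fold cross-validated mean squared error using only columns `cols`."""
    n = len(y)
    Xs = [[row[c] for c in cols] for row in X]
    total = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        beta = fit_ols([Xs[i] for i in train], [y[i] for i in train])
        total += sum((y[i] - predict(beta, Xs[i])) ** 2 for i in test)
    return total / n

# Synthetic data (an assumption, not the paper's data set).
# Column 0 and 1 drive the response; column 2 is pure noise.
rng = random.Random(0)
X = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(100)]
y = [2.0 + 3.0 * r[0] + 1.5 * r[1] + rng.gauss(0, 1) for r in X]

# Hypothetical candidate subsets, compared by cross-validated error.
candidates = {"col0": [0], "col0+col1": [0, 1], "all": [0, 1, 2]}
scores = {name: cv_mse(X, y, cols) for name, cols in candidates.items()}
for name, score in scores.items():
    print(name, round(score, 3))
print("selected:", min(scores, key=scores.get))
```

The subset with the smallest cross-validated error is the one retained. Restricting the candidates beforehand, as the abstract describes, keeps the comparison small and avoids fitting on strongly related predictors.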