A Multivariate Analysis of Korean Professional Players Salary Song, Jong-Woo;
We analyzed the salaries of Korean professional basketball and baseball players under the assumption that a player's salary depends on his personal records and contribution to the team in the previous year. We used data visualization tools extensively to examine the relationships among the variables, to detect outliers, and to perform model diagnostics. We fitted multiple linear regression and regression tree models and used cross-validation to select an optimal model. Rather than entering all variables into the stepwise regression, we examined the relationships among the variables carefully and chose a subset of candidates. For basketball players, the important variables were points per game, number of assists, number of free throws made, and career. For baseball pitchers, they were career, number of strikeouts per nine innings, ERA, and number of home runs. For baseball hitters, they were career, number of hits, and FA.
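The model-selection step described above can be illustrated with a minimal sketch, given here in Python and not taken from the paper. It compares candidate variable subsets for a multiple linear regression by k-fold cross-validated mean squared error. Everything in it is an assumption made for illustration: the data are synthetic, the column names (`points`, `assists`, `noise`), the coefficients, and the helper functions `fit_ols`, `predict`, and `cv_mse` are hypothetical, and the regression tree and stepwise search are not reproduced.

```python
import random
from itertools import combinations

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, X):
    """Fitted values for rows of X given coefficients with intercept first."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def cv_mse(X, y, cols, k=5):
    """Mean squared prediction error over k folds, using only columns `cols`."""
    n = len(y)
    total = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        X_train = [[X[i][c] for c in cols] for i in train]
        X_test = [[X[i][c] for c in cols] for i in test]
        beta = fit_ols(X_train, [y[i] for i in train])
        preds = predict(beta, X_test)
        total += sum((yhat - y[i]) ** 2 for yhat, i in zip(preds, test))
    return total / n

# Synthetic data (hypothetical): salary depends on points and assists only.
rng = random.Random(0)
names = ["points", "assists", "noise"]
X = [[rng.uniform(0, 25), rng.uniform(0, 8), rng.gauss(0, 1)] for _ in range(100)]
y = [50 + 4 * pts + 6 * ast + rng.gauss(0, 5) for pts, ast, _ in X]

# Score every non-empty subset of columns and keep the lowest CV error.
subsets = [s for r in range(1, 4) for s in combinations(range(3), r)]
scores = {s: cv_mse(X, y, s) for s in subsets}
best = min(scores, key=scores.get)
print("selected variables:", [names[c] for c in best])
```

Exhaustive subset search is practical only for a handful of candidate variables; with many candidates, a stepwise search over a pre-screened subset, as the abstract describes, keeps the number of fitted models manageable.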
Professional sports;salary;multiple linear regression;regression tree;model optimization;