Parameter estimation for the imbalanced credit scoring data using AUC maximization
Hong, C.S.; Won, C.H.
For binary classification models, we consider a risk score that is a function of a linear score and estimate the coefficients of that linear score. There are two estimation methods: one obtains MLEs under a logistic model, and the other estimates the coefficients by maximizing the AUC. In general settings where the logistic assumptions do not hold, the AUC estimates are better than the logistic MLEs. This paper considers imbalanced data for credit assessment models, in which the default class contains fewer observations than the non-default class, and applies the AUC approach to such data. Various link functions are used to generate the imbalanced data. The coefficient estimates obtained by the AUC approach are found to be equivalent to, or better than, those from logistic models for imbalanced data with low default probability.
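The AUC-maximization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the two-predictor design, the intercept of -4 (chosen to give a low default rate), the logit link used for data generation, the grid search over directions, and all function names are assumptions made for illustration. Because the AUC depends only on the ordering of the scores, it is unchanged by any positive rescaling of the coefficients, so the sketch searches over unit-length coefficient vectors only.

```python
import math
import random


def empirical_auc(scores_default, scores_nondefault):
    """Mann-Whitney estimate of AUC: the share of (default, non-default)
    pairs in which the default has the higher score; ties count one half."""
    wins = 0.0
    for d in scores_default:
        for n in scores_nondefault:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(scores_default) * len(scores_nondefault))


def simulate(n, beta0, beta, rng):
    """Draw n observations with two standard-normal predictors and a
    logit link (illustrative data-generating process)."""
    data = []
    for _ in range(n):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        eta = beta0 + beta[0] * x[0] + beta[1] * x[1]
        p_default = 1.0 / (1.0 + math.exp(-eta))
        y = 1 if rng.random() < p_default else 0
        data.append((x, y))
    return data


def auc_for_direction(theta, data):
    """AUC of the linear score cos(theta)*x1 + sin(theta)*x2."""
    b1, b2 = math.cos(theta), math.sin(theta)
    defaults = [b1 * x[0] + b2 * x[1] for x, y in data if y == 1]
    nondefaults = [b1 * x[0] + b2 * x[1] for x, y in data if y == 0]
    return empirical_auc(defaults, nondefaults)


def fit_by_auc(data, steps=120):
    """Grid search over unit-length coefficient vectors; returns the
    angle with the highest in-sample AUC and that AUC."""
    grid = [2.0 * math.pi * k / steps for k in range(steps)]
    best_theta = max(grid, key=lambda t: auc_for_direction(t, data))
    return best_theta, auc_for_direction(best_theta, data)


def main():
    rng = random.Random(0)
    # Intercept -4 keeps the default class small (imbalanced data).
    data = simulate(1000, -4.0, (1.0, 0.5), rng)
    n_default = sum(y for _, y in data)
    theta, auc = fit_by_auc(data)
    print("defaults:", n_default, "of", len(data))
    print("estimated direction:", (math.cos(theta), math.sin(theta)))
    print("in-sample AUC:", auc)


if __name__ == "__main__":
    main()
```

The estimated direction can be compared with the normalized generating coefficients, here (1, 0.5) divided by its length. A grid search is only practical for two predictors; with more predictors a derivative-free optimizer would be a natural substitute, since the empirical AUC is a step function of the coefficients.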