Test Statistics for Volume under the ROC Surface and Hypervolume under the ROC Manifold

  • Hong, Chong Sun (Department of Statistics, Sungkyunkwan University)
  • Cho, Min Ho (Department of Statistics, Sungkyunkwan University)
  • Received : 2015.06.15
  • Accepted : 2015.07.13
  • Published : 2015.07.31

Abstract

The area under the ROC curve (AUC) can be expressed in terms of both the Mann-Whitney statistic and the Wilcoxon rank sum statistic. Consider the ROC surface and the ROC manifold, which extend the ROC curve to three or more dimensions. This paper shows that the volume under the ROC surface (VUS) and the hypervolume under the ROC manifold (HUM) can be derived as functions of both conditional Mann-Whitney statistics and conditional Wilcoxon rank sum statistics. The null hypothesis that three or more distribution functions are identical can be tested using VUS and HUM statistics, based on the asymptotic large-sample theory of Wilcoxon rank sum statistics. Illustrative examples with three and four random samples show that the two approaches give the same values of VUS and $HUM^4$. The equality of several distribution functions is also tested with VUS and $HUM^4$ expressed in terms of conditional Wilcoxon rank sum statistics.
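The two-sample identity that the abstract starts from, and the usual empirical form of VUS, can be illustrated with a short sketch. It is not the paper's derivation: the conditional statistics are not defined in the abstract, so the code only uses the standard relations AUC = U/(mn) = (W - n(n+1)/2)/(mn) and VUS as the proportion of correctly ordered triples. All function names (`auc_mann_whitney`, `midranks`, `auc_wilcoxon`, `vus`) are hypothetical, and the `vus` function assumes no ties.

```python
from itertools import product


def auc_mann_whitney(x, y):
    """AUC as the Mann-Whitney proportion of pairs with x_i < y_j (ties count 1/2)."""
    u = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a, b in product(x, y))
    return u / (len(x) * len(y))


def midranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auc_wilcoxon(x, y):
    """AUC from the rank sum W of y in the pooled sample: (W - n(n+1)/2) / (mn)."""
    m, n = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[m:])  # ranks belonging to the y sample
    return (w - n * (n + 1) / 2) / (m * n)


def vus(x, y, z):
    """Empirical VUS: proportion of triples with x_i < y_j < z_k (assumes no ties)."""
    count = sum(1 for a, b, c in product(x, y, z) if a < b < c)
    return count / (len(x) * len(y) * len(z))


if __name__ == "__main__":
    x, y, z = [1, 2, 3], [2.5, 4, 5], [4.5, 6]
    print(auc_mann_whitney(x, y), auc_wilcoxon(x, y))
    print(vus(x, y, z))
```

On any pair of samples the first two functions should return the same value, which is the two-class counterpart of the agreement between the two approaches that the abstract reports for VUS and $HUM^4$.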
