Test Statistics for Volume under the ROC Surface and Hypervolume under the ROC Manifold

  • Hong, Chong Sun (Department of Statistics, Sungkyunkwan University)
  • Cho, Min Ho (Department of Statistics, Sungkyunkwan University)
  • Received : 2015.06.15
  • Accepted : 2015.07.13
  • Published : 2015.07.31


The area under the ROC curve (AUC) can be represented by both the Mann-Whitney statistic and the Wilcoxon rank sum statistic. This paper considers ROC surfaces and ROC manifolds of three or more dimensions and shows that the volume under the ROC surface (VUS) and the hypervolume under the ROC manifold (HUM) can be derived as functions of both conditional Mann-Whitney statistics and conditional Wilcoxon rank sum statistics. The null hypothesis that three or more distribution functions are identical can then be tested with VUS and HUM statistics, based on the asymptotic large-sample theory of Wilcoxon rank sum statistics. Illustrative examples with three and four random samples show that the two approaches give the same VUS and $HUM^4$. The equality of several distribution functions is also tested with VUS and $HUM^4$ expressed in terms of conditional Wilcoxon rank sum statistics.
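To make the rank-based representations concrete, here is a minimal sketch, assuming continuous scores with no ties and using hypothetical sample values (not the paper's data). It computes the AUC both as a Mann-Whitney proportion and from a Wilcoxon rank sum, computes the empirical VUS and $HUM^4$ as the share of k-tuples (one value per sample) in strictly increasing order, and shows one way of writing VUS "conditionally" by fixing each value of the middle sample. The function names (`auc_mann_whitney`, `auc_wilcoxon`, `hum`, `vus_conditional`, `mann_whitney_z`) are illustrative only, and the conditional form is a standard counting identity, not necessarily the exact formulation derived in the paper.

```python
import math
from fractions import Fraction
from itertools import product


def auc_mann_whitney(x, y):
    """AUC as the Mann-Whitney proportion of pairs (x_i, y_j) with x_i < y_j."""
    u = sum(1 for xi in x for yj in y if xi < yj)
    return Fraction(u, len(x) * len(y))


def auc_wilcoxon(x, y):
    """AUC from the Wilcoxon rank sum of y in the pooled sample.

    Ranks are assigned 1..N by sorted position, which is valid only without ties.
    """
    rank = {v: i + 1 for i, v in enumerate(sorted(x + y))}
    w_y = sum(rank[v] for v in y)
    n_y = len(y)
    # U = W_y - n_y(n_y + 1)/2 counts the pairs with x_i < y_j
    return Fraction(w_y - n_y * (n_y + 1) // 2, len(x) * n_y)


def hum(*samples):
    """Empirical HUM: share of k-tuples (one value per sample) in strictly increasing order.

    With two samples this is the AUC, with three the VUS, with four HUM^4.
    """
    total = math.prod(len(s) for s in samples)
    hits = sum(1 for t in product(*samples)
               if all(a < b for a, b in zip(t, t[1:])))
    return Fraction(hits, total)


def vus_conditional(x, y, z):
    """VUS obtained by conditioning on each middle-sample value y_j:
    (number of x below y_j) * (number of z above y_j), summed over j."""
    count = sum(sum(xi < yj for xi in x) * sum(zk > yj for zk in z) for yj in y)
    return Fraction(count, len(x) * len(y) * len(z))


def mann_whitney_z(x, y):
    """Large-sample z statistic for U under H0: both samples share one distribution (no ties)."""
    n1, n2 = len(x), len(y)
    u = sum(1 for xi in x for yj in y if xi < yj)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean) / sd


# Hypothetical illustrative samples (not from the paper), all values distinct
x = [1.2, 2.5, 3.1]
y = [2.0, 3.6, 4.4]
z = [3.9, 5.0, 5.5]
w = [4.8, 6.0, 6.3]

print(auc_mann_whitney(x, y), auc_wilcoxon(x, y))  # 7/9 7/9
print(hum(x, y, z), vus_conditional(x, y, z))      # 2/3 2/3
print(hum(x, y, z, w))
print(mann_whitney_z(x, y))
```

The brute-force `hum` enumerates every k-tuple, so it is only practical for small samples; the conditional form shows how the same count can be built from pairwise comparisons against a fixed middle value. Under the null hypothesis that all k continuous distributions are identical, every ordering of a k-tuple is equally likely, so the expected HUM is $1/k!$ (1/6 for VUS and 1/24 for $HUM^4$).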
