Advanced SearchSearch Tips
Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap
facebook(new window)  Pirnt(new window) E-mail(new window) Excel Download
 Title & Authors
Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap
Kim Ji-Hyun; Cha Eun-Song;
  PDF(new window)
It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and bootstrap are two possible approaches in this case. In related literature bootstrap estimators of the true misclassification rate were asserted to have better performance for small samples than cross-validation estimators. We compare the two estimators empirically when the classification rule is so adaptive to training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators have better performance for small samples because of small variance, and we have found a new fact that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators have better performance with less computation.
Generalization Error;Prediction Accuracy;Classification Tree;Boosting;
 Cited by
Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap, Computational Statistics & Data Analysis, 2009, 53, 11, 3735  crossref(new windwow)
Cha, E.S. (2005). 예측오차 추정방법에 대한 비교연구, 석사학위논문, 숭실대학교

Bauer, E. and Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, Vol. 36, 105-139 crossref(new window)

Blake, C.L. and Merz, C.J. (1998). UCI Repository of machine learning databases. University of California in Irvine, Department of Information and Computer Science

Braga-Neto, U.M. and Dougherty, E.R. (2004). Is cross-validation valid for small-sample microarray classification? Bioinformatics, Vol. 20, 374-380 crossref(new window)

Crawford, S.L. (1989). Extensions to the CART algorithm, Intemational Journal of Man-Machine Studies, Vol. 31, 197-217 crossref(new window)

Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross -validation. Journal of the American Statistical Association, Vol. 78, 316-331 crossref(new window)

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman and Hall

Efron, B. and Tibshirani, R. (1997), Improvements on cross-validation: The 632+ bootstrap method. Journal of the American Statistical Association, Vol. 92. 548-560 crossref(new window)

Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, Vol. 55, 119-139 crossref(new window)

Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Technical Report, Stanford University, Department of Computer Sciences

Merler, S. and Furlanello, C. (1997). Selection of tree-based classifiers with the bootstrap 632+ rule. RIST Technical Report: TR-9605-01, revised Jan 97

R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. Available from

Themeau, T.M. and Atkinson, E.J. (1997). An introduction to recursive partitioning using the RPART routines. Technical Report, Mayo Foundation