Exact performance of error estimators for discrete classifiers
Discrete classification problems abound in pattern recognition and data mining applications. One of the most common discrete rules is the discrete histogram rule. This paper presents exact formulas for computing the bias, variance, and RMS of the resubstitution and leave-one-out error estimators for the discrete histogram rule. We also describe an algorithm to compute the exact probability distributions of resubstitution and leave-one-out, as well as of their deviations from the true error rate. Using a parametric Zipf model, we compute the exact performance of resubstitution and leave-one-out for varying expected true error, number of samples, and classifier complexity (number of bins). We compare these exact results with approximate performance measures, computed by Monte Carlo sampling, of 10-repeated 4-fold cross-validation and the 0.632 bootstrap error estimator. Our results show that resubstitution is low-biased but much less variable than leave-one-out, and is effectively the superior error estimator of the two, provided classifier complexity is low. In addition, our results indicate that the overall performance of resubstitution, as measured by the RMS, can be substantially better than that of the 10-repeated 4-fold cross-validation estimator, and even comparable to that of the 0.632 bootstrap estimator, provided that classifier complexity is low and the expected error rates are moderate. Beyond the results discussed in the paper, we provide an extensive set of plots that can be accessed on a companion website, at the URL http://ee.tamu.edu/edward/exact_discrete. © 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
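For readers unfamiliar with the estimators named in the abstract, the following is a minimal Python sketch, not the paper's exact formulas or algorithm. It assumes two classes, per-bin sample counts u (class 0) and v (class 1), and ties broken in favor of class 0. The Zipf-like model in zipf_model (bin probabilities proportional to i^(-alpha), mirrored between the classes, equal priors) is an illustrative assumption, as are all function names. The bias and RMS here are approximated by Monte Carlo sampling, whereas the paper computes them exactly.

```python
import numpy as np

def histogram_rule(u, v):
    """Label per bin: 1 where the class-1 count exceeds the class-0 count.
    Ties go to class 0 (an assumed convention)."""
    return (v > u).astype(int)

def resubstitution(u, v):
    """Fraction of training points misclassified by the rule designed on them:
    in each bin, the minority class is misclassified."""
    n = u.sum() + v.sum()
    return np.minimum(u, v).sum() / n

def leave_one_out(u, v):
    """Leave-one-out error under the same tie rule.
    A class-0 point is misclassified once removed if v > u - 1, i.e. v >= u.
    A class-1 point is misclassified once removed if v - 1 <= u, i.e. v <= u + 1."""
    n = u.sum() + v.sum()
    err0 = u[v >= u].sum()
    err1 = v[v <= u + 1].sum()
    return (err0 + err1) / n

def true_error(u, v, p, q):
    """True error of the designed classifier, where p[i] = P(X=i, Y=0)
    and q[i] = P(X=i, Y=1)."""
    pred = histogram_rule(u, v)
    return np.where(pred == 1, p, q).sum()

def zipf_model(b, alpha):
    """Assumed Zipf-like model: p[i] proportional to (i+1)^(-alpha),
    q is p reversed, and each class has prior 0.5."""
    w = np.arange(1, b + 1, dtype=float) ** (-alpha)
    p = 0.5 * w / w.sum()
    q = p[::-1].copy()
    return p, q

def monte_carlo(n, p, q, trials=2000, seed=0):
    """Monte Carlo approximation of bias and RMS of (resubstitution,
    leave-one-out) as deviations from the true error."""
    rng = np.random.default_rng(seed)
    b = len(p)
    dev = np.empty((trials, 2))
    for t in range(trials):
        counts = rng.multinomial(n, np.concatenate([p, q]))
        u, v = counts[:b], counts[b:]
        eps = true_error(u, v, p, q)
        dev[t] = (resubstitution(u, v) - eps, leave_one_out(u, v) - eps)
    bias = dev.mean(axis=0)
    rms = np.sqrt((dev ** 2).mean(axis=0))
    return bias, rms

# Small worked example with 3 bins and 8 sample points.
u = np.array([3, 1, 0])
v = np.array([1, 1, 2])
print(resubstitution(u, v))  # 2 of 8 points -> 0.25
print(leave_one_out(u, v))   # 3 of 8 points -> 0.375

# Approximate bias and RMS for 4 bins and 20 samples (values vary with seed).
p, q = zipf_model(b=4, alpha=1.0)
bias, rms = monte_carlo(n=20, p=p, q=q)
print("bias (resub, loo):", bias)
print("RMS  (resub, loo):", rms)
```

In the worked example, resubstitution reports a lower error than leave-one-out on the same counts, which is consistent with the abstract's remark that resubstitution is low-biased.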
author list (cited authors)
Braga-Neto, U., & Dougherty, E.
complete list of authors
Braga-Neto, U.; Dougherty, E.