Exact performance of error estimators for discrete classifiers

abstract

Discrete classification problems abound in pattern recognition and data mining applications. One of the most common discrete rules is the discrete histogram rule. This paper presents exact formulas for the computation of the bias, variance, and RMS of the resubstitution and leave-one-out error estimators for the discrete histogram rule. We also describe an algorithm to compute the exact probability distributions of resubstitution and leave-one-out, as well as of their deviations from the true error rate. Using a parametric Zipf model, we compute the exact performance of resubstitution and leave-one-out for varying expected true error, number of samples, and classifier complexity (number of bins). We compare this to approximate performance measures, computed by Monte Carlo sampling, of 10-repeated 4-fold cross-validation and the 0.632 bootstrap error estimator. Our results show that resubstitution is low-biased but much less variable than leave-one-out, and is effectively the superior error estimator of the two, provided classifier complexity is low. In addition, our results indicate that the overall performance of resubstitution, as measured by the RMS, can be substantially better than that of the 10-repeated 4-fold cross-validation estimator, and even comparable to that of the 0.632 bootstrap estimator, provided that classifier complexity is low and the expected error rates are moderate. In addition to the results discussed in the paper, we provide an extensive set of plots that can be accessed on a companion website, at the URL http://ee.tamu.edu/edward/exact_discrete. © 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
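As a rough illustration of the two estimators named in the abstract, the sketch below computes resubstitution and leave-one-out error estimates for a two-class discrete histogram rule from per-bin class counts. It is a minimal sketch under stated assumptions, not the paper's algorithm for exact moments or distributions: the function names, the count vectors `u` and `v`, and the tie-breaking convention (ties go to class 0) are all assumptions made here for illustration.

```python
from typing import Sequence


def resubstitution_error(u: Sequence[int], v: Sequence[int]) -> float:
    """Resubstitution estimate for a two-class histogram rule.

    u[i] and v[i] are the numbers of class-0 and class-1 training points
    in bin i (hypothetical names). Each bin is labeled by majority vote,
    so the training points misclassified in bin i are the minority count.
    """
    n = sum(u) + sum(v)
    return sum(min(ui, vi) for ui, vi in zip(u, v)) / n


def leave_one_out_error(u: Sequence[int], v: Sequence[int]) -> float:
    """Leave-one-out estimate for the same rule.

    Assumption: a bin is labeled class 1 only when class 1 has strictly
    more points; ties are labeled class 0.
    """
    n = sum(u) + sum(v)
    errors = 0
    for ui, vi in zip(u, v):
        # Removing one class-0 point leaves (ui - 1, vi); the bin is then
        # labeled class 1, an error, exactly when vi > ui - 1.
        if vi >= ui:
            errors += ui
        # Removing one class-1 point leaves (ui, vi - 1); the bin is then
        # labeled class 0, an error, exactly when vi - 1 <= ui.
        if vi <= ui + 1:
            errors += vi
    return errors / n


if __name__ == "__main__":
    u = [3, 1, 0]  # class-0 counts per bin (made-up example data)
    v = [1, 2, 2]  # class-1 counts per bin (made-up example data)
    print("resubstitution:", resubstitution_error(u, v))
    print("leave-one-out: ", leave_one_out_error(u, v))
```

On the made-up counts above, resubstitution counts 2 errors out of 9 points and leave-one-out counts 4 out of 9. The lower resubstitution figure on this single example is in line with the low bias the abstract attributes to it.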

published proceedings

PATTERN RECOGNITION

author list (cited authors)

Braga-Neto, U., & Dougherty, E.

citation count

47

complete list of authors

Braga-Neto, U; Dougherty, E

publication date

January 2005

publisher

Elsevier Publisher

published in

Pattern Recognition Journal

keywords

Bootstrap
Cross-validation
Discrete Classification
Error Estimation
Histogram Rule
Leave-one-out
Resubstitution

Digital Object Identifier (DOI)

10.1016/j.patcog.2005.02.013

start page

1799

end page

1814

volume

38

issue

11

URL

http://dx.doi.org/10.1016/j.patcog.2005.02.013
