Exact correlation between actual and estimated errors in discrete classification

abstract

  • Discrete classification problems are important in pattern recognition applications. The most often used discrete classification rule is the discrete histogram rule. In this letter we provide exact expressions for the correlation coefficient between the actual error and the resubstitution and leave-one-out cross-validation error estimators for the discrete histogram rule. We show with an example that correlations between actual and estimated errors are generally poor, and that in fact leave-one-out cross-validation can display negative correlation when sample sizes are small and classifier complexity is large. We observe that correlation decreases with increasing classifier complexity and increasing sample size does not necessarily produce an increase in correlation. The exact expressions given here can be computed reasonably fast for given sample size, dimensionality, and model parameters, which is useful because, as also illustrated in this letter, Monte-Carlo approximations of the correlation coefficient are generally poor, even at a large number of simulated data sets. © 2009 Elsevier B.V. All rights reserved.
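
The quantities named in the abstract can be illustrated with a small simulation. The sketch below is not the letter's exact expressions; it is the Monte-Carlo approximation that the abstract describes as generally poor, included only to make the definitions concrete. All function names, the two-class bin-probability model, and the tie-breaking rule (ties go to class 0) are assumptions made for illustration.

```python
import math
import random

def simulate_counts(n, c0, p, q, rng):
    """Draw n labeled points; return per-bin counts (u: class 0, v: class 1)."""
    bins = range(len(p))
    u = [0] * len(p)
    v = [0] * len(p)
    for _ in range(n):
        if rng.random() < c0:                      # class 0 with prior c0
            u[rng.choices(bins, weights=p)[0]] += 1
        else:                                      # class 1 otherwise
            v[rng.choices(bins, weights=q)[0]] += 1
    return u, v

def errors(u, v, c0, p, q):
    """Actual, resubstitution, and leave-one-out errors of the histogram rule.

    Assumed rule: a bin is labeled 1 if v_i > u_i, else 0 (ties go to class 0).
    """
    n = sum(u) + sum(v)
    c1 = 1.0 - c0
    actual = resub = loo = 0.0
    for ui, vi, pi, qi in zip(u, v, p, q):
        if vi > ui:            # bin labeled 1: class-0 mass is misclassified
            actual += c0 * pi
            resub += ui
        else:                  # bin labeled 0: class-1 mass is misclassified
            actual += c1 * qi
            resub += vi
        if vi >= ui:           # removing a class-0 point leaves the bin labeled 1
            loo += ui
        if ui >= vi - 1:       # removing a class-1 point leaves the bin labeled 0
            loo += vi
    return actual, resub / n, loo / n

def pearson(x, y):
    """Sample Pearson correlation; NaN if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def monte_carlo_correlation(n, c0, p, q, reps=2000, seed=0):
    """Approximate corr(actual, resub) and corr(actual, loo) by simulation."""
    rng = random.Random(seed)
    act, res, cv = [], [], []
    for _ in range(reps):
        u, v = simulate_counts(n, c0, p, q, rng)
        a, r, l = errors(u, v, c0, p, q)
        act.append(a)
        res.append(r)
        cv.append(l)
    return pearson(act, res), pearson(act, cv)

if __name__ == "__main__":
    # Hypothetical model: 4 bins, equal priors, mildly separated classes.
    p = [0.4, 0.3, 0.2, 0.1]
    q = [0.1, 0.2, 0.3, 0.4]
    print(monte_carlo_correlation(n=20, c0=0.5, p=p, q=q))
```

Rerunning with different values of `seed` or `reps` gives a rough sense of how much the simulated correlations fluctuate, which is the motivation the abstract gives for preferring exact expressions.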

published proceedings

  • PATTERN RECOGNITION LETTERS

author list (cited authors)

  • Braga-Neto, U. M., & Dougherty, E. R.

citation count

  • 21

complete list of authors

  • Braga-Neto, Ulisses M; Dougherty, Edward R

publication date

  • January 2010