Relationship between the accuracy of classifier error estimation and distribution complexity Conference Paper

abstract

  • Error estimation is a crucial part of any classification problem, and it becomes problematic with small samples. In this paper, we analyze the performance of several widely used error estimation methods (resubstitution, 10-fold cross-validation with repetition (CV10r), leave-one-out (LOO), bootstrap .632, and bolstered resubstitution) relative to the complexity of the feature-label distribution. Our definition of complexity takes into account both the complexity of the Bayes decision surface and the Bayes error. We define the complexity of a distribution for a class of Gaussian mixture models. In this class, the Bayes classifier is piecewise linear, and its complexity is included in our definition. Based on this measure of complexity, we perform experiments on 2-dimensional and 3-dimensional problems, applying the error estimation methods to distributions of different complexities. The bias and root-mean-squared (RMS) error of the error estimators are used to analyze their performance. The simulation results show that all the estimation methods lose accuracy as the complexity increases, and this performance loss is quantified as a function of distribution complexity. © 2011 IEEE.
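The bias and RMS criteria named in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's experimental setup: it assumes a single pair of equally likely, unit-covariance Gaussian classes in 2 dimensions and a nearest-mean (linear) classifier, and it covers only resubstitution and leave-one-out. The class means, sample sizes, and all function names (`fit_nearest_mean`, `true_error`, `simulate`, and so on) are hypothetical choices made for illustration. For each simulated sample, the deviation between the estimated and the true error is recorded; the bias is the mean of these deviations and the RMS is the square root of their mean square.

```python
import math
import numpy as np

def fit_nearest_mean(X, y):
    """Return (w, b) of the linear rule: predict class 1 when w @ x + b > 0."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    w = m1 - m0
    b = -0.5 * (m1 @ m1 - m0 @ m0)
    return w, b

def predict(w, b, X):
    return (X @ w + b > 0).astype(int)

def true_error(w, b, mu0, mu1):
    """Exact error of a linear rule for two equally likely, unit-covariance Gaussians."""
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    norm = np.linalg.norm(w)
    e0 = phi((w @ mu0 + b) / norm)    # class-0 points labeled 1
    e1 = phi(-(w @ mu1 + b) / norm)   # class-1 points labeled 0
    return 0.5 * (e0 + e1)

def resubstitution(X, y):
    """Error rate of the classifier on its own training sample."""
    w, b = fit_nearest_mean(X, y)
    return float(np.mean(predict(w, b, X) != y))

def leave_one_out(X, y):
    """Hold out each point in turn, train on the rest, test on the held-out point."""
    n = len(y)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i
        w, b = fit_nearest_mean(X[mask], y[mask])
        errors += int(predict(w, b, X[i:i + 1])[0] != y[i])
    return errors / n

def simulate(n_per_class=10, reps=200, seed=0):
    """Monte Carlo bias and RMS of each estimator (assumed toy model, not the paper's)."""
    rng = np.random.default_rng(seed)
    mu0, mu1 = np.array([0.0, 0.0]), np.array([1.5, 0.0])  # hypothetical class means
    y = np.repeat([0, 1], n_per_class)
    deviations = {"resub": [], "loo": []}
    for _ in range(reps):
        X = np.vstack([rng.normal(mu0, 1.0, (n_per_class, 2)),
                       rng.normal(mu1, 1.0, (n_per_class, 2))])
        w, b = fit_nearest_mean(X, y)
        eps = true_error(w, b, mu0, mu1)  # true error of the designed classifier
        deviations["resub"].append(resubstitution(X, y) - eps)
        deviations["loo"].append(leave_one_out(X, y) - eps)
    return {
        name: {"bias": float(np.mean(d)),
               "rms": float(np.sqrt(np.mean(np.square(d))))}
        for name, d in deviations.items()
    }

if __name__ == "__main__":
    for name, stats in simulate().items():
        print(name, stats)
```

Because the classifier is linear and the class-conditional densities are assumed Gaussian, the true error is computed in closed form rather than from a held-out test set. Extending the sketch toward the abstract's setting would mean replacing the two-Gaussian model with Gaussian mixtures of varying complexity and repeating the simulation per complexity level.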

name of conference

  • 2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)

published proceedings

  • 2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)

author list (cited authors)

  • Atashpaz-Gargari, E., Sima, C., Braga-Neto, U. M., & Dougherty, E. R.

citation count

  • 0

complete list of authors

  • Atashpaz-Gargari, Esmaeil; Sima, Chao; Braga-Neto, Ulisses M; Dougherty, Edward R

publication date

  • January 2011