Corrected small-sample estimation of the Bayes error. Academic Article

abstract

  • MOTIVATION: A major problem in pattern classification is estimating the Bayes error when only small samples are available. One way to estimate the Bayes error is to design a classifier by applying some classification rule to sample data, estimate the error of the designed classifier, and then use this estimate as an estimate of the Bayes error. Relative to the Bayes error, the expected error of the designed classifier is biased high, and this bias can be severe with small samples. RESULTS: This paper provides a correction for the bias by subtracting a term derived from the representation of the estimation error. It does so for Boolean classifiers, that is, classifiers defined on binary features. Although the general theory applies to any Boolean classifier, a model is introduced to reduce the number of parameters. A key point is that the expected correction is conservative. Properties of the corrected estimate are studied via simulation. The correction applies to binary predictors because they are mathematically identical to Boolean classifiers. In this context the correction is adapted to the coefficient of determination, which has been used to measure nonlinear multivariate relations between genes and to design genetic regulatory networks. An application using gene-expression data from a microarray experiment is provided on the website http://gspsnap.tamu.edu/smallsample/ (user: 'smallsample', password: 'smallsample').
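The high bias described in the abstract can be seen in a small simulation. The Python sketch below is not the paper's correction. It assumes a hypothetical, fully known distribution over three binary features (eight cells), designs a Boolean classifier with a simple per-cell majority-vote (histogram) rule, and compares the designed classifier's expected true error with the Bayes error. The tie-breaking convention (ties and empty cells get label 0), the distribution, and all function names are illustrative assumptions. The last two functions compute the coefficient of determination in the form θ = (ε0 − ε)/ε0, where ε0 is the error of the best predictor that ignores the features.

```python
import random

# Illustrative sketch only: NOT the paper's bias correction. It shows why the
# error of a classifier designed from a small sample overestimates the Bayes error.
# Assumptions: the joint distribution of (X, Y) is known, X takes 2**d values
# ("cells"), and all names and numbers below are hypothetical.

def bayes_error(p_x, q):
    """Bayes error: sum over cells of P(x) * min(P(Y=1|x), P(Y=0|x))."""
    return sum(px * min(qx, 1 - qx) for px, qx in zip(p_x, q))

def true_error(psi, p_x, q):
    """Exact error of the Boolean classifier psi (one label per cell)."""
    return sum(px * ((1 - qx) if label == 1 else qx)
               for label, px, qx in zip(psi, p_x, q))

def draw_sample(n, p_x, q, rng):
    """Draw n labeled points (cell index, label) from the known distribution."""
    cells = rng.choices(range(len(p_x)), weights=p_x, k=n)
    return [(x, 1 if rng.random() < q[x] else 0) for x in cells]

def histogram_rule(sample, n_cells):
    """Majority vote within each cell; ties and empty cells get label 0 (an assumption)."""
    votes = [0] * n_cells  # (#label-1 points) - (#label-0 points) per cell
    for x, y in sample:
        votes[x] += 1 if y == 1 else -1
    return [1 if v > 0 else 0 for v in votes]

def expected_designed_error(n, p_x, q, trials=2000, seed=0):
    """Monte Carlo estimate of E[true error of the designed classifier] at sample size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        psi = histogram_rule(draw_sample(n, p_x, q, rng), len(p_x))
        total += true_error(psi, p_x, q)
    return total / trials

def best_constant_error(p_x, q):
    """Error of the best predictor that ignores X: min(P(Y=1), P(Y=0))."""
    p1 = sum(px * qx for px, qx in zip(p_x, q))
    return min(p1, 1 - p1)

def coefficient_of_determination(eps0, eps):
    """CoD = (eps0 - eps) / eps0 for a predictor with error eps."""
    return (eps0 - eps) / eps0

if __name__ == "__main__":
    # Three binary features -> 8 equally likely cells (hypothetical distribution).
    P_X = [1 / 8] * 8
    Q = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    eps_bayes = bayes_error(P_X, Q)  # 0.25 up to floating-point rounding
    for n in (10, 40, 200):
        print(n, round(expected_designed_error(n, P_X, Q), 3), "vs Bayes", round(eps_bayes, 3))
    eps0 = best_constant_error(P_X, Q)
    print("CoD with Bayes error:", round(coefficient_of_determination(eps0, eps_bayes), 3))
```

Running the script prints, for each sample size, the Monte Carlo estimate of the designed classifier's expected error next to the Bayes error. Because every Boolean classifier's true error is at least the Bayes error, the printed estimates can only sit above it, and the gap shrinks as n grows. This is the small-sample bias that the paper's correction term is meant to offset.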

published proceedings

  • Bioinformatics

author list (cited authors)

  • Brun, M., Sabbagh, D. L., Kim, S., & Dougherty, E. R.

citation count

  • 6

complete list of authors

  • Brun, Marcel; Sabbagh, David L; Kim, Seungchan; Dougherty, Edward R

publication date

  • May 2003