Corrected small-sample estimation of the Bayes error.

abstract

MOTIVATION: A major problem of pattern classification is estimation of the Bayes error when only small samples are available. One way to estimate the Bayes error is to design a classifier based on some classification rule applied to sample data, estimate the error of the designed classifier, and then use this estimate as an estimate of the Bayes error. Relative to the Bayes error, the expected error of the designed classifier is biased high, and this bias can be severe with small samples.

RESULTS: This paper provides a correction for the bias by subtracting a term derived from the representation of the estimation error. It does so for Boolean classifiers, these being defined on binary features. Although the general theory applies to any Boolean classifier, a model is introduced to reduce the number of parameters. A key point is that the expected correction is conservative. Properties of the corrected estimate are studied via simulation. The correction applies to binary predictors because they are mathematically identical to Boolean classifiers. In this context the correction is adapted to the coefficient of determination, which has been used to measure nonlinear multivariate relations between genes and design genetic regulatory networks. An application using gene-expression data from a microarray experiment is provided on the website http://gspsnap.tamu.edu/smallsample/ (user: 'smallsample', password: 'smallsample').
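The upward bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's correction, which is not reproduced in this record; it only compares the Bayes error of a known discrete model with the average true error of Boolean classifiers designed from small samples. The majority-vote (histogram) design rule, the three-feature model, its probabilities, and all function names are assumptions chosen for illustration.

```python
import random


def bayes_error(p_x, p_y1_given_x):
    """Bayes error on a finite feature space: sum_x P(x) * min(P(1|x), P(0|x))."""
    return sum(px * min(q, 1.0 - q) for px, q in zip(p_x, p_y1_given_x))


def design_histogram_classifier(sample, n_cells):
    """Design a Boolean classifier by majority vote in each feature cell.

    Ties and empty cells default to label 0 (an assumption of this sketch).
    """
    counts = [[0, 0] for _ in range(n_cells)]
    for x, y in sample:
        counts[x][y] += 1
    return [1 if c1 > c0 else 0 for c0, c1 in counts]


def true_error(classifier, p_x, p_y1_given_x):
    """Exact error of a fixed Boolean classifier under the known model."""
    return sum(
        px * ((1.0 - q) if label == 1 else q)
        for label, px, q in zip(classifier, p_x, p_y1_given_x)
    )


def draw_sample(n, p_x, p_y1_given_x, rng):
    """Draw n (cell index, label) pairs from the model."""
    cells = rng.choices(range(len(p_x)), weights=p_x, k=n)
    return [(x, 1 if rng.random() < p_y1_given_x[x] else 0) for x in cells]


def expected_designed_error(n, p_x, p_y1_given_x, trials, rng):
    """Monte Carlo average of the true error of classifiers designed from n points."""
    total = 0.0
    for _ in range(trials):
        sample = draw_sample(n, p_x, p_y1_given_x, rng)
        clf = design_histogram_classifier(sample, len(p_x))
        total += true_error(clf, p_x, p_y1_given_x)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical model: 3 binary features give 8 cells, indexed 0..7.
    p_x = [1.0 / 8] * 8
    p_y1_given_x = [0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.1, 0.9]
    print("Bayes error:", bayes_error(p_x, p_y1_given_x))
    for n in (10, 20, 40, 80):
        avg = expected_designed_error(n, p_x, p_y1_given_x, 2000, rng)
        print(f"n={n}: average designed-classifier error = {avg:.4f}")
```

Because no classifier can have a true error below the Bayes error, every average printed here is at least the Bayes error, and under this model the gap should shrink as n grows. In practice the true error is itself unknown and must be estimated from the same small sample, which is the setting the paper addresses.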

authors

Dougherty, Edward

published proceedings

Bioinformatics

author list (cited authors)

Brun, M., Sabbagh, D. L., Kim, S., & Dougherty, E. R.

citation count

6

complete list of authors

Brun, Marcel||Sabbagh, David L||Kim, Seungchan||Dougherty, Edward R

publication date

May 2003

publisher

Oxford University Press (OUP)

published in

Bioinformatics

keywords

Algorithms
Bayes Theorem
Computer Simulation
Gene Expression Regulation
Models, Genetic
Models, Statistical
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated
Quality Control
Reproducibility Of Results
Sample Size
Sensitivity And Specificity
Sequence Alignment
Signal Processing, Computer-Assisted

PubMed ID

12761056

Digital Object Identifier (DOI)

10.1093/bioinformatics/btg144

start page

944

end page

951

volume

19

issue

8

URL

http://dx.doi.org/10.1093/bioinformatics/btg144
