Is cross-validation valid for small-sample microarray classification?

abstract

MOTIVATION: Microarray classification typically possesses two striking attributes: (1) classifier design and error estimation are based on remarkably small samples and (2) cross-validation error estimation is employed in the majority of the papers. Thus, it is necessary to have a quantifiable understanding of the behavior of cross-validation in the context of very small samples.

RESULTS: An extensive simulation study has been performed comparing cross-validation, resubstitution and bootstrap estimation for three popular classification rules, namely linear discriminant analysis, 3-nearest-neighbor and decision trees (CART), using both synthetic and real breast-cancer patient data. Comparison is via the distribution of differences between the estimated and true errors. Various statistics for the deviation distribution have been computed: mean (for estimator bias), variance (for estimator precision), root-mean-square error (combining bias and variance) and quartile ranges, including outlier behavior. In general, while cross-validation error estimation is much less biased than resubstitution, it displays excessive variance, which makes individual estimates unreliable for small samples. Bootstrap methods provide improved performance relative to variance, but at a high computational cost and often with increased bias (albeit much less than with resubstitution).
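The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or experimental setup: the two-class Gaussian model, the sample sizes, and all function names (`draw_sample`, `run_study`, and so on) are assumptions chosen for illustration. Only resubstitution and leave-one-out cross-validation are shown, for a 3-nearest-neighbor rule; bootstrap estimation is omitted. For each small training sample, the "true" error is approximated on a large independent test set, and the deviation (estimated error minus true error) is summarized by its mean (bias), variance, and root-mean-square value.

```python
import math
import random
import statistics


def draw_sample(rng, n_per_class, shift=1.0):
    """Draw labelled 2-D points from two unit-variance Gaussians (assumed model)."""
    data = []
    for label, mean in ((0, 0.0), (1, shift)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)), label))
    return data


def knn3_predict(train, x):
    """Classify x by majority vote among its 3 nearest training points."""
    nearest = sorted(
        train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2
    )[:3]
    return 1 if sum(label for _, label in nearest) >= 2 else 0


def error_rate(train, test):
    """Fraction of test points misclassified by 3NN designed on train."""
    return sum(knn3_predict(train, x) != y for x, y in test) / len(test)


def resubstitution(sample):
    """Error measured on the same points used to design the classifier."""
    return error_rate(sample, sample)


def leave_one_out(sample):
    """Leave-one-out cross-validation: hold out each point in turn."""
    errors = 0
    for i, (x, y) in enumerate(sample):
        rest = sample[:i] + sample[i + 1:]
        errors += knn3_predict(rest, x) != y
    return errors / len(sample)


def deviation_stats(deviations):
    """Bias, variance and RMS of the (estimated - true) error deviations."""
    bias = statistics.fmean(deviations)
    variance = statistics.pvariance(deviations)
    rms = math.sqrt(statistics.fmean([d * d for d in deviations]))
    return {"bias": bias, "variance": variance, "rms": rms}


def run_study(n_per_class=10, n_trials=200, n_test=500, seed=0):
    """Repeat the small-sample experiment and summarize each estimator."""
    rng = random.Random(seed)
    dev = {"resub": [], "loo": []}
    for _ in range(n_trials):
        sample = draw_sample(rng, n_per_class)
        # A large independent test set stands in for the true error.
        test = draw_sample(rng, n_test // 2)
        true_err = error_rate(sample, test)
        dev["resub"].append(resubstitution(sample) - true_err)
        dev["loo"].append(leave_one_out(sample) - true_err)
    return {name: deviation_stats(d) for name, d in dev.items()}


if __name__ == "__main__":
    for name, stats in run_study().items():
        print(name, {k: round(v, 4) for k, v in stats.items()})
```

Because the population variance is used, the squared RMS deviation equals the squared bias plus the variance, which is the sense in which RMS combines the two. The numbers produced depend entirely on the assumed model and seed and should not be read as reproducing the paper's results.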

published proceedings

Bioinformatics

altmetric score

9.5

author list (cited authors)

Braga-Neto, U. M., & Dougherty, E. R.

citation count

474

complete list of authors

Braga-Neto, Ulisses M; Dougherty, Edward R

publication date

February 2004

publisher

Oxford University Press (OUP)

published in

Bioinformatics

keywords

Algorithms
Benchmarking
Breast Neoplasms
Computer Simulation
Gene Expression Profiling
Genetic Predisposition To Disease
Genetic Testing
Humans
Models, Genetic
Models, Statistical
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated
Reproducibility Of Results
Sample Size
Sensitivity And Specificity

PubMed ID (PMID)

14960464

Digital Object Identifier (DOI)

10.1093/bioinformatics/btg419

start page

374

end page

380

volume

20

issue

3

URL

http://dx.doi.org/10.1093/bioinformatics/btg419
