Joint Sampling Distribution Between Actual and Estimated Classification Errors for Linear Discriminant Analysis

abstract

Error estimation must be used to find the accuracy of a designed classifier, an issue that is critical in biomarker discovery for disease diagnosis and prognosis in genomics and proteomics. This paper presents, for what is believed to be the first time, the analytical formulation for the joint sampling distribution of the actual and estimated errors of a classification rule. The analysis presented here concerns the linear discriminant analysis (LDA) classification rule and the resubstitution and leave-one-out error estimators, under a general parametric Gaussian assumption. Exact results are provided in the univariate case, and a simple method is suggested to obtain an accurate approximation in the multivariate case. It is also shown how these results can be applied in the computation of conditional bounds and the regression of the actual error, given the observed error estimate. In contrast to asymptotic results, the analysis presented here is applicable to finite training data. In particular, it applies in the small-sample settings commonly found in genomics and proteomics applications. Numerical examples, which include parameters estimated from actual microarray data, illustrate the analysis throughout. © 2006 IEEE.
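
The quantities named in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's analytical method: it only simulates pairs of actual and estimated errors for univariate LDA so that their joint behavior can be inspected empirically. The function names, the equal class priors, the common variance `sigma`, and all default parameter values are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify(x, m0, m1):
    """Univariate LDA with equal priors: label 0 if x lies on m0's side of the midpoint."""
    return 0 if (x - 0.5 * (m0 + m1)) * (m0 - m1) >= 0 else 1


def true_error(m0, m1, mu0, mu1, sigma):
    """Actual error of the designed classifier under N(mu0, sigma^2) and N(mu1, sigma^2)."""
    if m0 == m1:
        return 0.5  # degenerate case: every point is labeled 0
    mid = 0.5 * (m0 + m1)
    p0_below = normal_cdf((mid - mu0) / sigma)
    p1_below = normal_cdf((mid - mu1) / sigma)
    if m0 > m1:  # class-0 region is x >= mid
        return 0.5 * p0_below + 0.5 * (1.0 - p1_below)
    return 0.5 * (1.0 - p0_below) + 0.5 * p1_below  # class-0 region is x <= mid


def resubstitution(x0, x1):
    """Fraction of training points misclassified by the classifier designed on all of them."""
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    wrong = sum(classify(x, m0, m1) != 0 for x in x0)
    wrong += sum(classify(x, m0, m1) != 1 for x in x1)
    return wrong / (len(x0) + len(x1))


def leave_one_out(x0, x1):
    """Each point is classified by the rule designed on the remaining points."""
    s0, s1, n0, n1 = sum(x0), sum(x1), len(x0), len(x1)
    wrong = sum(classify(x, (s0 - x) / (n0 - 1), s1 / n1) != 0 for x in x0)
    wrong += sum(classify(x, s0 / n0, (s1 - x) / (n1 - 1)) != 1 for x in x1)
    return wrong / (n0 + n1)


def simulate(n0=10, n1=10, mu0=0.0, mu1=1.0, sigma=1.0, trials=2000, seed=0):
    """Return a list of (actual error, resubstitution, leave-one-out) triples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        x0 = [rng.gauss(mu0, sigma) for _ in range(n0)]
        x1 = [rng.gauss(mu1, sigma) for _ in range(n1)]
        eps = true_error(sum(x0) / n0, sum(x1) / n1, mu0, mu1, sigma)
        samples.append((eps, resubstitution(x0, x1), leave_one_out(x0, x1)))
    return samples


def regression_on_estimate(samples, which=1):
    """Empirical mean of the actual error given each observed estimate value."""
    groups = {}
    for s in samples:
        groups.setdefault(s[which], []).append(s[0])
    return {est: sum(v) / len(v) for est, v in sorted(groups.items())}
```

Calling `regression_on_estimate(simulate(), which=1)` gives an empirical counterpart of the regression of the actual error on the resubstitution estimate; `which=2` does the same for leave-one-out. The small default sample sizes are meant to mimic the small-sample setting the abstract refers to.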

published proceedings

IEEE TRANSACTIONS ON INFORMATION THEORY

author list (cited authors)

Zollanvari, A., Braga-Neto, U. M., & Dougherty, E. R.

citation count

28

complete list of authors

Zollanvari, Amin||Braga-Neto, Ulisses M||Dougherty, Edward R

publication date

February 2010

publisher

Institute of Electrical and Electronics Engineers (IEEE)

published in

IEEE TRANSACTIONS ON INFORMATION THEORY

keywords

Classification
Cross-validation
Error Estimation
Leave-one-out
Linear Discriminant Analysis
Resubstitution
Sampling Distribution

Digital Object Identifier (DOI)

10.1109/TIT.2009.2037034

start page

784

end page

804

volume

56

issue

2

URL

http://dx.doi.org/10.1109/tit.2009.2037034
