Relation Between Permutation-Test P Values and Classifier Error Estimates

Academic Article

abstract

Gene-expression-based classifiers suffer from the small number of microarrays usually available for classifier design. Hence, one is confronted with the dual problem of designing a classifier and estimating its error with only a small sample. Permutation testing has been recommended to assess the dependency of a designed classifier on the specific data set. This involves randomly permuting the labels of the data points, estimating the error of the designed classifier for each permutation, and then finding the p value of the error for the actual labeling relative to the population of errors for the random labelings. This paper addresses the issue of whether or not this p value is informative. It provides both analytic and simulation results to show that the permutation p value is, up to a very small deviation, a function of the error estimate. Moreover, even though the p value is a monotonically increasing function of the error estimate, in the range of the error where the majority of the p values lie, the function increases very slowly, so that inversion is problematic. Hence, the conclusion is that the p value is less informative than the error estimate. This result demonstrates that random labeling does not provide any further insight into the accuracy of the classifier or the precision of the error estimate. We have no knowledge beyond the error estimate itself and the various distribution-free, classifier-specific bounds developed for this estimate.
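
The permutation procedure described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the nearest-mean classifier, the leave-one-out error estimate, the function names `loo_error` and `permutation_p_value`, and the add-one correction in the p value are all choices made here for the example, and the record does not say which classifiers or estimators the paper uses.

```python
import numpy as np


def loo_error(X, y):
    """Leave-one-out error estimate of a nearest-mean classifier.

    Assumption for this sketch: labels are 0/1 and each class has at
    least two samples, so both class means exist in every fold.
    """
    n = len(y)
    mistakes = 0
    for i in range(n):
        keep = np.ones(n, dtype=bool)
        keep[i] = False                      # hold out sample i
        X_tr, y_tr = X[keep], y[keep]
        mean0 = X_tr[y_tr == 0].mean(axis=0)
        mean1 = X_tr[y_tr == 1].mean(axis=0)
        # Predict the class whose mean is closer to the held-out point.
        pred = int(np.linalg.norm(X[i] - mean1) < np.linalg.norm(X[i] - mean0))
        mistakes += int(pred != y[i])
    return mistakes / n


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Return (error estimate for the actual labels, permutation p value).

    The p value is the fraction of random labelings whose error estimate
    is at most the observed one. The add-one correction is a common
    convention assumed here, not something stated in the abstract.
    """
    rng = np.random.default_rng(seed)
    observed = loo_error(X, y)
    perm_errors = np.array(
        [loo_error(X, rng.permutation(y)) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(perm_errors <= observed)) / (1 + n_perm)
    return observed, p_value
```

Because the population of permuted errors depends on the data only through the random labelings, the returned p value is determined almost entirely by `observed`, which is the relationship the abstract describes.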

authors

Dougherty, Edward

published proceedings

Machine Learning

author list (cited authors)

Hsing, T., Attoor, S., & Dougherty, E.

citation count

21

complete list of authors

Hsing, Tailen||Attoor, Sanju||Dougherty, Edward

publication date

July 2003

publisher

Springer Nature Publisher

published in

Machine Learning Journal