Impact of error estimation on feature selection

abstract

  • Given a large set of potential features, it is usually necessary to find a small subset with which to classify. The task of finding an optimal feature set is inherently combinatorial, and therefore suboptimal algorithms are typically used to find feature sets. If feature selection is based directly on classification error, then a feature-selection algorithm must base its decisions on error estimates. This paper addresses the impact of error estimation on feature selection using two performance measures: a comparison of the true error of the optimal feature set with the true error of the feature set found by a feature-selection algorithm, and the number of features from the truly optimal feature set that also appear in the feature set found by the algorithm. The study considers seven error estimators applied to three standard suboptimal feature-selection algorithms and to exhaustive search, and it considers three different feature-label model distributions. It draws two conclusions for the cases considered: (1) depending on the sample size and the classification rule, feature-selection algorithms can produce feature sets whose corresponding classifiers have errors far in excess of the error of the classifier corresponding to the optimal feature set; and (2) for small samples, differences in performance among the feature-selection algorithms are less significant than differences in performance among the error estimators used to implement the algorithms. Moreover, keeping in mind that results depend on the particular classifier-distribution pair, for the error estimators considered in this study, bootstrap and bolstered resubstitution usually outperform cross-validation, and bolstered resubstitution usually performs as well as or better than bootstrap. © 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
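The abstract describes a wrapper setting: a feature-selection algorithm compares candidate subsets through an error estimate computed from the sample, and the subset it returns is judged by its true error and by its overlap with the optimal subset. The Python sketch below illustrates that loop under illustrative assumptions that do not come from the paper: a hypothetical two-class Gaussian model in which only the first `k` features are informative, a nearest-mean classifier, 5-fold cross-validation, a 0.632 bootstrap with 50 resamples, Monte Carlo bolstered resubstitution with a fixed kernel width `sigma=0.5`, and a large independent test sample standing in for the true error. It covers sequential forward selection and exhaustive search, and only four estimators rather than the seven studied. All function names (`sfs`, `boot632`, `bolstered_resub`, `run`, and so on) are hypothetical.

```python
import itertools
import random

def sample(n, d, n_informative, shift, rng):
    """Hypothetical model: balanced unit-variance Gaussian classes; only the
    first `n_informative` features carry a mean shift between the classes."""
    data = []
    for i in range(n):
        y = i % 2  # alternate labels so both classes are equally sized
        mean = [shift * (y - 0.5) if j < n_informative else 0.0 for j in range(d)]
        data.append(([rng.gauss(m, 1.0) for m in mean], y))
    return data

def train_nmc(data, feats):
    """Nearest-mean classifier using only the features listed in `feats`."""
    means = {}
    for label in (0, 1):
        pts = [[x[j] for j in feats] for x, y in data if y == label]
        means[label] = [sum(col) / len(pts) for col in zip(*pts)]
    def classify(x):
        z = [x[j] for j in feats]
        dist = {lab: sum((a - b) ** 2 for a, b in zip(z, m)) for lab, m in means.items()}
        return min(dist, key=dist.get)
    return classify

def error_rate(clf, data):
    return sum(clf(x) != y for x, y in data) / len(data)

# Error estimators: each maps (sample, feature subset, rng) to an error estimate.
def resub(data, feats, rng):
    return error_rate(train_nmc(data, feats), data)

def cv(data, feats, rng, k=5):
    """k-fold cross-validation; fold f holds the points whose index is f mod k."""
    wrong = 0
    for f in range(k):
        clf = train_nmc([p for i, p in enumerate(data) if i % k != f], feats)
        wrong += sum(clf(x) != y for x, y in data[f::k])
    return wrong / len(data)

def boot632(data, feats, rng, b=50):
    """0.632 bootstrap: 0.368 * resubstitution + 0.632 * pooled out-of-bag error."""
    n, wrong, tested = len(data), 0, 0
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        train = [data[i] for i in idx]
        if len({y for _, y in train}) < 2:
            continue  # skip resamples that miss a class
        clf = train_nmc(train, feats)
        out = [data[i] for i in set(range(n)) - set(idx)]
        wrong += sum(clf(x) != y for x, y in out)
        tested += len(out)
    e0 = wrong / tested if tested else 0.0
    return 0.368 * resub(data, feats, rng) + 0.632 * e0

def bolstered_resub(data, feats, rng, sigma=0.5, m=20):
    """Monte Carlo bolstered resubstitution; fixed spherical Gaussian kernel (assumed width)."""
    clf = train_nmc(data, feats)
    wrong = 0
    for x, y in data:
        for _ in range(m):
            wrong += clf([xi + rng.gauss(0.0, sigma) for xi in x]) != y
    return wrong / (m * len(data))

# Feature-selection algorithms that see only the estimator, never the true error.
def sfs(data, d, k, est, rng):
    """Sequential forward selection: greedily add the feature with lowest estimated error."""
    chosen = []
    while len(chosen) < k:
        best = min((f for f in range(d) if f not in chosen),
                   key=lambda f: est(data, chosen + [f], rng))
        chosen.append(best)
    return sorted(chosen)

def exhaustive(data, d, k, est, rng):
    return sorted(min(itertools.combinations(range(d), k),
                      key=lambda fs: est(data, list(fs), rng)))

def run(n=30, d=6, k=2, shift=1.5, seed=0):
    rng = random.Random(seed)
    train = sample(n, d, k, shift, rng)
    test = sample(2000, d, k, shift, rng)  # large hold-out set approximates the true error
    def true_err(fs):
        return error_rate(train_nmc(train, list(fs)), test)
    optimal = min(itertools.combinations(range(d), k), key=true_err)
    estimators = {"resub": resub, "cv": cv, "boot632": boot632, "bolstered": bolstered_resub}
    results = {}
    for alg_name, alg in (("sfs", sfs), ("exhaustive", exhaustive)):
        for est_name, est in estimators.items():
            found = alg(train, d, k, est, rng)
            # The two measures: excess true error, and overlap with the optimal set.
            results[(alg_name, est_name)] = (found, true_err(found) - true_err(optimal),
                                             len(set(found) & set(optimal)))
    return optimal, results

if __name__ == "__main__":
    optimal, results = run()
    print("optimal set:", optimal)
    for (alg, est), (found, gap, overlap) in results.items():
        print(f"{alg:10s} {est:9s} found={found} excess={gap:.3f} overlap={overlap}")
```

Calling `run()` with other seeds, sample sizes `n`, or shifts gives a rough, noisy picture of how the excess true error and the overlap vary with the estimator and the search algorithm; a single run of this toy setup is far too small to reproduce or test the paper's conclusions.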

published proceedings

  • Pattern Recognition

author list (cited authors)

  • Sima, C., Attoor, S., Braga-Neto, U., Lowey, J., Suh, E., & Dougherty, E. R.

citation count

  • 41

complete list of authors

  • Sima, C.; Attoor, S.; Braga-Neto, U.; Lowey, J.; Suh, E.; Dougherty, E. R.

publication date

  • December 2005