Qualitatively, a filter is said to be "robust" if its performance degradation is acceptable for distributions close to the one for which it is optimal, that is, the one for which it has been designed. This paper adapts the signal-processing theory of optimal robust filters to classifiers. The distribution (the class-conditional distributions) to which the classifier is to be applied is parameterized by a state vector, and the principal issue is to choose a design state that is optimal, relative to some measure of robustness, in comparison with all other states. A minimax robust classifier is one whose worst performance over all states is better than the worst performances of the other classifiers (those designed at the other states). A Bayesian robust classifier is one whose expected performance over the states is better than the expected performances of the other classifiers. The state corresponding to the Bayesian robust classifier is called the maximally robust state. Minimax robust classifiers tend to give too much weight to states for which classification is very difficult, and therefore our effort is focused on Bayesian robust classifiers. Whereas the signal-processing theory of robust filtering concentrates on design with full distributional knowledge and a fixed number of observation variables (features), design via training from sample data and feature selection are so important for classification that robustness optimality must be considered from these perspectives, in particular for small samples. In this context, for a given sample size, we will be concerned with the maximally robust state-feature pair. All definitions are independent of the classification rule; however, applications are considered only for linear and quadratic discriminant analysis, for which there are parametric forms for the optimal discriminants.
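
To make the two robustness criteria concrete, the sketch below instantiates them over a finite discretization of the state space. This is a minimal illustration under assumed names, not the paper's procedure: the matrix `error` (with `error[i, j]` the error of the classifier designed at state i and applied at state j) is randomly generated here for demonstration, and `prior` is a hypothetical prior over states; in the paper's setting these errors would come from the parametric LDA/QDA discriminants at each state.

```python
import numpy as np

# Hypothetical setup: K design states on a finite grid.
# error[i, j] = error of the classifier designed at state i
# when applied to the class-conditional distributions of state j.
rng = np.random.default_rng(0)
K = 5
error = rng.uniform(0.05, 0.45, size=(K, K))

# Assumed prior weights over the states (uniform for illustration).
prior = np.full(K, 1.0 / K)

# Minimax robust state: best worst-case error over all states.
worst_case = error.max(axis=1)            # worst error for each design state
minimax_state = int(worst_case.argmin())

# Maximally robust state (Bayesian robust classifier):
# best prior-weighted expected error over all states.
expected = error @ prior
maximally_robust_state = int(expected.argmin())

print(f"minimax robust state:   {minimax_state} "
      f"(worst-case error {worst_case[minimax_state]:.3f})")
print(f"maximally robust state: {maximally_robust_state} "
      f"(expected error {expected[maximally_robust_state]:.3f})")
```

The contrast the abstract draws is visible in this toy setting: the minimax state is determined by the single hardest state, so states where every classifier performs poorly can dominate the choice, whereas the maximally robust state averages performance under the prior, which is the stated reason for focusing on the Bayesian criterion.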