Bolstered resubstitution is a simple and fast error estimation method that has been shown to perform better than cross-validation and comparably with bootstrap in small-sample settings. However, its performance has been observed to deteriorate in high-dimensional feature spaces. To overcome this issue, we propose a modification of bolstered error estimation based on the principle of Naive Bayes. This estimator is simple to compute and is reducible under feature selection. In experiments using popular classification rules applied to data from a well-known breast cancer gene expression study, the new Naive-Bayes bolstered estimator outperformed the original bolstered estimator, as well as cross-validation and resubstitution, in high-dimensional target feature spaces (after feature selection); it was superior to the 0.632 bootstrap provided that the sample size was not too small.

Model selection is the task of choosing a model with optimal complexity for the given data set. Most model selection criteria minimize the sum of a training error term and a complexity control term, that is, the complexity-penalized loss. We investigate replacing the training error with bolstered resubstitution in the penalized loss for model selection. Computer simulations indicate that the proposed method improves model selection in terms of choosing the correct model complexity.

Besides applying novel error estimation to model selection in pattern recognition, we also apply it to assess the performance of classifiers designed on banana gene-expression data. Bananas are the world's most important fruit; they are a vital component of local diets in many countries. Diseases and drought are major threats to banana production. To generate disease- and drought-tolerant bananas, we need to identify disease- and drought-responsive genes and pathways. Towards this goal, we conducted RNA-Seq analysis with wild-type and transgenic banana, with and without inoculation/drought stress, and on different days after applying the stress. By combining several state-of-the-art computational models, we identified stress-responsive genes and pathways. The validation results of these genes in Arabidopsis are promising.
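
For readers unfamiliar with the bolstered resubstitution estimator referred to at the start of this abstract, the following is a minimal Python sketch of the general idea: a kernel is placed on each training point, and the error is the average kernel mass falling on the wrong side of the decision boundary, here approximated by Monte Carlo sampling. It is an illustration under stated assumptions, not the implementation used in this work. The function names, the toy threshold classifier, the spherical Gaussian kernel, and the fixed kernel width `sigma` are all hypothetical choices made for the example; how the kernel width is chosen, and the Naive-Bayes modification proposed here, are not shown.

```python
import random


def plain_resubstitution(classify, points, labels):
    """Fraction of training points misclassified by the designed classifier."""
    errors = sum(1 for x, y in zip(points, labels) if classify(x) != y)
    return errors / len(points)


def bolstered_resubstitution(classify, points, labels, sigma, n_mc=200, seed=0):
    """Monte Carlo approximation of a bolstered resubstitution error.

    classify : function mapping a feature vector (list of floats) to a label
    sigma    : standard deviation of the spherical Gaussian kernel
               (assumed to be given; its selection is not covered here)
    n_mc     : number of Monte Carlo samples drawn around each training point
    """
    rng = random.Random(seed)
    total = 0.0
    for x, y in zip(points, labels):
        wrong = 0
        for _ in range(n_mc):
            # Draw a point from the Gaussian kernel centered at the training point.
            z = [rng.gauss(xj, sigma) for xj in x]
            if classify(z) != y:
                wrong += 1
        # Estimated kernel mass on the wrong side of the decision boundary.
        total += wrong / n_mc
    return total / len(points)


# Toy one-dimensional example (hypothetical data and classifier).
def threshold_rule(x):
    return 1 if x[0] > 0.0 else 0


points = [[-1.0], [-0.2], [0.3], [1.5]]
labels = [0, 0, 1, 1]

resub = plain_resubstitution(threshold_rule, points, labels)
bolstered = bolstered_resubstitution(threshold_rule, points, labels, sigma=0.5)
print("resubstitution:", resub)
print("bolstered resubstitution:", bolstered)
```

In this toy example every training point is classified correctly, so plain resubstitution reports zero error, while the bolstered estimate is positive because points close to the decision boundary contribute part of their kernel mass to the error count.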