Jiang, Xingde (2016-12). Novel Pattern Recognition Approaches to Identification of Gene-Expression Pathways in Banana Cultivars. Doctoral Dissertation. Thesis

abstract

  • Bolstered resubstitution is a simple and fast error estimation method that has been shown to perform better than cross-validation and comparably with bootstrap in small-sample settings. However, it has been observed that its performance can deteriorate in high-dimensional feature spaces. To overcome this issue, we propose here a modification of bolstered error estimation based on the principle of Naive Bayes. This estimator is simple to compute and is reducible under feature selection. In experiments using popular classification rules applied to data from a well-known breast cancer gene expression study, the new Naive-Bayes bolstered estimator outperformed the old one, as well as cross-validation and resubstitution, in high-dimensional target feature spaces (after feature selection); it was superior to the 0.632 bootstrap provided that the sample size was not too small.

    Model selection is the task of choosing a model with optimal complexity for the given data set. Most model selection criteria try to minimize the sum of a training error term and a complexity control term, that is, minimize the complexity penalized loss. We investigate replacing the training error with bolstered resubstitution in the penalized loss to do model selection. Computer simulations indicate that the proposed method improves the performance of the model selection in terms of choosing the correct model complexity.

    Besides applying novel error estimation to model selection in pattern recognition, we also apply it to assess the performance of classifiers designed on the banana gene-expression data. Bananas are the world's most important fruit; they are a vital component of local diets in many countries. Diseases and drought are major threats in banana production. To generate disease and drought tolerant bananas, we need to identify disease and drought responsive genes and pathways. Towards this goal, we conducted RNA-Seq analysis with wild type and transgenic banana, with and without inoculation/drought stress, and on different days after applying the stress. By combining several state-of-the-art computational models, we identified stress responsive genes and pathways. The validation results of these genes in Arabidopsis are promising.
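
To make the idea of bolstered resubstitution concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes a spherical Gaussian bolstering kernel centered on each training point and approximates the kernel mass falling on the wrong side of the decision boundary by Monte Carlo sampling. The function names, the toy data, and the hand-picked kernel width `sigma` are all hypothetical; in practice the kernel width would be estimated from the data rather than supplied by hand.

```python
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
Classifier = Callable[[Point], int]


def resubstitution_error(classify: Classifier,
                         points: List[Point],
                         labels: List[int]) -> float:
    """Plain resubstitution: fraction of training points misclassified."""
    wrong = sum(1 for x, y in zip(points, labels) if classify(x) != y)
    return wrong / len(points)


def bolstered_resubstitution_error(classify: Classifier,
                                   points: List[Point],
                                   labels: List[int],
                                   sigma: float,
                                   n_mc: int = 200,
                                   seed: int = 0) -> float:
    """Monte Carlo bolstered resubstitution (illustrative sketch).

    Each training point is replaced by a spherical Gaussian kernel with
    standard deviation `sigma` (an assumed, hand-picked value). The
    estimate is the average fraction of kernel samples that the
    classifier assigns to the wrong class.
    """
    rng = random.Random(seed)
    total = 0.0
    for x, y in zip(points, labels):
        wrong = 0
        for _ in range(n_mc):
            # Draw one sample from the kernel centered at x.
            perturbed = [xi + rng.gauss(0.0, sigma) for xi in x]
            if classify(perturbed) != y:
                wrong += 1
        total += wrong / n_mc
    return total / len(points)


def threshold_classifier(x: Point) -> int:
    """Toy fixed classifier: class 1 when the first feature is positive."""
    return 1 if x[0] > 0.0 else 0


# Hypothetical toy data: two points lie close to the decision boundary.
points = [(-1.0, 0.2), (-0.1, -0.5), (0.1, 0.4), (1.2, -0.3)]
labels = [0, 0, 1, 1]

plain = resubstitution_error(threshold_classifier, points, labels)
bolstered = bolstered_resubstitution_error(
    threshold_classifier, points, labels, sigma=0.5)
print(f"resubstitution: {plain:.3f}  bolstered: {bolstered:.3f}")
```

On this toy data plain resubstitution reports zero error because every training point is on the correct side of the boundary, while the bolstered estimate is positive because the kernels around the two near-boundary points spill across it. As `sigma` shrinks toward zero, the bolstered estimate falls back to plain resubstitution.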

publication date

  • December 2016