Is bagging effective in the classification of small-sample genomic and proteomic data?

abstract

There has been considerable interest recently in the application of bagging to the classification of both gene-expression data and protein-abundance mass spectrometry data. The approach is often justified by the improvement it produces in the performance of unstable, overfitting classification rules in small-sample situations. However, the question of real practical interest is whether the ensemble scheme improves the performance of those classifiers enough to beat single stable, nonoverfitting classifiers on small-sample genomic and proteomic data sets. To investigate that question, we conducted a detailed empirical study using publicly available data sets from published genomic and proteomic studies. We observed that, under t-test and RELIEF filter-based feature selection, bagging generally does a good job of improving the performance of unstable, overfitting classifiers, such as CART decision trees and neural networks, but that the improvement was not sufficient to beat the performance of single stable, nonoverfitting classifiers, such as diagonal and plain linear discriminant analysis or 3-nearest neighbors. Furthermore, as expected, the ensemble method did not significantly improve the performance of these stable classifiers. Representative experimental results are presented and discussed in this work.
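
For readers unfamiliar with the ensemble scheme the abstract refers to, the following is a minimal sketch of bagging (bootstrap aggregating) in plain Python. It is an illustration only and not the authors' experimental code: the base learner here is a simple decision stump rather than CART or a neural network, no feature selection is performed, and the names `fit_stump` and `bagging_fit` as well as the toy data are assumptions introduced for this example.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-feature threshold rule (decision stump) by exhaustive search.

    Hypothetical base learner for illustration; labels are assumed to be 0/1.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Try both label orientations around the threshold t.
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[j] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train n_estimators stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw n training indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy one-feature data set (made up for this sketch).
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
predict = bagging_fit(X, y)
print(predict([0.0]), predict([1.0]))
```

An odd number of estimators avoids tied votes in the two-class case. Comparing such an ensemble against a single stable classifier, as the study does, would additionally require an error-estimation protocol that this sketch does not attempt.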

authors

Braga-Neto, Ulisses

published proceedings

EURASIP J Bioinform Syst Biol

author list (cited authors)

Vu, T. T., & Braga-Neto, U. M.

citation count

8

complete list of authors

Vu, TT; Braga-Neto, UM

publication date

May 2009

publisher

Springer Nature Publisher

published in

EURASIP Journal on Bioinformatics and Systems Biology

keywords

46 Information And Computing Sciences
4611 Machine Learning
Biotechnology
Genetics
Human Genome

PubMed ID

19390645

Digital Object Identifier (DOI)

10.1155/2009/158368

start page

158368

end page

158368

volume

2009

issue

1

URL

http://dx.doi.org/10.1155/2009/158368
