Optimality driven nearest centroid classification from genomic data.

abstract

Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
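The abstract describes a nearest-centroid classifier whose features are chosen greedily as a subset instead of being ranked one at a time by univariate scores. It also mentions shrinking the centroids. The sketch below illustrates that general idea in pure Python and rests on several assumptions. The classifier scales distances by a pooled within-class variance per feature. The greedy step judges each candidate by the resubstitution error of the whole current subset, which is a stand-in, not the optimality criterion derived in the paper. The `shrink` weight is a fixed, hypothetical factor that pulls centroids toward the overall mean; it is not the high-dimensional shrinkage estimator the authors study. All function names (`fit_centroids`, `predict`, `error_rate`, `greedy_select`) and the toy data are hypothetical.

```python
from collections import defaultdict


def fit_centroids(X, y, features, shrink=0.0):
    """Class centroids and pooled within-class variances on `features`.

    shrink=0.0 gives sample-mean (maximum likelihood) centroids; 0 < shrink <= 1
    pulls each centroid toward the overall mean by a fixed, hypothetical weight.
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n, k = len(X), len(groups)
    overall = [sum(row[j] for row in X) / n for j in features]
    means = {c: [sum(r[j] for r in rows) / len(rows) for j in features]
             for c, rows in groups.items()}
    # Pooled within-class variance per feature, computed from unshrunken means.
    var = []
    for i, j in enumerate(features):
        ss = sum((r[j] - means[c][i]) ** 2
                 for c, rows in groups.items() for r in rows)
        var.append(ss / max(n - k, 1) + 1e-8)  # small floor avoids division by zero
    centroids = {c: [(1 - shrink) * m + shrink * o for m, o in zip(mu, overall)]
                 for c, mu in means.items()}
    return centroids, var


def predict(x, centroids, var, features):
    """Assign x to the class whose centroid is nearest in variance-scaled distance."""
    def dist(c):
        return sum((x[j] - centroids[c][i]) ** 2 / var[i]
                   for i, j in enumerate(features))
    return min(centroids, key=dist)


def error_rate(X, y, features, shrink=0.0):
    """Resubstitution error of the classifier restricted to `features` (assumed criterion)."""
    centroids, var = fit_centroids(X, y, features, shrink)
    wrong = sum(predict(x, centroids, var, features) != label
                for x, label in zip(X, y))
    return wrong / len(X)


def greedy_select(X, y, m, shrink=0.0):
    """Forward selection: add the feature that most lowers the error of the
    current subset as a whole; ties go to the lowest feature index."""
    selected, remaining = [], list(range(len(X[0])))
    for _ in range(m):
        best = min(remaining, key=lambda j: error_rate(X, y, selected + [j], shrink))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 1 separates classes A and B; features 0 and 2 are noise.
X = [[0.1, 0.0, 5.0], [0.3, 0.2, 5.1], [0.2, 0.1, 4.9],
     [0.25, 3.0, 5.0], [0.15, 3.2, 5.05], [0.05, 2.9, 4.95]]
y = ["A", "A", "A", "B", "B", "B"]
print(greedy_select(X, y, 2))  # [1, 0]
```

Selecting on the error of the joint subset, rather than on each feature's own score, is the design point the abstract emphasizes. With real microarray data, an honest error estimate (for example, cross-validation) would be needed in place of resubstitution error, which is optimistic.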

authors

Dabney, Alan

published proceedings

PLoS ONE

altmetric score

5.12

author list (cited authors)

Dabney, A. R., & Storey, J. D.

citation count

28

complete list of authors

Dabney, Alan R||Storey, John D

editor list (cited editors)

Zhu, J.

publication date

January 2007

publisher

Public Library of Science (PLoS)

published in

PLoS ONE

keywords

Algorithms
Child
Data Interpretation, Statistical
Discriminant Analysis
Gene Expression Profiling
Gene Expression Regulation, Neoplastic
Genetic Techniques
Genomics
Humans
Leukemia
Lymphoma
Models, Statistical
Models, Theoretical
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated

PubMed ID

17912341

Digital Object Identifier (DOI)

10.1371/journal.pone.0001002

URI

https://hdl.handle.net/1969.1/182085

start page

e1002

end page

e1002

volume

2

issue

10

URL

http://dx.doi.org/10.1371/journal.pone.0001002