Impact of missing value imputation on classification for DNA microarray gene expression data--a model-based study.

abstract

Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.
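The MV generation and imputation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's model or any of its six algorithms. It assumes entries go missing uniformly at random, uses row-mean (per-gene) imputation as a stand-in baseline, and scores the result with a normalized RMSE. The toy data and the function names `mask_at_random`, `row_mean_impute`, and `nrmse` are hypothetical. It does not cover the classification-error evaluation that the study itself focuses on.

```python
import numpy as np

def mask_at_random(data, mv_rate, rng):
    """Return a copy of `data` with a fraction `mv_rate` of entries set to NaN."""
    masked = data.astype(float).copy()
    n_missing = int(round(mv_rate * data.size))
    # Assumption: missing positions are drawn uniformly at random.
    idx = rng.choice(data.size, size=n_missing, replace=False)
    masked.flat[idx] = np.nan
    return masked

def row_mean_impute(masked):
    """Fill each missing entry with the mean of the observed values in its row (gene)."""
    filled = masked.copy()
    row_means = np.nanmean(masked, axis=1)
    rows, cols = np.where(np.isnan(masked))
    filled[rows, cols] = row_means[rows]
    return filled

def nrmse(truth, imputed, masked):
    """RMSE over the masked entries, normalized by the std of their true values."""
    miss = np.isnan(masked)
    err = truth[miss] - imputed[miss]
    return float(np.sqrt(np.mean(err ** 2)) / np.std(truth[miss]))

rng = np.random.default_rng(0)
# Hypothetical toy data: 50 genes x 20 samples, each gene with its own mean.
gene_means = 3.0 * rng.normal(size=(50, 1))
truth = rng.normal(loc=gene_means, scale=1.0, size=(50, 20))

for rate in (0.05, 0.20):
    masked = mask_at_random(truth, rate, rng)
    imputed = row_mean_impute(masked)
    print(f"MV rate {rate:.2f}: NRMSE = {nrmse(truth, imputed, masked):.3f}")
```

To mirror the study's question, one would go on to train a classifier on the imputed data and compare its error against a classifier trained on the complete data, across MV rates.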

published proceedings

EURASIP J Bioinform Syst Biol

author list (cited authors)

Sun, Y., Braga-Neto, U., & Dougherty, E. R.

citation count

12

complete list of authors

Sun, Youting||Braga-Neto, Ulisses||Dougherty, Edward R

publication date

December 2009

publisher

Springer Nature Publisher

published in

ISSN 1687-4153 (Journal)

keywords

31 Biological Sciences
3102 Bioinformatics And Computational Biology
8 Health And Social Care Services Research
8.4 Research Design And Methodologies (health Services)
Bioengineering

Digital Object Identifier (DOI)

10.1155/2009/504069

start page

504069

end page

504069

volume

2009

issue

1

URL

http://dx.doi.org/10.1155/2009/504069
