Model-based study of the Effectiveness of Reporting Lists of Small Feature Sets using RNA-Seq Data
Conference Paper
Overview
abstract
Ranking feature sets for phenotype classification based on gene expression is one of the most challenging issues in bioinformatics. When the number of samples is small, all feature-selection algorithms are known to be unreliable, and error estimators suffer from varying degrees of imprecision. The problem is compounded by the fact that classification accuracy depends on how the measurement technology transforms the underlying phenomena into data. Because next-generation sequencing (NGS) technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they could potentially produce less discriminative data than the actual gene expression levels. Here, we focus on how feature-set ranking is affected by the nonlinear transformation of gene concentrations by the sequencing machine and by the choice of error estimator.
name of conference
Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics