Feature selection for high-dimensional integrated data
Conference Paper
Overview
abstract
Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt is dependent on the multidimensional variate Y, and the remainder of the predictors constitute a "noise set" Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods, thresholding and singular-value decomposition (SVD), in combination with stochastic optimization to determine "empirical bounds" on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods on a recent infant intestinal gene expression and metagenomics dataset. Copyright 2012 by the Society for Industrial and Applied Mathematics.
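The setup described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: the linear link between Y and Xt, the loading matrix, the noise level, the sample sizes, and both scoring rules (largest absolute cross-correlation for thresholding, leading left singular vector for SVD) are assumptions chosen only to make the idea concrete. All function names are hypothetical.

```python
import numpy as np


def simulate(n=200, p_signal=5, p_noise=45, q=3, seed=0):
    """Draw Y, a signal block X_t that depends on Y, and a noise block X_u."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, q))
    # Assumed linear link with fixed unit loadings (illustrative only).
    B = np.ones((q, p_signal))
    X_t = Y @ B + 0.5 * rng.standard_normal((n, p_signal))
    # Noise set: generated independently of Y.
    X_u = rng.standard_normal((n, p_noise))
    return np.hstack([X_t, X_u]), Y


def cross_correlation(X, Y):
    """Sample cross-correlation matrix between columns of X and columns of Y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xs.T @ Ys / X.shape[0]  # shape (p, q)


def select_by_threshold(C, k):
    """Keep the k predictors with the largest absolute correlation to any Y column."""
    score = np.abs(C).max(axis=1)
    return set(np.argsort(score)[-k:].tolist())


def select_by_svd(C, k):
    """Keep the k predictors with the largest loadings on the leading left singular vector."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    score = np.abs(U[:, 0])
    return set(np.argsort(score)[-k:].tolist())


X, Y = simulate()
C = cross_correlation(X, Y)
selected_threshold = select_by_threshold(C, k=5)
selected_svd = select_by_svd(C, k=5)
print(sorted(selected_threshold), sorted(selected_svd))
```

In this toy setting the first five columns of X form the signal set, so a selection is correct when it returns indices 0 through 4. The stochastic optimization step and the asymptotic approximation mentioned in the abstract are not modeled here.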
name of conference
Proceedings of the 2012 SIAM International Conference on Data Mining