Bayesian Variable Selection in Clustering High-Dimensional Data With Substructure
In this article, we focus on clustering techniques recently proposed for high-dimensional data that incorporate variable selection, and we extend them to model data with a known substructure, such as the structure imposed by an experimental design. Our method approximates the within-group covariance, which allows clustering to proceed without disrupting the groups defined by the experimenter. The method simultaneously determines which expression patterns are important and which genes contribute to those patterns. We evaluate its performance on simulated data and on microarray data from a colon carcinogenesis study. The selected genes are consistent with current biological research and provide strong biological validation of the cluster configuration identified by the method.
© 2008 American Statistical Association and the International Biometric Society.