A probabilistic latent semantic analysis model for coclustering the mouse brain atlas.
The mammalian brain contains a large variety of cell types, and the phenotypic properties of each type are largely the result of distinct gene expression patterns. It is therefore of critical importance to characterize gene expression patterns in the mammalian brain. The Allen Developing Mouse Brain Atlas provides spatiotemporal in situ hybridization gene expression data across multiple stages of mouse brain development, offering a framework for exploring the spatiotemporal regulation of gene expression during development. We employ a graph approximation formulation to cocluster the genes and the brain voxels simultaneously at each time point. We show that this formulation can be expressed as a probabilistic latent semantic analysis (PLSA) model, which allows us to use the expectation-maximization algorithm for PLSA to estimate the coclustering parameters. To provide a quantitative comparison with prior methods, we evaluate the coclustering method on a set of standard synthetic data sets. Results indicate that our method consistently outperforms prior methods. We then apply our method to cocluster the Allen Developing Mouse Brain Atlas data. Results indicate that our clustering of voxels is more consistent with classical neuroanatomy than that of prior methods. Our analysis also yields sets of genes that are co-expressed in a subset of the brain voxels.
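To make the estimation step concrete, the following is a minimal sketch of expectation-maximization for a generic symmetric PLSA model, P(g, v) = sum_k P(k) P(g|k) P(v|k), fitted to a nonnegative gene-by-voxel matrix. It is not the graph approximation formulation described above, whose details are not given here. The function name `plsa_cocluster`, the random initialization, the fixed iteration count, and the hard assignment of each gene and voxel to its most probable latent class are all assumptions made for illustration.

```python
import random


def plsa_cocluster(X, n_clusters, n_iter=200, seed=0):
    """Hypothetical helper: fit symmetric PLSA by EM and return hard labels.

    X is a list of rows (genes) of nonnegative values (one per voxel).
    """
    rng = random.Random(seed)
    G, V, K = len(X), len(X[0]), n_clusters
    eps = 1e-12  # guards against division by zero

    def normalized(values):
        total = sum(values) + eps
        return [value / total for value in values]

    # Assumed initialization: uniform P(k), random P(g|k) and P(v|k).
    p_k = [1.0 / K] * K
    p_g = [normalized([rng.random() + 0.1 for _ in range(G)]) for _ in range(K)]
    p_v = [normalized([rng.random() + 0.1 for _ in range(V)]) for _ in range(K)]

    for _ in range(n_iter):
        new_k = [0.0] * K
        new_g = [[0.0] * G for _ in range(K)]
        new_v = [[0.0] * V for _ in range(K)]
        for g in range(G):
            for v in range(V):
                x = X[g][v]
                if x == 0:
                    continue  # zero entries contribute nothing to the likelihood
                # E-step: posterior over latent classes for the pair (g, v).
                joint = [p_k[k] * p_g[k][g] * p_v[k][v] for k in range(K)]
                total = sum(joint) + eps
                # M-step accumulation: expected counts weighted by x.
                for k in range(K):
                    w = x * joint[k] / total
                    new_k[k] += w
                    new_g[k][g] += w
                    new_v[k][v] += w
        # M-step: renormalize expected counts into probabilities.
        p_k = normalized(new_k)
        p_g = [normalized(row) for row in new_g]
        p_v = [normalized(row) for row in new_v]

    # Hard assignment (an assumption): the class maximizing P(k) P(.|k).
    gene_labels = [max(range(K), key=lambda k: p_k[k] * p_g[k][g]) for g in range(G)]
    voxel_labels = [max(range(K), key=lambda k: p_k[k] * p_v[k][v]) for v in range(V)]
    return gene_labels, voxel_labels


# Toy data: two gene groups, each expressed in its own group of voxels.
toy = [
    [5, 4, 6, 0, 0],
    [6, 5, 4, 0, 0],
    [0, 0, 0, 7, 5],
    [0, 0, 0, 4, 6],
]
gene_labels, voxel_labels = plsa_cocluster(toy, n_clusters=2)
print(gene_labels, voxel_labels)
```

Because genes and voxels share the same latent classes, a gene label and a voxel label with the same index belong to the same cocluster. The label indices themselves are arbitrary and depend on the random initialization.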