Evolutionary soft co-clustering: formulations, algorithms, and applications
Academic Article
Overview
Research
Identity
Additional Document Info
View All
Overview
abstract
2014, The Author(s). We consider the co-clustering of time-varying data using evolutionary co-clustering methods. Existing approaches are based on the spectral learning framework, thus lacking a probabilistic interpretation. We overcome this limitation by developing a probabilistic model in this paper. The proposed model assumes that the observed data are generated via a two-step process that depends on the historic co-clusters. This allows us to capture the temporal smoothness in a probabilistically principled manner. To perform maximum likelihood parameter estimation, we present an EM-based algorithm. We also establish the convergence of the proposed EM algorithm. An appealing feature of the proposed model is that it leads to soft co-clustering assignments naturally. We evaluate the proposed method on both synthetic and real-world data sets. Experimental results show that our method consistently outperforms prior approaches based on spectral method. To fully exploit the real-world impact of our methods, we further perform a systematic application study on the analysis of Drosophila gene expression pattern images. We encode the spatial gene expression information at a particular developmental time point into a data matrix using a mesh-generation pipeline. We then co-cluster the embryonic domains and the genes simultaneously for multiple time points using our evolutionary co-clustering method. Results show that the co-clusters of gene and embryonic domains reflect the underlying biology.