Evolutionary soft co-clustering: formulations, algorithms, and applications Academic Article

abstract

  • © 2014, The Author(s). We consider the co-clustering of time-varying data using evolutionary co-clustering methods. Existing approaches are based on the spectral learning framework and thus lack a probabilistic interpretation. We overcome this limitation by developing a probabilistic model in this paper. The proposed model assumes that the observed data are generated via a two-step process that depends on the historical co-clusters. This allows us to capture temporal smoothness in a probabilistically principled manner. To perform maximum likelihood parameter estimation, we present an EM-based algorithm, and we establish its convergence. An appealing feature of the proposed model is that it leads naturally to soft co-clustering assignments. We evaluate the proposed method on both synthetic and real-world data sets. Experimental results show that our method consistently outperforms prior approaches based on spectral methods. To demonstrate the real-world impact of our methods, we further perform a systematic application study on the analysis of Drosophila gene expression pattern images. We encode the spatial gene expression information at a particular developmental time point into a data matrix using a mesh-generation pipeline. We then co-cluster the embryonic domains and the genes simultaneously for multiple time points using our evolutionary co-clustering method. Results show that the co-clusters of genes and embryonic domains reflect the underlying biology.

published proceedings

  • DATA MINING AND KNOWLEDGE DISCOVERY

author list (cited authors)

  • Zhang, W., Li, R., Feng, D., Chernikov, A., Chrisochoides, N., Osgood, C., & Ji, S.

citation count

  • 12

complete list of authors

  • Zhang, Wenlu; Li, Rongjian; Feng, Daming; Chernikov, Andrey; Chrisochoides, Nikos; Osgood, Christopher; Ji, Shuiwang

publication date

  • May 2015