MOPAC: motif finding by preprocessing and agglomerative clustering from microarrays.
We propose a novel strategy for discovering motifs from gene expression data. The gene expression data in our experiments come from DNA microarray analysis of the bacterium E. coli during recovery from nutrient starvation. We annotated the data and identified the upregulated genes. We are interested in finding common regulatory motifs that are responsible for the upregulation of these specific genes. We assume that a common motif to which a regulatory protein can bind will be present in the upstream regions of the upregulated genes and absent from the upstream regions of genes that showed a constant level of expression over time. Our objective is therefore to find motifs that are present in at least some of the upstream sequences of the upregulated genes and absent from the control set, which is the set of genes whose expression remained the same. The co-expressed genes may include several subsets of co-regulated genes under different control mechanisms, so we do not require a motif to be present in all upregulated sequences. We therefore propose a new algorithm that finds such motifs through successive stages of preprocessing, denoising, agglomerative clustering, and consensus checking. Using this process, we have found several motifs that are good candidates for further validation.
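The abstract names the stages but does not specify how each one works, so the following is only a hypothetical sketch of the general idea, not the MOPAC algorithm itself. It assumes exact k-mer matching, a hypothetical support threshold `min_support` (the number of upregulated sequences a word must occur in), single-linkage agglomerative clustering under Hamming distance, and a per-position majority vote as the consensus check. All function names and parameters are illustrative, and the denoising stage is omitted because the abstract gives no details about it.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct length-k words in one upstream sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discriminative_kmers(up_seqs, control_seqs, k, min_support):
    """Hypothetical filter: keep k-mers found in at least `min_support`
    upregulated sequences and in none of the control sequences."""
    control = set()
    for s in control_seqs:
        control |= kmers(s, k)
    support = Counter()
    for s in up_seqs:
        support.update(kmers(s, k))  # each sequence counts a word at most once
    return sorted(w for w, c in support.items()
                  if c >= min_support and w not in control)


def hamming(a, b):
    """Number of mismatched positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clusters(words, max_dist):
    """Agglomerative clustering (assumed single linkage): start from
    singletons and merge any two clusters whose closest members are
    within `max_dist` mismatches, until no merge is possible."""
    clusters = [[w] for w in words]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if min(hamming(a, b) for a in clusters[i] for b in clusters[j]) <= max_dist:
                clusters[i].extend(clusters[j])
                del clusters[j]  # j > i, so index i is unaffected
                merged = True
                break  # restart the scan over the updated cluster list
    return clusters


def consensus(words):
    """Assumed consensus rule: the most frequent base at each position."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*words))


def find_candidate_motifs(up_seqs, control_seqs, k=6, min_support=2, max_dist=1):
    """Chain the illustrative stages: filter, cluster, then build consensus."""
    words = discriminative_kmers(up_seqs, control_seqs, k, min_support)
    return [consensus(c) for c in single_linkage_clusters(words, max_dist)]


up = ["AAATTGACAAA", "CCCTTGACCCC", "GGGTTGACGGG"]
control = ["AAAAAAAAAA"]
print(find_candidate_motifs(up, control, k=5, min_support=2))
```

Requiring only `min_support` sequences, rather than all of them, mirrors the abstract's point that a motif need not appear in every upregulated gene. Requiring strict absence from the control set is one simple reading of "not present in the control set". A real implementation would likely tolerate mismatches and rare background occurrences.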