Improved patterndriven algorithms for motif finding in DNA sequences
Conference Paper

 Overview

 Identity

 Additional Document Info

 View All

Overview
abstract

In order to guarantee that the optimal motif is found, traditional patterndriven approaches perform an exhaustive search over all candidate motifs of length l. We develop an improved patterndriven algorithm that takes O(4llk) time, where k is the number of sequences in the sample and l is the motif length, which is independent of the length of each sequence n for large enough l and saving a factor of n in time complexity over the original patterndriven approach. We further extend this strategy to allow arbitrary don't care positions within a motif without much decrease in solvable values of l. Testing this algorithm on a large set of yeast samples constructed from coexpressed gene clusters reveals that most biological motifs have many invariant or almost invariant positions and these positions can be used to define the motif while ignoring the other positions. This motivates the following twostage strategy that extends the solvable values of / substantially for the patterndriven approach: first use an O(2llkn) algorithm to exhaustively search over all candidate motifs allowing arbitrary don't care positions but disallowing mismatches, then refine these motifs by allowing a limited amount of flexibility to model the almost invariant positions. We demonstrate that this seemingly restrictive motif definition is sufficiently powerful by showing that the performance of this algorithm is comparable to the best existing motif finding algorithms on a large benchmark set of samples. A software program implementing these approaches (MotifEnumerator) is available at http://faculty.cs.tamu.edu/shsze/motifenumerator. © SpringerVerlag Berlin Heidelberg 2007.
author list (cited authors)
publication date
published in
Identity
International Standard Book Number (ISBN) 10
International Standard Book Number (ISBN) 13
Additional Document Info
start page
end page
volume