Improved pattern-driven algorithms for motif finding in DNA sequences

abstract

In order to guarantee that the optimal motif is found, traditional pattern-driven approaches perform an exhaustive search over all candidate motifs of length l. We develop an improved pattern-driven algorithm that takes O(4llk) time, where k is the number of sequences in the sample and l is the motif length, which is independent of the length of each sequence n for large enough l and saving a factor of n in time complexity over the original pattern-driven approach. We further extend this strategy to allow arbitrary don't care positions within a motif without much decrease in solvable values of l. Testing this algorithm on a large set of yeast samples constructed from co-expressed gene clusters reveals that most biological motifs have many invariant or almost invariant positions and these positions can be used to define the motif while ignoring the other positions. This motivates the following two-stage strategy that extends the solvable values of / substantially for the pattern-driven approach: first use an O(2llkn) algorithm to exhaustively search over all candidate motifs allowing arbitrary don't care positions but disallowing mismatches, then refine these motifs by allowing a limited amount of flexibility to model the almost invariant positions. We demonstrate that this seemingly restrictive motif definition is sufficiently powerful by showing that the performance of this algorithm is comparable to the best existing motif finding algorithms on a large benchmark set of samples. A software program implementing these approaches (MotifEnumerator) is available at http://faculty.cs.tamu.edu/shsze/motifenumerator. Springer-Verlag Berlin Heidelberg 2007.

authors

Sze, Sing-Hoi

published proceedings

SYSTEMS BIOLOGY AND REGULATORY GENOMICS

author list (cited authors)

Sze, S., & Zhao, X.

complete list of authors

Sze, Sing-Hoi||Zhao, Xiaoyan

publication date

August 2007

published in

Lecture Notes in Artificial Intelligence Journal

Improved pattern-driven algorithms for motif finding in DNA sequences Conference Paper

Overview

abstract

authors

published proceedings

author list (cited authors)

complete list of authors

publication date

published in

Identity

International Standard Book Number (ISBN) 10

International Standard Book Number (ISBN) 13

Additional Document Info

start page

end page

volume