Identifying gene clusters within localized regions in multiple genomes.

abstract

An important strategy to study genome evolution is to investigate the clustering of orthologous genes among multiple genomes, in which the most popular approaches require that the distance between adjacent genes in a cluster be small. We investigate a different formulation based on constraining the overall size of a cluster and develop statistical significance estimates that allow direct comparison of clusters of different sizes. We first consider a restricted version which requires that orthologous genes are strictly ordered within each cluster and show that it can be solved in polynomial time. We then develop practical exact algorithms for the unrestricted problem, which allows paralogous genes within a genome and clusters that may not appear in every genome, while considering a general model in which a gene is allowed to appear in more than one orthologous group. We show that our algorithm can identify biologically relevant gene clusters on four bacterial genomes: Bacillus subtilis, Streptococcus pyogenes, Streptococcus pneumoniae, and Clostridium acetobutylicum. We also show that our algorithm can identify significantly more functionally enriched gene clusters than previous algorithms on four yeast genomes: Saccharomyces cerevisiae, Saccharomyces paradoxus, Saccharomyces mikatae, and Saccharomyces bayanus. A software program (GCFinder) and a list of gene clusters found on the bacterial and the yeast genomes are available at http://faculty.cse.tamu.edu/shsze/gcfinder.
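
The size-constrained formulation described in the abstract can be illustrated with a small sketch. This is not the GCFinder algorithm and includes no significance estimate. It only checks, for a candidate set of orthologous groups, whether each genome has a region of bounded overall span containing at least one gene from every group. The function names, the `(position, group)` input layout, and the simplification that each gene belongs to exactly one group are assumptions made for illustration.

```python
from collections import Counter

def smallest_span(genes, groups):
    """Smallest positional span of a window holding >= 1 gene of every group.

    genes  -- list of (position, group) pairs sorted by position
              (hypothetical layout; one group per gene is assumed here)
    groups -- candidate set of orthologous group labels
    Returns None when some group is absent from the genome.
    """
    needed = set(groups)
    if not needed:
        return 0
    counts = Counter()
    covered = 0          # number of needed groups present in the window
    best = None
    left = 0
    for pos_r, grp_r in genes:
        if grp_r in needed:
            counts[grp_r] += 1
            if counts[grp_r] == 1:
                covered += 1
        # Shrink from the left while the window still covers every group.
        while covered == len(needed):
            pos_l, grp_l = genes[left]
            span = pos_r - pos_l
            if best is None or span < best:
                best = span
            if grp_l in needed:
                counts[grp_l] -= 1
                if counts[grp_l] == 0:
                    covered -= 1
            left += 1
    return best

def genomes_with_cluster(genomes, groups, max_span):
    """Names of genomes in which the groups fit inside a region of max_span."""
    hits = []
    for name, genes in genomes.items():
        span = smallest_span(sorted(genes), groups)
        if span is not None and span <= max_span:
            hits.append(name)
    return hits

# Toy data: paralogs are allowed (group "A" occurs twice in g1 and g2),
# and the cluster need not appear in every genome (g3 lacks group "C").
genomes = {
    "g1": [(1, "A"), (2, "B"), (3, "C"), (50, "A")],
    "g2": [(1, "A"), (10, "X"), (20, "B"), (21, "C"), (22, "A")],
    "g3": [(1, "A"), (100, "B")],
}
print(genomes_with_cluster(genomes, {"A", "B", "C"}, max_span=5))
```

Bounding the overall span, rather than the gap between adjacent genes, means a cluster is accepted or rejected by the size of the whole region, which is the property the abstract contrasts with adjacent-distance approaches.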

authors

Sze, Sing-Hoi

published proceedings

J Comput Biol

altmetric score

0.25

author list (cited authors)

Yang, Q., Yi, G., Zhang, F., Thon, M. R., & Sze, S.

citation count

3

complete list of authors

Yang, Qingwu||Yi, Gangman||Zhang, Fenghui||Thon, Michael R||Sze, Sing-Hoi

publication date

January 2010

publisher

Mary Ann Liebert Publisher

published in

Journal of Computational Biology

keywords

Algorithms
Biological Evolution
Genome, Bacterial
Genome, Fungal
Multigene Family
Software

PubMed ID

20500020

Digital Object Identifier (DOI)

10.1089/cmb.2009.0116

start page

657

end page

668

volume

17

issue

5

URL

http://dx.doi.org/10.1089/cmb.2009.0116

Identifying gene clusters within localized regions in multiple genomes. Academic Article
