Yi, Gang Man (2006-12). An algorithm for identifying clusters of functionally related genes in genomes. Master's Thesis.

abstract

  • An increasing body of literature shows that genomes of eukaryotes can contain clusters of functionally related genes. Most approaches to identifying gene clusters use microarray data or metabolic pathway databases to find groups of genes on chromosomes that are linked by common attributes. A generalized method that can find gene clusters, regardless of the mechanism of origin, would provide researchers with an unbiased way of finding clusters and studying the evolutionary forces that give rise to them. I present the basis of an algorithm for identifying gene clusters in eukaryotic genomes that uses functional categories defined in graph-based vocabularies such as the Gene Ontology (GO). Clusters identified in this manner need only have a common function and are not constrained by gene expression or other properties. I tested the algorithm by analyzing genomes of a representative set of species. I identified species-specific variation in the percentage of clustered genes as well as in properties of gene clusters, including size distribution and functional annotation. These properties may be diagnostic of the evolutionary forces that lead to the formation of gene clusters. The approach finds all gene clusters in the data set and ranks them by their likelihood of occurrence by chance. The method successfully identified clusters.
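
The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea (group neighboring genes that share a functional category, then rank the groups by how likely they are to arise by chance), not the method used in the thesis. It assumes genes are supplied in chromosomal order with a flat set of category labels each, treats a cluster as an unbroken run of adjacent genes sharing a label, and scores a run of m genes by the hypergeometric probability that m fixed adjacent positions all carry a label held by k of the n genes. The function name, input layout, and scoring rule are all hypothetical, and the sketch ignores the graph structure of GO (no propagation of annotations to parent terms).

```python
from collections import defaultdict
from math import comb


def rank_functional_runs(genes, min_size=2):
    """Find runs of adjacent genes sharing a label and rank them by chance probability.

    genes: list of (gene_id, set_of_labels) in chromosomal order (assumed layout).
    Returns a list of (probability, label, [gene_ids]), least likely by chance first.
    """
    n = len(genes)

    # Record the ordered positions at which each label occurs.
    positions_by_label = defaultdict(list)
    for index, (_, labels) in enumerate(genes):
        for label in labels:
            positions_by_label[label].append(index)

    clusters = []
    for label, positions in positions_by_label.items():
        k = len(positions)  # number of genes carrying this label
        start = prev = positions[0]
        # A trailing None acts as a sentinel that closes the final run.
        for pos in positions[1:] + [None]:
            if pos is not None and pos == prev + 1:
                prev = pos  # run continues
                continue
            m = prev - start + 1  # length of the run that just ended
            if m >= min_size:
                # Probability that m fixed positions all carry the label when
                # k of n genes carry it (simplifying assumption, not from the thesis).
                p_chance = comb(n - m, k - m) / comb(n, k)
                ids = [genes[i][0] for i in range(start, prev + 1)]
                clusters.append((p_chance, label, ids))
            if pos is not None:
                start = prev = pos

    clusters.sort(key=lambda c: c[0])
    return clusters


# Toy input with made-up gene IDs and labels.
example = [
    ("g1", {"A"}), ("g2", {"A"}), ("g3", {"A", "B"}),
    ("g4", {"B"}), ("g5", {"C"}), ("g6", {"A"}),
]
for probability, label, ids in rank_functional_runs(example):
    print(label, ids, round(probability, 4))
```

A fuller treatment would also need to correct for the number of positions and categories tested and to handle gaps within a cluster; both are left out here to keep the sketch short.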

publication date

  • December 2006