Implementing Galaxy for Community-based Phage Genomics Grant

abstract

  • Phages, the viruses of bacteria, are the most numerous genetic entities in the biosphere, outnumbering bacteria by 10- to 100-fold, and contain most of its DNA diversity. Phage biology is a driver in global ecology and in the global dynamics of gene transfer. Phages, as the natural predators of bacteria, have recognized potential as antibacterial agents, both in human health and in animal husbandry and agriculture. Because they can be restricted to specific bacterial species or genera, phages represent the only currently available tool for manipulating the diverse populations of bacteria in microbiomes, now known to be an essential component of health and development. Despite all of these factors, only a tiny fraction of phage biodiversity is captured by sequenced genomes; in fact, phages are by far the most under-sequenced genetic entity. As next-generation sequencing advances, the flow of phage DNA sequence data will increase enormously. However, phage genomes pose special problems in genomic analysis, in part because of biological factors, including rapid sequence divergence, the compression of gene sizes, and extensive gene overlap. Even more problematic is the general lack of expertise in phage biology, which makes quality annotation of phage genomes inaccessible to most of the scientific public. The project will implement scalable infrastructure for bioinformatics analyses, focusing on the automated structural and functional annotation of phages. Publicly accessible infrastructure will be developed and deployed, from new and existing components, to support community re-annotation of paradigm phages into "gold standard" curated annotation sets. Additionally, the project will develop infrastructure components focused on the acquisition and annotation of new phage genomes going forward, as the field of bacteriophage genomics rapidly expands.
Tools encoding expert annotation knowledge will be developed and released to improve the state of the art in automated, high-quality phage annotation. The entire project will be developed as open source software under an OSI-approved license, permitting the re-implementation of the project's infrastructure in other genome annotation communities where it will provide value. Phage genomics education resources developed as part of our well-established course in Phage Genomics at TAMU will be improved to take advantage of the new community resources being built. As implementation progresses, the deployed infrastructure and progress updates will be available at https://cpt.tamu.edu/phagedb/

date/time interval

  • 2016 - 2020