DeJesus, Michael A. (2012-07). Bayesian Analysis of Transposon Mutagenesis Data. Master's Thesis.

abstract

  • Determining which genes are essential for the growth of a bacterial organism is an important question, because the answer aids the discovery of drugs that inhibit critical biological functions of a pathogen. To evaluate essentiality, biologists often use transposon mutagenesis to disrupt genomic regions within an organism, revealing which genes can withstand disruption and are therefore not required for growth. Next-generation sequencing technology augments transposon mutagenesis by providing high-resolution sequence data that identify the exact locations of transposon insertions in the genome. Although this high-resolution information has already been used to assess essentiality at a genome-wide scale, no formal statistical model capable of quantifying significance has been developed. This thesis presents a formal Bayesian framework for analyzing sequence information obtained from transposon mutagenesis experiments. Our method uses a Gumbel distribution to assess the statistical significance of gaps in transposon coverage that are indicative of essential regions, and uses a Metropolis-Hastings sampling procedure to obtain posterior estimates of the probability of essentiality for each gene. We apply our method to libraries of M. tuberculosis transposon mutants to identify genes essential for growth in vitro, and show concordance with previous essentiality results based on hybridization. Furthermore, we show that our method can identify essential domains within genes by detecting significant sub-regions of open reading frames that are unable to withstand disruption. We show that several genes involved in PG biosynthesis have essential domains.
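
    The gap-significance idea summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes each candidate site receives an insertion independently with a fixed probability, and it uses the standard extreme-value (Gumbel) approximation for the longest run of sites without an insertion. The function names `longest_gap` and `gumbel_gap_pvalue` and the example numbers are hypothetical, and the Metropolis-Hastings step is not shown.

```python
import math


def longest_gap(insertions):
    """Length of the longest run of consecutive sites with no insertion.

    `insertions` is a sequence of 0/1 flags, one per candidate site
    (1 = an insertion was observed at that site).
    """
    longest = current = 0
    for hit in insertions:
        current = 0 if hit else current + 1
        longest = max(longest, current)
    return longest


def gumbel_gap_pvalue(run_length, n_sites, insertion_prob):
    """Approximate upper-tail probability of a gap of `run_length` sites.

    Assumption (not taken from the thesis): sites are independent
    Bernoulli trials, so the longest run of misses among `n_sites`
    sites is approximately Gumbel distributed with
        location = log base 1/q of (n_sites * p)
        scale    = 1 / ln(1/q)
    where p = insertion_prob and q = 1 - p.
    """
    if not 0.0 < insertion_prob < 1.0:
        raise ValueError("insertion_prob must be strictly between 0 and 1")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    q = 1.0 - insertion_prob
    scale = 1.0 / math.log(1.0 / q)
    location = math.log(n_sites * insertion_prob) * scale
    cdf = math.exp(-math.exp(-(run_length - location) / scale))
    return 1.0 - cdf


if __name__ == "__main__":
    # Hypothetical toy data: 1 marks a site with an observed insertion.
    sites = [1, 0, 0, 0, 1, 0, 0, 1]
    gap = longest_gap(sites)
    print(gap, gumbel_gap_pvalue(gap, n_sites=len(sites), insertion_prob=0.5))
```

    Under these assumptions, a small tail probability marks a gap that is longer than chance alone would be expected to produce, which is the kind of evidence the abstract describes as indicative of an essential region.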

publication date

  • May 2012