De Jesus Aneiro, Michael A. (2016-12). Statistical Analysis of Transposon Sequencing Data to Determine Essential Genes. Doctoral Dissertation.

abstract

  • Transposon Sequencing (TnSeq) has become a popular biological tool for assessing the phenotypes of large libraries of bacterial mutants simultaneously. It allows high-throughput identification of genes that are essential for growth, providing valuable information about the function of those genes and supporting the discovery of potential drug targets that could lead to treatments. However, analyzing TnSeq data is challenging because it requires estimating unknown parameters from data that are often noisy and likely drawn from a mixture of different phenotypes. In addition, the essentiality classification is not known a priori, so unsupervised methods are required that can identify key features in the data and determine essentiality with confidence. We present several models that identify essential genes while overcoming these difficulties. Together, these methods provide ways to analyze TnSeq data in one or multiple conditions, either confined within gene boundaries or across the entire genome, while reducing the impact of the noise and outliers that are often present in this type of data.

publication date

  • December 2016