Normalization of transposon-mutant library sequencing datasets to improve identification of conditionally essential genes.

abstract

Next-generation sequencing of transposon-mutant libraries (TnSeq) has become a popular method for determining which genes and non-coding regions in bacteria are essential for growth under various conditions. For methods that rely on quantitative comparison of read counts at transposon insertion sites, proper normalization of TnSeq datasets is vitally important. Real TnSeq datasets are often noisy and exhibit a significant skew that can be dominated by high counts at a small number of sites (often for non-biological reasons). Comparing two datasets that are not appropriately normalized can cause genes to appear artifactually as Differentially Essential (DE) in a statistical test, constituting type I errors (false positives). In this paper, we propose a novel method for normalization of TnSeq datasets that corrects for the skew of read-count distributions by fitting them to a Beta-Geometric distribution. We show that this read-count correction procedure reduces the number of false positives when comparing replicate datasets grown under the same conditions (for which no genuine differences in essentiality are expected). We compare these results to those obtained with other normalization procedures and show that the proposed method yields a greater reduction in the number of false positives. In addition, we investigate the effects of normalization on the detection of DE genes.
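The skew problem described in the abstract can be illustrated with a small sketch. It is not the Beta-Geometric correction proposed in the paper; it only shows, under assumed conditions, how a single extreme site can distort a simple total-count scaling. All read counts are made up, and `scale_to_total` is a hypothetical helper introduced for this illustration.

```python
# Hypothetical read counts at the same 8 insertion sites in two replicates
# grown under the same condition. All values are invented for illustration.
replicate_a = [10, 12, 0, 8, 15, 9, 0, 11]
replicate_b = [10, 12, 0, 8, 15, 9, 0, 1000]  # one site with an extreme count


def scale_to_total(counts, target_total):
    """Scale counts so that they sum to target_total.

    This is plain total-count normalization, used here only as a baseline;
    it is not the method proposed in the paper.
    """
    total = sum(counts)
    return [c * target_total / total for c in counts]


target = 1000
norm_a = scale_to_total(replicate_a, target)
norm_b = scale_to_total(replicate_b, target)

# The first seven sites have identical raw counts in both replicates,
# so any difference after scaling is an artifact of the outlier site.
shared_a = sum(norm_a[:7])
shared_b = sum(norm_b[:7])

print(f"Normalized reads at the 7 shared sites, replicate A: {shared_a:.1f}")
print(f"Normalized reads at the 7 shared sites, replicate B: {shared_b:.1f}")
print(f"Apparent fold difference: {shared_a / shared_b:.1f}")
```

Although the two replicates agree exactly at seven of the eight sites, the scaled counts at those sites differ widely because one outlier absorbs most of the total in replicate B. A statistical test applied to such scaled counts could flag differences that are not biological, which is the kind of false positive the abstract describes.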

authors

Ioerger, Thomas

published proceedings

J Bioinform Comput Biol

altmetric score

0.25

author list (cited authors)

DeJesus, M. A., & Ioerger, T. R.

citation count

11

complete list of authors

DeJesus, Michael A||Ioerger, Thomas R

publication date

June 2016

publisher

World Scientific Publishing

published in

Journal of Bioinformatics and Computational Biology

keywords

Computational Biology
DNA Transposable Elements
Databases, Nucleic Acid
Essentiality
Gene Library
Genes, Essential
High-Throughput Nucleotide Sequencing
Normalization
Tnseq

Digital Object Identifier (DOI)

10.1142/S021972001642004X

start page

1642004

end page

1642004

volume

14

issue

3

URL

http://dx.doi.org/10.1142/s021972001642004x

