A statistical method to identify recombination in bacterial genomes based on SNP incompatibility
BACKGROUND: Phylogeny estimation for bacteria is likely to reflect their true evolutionary histories only if they are highly clonal. However, recombination events can occur during the evolution of some species. Reconstructing a phylogenetic tree from an alignment without considering recombination can be misleading, since the relationships among strains in some parts of the genome might differ from those in others. Using a single, global tree can create the appearance of homoplasy in recombined regions. Hence, identifying recombination breakpoints is essential to better understand the evolutionary relationships among isolates in a bacterial population.
RESULTS: We previously developed a method (called ACR) to detect potential breakpoints in an alignment by evaluating the compatibility of polymorphic sites in a sliding window. To assess the statistical significance of candidate breakpoints, we propose an extension of the algorithm (ptACR) that applies a permutation test to generate a null distribution against which the average local compatibility is compared. The performance of ptACR is evaluated on both simulated and empirical datasets. ptACR is shown to have similar sensitivity (true positive rate) but a lower false positive rate and a higher F1 score compared to basic ACR. When used to analyze a collection of clinical isolates of Staphylococcus aureus, ptACR finds clear evidence of recombination events in this bacterial pathogen and is able to identify statistically significant boundaries of chromosomal regions with distinct phylogenies.
CONCLUSIONS: ptACR is an accurate and efficient method for identifying genomic regions affected by recombination in bacterial genomes.
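The general idea of scoring site compatibility in a window and testing it by permutation can be illustrated with a minimal Python sketch. This is not the published ACR or ptACR implementation: the abstract does not give the exact statistic, so the sketch assumes biallelic SNP columns encoded as strings, uses the standard four-gamete test as the pairwise compatibility criterion, assumes that low window compatibility is the signal of interest, and builds the null distribution by shuffling site order. All function names and parameters (`is_compatible`, `window_compatibility`, `permutation_p_value`, `n_perm`, `seed`) are hypothetical.

```python
import random
from itertools import combinations


def is_compatible(site_a, site_b):
    """Four-gamete test (assumed criterion): two biallelic sites are
    compatible if fewer than four distinct allele pairs are observed."""
    return len(set(zip(site_a, site_b))) < 4


def window_compatibility(sites, start, width):
    """Fraction of compatible site pairs among sites[start:start + width]."""
    pairs = list(combinations(sites[start:start + width], 2))
    if not pairs:
        return 1.0  # a window with fewer than two sites has no conflicts
    return sum(is_compatible(a, b) for a, b in pairs) / len(pairs)


def permutation_p_value(sites, start, width, n_perm=200, seed=0):
    """Estimate how often a window of randomly ordered sites is at least
    as incompatible as the observed window (assumed null model)."""
    rng = random.Random(seed)
    observed = window_compatibility(sites, start, width)
    shuffled = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any positional structure
        if window_compatibility(shuffled, start, width) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Each string in `sites` is one polymorphic column of the alignment, with one character per isolate. A small p-value for a window would suggest that its sites conflict with each other more than expected if site order carried no information, which is the kind of evidence a candidate breakpoint would need under these assumptions.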
author list (cited authors)
Lai, Y., & Ioerger, T. R.