Zhu, Weixi (2020-07). Improving CNV Detection Efficacy in a Nonmodel Organism Through Simulations and Optimization of Parameters in the ExomeDepth Program. Master's Thesis.

abstract

  • Copy number variants (CNVs) are changes in the number of copies of DNA segments ranging from 50 bp to several million nucleotides, and they often include genic sequences. CNVs play a critical role in evolution and are related to disease in humans. Increasingly, genome and exome resequencing efforts have been used to identify CNVs. Whole exome sequencing (WES) data provide the advantage of informing on polymorphisms, including CNVs, in genic regions at a fraction of the cost necessary for whole genome sequencing (WGS). However, the performance of current CNV detection tools using WES data in species with genomic architecture different from model organisms has yet to be determined. In this research, I investigated the ability of a widely used CNV detector relying on WES data, ExomeDepth, to accurately identify CNVs in loblolly pine (Pinus taeda L.), a major forest species in the U.S. that is characterized by a large genome size (>20 Gbp) and by available WES data. Using CNV simulations, I first determined the sensitivity and false discovery rate of ExomeDepth, which showed high sensitivity and a low false discovery rate for deletions but performed relatively poorly with duplications. The detection of duplications is especially affected by ExomeDepth's main parameter, the transition probability. Importantly, intersecting detected CNVs from multiple resampled runs of ExomeDepth significantly decreases the false discovery rate for duplications, but this approach might be challenging to apply to large datasets because of the required computational power. Overall, this project has laid the foundations for the accurate detection of CNVs based on WES data in loblolly pine, which might be useful for other nonmodel organisms.
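
The two evaluation ideas in the abstract, scoring calls against simulated CNVs and intersecting calls across repeated runs, can be illustrated with a minimal Python sketch. It is not the thesis's code and does not call ExomeDepth. The tuple layout, the function names, and the rule that two calls match when they share a contig, a CNV type, and at least one base are all assumptions made for illustration; the thesis may define a match differently.

```python
from typing import List, Tuple

# Hypothetical call layout: (contig, start, end, kind),
# where kind is "deletion" or "duplication".
CNV = Tuple[str, int, int, str]


def overlaps(a: CNV, b: CNV) -> bool:
    """True when two calls share a contig, a CNV type, and at least one base.

    Assumption: any overlap counts as a match.
    """
    return a[0] == b[0] and a[3] == b[3] and a[1] <= b[2] and b[1] <= a[2]


def sensitivity_and_fdr(simulated: List[CNV], detected: List[CNV]) -> Tuple[float, float]:
    """Sensitivity = recovered simulated CNVs / all simulated CNVs.
    False discovery rate = unmatched detected calls / all detected calls.
    """
    recovered = sum(any(overlaps(s, d) for d in detected) for s in simulated)
    false_calls = sum(not any(overlaps(d, s) for s in simulated) for d in detected)
    sensitivity = recovered / len(simulated) if simulated else 0.0
    fdr = false_calls / len(detected) if detected else 0.0
    return sensitivity, fdr


def intersect_runs(runs: List[List[CNV]]) -> List[CNV]:
    """Keep calls from the first run that overlap a call in every other run."""
    if not runs:
        return []
    first, rest = runs[0], runs[1:]
    return [c for c in first if all(any(overlaps(c, o) for o in run) for run in rest)]


# Toy data, invented for illustration only.
simulated = [("c1", 100, 200, "deletion"), ("c2", 500, 900, "duplication")]
run_a = [
    ("c1", 120, 210, "deletion"),
    ("c2", 450, 600, "duplication"),
    ("c3", 10, 50, "duplication"),  # not simulated, so a false discovery
]
run_b = [("c1", 90, 190, "deletion"), ("c2", 550, 950, "duplication")]

single_run = sensitivity_and_fdr(simulated, run_a)
consensus = intersect_runs([run_a, run_b])
consensus_scores = sensitivity_and_fdr(simulated, consensus)
```

In this toy case the unsupported duplication on `c3` appears in only one run, so the intersection drops it and the false discovery rate falls while sensitivity is unchanged. Each additional run is another full pass of the detector, which is the computational cost the abstract points to.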

publication date

  • July 2020