Optimal hybrid sequencing and assembly: Feasibility conditions for accurate genome reconstruction and cost minimization strategy
- Additional Document Info
- View All
Recent advances in high-throughput genome sequencing technologies have enabled the systematic study of various genomes by making whole genome sequencing affordable. Modern sequencers generate a huge number of small sequence fragments called reads, where the read length and the per-base sequencing cost depend on the technology used. To date, many hybrid genome assembly algorithms have been developed that can take reads from multiple read sources to reconstruct the original genome. However, rigorous investigation of the feasibility conditions for complete genome reconstruction and the optimal sequencing strategy for minimizing the sequencing cost has been conspicuously missing. An important aspect of hybrid sequencing and assembly is that the feasibility conditions for genome reconstruction can be satisfied by different combinations of the available read sources, opening up the possibility of optimally combining the sources to minimize the sequencing cost while ensuring accurate genome reconstruction. In this paper, we derive the conditions for whole genome reconstruction from multiple read sources at a given confidence level and also introduce the optimal strategy for combining reads from different sources to minimize the overall sequencing cost. We show that the optimal read set, which simultaneously satisfies the feasibility conditions for genome reconstruction and minimizes the sequencing cost, can be effectively predicted through constrained discrete optimization. Through extensive evaluations based on several genomes and different read sets, we verify the derived feasibility conditions and demonstrate the performance of the proposed optimal hybrid sequencing and assembly strategy.
author list (cited authors)
Chen, C., Ghaffari, N., Qian, X., & Yoon, B.