Yin, Yue (2021-07). Comprehensive Data Analysis Toolkit Development for a Low Input Bisulfite Sequencing. Doctoral Dissertation.

abstract

  • The methylation profile of human cell-free DNA (cfDNA) from liquid biopsies has been used to diagnose early-stage disease and to estimate therapy response. However, typical clinical procedures can purify only very small amounts of cfDNA. Whole-genome bisulfite sequencing (WGBS) is the gold standard for measuring DNA methylation, but applying WGBS to small amounts of fragmented DNA poses critical challenges for data processing, analysis, and visualization. For data processing, low-input bisulfite sequencing samples have a low mapping ratio, which results in low genome-wide sequencing depth and low coverage of CpG sites; this is a bottleneck for the clinical application of cfDNA-based WGBS assays. We developed LiBis (Low-input Bisulfite Sequencing), a novel augmentation for low-input WGBS data alignment. By dynamically clipping initially unmapped reads and remapping the clipped fragments, LiBis rescues those reads and aligns them uniquely to the genome. By increasing the mapping ratio by up to 88%, LiBis substantially increased the number of informative CpG sites and the precision in quantifying the methylation status of individual CpG sites. The high sensitivity and cost-effectiveness that LiBis affords for low-input samples will help the discovery of genetic and epigenetic features suitable for downstream analysis and biomarker identification using liquid biopsy. For data analysis, we present Mmint, a user-friendly, comprehensive integrative analysis tool. It generates publication-quality figures from epigenetic data covering the following aspects: quality assessment, integrative analysis of BS-seq and ChIP-seq data, and correlation analysis between DNA methylation or histone modification and gene expression. The versatile analyses in Mmint can help users interpret epigenetic data comprehensively and may provide novel biological insights.
To further simplify data use and visualization, especially for researchers who do not specialize in bioinformatics, we implemented GsmPlot. GsmPlot accepts either GSM IDs, for which it automatically downloads the NCBI data, or users' local bigWig files as input. It plots the data of interest on promoters, exons, or any other user-defined genomic locations and generates UCSC visualization tracks. By linking public data repositories with in-house data, GsmPlot can spark data-driven ideas and thereby promote epigenetic research.
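
The clip-and-remap idea described for LiBis can be illustrated with a toy sketch. This is not the LiBis implementation: the function names (`find_all`, `rescue_read`), the fixed window and stride, and the exact-match lookup standing in for a real bisulfite-aware aligner are all assumptions made for illustration. The sketch cuts an unmapped read into overlapping windows, keeps only the windows that occur exactly once in the reference, and merges neighbouring windows that land on the same alignment diagonal.

```python
def find_all(reference, fragment):
    """Return every start position where fragment occurs exactly in reference."""
    hits = []
    start = reference.find(fragment)
    while start != -1:
        hits.append(start)
        start = reference.find(fragment, start + 1)
    return hits


def rescue_read(read, reference, window=8, stride=1):
    """Clip a read into windows, keep uniquely placed ones, merge neighbours.

    Returns a list of (read_start, read_end, ref_start) tuples.
    Hypothetical helper for illustration only; exact string matching
    replaces a real aligner and ignores bisulfite C-to-T conversion.
    """
    rescued = []  # each entry: [read_start, read_end, ref_start]
    for i in range(0, len(read) - window + 1, stride):
        hits = find_all(reference, read[i:i + window])
        if len(hits) != 1:
            continue  # window is unmapped or ambiguous: discard it
        ref_start = hits[0]
        last = rescued[-1] if rescued else None
        # Extend the previous fragment when this window overlaps it in the
        # read and has the same read-to-reference offset (same diagonal).
        if last and i <= last[1] and ref_start - i == last[2] - last[0]:
            last[1] = i + window
        else:
            rescued.append([i, i + window, ref_start])
    return [tuple(fragment) for fragment in rescued]


# Toy data: a read whose middle comes from the reference, flanked by junk.
reference = "ACGTTGCATGCCGATAGCTTAGGCTAACGGTCA"
read = "NNNNN" + reference[5:20] + "NNNN"
fragments = rescue_read(read, reference)
print(fragments)
```

In this toy case the junk at both ends prevents the whole read from matching, while the clipped windows from the middle of the read still map uniquely and are merged into a single rescued fragment.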

publication date

  • July 2021