MiB: A Comparative Assembly Processing Pipeline - Texas A&M University (TAMU) Scholar

This paper introduces MiB, a comparative genome assembly pipeline that consists of three key stages. The first stage chooses the best reference sequence using the Minimum Description Length (MDL) principle. Besides selecting the best reference sequence (the model), MDL fine-tunes that model for a better assembly by rectifying all inversions and removing most insertions from the reference sequence, and it identifies the set of reads that can align to the reference. The second stage takes the reads that did not align to the reference sequence as input to a de Bruijn graph-based algorithm that Identifies the Deletions in the reference sequence and then Inserts Them at Appropriate Places (IDITAP). The last stage uses Bayesian Estimation for Comparative Assembly (BECA), which uses the quality (Q-) values of every read to estimate the probability of each base call and then exploits these probabilities to find the best alignments and the consensus sequence. MiB, whose name derives from MDL-IDITAP-BECA, thus takes the optimal reference sequence and the set of reads from the unassembled genome and transforms the reference sequence into the novel genome by removing or rectifying four types of mutations: inversions and insertions using MDL, deletions using IDITAP, and single nucleotide polymorphisms (SNPs) using BECA. Preliminary tests of the proposed framework yielded promising results. © 2012 IEEE.
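The abstract does not say how MDL scores a candidate reference. As a rough illustration only, the Python sketch below uses a toy two-part code: the model costs 2 bits per reference base, and each read is sent either as an ungapped placement plus its mismatches or verbatim at 2 bits per base, whichever is shorter. The cost scheme, the mismatch threshold `max_mm`, and the helper names (`best_ungapped_hit`, `description_length`, `choose_reference`) are hypothetical and are not the authors' method; the sketch also ignores inversions and gapped alignments.

```python
import math
from typing import Dict, List, Optional, Tuple


def best_ungapped_hit(read: str, ref: str) -> Optional[Tuple[int, int]]:
    """Return (offset, mismatches) of the ungapped placement with the fewest mismatches."""
    best: Optional[Tuple[int, int]] = None
    for offset in range(len(ref) - len(read) + 1):
        window = ref[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def description_length(ref: str, reads: List[str], max_mm: int = 2) -> Tuple[float, List[str], List[str]]:
    """Toy two-part code (hypothetical): bits for the reference plus bits for the reads given it."""
    model_bits = 2.0 * len(ref)  # spell the reference out at 2 bits per base
    data_bits = 0.0
    aligned: List[str] = []
    unaligned: List[str] = []
    for read in reads:
        raw_bits = 1.0 + 2.0 * len(read)  # 1 flag bit + the read spelled out verbatim
        hit = best_ungapped_hit(read, ref)
        if hit is not None and hit[1] <= max_mm:
            offset_bits = math.log2(len(ref) - len(read) + 1)  # where the read sits
            count_bits = math.log2(max_mm + 1)  # how many mismatches follow
            # each mismatch: its position in the read plus one of 3 alternative bases
            mismatch_bits = hit[1] * (math.log2(len(read)) + math.log2(3))
            cost = 1.0 + offset_bits + count_bits + mismatch_bits
            if cost < raw_bits:
                data_bits += cost
                aligned.append(read)
                continue
        data_bits += raw_bits
        unaligned.append(read)
    return model_bits + data_bits, aligned, unaligned


def choose_reference(candidates: Dict[str, str], reads: List[str]) -> Tuple[str, List[str]]:
    """Pick the candidate with the shortest total description length.

    Returns the chosen name and the reads that did not align to it."""
    scored = {name: description_length(seq, reads) for name, seq in candidates.items()}
    best = min(scored, key=lambda name: scored[name][0])
    return best, scored[best][2]


genome = "ACGTTGCATGACCGTAGGCTAACGTTAGC"
reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 4)]
name, leftovers = choose_reference({"refA": genome, "refB": "T" * len(genome)}, reads)
print(name, leftovers)  # refA []
```

Reads sampled from `genome` compress well against `refA` and not at all against the unrelated `refB`, so `refA` wins. The reads left unaligned for the winning reference correspond to the set that the abstract passes on to the next stage.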
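IDITAP is described only as a de Bruijn graph-based algorithm. A minimal sketch, assuming error-free unaligned reads and a single segment missing from the reference: build a graph whose nodes are (k-1)-mers and whose edges are the distinct k-mers, spell the unbranched path as a contig, then splice the contig's novel interior into the reference where its left and right ends match adjacent reference positions. The function names, the `flank` parameter, and the left-normalised placement rule are assumptions for illustration, not the paper's procedure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def build_de_bruijn(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; every distinct k-mer adds one edge prefix -> suffix."""
    kmers: Set[str] = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_contig(graph: Dict[str, List[str]]) -> str:
    """Spell the unbranched path starting at a node with no incoming edge."""
    indegree: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    starts = [node for node in graph if indegree[node] == 0]
    if not starts:
        return ""  # the graph is a pure cycle; this sketch does not handle it
    node = starts[0]
    contig, seen = node, {node}
    # Follow single out-edges; stop at a branch, a dead end, or a revisit.
    while len(graph.get(node, [])) == 1 and graph[node][0] not in seen:
        node = graph[node][0]
        seen.add(node)
        contig += node[-1]
    return contig


def insert_missing_segment(ref: str, contig: str, flank: int) -> Optional[str]:
    """Splice the contig's novel interior into the reference (hypothetical rule).

    The contig's first `flank` bases must occur in the reference; the left anchor is
    extended while contig and reference agree, and the longest contig suffix that
    continues the reference at that junction is treated as the right anchor."""
    site = ref.find(contig[:flank])
    if site < 0:
        return None
    i = 0
    while site + i < len(ref) and i < len(contig) and ref[site + i] == contig[i]:
        i += 1
    cut = site + i  # junction in the reference where the deletion occurred
    for j in range(len(contig) - i, max(flank, 1) - 1, -1):
        if ref.startswith(contig[len(contig) - j:], cut):
            return ref[:cut] + contig[i:len(contig) - j] + ref[cut:]
    return None


unaligned = ["GATCCAGTAA", "CAGTAACTTG", "AACTTGCAAG"]
contig = walk_contig(build_de_bruijn(unaligned, k=5))
print(contig)  # GATCCAGTAACTTGCAAG
print(insert_missing_segment("ATCGGATCCATGCAAGTTCG", contig, flank=4))
```

In this toy case the three reads span a junction that is absent from the reference, and the contig restores the six missing bases `GTAACT`, giving `ATCGGATCCAGTAACTTGCAAGTTCG`. Real data would need error correction and handling of branches and repeats, which this sketch omits.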
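The BECA step can be illustrated with the standard Phred relation, Q = -10 log10 P(error). The sketch below, assuming independent reads, an error spread evenly over the three other bases, and a hypothetical prior `ref_prior` that favours the reference base, computes a per-column posterior over A, C, G, T and takes the maximum a posteriori base as the consensus. It is not the authors' exact model, and it presumes the reads have already been aligned into columns.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def phred_to_error(q: int) -> float:
    """Phred scale: Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def column_posterior(observations: List[Tuple[str, int]], ref_base: Optional[str] = None,
                     ref_prior: float = 0.25) -> Dict[str, float]:
    """Posterior over the true base in one aligned column, given (base, Q) calls.

    Requires 0 < ref_prior < 1 when ref_base is given; otherwise the prior is uniform."""
    if ref_base is None:
        log_post = {b: math.log(0.25) for b in BASES}
    else:
        other = (1.0 - ref_prior) / 3.0
        log_post = {b: math.log(ref_prior if b == ref_base else other) for b in BASES}
    for observed, q in observations:
        # Cap at 0.75: a call that is wrong 75% of the time carries no information.
        err = min(phred_to_error(q), 0.75)
        for true_base in BASES:
            # P(observed | true): right with prob 1 - err, each wrong base with err / 3.
            likelihood = 1.0 - err if observed == true_base else err / 3.0
            log_post[true_base] += math.log(likelihood)
    top = max(log_post.values())  # log-sum-exp normalisation avoids underflow
    total = sum(math.exp(v - top) for v in log_post.values())
    return {b: math.exp(v - top) / total for b, v in log_post.items()}


def consensus(columns: List[List[Tuple[str, int]]], reference: Optional[str] = None,
              ref_prior: float = 0.9) -> str:
    """Maximum a posteriori base per column; empty columns keep the reference (or N)."""
    called = []
    for idx, obs in enumerate(columns):
        ref_base = reference[idx] if reference is not None else None
        if not obs:
            called.append(ref_base if ref_base is not None else "N")
            continue
        post = column_posterior(obs, ref_base, ref_prior)
        called.append(max(post, key=post.get))
    return "".join(called)


columns = [[("A", 30), ("A", 30), ("A", 20)],
           [("G", 35), ("G", 30), ("T", 10)]]
print(consensus(columns, reference="AT"))  # AG
```

In the second column two high-quality calls of G outweigh both the reference prior for T and one low-quality (Q10) T call, so the sketch reports a SNP at that position.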

Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)

Conference Paper