Heuristic pairwise alignment of de Bruijn graphs to facilitate simultaneous transcript discovery in related organisms from RNA-Seq data. Academic Article uri icon

abstract

  • BACKGROUND: The advance of high-throughput sequencing has made it possible to obtain new transcriptomes and study splicing mechanisms in non-model organisms. In these studies, there is often a need to investigate the transcriptomes of two related organisms at the same time in order to find the similarities and differences between them. The traditional approach to address this problem is to perform de novo transcriptome assemblies to obtain predicted transcripts for these organisms independently and then employ similarity comparison algorithms to study them. RESULTS: Instead of obtaining predicted transcripts for these organisms separately from the intermediate de Bruijn graph structures employed by de novo transcriptome assembly algorithms, we develop an algorithm to allow direct comparisons between paths in two de Bruijn graphs by first enumerating short paths in both graphs, and iteratively extending paths in one graph that have high similarity to paths in the other graph to obtain longer corresponding paths between the two graphs. These paths represent predicted transcripts that are present in both organisms. CONCLUSIONS: Our approach generalizes the pairwise sequence alignment problem to allow the input to be non-linear structures, and provides a heuristic to reliably recover similar paths from the two structures. Our algorithm allows detailed investigation of the similarities and differences in alternative splicing between the two organisms at both the sequence and structure levels, even in the absence of reference transcriptomes or a closely related model organism.

published proceedings

  • BMC Genomics

altmetric score

  • 7.846

author list (cited authors)

  • Fu, S., Tarone, A. M., & Sze, S.

citation count

  • 4

complete list of authors

  • Fu, Shuhua||Tarone, Aaron M||Sze, Sing-Hoi

publication date

  • December 2015