A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms.

abstract

BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high-quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can process a large amount of RNA-Seq data, and accuracy comparable to or lower than that of memory-intensive algorithms that can only be used to construct small assemblies. CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
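
The three steps named in the abstract (subdivide, assemble each library independently, merge by selecting high-quality transcripts) can be sketched in Python as follows. This is only an illustration of the overall control flow, not the authors' implementation: the abstract does not describe the merging algorithm, so the selection rule here (keep distinct transcripts whose score reaches a threshold) is a placeholder. All names, including `split_into_libraries`, `merge_assemblies`, `toy_assembler`, and the `quality` and `min_quality` parameters, are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

Reads = Sequence[str]
Assembler = Callable[[Reads], List[str]]


def split_into_libraries(reads: Reads, library_size: int) -> List[Reads]:
    """Divide step: cut the read set into consecutive small libraries."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [reads[i:i + library_size] for i in range(0, len(reads), library_size)]


def merge_assemblies(assemblies: Iterable[List[str]],
                     quality: Callable[[str], float],
                     min_quality: float) -> List[str]:
    """Combine step. ASSUMPTION: a placeholder rule, not the paper's merging
    algorithm. Keeps each distinct transcript whose score reaches min_quality
    and returns them ordered from highest to lowest score."""
    seen = set()
    kept = []
    for assembly in assemblies:
        for transcript in assembly:
            if transcript not in seen and quality(transcript) >= min_quality:
                seen.add(transcript)
                kept.append(transcript)
    return sorted(kept, key=quality, reverse=True)


def divide_and_conquer_assemble(reads: Reads,
                                library_size: int,
                                assemble: Assembler,
                                quality: Callable[[str], float],
                                min_quality: float) -> List[str]:
    """Split the reads, assemble each library independently, then merge."""
    libraries = split_into_libraries(reads, library_size)
    assemblies = [assemble(library) for library in libraries]
    return merge_assemblies(assemblies, quality, min_quality)


def toy_assembler(library: Reads) -> List[str]:
    """Hypothetical stand-in for an existing assembler: it simply returns the
    distinct reads of the library in their original order."""
    return list(dict.fromkeys(library))


if __name__ == "__main__":
    reads = ["ACGT", "ACGTAC", "GG", "ACGT", "TTTTT"]
    # Sequence length is used as a stand-in quality score (an assumption).
    print(divide_and_conquer_assemble(reads, 2, toy_assembler, len, 4))
```

In practice `assemble` would wrap a call to an existing memory-intensive assembler, and because each library is processed independently, peak memory is bounded by the largest single library rather than by the whole data set.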

published proceedings

BMC Genomics

altmetric score

20.876

author list (cited authors)

Sze, S., Parrott, J. J., & Tarone, A. M.

citation count

0

complete list of authors

Sze, Sing-Hoi||Parrott, Jonathan J||Tarone, Aaron M

publication date

January 2017

publisher

Springer Nature Publisher

published in

BMC GENOMICS Journal

keywords

Algorithms
Animals
Arabidopsis
De Novo Transcriptome Assembly
Divide-and-conquer
Drosophila Melanogaster
Gene Expression Profiling
RNA-seq
Schizosaccharomyces
Sequence Analysis, RNA

PubMed ID

29244008

Digital Object Identifier (DOI)

10.1186/s12864-017-4270-9

start page

895

volume

18

issue

Suppl 10

URL

http://dx.doi.org/10.1186/s12864-017-4270-9

A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms. Conference Paper
