Comparative analysis of de novo transcriptome assembly. - Texas A&M University (TAMU) Scholar

abstract

The rapid development of next-generation sequencing technology presents a major computational challenge for data processing and analysis. The de Bruijn graph, a fast algorithm, has been used successfully for de novo assembly of genomic DNA; its performance for transcriptome assembly, however, is unclear. In this study, we used both simulated and real RNA-Seq data, from either artificial RNA templates or human transcripts, to evaluate five de novo assemblers: ABySS, Mira, Trinity, Velvet and Oases. ABySS, Trinity, Velvet and Oases are all based on the de Bruijn graph, whereas Mira uses an overlap graph algorithm. Various numbers of RNA short reads were selected from the External RNA Control Consortium (ERCC) data and from human chromosome 22. A number of statistics were then calculated for the contigs produced by each assembler. Each experiment was repeated multiple times to obtain mean statistics and standard error estimates. Trinity performed relatively well on both the ERCC and human data, but it may not consistently generate full-length transcripts. ABySS was the fastest method, but its assembly quality was low. Mira achieved a good rate of mapping its contigs onto human chromosome 22, but its computational speed was unsatisfactory. Our results suggest that transcript assembly remains a challenging problem for the bioinformatics community, and that a novel assembler is needed for transcriptome data generated by next-generation sequencing technology.
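The abstract contrasts de Bruijn graph assemblers (ABySS, Trinity, Velvet and Oases) with Mira's overlap graph approach. As a minimal sketch of the de Bruijn idea only, assuming error-free reads from a single strand and a fixed k, the Python below breaks reads into k-mers, links each k-mer's prefix (k-1)-mer to its suffix (k-1)-mer, and spells contigs by walking maximal non-branching paths. The function names `build_de_bruijn` and `unitigs` are hypothetical and are not taken from any of the assemblers evaluated; real tools additionally handle sequencing errors, reverse complements, coverage and isolated cycles, and Trinity and Oases add isoform-level reconstruction on top of the graph.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Hypothetical helper: nodes are (k-1)-mers; every k-mer seen in the
    reads adds an edge from its prefix (k-1)-mer to its suffix (k-1)-mer.
    Assumes error-free, single-strand reads (no reverse complements)."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Hypothetical helper: spell one contig per maximal non-branching path.
    Cycles made only of 1-in/1-out nodes are ignored in this sketch."""
    indeg = defaultdict(int)
    for dsts in graph.values():
        for dst in dsts:
            indeg[dst] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(node):
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node: it is reached from a path start
        for nxt in graph.get(node, ()):
            path = node + nxt[-1]
            # Extend while the path stays unambiguous.
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


if __name__ == "__main__":
    # Two overlapping reads collapse into one contig.
    print(unitigs(build_de_bruijn(["ACGTTG", "GTTGCA"], 4)))  # ['ACGTTGCA']
    # A shared prefix followed by divergent ends splits into three contigs.
    print(sorted(unitigs(build_de_bruijn(["ACGTA", "ACGTC"], 3))))
```

The second call gives a simple, illustrative picture of how branching in the graph, for example two isoforms diverging after a shared exon, can break a sequence into shorter contigs, which is one way a de Bruijn assembler may fail to emit a full-length transcript.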

published proceedings

Sci China Life Sci

altmetric score

0.5

author list (cited authors)

Clarke, K., Yang, Y., Marsh, R., Xie, L., & Zhang, K. K.

citation count

42

complete list of authors

Clarke, Kaitlin; Yang, Yi; Marsh, Ronald; Xie, Linglin; Zhang, Ke K

publication date

February 2013

publisher

Springer Nature

published in

Science China Life Sciences

keywords

Algorithms
Brain
Chromosomes, Human, Pair 22
Computational Biology
Databases, Nucleic Acid
Gene Expression Profiling
High-Throughput Nucleotide Sequencing
Humans
Sequence Analysis, RNA

PubMed ID (PMID)

23393031

Digital Object Identifier (DOI)

10.1007/s11427-013-4444-x

start page

156

end page

162

volume

56

issue

2

URL

http://dx.doi.org/10.1007/s11427-013-4444-x
