A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms. Academic Article


  • BACKGROUND: With the increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms specifically designed for transcriptome assembly from high-throughput sequencing data are available, they are very memory-intensive, which limits their application to small data sets with few libraries. RESULTS: We develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible, together containing hundreds of gigabases of data. New techniques are developed so that computations can be performed on a computing cluster with a moderate amount of physical memory. CONCLUSIONS: Our strategy minimizes memory consumption while simultaneously obtaining comparable or improved accuracy over existing algorithms. It also supports incremental updates of assemblies when new libraries become available.

published proceedings

  • BMC Genomics

altmetric score

  • 17.446

author list (cited authors)

  • Sze, S., Pimsler, M. L., Tomberlin, J. K., Jones, C. D., & Tarone, A. M.

citation count

  • 7

complete list of authors

  • Sze, Sing-Hoi; Pimsler, Meaghan L.; Tomberlin, Jeffery K.; Jones, Corbin D.; Tarone, Aaron M.

publication date

  • January 2017