Fast and Efficient Genotype Encoding Using Sparse 2D Bitmaps for Database-Driven Genomics Applications
© 2018 IEEE. Data management is a major challenge for many genomics applications. A central goal of genomic research is identifying and storing the genetic variants present in human populations. Recently, there has been increasing interest in adopting a database representation for variant information. However, the massive scale of variant data poses many storage and access-time challenges for database-driven genomic applications, and efficient database-driven variant encoding techniques are needed to address them. In this paper we propose a variant encoding technique for Single Nucleotide Polymorphisms (SNPs) based on 2D sparse bitmaps. The technique was designed to achieve high compressibility while minimizing access time. Using this approach, we reduced the database storage space of the 1000 Genomes pilot data from the 45.24 GB required by a basic implementation to 4.75 GB, while also reducing database access time by a factor of roughly 100. Furthermore, we compared our approach to the popular Ensembl Variant Database and reduced database size by up to 47.33% without compromising access time.
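The abstract does not spell out the encoding, so the following is only a minimal sketch of the general idea of a sparse 2D bitmap, not the authors' method. It assumes rows are samples, columns are SNP sites, and a set bit marks a non-reference allele. The 64-bit word width and the names `SparseBitmap2D`, `encode`, and `stored_words` are hypothetical choices made for illustration.

```python
WORD_BITS = 64  # assumed word width; not taken from the paper


class SparseBitmap2D:
    """Hypothetical sparse 2D bitmap: rows are samples, columns are SNP sites."""

    def __init__(self):
        # Only non-zero words are stored, keyed by (row, word index).
        self.words = {}

    def set(self, row, col):
        """Mark a non-reference allele for sample `row` at site `col`."""
        key = (row, col // WORD_BITS)
        self.words[key] = self.words.get(key, 0) | (1 << (col % WORD_BITS))

    def get(self, row, col):
        """Return True if the bit for (row, col) is set; absent words read as zero."""
        word = self.words.get((row, col // WORD_BITS), 0)
        return bool((word >> (col % WORD_BITS)) & 1)

    def stored_words(self):
        """Number of words actually held in memory."""
        return len(self.words)


def encode(genotypes):
    """Build a bitmap from rows of 0/1 flags (1 = variant allele present)."""
    bitmap = SparseBitmap2D()
    for row, flags in enumerate(genotypes):
        for col, flag in enumerate(flags):
            if flag:
                bitmap.set(row, col)
    return bitmap


# Toy input: 3 samples, 200 sites, 3 variant calls in total.
genotypes = [[0] * 200 for _ in range(3)]
genotypes[0][5] = 1
genotypes[2][130] = 1
genotypes[2][131] = 1

bitmap = encode(genotypes)
dense_words = len(genotypes) * ((200 + WORD_BITS - 1) // WORD_BITS)
print(bitmap.stored_words(), "words stored, versus", dense_words, "for a dense layout")
```

Because most sites match the reference in most samples, the majority of words are zero and never stored, which is where the space saving in a layout like this would come from. A lookup is a single dictionary access followed by a bit test.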
author list (cited authors)
Kawam, A. A., & Datta, A.