
BDBG: A bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Harbin Institute of Technology

Research output: Contribution to journal, Article, peer-review

Abstract

Dramatic increases in data produced by next-generation sequencing (NGS) technologies demand data compression tools for saving storage space. However, effective and efficient data compression for genome sequencing data has remained an unresolved challenge in NGS data studies. In this paper, we propose a novel alignment-free and reference-free compression method, BdBG, which is the first to compress genome sequencing data with dynamic de Bruijn graphs based on the data after bucketing. Compared with existing de Bruijn graph methods, BdBG stores only a list of bucket indexes and bifurcations for the raw read sequences, which effectively reduces storage space. Experimental results on several genome sequencing datasets show the effectiveness of BdBG over three state-of-the-art methods. BdBG is written in Python and is open-source software distributed under the MIT license, available for download at https://github.com/rongjiewang/BdBG.
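
The idea of bucketing reads and then recording only the branching points of a de Bruijn graph can be illustrated with a minimal sketch. This is not the BdBG implementation: the abstract does not say how buckets are chosen, so the lexicographically smallest m-mer (a minimizer) is assumed here as the bucket key, and the function names and the parameters `m` and `k` are hypothetical. The sketch builds each graph in a single pass and does not attempt dynamic graph updates or the actual encoding.

```python
from collections import defaultdict


def minimizer(read, m):
    """Smallest m-mer of the read in lexicographic order (assumed bucket key)."""
    return min(read[i:i + m] for i in range(len(read) - m + 1))


def bucket_reads(reads, m):
    """Group reads that share the same minimizer into one bucket."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, m)].append(read)
    return dict(buckets)


def build_de_bruijn(reads, k):
    """Map every k-mer to the set of bases observed immediately after it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + k])
    return dict(graph)


def bifurcations(graph):
    """Keep only k-mers with more than one successor base (branching nodes)."""
    return {kmer: sorted(bases) for kmer, bases in graph.items() if len(bases) > 1}


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAG", "TTTTTT"]
    for key, members in bucket_reads(reads, m=3).items():
        graph = build_de_bruijn(members, k=4)
        print(key, members, bifurcations(graph))
```

In this toy setting, reads that share a bucket tend to overlap, so most k-mers have a single successor and only the few branching k-mers would need to be recorded explicitly.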

Original language: English
Article number: e5611
Journal: PeerJ
Volume: 2018
Issue number: 10
State: Published - 2018
Externally published: Yes

Keywords

  • Bucket-based
  • Compression
  • Dynamic de Bruijn graph
  • Next-generation sequencing
