
The Design of Fast Delta Encoding for Delta Compression Based Storage Systems

  • Harbin Institute of Technology Shenzhen
  • Peng Cheng Laboratory

Research output: Contribution to journal › Article › peer-review

Abstract

Delta encoding is a data reduction technique that calculates the differences (i.e., the delta) among very similar files and chunks. It is widely used in applications such as synchronization replication, backup/archival storage, and cache compression. However, delta encoding is computationally costly because of the time-consuming word-matching operations it needs for delta calculation. Existing delta encoding approaches either encode slowly, such as Xdelta and Zdelta, or achieve a low compression ratio, such as Ddelta and Edelta. In this article, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash in place of the Adler32 hash for fast scanning of overlapping words in similar chunks, (2) adopting quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of building full indexes over the base chunks' words, as traditional approaches do, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) after word-matching, batch compressing the remainder to further improve the compression ratio. Our evaluation results, driven by seven real-world datasets, suggest that Gdelta achieves encoding/decoding speedups of 3.5X∼25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%∼240%.
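To make the word-matching idea concrete, the following is a minimal Python sketch of techniques (1) to (3): a Gear-style rolling hash, an array used directly as the index, and optional sampling of base words. It is not the Gdelta implementation. The word size `WORD`, the random `GEAR` table, the `step` sampling parameter, and the `("copy", offset, length)` / `("insert", bytes)` instruction format are all assumptions chosen for illustration, and techniques (4) and (5) are left out. Keeping only the low `WORD` bits of the hash means that bytes older than `WORD` positions shift out on their own, so no explicit removal of the outgoing byte is needed.

```python
import random

WORD = 16                    # assumed word size in bytes (illustrative)
MASK = (1 << WORD) - 1       # keep WORD bits so the hash covers exactly WORD bytes
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # hypothetical gear table


def build_index(base, step=1):
    """Array-based index: slot number is the hash, value is a base offset or -1.

    step > 1 indexes only every step-th word (a simple form of sampling).
    """
    index = [-1] * (MASK + 1)
    h = 0
    for i, b in enumerate(base):
        h = ((h << 1) + GEAR[b]) & MASK      # Gear-style rolling update
        pos = i - WORD + 1                   # start of the word ending at byte i
        if pos >= 0 and pos % step == 0:
            index[h] = pos
    return index


def encode(base, target, step=1):
    """Return a list of copy/insert instructions that rebuild target from base."""
    index = build_index(base, step)
    ops = []
    lit_start = 0        # start of the pending unmatched bytes
    i = 0                # next target byte to feed into the hash
    h = filled = 0       # rolling hash and bytes fed since the last reset
    while i < len(target):
        h = ((h << 1) + GEAR[target[i]]) & MASK
        filled += 1
        i += 1
        if filled < WORD:
            continue                         # window not yet full
        start = i - WORD
        cand = index[h]
        if cand < 0 or base[cand:cand + WORD] != target[start:i]:
            continue                         # empty slot or hash collision
        length = WORD                        # verified match: extend it forward
        while (cand + length < len(base) and start + length < len(target)
               and base[cand + length] == target[start + length]):
            length += 1
        if start > lit_start:
            ops.append(("insert", target[lit_start:start]))
        ops.append(("copy", cand, length))
        i = lit_start = start + length       # resume after the matched region
        h = filled = 0
    if lit_start < len(target):
        ops.append(("insert", target[lit_start:]))
    return ops


def decode(base, ops):
    """Apply the instructions to base to reconstruct the target bytes."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Because every candidate is verified byte by byte before a copy is emitted, `decode(base, encode(base, target))` reproduces `target` regardless of hash collisions or the sampling step; those choices affect only how many matches are found and how fast.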

Original language: English
Article number: 23
Journal: ACM Transactions on Storage
Volume: 20
Issue number: 4
DOIs
State: Published - 6 Aug 2024
Externally published: Yes

Keywords

  • Data reduction
  • compression
  • data deduplication
  • delta encoding
