
Building a High-performance Fine-grained Deduplication Framework for Backup Storage with High Deduplication Ratio

  • Harbin Institute of Technology Shenzhen
  • Dell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Fine-grained deduplication, which first removes identical chunks and then eliminates redundancies between similar but non-identical chunks (i.e., delta compression), could exploit workloads' compressibility to achieve a very high deduplication ratio but suffers from poor backup/restore performance. This makes it not as popular as chunk-level deduplication thus far. This is because allowing workloads to share more references among similar chunks further reduces spatial/temporal locality, causes more I/O overhead, and leads to worse backup/restore performance. In this paper, we address issues for different forms of poor locality with several techniques, and propose MeGA, which achieves backup and restore speed close to chunk-level deduplication while preserving fine-grained deduplication's significant deduplication ratio advantage. Specifically, MeGA applies (1) a backup-workflow-oriented delta selector to address poor locality when reading base chunks, and (2) a delta-friendly data layout and "Always-Forward-Reference" traversing in the restore workflow to deal with the poor spatial/temporal locality of deduplicated data. Evaluations on four datasets show that MeGA achieves a better performance than other fine-grained deduplication approaches. In particular, compared with the traditional greedy approach, MeGA achieves a 4.47-34.45× higher backup performance and a 30-105× higher restore performance while maintaining a very high deduplication ratio.
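To make the two stages named in the abstract concrete (exact-duplicate removal, then delta compression against a similar chunk), here is a minimal toy sketch. It is not MeGA and does not reproduce the paper's delta selector or data layout. Everything in it is an assumption chosen for brevity: fixed-size chunking, SHA-256 fingerprints, a min-wise CRC32 feature for similarity detection, a `difflib`-based delta encoder, and the hypothetical names `ToyStore`, `feature`, `delta_encode` and `delta_decode`.

```python
import hashlib
import zlib
from difflib import SequenceMatcher

CHUNK_SIZE = 1024  # assumption: fixed-size chunking, for brevity
WINDOW = 8         # assumption: window width for the similarity feature


def feature(chunk: bytes) -> int:
    """Min-wise feature: the smallest CRC32 over all sliding windows."""
    if len(chunk) < WINDOW:
        return zlib.crc32(chunk)
    last = len(chunk) - WINDOW + 1
    return min(zlib.crc32(chunk[i:i + WINDOW]) for i in range(last))


def delta_encode(base: bytes, target: bytes) -> list:
    """Describe target as copies from base plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those base bytes are simply not copied.
    return ops


def delta_decode(base: bytes, ops: list) -> bytes:
    """Rebuild the target chunk from its base chunk and delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class ToyStore:
    def __init__(self):
        # fingerprint -> ("full", data) or ("delta", base_fingerprint, ops)
        self.chunks = {}
        # feature -> fingerprint of a chunk stored in full (usable as a base)
        self.by_feature = {}

    def backup(self, data: bytes) -> list:
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:  # stage 1: skip identical chunks
                feat = feature(chunk)
                base_fp = self.by_feature.get(feat)
                if base_fp is not None:  # stage 2: delta against a similar base
                    base = self.chunks[base_fp][1]
                    self.chunks[fp] = ("delta", base_fp, delta_encode(base, chunk))
                else:
                    self.chunks[fp] = ("full", chunk)
                    self.by_feature[feat] = fp
            recipe.append(fp)
        return recipe

    def restore(self, recipe: list) -> bytes:
        out = bytearray()
        for fp in recipe:
            entry = self.chunks[fp]
            if entry[0] == "full":
                out += entry[1]
            else:
                # Restoring a delta chunk needs an extra read of its base chunk.
                out += delta_decode(self.chunks[entry[1]][1], entry[2])
        return bytes(out)
```

In this sketch, every delta-encoded chunk forces `restore` to fetch a base chunk that may live elsewhere in the store. That extra read is a small-scale picture of the locality cost the abstract attributes to fine-grained deduplication.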

Original language: English
Title of host publication: Proceedings of the 2022 USENIX Annual Technical Conference, ATC 2022
Publisher: USENIX Association
Pages: 19-35
Number of pages: 17
ISBN (Electronic): 9781939133298
State: Published - 2022
Externally published: Yes
Event: 2022 USENIX Annual Technical Conference, ATC 2022 - Carlsbad, United States
Duration: 11 Jul 2022 to 13 Jul 2022

Publication series

Name: Proceedings of the 2022 USENIX Annual Technical Conference, ATC 2022

Conference

Conference: 2022 USENIX Annual Technical Conference, ATC 2022
Country/Territory: United States
City: Carlsbad
Period: 11/07/22 to 13/07/22
