
Design tradeoffs for data deduplication performance in backup workloads

  • Min Fu
  • Dan Feng*
  • Yu Hua
  • Xubin He
  • Zuoning Chen
  • Wen Xia
  • Yucheng Zhang
  • Yujuan Tan

*Corresponding author for this work

  • Huazhong University of Science and Technology
  • Virginia Commonwealth University
  • National Engineering Research Center for Parallel Computer
  • Chongqing University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Data deduplication has become a standard component in modern backup systems. To understand the fundamental tradeoffs in each of its design choices (such as prefetching and sampling), we disassemble data deduplication into a large N-dimensional parameter space. Each point in the space corresponds to a particular combination of parameter settings and strikes a tradeoff among backup performance, restore performance, memory footprint, and storage cost. Existing and potential solutions can be considered as specific points in the space. We then propose a general-purpose framework to evaluate various deduplication solutions in the space. Given that no single solution is perfect in all metrics, our goal is to find reasonable solutions that sustain backup performance and strike a suitable tradeoff among deduplication ratio, memory footprint, and restore performance. Our findings from extensive experiments using real-world workloads provide a detailed guide for making efficient design decisions according to the desired tradeoff.

Original language: English
Title of host publication: Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST 2015
Publisher: USENIX Association
Pages: 331-344
Number of pages: 14
ISBN (Electronic): 9781931971201
State: Published - 2015
Externally published: Yes
Event: 13th USENIX Conference on File and Storage Technologies, FAST 2015 - Santa Clara, United States
Duration: 16 Feb 2015 to 19 Feb 2015

Publication series

Name: Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST 2015

Conference

Conference: 13th USENIX Conference on File and Storage Technologies, FAST 2015
Country/Territory: United States
City: Santa Clara
Period: 16/02/15 to 19/02/15
