
SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput

  • Wen Xia
  • Hong Jiang
  • Dan Feng
  • Yu Hua
  • Huazhong University of Science and Technology
  • University of Nebraska-Lincoln

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Data deduplication is becoming increasingly popular in storage systems as a space-efficient approach to data backup and archiving. Most existing state-of-the-art deduplication methods are either locality based or similarity based, which, according to our analysis, do not work adequately in many situations. While the former produces poor deduplication throughput when there is little or no locality in datasets, the latter can fail to identify and thus remove significant amounts of redundant data when there is a lack of similarity among files. In this paper, we present SiLo, a near-exact deduplication system that effectively and complementarily exploits similarity and locality to achieve high duplicate elimination and throughput at extremely low RAM overheads. The main idea behind SiLo is to expose and exploit more similarity by grouping strongly correlated small files into a segment and segmenting large files, and to leverage locality in the backup stream by grouping contiguous segments into blocks to capture similar and duplicate data missed by the probabilistic similarity detection. By judiciously enhancing similarity through the exploitation of locality and vice versa, the SiLo approach is able to significantly reduce RAM usage for index-lookup and maintain a very high deduplication throughput. Our experimental evaluation of SiLo based on real-world datasets shows that the SiLo system consistently and significantly outperforms two existing state-of-the-art systems, one based on similarity and the other based on locality, under various workload conditions.
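
The interplay of similarity and locality described in the abstract can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the authors' implementation: the class name `ToySiLo`, the fixed-size chunking, the SHA-1 fingerprints, the choice of the minimum fingerprint as a segment's representative, and all size constants are hypothetical choices made for illustration. Only a small index of representative fingerprints is treated as RAM-resident; the per-block fingerprint sets stand in for data that would live on disk.

```python
import hashlib

CHUNK_SIZE = 4       # bytes per chunk (toy value, assumption)
SEGMENT_CHUNKS = 4   # chunk fingerprints per segment (toy value, assumption)
BLOCK_SEGMENTS = 2   # contiguous segments per block (toy value, assumption)


def fingerprints(data):
    """Fingerprint fixed-size chunks with SHA-1 (the chunking method is an assumption)."""
    return [hashlib.sha1(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


def split(items, size):
    """Cut a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


class ToySiLo:
    """Hypothetical toy model; not the authors' implementation."""

    def __init__(self):
        self.similarity_index = {}  # representative fingerprint -> block id (kept in RAM)
        self.blocks = []            # block id -> set of chunk fingerprints (stands in for disk)
        self.stored = 0             # chunks written as new data
        self.duplicates = 0         # chunks recognised as already stored

    def backup(self, stream):
        for block in split(split(fingerprints(stream), SEGMENT_CHUNKS), BLOCK_SEGMENTS):
            # Similarity: probe the small index with one representative
            # fingerprint per segment (the minimum, an assumed choice).
            cache = set()
            for segment in block:
                hit = self.similarity_index.get(min(segment))
                if hit is not None:
                    # Locality: load the whole block that held the similar segment.
                    cache |= self.blocks[hit]
            seen = set()
            for segment in block:
                for fp in segment:
                    if fp in cache or fp in seen:
                        self.duplicates += 1
                    else:
                        self.stored += 1
                    seen.add(fp)
            # Record the new block and index its segments' representatives.
            self.blocks.append(seen)
            for segment in block:
                self.similarity_index[min(segment)] = len(self.blocks) - 1


store = ToySiLo()
data = bytes(range(32))            # 8 distinct chunks, 2 segments, 1 block
store.backup(data)
store.backup(data)                 # identical second backup
store.backup(data[:28] + b"XXXX")  # only the last chunk changed
print(store.stored, store.duplicates)  # 9 15
```

In the third backup the first segment is unchanged, so its representative fingerprint hits the index. Loading the whole block then exposes the three unchanged chunks of the second segment, whether or not that segment's own representative still matches. A duplicate chunk arriving in a block whose segments match no indexed representative would go undetected, which is the sense in which this sketch is near-exact rather than exact.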

Original language: English
Title of host publication: Proceedings of the 2011 USENIX Annual Technical Conference, USENIX ATC 2011
Publisher: USENIX Association
Pages: 285-298
Number of pages: 14
ISBN (Electronic): 9781931971850
State: Published - 2011
Externally published: Yes
Event: 2011 USENIX Annual Technical Conference, USENIX ATC 2011 - Portland, United States
Duration: 15 Jun 2011 to 17 Jun 2011

Publication series

Name: Proceedings of the 2011 USENIX Annual Technical Conference, USENIX ATC 2011

Conference

Conference: 2011 USENIX Annual Technical Conference, USENIX ATC 2011
Country/Territory: United States
City: Portland
Period: 15/06/11 to 17/06/11
