P-Dedupe: Exploiting parallelism in data deduplication system

  • Wen Xia
  • Hong Jiang
  • Dan Feng*
  • Lei Tian
  • Min Fu
  • Zhongtao Wang

*Corresponding author for this work

  • Huazhong University of Science and Technology
  • University of Nebraska-Lincoln

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Data deduplication, an efficient space reduction method, has gained increasing attention and popularity in data-intensive storage systems. Most existing state-of-the-art deduplication methods remove redundant data at either the file level or the chunk level, which incurs unavoidable and significant overheads in time (due to chunking and fingerprinting). These overheads can degrade the write performance to an unacceptable level in a data storage system. In this paper, we propose P-Dedupe, a fast and scalable deduplication system. The main idea behind P-Dedupe is to fully compose pipelined and parallel computations of data deduplication by effectively exploiting the idle resources of modern computer systems with multi-core and many-core processor architectures. Our experimental evaluation of the P-Dedupe prototype based on real-world datasets shows that P-Dedupe speeds up the deduplication write throughput by a factor of 2∼4 through pipelining deduplication and parallelizing hash calculation, and achieves 80%∼250% of the performance of a conventional storage system without data deduplication.
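As a rough illustration of the idea described in the abstract (and not the authors' implementation), the sketch below splits a buffer into chunks, computes chunk fingerprints in parallel on a thread pool, and then performs the index lookup in order. Fixed-size chunking, SHA-1, the 4 KiB chunk size, and the names `chunk_fixed`, `fingerprint`, `dedupe`, and `restore` are all assumptions made for this example; the abstract does not specify any of them.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def chunk_fixed(data: bytes, size: int = 4096):
    """Stage 1 (assumed): split the input into fixed-size chunks."""
    for off in range(0, len(data), size):
        yield data[off:off + size]


def fingerprint(chunk: bytes) -> bytes:
    """Stage 2: hash one chunk. SHA-1 is an illustrative choice."""
    return hashlib.sha1(chunk).digest()


def dedupe(data: bytes, size: int = 4096, workers: int = 4):
    """Return (index, recipe).

    index maps fingerprint -> unique chunk; recipe lists the
    fingerprints in stream order so the data can be rebuilt.
    """
    index = {}
    recipe = []
    # Simplification: chunks are materialized up front rather than
    # streamed between stages as a real pipeline would do.
    chunks = list(chunk_fixed(data, size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Hashing runs on worker threads; map() yields results in
        # input order, so the lookup stage below stays sequential.
        for chunk, fp in zip(chunks, pool.map(fingerprint, chunks)):
            if fp not in index:
                index[fp] = chunk  # Stage 3: store only unseen chunks
            recipe.append(fp)
    return index, recipe


def restore(index, recipe) -> bytes:
    """Rebuild the original bytes from the index and the recipe."""
    return b"".join(index[fp] for fp in recipe)
```

Threads are used here because CPython's `hashlib` releases the global interpreter lock while hashing larger buffers, so chunk hashing can overlap across cores; a production system would more likely stream chunks between stages through bounded queues.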

Original language: English
Title of host publication: Proceedings - 2012 IEEE 7th International Conference on Networking, Architecture and Storage, NAS 2012
Pages: 338-347
Number of pages: 10
State: Published - 2012
Externally published: Yes
Event: 2012 IEEE 7th International Conference on Networking, Architecture and Storage, NAS 2012 - Xiamen, Fujian, China
Duration: 28 Jun 2012 to 30 Jun 2012

Publication series

Name: Proceedings - 2012 IEEE 7th International Conference on Networking, Architecture and Storage, NAS 2012

Conference

Conference: 2012 IEEE 7th International Conference on Networking, Architecture and Storage, NAS 2012
Country/Territory: China
City: Xiamen, Fujian
Period: 28/06/12 to 30/06/12
