A Comprehensive Study of the Past, Present, and Future of Data Deduplication

  • Wen Xia
  • Hong Jiang
  • Dan Feng
  • Fred Douglis
  • Philip Shilane
  • Yu Hua
  • Min Fu
  • Yucheng Zhang
  • Yukun Zhou
  • Huazhong University of Science and Technology
  • University of Texas at Arlington
  • Dell

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Data deduplication, an efficient approach to data reduction, has gained increasing attention and popularity in large-scale storage systems due to the explosive growth of digital data. It eliminates redundant data at the file or subfile level and identifies duplicate content by its cryptographically secure hash signature (i.e., collision-resistant fingerprint), which is shown to be much more computationally efficient than the traditional compression approaches in large-scale storage systems. In this paper, we first review the background and key features of data deduplication, then summarize and classify the state-of-the-art research in data deduplication according to the key workflow of the data deduplication process. The summary and taxonomy of the state of the art on deduplication help identify and understand the most important design considerations for data deduplication systems. In addition, we discuss the main applications and industry trend of data deduplication, and provide a list of the publicly available sources for deduplication research and studies. Finally, we outline the open problems and future research directions facing deduplication-based storage systems.
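
The fingerprint-based duplicate detection described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or any particular system surveyed in the paper. It assumes fixed-size chunking, SHA-256 as the collision-resistant fingerprint, and an in-memory dictionary as the fingerprint index; the names `deduplicate`, `restore`, and `chunk_size` are hypothetical and chosen only for illustration.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Assumption: fixed-size chunking and SHA-256 fingerprints are used here
    purely for illustration.
    """
    store = {}   # fingerprint -> chunk bytes (unique chunks only)
    recipe = []  # ordered fingerprints needed to rebuild the input
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            # First time this content is seen: store the chunk itself.
            store[fingerprint] = chunk
        # Duplicates cost only a fingerprint reference in the recipe.
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


if __name__ == "__main__":
    # Three identical chunks followed by one different chunk.
    data = b"A" * 4096 * 3 + b"B" * 4096
    store, recipe = deduplicate(data)
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
    assert restore(store, recipe) == data
```

Comparing short fingerprints instead of chunk contents is what keeps duplicate detection cheap in this sketch: each chunk is hashed once, and every later match is a dictionary lookup.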

Original language: English
Article number: 7529062
Pages (from-to): 1681-1710
Number of pages: 30
Journal: Proceedings of the IEEE
Volume: 104
Issue number: 9
State: Published - Sep 2016
Externally published: Yes

Keywords

  • Data compression
  • data deduplication
  • data reduction
  • delta compression
  • storage security
  • storage systems
