TY - GEN
T1 - Design tradeoffs for data deduplication performance in backup workloads
AU - Fu, Min
AU - Feng, Dan
AU - Hua, Yu
AU - He, Xubin
AU - Chen, Zuoning
AU - Xia, Wen
AU - Zhang, Yucheng
AU - Tan, Yujuan
PY - 2015
Y1 - 2015
N2 - Data deduplication has become a standard component in modern backup systems. In order to understand the fundamental tradeoffs in each of its design choices (such as prefetching and sampling), we disassemble data deduplication into a large N-dimensional parameter space. Each point in the space is of various parameter settings, and performs a tradeoff among backup and restore performance, memory footprint, and storage cost. Existing and potential solutions can be considered as specific points in the space. Then, we propose a general-purpose framework to evaluate various deduplication solutions in the space. Given that no single solution is perfect in all metrics, our goal is to find some reasonable solutions that have sustained backup performance and perform a suitable tradeoff between deduplication ratio, memory footprints, and restore performance. Our findings from extensive experiments using real-world workloads provide a detailed guide to make efficient design decisions according to the desired tradeoff.
UR - https://www.scopus.com/pages/publications/85077062946
M3 - Conference contribution
AN - SCOPUS:85077062946
T3 - Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST 2015
SP - 331
EP - 344
BT - Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST 2015
PB - USENIX Association
T2 - 13th USENIX Conference on File and Storage Technologies, FAST 2015
Y2 - 16 February 2015 through 19 February 2015
ER -