TY - GEN
T1 - Reducing Chunk Fragmentation for In-Line Delta Compressed and Deduplicated Backup Systems
AU - Zhang, Yucheng
AU - Feng, Dan
AU - Hua, Yu
AU - Hu, Yuchong
AU - Xia, Wen
AU - Fu, Min
AU - Tang, Xiaolan
AU - Wang, Zhikun
AU - Huang, Fangting
AU - Zhou, Yukun
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/9/6
Y1 - 2017/9/6
N2 - Chunk-level deduplication, while robust in removing duplicate chunks, introduces chunk fragmentation which decreases restore performance. Rewriting algorithms are proposed to reduce the chunk fragmentation and accelerate the restore speed. Delta compression can remove redundant data between non-duplicate but similar chunks which cannot be eliminated by chunk-level deduplication. Some applications use delta compression as a complement for chunk-level deduplication to attain extra space and bandwidth savings. However, we observe that delta compression introduces a new type of chunk fragmentation stemming from delta compressed chunks whose base chunks are fragmented. We refer to such delta compressed chunks as base-fragmented chunks. We found that this new type of chunk fragmentation has a more severely impact on the restore performance than the chunk fragmentation introduced by chunk-level deduplication and cannot be reduced by existing rewriting algorithms. In order to address the problem due to the base-fragmented chunks, we propose SDC, a scheme that selectively performs delta compression after chunk-level deduplication. The main idea behind SDC is to simulate a restore cache to identify the non-base-fragmented chunks and only perform delta compression for these chunks, thus avoiding the new type of chunk fragmentation. Due to the locality among the backup streams, most of the non-base-fragmented chunks can be detected by the simulated restore cache. Experimental results based on real-world datasets show that SDC improves the restore performance of the delta compressed and deduplicated backup system by 1.93X-7.48X, and achieves 95.5%-97.4% of its compression, while imposing negligible impact on the backup throughput.
AB - Chunk-level deduplication, while robust in removing duplicate chunks, introduces chunk fragmentation which decreases restore performance. Rewriting algorithms are proposed to reduce the chunk fragmentation and accelerate the restore speed. Delta compression can remove redundant data between non-duplicate but similar chunks which cannot be eliminated by chunk-level deduplication. Some applications use delta compression as a complement for chunk-level deduplication to attain extra space and bandwidth savings. However, we observe that delta compression introduces a new type of chunk fragmentation stemming from delta compressed chunks whose base chunks are fragmented. We refer to such delta compressed chunks as base-fragmented chunks. We found that this new type of chunk fragmentation has a more severely impact on the restore performance than the chunk fragmentation introduced by chunk-level deduplication and cannot be reduced by existing rewriting algorithms. In order to address the problem due to the base-fragmented chunks, we propose SDC, a scheme that selectively performs delta compression after chunk-level deduplication. The main idea behind SDC is to simulate a restore cache to identify the non-base-fragmented chunks and only perform delta compression for these chunks, thus avoiding the new type of chunk fragmentation. Due to the locality among the backup streams, most of the non-base-fragmented chunks can be detected by the simulated restore cache. Experimental results based on real-world datasets show that SDC improves the restore performance of the delta compressed and deduplicated backup system by 1.93X-7.48X, and achieves 95.5%-97.4% of its compression, while imposing negligible impact on the backup throughput.
UR - https://www.scopus.com/pages/publications/85031735835
U2 - 10.1109/NAS.2017.8026874
DO - 10.1109/NAS.2017.8026874
M3 - 会议稿件
AN - SCOPUS:85031735835
T3 - 2017 IEEE International Conference on Networking, Architecture, and Storage, NAS 2017 - Proceedings
BT - 2017 IEEE International Conference on Networking, Architecture, and Storage, NAS 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Networking, Architecture, and Storage, NAS 2017
Y2 - 7 August 2017 through 9 August 2017
ER -