TY - GEN
T1 - Building a High-performance Fine-grained Deduplication Framework for Backup Storage with High Deduplication Ratio
AU - Zou, Xiangyu
AU - Xia, Wen
AU - Shilane, Philip
AU - Zhang, Haijun
AU - Wang, Xuan
N1 - Publisher Copyright:
© 2022 USENIX Annual Technical Conference, ATC 2022. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Fine-grained deduplication, which first removes identical chunks and then eliminates redundancies between similar but non-identical chunks (i.e., delta compression), could exploit workloads' compressibility to achieve a very high deduplication ratio but suffers from poor backup/restore performance. This makes it not as popular as chunk-level deduplication thus far. This is because allowing workloads to share more references among similar chunks further reduces spatial/temporal locality, causes more I/O overhead, and leads to worse backup/restore performance. In this paper, we address issues for different forms of poor locality with several techniques, and propose MeGA, which achieves backup and restore speed close to chunk-level deduplication while preserving fine-grained deduplication's significant deduplication ratio advantage. Specifically, MeGA applies (1) a backup-workflow-oriented delta selector to address poor locality when reading base chunks, and (2) a delta-friendly data layout and "Always-Forward-Reference" traversing in the restore workflow to deal with the poor spatial/temporal locality of deduplicated data. Evaluations on four datasets show that MeGA achieves better performance than other fine-grained deduplication approaches. In particular, compared with the traditional greedy approach, MeGA achieves a 4.47-34.45× higher backup performance and a 30-105× higher restore performance while maintaining a very high deduplication ratio.
AB - Fine-grained deduplication, which first removes identical chunks and then eliminates redundancies between similar but non-identical chunks (i.e., delta compression), could exploit workloads' compressibility to achieve a very high deduplication ratio but suffers from poor backup/restore performance. This makes it not as popular as chunk-level deduplication thus far. This is because allowing workloads to share more references among similar chunks further reduces spatial/temporal locality, causes more I/O overhead, and leads to worse backup/restore performance. In this paper, we address issues for different forms of poor locality with several techniques, and propose MeGA, which achieves backup and restore speed close to chunk-level deduplication while preserving fine-grained deduplication's significant deduplication ratio advantage. Specifically, MeGA applies (1) a backup-workflow-oriented delta selector to address poor locality when reading base chunks, and (2) a delta-friendly data layout and "Always-Forward-Reference" traversing in the restore workflow to deal with the poor spatial/temporal locality of deduplicated data. Evaluations on four datasets show that MeGA achieves better performance than other fine-grained deduplication approaches. In particular, compared with the traditional greedy approach, MeGA achieves a 4.47-34.45× higher backup performance and a 30-105× higher restore performance while maintaining a very high deduplication ratio.
UR - https://www.scopus.com/pages/publications/85140993239
M3 - Conference contribution
AN - SCOPUS:85140993239
T3 - Proceedings of the 2022 USENIX Annual Technical Conference, ATC 2022
SP - 19
EP - 35
BT - Proceedings of the 2022 USENIX Annual Technical Conference, ATC 2022
PB - USENIX Association
T2 - 2022 USENIX Annual Technical Conference, ATC 2022
Y2 - 11 July 2022 through 13 July 2022
ER -