TY - GEN
T1 - CDAC
T2 - 35th Symposium on Mass Storage Systems and Technologies, MSST 2019
AU - Tan, Yujuan
AU - Xia, Wen
AU - Xie, Jing
AU - Xu, Congcong
AU - Yan, Zhichao
AU - Jiang, Hong
AU - Zhao, Yajun
AU - Fu, Min
AU - Chen, Xianzhang
AU - Liu, Duo
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - Data deduplication, as a proven technology for effective data reduction in backup and archive storage systems, also demonstrates the promise in increasing the logical space capacity of storage caches by removing redundant data. However, our in-depth evaluation of the existing deduplication-aware caching algorithms reveals that they do improve the hit ratios compared to the caching algorithms without deduplication, especially when the cache block size is set to 4KB. But when the block size is larger than 4KB, a clear trend for modern storage systems, their hit ratios are significantly reduced. A slight increase in hit ratios due to deduplication may not be able to improve the overall storage performance because of the high overhead created by deduplication. To address this problem, in this paper we propose CDAC, a Content-driven Deduplication-Aware Cache, which focuses on exploiting the blocks' content redundancy and their intensity of content sharing among source addresses in cache management strategies. We have implemented CDAC based on LRU and ARC algorithms, called CDAC-LRU and CDAC-ARC respectively. Our extensive experimental results show that CDAC-LRU and CDAC-ARC outperform the state-of-the-art deduplication-aware caching algorithms, D-LRU and DARC, by up to 19.49X in read cache hit ratio, with an average of 1.95X under real-world traces when the cache size ranges from 20% to 80% of the working set size and the block size ranges from 4KB to 64KB.
AB - Data deduplication, as a proven technology for effective data reduction in backup and archive storage systems, also demonstrates the promise in increasing the logical space capacity of storage caches by removing redundant data. However, our in-depth evaluation of the existing deduplication-aware caching algorithms reveals that they do improve the hit ratios compared to the caching algorithms without deduplication, especially when the cache block size is set to 4KB. But when the block size is larger than 4KB, a clear trend for modern storage systems, their hit ratios are significantly reduced. A slight increase in hit ratios due to deduplication may not be able to improve the overall storage performance because of the high overhead created by deduplication. To address this problem, in this paper we propose CDAC, a Content-driven Deduplication-Aware Cache, which focuses on exploiting the blocks' content redundancy and their intensity of content sharing among source addresses in cache management strategies. We have implemented CDAC based on LRU and ARC algorithms, called CDAC-LRU and CDAC-ARC respectively. Our extensive experimental results show that CDAC-LRU and CDAC-ARC outperform the state-of-the-art deduplication-aware caching algorithms, D-LRU and DARC, by up to 19.49X in read cache hit ratio, with an average of 1.95X under real-world traces when the cache size ranges from 20% to 80% of the working set size and the block size ranges from 4KB to 64KB.
KW - Content
KW - Data deduplication
KW - SSD Cache
UR - https://www.scopus.com/pages/publications/85074982950
U2 - 10.1109/MSST.2019.00008
DO - 10.1109/MSST.2019.00008
M3 - Conference contribution
AN - SCOPUS:85074982950
T3 - IEEE Symposium on Mass Storage Systems and Technologies
SP - 282
EP - 291
BT - Proceedings - 2019 35th Symposium on Mass Storage Systems and Technologies, MSST 2019
PB - IEEE Computer Society
Y2 - 20 May 2019 through 24 May 2019
ER -