Greedy Transfer Planning Search for Improving Repair Throughput of RDP-like Coded Storage Clusters

  • Juehao Chen
  • Shiyi Li*
  • Wen Xia
  • Shuaipeng Zhang
  • Qicong Lin
  • Haojun Hu
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen
  • Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies
  • Huazhong University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the increasing scale of data and user demand for low latency, large-scale clusters have become the norm. To ensure high data availability in such clusters, XOR-based erasure codes are widely used for fault tolerance because of their low storage and computational overhead. As clusters grow to hundreds or thousands of nodes, the probability of multiple concurrent node failures is no longer negligible. Such failures can have serious consequences, such as data loss, so the lost data should be recovered as quickly as possible. However, codes such as RDP and EVENODD can easily cause network congestion when repairing concurrent failures, which makes fast recovery challenging. To address this issue, we propose a novel network transfer plan search algorithm, Greedy Row-Diagonal Parity Search, or GRS for short. GRS optimally allocates the network traffic generated during the repair process by greedily utilizing idle bandwidth and leveraging the commutative property of XOR operations. This distributes traffic more evenly across the cluster network and improves repair throughput. We build a prototype in a distributed erasure-coded cluster and conduct an experimental evaluation. The results indicate that, compared to existing repair optimization methods, GRS improves repair throughput by 230%-880%.
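
The abstract's point about the commutative property of XOR can be illustrated with a small sketch. This is not the GRS algorithm from the paper; it only shows, under simplifying assumptions, why surviving blocks of a single XOR parity stripe can be combined in any order or grouping, which is what lets a transfer plan route partial results through whichever nodes have idle bandwidth. The function names (`xor_blocks`, `repair_direct`, `repair_via_relays`) and the relay grouping are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def repair_direct(surviving: list[bytes]) -> bytes:
    """Rebuild the lost block by XOR-ing all surviving blocks at one node."""
    return reduce(xor_blocks, surviving)


def repair_via_relays(surviving: list[bytes], groups: list[list[int]]) -> bytes:
    """Rebuild the lost block from partial XORs computed per relay group.

    Each inner list in `groups` holds indices into `surviving` that one
    (hypothetical) relay node combines locally before forwarding a single
    partial result. Every index must appear in exactly one group.
    """
    partials = [
        reduce(xor_blocks, (surviving[i] for i in group)) for group in groups
    ]
    return reduce(xor_blocks, partials)


if __name__ == "__main__":
    # Four data blocks and one row parity (assumed toy stripe, not RDP itself).
    data = [bytes([i * 17 + j for j in range(8)]) for i in range(4)]
    parity = reduce(xor_blocks, data)

    # Suppose data[2] is lost; the survivors are the other blocks plus parity.
    surviving = [data[0], data[1], data[3], parity]

    direct = repair_direct(surviving)
    relayed = repair_via_relays(surviving, [[0, 3], [1, 2]])
    print(direct == data[2], relayed == data[2])
```

Because the grouping does not change the result, a planner is free to pick the grouping that best balances link usage; how GRS actually searches for such a plan is described in the paper itself, not here.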

Original language: English
Title of host publication: 2024 IEEE/ACM 32nd International Symposium on Quality of Service, IWQoS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350350128
State: Published - 2024
Externally published: Yes
Event: 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024 - Guangzhou, China
Duration: 19 Jun 2024 - 21 Jun 2024

Publication series

Name: IEEE International Workshop on Quality of Service, IWQoS
ISSN (Print): 1548-615X

Conference

Conference: 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024
Country/Territory: China
City: Guangzhou
Period: 19/06/24 - 21/06/24

Keywords

  • Availability
  • Distributed systems
  • Erasure code
  • Network transfer
  • Storage cluster
