
PSG-Codes: An Erasure Codes Family with High Fault Tolerance and Fast Recovery

  • Shiyi Li
  • Cao Qiang*
  • Lei Tian
  • Shenggang Wan
  • Lu Qian
  • Changsheng Xie

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Because hard disk failure rates have improved little and reconstructing a TB-level disk typically takes days, multiple concurrent disk or storage-node failures have become common in datacenter storage systems. As a result, the erasure coding schemes used in datacenters must meet the critical requirements of high fault tolerance, high storage efficiency, and fast fault recovery. In this paper, we introduce PSG-Codes, a new family of XOR-based non-MDS erasure codes that can tolerate up to 12 disk/node failures. The basic idea behind PSG-Codes is to partition disks into groups and exploit short parity chains to generate parity units. The parity chains are then further shortened by varying the number of parity elements in each strip. We conduct a simulation-based study to search the configuration parameter space of PSG-Codes and prove that PSG-Codes can tolerate up to 12 disk/node failures. Compared with WEAVER codes, a well-known family of XOR-based non-MDS codes, PSG-Codes have higher storage efficiency and lower reconstruction cost. Moreover, the storage efficiency and performance of PSG-Codes are competitive with those of LRC codes, a state-of-the-art family of GF-based non-MDS codes.
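The link the abstract draws between short parity chains and fast recovery can be illustrated with a minimal sketch. This is not the PSG-Codes construction, which the abstract does not specify. It is a generic grouped XOR parity scheme that tolerates only one lost block per group. The names `xor_blocks`, `encode_groups`, `recover_block`, and `group_size` are hypothetical and chosen for illustration only.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_groups(data: List[bytes], group_size: int) -> List[bytes]:
    """Return one XOR parity block per group of `group_size` data blocks.

    Each parity chain covers only its own group, so a smaller
    `group_size` means a shorter chain (and more parity blocks).
    """
    return [
        xor_blocks(data[i:i + group_size])
        for i in range(0, len(data), group_size)
    ]


def recover_block(data: List[bytes], parity: List[bytes],
                  group_size: int, lost: int) -> bytes:
    """Rebuild data block `lost` from the survivors of its group only."""
    group = lost // group_size
    start = group * group_size
    end = min(start + group_size, len(data))
    survivors = [data[i] for i in range(start, end) if i != lost]
    # Recovery reads (group size - 1) data blocks plus one parity block,
    # regardless of how many blocks the whole array holds.
    return xor_blocks(survivors + [parity[group]])


if __name__ == "__main__":
    blocks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]
    parities = encode_groups(blocks, group_size=2)
    rebuilt = recover_block(blocks, parities, group_size=2, lost=2)
    print(rebuilt == blocks[2])
```

In this sketch, halving `group_size` halves the number of blocks read during a rebuild at the cost of storing more parity, which is the general efficiency-versus-recovery trade-off the abstract refers to.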

Original language: English
Title of host publication: Proceedings - 2015 IEEE 34th Symposium on Reliable Distributed Systems Workshops, SRDSW 2015
Publisher: IEEE Computer Society
Pages: 47-57
Number of pages: 11
ISBN (Electronic): 9781467393027, 9781509000920
State: Published - 4 Jan 2016
Externally published: Yes
Event: 34th IEEE International Symposium on Reliable Distributed Systems, SRDS 2015 - Montreal, Canada
Duration: 28 Sep 2015 to 1 Oct 2015

Publication series

Name: Proceedings of the IEEE Symposium on Reliable Distributed Systems
Volume: 2016-January
ISSN (Print): 1060-9857

Conference

Conference: 34th IEEE International Symposium on Reliable Distributed Systems, SRDS 2015
Country/Territory: Canada
City: Montreal
Period: 28/09/15 to 1/10/15

Keywords

  • Storage systems
  • erasure codes
  • fast recovery
  • fault tolerance
  • reliability
