TY - GEN
T1 - PSG-Codes
T2 - 34th IEEE International Symposium on Reliable Distributed Systems, SRDS 2015
AU - Li, Shiyi
AU - Qiang, Cao
AU - Tian, Lei
AU - Wan, Shenggang
AU - Qian, Lu
AU - Xie, Changsheng
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/1/4
Y1 - 2016/1/4
N2 - As hard disk failure rates have rarely improved and the reconstruction time for TB-level disks typically amounts to days, multiple concurrent disk/storage node failures in datacenter storage systems have become common and frequent. As a result, the erasure coding schemes used in datacenters must meet the critical requirements of high fault tolerance, high storage efficiency, and fast fault recovery. In this paper, we introduce a new XOR-based non-MDS erasure code family, called PSG-Codes, capable of tolerating up to 12 disk/node failures. The basic idea behind PSG-Codes is to partition disks into groups and exploit short parity chains to generate parity units. The parity chain is then further shortened by varying the number of parity elements for each strip. We conduct a simulation-based study to search the configuration parameter space of PSG-Codes, and prove that PSG-Codes can tolerate up to 12 disk/node failures. Compared with a well-known XOR-based non-MDS code, WEAVER codes, PSG-Codes have higher storage efficiency and lower reconstruction cost. Moreover, the storage efficiency and performance of PSG-Codes are also competitive with another state-of-the-art family of GF-based non-MDS codes, LRC codes.
AB - As hard disk failure rates have rarely improved and the reconstruction time for TB-level disks typically amounts to days, multiple concurrent disk/storage node failures in datacenter storage systems have become common and frequent. As a result, the erasure coding schemes used in datacenters must meet the critical requirements of high fault tolerance, high storage efficiency, and fast fault recovery. In this paper, we introduce a new XOR-based non-MDS erasure code family, called PSG-Codes, capable of tolerating up to 12 disk/node failures. The basic idea behind PSG-Codes is to partition disks into groups and exploit short parity chains to generate parity units. The parity chain is then further shortened by varying the number of parity elements for each strip. We conduct a simulation-based study to search the configuration parameter space of PSG-Codes, and prove that PSG-Codes can tolerate up to 12 disk/node failures. Compared with a well-known XOR-based non-MDS code, WEAVER codes, PSG-Codes have higher storage efficiency and lower reconstruction cost. Moreover, the storage efficiency and performance of PSG-Codes are also competitive with another state-of-the-art family of GF-based non-MDS codes, LRC codes.
KW - Storage systems
KW - erasure codes
KW - fast recovery
KW - fault tolerance
KW - reliability
UR - https://www.scopus.com/pages/publications/84960965854
U2 - 10.1109/SRDS.2015.39
DO - 10.1109/SRDS.2015.39
M3 - Conference contribution
AN - SCOPUS:84960965854
T3 - Proceedings of the IEEE Symposium on Reliable Distributed Systems
SP - 47
EP - 57
BT - Proceedings - 2015 IEEE 34th Symposium on Reliable Distributed Systems Workshops, SRDSW 2015
PB - IEEE Computer Society
Y2 - 28 September 2015 through 1 October 2015
ER -