Copyright 2014 FUJITSU LABORATORIES LIMITED 
Erasure Code with Shingled Local Parity Groups for Efficient Recovery from Multiple Disk Failures 
Takeshi Miyamae, Takanori Nakao, Kensuke Shiozawa 
Fujitsu Laboratories Ltd. 
October 5th, 2014 (HotDep’14) 
Contents 
1. Background and Our Proposal 
2. SHEC's Theoretical Analysis 
3. SHEC's Experimental Evaluation 
4. Summary 
1. Background and Our Proposal 
Background (1) 
Erasure codes for content data 
Content data for ICT services is ever-growing 
Demand for higher space efficiency and durability 
Reed Solomon code (de facto erasure code) improves both 
Figure: (old style) Triple Replication stores content data plus two copies (3x space); Reed Solomon Code stores content data plus parities (1.5x space) 
However, Reed Solomon code is not so recovery-efficient 
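The 3x and 1.5x space figures follow from simple chunk counting. A minimal sketch, assuming that a scheme with k data chunks and m redundant chunks stores (k + m) / k times the content size. The function name is hypothetical, and RS(4,2) is only one illustrative configuration that yields 1.5x; the slide does not say which parameters its figure uses.

```python
from fractions import Fraction

def storage_overhead(data_chunks: int, redundant_chunks: int) -> Fraction:
    """Stored size divided by content size (hypothetical helper)."""
    return Fraction(data_chunks + redundant_chunks, data_chunks)

# Triple replication: one original plus two copies.
print(float(storage_overhead(1, 2)))  # 3.0
# Reed Solomon with 4 data chunks and 2 parity chunks (illustrative choice).
print(float(storage_overhead(4, 2)))  # 1.5
```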
Background (2) 
Local parity improves recovery efficiency 
Data recovery should be as efficient as possible 
•in order to avoid multiple disk failures and data loss 
Reed Solomon code is improved by local parity methods 
•data read from disks is reduced during recovery 
Figure: data read from disks during recovery, comparing Reed Solomon Code (data chunks and parity chunks, no local parities) with a local parity method (local parities added) 
However, multiple disk failures are not taken into consideration 
Our Goal 
Local parity method for multiple disk failures 
Existing methods are optimized for a single disk failure 
•e.g. Microsoft MS-LRC, Facebook Xorbas 
However, their recovery overhead is large in case of multiple disk failures 
•because they may have to use global parities for recovery 
Figure: a local parity method under multiple disk failures 
Our goal is a method that handles multiple disk failures efficiently 
Our Proposed Method (SHEC) 
SHEC (= Shingled Erasure Code) 
An erasure code with local parity groups only 
•to improve recovery efficiency in case of multiple disk failures 
The calculation ranges of local parities are shifted and partly overlap with each other (like the shingles on a roof) 
•to keep enough durability 
Parameters of the example layout: 
k : number of data chunks (=10) 
m : number of parity chunks (=6) 
l : calculation range of each parity, in data chunks (=5) 
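The shingled layout for k = 10, m = 6, l = 5 can be sketched as follows. The figure is not recoverable from the slide text, so the placement rule is an assumption: parity i starts at round(i * k / m) and covers l consecutive data chunks, wrapping around after the last one. The rule was chosen because it agrees with two facts stated in the deck (the ranges of P3/P4 include D6/D9, and D1 is covered by P1, P5 and P6). The actual SHEC layout may differ, and `shec_ranges` is a hypothetical name.

```python
def shec_ranges(k: int, m: int, l: int) -> list[list[int]]:
    """Return, for each parity, the 0-based data chunk indices it covers.

    Assumption (not taken from the slides): parity i starts at
    round(i * k / m) and covers l consecutive chunks, wrapping around.
    """
    ranges = []
    for i in range(m):
        start = (2 * i * k + m) // (2 * m)  # round(i * k / m), halves rounded up
        ranges.append([(start + j) % k for j in range(l)])
    return ranges

# Print the assumed layout with the 1-based names used on the slides.
for p, covered in enumerate(shec_ranges(10, 6, 5), start=1):
    print(f"P{p}: " + " ".join(f"D{d + 1}" for d in covered))
```

Each printed row is one "shingle": a window of five data chunks that overlaps its neighbours.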
2. SHEC's Theoretical Analysis 
Erasure Code’s Properties 
We picked three properties of erasure codes for SHEC’s theoretical analysis 
•Space Efficiency: the ratio of user data to total stored data 
•Durability: Probability of Data Loss (PDL) 
•Recovery Efficiency: the ratio of data read during recovery 
Three-Way Trade-Off 
The three properties are in a three-way trade-off relationship 
SHEC’s Recovery Efficiency 
High Recovery Efficiency from Multiple Disk Failures 
The amount of data read from disks is minimized 
•(e.g.) When D6/D9 fail, SHEC selects P3/P4 for recovery, because their calculation ranges form a minimum union that includes D6/D9; chunks outside that union need not be read 
Recovery efficiency is one of the biggest features of SHEC 
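The "minimum union of calculation ranges" idea can be illustrated with a brute-force search. This is a sketch under two assumptions, not the authors' algorithm: the layout is the hypothetical one in which parity i starts at round(i * k / m), and a set of failed data chunks counts as recoverable when each failed chunk can be paired with a distinct selected parity that covers it. Only data chunk failures are handled, and all function names are illustrative.

```python
from itertools import combinations, permutations

def shec_ranges(k, m, l):
    """Hypothetical layout: parity i covers l chunks from round(i*k/m), wrapping."""
    return [{((2 * i * k + m) // (2 * m) + j) % k for j in range(l)}
            for i in range(m)]

def cheapest_recovery(ranges, failed):
    """Pick len(failed) parities whose union of ranges is smallest.

    Assumption: failed data chunks are solvable when each one can be
    paired with a distinct chosen parity whose range contains it.
    """
    failed = sorted(failed)
    best = None
    for chosen in combinations(range(len(ranges)), len(failed)):
        solvable = any(all(d in ranges[p] for d, p in zip(failed, order))
                       for order in permutations(chosen))
        if not solvable:
            continue
        # Surviving data chunks that must be read to use the chosen parities.
        reads = set().union(*(ranges[p] for p in chosen)) - set(failed)
        if best is None or len(reads) < len(best[1]):
            best = (chosen, reads)
    return best

ranges = shec_ranges(10, 6, 5)
chosen, reads = cheapest_recovery(ranges, {5, 8})  # D6 and D9, 0-based
print("parities:", [f"P{p + 1}" for p in chosen])
print("data chunks read:", [f"D{d + 1}" for d in sorted(reads)])
```

Under these assumptions the search selects P3/P4 and reads five data chunks plus the two parities, fewer than the k = 10 chunks a Reed Solomon decode needs. In this assumed layout P4/P5 would read equally many chunks; the sketch simply keeps the first minimum it finds.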
Comparison with Other Methods 
SHEC is expected to recover more efficiently than the other methods in case of multiple disk failures 
Other methods: Reed Solomon, MS-LRC and Xorbas 
SHEC’s Durability 
Durability Estimator (= ml/k) 
Indicates how many disk failures can be tolerated 
Therefore, ml/k + 1 disk failures can cause data loss 
•(e.g.) The durability estimator of SHEC(10,6,5) is three (k = 10, m = 6, l = 5, so ml/k = 3). Therefore, the four failures D1/P1/P5/P6 cause data loss, because D1 cannot be recovered from the remaining chunks 
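The estimator and the D1/P1/P5/P6 example can be checked with a few lines. This is a sketch that reuses an assumed layout rule (parity i starts at round(i * k / m) and wraps around); the rule is hypothetical and was picked only because it reproduces the coverage of D1 described in the example. With m parities each covering l of the k data chunks, every data chunk is covered by ml/k parities on average.

```python
def shec_ranges(k, m, l):
    """Hypothetical layout: parity i covers l chunks from round(i*k/m), wrapping."""
    return [{((2 * i * k + m) // (2 * m) + j) % k for j in range(l)}
            for i in range(m)]

k, m, l = 10, 6, 5
ranges = shec_ranges(k, m, l)

estimator = m * l / k
print("durability estimator:", estimator)

# How many parities cover each data chunk in the assumed layout.
coverage = [sum(d in r for r in ranges) for d in range(k)]
print("parities per data chunk:", coverage)

# Parities (1-based) whose range contains D1, and those left after
# P1, P5 and P6 fail together with D1.
covering_d1 = [p + 1 for p, r in enumerate(ranges) if 0 in r]
survivors = [p for p in covering_d1 if p not in {1, 5, 6}]
print("parities covering D1:", covering_d1)
print("surviving parities covering D1:", survivors)
```

An empty survivor list means no remaining parity includes D1 in its calculation range, which is why this particular set of four failures loses data.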
Property Map of Reed Solomon code 
Upper area becomes sparse 
Reed Solomon code has few recovery-efficient layouts 
Figure: property map of layouts by Recovery Efficiency, Space Efficiency and Durability (PDL, scale from 1e-0 to 1e-44); RAID6 = RS(4,2) is marked, and the upper area is sparse 
Property Map of SHEC 
Upper area is filled with SHEC-specific layouts 
SHEC provides many recovery-efficient layouts 
SHEC is more adjustable than Reed Solomon code 
Figure: property map of layouts by Recovery Efficiency, Space Efficiency and Durability (PDL, scale from 1e-0 to 1e-44); RAID6 = RS(4,2) and SHEC(6,5,2) are marked, and the upper area is dense 
Comparison with MS-LRC (1) 
Single disk failure case 
MS-LRC is plotted farther from the origin (= superior) 
SHEC is plotted in a broader area (= more flexible) 
Figure: two panels, SHEC and MS-LRC emulation (conditions: 16 OSDs), each plotting Space Efficiency, Recovery Efficiency and durability 
Comparison with MS-LRC (2) 
Double disk failures case 
Both are plotted at the same distance from the origin 
SHEC is plotted in a broader area (= more flexible) 
Figure: two panels, MS-LRC emulation and SHEC (conditions: 16 OSDs), each plotting Space Efficiency, Recovery Efficiency and durability 
3. SHEC's Experimental Evaluation 
SHEC’s Implementation on Ceph 
SHEC is implemented as an erasure code plugin for Ceph, an open source scalable object storage system 
•4MB objects are split into data/parity chunks, which are distributed over OSDs 
•The encode/decode logic (SHEC plugin) is separated from the main part of Ceph Storage 
Experiment of Recovery Efficiency 
Experiment Abstract 
Test items : Recovery completion time / Resource profiles 
Failure degree : Double disk failures 
Comparison : Reed Solomon RS(6,4) / SHEC(6,4,3) 
Hardware and Software Setup
Recovery Completion Time 
SHEC’s recovery completion time was 18.6% faster 
On the other hand, the total amount of data read from disks decreased by 26% (= theoretical improvement) 
Why were these figures not the same? 
The Reason (= Disk utilization) 
Disks were the bottleneck for only part (65%) of the recovery time (bottlenecked time ratio) 
There is 35% room for recovery time improvement 
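One rough way to relate the two figures is an Amdahl-style estimate. This is not the authors' analysis: it assumes that only the disk-bound share of the recovery time (65%) shrinks in proportion to the 26% reduction in data read, and it reads "18.6% faster" as an 18.6% shorter completion time. The function name is hypothetical.

```python
def estimated_time_reduction(read_reduction: float, disk_bound_ratio: float) -> float:
    """Fraction of recovery time saved if only disk-bound time shrinks.

    Assumption: time outside the disk-bound share is unaffected by
    reading less data.
    """
    return disk_bound_ratio * read_reduction

estimate = estimated_time_reduction(read_reduction=0.26, disk_bound_ratio=0.65)
print(f"estimated reduction: {estimate:.1%} (measured: 18.6%)")
```

The estimate comes out near 17%, in the same range as the measured 18.6%, which is consistent with the explanation that the disks were only partly the bottleneck.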
4. Summary 
Summary 
1. We proposed Shingled Erasure Code (SHEC) 
SHEC is recovery-efficient, especially in case of multiple disk failures 
2. We found SHEC is more adjustable than Reed Solomon code 
because SHEC provides many recovery-efficient layouts, including Reed Solomon codes 
3. We confirmed SHEC’s recovery efficiency in an experiment 
SHEC’s recovery time was 18.6% faster than that of Reed Solomon code in case of double disk failures 