Overview of Redundant Disk Arrays

Andrew Robinson
University of Michigan
<androbin@umich.edu>

Redundant Arrays of
Inexpensive Disks (RAID)
What a cool idea!

Authors
• David A Patterson
• Garth Gibson
• Randy H Katz

Officially published in 1988.

Overview
• What is RAID?
• Why bother?
• What is RAID, really?
• How well does it work?
• How’s it holding up?

What is RAID?
• Take a bunch of disks and make them appear
as one disk.
• Put data on all of them
• Use all at once to gain performance
• Duplicate data to gain reliability
• Buy cheap disks to gain dollars

This seems like a lot of work…

why bother?

CPUs and Memory kept getting faster…

• Exponential growth everywhere!
• CPU Performance: 1.4X increase per year
– More transistors
– Better architecture
• Memory Performance: 1.4-2X increase per
year
– Invention of caches
– SRAM technology

… but disks did not.
• It’s hard to make things spin exponentially
faster every year (they tend to fly apart).
• Disk seek time improved at a rate of
approximately 7% a year.
• Caching had been employed to buffer I/O
activity; this works reasonably well for
predictable workloads.

Slow I/O Makes Slow Computers
• Amdahl’s Law describes the impact of speeding
up only some pieces while leaving the others
unchanged.

$$S = \frac{1}{(1 - f) + f/k}$$

S – The effective speedup
f – Fraction of work in faster mode
k – Speedup while in faster mode

…really slow.
• If applications spend 10% of their time in I/O,
then when computers are 10 times faster, they
will only appear about 5 times faster.
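
As a rough check of the 10% I/O case, here is a minimal Python sketch of Amdahl's Law. The function name `amdahl_speedup` is made up for this illustration; `f` and `k` follow the symbols defined with the formula, with f = 0.9 because only the non-I/O 90% of the work gets faster.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Effective speedup S when a fraction f of the work runs k times faster."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - f) + f / k)


# 10% of the time is I/O and is not sped up, so f = 0.9.
print(round(amdahl_speedup(0.9, 10), 2))   # 5.26
print(round(amdahl_speedup(0.9, 100), 2))  # 9.17
```

Even a 100x faster processor yields less than a 10x effective speedup, because the untouched I/O time dominates.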

Something needed to be done.

What should we do?
• Single Large Expensive Disks (SLED) are not
improving fast enough.
• Larger memory or solid state drives weren’t
practical

• Small personal hard drives are emerging… can
we do something with those?

Why didn’t someone do this before?
• Standards like SCSI have finally allowed drive
makers to integrate features seen in
traditional mainframe controllers.

There is a problem…
• A hundredfold increase in the number of disks
means a hundredfold decrease in total
reliability

$$\mathrm{MTTF}_{\mathrm{DiskArray}} = \frac{\mathrm{MTTF}_{\mathrm{SingleDisk}}}{n_{\mathrm{Disks}}}$$
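
A small sketch of this relationship, assuming disk failures are independent and occur at a constant rate. The function name `array_mttf` and the 30,000-hour single-disk figure are illustrative assumptions, not values taken from these slides.

```python
def array_mttf(mttf_single_disk_hours: float, n_disks: int) -> float:
    """MTTF of an array with no redundancy: any single disk failure is fatal."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return mttf_single_disk_hours / n_disks


SINGLE_DISK_MTTF_HOURS = 30_000  # hypothetical value, for illustration only

for n in (1, 10, 100):
    print(n, array_mttf(SINGLE_DISK_MTTF_HOURS, n))
```

With these assumed numbers, 100 disks bring the array MTTF down to 300 hours, which is under two weeks.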

that’s all really nice, but

what is RAID, really?

A couple levels… a single idea
• RAID manages the tradeoff between
performance and reliability
• RAID comes in levels (RAID1 to RAID5)
• These levels represent points in the
performance-reliability space

Groups, Disks, and Check Disks
• RAID organizes disks into reliability groups
• Some of the disks in a group store error
correcting data

D = Total disks with data
G = Data disks in a group (excluding check disks)
C = Number of check disks in a group

Metrics
• Useable Storage – Percent of storage that
holds data, excluding parity information
• Performance – Tough to make one number:
– Reads, Writes, and Read-Modify-Write Access
Patterns
– Sequential and Random Data Distribution
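
Useable storage can be sketched as G / (G + C), assuming G counts only the data disks in a group and C the check disks. The helper name `usable_storage` is hypothetical. The group sizes below are the ones quoted in the RAID2 and RAID3 evaluation slides, and mirroring is modelled as one data disk plus one check disk.

```python
def usable_storage(g: int, c: int) -> float:
    """Fraction of a group's capacity that holds data rather than check information."""
    if g < 1 or c < 0:
        raise ValueError("need at least one data disk and a non-negative check count")
    return g / (g + c)


examples = [
    ("RAID1 (mirror)", 1, 1),
    ("RAID2, G=10", 10, 4),
    ("RAID2, G=25", 25, 5),
    ("RAID3, G=10", 10, 1),
    ("RAID3, G=25", 25, 1),
]
for name, g, c in examples:
    print(f"{name}: {usable_storage(g, c):.0%}")
```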

RAID1 – The Naive Approach
• Mirroring of all data
• To read:
– Use either disk
• To write:
– Send to both disks
simultaneously

• Minor read
performance increase.

Evaluation
Pros
• Reads can occur simultaneously
• Seek times can improve with special controllers
• Predictable performance
Cons
• Useable storage is cut in half
• All other performance metrics are left the same

Alright for large sequential jobs and transaction
processing jobs

RAID2 – Bit Level Striping
• Uses Hamming Code for Error Detection
• Requires many check disks
– For 10 data disks, 4 check disks
– For 25 data disks, 5 check disks
• Can detect errors, and determine the at-fault
disk
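
The check-disk counts above match the standard bound for a single-error-correcting Hamming code: the smallest c with 2^c >= d + c + 1 for d data disks. The slides do not state this rule, so treat the sketch below as an assumption that happens to reproduce the numbers; `hamming_check_disks` is a made-up name.

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest number of check disks c such that 2**c >= data_disks + c + 1."""
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    c = 0
    while 2 ** c < data_disks + c + 1:
        c += 1
    return c


print(hamming_check_disks(10))  # 4
print(hamming_check_disks(25))  # 5
```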

Evaluation
Pros
• Better useable storage, 71% for G=10, 83% for G=25
Cons
• Dismal small random data access performance: 3-9%
of RAID1 or SLED

Good for large sequential jobs, bad for transaction
processing systems.

RAID3 – Byte Level Striping
• Simpler parity error correction
• Only a single check disk required for error
detection
• Cannot determine which disk failed, but that’s
usually pretty obvious
• Transfers of large contiguous blocks perform well

Evaluation
Pros
• Even better useable storage, 91% for G=10, 96%
for G=25
Cons
• Small random data access performance: Just as bad as
RAID2

Even better for large sequential jobs, bad for
transaction processing systems.

What is parity?
• Parity is calculated as an XOR of the data
blocks.
• XOR is reversible:
– 1011 (A1) XOR 1100 (A2) => 0111 (AP) “parity”
– 0111 (AP) XOR 1011 (A1) => 1100 (A2)
– 0111 (AP) XOR 1100 (A2) => 1011 (A1)

• This makes error detection and reconstruction
possible!
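
The same XOR example can be sketched in Python. The helper names `parity` and `reconstruct` are hypothetical, and blocks are modelled as small integers purely for illustration.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR all data blocks together to get the parity block."""
    return reduce(xor, blocks, 0)


def reconstruct(surviving_blocks, parity_block):
    """Recover the one missing block from the survivors and the parity."""
    return reduce(xor, surviving_blocks, parity_block)


a1, a2 = 0b1011, 0b1100
ap = parity([a1, a2])
print(format(ap, "04b"))                     # 0111
print(format(reconstruct([a1], ap), "04b"))  # 1100 (A2 recovered)
print(format(reconstruct([a2], ap), "04b"))  # 1011 (A1 recovered)
```

Reconstruction only works when it is known which block is missing; parity alone says that something is wrong, not where.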

RAID4 - Block Level Striping
• Like RAID3, but with more parallelism
• Interleave data at sector level rather than bit
level
• Allows for servicing of multiple block requests
by different drives
• Still keeps all the parity information on a
single drive
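
One way to see why small writes still hurt: updating a single data block also means updating the parity block, which lives on the one parity drive. A hedged sketch of the usual read-modify-write update, using integer blocks and the made-up name `small_write_parity`:

```python
from functools import reduce
from operator import xor


def small_write_parity(old_data: int, new_data: int, old_parity: int) -> int:
    """New parity after overwriting one block, without reading the other data disks."""
    return old_parity ^ old_data ^ new_data


# Three data blocks on three data disks, plus their parity.
blocks = [0b1011, 0b1100, 0b0110]
old_parity = reduce(xor, blocks, 0)

# Overwrite the block on the second data disk.
new_block = 0b0011
new_parity = small_write_parity(blocks[1], new_block, old_parity)
blocks[1] = new_block

# The shortcut agrees with recomputing parity over the whole stripe.
assert new_parity == reduce(xor, blocks, 0)
```

Each small write therefore costs two reads (old data, old parity) and two writes (new data, new parity), and every one of them touches the same parity drive.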

Evaluation
Pros
• Finally better small random access. Reads are fast!
Cons
• Small writes and read-modify-writes are still slow.

Good for large sequential jobs, still not great for
transaction processing systems.

RAID5 – Block Level Striping with
Distributed Parity
• Instead of checksums on a single disk, we
distribute them across all disks.
• Allows us to support multiple writes per group
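
A sketch of one possible way to rotate parity across the disks. The rotation rule used here is an assumption chosen for illustration; the slides do not specify a layout, and real implementations use several variants. `parity_disk` and `layout` are hypothetical names.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Index of the disk holding parity for a stripe (assumed rotation rule)."""
    return n_disks - 1 - (stripe % n_disks)


def layout(n_stripes: int, n_disks: int):
    """One row per stripe: 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe in range(n_stripes):
        p = parity_disk(stripe, n_disks)
        rows.append(["P" if disk == p else "D" for disk in range(n_disks)])
    return rows


for row in layout(5, 5):
    print(" ".join(row))
```

Because consecutive stripes keep their parity on different disks, two small writes to different stripes can often proceed in parallel instead of queueing on a single parity drive.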

Evaluation
Pros
• Really good usable storage
• Finally decent small random data access performance
across the board!
Cons
• Slightly worse write performance: data must be
written to two disks simultaneously

Finally, a system that works well for both applications!

sounds complicated,

how well does it work?

As a Whole
• RAID has many different levels that achieve
different tradeoffs in reliability and
performance
• Almost all of them will, for some (or many) use
cases, outperform a SLED at the same cost.

[Figure: Read-Modify-Write Per Disk Performance]

wow, RAID sounds awesome,

how’s it holding up?

RAID has held up remarkably well
• Data centers around the world use RAID
technology.
• The small, inexpensive disk is the de facto
standard of storage
• The ideas developed for RAID have been
applied to many not-RAID things

Some open questions
• What will become of RAID as new, super fast
storage mediums start to become cost
effective?
• How does it fit in with massive internet-scale
storage farms?

Takeaways
• RAID offers a significant advantage over a SLED
at the same cost
– RAID5 offers 10x improvement in performance,
reliability, and power consumption while reducing size
of array.
• RAID allows for modular growth (add more disks)
• Cost effective option to meet challenge of
exponential growth in processor and memory
speeds

References
• “A Case for Redundant Arrays of Inexpensive
Disks” by David A Patterson, Garth Gibson,
and Randy H Katz
• “RAID: A Personal Recollection of How Storage
Became a System” by Randy H Katz
• Slides by David Luo and Ramasubramanian K.
• Images generously borrowed from Wikipedia
<http://en.wikipedia.org/wiki/RAID>
