Mass-Storage Structure (Galvin Notes, 9th Ed.)
Chapter 10: Mass-Storage Structure
 OVERVIEW OF MASS-STORAGE STRUCTURE
 Magnetic Disks
 Solid-State Disks
 DISK STRUCTURE
 DISK ATTACHMENT
 Host-Attached Storage
 Network-Attached Storage
 Storage-Area Network
 DISK SCHEDULING
 FCFS Scheduling
 SSTF Scheduling
 SCAN Scheduling
 C-SCAN Scheduling
 LOOK Scheduling
 Selection of a Disk-Scheduling Algorithm
 DISK MANAGEMENT
 Disk Formatting
 Boot Block
 Bad Blocks
 SWAP-SPACE MANAGEMENT
 Swap-Space Use
 Swap-Space Location
 Swap-Space Management: An Example
 RAID STRUCTURE
 Improvement of Reliability via Redundancy
 Improvement in Performance via Parallelism
 RAID Levels
 Selecting a RAID Level
 Extensions
 Problems with RAID
SKIPPED CONTENT
 STABLE-STORAGE IMPLEMENTATION (OPTIONAL)
 TERTIARY-STORAGE STRUCTURE - OPTIONAL, OMITTED FROM NINTH EDITION
 Tertiary-Storage Devices (Removable Disks, Tapes, Future Technology)
 Operating-System Support (Application Interface, File Naming, Hierarchical Storage Management)
 Performance Issues (Speed, Reliability, Cost)
Content
OVERVIEW OF MASS-STORAGE STRUCTURE
Magnetic Disks
 Traditional magnetic disks have the following basic structure:
One or more platters in the form of disks covered with magnetic media. Each platter has two working surfaces. Each working surface is divided into a number of concentric rings called tracks. The collection of all tracks that are the same distance from the edge of the platter (i.e. all tracks immediately above one another in the following diagram) is called a cylinder. Each track is further divided into sectors, traditionally containing 512 bytes of data each, although some modern disks occasionally use larger sector sizes. (Sectors also include a header and a trailer, including checksum information among other things. Larger sector sizes reduce the fraction of the disk consumed by headers and trailers, but increase internal fragmentation and the amount of disk that must be marked bad in the case of errors.)
 The data on a hard drive is read by read-write heads. The standard configuration (shown below) uses one head per surface, each on a separate arm, and controlled by a common arm assembly which moves all heads simultaneously from one cylinder to another. (Other configurations, including independent read-write heads, may speed up disk access, but involve serious technical difficulties.)
 The storage capacity of a traditional disk drive is equal to the number of heads (i.e. the number of working surfaces), times the number of tracks per surface, times the number of sectors per track, times the number of bytes per sector. A particular physical block of data is specified by providing the head-sector-cylinder number at which it is located.
 The rate at which data can be transferred from the disk to the computer is composed of several steps:
o The positioning time, a.k.a. the seek time or random access time, is the time required to move the heads from one cylinder to another, and for the heads to settle down after the move. This is typically the slowest step in the process and the predominant bottleneck to overall transfer rates.
o The rotational latency is the amount of time required for the desired sector to rotate around and come under the read-write head. This can range anywhere from zero to one full revolution, and on the average will equal one-half revolution. This is another physical step and is usually the second slowest step behind seek time. (For a disk rotating at 7200 rpm, the average rotational latency would be 1/2 revolution / 120 revolutions per second, or just over 4 milliseconds, a long time by computer standards.)
o The transfer rate is the time required to move the data electronically from the disk to the computer. (Some authors include the seek time and rotational latency as well as the electronic data transfer rate.)
 The host controller/host adapter is at the computer end of the I/O bus, and the disk controller is built into the disk itself. The CPU issues commands to the host controller via I/O ports. Data is transferred between the magnetic surface and onboard cache by the disk controller, and then the data is transferred from that cache to the host controller and the motherboard memory at electronic speeds.
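The capacity formula and the 7200-rpm latency figure above can be checked with a short sketch. The geometry numbers here (16 heads, 10,000 tracks, 63 sectors) are hypothetical, chosen only for illustration:

```python
# Sketch: disk capacity and average rotational latency.
# Geometry is illustrative, not from the text.
heads = 16                   # working surfaces
tracks_per_surface = 10_000
sectors_per_track = 63
bytes_per_sector = 512

# Capacity = heads x tracks/surface x sectors/track x bytes/sector
capacity = heads * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity)              # total bytes for this made-up geometry

rpm = 7200
# Average latency is half a revolution; 7200 rpm = 120 revolutions/second.
avg_latency_ms = 0.5 / (rpm / 60) * 1000
print(round(avg_latency_ms, 2))   # 4.17 -- the "just over 4 ms" cited above
```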
Solid-State Disks:
SSDs use memory technology as a small fast hard disk. Specific implementations may use either flash memory or DRAM chips protected by a battery to sustain the information through power cycles. Because SSDs have no moving parts they are much faster than traditional hard drives, and certain problems such as the scheduling of disk accesses simply do not apply. However SSDs also have their weaknesses: they are more expensive than hard drives, generally not as large, and may have shorter life spans. SSDs are especially useful as a high-speed cache of hard-disk information that must be accessed quickly. One example is to store filesystem meta-data, e.g. directory and inode information, that must be accessed quickly and often. Another variation is a boot disk containing the OS and some application executables, but no vital user data. SSDs are also used in laptops to make them smaller, faster, and lighter. Because SSDs are so much faster than traditional hard disks, the throughput of the bus can become a limiting factor, causing some SSDs to be connected directly to the system PCI bus, for example.
DISK STRUCTURE
 The traditional head-sector-cylinder (HSC) numbers are mapped to linear block addresses by numbering the first sector on the first head on the outermost track as sector 0. Numbering proceeds with the rest of the sectors on that same track, and then the rest of the tracks on the same cylinder, before proceeding through the rest of the cylinders to the center of the disk. In modern practice these linear block addresses are used in place of the HSC numbers for a variety of reasons: 1) The linear length of tracks near the outer edge of the disk is much longer than for those tracks located near the center, and therefore it is possible to squeeze many more sectors onto outer tracks than onto inner ones. 2) All disks have some bad sectors, and therefore disks maintain a few spare sectors that can be used in place of the bad ones. The mapping of spare sectors to bad sectors is managed internally by the disk controller. 3) Modern hard drives can have thousands of cylinders, and hundreds of sectors per track on their outermost tracks. These numbers exceed the range of HSC numbers for many (older) operating systems, and therefore disks can be configured for any convenient combination of HSC values that falls within the total number of sectors physically on the drive.
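The track-then-head-then-cylinder ordering above can be sketched as the classic CHS-to-LBA conversion. (In the CHS convention sectors are numbered from 1; the geometry parameters here are hypothetical.)

```python
def chs_to_lba(c, h, s, heads, sectors_per_track):
    """Classic CHS -> LBA mapping (CHS sectors conventionally 1-based).

    Blocks are numbered along a track, then across the remaining heads
    (tracks) of the cylinder, then inward cylinder by cylinder -- the
    ordering described above.
    """
    return (c * heads + h) * sectors_per_track + (s - 1)

# First sector on the first head of the outermost cylinder is block 0:
print(chs_to_lba(0, 0, 1, heads=16, sectors_per_track=63))  # 0
# The next head on the same cylinder picks up where the last track ended:
print(chs_to_lba(0, 1, 1, heads=16, sectors_per_track=63))  # 63
```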
 Modern disks pack many more sectors into outer cylinders than inner ones, using one of two approaches: With Constant Linear Velocity, CLV, the density of bits is uniform from cylinder to cylinder. Because there are more sectors in outer cylinders, the disk spins slower when reading those cylinders, causing the rate of bits passing under the read-write head to remain constant. This is the approach used by modern CDs and DVDs. With Constant Angular Velocity, CAV, the disk rotates at a constant angular speed, with the bit density decreasing on outer cylinders. (These disks would have a constant number of sectors per track on all cylinders.)
DISK ATTACHMENT
Disk drives can be attached either directly to a particular host (a local disk) or to a network.
 Host-Attached Storage: Local disks are accessed through I/O ports. The most common interfaces are IDE or ATA, each of which allows up to two drives per host controller. SATA is similar with simpler cabling. High-end workstations or other systems in need of larger numbers of disks typically use SCSI disks: The SCSI standard supports up to 16 targets on each SCSI bus, one of which is generally the host adapter and the other 15 of which can be disk or tape drives. A SCSI target is usually a single drive, but the standard also supports up to 8 units within each target. These would generally be used for accessing individual disks within a RAID array. The SCSI standard also supports multiple host adapters in a single computer, i.e. multiple SCSI busses. SCSI cables may be either 50 or 68 conductors. SCSI devices may be external as well as internal.
FC is a high-speed serial architecture that can operate over optical fiber or four-conductor copper wires, and has two variants: 1) A large switched fabric having a 24-bit address space. This variant allows for multiple devices and multiple hosts to interconnect, forming the basis for storage-area networks (SANs). 2) The arbitrated loop, FC-AL, that can address up to 126 devices (drives and controllers).
 Network-Attached Storage: Network-attached storage connects storage devices to computers using a remote procedure call (RPC) interface, typically with something like NFS filesystem mounts. This is convenient for allowing several computers in a group common access and naming conventions for shared storage. NAS can be implemented using SCSI cabling, or iSCSI using Internet protocols and standard network connections, allowing long-distance remote access to shared files. NAS allows computers to easily share data storage, but tends to be less efficient than standard host-attached storage.
 Storage-Area Network: A Storage-Area Network, SAN, connects computers and storage devices in a network, using storage protocols instead of network protocols. One advantage of this is that storage access does not tie up regular networking bandwidth. SAN is very flexible and dynamic, allowing hosts and devices to attach and detach on the fly. SAN is also controllable, allowing restricted access to certain hosts and devices.
DISK SCHEDULING
Disk transfer speeds are limited primarily by seek times and rotational latency. When multiple requests are to be processed there is also some inherent delay in waiting for other requests to be processed.
Bandwidth is measured by the amount of data transferred divided by the total amount of time from the first request being made to the last transfer being completed (for a series of disk requests). Both bandwidth and access time can be improved by processing requests in a good order. (Disk requests include the disk address, memory address, number of sectors to transfer, and whether the request is for reading or writing.)
 FCFS Scheduling: It is simple but not very efficient. Consider in the following sequence the wild swing from cylinder 122 to 14 and then back to 124.
 SSTF (Shortest Seek Time First) Scheduling: It is more efficient, but may lead to starvation if a constant stream of requests arrives for the same general area of the disk. SSTF reduces the total head movement to 236 cylinders, down from 640 required for the same set of requests under FCFS. Note, however, that the distance could be reduced still further to 208 by starting with 37 and then 14 first before processing the rest of the requests.
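The 640 vs. 236 figures come from the textbook's standard example queue (98, 183, 37, 122, 14, 124, 65, 67) with the head starting at cylinder 53; a small simulation reproduces them:

```python
def fcfs(start, requests):
    """Total head movement servicing requests strictly in arrival order."""
    total, pos = 0, start
    for r in requests:
        total += abs(pos - r)
        pos = r
    return total

def sstf(start, requests):
    """Total head movement always servicing the closest pending request."""
    pending, total, pos = list(requests), 0, start
    while pending:
        nxt = min(pending, key=lambda r: abs(pos - r))  # shortest seek next
        total += abs(pos - nxt)
        pos = nxt
        pending.remove(nxt)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]   # classic example, head at 53
print(fcfs(53, queue))   # 640
print(sstf(53, queue))   # 236
```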
 SCAN Scheduling: The SCAN algorithm, a.k.a. the elevator algorithm, moves back and forth from one end of the disk to the other, similarly to an elevator processing requests in a tall building. Under the SCAN algorithm, if a request arrives just ahead of the moving head then it will be processed right away, but if it arrives just after the head has passed, then it will have to wait for the head to pass going the other way on the return trip. This leads to a fairly wide variation in access times which can be improved upon. Consider, for example, when the head reaches the high end of the disk: Requests with high cylinder numbers just missed the passing head, which means they are all fairly recent requests, whereas requests with low numbers may have been waiting for a much longer time. Making the return scan from high to low then ends up accessing recent requests first and making older requests wait that much longer.
 C-SCAN Scheduling: The Circular-SCAN algorithm improves upon SCAN by treating all requests in a circular queue fashion - once the head reaches the end of the disk, it returns to the other end without processing any requests, and then starts again from the beginning of the disk.
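The difference between the two sweeps can be made concrete by computing service orders for the same illustrative queue used above (head at 53, sweeping toward higher cylinders):

```python
def scan_order(start, requests, moving_up=True):
    """Service order for SCAN: sweep in one direction to the end, reverse."""
    up = sorted(r for r in requests if r >= start)
    down = sorted((r for r in requests if r < start), reverse=True)
    return (up + down) if moving_up else (down + up)

def cscan_order(start, requests):
    """Service order for C-SCAN: sweep up, then wrap around to the low end,
    so the low-numbered requests are served in ascending (queue) order."""
    up = sorted(r for r in requests if r >= start)
    low = sorted(r for r in requests if r < start)
    return up + low

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan_order(53, queue))    # [65, 67, 98, 122, 124, 183, 37, 14]
print(cscan_order(53, queue))   # [65, 67, 98, 122, 124, 183, 14, 37]
```

Note how SCAN serves 37 before 14 on the return sweep, while C-SCAN wraps and serves 14 first.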
 LOOK Scheduling: LOOK scheduling improves upon SCAN by looking ahead at the queue of pending requests, and not moving the heads any farther towards the end of the disk than is necessary. The following diagram illustrates the circular form of LOOK.
 Selection of a Disk-Scheduling Algorithm: With very low loads all algorithms are equal, since there will normally only be one request to process at a time. For slightly larger loads, SSTF offers better performance than FCFS, but may lead to starvation when loads become heavy enough. For busier systems, SCAN and LOOK algorithms eliminate starvation problems. Some improvement to overall filesystem access times can be made by intelligent placement of directory and/or inode information. If those structures are placed in the middle of the disk instead of at the beginning of the disk, then the maximum distance from those structures to data blocks is reduced to only one-half of the disk size. If those structures can be further distributed and furthermore have their data blocks stored as close as possible to the corresponding directory structures, then that reduces still further the overall time to find the disk block numbers and then access the corresponding data blocks.
On modern disks the rotational latency can be almost as significant as the seek time; however, it is not within the OS's control to account for that, because modern disks do not reveal their internal sector mapping schemes (particularly when bad blocks have been remapped to spare sectors). Some disk manufacturers provide for disk scheduling algorithms directly on their disk controllers (which do know the actual geometry of the disk as well as any remapping), so that if a series of requests is sent from the computer to the controller then those requests can be processed in an optimal order.
Unfortunately there are some considerations that the OS must take into account that are beyond the abilities of the on-board disk-scheduling algorithms, such as priorities of some requests over others, or the need to process certain requests in a particular order. For this reason OSes may elect to spoon-feed requests to the disk controller one at a time in certain situations.
DISK MANAGEMENT
Disk Formatting:
 Before a disk can be used, it has to be low-level formatted, which means laying down all of the headers and trailers marking the beginning and end of each sector. Included in the header and trailer are the linear sector numbers, and error-correcting codes (ECC) which allow damaged sectors to not only be detected, but in many cases for the damaged data to be recovered (depending on the extent of the damage).
 ECC calculation is performed with every disk read or write, and if damage is detected but the data is recoverable, then a soft error has occurred. Soft errors are generally handled by the on-board disk controller, and never seen by the OS.
 Once the disk is low-level formatted, the next step is to partition the drive into one or more separate partitions. This step must be completed even if the disk is to be used as a single large partition, so that the partition table can be written to the beginning of the disk.
 After partitioning, the filesystems must be logically formatted, which involves laying down the master directory information (FAT table or inode structure), initializing free lists, and creating at least the root directory of the filesystem. (Disk partitions which are to be used as raw devices are not logically formatted. This saves the overhead and disk space of the filesystem structure, but requires that the application program manage its own disk storage requirements.)
Boot Block:
 Computer ROM contains a bootstrap program (OS independent) with just enough code to find the first sector on the first hard drive on the first controller, load that sector into memory, and transfer control over to it. (The ROM bootstrap program may look in floppy and/or CD drives before accessing the hard drive, and is smart enough to recognize whether it has found valid boot code or not.)
 The first sector on the hard drive is known as the Master Boot Record, MBR, and contains a very small amount of code in addition to the partition table. The partition table documents how the disk is partitioned into logical disks, and indicates specifically which partition is the active or boot partition. (MBR has the boot program, eh? Yes, and a lot more, says Wiki.)
 The boot program then looks to the active partition to find an operating system, possibly loading up a slightly larger / more advanced boot program along the way. In a dual-boot (or larger multi-boot) system, the user may be given a choice of which operating system to boot, with a default action to be taken in the event of no response within some time frame.
 Once the kernel is found by the boot program, it is loaded into memory and then control is transferred over to the OS. The kernel will normally continue the boot process by initializing all important kernel data structures, launching important system services (e.g. network daemons, sched, init, etc.), and finally providing one or more login prompts. Boot options at this stage may include single-user, a.k.a. maintenance or safe, modes, in which very few system services are started - these modes are designed for system administrators to repair problems or otherwise maintain the system.
Bad Blocks:
 In the old days, formatting of the disk or running certain disk-analysis tools would identify bad blocks, and attempt to read the data off of them one last time through repeated tries. Then the bad blocks would be mapped out and taken out of future service. Modern disk controllers make much better use of the error-correcting codes, so that bad blocks can be detected earlier and the data usually recovered. (Recall that blocks are tested with every write as well as with every read, so often errors can be detected before the write operation is complete, and the data simply written to a different sector instead.)
 Note that re-mapping of sectors from their normal linear progression can throw off the disk-scheduling optimization of the OS, especially if the replacement sector is physically far away from the sector it is replacing. For this reason most disks normally keep a few spare sectors on each cylinder, as well as at least one spare cylinder. Whenever possible a bad sector will be mapped to another sector on the same cylinder, or at least a cylinder as close as possible. Sector slipping may also be performed, in which all sectors between the bad sector and the replacement sector are moved down by one, so that the linear progression of sector numbers can be maintained. If the data on a bad block cannot be recovered, then a hard error has occurred, which requires replacing the file(s) from backups, or rebuilding them from scratch.
SWAP-SPACE MANAGEMENT
 Swap-Space Use: The amount of swap space needed by an OS varies greatly according to how it is used. Some systems require an amount equal to physical RAM; some want a multiple of that; some want an amount equal to the amount by which virtual memory exceeds physical RAM; and some systems use little or none at all! Some systems support multiple swap spaces on separate disks in order to speed up the virtual memory system.
 Swap-Space Location: Swap space can be physically located in one of two locations – (A) As a large file which is part of the regular file-system. This is easy to implement, but inefficient. Not only must the swap space be accessed through the directory system, the file is also subject to fragmentation issues. Caching the block location helps in finding the physical blocks, but that is not a complete fix. (B) As a raw partition, possibly on a separate or little-used disk. This allows the OS more control over swap-space management, which is usually faster and more efficient. Fragmentation of swap space is generally not a big issue, as the space is re-initialized every time the system is rebooted. The downside of keeping swap space on a raw partition is that it can only be grown by repartitioning the hard drive.
 Swap-Space Management: An Example — Historically OSes swapped out entire processes as needed. Modern systems swap out only individual pages, and only as needed. (For example, process code blocks and other blocks that have not been changed since they were originally loaded are normally just freed from the virtual memory system rather than copying them to swap space, because it is faster to go find them again in the filesystem and read them back in from there than to write them out to swap space and then read them back.) In the mapping system shown below for Linux systems, a map of swap space is kept in memory, where each entry corresponds to a 4K block in the swap space. Zeros indicate free slots and non-zeros refer to how many processes have a mapping to that particular block (>1 for shared pages only).
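The swap map just described is essentially an array of per-slot use counts. A minimal sketch (slot count and helper names are hypothetical):

```python
# Linux-style swap map sketch: one counter per 4K swap slot;
# 0 means free, >= 1 counts the processes mapping that slot.
swap_map = [0] * 8            # 8 slots, all free initially

def swap_out(slot):
    swap_map[slot] += 1       # page written (or shared) into this slot

def swap_free(slot):
    swap_map[slot] -= 1       # one mapping dropped; slot is free at zero

swap_out(2)                   # private page in slot 2
swap_out(5); swap_out(5)      # slot 5 shared by two processes
print(swap_map)               # [0, 0, 1, 0, 0, 2, 0, 0]
free_slots = [i for i, c in enumerate(swap_map) if c == 0]
print(free_slots)             # [0, 1, 3, 4, 6, 7]
```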
RAID STRUCTURE
The general idea behind RAID is to employ a group of hard drives together with some form of duplication, either to increase reliability or to speed up operations (or sometimes both). RAID originally stood for Redundant Array of Inexpensive Disks, and was designed to use a bunch of cheap small disks in place of one or two larger more expensive ones. Today RAID systems employ large, possibly expensive disks as their components, switching the definition to Independent disks.
Improvement of Reliability via Redundancy
 The more disks a system has, the greater the likelihood that one of them will go bad at any given time. Hence increasing disks on a system actually decreases the Mean Time To Failure, MTTF, of the system. If, however, the same data was copied onto multiple disks, then the data would not be lost unless both (or all) copies of the data were damaged simultaneously, which is a MUCH lower probability than for a single disk going bad. More specifically, the second disk would have to go bad before the first disk was repaired, which brings the Mean Time To Repair, MTTR, into play. This is the basic idea behind disk mirroring, in which a system contains identical data on two or more disks.
Improvement in Performance via Parallelism
 There is also a performance benefit to mirroring, particularly with respect to reads. Since every block of data is duplicated on multiple disks, read operations can be satisfied from any available copy, and multiple disks can be reading different data blocks simultaneously in parallel. (Writes could possibly be sped up as well through careful scheduling algorithms, but it would be complicated in practice.)
 Another way of improving disk access time is with striping, which basically means spreading data out across multiple disks that can be accessed simultaneously. With bit-level striping the bits of each byte are striped across multiple disks. For example if 8 disks were involved, then each 8-bit byte would be read in parallel by 8 heads on separate disks. A single disk read would access 8 * 512 bytes = 4K worth of data in the time normally required to read 512 bytes. Similarly if 4 disks were involved, then two bits of each byte could be stored on each disk. Block-level striping spreads a file-system across multiple disks on a block-by-block basis, so if block N were located on disk 0, then block N + 1 would be on disk 1, and so on. This is particularly useful when file-systems are accessed in clusters of physical blocks. Other striping possibilities exist, with block-level striping being the most common.
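The round-robin block placement described above reduces to simple modular arithmetic; a tiny sketch (the function name is just illustrative):

```python
def stripe_location(block, num_disks):
    """Block-level striping: block N lives on disk N % d, at stripe N // d."""
    return block % num_disks, block // num_disks

# With 4 disks, consecutive blocks land on consecutive disks, so a
# cluster of adjacent blocks can be fetched from all disks in parallel:
for n in range(6):
    print(n, stripe_location(n, 4))
```

Block 3 goes on disk 3, and block 4 wraps back to disk 0 on the next stripe.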
RAID Levels
Mirroring provides reliability but is expensive; striping improves performance, but does not improve reliability. Accordingly there are a number of different schemes that combine the principles of mirroring and striping in different ways, in order to balance reliability versus performance versus cost. These are described by different RAID levels, as follows. (In the diagram that follows, "C" indicates a copy, and "P" indicates parity, i.e. checksum bits.) (Geekstuff says: In most situations, you would be using levels 0, 1, 5 and 10 (1+0) — 5 and 10 in critical servers. There are several non-standard raids, which are not used except in some rare situations – RAID 2, RAID 3, RAID 4 and RAID 6.)
 Raid Level 0 - This level includes striping only, with no mirroring. (Blocks striped, no mirrors, no parity. Minimum 2 disks)
 Raid Level 1 - This level includes mirroring only, no striping. (No stripe, no parity. Minimum 2 disks)
 Raid Level 2 - This level stores error-correcting codes on additional disks, allowing for any damaged data to be reconstructed by subtraction from the remaining undamaged data. Note that this scheme requires only three extra disks to protect 4 disks worth of data, as opposed to full mirroring. (The number of disks required is a function of the error-correcting algorithms, and the means by which the particular bad bit(s) is (are) identified.) (Book)
(http://www.thegeekstuff.com/2011/11/raid2-raid3-raid4-raid6) - This uses bit-level striping, i.e. instead of striping the blocks across the disks, it stripes the bits across the disks. In the diagram b1, b2, b3 are bits; E1, E2, E3 are error-correction codes. You need two groups of disks: one group of disks is used to write the data, another group is used to write the error-correction codes. This uses Hamming error-correction code (ECC), and stores this information in the redundancy disks. When data is written to the disks, it calculates the ECC code for the data on the fly, stripes the data bits to the data disks, and writes the ECC code to the redundancy disks. When data is read from the disks, it also reads the corresponding ECC code from the redundancy disks, and checks whether the data is consistent. If required, it makes appropriate corrections on the fly.
This uses a lot of disks and can be configured in different disk configurations. Some valid configurations are 1) 10 disks for data and 4 disks for ECC, or 2) 4 disks for data and 3 disks for ECC. This is not used anymore: it is expensive, implementing it in a RAID controller is complex, and ECC is redundant nowadays, as the hard disks themselves can do this.
 Raid Level 3 - This level is similar to level 2, except that it takes advantage of the fact that each disk is still doing its own error-detection, so that when an error occurs, there is no question about which disk in the array has the bad data. As a result a single parity bit is all that is needed to recover the lost data from an array of disks. Level 3 also includes striping, which improves performance. The downside with the parity approach is that every disk must take part in every disk access, and the parity bits must be constantly calculated and checked, reducing performance. Hardware-level parity calculations and NVRAM cache can help with both of those issues. In practice level 3 is greatly preferred over level 2. (Book)
(GeekStuff) This uses byte-level striping, i.e. instead of striping the blocks across the disks, it stripes the bytes across the disks. In the diagram B1, B2, B3 are bytes; p1, p2, p3 are parities. Uses multiple data disks, and a dedicated disk to store parity. The disks have to spin in sync to get to the data. Sequential read and write will have good performance. Random read and write will have the worst performance. This is not commonly used.
 Raid Level 4 - This level is similar to level 3, employing block-level striping instead of bit-level striping. The benefits are that multiple blocks can be read independently, and changes to a block only require writing two blocks (data and parity) rather than involving all disks. Note that new disks can be added seamlessly to the system provided they are initialized to all zeros, as this does not affect the parity results. (Book)
(GeekStuff) This uses block-level striping. In the diagram B1, B2, B3 are blocks; p1, p2, p3 are parities. Uses multiple data disks, and a dedicated disk to store parity. Minimum of 3 disks (2 disks for data and 1 for parity). Good random reads, as the data blocks are striped. Bad random writes, as for every write, it has to write to the single parity disk. It is somewhat similar to RAID 3 and 5, but a little different: it is just like RAID 3 in having the dedicated parity disk, but it stripes blocks; it is just like RAID 5 in striping the blocks across the data disks, but it has only one parity disk. This is not commonly used.
 Raid Level 5 - This level is similar to level 4, except the parity blocks are distributed over all disks, thereby more evenly balancing the load on the system. For any given block on the disk(s), one of the disks will hold the parity information for that block and the other N-1 disks will hold the data. Note that the same disk cannot hold both data and parity for the same block, as both would be lost in the event of a disk crash.
(GeekStuff) Minimum 3 disks required. Good performance (as blocks are striped). Good redundancy (distributed parity). Best cost-effective option providing both performance and redundancy. Use this for a DB that is heavily read-oriented. Write operations will be slow.
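The parity used by RAID 4/5 is a byte-wise XOR of the data blocks in a stripe, which is why any one lost block can be rebuilt by XOR-ing the survivors with the parity block. A minimal sketch:

```python
from functools import reduce

def parity(blocks):
    """Parity block = byte-wise XOR across the given blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
p = parity(data)                      # stored on the parity disk

# Simulate losing the disk holding data[1]; XOR of the surviving
# data blocks plus the parity block reconstructs it:
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])             # True
```

The same function both computes parity and rebuilds a lost block, since XOR is its own inverse.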
 Raid Level 6 - This level extends raid level 5 by storing multiple bits of error-recovery codes (such as the Reed-Solomon codes) for each bit position of data, rather than a single parity bit. In the example shown below, 2 bits of ECC are stored for every 4 bits of data, allowing data recovery in the face of up to two simultaneous disk failures. Note that this still involves only a 50% increase in storage needs, as opposed to 100% for simple mirroring, which could only tolerate a single disk failure. (Book)
(GeekStuff) Just like RAID 5, this does block-level striping. However, it uses dual parity. In the diagram A, B, C are blocks; p1, p2, p3 are parities. This creates two parity blocks for each data block. It can handle two disk failures. This RAID configuration is complex to implement in a RAID controller, as it has to calculate two parity blocks for each data block.
 There are also two RAID levels which combine RAID levels 0 and 1 (striping and mirroring) in different combinations, designed to provide both performance and reliability at the expense of increased cost. (Book content is understandable for these two, especially the latter.)
 RAID level 0 + 1: disks are first striped, and then the striped disks mirrored to another set. This level generally provides better performance than RAID level 5.
 RAID level 1 + 0 mirrors disks in pairs, and then stripes the mirrored pairs. The storage capacity, performance, etc. are all the same, but there is an advantage to this approach in the event of multiple disk failures, as illustrated below:
In diagram (a) below, the 8 disks have been divided into two sets of four, each of which is striped, and then one stripe set is used to mirror the other set.
o If a single disk fails, it wipes out the entire stripe set, but the system can keep on functioning using the remaining set.
o However if a second disk from the other stripe set now fails, then the entire system is lost, as a result of two disk failures.
In diagram (b), the same 8 disks are divided into four sets of two, each of which is mirrored, and then the file system is striped across the four sets of mirrored disks.
o If a single disk fails, then that mirror set is reduced to a single disk, but the system rolls on, and the other three mirror sets continue mirroring.
o Now if a second disk fails (that is not the mirror of the already failed disk), then another one of the mirror sets is reduced to a single disk, but the system can continue without data loss.
o In fact the second arrangement could handle as many as four simultaneously failed disks, as long as no two of them were from the same mirror pair.
(GeekStuff for RAID 10) — Minimum 4 disks. This is also called a "stripe of mirrors". Excellent redundancy (as blocks are mirrored). Excellent performance (as blocks are striped). If you can afford the dollars, this is the BEST option for any mission-critical applications (especially databases).
(GeekStuff for RAID 01) — RAID 01 is also called RAID 0+1, or a "mirror of stripes". It requires a minimum of 3 disks, but in most cases it will be implemented with a minimum of 4 disks. To understand this better, create two groups. For example, if you have a total of 6 disks, create two groups with 3 disks each: Group 1 has 3 disks and Group 2 has 3 disks. Within a group, the data is striped, i.e. in Group 1, which contains three disks, the 1st block will be written to the 1st disk, the 2nd block to the 2nd disk, and the 3rd block to the 3rd disk. So, block A is written to Disk 1, block B to Disk 2, block C to Disk 3. Across the groups, the data is mirrored, i.e. Group 1 and Group 2 will look exactly the same: Disk 1 is mirrored to Disk 4, Disk 2 to Disk 5, Disk 3 to Disk 6. This is why it is called a "mirror of stripes" — the disks within the groups are striped, but the groups themselves are mirrored.
(GeekStuff RAID 01 vs RAID 10 differences) — Performance on RAID 10 and RAID 01 will be the same, and the storage capacity will be the same. The main difference is the fault-tolerance level. On most implementations of RAID controllers, RAID 01 fault tolerance is lower: since there are only two groups of RAID 0, if two drives (one in each group) fail, the entire RAID 01 fails. In the RAID 01 layout described above, if Disk 1 and Disk 4 fail, both groups will be down, so the whole RAID 01 fails. RAID 10 fault tolerance is higher: since there are many groups (each group is only two disks), even if three disks fail (one in each group), the RAID 10 is still functional. In the RAID 10 example, even if Disk 1, Disk 3, and Disk 5 fail, the RAID 10 will still be functional. So, given a choice between RAID 10 and RAID 01, always choose RAID 10.
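The fault-tolerance gap above can be checked by brute force. The following sketch (illustrative Python, not from the text) uses the 6-disk layout from the GeekStuff example — mirror pairs 1↔4, 2↔5, 3↔6 for RAID 10, and striped groups {1,2,3}/{4,5,6} for RAID 01 — and enumerates every possible two-disk failure:

```python
from itertools import combinations

DISKS = [1, 2, 3, 4, 5, 6]

# RAID 10: three mirror pairs; the array survives as long as
# no mirror pair loses both of its members.
PAIRS = [{1, 4}, {2, 5}, {3, 6}]

def raid10_survives(failed):
    return all(not pair <= failed for pair in PAIRS)

# RAID 01: two striped groups, mirrored; the array survives only
# while at least one group is completely intact.
GROUPS = [{1, 2, 3}, {4, 5, 6}]

def raid01_survives(failed):
    return any(not (group & failed) for group in GROUPS)

two_disk_failures = [set(c) for c in combinations(DISKS, 2)]
r10 = sum(raid10_survives(f) for f in two_disk_failures)
r01 = sum(raid01_survives(f) for f in two_disk_failures)
print(f"RAID 10 survives {r10}/{len(two_disk_failures)} two-disk failures")  # 12/15
print(f"RAID 01 survives {r01}/{len(two_disk_failures)} two-disk failures")  # 6/15
```

Of the 15 possible two-disk failures, RAID 10 survives 12 (it dies only when both halves of one of the three mirror pairs fail), while RAID 01 survives only 6 (both failures must land inside the same group), which matches the "always choose RAID 10" advice.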
 Selecting a RAID Level: Trade-offs in selecting the optimal RAID level for a particular application include cost, volume of data, need for reliability, need for performance, and rebuild time, the latter of which can affect the likelihood that a second disk will fail while the first failed disk is being rebuilt. Other decisions include how many disks are involved in a RAID set and how many disks to protect with a single parity bit. More disks in the set increases performance but increases cost. Protecting more disks per parity bit saves cost, but increases the likelihood that a second disk will fail before the first bad disk is repaired.
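The rebuild-window risk can be put into rough numbers with a back-of-envelope model, assuming independent, exponentially distributed disk failures (a common simplification, not something from the text; the MTTF and rebuild-time figures below are illustrative, not vendor specifications):

```python
import math

def second_failure_prob(disks_in_set, mttf_hours, rebuild_hours):
    """Probability that at least one of the surviving disks in a parity
    set fails during the rebuild window, assuming independent
    exponential failures with the given per-disk MTTF."""
    surviving = disks_in_set - 1
    return 1 - math.exp(-surviving * rebuild_hours / mttf_hours)

# Illustrative numbers: 100,000-hour per-disk MTTF, 24-hour rebuild.
for n in (4, 8, 16):
    print(f"{n}-disk set: {second_failure_prob(n, 100_000, 24):.5f}")
```

The probability grows roughly linearly with the number of surviving disks in the set, which is the quantitative version of the trade-off above: wider parity groups save cost but raise the chance of a second failure before the rebuild completes.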
SUMMARY
 Disk drives are the major secondary storage I/O devices on most computers. Most secondary storage devices are either magnetic disks or magnetic tapes, although solid-state disks are growing in importance. Modern disk drives are structured as large one-dimensional arrays of logical disk blocks. Generally, these logical blocks are 512 bytes in size. Disks may be attached to a computer system in one of two ways: (1) through the local I/O ports on the host computer or (2) through a network connection.
 Requests for disk I/O are generated by the file system and by the virtual memory system. Each request specifies the address on the disk to be referenced, in the form of a logical block number. Disk-scheduling algorithms can improve the effective bandwidth, the average response time, and the variance in response time. Algorithms such as SSTF, SCAN, C-SCAN, LOOK, and C-LOOK are designed to make such improvements through strategies for disk-queue ordering. Performance of disk-scheduling algorithms can vary greatly on magnetic disks. In contrast, because solid-state disks have no moving parts, performance varies little among algorithms, and quite often a simple FCFS strategy is used.
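The gains from disk-queue ordering are easy to see on the chapter's classic example queue (head at cylinder 53, requests 98, 183, 37, 122, 14, 124, 65, 67). A minimal sketch comparing FCFS against a greedy SSTF ordering:

```python
def total_movement(start, order):
    """Total cylinder movement for servicing requests in the given order."""
    pos, moved = start, 0
    for cyl in order:
        moved += abs(cyl - pos)
        pos = cyl
    return moved

def sstf_order(start, requests):
    """Greedy shortest-seek-time-first: always service the pending
    request closest to the current head position."""
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(total_movement(53, queue))                  # FCFS: 640 cylinders
print(total_movement(53, sstf_order(53, queue)))  # SSTF: 236 cylinders
```

SSTF cuts head movement from 640 to 236 cylinders on this queue, though as a greedy heuristic it is not guaranteed optimal and can starve requests under heavy load.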
 Performance can be harmed by external fragmentation. Some systems have utilities that scan the file system to identify fragmented files; they then move blocks around to decrease the fragmentation. Defragmenting a badly fragmented file system can significantly improve performance, but the system may have reduced performance while the defragmentation is in progress. Sophisticated file systems, such as the UNIX Fast File System, incorporate many strategies to control fragmentation during space allocation so that disk reorganization is not needed.
 The operating system manages the disk blocks. First, a disk must be low-level formatted to create the sectors on the raw hardware; new disks usually come preformatted. Then, the disk is partitioned, file systems are created, and boot blocks are allocated to store the system's bootstrap program. Finally, when a block is corrupted, the system must have a way to lock out that block or to replace it logically with a spare.
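Logical replacement with a spare amounts to a small indirection table consulted on every access. The toy class below is a hypothetical sketch of the idea (real controllers do this in firmware, with per-cylinder spares and sector slipping):

```python
class Disk:
    """Toy model of logical bad-block remapping to spare sectors."""

    def __init__(self, n_sectors, n_spares):
        self.data = {}
        self.remap = {}  # bad logical sector -> spare physical sector
        self.spares = list(range(n_sectors, n_sectors + n_spares))

    def _physical(self, sector):
        # Indirection: a remapped sector transparently resolves to its spare.
        return self.remap.get(sector, sector)

    def mark_bad(self, sector):
        # Logically replace the bad sector with the next free spare.
        if not self.spares:
            raise RuntimeError("no spare sectors left")
        self.remap[sector] = self.spares.pop(0)

    def write(self, sector, value):
        self.data[self._physical(sector)] = value

    def read(self, sector):
        return self.data[self._physical(sector)]
```

After `mark_bad(5)`, reads and writes to logical sector 5 land on the spare, and the OS never sees the change — which is also why remapping can quietly defeat OS-level disk-scheduling optimizations, as the detailed section notes.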
 Because an efficient swap space is a key to good performance, systems usually bypass the file system and use raw-disk access for paging I/O. Some systems dedicate a raw-disk partition to swap space, and others use a file within the file system instead. Still other systems allow the user or system administrator to make the decision by providing both options.
 Because of the amount of storage required on large systems, disks are frequently made redundant via RAID algorithms. These algorithms allow more than one disk to be used for a given operation and allow continued operation and even automatic recovery in the face of a disk failure. RAID algorithms are organized into different levels; each level provides some combination of reliability and high transfer rates.
Read Later
 The Stable-Storage Implementation part has been left out. It has been marked optional by the professor. Won't be needed for a first run.
Further Reading
 Tertiary-Storage Structure (just for culture). It is marked optional by the professor and has been deleted from the 9th Edition.

More Related Content

PPT
Ch10
PPT
Chapter 12 - Mass Storage Systems
PPTX
Mass storage structure
PPTX
Massstorage
PPTX
Mass storage systemsos
PPT
PPTX
Mass storage device
PDF
Mass Storage Devices
Ch10
Chapter 12 - Mass Storage Systems
Mass storage structure
Massstorage
Mass storage systemsos
Mass storage device
Mass Storage Devices

What's hot (18)

PPT
operating system
PPT
DB_ch11
PPT
PPT
PPTX
DOCX
Mass storagestructure pre-final-formatting
PPT
Disk structure
PPT
PPT
Pandi
PPT
Ch11 - Silberschatz
PPT
PPT
storage and file structure
PPTX
Mass Storage Structure
PPTX
04.01 file organization
PPT
db
PPTX
Sheik Mohamed Shadik - BSc - Project Details
operating system
DB_ch11
Mass storagestructure pre-final-formatting
Disk structure
Pandi
Ch11 - Silberschatz
storage and file structure
Mass Storage Structure
04.01 file organization
db
Sheik Mohamed Shadik - BSc - Project Details
Ad

Similar to Mass storage structurefinal (20)

PDF
CH10.pdf
PPT
PPT
Ch14 OS
 
DOCX
What is the average rotational latency of this disk drive What seek.docx
PDF
Cs8493 unit 4
PDF
unit-4.pdf
PPTX
Operation System
PPT
Secondary storage structure-Operating System Concepts
PPTX
OS Slide Ch12 13
PPTX
Operation System
PPT
12.mass stroage system
PPT
Operating system presentation part 2 2025
PPTX
SAN BASICS..Why we will go for SAN?
PPTX
PPT
Network and system administration Chapter 5.pptxChapter 6.ppt
PPTX
Introduction to Storage technologies
PPT
ISR UNIT2.ppt
PPTX
Viknesh
CH10.pdf
Ch14 OS
 
What is the average rotational latency of this disk drive What seek.docx
Cs8493 unit 4
unit-4.pdf
Operation System
Secondary storage structure-Operating System Concepts
OS Slide Ch12 13
Operation System
12.mass stroage system
Operating system presentation part 2 2025
SAN BASICS..Why we will go for SAN?
Network and system administration Chapter 5.pptxChapter 6.ppt
Introduction to Storage technologies
ISR UNIT2.ppt
Viknesh
Ad

More from marangburu42 (20)

DOCX
PDF
Write miss
DOCX
Hennchthree 161102111515
DOCX
Hennchthree
DOCX
Hennchthree
DOCX
Sequential circuits
DOCX
Combinational circuits
DOCX
Hennchthree 160912095304
DOCX
Sequential circuits
DOCX
Combinational circuits
DOCX
Karnaugh mapping allaboutcircuits
DOCX
Aac boolean formulae
DOCX
Virtualmemoryfinal 161019175858
DOCX
Io systems final
DOCX
File system interfacefinal
DOCX
File systemimplementationfinal
DOCX
All aboutcircuits karnaugh maps
DOCX
Virtual memoryfinal
DOCX
Mainmemoryfinal 161019122029
DOCX
Virtualmemorypre final-formatting-161019022904
Write miss
Hennchthree 161102111515
Hennchthree
Hennchthree
Sequential circuits
Combinational circuits
Hennchthree 160912095304
Sequential circuits
Combinational circuits
Karnaugh mapping allaboutcircuits
Aac boolean formulae
Virtualmemoryfinal 161019175858
Io systems final
File system interfacefinal
File systemimplementationfinal
All aboutcircuits karnaugh maps
Virtual memoryfinal
Mainmemoryfinal 161019122029
Virtualmemorypre final-formatting-161019022904

Recently uploaded (20)

PPTX
Art Appreciation-Lesson-1-1.pptx College
PPTX
Military history & Evolution of Armed Forces of the Philippines
PPTX
Certificados y Diplomas para Educación de Colores Candy by Slidesgo.pptx
PPTX
SAPOTA CULTIVATION.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
PPTX
Socio ch 1 characteristics characteristics
PPTX
E8 Q1 020ssssssssssssssssssssssssssssss2 PS.pptx
PDF
TUTI FRUTI RECETA RÁPIDA Y DIVERTIDA PARA TODOS
PPTX
22 Bindushree Sahu.pptxmadam curie life and achievements
PPTX
Lesson 1-Principles of Indigenous Creative Crafts.pptx
PPTX
CPAR7 ARTS GRADE 112 LITERARY ARTS OR LI
PPTX
Slide_Egg-81850-About Us PowerPoint Template Free.pptx
PPTX
Green and Orange Illustration Understanding Climate Change Presentation.pptx
PDF
DPSR MUN'25 (U).pdf hhhhhhhhhhhhhbbnhhhh
PPTX
Visual-Arts.pptx power point elements of art the line, shape, form
PDF
Love & Romance in Every Sparkle_ Discover the Magic of Diamond Painting.pdf
PPTX
400kV_Switchyard_Training_with_Diagrams.pptx
PPTX
CPAR_QR1_WEEK1_INTRODUCTION TO CPAR.pptx
PPT
Jaipur Sculpture Tradition: Crafting Marble Statues
PDF
Slide_BIS 2020 v2.pdf....................................
PPTX
Brown and Beige Vintage Scrapbook Idea Board Presentation.pptx.pptx
Art Appreciation-Lesson-1-1.pptx College
Military history & Evolution of Armed Forces of the Philippines
Certificados y Diplomas para Educación de Colores Candy by Slidesgo.pptx
SAPOTA CULTIVATION.pptxMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
Socio ch 1 characteristics characteristics
E8 Q1 020ssssssssssssssssssssssssssssss2 PS.pptx
TUTI FRUTI RECETA RÁPIDA Y DIVERTIDA PARA TODOS
22 Bindushree Sahu.pptxmadam curie life and achievements
Lesson 1-Principles of Indigenous Creative Crafts.pptx
CPAR7 ARTS GRADE 112 LITERARY ARTS OR LI
Slide_Egg-81850-About Us PowerPoint Template Free.pptx
Green and Orange Illustration Understanding Climate Change Presentation.pptx
DPSR MUN'25 (U).pdf hhhhhhhhhhhhhbbnhhhh
Visual-Arts.pptx power point elements of art the line, shape, form
Love & Romance in Every Sparkle_ Discover the Magic of Diamond Painting.pdf
400kV_Switchyard_Training_with_Diagrams.pptx
CPAR_QR1_WEEK1_INTRODUCTION TO CPAR.pptx
Jaipur Sculpture Tradition: Crafting Marble Statues
Slide_BIS 2020 v2.pdf....................................
Brown and Beige Vintage Scrapbook Idea Board Presentation.pptx.pptx

Mass storage structurefinal

  • 1. 1 Mass-StorageStructure (Galvin Notes, 9th Ed.) Chapter 10: Mass-Storage Structure  OVERVIEW OF MASS-STORAGE STRUCTURE  Magnetic Disks  Solid-State Disks  DISK STRUCTURE  DISK ATTACHMENT  Host-Attached Storage  Network-Attached Storage  Storage-Area Network  DISK SCHEDULING  FCFS Scheduling  SSTF Scheduling  Scan Scheduling  C-SCAN Scheduling  LOOK Scheduling  Selection of a Disk-Scheduling Algorithm  DISK MANAGEMENT  Disk Formatting  Boot Block  Bad Blocks  SWAP-SPACE MANAGEMENT  Swap-Space Use  Swap-Space Location  Swap-Space Management: An Example  RAID STRUCTURE  Improvement of Reliability via Redundancy  Improvement in Performance via Parallelism  RAID Levels  Selecting a RAID Level  Extensions  Problems with RAID SKIPPED CONTENT  STABLE-STORAGE IMPLEMENTATION ( OPTIONAL )  TERTIARY-STORAGE STRUCTURE - OPTIONAL, OMITTED FROM NINTH EDITION  Tertiary-Storage Devices (Removable Disks, Tapes, Future Technology)  Operating-System Support (Application Interface, File Naming, Hierarchical Storage Management)  Performance Issues (Speed, Reliability, Cost) Content OVERVIEW OF MASS-STORAGE STRUCTURE Magnetic Disks  Traditional magnetic disks have the following basic structure: One or more platters in the form of disks covered with magnetic media. Eachplatter has two working surfaces. Each workingsurface is divided into a number of concentric rings called tracks. The collection of all tracks that are the same distance from the edge of the platter, (i.e. all tracks immediatelyabove one another inthe following diagram) is called a cylinder. Each track is further divided into sectors, traditionallycontaining 512 bytes of data each, although some modern disks occasionallyuse larger sector sizes. (Sectors also include a header anda trailer, including checksuminformation among other things. 
Larger sector sizes reduce the fraction of the disk consumed by headers and trailers, but increase internal fragmentation and the amount of disk that must be marked bad in the case of errors.)
  • 2. 2 Mass-StorageStructure (Galvin Notes, 9th Ed.)  The data ona hard drive is readbyread-write heads. The standardconfiguration (shownbelow) uses one headper surface, each on a separate arm, andcontrolled bya common arm assemblywhich moves all heads simultaneouslyfrom one cylinder to another. (Other configurations, including independent read-write heads, may speed up disk access, but involve serious technical difficulties.)  The storage capacityof a traditional diskdrive is equalto the number of heads (i.e. the number of working surfaces), times the number of tracks per surface, times the number of sectors per track, times the number of bytes per sector. A particular physicalblock of data is specified by providing the head-sector-cylinder number at which it is located.  The rate at which data can be transferred from the disk to the computer is composed of several steps: o The positioning time, a.k.a. the seek time or random access time is the time requiredto move the heads fromone cylinder to another, andfor the heads to settle downafter the move. This is typically the slowest step in the process and the predominant bottleneck to overall transfer rates. o The rotational latency is the amount oftime requiredfor the desiredsector to rotate around and come under the read - write head.Thiscanrange anywhere fromzeroto one full revolution, andonthe average willequal one-half revolution. This is another physical step and is usuallythe secondslowest step behind seek time. (For a disk rotating at 7200 rpm, the average rotational latencywould be 1/2 revolution/ 120 revolutions per second, or just over 4 milliseconds, a long time by computer standards. o The transferrate, which is the time requiredto move the data electronicallyfrom the diskto the computer. (Some authors include the seek time and rotational latency as well as the electronic data transfer rate.) 
 The host controller/host adapter is at the computer endof the I/O bus, andthe disk controller is built into the disk itself. The CPU issues commands to the host controller via I/O ports. Data is transferredbetweenthe magnetic surface andonboard cache bythe disk controller, andthenthe data is transferredfrom that ca che to the host controller andthe motherboardmemoryat electronic speeds. Solid-State Disks: SSDs use memorytechnologyas a small fast hard disk. Specific implementations may use either flash memory or DRAM chips protected by a batteryto sustainthe information through power cycles. Because SSDs have no moving parts theyare muchfaster thantra ditional hard drives, and certainproblems suchas the scheduling of disk accessessimplydo not apply. However SSDs alsohave their weaknesses: They a re more expensive than hard drives, generallynot as large, andmayhave shorter life spans. SSDs are especiallyuseful as a high-speed cache of hard-disk information that must be accessed quickly. One example is to store filesystemmeta-data, e.g. directoryandinode information, that must be accessed quicklyand often. Another variationis a boot diskcontainingthe OS and some application executables, but novital user data. SSDs are also used in laptops to make themsmaller, faster, andlighter. Because SSDs are somuchfaster than traditional hard disks, the throughput ofthe bus canbecome a limiting factor, causing some SSDs to be connected directly to the system PCI bus for example. DISK STRUCTURE  The traditional head-sector-cylinder, HSCnumbers are mappedto linear blockaddresses bynumberingthe first sector on the first head on the outermost track as sector 0. Numberingproceeds withthe rest ofthe sectors on that same track, andthen the rest of the tracks onthe same cylinder before proceedingthroughthe rest ofthe cylinders to the center of the disk. 
In modernpractice these linear block addresses are usedin place of the HSCnumbers for a varietyof reasons: 1) The linear lengthof tracks near the outer edge ofthe diskis much longer than for those tracks locatednear the center, andtherefore it is possible to squeeze manymore sectors onto outer tracks thanonto inner ones. 2) All disks have some badsectors, and therefore disks maintain a few spare sectors that canbe usedin place of the bad ones. The mapping ofspare sectors to badsectors inmanagedinternally to the disk controller. 3) Modern hard drives can have thousands of cylinders, and hundreds ofsectors per trackon their outermost tracks. These numbers exceed the range of HSCnumbers for many (older) operatingsystems, andtherefore disks canbe configuredfor anyconvenient combination of HSCvaluesthat fallswithinthe total number of sectors physically on the drive.  Modern disks packmanymore sectors into outer cylinders than inner ones, usingone of two approaches: With Constant Linear Velocity, CLV, the densityof bits is uniform fromcylinder to cylinder. Because there are more sectors in outer cylinders, the diskspins slower when reading those cylinders, causingthe rate of bits passingunder the read-write head to remain constant. This is the approach used by modern CDs andDVDs. With Constant Angular Velocity, CAV, the diskrotates at a constant angular speed, withthe bit densitydecreasing on outer cylinders. (These disks would have a constant number of sectors per track on all cylinders.) DISK ATTACHMENT Diskdrives canbe attachedeither directlyto a particular host (a local disk) or to a network.  Host-Attached Storage: Local disks are accessed throughI/O Ports. The most commoninterfaces are IDE or ATA, eachof which allow up to two drives per host controller. SATA is similar with simpler cabling. 
Highendworkstations or other systems in need of large r number of disks typicallyuse SCSI disks:The SCSI standard supports up to 16 targets oneach SCSI bus, one of which is generallythe host adapter and the other 15 of whichcanbe disk or tape drives. A SCSI target is usuallya single drive, but the standardalsosupports up to 8 units within each target. These wouldgenerallybe usedfor accessing individual disks withina RAIDarray. The SCSI standardalso supports multiple host adapters in a single computer, i.e. multiple SCSI busses. SCSI cablesmaybe either 50 or 68 conductors. SCSI devices maybe external as well as internal. FC is a high-speedserialarchitecture that canoperate over optical fiber or four-conductor copper wires, andhas two variants: 1) A large switched fabric having a 24-bit address space. This variant allows for multiple devices andmultiple hosts to interconnect, forming the basis for the storage-area networks (SANs). 2) The arbitrated loop, FC-AL, that can address up to 126 devices (drives and controllers).  Network-Attached Storage: Network attachedstorage connects storage devices to computers using a remote procedure call, RPC, interface, typicallywithsomethinglike NFS filesystem mounts. This is convenient for allowing several computers ina group commonaccess and namingconventions for shared storage. NAS can be implementedusingSCSI cabling, or ISCSI using Internet protocols and standard
  • 3. 3 Mass-StorageStructure (Galvin Notes, 9th Ed.) network connections, allowinglong-distance remote accessto sharedfiles. NAS allows computers to easilyshare data storage, but tends to be less efficient than standard host-attached storage.  Storage-Area Network: A Storage-Area Network, SAN, connects computers andstorage devices in a network, using storage protocols insteadof network protocols. One advantage of this is that storage access does not tie up regular networking bandwidth. SAN is very flexible anddynamic, allowing hosts anddevices to attachand detach onthe fly. SAN is alsocontrollable, allowing restricted access to certain hosts and devices.  DISK SCHEDULING Disktransfer speeds are limitedprimarilybyseek timesandrotationallatency. Whenmultiple requests are to be processedthere is alsosome inherent delayin waiting for other requests to be processed. Bandwidthis measured bythe amount of data transferreddivided bythe total amount of time fromthe first request being made to the last transfer beingcompleted (for a series ofdiskrequests). Both bandwidthandaccess time canbe improved byprocessing requests ina goodorder. (Disk requests include the disk address, memoryaddress, number of sectors to transfer, andwhether the request is for readingor writing.)  FCFS Scheduling: It is simple but not veryefficient. Consider in the following sequence the wildswing fromcylinder 122 to 14 and then back to 124:  SSTF (Shortest Seek Time First) Scheduling: It is more efficient, but mayleadto starvationif a constant stream ofrequests arrives for the same generalarea ofthe disk. SSTFreduces the total head movement to 236 cylinders, down from 640 required f or the same set of requests under FCFS. Note, however that the distance couldbe reduced stillfurther to 208 by starting with 37 and then 14 fi rst before processing the rest of the requests.  SCAN Scheduling: The SCAN algorithm, a.k.a. 
the elevator algorithm moves backandforth fromone end ofthe diskto the other, similarly to an elevator processing requests in a tall building. Under the SCAN algorithm, If a request arrives just ahead ofthe movin g head then it will be processedright away, but ifit arrives just after the head has passed, thenit will have to wait for the head to pass going the other wayon the return trip. This leads to a fairlywide variationinaccesstimes which can be improvedupon. Consider, for examp le, when the headreaches the high endof the disk:Requests with highcylinder numbers just missedthe passing head, whichmeans they are all fairly recent requests, whereas requests withlownumbers mayhave beenwaiting for a muchlonger time. Making the return scan from high to low then ends up accessing recent requests first and making older requests wait that much longer.  C-SCAN Scheduling: The Circular-SCAN algorithm improves upon SCAN bytreating all requests ina circular queue fashion - Once the head reaches the endof the disk, it returns to the other end without processinganyrequests, and thenstarts againfromthe beginningof the disk:
  • 4. 4 Mass-StorageStructure (Galvin Notes, 9th Ed.)  LOOK Scheduling: LOOK scheduling improves upon SCAN bylooking ahead at the queue of pending requests, andnot moving the heads anyfarther towards the end of the disk thanis necessary. The following diagramillustrates the circular form of LOOK.  Selection of a Disk-Scheduling Algorithm: With verylowloads all algorithms are equal, since there will normally only be one request to process at a time. For slightlylarger loads, SSTFoffers better performance than FCFS, but may lead to starvationwhenloads become heavyenough. For busier systems, SCAN and LOOK algorithms eliminate starvation problems. Some improvement to overall filesystem access times canbe made by intelligent placement of directory and/or inode information. Ifthosestructures are placedinthe middle of the disk instead of at the beginning of the disk, thenthe maximum distance from those structures to data blocks is reducedto onlyone-halfof the disk size. Ifthose structures can be further distributedandfurthermore have their data blocks stored as close as possible to the corresponding directorystructures, then that reduces stillfurther the overall time to findthe diskblock n umbers and then access the corresponding data blocks. On modern disks the rotational latencycanbe almost as significant as the seek time, however it is not within the OSes control to account for that, because modern disks do not reveal their internal sector mapping schemes, (particularly when bad blocks have been remappedto spare sectors.)Some disk manufacturers provide for disk scheduling algorithms directlyon their disk controllers, (which do know the actual geometryof the disk as well as anyremapping), sothat if a seriesof requests are sent from the computer to the controller then those requests can be processed in an optimal order. 
Unfortunatelythere are some considerations that the OS must take intoaccount that are beyond the abilities of the on -board disk- scheduling algorithms, suchas priorities of some requests over others, or the needto process certainrequests in a particular order. For this reason OSes may elect to spoon-feed requests to the disk controller one at a time in certain situations. DISK MANAGEMENT Disk Formatting:  Before a diskcanbe used, it has to be low-level formatted, whichmeans laying down all ofthe headers andtrailers marking the beginning and ends of eachsector. Included inthe header and trailer are the linear sector numbers, and error-correcting codes (ECC) which allow damagedsectors to not onlybe detected, but in manycases for the damaged data to be recovered (depending on the extent of t he damage).  ECC calculationis performed witheverydisk read or write, andifdamage is detected but the data is recoverable, then a soft error has occurred. Soft errors are generally handled by the on-board disk controller, and never seen by the OS.  Once the diskis low-level formatted, the next stepis to partition the drive into one or more separate partitions. This step must be completedeven if the disk is to be usedas a single large partition, sothat the partitiontable can be writtento the begin ning of the disk.  After partitioning, thenthe filesystems must be logicallyformatted, which involves laying down the master directory information (FAT table or inode structure), initializing free lists, andcreatingat least the root directoryof the filesystem. (Disk partiti ons which are to be usedas raw devices are not logicallyformatted. Thissaves the overhead anddisk space of the filesystemstructure, but requires that the application program manage its own disk storage requirements.) 
Boot Block:  Computer ROMcontains a bootstrap program(OS independent) withjust enough code to find the first sector on the first harddrive on the first controller, loadthat sector intomemory, and transfer control over to it. (The ROMbootstrapprogram maylookinfloppyand/or CD drives before accessing the harddrive, andis smart enough to recognize whether it has found valid boot code or not.)
  • 5. 5 Mass-StorageStructure (Galvin Notes, 9th Ed.)  The first sector on the hard drive is knownas the Master Boot Record, MBR, andcontains a verysmallamount of code in addition to the partitiontable. The partitiontable documents how the disk is partitionedintological disks, andindicatesspecificallywhich partition is the active or boot partition. (MBR has the boot program eh? Yes and a lot more, says Wiki)  The boot program thenlooks to the active partitionto findan operating system, possiblyloadingup a slightlylarger / more advanced boot program along the way. Ina dual-boot (or larger multi-boot) system, the user maybe givena choice of which operating system to boot, with a default action to be taken in the event of no response within some time frame.  Once the kernel is foundbythe boot program, it is loadedintomemoryandthen control is transferred over to the OS. The kernel will normallycontinue the boot process byinitializing all important kerneldata structures, launching important systemservices (e.g. network daemons, sched, init, etc.), andfinallyproviding one or more loginprompts. Boot options at this stage may include single -user a.k.a. maintenance or safe modes, inwhichveryfew system servicesare started - These modes are designedfor systemadministrators to repair problems or otherwise maintain the system. Bad Blocks:  In the olddays, formattingof the diskor running certain disk-analysistools wouldidentifybadblocks, and attempt to read the data off of them one last time throughrepeatedtries. Then the badblocks would be mapped out and taken out of future service. Modern disk controllers make muchbetter use of the error-correcting codes, sothat badblocks canbe detectedearlier andthe data usuallyrecovered. (Recallthat blocks are testedwitheverywrite as well as with everyread, sooftenerrors canbe detected before the write operation is complete, and the data simply written to a different sector instead.) 
 Note that re-mapping of sectors from their normal linear progressioncanthrowoff the diskschedulingoptimization ofthe OS, especially if the replacement sector is physicallyfar awayfrom the sector it is replacing. For this reasonmost disks normallykeep a few spare sectors on eachcylinder, as well as at least one spare cylinder. Whenever possible a bad sector will be mapped to another sector on the same cylinder, or at least a cylinder as close as possible. Sector slipping mayalso be performed, in whichallsectors betweenth e bad sector and the replacement sector are moveddownbyone, so that the linear progressionof sector numbers can be maintained. Ifthe data on a bad block cannot be recovered, thena hard error has occurred., which requires replacingthe file(s) from backups, or rebuilding them from scratch. SWAP-SPACEMANAGEMENT  Swap-Space Use: The amount of swapspace neededbyanOS varies greatlyaccording to how it is used. Some systems require an amount equal to physical RAM;some want a multiple of that;some want an amount equal to the amount by which virtual memory exceeds physical RAM, and some systems use little or none at all! Some systems support multiple swap spaces onseparate disks inorder to spee d up the virtual memory system.  Swap-Space Location: Swap space can be physicallylocatedinone oftwo locations – (A) As a large file which is part of the regular file- system. This is easyto implement, but inefficient. Not onlymust the swap space be accessedthroughthe directorysystem, th e file is also subject to fragmentationissues. Cachingthe block locationhelps infinding the physical blocks, but that is not a complete fix. (B) As a raw partition, possiblyon a separate or little-useddisk. This allows the OS more control over swapspace management, whichis usually faster and more efficient. Fragmentationof swap space is generally not a big issue, as the space is re-initialized every time the system is rebooted. 
The downside of keeping swap space on a raw partition is that it can only be grown by repartitioning the hard drive .  Swap-Space Management: An Example — HistoricallyOSes swappedout entire processes as needed. Modern systems swap out only individualpages, and onlyas needed. (For example process code blocks andother blocks that have not been changed since theywere originally loaded are normally just freed from the virtual memorysystem rather than copying them to swapspace, because it is faster to go find them again in the filesystem and read them back in from there than to write themout to swap space andthen readthem back.) In the mapping system shown belowfor Linux systems, a map of swap space is kept in memory, where each entry corresponds to a 4Kblock inthe swapspace. Zeros indicate free slots andnon-zeros refer to how manyprocesses have a mapping to that particular block (>1 for shared pages only.) RAID STRUCTURE The generalidea behind RAIDis to employa group ofharddrives together withsome form of duplication, either to increase reliabilityor to speed up operations, (or sometimes both.)RAID originallystood for Redundant Arrayof Inexpensive Disks, andwas designedto use a bunch of cheap small disks in place of one or two larger more expensive ones. TodayRAID systems employlarge possiblyexpensive disks as their co mponents, switching the definition to Independent disks. Improvement of Reliability via Redundancy  The more disks a systemhas, the greater the likelihood that one ofthem will gobadat anygiventime. Hence increasing disks on a system actuallydecreasesthe MeanTime To Failure, MTTF of the system. If, however, the same data was copied onto multiple disks, then the data wouldnot be lost unless both(or all)copies of the data were damagedsimultaneously, whichis a MUCH lower probability than for a single diskgoingbad. More specifically, the seconddiskwould have to go badbefore the first diskwas repaired, which brings the Mean
Time To Repair (MTTR) into play. This is the basic idea behind disk mirroring, in which a system contains identical data on two or more disks.

Improvement in Performance via Parallelism

 There is also a performance benefit to mirroring, particularly with respect to reads. Since every block of data is duplicated on multiple disks, read operations can be satisfied from any available copy, and multiple disks can be reading different data blocks simultaneously in parallel. (Writes could possibly be sped up as well through careful scheduling algorithms, but it would be complicated in practice.)

 Another way of improving disk access time is striping, which basically means spreading data out across multiple disks that can be accessed simultaneously. With bit-level striping the bits of each byte are striped across multiple disks. For example, if 8 disks were involved, then each 8-bit byte would be read in parallel by 8 heads on separate disks: a single disk read would access 8 × 512 bytes = 4K worth of data in the time normally required to read 512 bytes. Similarly, if 4 disks were involved, then two bits of each byte could be stored on each disk. Block-level striping spreads a file system across multiple disks on a block-by-block basis, so if block N were located on disk 0, then block N + 1 would be on disk 1, and so on. This is particularly useful when file systems are accessed in clusters of physical blocks. Other striping possibilities exist, with block-level striping being the most common.

RAID Levels

Mirroring provides reliability but is expensive; striping improves performance, but does not improve reliability. Accordingly there are a number of different schemes that combine the principles of mirroring and striping in different ways, in order to balance reliability versus performance versus cost. These are described by different RAID levels, as follows. (In the diagrams that follow, "C" indicates a copy, and "P" indicates parity, i.e. checksum bits.)
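The block-level striping rule described above (block N on disk N mod k, then block N + 1 on the next disk) is just a small address translation, and the same mapping underlies the striped RAID levels that follow. A minimal sketch, assuming k identical disks:

```python
def locate(logical_block, num_disks):
    # Block-level striping: logical block N lives on disk N mod k,
    # at stripe row N div k within that disk.
    return logical_block % num_disks, logical_block // num_disks

# With 4 disks, consecutive logical blocks fall on consecutive disks,
# so a 4-block cluster can be read from all four disks in parallel:
for n in range(8):
    disk, row = locate(n, 4)
    print(f"logical block {n} -> disk {disk}, row {row}")
```

Note how blocks 0-3 occupy row 0 across all four disks; a cluster-sized read touches every spindle at once, which is exactly the parallelism striping is after.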
(GeekStuff says: In most situations, you would be using levels 0, 1, 5 and 10 (1+0) in critical servers. There are several non-standard RAIDs which are not used except in some rare situations – RAID 2, RAID 3, RAID 4 and RAID 6.)

 Raid Level 0 - This level includes striping only, with no mirroring. (Blocks striped, no mirrors, no parity. Minimum 2 disks.)

 Raid Level 1 - This level includes mirroring only, no striping. (No stripe, no parity. Minimum 2 disks.)

 Raid Level 2 - This level stores error-correcting codes on additional disks, allowing for any damaged data to be reconstructed by subtraction from the remaining undamaged data. Note that this scheme requires only three extra disks to protect 4 disks' worth of data, as opposed to full mirroring. (The number of disks required is a function of the error-correcting algorithms, and the means by which the particular bad bit(s) is (are) identified.) - (Book) (http://www.thegeekstuff.com/2011/11/raid2-raid3-raid4-raid6) - This uses bit-level striping, i.e. instead of striping the blocks across the disks, it stripes the bits across the disks. In the diagram b1, b2, b3 are bits; E1, E2, E3 are error-correction codes. You need two groups of disks: one group is used to write the data, the other group is used to write the error-correction codes. This uses a Hamming error-correction code (ECC), and stores this information on the redundancy disks. When data is written to the disks, the controller calculates the ECC code for the data on the fly, stripes the data bits to the data disks, and writes the ECC code to the redundancy disks. When data is read from the disks, it also reads the corresponding ECC code from the redundancy disks, and checks whether the data is consistent. If required, it makes appropriate corrections on the fly. This uses a lot of disks and can be configured in different disk configurations. Some valid configurations are (1) 10 disks for data and 4 disks for ECC, or (2) 4 disks for data and 3 disks for ECC. This is not used anymore.
This is expensive, and implementing it in a RAID controller is complex; ECC is redundant nowadays, as the hard disks themselves can do this.

 Raid Level 3 - This level is similar to level 2, except that it takes advantage of the fact that each disk is still doing its own error detection, so that when an error occurs there is no question about which disk in the array has the bad data. As a result, a single parity bit is all that is needed to recover the lost data from an array of disks. Level 3 also includes striping, which improves performance. The downside of the parity approach is that every disk must take part in every disk access, and the parity bits must be constantly calculated and checked, reducing performance. Hardware-level parity calculations and NVRAM cache can help with both of those issues. In practice level 3 is greatly preferred over level 2. (Book) (GeekStuff) This uses byte-level striping, i.e. instead of striping the blocks across the disks, it stripes the bytes across the disks. In the diagram B1, B2, B3 are bytes; p1, p2, p3 are parities. It uses multiple data disks, and a dedicated disk to store parity. The disks have to spin in sync to get to the data. Sequential reads and writes will have good performance; random reads and writes will have the worst performance. This
is not commonly used.

 Raid Level 4 - This level is similar to level 3, employing block-level striping instead of bit-level striping. The benefits are that multiple blocks can be read independently, and changes to a block only require writing two blocks (data and parity) rather than involving all disks. Note that new disks can be added seamlessly to the system provided they are initialized to all zeros, as this does not affect the parity results. (Book) (GeekStuff) This uses block-level striping. In the diagram B1, B2, B3 are blocks; p1, p2, p3 are parities. It uses multiple data disks, and a dedicated disk to store parity. Minimum of 3 disks (2 disks for data and 1 for parity). Good random reads, as the data blocks are striped. Bad random writes, as every write must go through the single parity disk. It is somewhat similar to RAID 3 and 5, but a little different: it is just like RAID 3 in having the dedicated parity disk, but it stripes blocks; it is just like RAID 5 in striping the blocks across the data disks, but it has only one parity disk. This is not commonly used.

 Raid Level 5 - This level is similar to level 4, except the parity blocks are distributed over all disks, thereby more evenly balancing the load on the system. For any given block on the disk(s), one of the disks will hold the parity information for that block and the other N-1 disks will hold the data. Note that the same disk cannot hold both data and parity for the same block, as both would be lost in the event of a disk crash. (GeekStuff) Minimum 3 disks required. Good performance (as blocks are striped) and good redundancy (distributed parity); the best cost-effective option providing both performance and redundancy. Use this for a DB that is heavily read-oriented; write operations will be slow.
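The parity used by levels 3, 4 and 5 is byte-wise XOR: the parity block is the XOR of all the data blocks, and XOR-ing the parity with the surviving blocks regenerates any single lost block. A minimal sketch (the block contents are made-up):

```python
def parity(blocks):
    # RAID 3/4/5-style parity: byte-wise XOR across all blocks.
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data disks, one block each
p = parity(data)                     # stored on the parity disk

# Disk 1 fails; XOR of the parity with the survivors recovers its block:
recovered = parity([data[0], data[2], p])
print(recovered == data[1])          # True
```

This also shows why every small write is expensive on RAID 4/5: changing one data block forces the parity block to be recomputed and rewritten as well.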
 Raid Level 6 - This level extends RAID level 5 by storing multiple bits of error-recovery codes (such as Reed-Solomon codes) for each bit position of data, rather than a single parity bit. In the book's example, 2 bits of ECC are stored for every 4 bits of data, allowing data recovery in the face of up to two simultaneous disk failures. Note that this still involves only a 50% increase in storage needs, as opposed to 100% for simple mirroring, which could only tolerate a single disk failure. (Book) (GeekStuff) Just like RAID 5, this does block-level striping; however, it uses dual parity. In the diagram A, B, C are blocks; p1, p2, p3 are parities. This creates two parity blocks for each data block, and can handle two disk failures. This RAID configuration is complex to implement in a RAID controller, as it has to calculate two parity values for each data block.

 There are also two RAID levels which combine RAID levels 0 and 1 (striping and mirroring) in different combinations, designed to provide both performance and reliability at the expense of increased cost. (The book's content is understandable for these two, especially the latter.)

 RAID level 0 + 1: disks are first striped, and then the striped sets are mirrored onto another set. This level generally provides better performance than RAID level 5.

 RAID level 1 + 0: disks are mirrored in pairs, and then the mirrored pairs are striped. The storage capacity, performance, etc. are all the same, but there is an advantage to this approach in the event of multiple disk failures. In diagram (a) of the book, the 8 disks have been divided into two sets of four, each of which is striped, and then one stripe set is used to mirror the other set.
o If a single disk fails, it wipes out the entire stripe set, but the system can keep on functioning using the remaining set.
o However, if a second disk from the other stripe set now fails, then the entire system is lost, as a result of two disk failures.
In diagram (b), the same 8 disks are divided into four sets of two, each of which is mirrored, and then the file system is striped across the four sets of mirrored disks.
o If a single disk fails, then that mirror set is reduced to a single disk, but the system rolls on, and the other three mirror sets continue mirroring.
o Now if a second disk fails (one that is not the mirror of the already-failed disk), then another one of the mirror sets is reduced to a single disk, but the system can continue without data loss.
o In fact, the second arrangement could handle as many as four simultaneously failed disks, as long as no two of them were from the same mirror pair.

(GeekStuff for RAID 10) — Minimum 4 disks. This is also called a "stripe of mirrors". Excellent redundancy (as blocks are mirrored) and excellent performance (as blocks are striped). If you can afford the dollars, this is the BEST option for any mission-critical applications (especially databases).
(GeekStuff for RAID 01) — RAID 01 is also called RAID 0+1, or a "mirror of stripes". It requires a minimum of 3 disks, but in most cases it will be implemented with a minimum of 4 disks. To understand this better, create two groups: for example, with a total of 6 disks, create two groups of 3 disks each. Group 1 has 3 disks and Group 2 has 3 disks. Within a group, the data is striped: in Group 1, which contains three disks, the 1st block is written to the 1st disk, the 2nd block to the 2nd disk, and the 3rd block to the 3rd disk. So block A is written to Disk 1, block B to Disk 2, block C to Disk 3. Across the groups, the data is mirrored: Group 1 and Group 2 look exactly the same, i.e. Disk 1 is mirrored to Disk 4, Disk 2 to Disk 5, and Disk 3 to Disk 6. This is why it is called a "mirror of stripes": the disks within the groups are striped, but the groups are mirrored.

(GeekStuff RAID 01 vs RAID 10 differences) — Performance and storage capacity on RAID 10 and RAID 01 are the same. The main difference is the fault-tolerance level. On most implementations of RAID controllers, RAID 01 fault tolerance is lower: since there are only two groups of RAID 0, if two drives (one in each group) fail, the entire RAID 01 fails. In the RAID 01 example above, if Disk 1 and Disk 4 fail, both groups are down, so the whole RAID 01 fails. RAID 10 fault tolerance is higher: since there are many groups (each group is only two disks), even if three disks fail (one in each group), the RAID 10 is still functional. In the RAID 10 example, even if Disk 1, Disk 3 and Disk 5 fail, the RAID 10 is still functional. So, given a choice between RAID 10 and RAID 01, always choose RAID 10.
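The fault-tolerance difference can be checked mechanically: RAID 1+0 loses data only when both halves of some mirror pair fail, while RAID 0+1 fails once every stripe group has lost at least one disk. A small sketch using the 6-disk layouts from the text:

```python
def raid10_survives(failed, mirror_pairs):
    # RAID 1+0: data survives unless BOTH disks of some mirror pair fail.
    return all(not set(pair) <= failed for pair in mirror_pairs)

def raid01_survives(failed, stripe_groups):
    # RAID 0+1: any failed disk kills its whole stripe group; the array
    # survives only while at least one group is fully intact.
    return any(not (set(group) & failed) for group in stripe_groups)

pairs  = [(1, 2), (3, 4), (5, 6)]   # RAID 10: three mirrored pairs
groups = [(1, 2, 3), (4, 5, 6)]     # RAID 01: two striped groups

print(raid10_survives({1, 3, 5}, pairs))  # True:  one disk lost per pair
print(raid01_survives({1, 4}, groups))    # False: one disk lost per group
```

With the same six disks and the same failures spread one per group, RAID 10 keeps running where RAID 01 is already dead, which is exactly the argument for preferring RAID 10.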
 Selecting a RAID Level: Trade-offs in selecting the optimal RAID level for a particular application include cost, volume of data, need for reliability, need for performance, and rebuild time, the latter of which can affect the likelihood that a second disk will fail while the first failed disk is being rebuilt. Other decisions include how many disks are involved in a RAID set and how many disks to protect with a single parity bit. More disks in the set increases performance but also cost. Protecting more disks per parity bit saves cost, but increases the likelihood that a second disk will fail before the first bad disk is repaired.

SUMMARY

 Disk drives are the major secondary-storage I/O devices on most computers. Most secondary storage devices are either magnetic disks or magnetic tapes, although solid-state disks are growing in importance. Modern disk drives are structured as large one-dimensional arrays of logical disk blocks. Generally, these logical blocks are 512 bytes in size. Disks may be attached to a computer system in one of two ways: (1) through the local I/O ports on the host computer or (2) through a network connection.

 Requests for disk I/O are generated by the file system and by the virtual memory system. Each request specifies the address on the disk to be referenced, in the form of a logical block number. Disk-scheduling algorithms can improve the effective bandwidth, the average response time, and the variance in response time. Algorithms such as SSTF, SCAN, C-SCAN, LOOK, and C-LOOK are designed to make such improvements through strategies for disk-queue ordering. Performance of disk-scheduling algorithms can vary greatly on magnetic disks. In contrast, because solid-state disks have no moving parts, performance varies little among algorithms, and quite often a simple FCFS strategy is used.

 Performance can be harmed by external fragmentation. Some systems have utilities that scan the file system to identify fragmented files; they then move blocks around to decrease the fragmentation.
Defragmenting a badly fragmented file system can significantly improve performance, but the system may have reduced performance while the defragmentation is in progress. Sophisticated file systems, such as the UNIX Fast File System, incorporate many strategies to control fragmentation during space allocation so that disk reorganization is not needed.

 The operating system manages the disk blocks. First, a disk must be low-level formatted to create the sectors on the raw hardware; new disks usually come preformatted. Then the disk is partitioned, file systems are created, and boot blocks are allocated to store the system's bootstrap program. Finally, when a block is corrupted, the system must have a way to lock out that block or to replace it logically with a spare.

 Because an efficient swap space is a key to good performance, systems usually bypass the file system and use raw-disk access for paging I/O. Some systems dedicate a raw-disk partition to swap space, and others use a file within the file system instead. Still other systems allow the user or system administrator to make the decision by providing both options.

 Because of the amount of storage required on large systems, disks are frequently made redundant via RAID algorithms. These algorithms allow more than one disk to be used for a given operation and allow continued operation and even automatic recovery in the face of a disk failure. RAID algorithms are organized into different levels; each level provides some combination of reliability and high transfer rates.

Read Later
 The Stable-Storage Implementation part has been left out. It has been marked optional by the professor and won't be needed for a first run.

Further Reading
 Tertiary Storage Structure (just for culture). It is marked optional by the professor and has been deleted from the 9th Edition.