Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Chapter 10: Mass-Storage Systems
Chapter 10: Mass-Storage Systems
 Overview of Mass Storage Structure
 Disk Structure
 Disk Attachment
 Disk Scheduling
 Disk Management
 Swap-Space Management
 RAID Structure
 Stable-Storage Implementation
Objectives
 To describe the physical structure of secondary storage devices
and its effects on the uses of the devices
 To explain the performance characteristics of mass-storage
devices
 To evaluate disk scheduling algorithms
 To discuss operating-system services provided for mass storage,
including RAID
Overview of Mass Storage Structure
 Magnetic disks provide bulk of secondary storage of modern computers
 Drives rotate at 60 to 250 times per second
 Transfer rate is rate at which data flow between drive and computer
 Positioning time (random-access time) is time to move disk arm to
desired cylinder (seek time) and time for desired sector to rotate
under the disk head (rotational latency)
 Head crash results from disk head making contact with the disk
surface -- That’s bad
 Disks can be removable
 Drive attached to computer via I/O bus
 Busses vary, including EIDE, ATA, SATA, USB, Fibre Channel,
SCSI, SAS, Firewire
 Host controller in computer uses bus to talk to disk controller built
into drive or storage array
Moving-head Disk Mechanism
Hard Disks
 Platters range from .85” to 14” (historically)
 Commonly 3.5”, 2.5”, and 1.8”
 Range from 30GB to 3TB per drive
 Performance
 Transfer Rate – theoretical – 6 Gb/sec
 Effective Transfer Rate – real –
1Gb/sec
 Seek time from 3ms to 12ms – 9ms
common for desktop drives
 Average seek time measured or
calculated based on 1/3 of tracks
 Latency based on spindle speed
 1 / (RPM / 60) = 60 / RPM
 Average latency = ½ latency
(From Wikipedia)
Hard Disk Performance
 Access Latency = Average access time = average seek time +
average latency
 For fastest disk 3ms + 2ms = 5ms
 For slow disk 9ms + 5.56ms = 14.56ms
 Average I/O time = average access time + (amount to transfer /
transfer rate) + controller overhead
 For example, to transfer a 4KB block on a 7200 RPM disk with a
5ms average seek time, a 1Gb/sec transfer rate, and 0.1ms of
controller overhead:
 5ms + 4.17ms + 0.1ms + transfer time
 Transfer time = 4KB / (1Gb/s) × 8Gb/GB × 1GB/(1024² KB)
= 32 / 1024² s ≈ 0.031 ms
 Average I/O time for a 4KB block = 9.27ms + 0.031ms = 9.301ms
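The arithmetic above as a small Java sketch (the class name is illustrative, and the 1GB = 8Gb = 1024² KB unit convention follows the slide):

public class DiskIoTime {
    public static void main(String[] args) {
        double avgSeekMs = 5.0;          // average seek time from the example
        double rpm = 7200;               // spindle speed
        double controllerMs = 0.1;       // controller overhead
        // Average rotational latency = half of one rotation (60/RPM seconds)
        double avgLatencyMs = 0.5 * (60.0 / rpm) * 1000.0;               // ~4.17 ms
        // Transfer time for a 4KB block at 1 Gb/s, with 1 GB = 8 Gb = 1024^2 KB
        double transferMs = (4.0 * 8.0 / (1024.0 * 1024.0)) * 1000.0;    // ~0.031 ms
        double totalMs = avgSeekMs + avgLatencyMs + controllerMs + transferMs;
        System.out.printf("Average I/O time for a 4KB block: %.3f ms%n", totalMs);  // ~9.30 ms
    }
}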
The First Commercial Disk Drive
1956
IBM RAMAC computer
included the IBM Model
350 disk storage system
5M (7 bit) characters
50 x 24” platters
Access time < 1 second
Solid-State Disks
 Nonvolatile memory used like a hard drive
 Many technology variations
 Can be more reliable than HDDs
 More expensive per MB
 May have a shorter life span
 Less capacity
 But much faster
 Busses can be too slow -> connect directly to PCI for example
 No moving parts, so no seek time or rotational latency
Magnetic Tape
 Was early secondary-storage medium
 Evolved from open spools to cartridges
 Relatively permanent and holds large quantities of data
 Access time slow
 Random access ~1000 times slower than disk
 Mainly used for backup, storage of infrequently-used data,
transfer medium between systems
 Kept in spool and wound or rewound past read-write head
 Once data under head, transfer rates comparable to disk
 140MB/sec and greater
 200GB to 1.5TB typical storage
 Common technologies are LTO-{3,4,5} and T10000
Disk Structure
 Disk drives are addressed as large 1-dimensional arrays of logical
blocks, where the logical block is the smallest unit of transfer
 Low-level formatting creates logical blocks on physical media
 The 1-dimensional array of logical blocks is mapped into the
sectors of the disk sequentially
 Sector 0 is the first sector of the first track on the outermost
cylinder
 Mapping proceeds in order through that track, then the rest of
the tracks in that cylinder, and then through the rest of the
cylinders from outermost to innermost
 Logical-to-physical address mapping should be easy
 Except for bad sectors
 Non-constant # of sectors per track via constant angular
velocity
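A sketch of this sequential mapping, assuming (hypothetically) the same number of sectors on every track; with zoned recording, real drives do this translation in the disk controller:

public class ChsMapping {
    // Translate a logical block number into (cylinder, surface, sector).
    static int[] logicalToPhysical(int logicalBlock, int sectorsPerTrack,
                                   int tracksPerCylinder) {
        int blocksPerCylinder = sectorsPerTrack * tracksPerCylinder;
        int cylinder = logicalBlock / blocksPerCylinder;  // cylinder 0 is outermost
        int rem = logicalBlock % blocksPerCylinder;
        int track = rem / sectorsPerTrack;                // surface within the cylinder
        int sector = rem % sectorsPerTrack;
        return new int[] { cylinder, track, sector };
    }
}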
Disk Attachment
 Host-attached storage accessed through I/O ports talking to I/O
busses
 SCSI itself is a bus, up to 16 devices on one cable, SCSI initiator
requests operation and SCSI targets perform tasks
 Each target can have up to 8 logical units (disks attached to
device controller)
 FC is high-speed serial architecture
 Can be switched fabric with 24-bit address space – the basis of
storage area networks (SANs) in which many hosts attach to
many storage units
 I/O directed to bus ID, device ID, logical unit (LUN)
Storage Array
 Can just attach disks, or arrays of disks
 Storage Array has controller(s), provides features to attached
host(s)
 Ports to connect hosts to array
 Memory, controlling software (sometimes NVRAM, etc)
 A few to thousands of disks
 RAID, hot spares, hot swap (discussed later)
 Shared storage -> more efficiency
 Features found in some file systems
 Snapshots, clones, thin provisioning, replication,
deduplication, etc
Storage Area Network
 Common in large storage environments
 Multiple hosts attached to multiple storage arrays - flexible
Storage Area Network (Cont.)
 SAN is one or more storage arrays
 Connected to one or more Fibre Channel switches
 Hosts also attach to the switches
 Storage made available via LUN Masking from specific arrays
to specific servers
 Easy to add or remove storage, add new host and allocate it
storage
 Over low-latency Fibre Channel fabric
 Why have separate storage networks and communications
networks?
 Consider iSCSI, FCoE
Network-Attached Storage
 Network-attached storage (NAS) is storage made available over
a network rather than over a local connection (such as a bus)
 Remotely attaching to file systems
 NFS and CIFS are common protocols
 Implemented via remote procedure calls (RPCs) between host
and storage over typically TCP or UDP on IP network
 iSCSI protocol uses IP network to carry the SCSI protocol
 Remotely attaching to devices (blocks)
Disk Scheduling
 The operating system is responsible for using hardware
efficiently — for the disk drives, this means having a fast
access time and disk bandwidth
 Minimize seek time
 Seek time ≈ seek distance
 Disk bandwidth is the total number of bytes transferred,
divided by the total time between the first request for service
and the completion of the last transfer
Disk Scheduling (Cont.)
 There are many sources of disk I/O request
 OS
 System processes
 User processes
 I/O request includes input or output mode, disk address, memory
address, number of sectors to transfer
 OS maintains queue of requests, per disk or device
 Idle disk can immediately work on I/O request, busy disk means
work must queue
 Optimization algorithms only make sense when a queue exists
Disk Scheduling (Cont.)
 Note that drive controllers have small buffers and can manage a
queue of I/O requests (of varying “depth”)
 Several algorithms exist to schedule the servicing of disk I/O
requests
 The analysis is true for one or many platters
 We illustrate scheduling algorithms with a request queue (0-199)
98, 183, 37, 122, 14, 124, 65, 67
Head pointer 53
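A small Java sketch that replays this queue and reports total head movement under FCFS and SSTF (names are illustrative):

import java.util.*;

public class HeadMovement {
    static int fcfs(int head, int[] queue) {
        int total = 0;
        for (int req : queue) {           // service strictly in arrival order
            total += Math.abs(req - head);
            head = req;
        }
        return total;
    }

    static int sstf(int head, int[] queue) {
        List<Integer> pending = new ArrayList<>();
        for (int r : queue) pending.add(r);
        int total = 0;
        while (!pending.isEmpty()) {
            final int h = head;           // always pick the closest pending request
            int next = Collections.min(pending,
                    Comparator.comparingInt(r -> Math.abs(r - h)));
            total += Math.abs(next - head);
            head = next;
            pending.remove(Integer.valueOf(next));
        }
        return total;
    }

    public static void main(String[] args) {
        int[] queue = {98, 183, 37, 122, 14, 124, 65, 67};
        System.out.println("FCFS: " + fcfs(53, queue));   // 640 cylinders
        System.out.println("SSTF: " + sstf(53, queue));   // 236 cylinders
    }
}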
FCFS
Illustration shows total head movement of 640 cylinders
SSTF
 Shortest Seek Time First selects the request with the minimum
seek time from the current head position
 SSTF scheduling is a form of SJF scheduling; may cause
starvation of some requests
 Illustration shows total head movement of 236 cylinders
SCAN
 The disk arm starts at one end of the disk, and moves toward the
other end, servicing requests until it gets to the other end of the
disk, where the head movement is reversed and servicing
continues.
 The SCAN algorithm is sometimes called the elevator algorithm
 Illustration shows total head movement of 236 cylinders
 But note that if requests are uniformly dense, the largest density of
pending requests builds up at the other end of the disk, and those
requests wait the longest
SCAN (Cont.)
C-SCAN
 Provides a more uniform wait time than SCAN
 The head moves from one end of the disk to the other, servicing
requests as it goes
 When it reaches the other end, however, it immediately
returns to the beginning of the disk, without servicing any
requests on the return trip
 Treats the cylinders as a circular list that wraps around from the
last cylinder to the first one
 Total number of cylinders?
C-SCAN (Cont.)
C-LOOK
 LOOK a version of SCAN, C-LOOK a version of C-SCAN
 Arm only goes as far as the last request in each direction,
then reverses direction immediately, without first going all
the way to the end of the disk
 Total number of cylinders?
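One way to answer that question for the example queue is to simulate the sweep. The sketch below computes LOOK and C-LOOK totals with the head at 53 moving toward higher cylinders first (as in the figures), counting the C-LOOK return jump as head movement; treatments that exclude the jump would report 130 + 23 = 153 for C-LOOK:

import java.util.*;

public class LookFamily {
    public static void main(String[] args) {
        int head = 53;
        int[] queue = {98, 183, 37, 122, 14, 124, 65, 67};
        // Split pending requests by which side of the head they fall on
        // (this example has requests on both sides).
        TreeSet<Integer> up = new TreeSet<>(), down = new TreeSet<>();
        for (int r : queue) (r >= head ? up : down).add(r);
        // LOOK: sweep up to the highest request, then reverse down to the lowest.
        int look = (up.last() - head) + (up.last() - down.first());
        // C-LOOK: sweep up, jump back to the lowest request, continue upward.
        int clook = (up.last() - head) + (up.last() - down.first())
                  + (down.last() - down.first());
        System.out.println("LOOK:   " + look);    // 130 + 169 = 299
        System.out.println("C-LOOK: " + clook);   // 130 + 169 + 23 = 322
    }
}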
C-LOOK (Cont.)
Selecting a Disk-Scheduling Algorithm
 SSTF is common and has a natural appeal
 SCAN and C-SCAN perform better for systems that place a heavy load
on the disk
 Less starvation
 Performance depends on the number and types of requests
 Requests for disk service can be influenced by the file-allocation method
 And metadata layout
 The disk-scheduling algorithm should be written as a separate module of
the operating system, allowing it to be replaced with a different algorithm
if necessary
 Either SSTF or LOOK is a reasonable choice for the default algorithm
 What about rotational latency?
 Difficult for OS to calculate
 How does disk-based queueing affect OS queue ordering efforts?
Disk Management
 Low-level formatting, or physical formatting — Dividing a disk into
sectors that the disk controller can read and write
 Each sector can hold header information, plus data, plus error
correction code (ECC)
 Usually 512 bytes of data but can be selectable
 To use a disk to hold files, the operating system still needs to record its
own data structures on the disk
 Partition the disk into one or more groups of cylinders, each treated
as a logical disk
 Logical formatting or “making a file system”
 To increase efficiency most file systems group blocks into clusters
 Disk I/O done in blocks
 File I/O done in clusters
Disk Management (Cont.)
 Raw disk access for apps that want to do their own block
management, keep OS out of the way (databases for example)
 Boot block initializes system
 The bootstrap is stored in ROM
 Bootstrap loader program stored in boot blocks of boot
partition
 Methods such as sector sparing used to handle bad blocks
Booting from a Disk in Windows
Swap-Space Management
 Swap-space — Virtual memory uses disk space as an extension of main memory
 Less common now due to memory capacity increases
 Swap-space can be carved out of the normal file system, or, more commonly, it
can be in a separate disk partition (raw)
 Swap-space management
 4.3BSD allocates swap space when process starts; holds text segment (the
program) and data segment
 Kernel uses swap maps to track swap-space use
 Solaris 2 allocates swap space only when a dirty page is forced out of
physical memory, not when the virtual memory page is first created
 File data written to swap space until write to file system requested
 Other dirty pages go to swap space due to no other home
 Text segment pages thrown out and reread from the file system as
needed
 What if a system runs out of swap space?
 Some systems allow multiple swap spaces
Data Structures for Swapping on Linux Systems
RAID Structure
 RAID – redundant array of inexpensive disks
 multiple disk drives provides reliability via redundancy
 Increases the mean time to failure
 Mean time to repair – exposure time when another failure could
cause data loss
 Mean time to data loss based on above factors
 If mirrored disks fail independently, consider a disk with a 100,000-hour
mean time to failure and a 10-hour mean time to repair
 Mean time to data loss is 100,000² / (2 ∗ 10) = 500 ∗ 10⁶ hours,
or 57,000 years!
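A quick check of that arithmetic, assuming independent failures and MTTDL = MTTF² / (2 ∗ MTTR) for a mirrored pair:

public class MirroredMttdl {
    public static void main(String[] args) {
        double mttfHours = 100_000;   // mean time to failure of one disk
        double mttrHours = 10;        // mean time to repair (the re-mirroring window)
        double mttdl = (mttfHours * mttfHours) / (2 * mttrHours);   // 5e8 hours
        System.out.printf("MTTDL = %.0f hours (~%.0f years)%n",
                          mttdl, mttdl / (24 * 365));               // ~57,000 years
    }
}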
 Frequently combined with NVRAM to improve write performance
 Several improvements in disk-use techniques involve the use of
multiple disks working cooperatively
RAID (Cont.)
 Disk striping uses a group of disks as one storage unit
 RAID is arranged into six different levels
 RAID schemes improve performance and improve the reliability
of the storage system by storing redundant data
 Mirroring or shadowing (RAID 1) keeps duplicate of each
disk
 Striped mirrors (RAID 1+0) or mirrored stripes (RAID 0+1)
provides high performance and high reliability
 Block interleaved parity (RAID 4, 5, 6) uses much less
redundancy
 RAID within a storage array can still fail if the array fails, so
automatic replication of the data between arrays is common
 Frequently, a small number of hot-spare disks are left
unallocated, automatically replacing a failed disk and having data
rebuilt onto them
RAID Levels
RAID (0 + 1) and (1 + 0)
Other Features
 Regardless of where RAID implemented, other useful features
can be added
 Snapshot is a view of the file system before a set of changes takes
place (i.e., at a point in time)
 More in Ch 12
 Replication is automatic duplication of writes between separate
sites
 For redundancy and disaster recovery
 Can be synchronous or asynchronous
 Hot spare disk is left unused; if a disk fails, the RAID automatically
uses the spare to replace the failed disk and rebuild the RAID set
onto it, if possible
 Decreases mean time to repair
Extensions
 RAID alone does not prevent or detect data corruption or other
errors, just disk failures
 Solaris ZFS adds checksums of all data and metadata
 Checksums kept with pointer to object, to detect if object is the
right one and whether it changed
 Can detect and correct data and metadata corruption
 ZFS also removes volumes, partitions
 Disks allocated in pools
 Filesystems within a pool share that pool, using and releasing
space the way malloc() and free() allocate and release memory
ZFS Checksums All Metadata and Data
Traditional and Pooled Storage
Stable-Storage Implementation
 Write-ahead log scheme requires stable storage
 Stable storage means data is never lost (due to failure, etc)
 To implement stable storage:
 Replicate information on more than one nonvolatile storage media
with independent failure modes
 Update information in a controlled manner to ensure that we can
recover the stable data after any failure during data transfer or
recovery
 Disk write has 1 of 3 outcomes
1. Successful completion - The data were written correctly on disk
2. Partial failure - A failure occurred in the midst of transfer, so only
some of the sectors were written with the new data, and the sector
being written during the failure may have been corrupted
3. Total failure - The failure occurred before the disk write started, so
the previous data values on the disk remain intact
Stable-Storage Implementation (Cont.)
 If failure occurs during block write, recovery procedure restores
block to consistent state
 System maintains 2 physical blocks per logical block and
does the following:
1. Write to 1st physical
2. When successful, write to 2nd physical
3. Declare complete only after second write completes
successfully
Systems frequently use NVRAM as one of the physical copies to accelerate writes
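A minimal sketch of the two-copy protocol above (the Block interface, its checksum, and the write-failure behavior are hypothetical placeholders, not a real API):

import java.util.Arrays;

interface Block {
    byte[] read();
    void write(byte[] data);   // a physical write that may fail partway through
    boolean checksumOk();      // detects a partially written (corrupted) block
}

class StableBlock {
    private final Block first, second;   // copies with independent failure modes

    StableBlock(Block a, Block b) { first = a; second = b; }

    void stableWrite(byte[] data) {
        first.write(data);     // 1. write the first physical block
        second.write(data);    // 2. only when that succeeds, write the second
        // 3. declare the write complete only after step 2 succeeds
    }

    byte[] recover() {
        if (!first.checksumOk()) {        // failure during step 1:
            byte[] old = second.read();   // the second copy still holds the old value
            first.write(old);
            return old;
        }
        byte[] a = first.read();          // failure during step 2 (or no failure):
        if (!second.checksumOk() || !Arrays.equals(a, second.read())) {
            second.write(a);              // finish the interrupted update
        }
        return a;
    }
}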
End of Chapter 10
Chapter 11: File-System Interface
Chapter 11: File-System Interface
 File Concept
 Access Methods
 Disk and Directory Structure
 File-System Mounting
 File Sharing
 Protection
Objectives
 To explain the function of file systems
 To describe the interfaces to file systems
 To discuss file-system design tradeoffs, including access
methods, file sharing, file locking, and directory structures
 To explore file-system protection
File Concept
 Contiguous logical address space
 Types:
 Data
 numeric
 character
 binary
 Program
 Contents defined by file’s creator
 Many types
 Consider text file, source file, executable file
File Attributes
 Name – only information kept in human-readable form
 Identifier – unique tag (number) identifies file within file system
 Type – needed for systems that support different types
 Location – pointer to file location on device
 Size – current file size
 Protection – controls who can do reading, writing, executing
 Time, date, and user identification – data for protection, security,
and usage monitoring
 Information about files is kept in the directory structure, which is
maintained on the disk
 Many variations, including extended file attributes such as file
checksum
 Information kept in the directory structure
File info Window on Mac OS X
File Operations
 File is an abstract data type
 Create
 Write – at write pointer location
 Read – at read pointer location
 Reposition within file - seek
 Delete
 Truncate
 Open(Fi) – search the directory structure on disk for entry Fi,
and move the content of entry to memory
 Close (Fi) – move the content of entry Fi in memory to
directory structure on disk
Open Files
 Several pieces of data are needed to manage open files:
 Open-file table: tracks open files
 File pointer: pointer to last read/write location, per
process that has the file open
 File-open count: counter of number of times a file is
open – to allow removal of data from open-file table when
last process closes it
 Disk location of the file: cache of data access information
 Access rights: per-process access mode information
Open File Locking
 Provided by some operating systems and file systems
 Similar to reader-writer locks
 Shared lock similar to reader lock – several processes can
acquire concurrently
 Exclusive lock similar to writer lock
 Mediates access to a file
 Mandatory or advisory:
 Mandatory – access is denied depending on locks held and
requested
 Advisory – processes can find status of locks and decide
what to do
File Locking Example – Java API
import java.io.*;
import java.nio.channels.*;
public class LockingExample {
public static final boolean EXCLUSIVE = false;
public static final boolean SHARED = true;
public static void main(String args[]) throws IOException {
FileLock sharedLock = null;
FileLock exclusiveLock = null;
try {
RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
// get the channel for the file
FileChannel ch = raf.getChannel();
// this locks the first half of the file - exclusive
exclusiveLock = ch.lock(0, raf.length()/2, EXCLUSIVE);
/** Now modify the data . . . */
// release the lock
exclusiveLock.release();
File Locking Example – Java API (Cont.)
// this locks the second half of the file - shared
sharedLock = ch.lock(raf.length()/2+1, raf.length(),
SHARED);
/** Now read the data . . . */
// release the lock
sharedLock.release();
} catch (java.io.IOException ioe) {
System.err.println(ioe);
}finally {
if (exclusiveLock != null)
exclusiveLock.release();
if (sharedLock != null)
sharedLock.release();
}
}
}
File Types – Name, Extension
File Structure
 None - sequence of words, bytes
 Simple record structure
 Lines
 Fixed length
 Variable length
 Complex Structures
 Formatted document
 Relocatable load file
 Can simulate last two with first method by inserting
appropriate control characters
 Who decides:
 Operating system
 Program
Sequential-access File
Access Methods
 Sequential Access
read next
write next
reset
no read after last write
(rewrite)
 Direct Access – file is fixed length logical records
read n
write n
position to n
read next
write next
rewrite n
n = relative block number
 Relative block numbers allow OS to decide where file should be placed
 See allocation problem in Ch 12
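Direct access maps naturally onto fixed-length records plus a seek. A sketch using RandomAccessFile (as in the locking example earlier; the record size and file name are illustrative):

import java.io.*;

public class DirectAccess {
    static final int RECORD_SIZE = 128;   // fixed-length logical records

    static byte[] readRecord(RandomAccessFile file, long n) throws IOException {
        file.seek(n * RECORD_SIZE);       // "position to n": relative record number
        byte[] record = new byte[RECORD_SIZE];
        file.readFully(record);           // "read n"
        return record;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("records.dat", "r")) {
            byte[] r = readRecord(raf, 3);   // fetch record 3 without reading 0-2
            System.out.println(r.length + " bytes read");
        }
    }
}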
Simulation of Sequential Access on Direct-access File
Other Access Methods
 Can be built on top of base methods
 Generally involve creation of an index for the file
 Keep index in memory for fast determination of location of
data to be operated on (consider UPC code plus record of
data about that item)
 If too large, index (in memory) of the index (on disk)
 IBM indexed sequential-access method (ISAM)
 Small master index, points to disk blocks of secondary
index
 File kept sorted on a defined key
 All done by the OS
 VMS operating system provides index and relative files as
another example (see next slide)
Example of Index and Relative Files
Directory Structure
 A collection of nodes containing information about all files
(Figure: a directory whose entries F1, F2, F3, F4, …, Fn point to the files)
Both the directory structure and the files reside on disk
Disk Structure
 Disk can be subdivided into partitions
 Disks or partitions can be RAID protected against failure
 Disk or partition can be used raw – without a file system, or
formatted with a file system
 Partitions also known as minidisks, slices
 Entity containing file system known as a volume
 Each volume containing file system also tracks that file
system’s info in device directory or volume table of contents
 As well as general-purpose file systems there are many
special-purpose file systems, frequently all within the same
operating system or computer
A Typical File-system Organization
Types of File Systems
 We mostly talk of general-purpose file systems
 But systems frequently have many file systems, some general-purpose
and some special-purpose
 Consider Solaris has
 tmpfs – memory-based volatile FS for fast, temporary I/O
 objfs – interface into kernel memory to get kernel symbols for
debugging
 ctfs – contract file system for managing daemons
 lofs – loopback file system allows one FS to be accessed in
place of another
 procfs – kernel interface to process structures
 ufs, zfs – general purpose file systems
Operations Performed on Directory
 Search for a file
 Create a file
 Delete a file
 List a directory
 Rename a file
 Traverse the file system
Directory Organization
The directory is organized logically to obtain:
 Efficiency – locating a file quickly
 Naming – convenient to users
 Two users can have same name for different files
 The same file can have several different names
 Grouping – logical grouping of files by properties (e.g., all
Java programs, all games, …)
Single-Level Directory
 A single directory for all users
 Naming problem
 Grouping problem
Two-Level Directory
 Separate directory for each user
 Path name
 Can have the same file name for different users
 Efficient searching
 No grouping capability
Tree-Structured Directories
Tree-Structured Directories (Cont.)
 Efficient searching
 Grouping Capability
 Current directory (working directory)
 cd /spell/mail/prog
 type list
Tree-Structured Directories (Cont.)
 Absolute or relative path name
 Creating a new file is done in current directory
 Delete a file
rm <file-name>
 Creating a new subdirectory is done in current directory
mkdir <dir-name>
Example: if in current directory /mail
mkdir count
Deleting “mail” ⇒ deleting the entire subtree rooted by “mail”
Acyclic-Graph Directories
 Have shared subdirectories and files
Acyclic-Graph Directories (Cont.)
 Two different names (aliasing)
 If dict deletes list ⇒ dangling pointer
Solutions:
 Backpointers, so we can delete all pointers
Variable size records a problem
 Backpointers using a daisy chain organization
 Entry-hold-count solution
 New directory entry type
 Link – another name (pointer) to an existing file
 Resolve the link – follow pointer to locate the file
General Graph Directory
General Graph Directory (Cont.)
 How do we guarantee no cycles?
 Allow only links to file not subdirectories
 Garbage collection
 Every time a new link is added use a cycle detection
algorithm to determine whether it is OK
File System Mounting
 A file system must be mounted before it can be accessed
 An unmounted file system (i.e., Fig. 11-11(b)) is mounted at a
mount point
Mount Point
File Sharing
 Sharing of files on multi-user systems is desirable
 Sharing may be done through a protection scheme
 On distributed systems, files may be shared across a network
 Network File System (NFS) is a common distributed file-sharing
method
 If multi-user system
 User IDs identify users, allowing permissions and
protections to be per-user
Group IDs allow users to be in groups, permitting group
access rights
 Owner of a file / directory
 Group of a file / directory
File Sharing – Remote File Systems
 Uses networking to allow file system access between systems
 Manually via programs like FTP
 Automatically, seamlessly using distributed file systems
 Semi-automatically via the World Wide Web
 Client-server model allows clients to mount remote file systems from
servers
 Server can serve multiple clients
 Client and user-on-client identification is insecure or complicated
 NFS is standard UNIX client-server file sharing protocol
 CIFS is standard Windows protocol
 Standard operating system file calls are translated into remote calls
 Distributed Information Systems (distributed naming services) such
as LDAP, DNS, NIS, Active Directory implement unified access to
information needed for remote computing
File Sharing – Failure Modes
 All file systems have failure modes
 For example corruption of directory structures or other non-
user data, called metadata
 Remote file systems add new failure modes, due to network
failure, server failure
 Recovery from failure can involve state information about
status of each remote request
 Stateless protocols such as NFS v3 include all information in
each request, allowing easy recovery but less security
File Sharing – Consistency Semantics
 Specify how multiple users are to access a shared file
simultaneously
 Similar to Ch 5 process synchronization algorithms
 Tend to be less complex due to disk I/O and network
latency (for remote file systems)
 Andrew File System (AFS) implemented complex remote file
sharing semantics
 Unix file system (UFS) implements:
 Writes to an open file visible immediately to other users of
the same open file
 Sharing file pointer to allow multiple users to read and write
concurrently
 AFS has session semantics
 Writes only visible to sessions starting after the file is
closed
Protection
 File owner/creator should be able to control:
 what can be done
 by whom
 Types of access
 Read
 Write
 Execute
 Append
 Delete
 List
Access Lists and Groups
 Mode of access: read, write, execute
 Three classes of users on Unix / Linux
a) owner access: 7 ⇒ RWX = 111
b) group access: 6 ⇒ RWX = 110
c) public access: 1 ⇒ RWX = 001
 Ask manager to create a group (unique name), say G, and add
some users to the group.
 For a particular file (say game) or subdirectory, define an
appropriate access.
Attach a group to a file
chgrp G game
Windows 7 Access-Control List Management
A Sample UNIX Directory Listing
End of Chapter 11
Chapter 12: File System Implementation
Chapter 12: File System Implementation
 File-System Structure
 File-System Implementation
 Directory Implementation
 Allocation Methods
 Free-Space Management
 Efficiency and Performance
 Recovery
 NFS
 Example: WAFL File System
Objectives
 To describe the details of implementing local file systems and
directory structures
 To describe the implementation of remote file systems
 To discuss block allocation and free-block algorithms and trade-
offs
File-System Structure
 File structure
 Logical storage unit
 Collection of related information
 File system resides on secondary storage (disks)
 Provides user interface to storage, mapping logical to physical
 Provides efficient and convenient access to disk by allowing
data to be stored, located, and retrieved easily
 Disk provides in-place rewrite and random access
 I/O transfers performed in blocks of sectors (usually 512
bytes)
 File control block – storage structure consisting of information
about a file
 Device driver controls the physical device
 File system organized into layers
Layered File System
File System Layers
 Device drivers manage I/O devices at the I/O control layer
 Given commands like “read drive1, cylinder 72, track 2, sector
10, into memory location 1060” outputs low-level hardware
specific commands to hardware controller
 Basic file system given command like “retrieve block 123”
translates to device driver
 Also manages memory buffers and caches (allocation, freeing,
replacement)
 Buffers hold data in transit
 Caches hold frequently used data
 File organization module understands files, logical address, and
physical blocks
 Translates logical block # to physical block #
 Manages free space, disk allocation
File System Layers (Cont.)
 Logical file system manages metadata information
 Translates file name into file number, file handle, location by
maintaining file control blocks (inodes in UNIX)
 Directory management
 Protection
 Layering useful for reducing complexity and redundancy, but
adds overhead and can decrease performance
 Logical layers can be implemented by any coding method
according to OS designer
File System Layers (Cont.)
 Many file systems, sometimes many within an operating
system
 Each with its own format (CD-ROM is ISO 9660; Unix has
UFS, FFS; Windows has FAT, FAT32, NTFS as well as
floppy, CD, DVD, and Blu-ray formats; Linux has more than 40
types, with extended file system ext2 and ext3 leading; plus
distributed file systems, etc.)
 New ones still arriving – ZFS, GoogleFS, Oracle ASM,
FUSE
File-System Implementation
 We have system calls at the API level, but how do we implement
their functions?
 On-disk and in-memory structures
 Boot control block contains info needed by system to boot OS
from that volume
 Needed if volume contains OS, usually first block of volume
 Volume control block (superblock, master file table) contains
volume details
 Total # of blocks, # of free blocks, block size, free block
pointers or array
 Directory structure organizes the files
 Names and inode numbers, master file table
File-System Implementation (Cont.)
 Per-file File Control Block (FCB) contains many details about
the file
 inode number, permissions, size, dates
 NTFS stores this info in the master file table using relational DB
structures
In-Memory File System Structures
 Mount table storing file system mounts, mount points, file
system types
 The following figure illustrates the necessary file system
structures provided by the operating systems
 Figure 12-3(a) refers to opening a file
 Figure 12-3(b) refers to reading a file
 Plus buffers hold data blocks from secondary storage
 Open returns a file handle for subsequent use
 Data from read eventually copied to specified user process
memory address
In-Memory File System Structures
Partitions and Mounting
 Partition can be a volume containing a file system (“cooked”) or
raw – just a sequence of blocks with no file system
 Boot block can point to boot volume or boot loader set of blocks that
contain enough code to know how to load the kernel from the file
system
 Or a boot management program for multi-OS booting
 Root partition contains the OS, other partitions can hold other
Oses, other file systems, or be raw
 Mounted at boot time
 Other partitions can mount automatically or manually
 At mount time, file system consistency checked
 Is all metadata correct?
 If not, fix it, try again
 If yes, add to mount table, allow access
Virtual File Systems
 Virtual File Systems (VFS) on Unix provide an object-oriented
way of implementing file systems
 VFS allows the same system call interface (the API) to be used
for different types of file systems
 Separates file-system generic operations from
implementation details
 Implementation can be one of many file systems types, or
network file system
 Implements vnodes which hold inodes or network file
details
 Then dispatches operation to appropriate file system
implementation routines
Virtual File Systems (Cont.)
 The API is to the VFS interface, rather than any specific type of
file system
Virtual File System Implementation
 For example, Linux has four object types:
 inode, file, superblock, dentry
 VFS defines set of operations on the objects that must be
implemented
 Every object has a pointer to a function table
 Function table has addresses of routines to implement that
function on that object
 For example:
 int open(. . .) — Open a file
 int close(. . .) — Close an already-open file
 ssize_t read(. . .) — Read from a file
 ssize_t write(. . .) — Write to a file
 int mmap(. . .) — Memory-map a file
Directory Implementation
 Linear list of file names with pointer to the data blocks
 Simple to program
 Time-consuming to execute
 Linear search time
 Could keep ordered alphabetically via linked list or use
B+ tree
 Hash Table – linear list with hash data structure
 Decreases directory search time
 Collisions – situations where two file names hash to the
same location
 Only good if entries are fixed size, or use chained-overflow
method
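A sketch of the hash-table organization, with inode numbers standing in for the pointers to the file data; Java's HashMap chains colliding entries, which corresponds to the chained-overflow method:

import java.util.*;

public class HashedDirectory {
    private final Map<String, Integer> entries = new HashMap<>();

    void create(String name, int inode) { entries.put(name, inode); }
    Integer search(String name) { return entries.get(name); }  // expected O(1), vs. a linear scan
    void delete(String name) { entries.remove(name); }
    Set<String> list() { return entries.keySet(); }
}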
Allocation Methods - Contiguous
 An allocation method refers to how disk blocks are allocated for
files:
 Contiguous allocation – each file occupies set of contiguous
blocks
 Best performance in most cases
 Simple – only starting location (block #) and length (number
of blocks) are required
 Problems include finding space for file, knowing file size,
external fragmentation, need for compaction off-line
(downtime) or on-line
Contiguous Allocation
 Mapping from logical address LA to physical (512-byte blocks):
Q = LA / 512, R = LA mod 512
Block to be accessed = Q + starting address
Displacement into block = R
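The same mapping in code (a sketch; 512-byte blocks as above):

public class ContiguousMap {
    // Resolve logical address la within a file whose first block is startBlock.
    static long[] map(long la, long startBlock) {
        long q = la / 512;   // Q: which block of the file
        long r = la % 512;   // R: byte displacement within that block
        return new long[] { startBlock + q, r };
    }
}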
Extent-Based Systems
 Many newer file systems (e.g., Veritas File System) use a
modified contiguous allocation scheme
 Extent-based file systems allocate disk blocks in extents
 An extent is a contiguous set of disk blocks
 Extents are allocated for file allocation
 A file consists of one or more extents
Allocation Methods - Linked
 Linked allocation – each file a linked list of blocks
 File ends at nil pointer
 No external fragmentation
 Each block contains pointer to next block
 No compaction or external fragmentation
 Free space management system called when new block
needed
 Improve efficiency by clustering blocks into groups but
increases internal fragmentation
 Reliability can be a problem
 Locating a block can take many I/Os and disk seeks
Allocation Methods – Linked (Cont.)
 FAT (File Allocation Table) variation
 Beginning of volume has table, indexed by block number
 Much like a linked list, but faster on disk and cacheable
 New block allocation simple
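A sketch of following a FAT chain; the table maps each block number to the next block of the file, with -1 used here as an illustrative end-of-file sentinel:

import java.util.*;

public class FatWalk {
    static List<Integer> fatChain(int[] fat, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        for (int b = firstBlock; b != -1; b = fat[b]) {
            blocks.add(b);   // each FAT entry holds the number of the next block
        }
        return blocks;
    }
}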
Linked Allocation
 Each file is a linked list of disk blocks: blocks may be scattered
anywhere on the disk; each 512-byte block holds a pointer plus 511
bytes of data
 Mapping from logical address LA:
Q = LA / 511, R = LA mod 511
Block to be accessed is the Qth block in the linked chain of blocks
representing the file
Displacement into block = R + 1 (skipping the pointer)
Linked Allocation
File-Allocation Table
Allocation Methods - Indexed
 Indexed allocation
 Each file has its own index block(s) of pointers to its data blocks
 Logical view
index table
Example of Indexed Allocation
Indexed Allocation (Cont.)
 Need index table
 Random access
 Dynamic access without external fragmentation, but have overhead
of index block
 Mapping from logical to physical for a file of maximum size 256K
bytes and block size of 512 bytes: we need only 1 block for the index
table
Q = LA / 512, R = LA mod 512
Q = displacement into index table
R = displacement into block
Indexed Allocation – Mapping (Cont.)
 Mapping from logical to physical in a file of unbounded length (block
size of 512 words)
 Linked scheme – Link blocks of index table (no limit on size)
Q1 = LA / (512 × 511), R1 = LA mod (512 × 511)
Q1 = block of index table
R1 is used as follows:
Q2 = R1 / 512, R2 = R1 mod 512
Q2 = displacement into block of index table
R2 = displacement into block of file
Indexed Allocation – Mapping (Cont.)
 Two-level index (4K blocks could store 1,024 four-byte pointers in outer
index -> 1,048,576 data blocks and file size of up to 4GB)
Q1 = LA / (512 × 512), R1 = LA mod (512 × 512)
Q1 = displacement into outer-index
R1 is used as follows:
Q2 = R1 / 512, R2 = R1 mod 512
Q2 = displacement into block of index table
R2 = displacement into block of file
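The two-level resolution in code (a sketch using the 512-entry index blocks from the formulas above):

public class TwoLevelIndex {
    // Resolve logical address la into (outer-index entry, inner-index entry, offset).
    static int[] map(int la) {
        int q1 = la / (512 * 512);   // Q1: displacement into the outer index
        int r1 = la % (512 * 512);
        int q2 = r1 / 512;           // Q2: displacement into the inner index block
        int r2 = r1 % 512;           // R2: displacement into the data block
        return new int[] { q1, q2, r2 };
    }
}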
Indexed Allocation – Mapping (Cont.)
Combined Scheme: UNIX UFS
More index blocks than can be addressed with 32-bit file pointer
4K bytes per block, 32-bit addresses
Performance
 Best method depends on file access type
 Contiguous great for sequential and random
 Linked good for sequential, not random
 Declare access type at creation -> select either contiguous or
linked
 Indexed more complex
 Single block access could require 2 index block reads then
data block read
 Clustering can help improve throughput, reduce CPU
overhead
Performance (Cont.)
 Adding instructions to the execution path to save one disk I/O is
reasonable
 Intel Core i7 Extreme Edition 990x (2011) at 3.46 GHz = 159,000
MIPS
 http://en.wikipedia.org/wiki/Instructions_per_second
 Typical disk drive at 250 I/Os per second
 159,000 MIPS / 250 = 630 million instructions during one
disk I/O
 Fast SSD drives provide 60,000 IOPS
 159,000 MIPS / 60,000 = 2.65 million instructions during
one disk I/O
Free-Space Management
 File system maintains free-space list to track available blocks/clusters
 (Using term “block” for simplicity)
 Bit vector or bit map (n blocks), one bit per block:
bit[i] = 1 ⇒ block[i] free
bit[i] = 0 ⇒ block[i] occupied
Block number of the first free block =
(number of bits per word) ∗ (number of 0-value words) + offset of first 1 bit
CPUs have instructions to return offset within word of first “1” bit
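A sketch of that scan using one 64-bit long per word; Long.numberOfTrailingZeros plays the role of the find-first-set-bit instruction (bit i of word w is taken to cover block w*64 + i):

public class BitmapScan {
    static int firstFreeBlock(long[] bitmap) {
        for (int word = 0; word < bitmap.length; word++) {
            if (bitmap[word] != 0) {   // skip 0-value (fully allocated) words
                return word * 64 + Long.numberOfTrailingZeros(bitmap[word]);
            }
        }
        return -1;                     // no free block anywhere
    }
}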
Free-Space Management (Cont.)
 Bit map requires extra space
 Example:
block size = 4KB = 2¹² bytes
disk size = 2⁴⁰ bytes (1 terabyte)
n = 2⁴⁰ / 2¹² = 2²⁸ bits (or 32MB)
if clusters of 4 blocks -> 8MB of memory
 Easy to get contiguous files
Linked Free Space List on Disk
 Linked list (free list)
 Cannot get contiguous
space easily
 No waste of space
 No need to traverse the
entire list (if # free blocks
recorded)
Free-Space Management (Cont.)
 Grouping
 Modify linked list to store address of next n-1 free blocks in first
free block, plus a pointer to next block that contains free-block-
pointers (like this one)
 Counting
 Because space is frequently contiguously used and freed, with
contiguous-allocation allocation, extents, or clustering
 Keep address of first free block and count of following free
blocks
 Free space list then has entries containing addresses and
counts
Free-Space Management (Cont.)
 Space Maps
 Used in ZFS
 Consider meta-data I/O on very large file systems
 Full data structures like bit maps couldn’t fit in memory ->
thousands of I/Os
 Divides device space into metaslab units and manages metaslabs
 Given volume can contain hundreds of metaslabs
 Each metaslab has associated space map
 Uses counting algorithm
 But records to log file rather than file system
 Log of all block activity, in time order, in counting format
 Metaslab activity -> load space map into memory in balanced-tree
structure, indexed by offset
 Replay log into that structure
 Combine contiguous free blocks into single entry
Efficiency and Performance
 Efficiency dependent on:
 Disk allocation and directory algorithms
 Types of data kept in file’s directory entry
 Pre-allocation or as-needed allocation of metadata
structures
 Fixed-size or varying-size data structures
Efficiency and Performance (Cont.)
 Performance
 Keeping data and metadata close together
 Buffer cache – separate section of main memory for frequently
used blocks
 Synchronous writes sometimes requested by apps or needed
by OS
 No buffering / caching – writes must hit disk before
acknowledgement
 Asynchronous writes more common, buffer-able, faster
 Free-behind and read-ahead – techniques to optimize
sequential access
 Reads frequently slower than writes
Page Cache
 A page cache caches pages rather than disk blocks using virtual
memory techniques and addresses
 Memory-mapped I/O uses a page cache
 Routine I/O through the file system uses the buffer (disk) cache
 This leads to the following figure
I/O Without a Unified Buffer Cache
Unified Buffer Cache
 A unified buffer cache uses the same page cache to cache
both memory-mapped pages and ordinary file system I/O to
avoid double caching
 But which caches get priority, and what replacement
algorithms to use?
I/O Using a Unified Buffer Cache
Recovery
 Consistency checking – compares data in directory structure
with data blocks on disk, and tries to fix inconsistencies
 Can be slow and sometimes fails
 Use system programs to back up data from disk to another
storage device (magnetic tape, other magnetic disk, optical)
 Recover lost file or disk by restoring data from backup
Log Structured File Systems
 Log structured (or journaling) file systems record each metadata
update to the file system as a transaction
 All transactions are written to a log
 A transaction is considered committed once it is written to the
log (sequentially)
 Sometimes to a separate device or section of disk
 However, the file system may not yet be updated
 The transactions in the log are asynchronously written to the file
system structures
 When the file system structures are modified, the transaction is
removed from the log
 If the file system crashes, all remaining transactions in the log must
still be performed
 Faster recovery from crash, removes chance of inconsistency of
metadata
The Sun Network File System (NFS)
 An implementation and a specification of a software system
for accessing remote files across LANs (or WANs)
 The implementation is part of the Solaris and SunOS
operating systems running on Sun workstations, using an
unreliable datagram protocol (UDP/IP) over Ethernet
NFS (Cont.)
 Interconnected workstations viewed as a set of independent machines
with independent file systems, which allows sharing among these file
systems in a transparent manner
 A remote directory is mounted over a local file system directory
 The mounted directory looks like an integral subtree of the local
file system, replacing the subtree descending from the local
directory
 Specification of the remote directory for the mount operation is
nontransparent; the host name of the remote directory has to be
provided
 Files in the remote directory can then be accessed in a
transparent manner
 Subject to access-rights accreditation, potentially any file system
(or directory within a file system), can be mounted remotely on top
of any local directory
NFS (Cont.)
 NFS is designed to operate in a heterogeneous environment of
different machines, operating systems, and network architectures;
the NFS specification is independent of these media
 This independence is achieved through the use of RPC primitives
built on top of an External Data Representation (XDR) protocol
used between two implementation-independent interfaces
 The NFS specification distinguishes between the services provided
by a mount mechanism and the actual remote-file-access services
Three Independent File Systems
Mounting in NFS
(Figures: mounts; cascading mounts)
NFS Mount Protocol
 Establishes initial logical connection between server and client
 Mount operation includes name of remote directory to be mounted
and name of server machine storing it
 Mount request is mapped to corresponding RPC and forwarded
to mount server running on server machine
 Export list – specifies local file systems that server exports for
mounting, along with names of machines that are permitted to
mount them
 Following a mount request that conforms to its export list, the
server returns a file handle—a key for further accesses
 File handle – a file-system identifier, and an inode number to
identify the mounted directory within the exported file system
 The mount operation changes only the user’s view and does not
affect the server side
NFS Protocol
 Provides a set of remote procedure calls for remote file operations.
The procedures support the following operations:
 searching for a file within a directory
 reading a set of directory entries
 manipulating links and directories
 accessing file attributes
 reading and writing files
 NFS servers are stateless; each request has to provide a full set
of arguments (NFS V4 is just becoming available – very different,
stateful)
 Modified data must be committed to the server’s disk before
results are returned to the client (lose advantages of caching)
 The NFS protocol does not provide concurrency-control
mechanisms
Three Major Layers of NFS Architecture
 UNIX file-system interface (based on the open, read, write, and
close calls, and file descriptors)
 Virtual File System (VFS) layer – distinguishes local files from
remote ones, and local files are further distinguished according to
their file-system types
 The VFS activates file-system-specific operations to handle
local requests according to their file-system types
 Calls the NFS protocol procedures for remote requests
 NFS service layer – bottom layer of the architecture
 Implements the NFS protocol
Schematic View of NFS Architecture
NFS Path-Name Translation
 Performed by breaking the path into component names and
performing a separate NFS lookup call for every pair of
component name and directory vnode (sketched below)
 To make lookup faster, a directory name lookup cache on the
client’s side holds the vnodes for remote directory names
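A hedged C sketch of this component-at-a-time translation; vnode_t is
opaque and nfs_lookup() is a hypothetical stand-in for one LOOKUP RPC:

    #include <string.h>

    typedef struct vnode vnode_t;                          /* opaque       */
    vnode_t *nfs_lookup(vnode_t *dir, const char *name);   /* hypothetical */

    vnode_t *translate(vnode_t *root, char *path) {
        vnode_t *v = root;
        /* strtok() modifies path; one remote lookup per component */
        for (char *comp = strtok(path, "/"); comp != NULL && v != NULL;
             comp = strtok(NULL, "/"))
            v = nfs_lookup(v, comp);
        return v;   /* vnode of the final component, or NULL */
    }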
12.146 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
NFS Remote Operations
 Nearly one-to-one correspondence between regular UNIX system
calls and the NFS protocol RPCs (except opening and closing
files)
 NFS adheres to the remote-service paradigm, but employs
buffering and caching techniques for the sake of performance
 File-blocks cache – when a file is opened, the kernel checks with
the remote server whether to fetch or revalidate the cached
attributes
 Cached file blocks are used only if the corresponding cached
attributes are up to date
 File-attribute cache – the attribute cache is updated whenever new
attributes arrive from the server
 Clients do not free delayed-write blocks until the server confirms
that the data have been written to disk
12.147 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Example: WAFL File System
 Used on Network Appliance “Filers” – distributed file system
appliances
 “Write-anywhere file layout”
 Serves up NFS, CIFS, HTTP, FTP
 Random I/O optimized, write optimized
 NVRAM for write caching
 Similar to Berkeley Fast File System, with extensive
modifications
12.148 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
The WAFL File Layout
12.149 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Snapshots in WAFL
Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
End of Chapter 12
Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Chapter 13: I/O Systems
13.152 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Chapter 13: I/O Systems
 Overview
 I/O Hardware
 Application I/O Interface
 Kernel I/O Subsystem
 Transforming I/O Requests to Hardware Operations
 STREAMS
 Performance
13.153 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Objectives
 Explore the structure of an operating system’s I/O subsystem
 Discuss the principles of I/O hardware and its complexity
 Provide details of the performance aspects of I/O hardware
and software
13.154 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Overview
 I/O management is a major component of operating system
design and operation
 Important aspect of computer operation
 I/O devices vary greatly
 Various methods to control them
 Performance management
 New types of devices appear frequently
 Ports, busses, device controllers connect to various devices
 Device drivers encapsulate device details
 Present uniform device-access interface to I/O subsystem
13.155 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
I/O Hardware
 Incredible variety of I/O devices
 Storage
 Transmission
 Human-interface
 Common concepts – signals from I/O devices interface with computer
 Port – connection point for device
 Bus - daisy chain or shared direct access
 PCI bus common in PCs and servers, PCI Express (PCIe)
 expansion bus connects relatively slow devices
 Controller (host adapter) – electronics that operate port, bus, device
 Sometimes integrated
 Sometimes separate circuit board (host adapter)
 Contains processor, microcode, private memory, bus controller, etc.
– Some talk to a per-device controller that has its own bus
controller, microcode, memory, etc.
13.156 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
A Typical PC Bus Structure
13.157 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
I/O Hardware (Cont.)
 I/O instructions control devices
 Devices usually have registers where device driver places
commands, addresses, and data to write, or read data from
registers after command execution
 Data-in register, data-out register, status register, control
register
 Typically 1-4 bytes, or FIFO buffer
 Devices have addresses, used by
 Direct I/O instructions
 Memory-mapped I/O
 Device data and command registers mapped to
processor address space
 Especially for large address spaces (graphics)
13.158 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Device I/O Port Locations on PCs (partial)
13.159 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Polling
 For each byte of I/O
1. Read busy bit from status register until 0
2. Host sets the read or write bit and, if writing, copies data into the
data-out register
3. Host sets command-ready bit
4. Controller sets busy bit, executes transfer
5. Controller clears busy bit, error bit, command-ready bit when
transfer done
 Step 1 is busy-wait cycle to wait for I/O from device
 Reasonable if device is fast
 But inefficient if device slow
 Could the CPU switch to other tasks?
 But if it misses a cycle, data may be overwritten / lost (see the
sketch below)
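A minimal sketch of the busy-wait loop above, assuming an invented
memory-mapped register layout (the bit values and register names are
illustrative, not any real device):

    /* Hypothetical memory-mapped device registers. */
    extern volatile unsigned char *status_reg, *control_reg, *data_out_reg;
    enum { BUSY = 0x01, CMD_READY = 0x02, WRITE_BIT = 0x04 };

    void polled_write(unsigned char byte) {
        while (*status_reg & BUSY)
            ;                                  /* 1. wait until device idle   */
        *data_out_reg = byte;                  /* 2. place data in data-out   */
        *control_reg = WRITE_BIT | CMD_READY;  /* 3. set write, command-ready */
        while (*status_reg & BUSY)
            ;                                  /* 4-5. controller transfers,
                                                       then clears busy       */
    }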
13.160 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Interrupts
 Polling can happen in 3 instruction cycles
 Read status, logical-and to extract status bit, branch if not zero
 How can we be more efficient when the status bit is rarely non-zero?
 CPU Interrupt-request line triggered by I/O device
 Checked by processor after each instruction
 Interrupt handler receives interrupts
 Maskable to ignore or delay some interrupts
 Interrupt vector to dispatch interrupt to correct handler
 Context switch at start and end
 Based on priority
 Some nonmaskable
 Interrupt chaining if more than one device shares the same
interrupt number (dispatch sketched below)
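A conceptual C sketch of vectored dispatch with chaining (the table
size and handler-list layout are assumptions for illustration):

    typedef struct handler {
        void (*service)(int irq);  /* device-specific interrupt handler   */
        struct handler *next;      /* chaining: devices sharing this line */
    } handler_t;

    handler_t *vector[256];        /* interrupt-vector table */

    void dispatch(int irq) {
        /* Offer the interrupt to every handler chained on this number. */
        for (handler_t *h = vector[irq]; h != NULL; h = h->next)
            h->service(irq);
    }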
13.161 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Interrupt-Driven I/O Cycle
13.162 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Intel Pentium Processor Event-Vector Table
13.163 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Interrupts (Cont.)
 Interrupt mechanism also used for exceptions
 Terminate process, crash system due to hardware error
 A page fault is raised on a memory-access error
 A system call executes via a trap, which triggers the kernel to
execute the request
 Multi-CPU systems can process interrupts concurrently
 If operating system designed to handle it
 Used for time-sensitive processing; interrupts are frequent, so
handling must be fast
13.164 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Direct Memory Access
 Used to avoid programmed I/O (one byte at a time) for large data
movement
 Requires DMA controller
 Bypasses CPU to transfer data directly between I/O device and
memory
 OS writes DMA command block into memory
 Source and destination addresses
 Read or write mode
 Count of bytes
 Writes location of command block to DMA controller
 Bus mastering of DMA controller – grabs bus from CPU
 Cycle stealing from CPU but still much more efficient
 When done, interrupts to signal completion
 A version that is aware of virtual addresses can be even more
efficient – DVMA (a command-block sketch follows)
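A hedged sketch of a DMA command block and hand-off (the block layout
and the command register are invented for illustration):

    #include <stddef.h>

    struct dma_command {
        const void *source;      /* source address              */
        void       *destination; /* destination address         */
        size_t      count;       /* number of bytes to transfer */
        int         write;       /* nonzero = memory -> device  */
    };

    extern volatile struct dma_command **dma_cmd_reg; /* hypothetical */

    void start_dma(struct dma_command *cmd) {
        /* Controller masters the bus, steals cycles from the CPU,
           and raises an interrupt when the transfer completes. */
        *dma_cmd_reg = cmd;
    }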
13.165 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Six Step Process to Perform DMA Transfer
13.166 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Application I/O Interface
 I/O system calls encapsulate device behaviors in generic classes
 Device-driver layer hides differences among I/O controllers from kernel
 New devices talking already-implemented protocols need no extra
work
 Each OS has its own I/O subsystem structures and device driver
frameworks
 Devices vary in many dimensions
 Character-stream or block
 Sequential or random-access
 Synchronous or asynchronous (or both)
 Sharable or dedicated
 Speed of operation
 Read-write, read-only, or write-only
13.167 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
A Kernel I/O Structure
13.168 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Characteristics of I/O Devices
13.169 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Characteristics of I/O Devices (Cont.)
 Subtleties of devices handled by device drivers
 Broadly I/O devices can be grouped by the OS into
 Block I/O
 Character I/O (Stream)
 Memory-mapped file access
 Network sockets
 For direct manipulation of I/O device specific characteristics,
usually an escape / back door
 Unix ioctl() call to send arbitrary bits to a device control
register and data to device data register
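For example, on Linux the terminal driver's window size is exposed
through this back door (TIOCGWINSZ is a real request code; the program
compiles and runs as-is):

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void) {
        struct winsize ws;
        /* Ask the tty driver directly, bypassing read()/write(). */
        if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0)
            printf("%d rows x %d columns\n", ws.ws_row, ws.ws_col);
        return 0;
    }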
13.170 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Block and Character Devices
 Block devices include disk drives
 Commands include read, write, seek
 Raw I/O, direct I/O, or file-system access
 Memory-mapped file access possible (see the example after this
list)
 File mapped to virtual memory and clusters brought in via
demand paging
 DMA
 Character devices include keyboards, mice, serial ports
 Commands include get(), put()
 Libraries layered on top allow line editing
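Memory-mapped file access in miniature: a runnable POSIX example (the
file name is arbitrary):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("data.bin", O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0)
            return 1;
        /* Map the file; pages arrive by demand paging as they are touched. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            putchar(p[0]);            /* this access may fault a page in */
            munmap(p, st.st_size);
        }
        close(fd);
        return 0;
    }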
13.171 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Network Devices
 Vary enough from block and character devices to merit their own
interface
 Linux, Unix, Windows and many others include socket
interface
 Separates network protocol from network operation
 Includes select() functionality (fragment below)
 Approaches vary widely (pipes, FIFOs, streams, queues,
mailboxes)
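The select() functionality, as a hedged POSIX fragment (sock is
assumed to be an already-open socket descriptor):

    #include <sys/select.h>

    int wait_readable(int sock) {
        fd_set rfds;
        struct timeval tv = { 5, 0 };  /* wait at most 5 seconds */
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        /* >0: data ready; 0: timeout; -1: error */
        return select(sock + 1, &rfds, NULL, NULL, &tv);
    }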
13.172 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Clocks and Timers
 Provide current time, elapsed time, timer
 Normal resolution about 1/60 second
 Some systems provide higher-resolution timers (example below)
 Programmable interval timer used for timings, periodic
interrupts
 ioctl() (on UNIX) covers odd aspects of I/O such as
clocks and timers
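A short example of a higher-resolution interface (runnable on POSIX
systems; CLOCK_MONOTONIC gives nanosecond-resolution elapsed time):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        printf("%ld.%09ld s since an arbitrary fixed point\n",
               (long)ts.tv_sec, ts.tv_nsec);
        return 0;
    }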
13.173 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Nonblocking and Asynchronous I/O
 Blocking - process suspended until I/O completed
 Easy to use and understand
 Insufficient for some needs
 Nonblocking - I/O call returns as much as available
 User interface, data copy (buffered I/O)
 Implemented via multi-threading
 Returns quickly with count of bytes read or written
 select() to find if data is ready, then read() or write()
to transfer (sketch below)
 Asynchronous - process runs while I/O executes
 Difficult to use
 I/O subsystem signals process when I/O completed
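A minimal nonblocking-read sketch (fd is assumed to be already open;
the helper name is ours, not a standard call):

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    ssize_t nonblocking_read(int fd, char *buf, size_t len) {
        /* Switch the descriptor to nonblocking mode. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
        ssize_t n = read(fd, buf, len);  /* returns whatever is available */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;                    /* nothing ready right now */
        return n;                        /* bytes read, or -1 on real error */
    }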
13.174 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Two I/O Methods
Synchronous / Asynchronous
13.175 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Vectored I/O
 Vectored I/O allows one system call to perform multiple I/O
operations
 For example, Unix readv() accepts a vector of multiple
buffers to read into (writev() writes from a vector); see the
fragment below
 This scatter-gather method is better than multiple individual I/O
calls
 Decreases context switching and system call overhead
 Some versions provide atomicity
 Avoids, for example, worrying about multiple threads
changing data while reads / writes are occurring
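A runnable readv() fragment: one system call scatters the data into
two separate buffers (the file name is arbitrary):

    #include <fcntl.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void) {
        char header[16], body[4096];
        struct iovec iov[2] = {
            { .iov_base = header, .iov_len = sizeof header },
            { .iov_base = body,   .iov_len = sizeof body   },
        };
        int fd = open("data.bin", O_RDONLY);
        if (fd < 0)
            return 1;
        ssize_t n = readv(fd, iov, 2);  /* fills header, then body */
        close(fd);
        return n < 0;
    }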
13.176 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Kernel I/O Subsystem
 Scheduling
 Some I/O request ordering via per-device queue
 Some OSs try fairness
 Some implement Quality of Service (e.g., IPQoS)
 Buffering - store data in memory while transferring between devices
 To cope with device speed mismatch
 To cope with device transfer size mismatch
 To maintain “copy semantics”
 Double buffering – two copies of the data (sketched below)
 Kernel and user
 Varying sizes
 Full / being processed and not-full / being used
 Copy-on-write can be used for efficiency in some cases
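A conceptual double-buffering sketch (synchronization is elided for
brevity; start_io() is a hypothetical asynchronous device write):

    #define BUFSZ 4096

    void start_io(char *buf, int len);   /* hypothetical */

    static char bufs[2][BUFSZ];
    static int fill;                     /* index of buffer being filled */

    void on_buffer_full(void) {
        int drain = fill;                /* hand off the full buffer...   */
        fill = 1 - fill;                 /* ...and keep filling the other */
        start_io(bufs[drain], BUFSZ);
    }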
13.177 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Device-status Table
13.178 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Sun Enterprise 6000 Device-Transfer Rates
13.179 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Kernel I/O Subsystem
 Caching - faster device holding copy of data
 Always just a copy
 Key to performance
 Sometimes combined with buffering
 Spooling - hold output for a device
 If device can serve only one request at a time
 e.g., printing
 Device reservation - provides exclusive access to a device
 System calls for allocation and de-allocation
 Watch out for deadlock
13.180 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Error Handling
 OS can recover from disk-read errors, unavailable devices, and
transient write failures
 Retry a read or write, for example
 Some systems more advanced – Solaris FMA, AIX
 Track error frequencies, stop using device with
increasing frequency of retry-able errors
 Most return an error number or code when an I/O request fails
(see the retry pattern below)
 System error logs hold problem reports
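Most UNIX calls report failure as -1 plus an error number in errno; a
common retry pattern for transient failures (a sketch, not a complete
error-handling policy):

    #include <errno.h>
    #include <unistd.h>

    ssize_t read_retry(int fd, void *buf, size_t len) {
        ssize_t n;
        do {
            n = read(fd, buf, len);      /* retry if a signal interrupted us */
        } while (n < 0 && errno == EINTR);
        return n;   /* on other failures, errno identifies the problem */
    }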
13.181 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
I/O Protection
 User process may accidentally or purposefully attempt to
disrupt normal operation via illegal I/O instructions
 All I/O instructions defined to be privileged
 I/O must be performed via system calls
 Memory-mapped and I/O port memory locations must
be protected too
13.182 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Use of a System Call to Perform I/O
13.183 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Kernel Data Structures
 Kernel keeps state info for I/O components, including open file
tables, network connections, character device state
 Many, many complex data structures to track buffers, memory
allocation, “dirty” blocks
 Some use object-oriented methods and message passing to
implement I/O
 Windows uses message passing
 Message with I/O information passed from user mode
into kernel
 Message modified as it flows through to device driver
and back to process
 Pros / cons?
13.184 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
UNIX I/O Kernel Structure
13.185 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Power Management
 Not strictly the domain of I/O, but much of it is I/O related
 Computers and devices use electricity, generate heat, frequently
require cooling
 OSes can help manage and improve energy use
 Cloud computing environments move virtual machines
between servers
 Can end up evacuating whole systems and shutting them
down
 Mobile computing treats power management as a first-class OS
aspect
13.186 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Power Management (Cont.)
 For example, Android implements
 Component-level power management
 Understands relationship between components
 Build device tree representing physical device topology
 System bus -> I/O subsystem -> {flash, USB storage}
 Device driver tracks state of device, whether in use
 Unused component – turn it off
 All devices in tree branch unused – turn off branch
 Wake locks – like other locks but prevent sleep of device when lock
is held
 Power collapse – put a device into very deep sleep
 Marginal power use
 Only awake enough to respond to external stimuli (button
press, incoming call)
13.187 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
I/O Requests to Hardware Operations
 Consider reading a file from disk for a process:
 Determine device holding file
 Translate name to device representation
 Physically read data from disk into buffer
 Make data available to requesting process
 Return control to process
13.188 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Life Cycle of An I/O Request
13.189 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
STREAMS
 STREAM – a full-duplex communication channel between a
user-level process and a device in Unix System V and beyond
 A STREAM consists of:
 STREAM head interfaces with the user process
 driver end interfaces with the device
 zero or more STREAM modules between them (see the push
example below)
 Each module contains a read queue and a write queue
 Message passing is used to communicate between queues
 Flow control option to indicate available or busy
 Asynchronous internally, synchronous where user process
communicates with stream head
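On a System V-derived system, a module is pushed onto a stream with
the I_PUSH ioctl (a real STREAMS request; "ldterm", the terminal
line-discipline module, is just one example, and <stropts.h> is absent
on modern Linux):

    #include <stropts.h>

    /* Insert the module just below the stream head of fd. */
    int push_line_discipline(int fd) {
        return ioctl(fd, I_PUSH, "ldterm");
    }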
13.190 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
The STREAMS Structure
13.191 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Performance
 I/O is a major factor in system performance:
 Demands CPU to execute device driver, kernel I/O
code
 Context switches due to interrupts
 Data copying
 Network traffic especially stressful
13.192 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Intercomputer Communications
13.193 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Improving Performance
 Reduce number of context switches
 Reduce data copying
 Reduce interrupts by using large transfers, smart controllers,
polling
 Use DMA
 Use smarter hardware devices
 Balance CPU, memory, bus, and I/O performance for highest
throughput
 Move user-mode processes / daemons to kernel threads
13.194 Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
Device-Functionality Progression
Silberschatz, Galvin and Gagne ©2013
Operating System Concepts – 9th Edition
End of Chapter 13

More Related Content

PPT
disk scheduling
PPT
operating system
PPT
PPT
PPTX
Viknesh
PPT
DOCX
Mass storagestructure pre-final-formatting
PPT
Disk scheduling geekssay.com
disk scheduling
operating system
Viknesh
Mass storagestructure pre-final-formatting
Disk scheduling geekssay.com

What's hot (20)

PDF
LizardFS-WhitePaper-Eng-v3.9.2-web
PDF
Performance comparison of Distributed File Systems on 1Gbit networks
PPT
4.8 apend backups
PDF
Quick-and-Easy Deployment of a Ceph Storage Cluster
PDF
LizardFS-WhitePaper-Eng-v4.0 (1)
PPT
Ch11 - Silberschatz
PDF
Comparison of-foss-distributed-storage
PPT
Chapter 12 Model Answers
PDF
Comparison of foss distributed storage
PDF
Scale2014
PDF
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...
PPTX
958 and 959 sales exam prep
PPTX
TDS-16489U-R2 0215 EN
PDF
はじめてのGlusterFS
PPT
Operating Systems
PDF
SUSE Storage: Sizing and Performance (Ceph)
PDF
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
PPTX
JetStor Unified Storage NAS/SAN/Cloud 1600s
PPTX
Revisiting CephFS MDS and mClock QoS Scheduler
LizardFS-WhitePaper-Eng-v3.9.2-web
Performance comparison of Distributed File Systems on 1Gbit networks
4.8 apend backups
Quick-and-Easy Deployment of a Ceph Storage Cluster
LizardFS-WhitePaper-Eng-v4.0 (1)
Ch11 - Silberschatz
Comparison of-foss-distributed-storage
Chapter 12 Model Answers
Comparison of foss distributed storage
Scale2014
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...
958 and 959 sales exam prep
TDS-16489U-R2 0215 EN
はじめてのGlusterFS
Operating Systems
SUSE Storage: Sizing and Performance (Ceph)
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
JetStor Unified Storage NAS/SAN/Cloud 1600s
Revisiting CephFS MDS and mClock QoS Scheduler
Ad

Similar to cs8493 - operating systems unit 4 (20)

PPT
ch10.ppt- Cryptography and Network security
PPT
Disk Management.ppt
PPT
Operating system presentation part 2 2025
PPTX
ch12-gh1.pptx
PPT
Mass_Storage_Structure_presentation.ppttx
PPT
Physics.ppt 9th class physics important.
PPT
ch14.ppt
PDF
ch10_massSt.pdf
PPT
12.mass stroage system
PPT
Lecture12-Secondary Storage-PAhsjfhsjf.ppt
PPT
Database Lecture12-Secondary Storage-PA.ppt
PPT
Mass Storage Structure Manipal University Jaipur PPT
PPT
CH13-OS.PPTdfjhfgjkghkc kjbhkyuikgkmbv mh
PPT
File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2
PPT
운영체제론 Ch13
PPT
OS Chapter 1 - Introduction of basic concepts and understanding the OS
PPT
ch1-introduction-to-os operating system .ppt
PPT
os storage mass.ppt
PPT
chapter ppt presentation engineering 1.ppt
PPTX
Engg chapter one which shows that how it works
ch10.ppt- Cryptography and Network security
Disk Management.ppt
Operating system presentation part 2 2025
ch12-gh1.pptx
Mass_Storage_Structure_presentation.ppttx
Physics.ppt 9th class physics important.
ch14.ppt
ch10_massSt.pdf
12.mass stroage system
Lecture12-Secondary Storage-PAhsjfhsjf.ppt
Database Lecture12-Secondary Storage-PA.ppt
Mass Storage Structure Manipal University Jaipur PPT
CH13-OS.PPTdfjhfgjkghkc kjbhkyuikgkmbv mh
File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2
운영체제론 Ch13
OS Chapter 1 - Introduction of basic concepts and understanding the OS
ch1-introduction-to-os operating system .ppt
os storage mass.ppt
chapter ppt presentation engineering 1.ppt
Engg chapter one which shows that how it works
Ad

More from SIMONTHOMAS S (20)

PPTX
Cs8092 computer graphics and multimedia unit 5
PPTX
Cs8092 computer graphics and multimedia unit 4
PPTX
Cs8092 computer graphics and multimedia unit 3
PPT
Cs8092 computer graphics and multimedia unit 2
PPTX
Cs8092 computer graphics and multimedia unit 1
PPTX
Mg6088 spm unit-5
PPT
Mg6088 spm unit-4
PPTX
Mg6088 spm unit-3
PPTX
Mg6088 spm unit-2
PPTX
Mg6088 spm unit-1
PPTX
IT6701-Information Management Unit 5
PPTX
IT6701-Information Management Unit 4
PPTX
IT6701-Information Management Unit 3
PPTX
IT6701-Information Management Unit 2
PPTX
IT6701-Information Management Unit 1
PPTX
CS8391-Data Structures Unit 5
PPTX
CS8391-Data Structures Unit 4
PPTX
CS8391-Data Structures Unit 3
PPTX
CS8391-Data Structures Unit 2
PPTX
CS8391-Data Structures Unit 1
Cs8092 computer graphics and multimedia unit 5
Cs8092 computer graphics and multimedia unit 4
Cs8092 computer graphics and multimedia unit 3
Cs8092 computer graphics and multimedia unit 2
Cs8092 computer graphics and multimedia unit 1
Mg6088 spm unit-5
Mg6088 spm unit-4
Mg6088 spm unit-3
Mg6088 spm unit-2
Mg6088 spm unit-1
IT6701-Information Management Unit 5
IT6701-Information Management Unit 4
IT6701-Information Management Unit 3
IT6701-Information Management Unit 2
IT6701-Information Management Unit 1
CS8391-Data Structures Unit 5
CS8391-Data Structures Unit 4
CS8391-Data Structures Unit 3
CS8391-Data Structures Unit 2
CS8391-Data Structures Unit 1

Recently uploaded (20)

PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PPTX
Sustainable Sites - Green Building Construction
PPTX
Construction Project Organization Group 2.pptx
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
DOCX
573137875-Attendance-Management-System-original
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
web development for engineering and engineering
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
Digital Logic Computer Design lecture notes
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PDF
Well-logging-methods_new................
PPTX
OOP with Java - Java Introduction (Basics)
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
Sustainable Sites - Green Building Construction
Construction Project Organization Group 2.pptx
Embodied AI: Ushering in the Next Era of Intelligent Systems
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
CH1 Production IntroductoryConcepts.pptx
Model Code of Practice - Construction Work - 21102022 .pdf
573137875-Attendance-Management-System-original
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
web development for engineering and engineering
UNIT-1 - COAL BASED THERMAL POWER PLANTS
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Digital Logic Computer Design lecture notes
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
Well-logging-methods_new................
OOP with Java - Java Introduction (Basics)

cs8493 - operating systems unit 4

  • 1. Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Chapter 10: Mass-Storage Systems
  • 2. 10.2 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Chapter 10: Mass-Storage Systems  Overview of Mass Storage Structure  Disk Structure  Disk Attachment  Disk Scheduling  Disk Management  Swap-Space Management  RAID Structure  Stable-Storage Implementation
  • 3. 10.3 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Objectives  To describe the physical structure of secondary storage devices and its effects on the uses of the devices  To explain the performance characteristics of mass-storage devices  To evaluate disk scheduling algorithms  To discuss operating-system services provided for mass storage, including RAID
  • 4. 10.4 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Overview of Mass Storage Structure  Magnetic disks provide bulk of secondary storage of modern computers  Drives rotate at 60 to 250 times per second  Transfer rate is rate at which data flow between drive and computer  Positioning time (random-access time) is time to move disk arm to desired cylinder (seek time) and time for desired sector to rotate under the disk head (rotational latency)  Head crash results from disk head making contact with the disk surface -- That’s bad  Disks can be removable  Drive attached to computer via I/O bus  Busses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI, SAS, Firewire  Host controller in computer uses bus to talk to disk controller built into drive or storage array
  • 5. 10.5 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Moving-head Disk Mechanism
  • 6. 10.6 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Hard Disks  Platters range from .85” to 14” (historically)  Commonly 3.5”, 2.5”, and 1.8”  Range from 30GB to 3TB per drive  Performance  Transfer Rate – theoretical – 6 Gb/sec  Effective Transfer Rate – real – 1Gb/sec  Seek time from 3ms to 12ms – 9ms common for desktop drives  Average seek time measured or calculated based on 1/3 of tracks  Latency based on spindle speed  1 / (RPM / 60) = 60 / RPM  Average latency = ½ latency (From Wikipedia)
  • 7. 10.7 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Hard Disk Performance  Access Latency = Average access time = average seek time + average latency  For fastest disk 3ms + 2ms = 5ms  For slow disk 9ms + 5.56ms = 14.56ms  Average I/O time = average access time + (amount to transfer / transfer rate) + controller overhead  For example to transfer a 4KB block on a 7200 RPM disk with a 5ms average seek time, 1Gb/sec transfer rate with a .1ms controller overhead =  5ms + 4.17ms + 0.1ms + transfer time =  Transfer time = 4KB / 1Gb/s * 8Gb / GB * 1GB / 10242KB = 32 / (10242) = 0.031 ms  Average I/O time for 4KB block = 9.27ms + .031ms = 9.301ms
  • 8. 10.8 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition The First Commercial Disk Drive 1956 IBM RAMDAC computer included the IBM Model 350 disk storage system 5M (7 bit) characters 50 x 24” platters Access time = < 1 second
  • 9. 10.9 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Solid-State Disks  Nonvolatile memory used like a hard drive  Many technology variations  Can be more reliable than HDDs  More expensive per MB  Maybe have shorter life span  Less capacity  But much faster  Busses can be too slow -> connect directly to PCI for example  No moving parts, so no seek time or rotational latency
  • 10. 10.10 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Magnetic Tape  Was early secondary-storage medium  Evolved from open spools to cartridges  Relatively permanent and holds large quantities of data  Access time slow  Random access ~1000 times slower than disk  Mainly used for backup, storage of infrequently-used data, transfer medium between systems  Kept in spool and wound or rewound past read-write head  Once data under head, transfer rates comparable to disk  140MB/sec and greater  200GB to 1.5TB typical storage  Common technologies are LTO-{3,4,5} and T10000
  • 11. 10.11 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Disk Structure  Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer  Low-level formatting creates logical blocks on physical media  The 1-dimensional array of logical blocks is mapped into the sectors of the disk sequentially  Sector 0 is the first sector of the first track on the outermost cylinder  Mapping proceeds in order through that track, then the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost  Logical to physical address should be easy  Except for bad sectors  Non-constant # of sectors per track via constant angular velocity
  • 12. 10.12 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Disk Attachment  Host-attached storage accessed through I/O ports talking to I/O busses  SCSI itself is a bus, up to 16 devices on one cable, SCSI initiator requests operation and SCSI targets perform tasks  Each target can have up to 8 logical units (disks attached to device controller)  FC is high-speed serial architecture  Can be switched fabric with 24-bit address space – the basis of storage area networks (SANs) in which many hosts attach to many storage units  I/O directed to bus ID, device ID, logical unit (LUN)
  • 13. 10.13 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Storage Array  Can just attach disks, or arrays of disks  Storage Array has controller(s), provides features to attached host(s)  Ports to connect hosts to array  Memory, controlling software (sometimes NVRAM, etc)  A few to thousands of disks  RAID, hot spares, hot swap (discussed later)  Shared storage -> more efficiency  Features found in some file systems  Snaphots, clones, thin provisioning, replication, deduplication, etc
  • 14. 10.14 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Storage Area Network  Common in large storage environments  Multiple hosts attached to multiple storage arrays - flexible
  • 15. 10.15 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Storage Area Network (Cont.)  SAN is one or more storage arrays  Connected to one or more Fibre Channel switches  Hosts also attach to the switches  Storage made available via LUN Masking from specific arrays to specific servers  Easy to add or remove storage, add new host and allocate it storage  Over low-latency Fibre Channel fabric  Why have separate storage networks and communications networks?  Consider iSCSI, FCOE
  • 16. 10.16 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Network-Attached Storage  Network-attached storage (NAS) is storage made available over a network rather than over a local connection (such as a bus)  Remotely attaching to file systems  NFS and CIFS are common protocols  Implemented via remote procedure calls (RPCs) between host and storage over typically TCP or UDP on IP network  iSCSI protocol uses IP network to carry the SCSI protocol  Remotely attaching to devices (blocks)
  • 17. 10.17 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Disk Scheduling  The operating system is responsible for using hardware efficiently — for the disk drives, this means having a fast access time and disk bandwidth  Minimize seek time  Seek time  seek distance  Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer
  • 18. 10.18 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Disk Scheduling (Cont.)  There are many sources of disk I/O request  OS  System processes  Users processes  I/O request includes input or output mode, disk address, memory address, number of sectors to transfer  OS maintains queue of requests, per disk or device  Idle disk can immediately work on I/O request, busy disk means work must queue  Optimization algorithms only make sense when a queue exists
  • 19. 10.19 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Disk Scheduling (Cont.)  Note that drive controllers have small buffers and can manage a queue of I/O requests (of varying “depth”)  Several algorithms exist to schedule the servicing of disk I/O requests  The analysis is true for one or many platters  We illustrate scheduling algorithms with a request queue (0-199) 98, 183, 37, 122, 14, 124, 65, 67 Head pointer 53
  • 20. 10.20 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition FCFS Illustration shows total head movement of 640 cylinders
  • 21. 10.21 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition SSTF  Shortest Seek Time First selects the request with the minimum seek time from the current head position  SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests  Illustration shows total head movement of 236 cylinders
  • 22. 10.22 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition SCAN  The disk arm starts at one end of the disk, and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues.  SCAN algorithm Sometimes called the elevator algorithm  Illustration shows total head movement of 236 cylinders  But note that if requests are uniformly dense, largest density at other end of disk and those wait the longest
  • 23. 10.23 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition SCAN (Cont.)
  • 24. 10.24 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition C-SCAN  Provides a more uniform wait time than SCAN  The head moves from one end of the disk to the other, servicing requests as it goes  When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip  Treats the cylinders as a circular list that wraps around from the last cylinder to the first one  Total number of cylinders?
  • 25. 10.25 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition C-SCAN (Cont.)
  • 26. 10.26 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition C-LOOK  LOOK a version of SCAN, C-LOOK a version of C-SCAN  Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk  Total number of cylinders?
  • 27. 10.27 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition C-LOOK (Cont.)
  • 28. 10.28 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Selecting a Disk-Scheduling Algorithm  SSTF is common and has a natural appeal  SCAN and C-SCAN perform better for systems that place a heavy load on the disk  Less starvation  Performance depends on the number and types of requests  Requests for disk service can be influenced by the file-allocation method  And metadata layout  The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary  Either SSTF or LOOK is a reasonable choice for the default algorithm  What about rotational latency?  Difficult for OS to calculate  How does disk-based queueing effect OS queue ordering efforts?
  • 29. 10.29 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Disk Management  Low-level formatting, or physical formatting — Dividing a disk into sectors that the disk controller can read and write  Each sector can hold header information, plus data, plus error correction code (ECC)  Usually 512 bytes of data but can be selectable  To use a disk to hold files, the operating system still needs to record its own data structures on the disk  Partition the disk into one or more groups of cylinders, each treated as a logical disk  Logical formatting or “making a file system”  To increase efficiency most file systems group blocks into clusters  Disk I/O done in blocks  File I/O done in clusters
  • 30. 10.30 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Disk Management (Cont.)  Raw disk access for apps that want to do their own block management, keep OS out of the way (databases for example)  Boot block initializes system  The bootstrap is stored in ROM  Bootstrap loader program stored in boot blocks of boot partition  Methods such as sector sparing used to handle bad blocks
  • 31. 10.31 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Booting from a Disk in Windows
  • 32. 10.32 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Swap-Space Management  Swap-space — Virtual memory uses disk space as an extension of main memory  Less common now due to memory capacity increases  Swap-space can be carved out of the normal file system, or, more commonly, it can be in a separate disk partition (raw)  Swap-space management  4.3BSD allocates swap space when process starts; holds text segment (the program) and data segment  Kernel uses swap maps to track swap-space use  Solaris 2 allocates swap space only when a dirty page is forced out of physical memory, not when the virtual memory page is first created  File data written to swap space until write to file system requested  Other dirty pages go to swap space due to no other home  Text segment pages thrown out and reread from the file system as needed  What if a system runs out of swap space?  Some systems allow multiple swap spaces
  • 33. 10.33 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Data Structures for Swapping on Linux Systems
  • 34. 10.34 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition RAID Structure  RAID – redundant array of inexpensive disks  multiple disk drives provides reliability via redundancy  Increases the mean time to failure  Mean time to repair – exposure time when another failure could cause data loss  Mean time to data loss based on above factors  If mirrored disks fail independently, consider disk with 1300,000 mean time to failure and 10 hour mean time to repair  Mean time to data loss is 100, 0002 / (2 ∗ 10) = 500 ∗ 106 hours, or 57,000 years!  Frequently combined with NVRAM to improve write performance  Several improvements in disk-use techniques involve the use of multiple disks working cooperatively
  • 35. 10.35 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition RAID (Cont.)  Disk striping uses a group of disks as one storage unit  RAID is arranged into six different levels  RAID schemes improve performance and improve the reliability of the storage system by storing redundant data  Mirroring or shadowing (RAID 1) keeps duplicate of each disk  Striped mirrors (RAID 1+0) or mirrored stripes (RAID 0+1) provides high performance and high reliability  Block interleaved parity (RAID 4, 5, 6) uses much less redundancy  RAID within a storage array can still fail if the array fails, so automatic replication of the data between arrays is common  Frequently, a small number of hot-spare disks are left unallocated, automatically replacing a failed disk and having data rebuilt onto them
  • 36. 10.36 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition RAID Levels
  • 37. 10.37 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition RAID (0 + 1) and (1 + 0)
  • 38. 10.38 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Other Features  Regardless of where RAID implemented, other useful features can be added  Snapshot is a view of file system before a set of changes take place (i.e. at a point in time)  More in Ch 12  Replication is automatic duplication of writes between separate sites  For redundancy and disaster recovery  Can be synchronous or asynchronous  Hot spare disk is unused, automatically used by RAID production if a disk fails to replace the failed disk and rebuild the RAID set if possible  Decreases mean time to repair
  • 39. 10.39 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Extensions  RAID alone does not prevent or detect data corruption or other errors, just disk failures  Solaris ZFS adds checksums of all data and metadata  Checksums kept with pointer to object, to detect if object is the right one and whether it changed  Can detect and correct data and metadata corruption  ZFS also removes volumes, partitions  Disks allocated in pools  Filesystems with a pool share that pool, use and release space like malloc() and free() memory allocate / release calls
  • 40. 10.40 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition ZFS Checksums All Metadata and Data
  • 41. 10.41 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Traditional and Pooled Storage
  • 42. 10.42 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Stable-Storage Implementation  Write-ahead log scheme requires stable storage  Stable storage means data is never lost (due to failure, etc)  To implement stable storage:  Replicate information on more than one nonvolatile storage media with independent failure modes  Update information in a controlled manner to ensure that we can recover the stable data after any failure during data transfer or recovery  Disk write has 1 of 3 outcomes 1. Successful completion - The data were written correctly on disk 2. Partial failure - A failure occurred in the midst of transfer, so only some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted 3. Total failure - The failure occurred before the disk write started, so the previous data values on the disk remain intact
  • 43. 10.43 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Stable-Storage Implementation (Cont.)  If failure occurs during block write, recovery procedure restores block to consistent state  System maintains 2 physical blocks per logical block and does the following: 1. Write to 1st physical 2. When successful, write to 2nd physical 3. Declare complete only after second write completes successfully Systems frequently use NVRAM as one physical to accelerate
  • 44. Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition End of Chapter 10
  • 45. Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Chapter 11: File-System Interface
  • 46. 11.46 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Chapter 11: File-System Interface  File Concept  Access Methods  Disk and Directory Structure  File-System Mounting  File Sharing  Protection
  • 47. 11.47 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Objectives  To explain the function of file systems  To describe the interfaces to file systems  To discuss file-system design tradeoffs, including access methods, file sharing, file locking, and directory structures  To explore file-system protection
  • 48. 11.48 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Concept  Contiguous logical address space  Types:  Data  numeric  character  binary  Program  Contents defined by file’s creator  Many types  Consider text file, source file, executable file
  • 49. 11.49 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Attributes  Name – only information kept in human-readable form  Identifier – unique tag (number) identifies file within file system  Type – needed for systems that support different types  Location – pointer to file location on device  Size – current file size  Protection – controls who can do reading, writing, executing  Time, date, and user identification – data for protection, security, and usage monitoring  Information about files are kept in the directory structure, which is maintained on the disk  Many variations, including extended file attributes such as file checksum  Information kept in the directory structure
  • 50. 11.50 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File info Window on Mac OS X
  • 51. 11.51 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Operations  File is an abstract data type  Create  Write – at write pointer location  Read – at read pointer location  Reposition within file - seek  Delete  Truncate  Open(Fi) – search the directory structure on disk for entry Fi, and move the content of entry to memory  Close (Fi) – move the content of entry Fi in memory to directory structure on disk
  • 52. 11.52 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Open Files  Several pieces of data are needed to manage open files:  Open-file table: tracks open files  File pointer: pointer to last read/write location, per process that has the file open  File-open count: counter of number of times a file is open – to allow removal of data from open-file table when last processes closes it  Disk location of the file: cache of data access information  Access rights: per-process access mode information
  • 53. 11.53 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Open File Locking  Provided by some operating systems and file systems  Similar to reader-writer locks  Shared lock similar to reader lock – several processes can acquire concurrently  Exclusive lock similar to writer lock  Mediates access to a file  Mandatory or advisory:  Mandatory – access is denied depending on locks held and requested  Advisory – processes can find status of locks and decide what to do
  • 54. 11.54 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Locking Example – Java API import java.io.*; import java.nio.channels.*; public class LockingExample { public static final boolean EXCLUSIVE = false; public static final boolean SHARED = true; public static void main(String arsg[]) throws IOException { FileLock sharedLock = null; FileLock exclusiveLock = null; try { RandomAccessFile raf = new RandomAccessFile("file.txt", "rw"); // get the channel for the file FileChannel ch = raf.getChannel(); // this locks the first half of the file - exclusive exclusiveLock = ch.lock(0, raf.length()/2, EXCLUSIVE); /** Now modify the data . . . */ // release the lock exclusiveLock.release();
  • 55. 11.55 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Locking Example – Java API (Cont.) // this locks the second half of the file - shared sharedLock = ch.lock(raf.length()/2+1, raf.length(), SHARED); /** Now read the data . . . */ // release the lock sharedLock.release(); } catch (java.io.IOException ioe) { System.err.println(ioe); }finally { if (exclusiveLock != null) exclusiveLock.release(); if (sharedLock != null) sharedLock.release(); } } }
  • 56. 11.56 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Types – Name, Extension
  • 57. 11.57 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Structure  None - sequence of words, bytes  Simple record structure  Lines  Fixed length  Variable length  Complex Structures  Formatted document  Relocatable load file  Can simulate last two with first method by inserting appropriate control characters  Who decides:  Operating system  Program
  • 58. 11.58 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Sequential-access File
  • 59. 11.59 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Access Methods  Sequential Access read next write next reset no read after last write (rewrite)  Direct Access – file is fixed length logical records read n write n position to n read next write next rewrite n n = relative block number  Relative block numbers allow OS to decide where file should be placed  See allocation problem in Ch 12
  • 60. 11.60 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Simulation of Sequential Access on Direct-access File
  • 61. 11.61 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Other Access Methods  Can be built on top of base methods  General involve creation of an index for the file  Keep index in memory for fast determination of location of data to be operated on (consider UPC code plus record of data about that item)  If too large, index (in memory) of the index (on disk)  IBM indexed sequential-access method (ISAM)  Small master index, points to disk blocks of secondary index  File kept sorted on a defined key  All done by the OS  VMS operating system provides index and relative files as another example (see next slide)
  • 62. 11.62 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Example of Index and Relative Files
  • 63. 11.63 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Directory Structure  A collection of nodes containing information about all files F 1 F 2 F 3 F 4 F n Directory Files Both the directory structure and the files reside on disk
  • 64. 11.64 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Disk Structure  Disk can be subdivided into partitions  Disks or partitions can be RAID protected against failure  Disk or partition can be used raw – without a file system, or formatted with a file system  Partitions also known as minidisks, slices  Entity containing file system known as a volume  Each volume containing file system also tracks that file system’s info in device directory or volume table of contents  As well as general-purpose file systems there are many special-purpose file systems, frequently all within the same operating system or computer
  • 65. 11.65 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition A Typical File-system Organization
  • 66. 11.66 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Types of File Systems  We mostly talk of general-purpose file systems  But systems frequently have may file systems, some general- and some special- purpose  Consider Solaris has  tmpfs – memory-based volatile FS for fast, temporary I/O  objfs – interface into kernel memory to get kernel symbols for debugging  ctfs – contract file system for managing daemons  lofs – loopback file system allows one FS to be accessed in place of another  procfs – kernel interface to process structures  ufs, zfs – general purpose file systems
  • 67. 11.67 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Operations Performed on Directory  Search for a file  Create a file  Delete a file  List a directory  Rename a file  Traverse the file system
  • 68. 11.68 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Directory Organization  Efficiency – locating a file quickly  Naming – convenient to users  Two users can have same name for different files  The same file can have several different names  Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …) The directory is organized logically to obtain
  • 69. 11.69 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Single-Level Directory  A single directory for all users  Naming problem  Grouping problem
  • 70. 11.70 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Two-Level Directory  Separate directory for each user  Path name  Can have the same file name for different user  Efficient searching  No grouping capability
  • 71. 11.71 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Tree-Structured Directories
  • 72. 11.72 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Tree-Structured Directories (Cont.)  Efficient searching  Grouping Capability  Current directory (working directory)  cd /spell/mail/prog  type list
  • 73. 11.73 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Tree-Structured Directories (Cont)  Absolute or relative path name  Creating a new file is done in current directory  Delete a file rm <file-name>  Creating a new subdirectory is done in current directory mkdir <dir-name> Example: if in current directory /mail mkdir count Deleting “mail”  deleting the entire subtree rooted by “mail”
  • 74. 11.74 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Acyclic-Graph Directories  Have shared subdirectories and files
  • 75. 11.75 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Acyclic-Graph Directories (Cont.)  Two different names (aliasing)  If dict deletes list  dangling pointer Solutions:  Backpointers, so we can delete all pointers Variable size records a problem  Backpointers using a daisy chain organization  Entry-hold-count solution  New directory entry type  Link – another name (pointer) to an existing file  Resolve the link – follow pointer to locate the file
  • 76. 11.76 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition General Graph Directory
  • 77. 11.77 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition General Graph Directory (Cont.)  How do we guarantee no cycles?  Allow only links to file not subdirectories  Garbage collection  Every time a new link is added use a cycle detection algorithm to determine whether it is OK
  • 78. 11.78 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File System Mounting  A file system must be mounted before it can be accessed  A unmounted file system (i.e., Fig. 11-11(b)) is mounted at a mount point
  • 79. 11.79 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Mount Point
  • 80. 11.80 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Sharing  Sharing of files on multi-user systems is desirable  Sharing may be done through a protection scheme  On distributed systems, files may be shared across a network  Network File System (NFS) is a common distributed file-sharing method  If multi-user system  User IDs identify users, allowing permissions and protections to be per-user Group IDs allow users to be in groups, permitting group access rights  Owner of a file / directory  Group of a file / directory
  • 81. 11.81 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Sharing – Remote File Systems  Uses networking to allow file system access between systems  Manually via programs like FTP  Automatically, seamlessly using distributed file systems  Semi automatically via the world wide web  Client-server model allows clients to mount remote file systems from servers  Server can serve multiple clients  Client and user-on-client identification is insecure or complicated  NFS is standard UNIX client-server file sharing protocol  CIFS is standard Windows protocol  Standard operating system file calls are translated into remote calls  Distributed Information Systems (distributed naming services) such as LDAP, DNS, NIS, Active Directory implement unified access to information needed for remote computing
  • 82. 11.82 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Sharing – Failure Modes  All file systems have failure modes  For example corruption of directory structures or other non- user data, called metadata  Remote file systems add new failure modes, due to network failure, server failure  Recovery from failure can involve state information about status of each remote request  Stateless protocols such as NFS v3 include all information in each request, allowing easy recovery but less security
  • 83. 11.83 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File Sharing – Consistency Semantics  Specify how multiple users are to access a shared file simultaneously  Similar to Ch 5 process-synchronization algorithms  Tend to be less complex due to disk I/O and network latency (for remote file systems)  Andrew File System (AFS) implemented complex remote file sharing semantics  Unix file system (UFS) implements:  Writes to an open file visible immediately to other users of the same open file  Sharing file pointer to allow multiple users to read and write concurrently  AFS has session semantics  Writes only visible to sessions starting after the file is closed
  • 84. 11.84 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Protection  File owner/creator should be able to control:  what can be done  by whom  Types of access  Read  Write  Execute  Append  Delete  List
  • 85. 11.85 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Access Lists and Groups  Mode of access: read, write, execute  Three classes of users on Unix / Linux: a) owner access 7  RWX = 111 b) group access 6  RWX = 110 c) public access 1  RWX = 001  Ask manager to create a group (unique name), say G, and add some users to the group  For a particular file (say game) or subdirectory, define an appropriate access  Attach a group to a file: chgrp G game
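The same protection setup can be done from C with chmod() and chown(). A sketch assuming the file game and the group G from this slide already exist:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <grp.h>

    int main(void)
    {
        /* owner rwx (7), group rw- (6), public --x (1) */
        if (chmod("game", 0761) == -1)
            perror("chmod");

        /* Equivalent of "chgrp G game": look up the group ID, then
           change the group while leaving the owner ((uid_t)-1) alone. */
        struct group *g = getgrnam("G");
        if (g == NULL || chown("game", (uid_t)-1, g->gr_gid) == -1)
            perror("chgrp");
        return 0;
    }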
  • 86. 11.86 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Windows 7 Access-Control List Management
  • 87. 11.87 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition A Sample UNIX Directory Listing
  • 88. Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition End of Chapter 11
  • 89. Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Chapter 12: File System Implementation
  • 90. 12.90 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Chapter 12: File System Implementation  File-System Structure  File-System Implementation  Directory Implementation  Allocation Methods  Free-Space Management  Efficiency and Performance  Recovery  NFS  Example: WAFL File System
  • 91. 12.91 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Objectives  To describe the details of implementing local file systems and directory structures  To describe the implementation of remote file systems  To discuss block allocation and free-block algorithms and trade-offs
  • 92. 12.92 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File-System Structure  File structure  Logical storage unit  Collection of related information  File system resides on secondary storage (disks)  Provides user interface to storage, mapping logical to physical  Provides efficient and convenient access to disk by allowing data to be stored, located, and retrieved easily  Disk provides in-place rewrite and random access  I/O transfers performed in blocks of sectors (usually 512 bytes)  File control block – storage structure consisting of information about a file  Device driver controls the physical device  File system organized into layers
  • 93. 12.93 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Layered File System
  • 94. 12.94 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File System Layers  Device drivers manage I/O devices at the I/O control layer  Given a command like “read drive1, cylinder 72, track 2, sector 10, into memory location 1060”, outputs low-level, hardware-specific commands to the hardware controller  Basic file system given command like “retrieve block 123” translates to device driver  Also manages memory buffers and caches (allocation, freeing, replacement)  Buffers hold data in transit  Caches hold frequently used data  File organization module understands files, logical address, and physical blocks  Translates logical block # to physical block #  Manages free space, disk allocation
  • 95. 12.95 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File System Layers (Cont.)  Logical file system manages metadata information  Translates file name into file number, file handle, location by maintaining file control blocks (inodes in UNIX)  Directory management  Protection  Layering useful for reducing complexity and redundancy, but adds overhead and can decrease performance  Logical layers can be implemented by any coding method according to OS designer
  • 96. 12.96 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File System Layers (Cont.)  Many file systems, sometimes many within an operating system  Each with its own format (CD-ROM is ISO 9660; Unix has UFS, FFS; Windows has FAT, FAT32, NTFS, as well as floppy, CD, DVD, and Blu-ray formats; Linux has more than 40 types, with the extended file systems ext2 and ext3 leading; plus distributed file systems, etc.)  New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE
  • 97. 12.97 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File-System Implementation  We have system calls at the API level, but how do we implement their functions?  On-disk and in-memory structures  Boot control block contains info needed by system to boot OS from that volume  Needed if volume contains OS, usually first block of volume  Volume control block (superblock, master file table) contains volume details  Total # of blocks, # of free blocks, block size, free block pointers or array  Directory structure organizes the files  Names and inode numbers, master file table
  • 98. 12.98 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File-System Implementation (Cont.)  Per-file File Control Block (FCB) contains many details about the file  inode number, permissions, size, dates  NTFS stores this info in the master file table using relational database structures
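A per-file FCB might be laid out roughly as below. This is an illustrative sketch only — not the actual UNIX inode or NTFS record format, whose layouts differ in detail:

    #include <stdint.h>
    #include <time.h>

    #define FCB_NBLOCKS 15

    struct fcb {
        uint32_t inode_number;            /* identifies the FCB on the volume  */
        uint16_t permissions;             /* owner/group/public rwx bits       */
        uint16_t link_count;              /* directory entries naming the file */
        uint32_t owner_uid, group_gid;
        uint64_t size;                    /* file size in bytes                */
        time_t   created, accessed, modified;
        uint32_t blocks[FCB_NBLOCKS];     /* data and index block pointers     */
    };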
  • 99. 12.99 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition In-Memory File System Structures  Mount table storing file system mounts, mount points, file system types  The following figure illustrates the necessary file system structures provided by the operating system  Figure 12-3(a) refers to opening a file  Figure 12-3(b) refers to reading a file  Plus buffers hold data blocks from secondary storage  Open returns a file handle for subsequent use  Data from read eventually copied to specified user process memory address
  • 100. 12.100 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition In-Memory File System Structures
  • 101. 12.101 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Partitions and Mounting  Partition can be a volume containing a file system (“cooked”) or raw – just a sequence of blocks with no file system  Boot block can point to boot volume or boot loader set of blocks that contain enough code to know how to load the kernel from the file system  Or a boot management program for multi-OS booting  Root partition contains the OS; other partitions can hold other OSes, other file systems, or be raw  Mounted at boot time  Other partitions can mount automatically or manually  At mount time, file system consistency checked  Is all metadata correct?  If not, fix it, try again  If yes, add to mount table, allow access
  • 102. 12.102 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Virtual File Systems  Virtual File Systems (VFS) on Unix provide an object-oriented way of implementing file systems  VFS allows the same system call interface (the API) to be used for different types of file systems  Separates file-system generic operations from implementation details  Implementation can be one of many file systems types, or network file system  Implements vnodes which hold inodes or network file details  Then dispatches operation to appropriate file system implementation routines
  • 103. 12.103 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Virtual File Systems (Cont.)  The API is to the VFS interface, rather than any specific type of file system
  • 104. 12.104 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Virtual File System Implementation  For example, Linux has four object types:  inode, file, superblock, dentry  VFS defines set of operations on the objects that must be implemented  Every object has a pointer to a function table  Function table has addresses of routines to implement that function on that object  For example:  int open(...) – Open a file  int close(...) – Close an already-open file  ssize_t read(...) – Read from a file  ssize_t write(...) – Write to a file  int mmap(...) – Memory-map a file
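The per-object function table amounts to a struct of function pointers. A minimal sketch with illustrative names (not the actual Linux definitions):

    #include <sys/types.h>

    struct file;                      /* opaque per-open-file object */

    struct file_ops {                 /* one table per file-system type */
        int     (*open)(struct file *f);
        int     (*close)(struct file *f);
        ssize_t (*read)(struct file *f, void *buf, size_t n);
        ssize_t (*write)(struct file *f, const void *buf, size_t n);
    };

    /* The VFS layer dispatches through the table, so the same call
       works whether the file lives on ext4, NFS, or anything else. */
    ssize_t vfs_read(const struct file_ops *ops, struct file *f,
                     void *buf, size_t n)
    {
        return ops->read(f, buf, n);
    }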
  • 105. 12.105 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Directory Implementation  Linear list of file names with pointer to the data blocks  Simple to program  Time-consuming to execute  Linear search time  Could keep ordered alphabetically via linked list or use B+ tree  Hash Table – linear list with hash data structure  Decreases directory search time  Collisions – situations where two file names hash to the same location  Only good if entries are fixed size, or use chained-overflow method
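A sketch of the hash-table variant with the chained-overflow method named on this slide; the bucket count, name length, and hash function are arbitrary choices:

    #include <string.h>
    #include <stdint.h>
    #include <stddef.h>

    #define NBUCKETS 128

    struct dir_entry {
        char name[32];
        uint32_t inode;
        struct dir_entry *next;       /* chained-overflow list */
    };

    static struct dir_entry *bucket[NBUCKETS];

    static unsigned hash(const char *s)
    {
        unsigned h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h % NBUCKETS;
    }

    /* Expected O(1) lookup instead of a linear scan of the directory. */
    struct dir_entry *dir_lookup(const char *name)
    {
        struct dir_entry *e = bucket[hash(name)];
        while (e && strcmp(e->name, name) != 0)
            e = e->next;
        return e;
    }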
  • 106. 12.106 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Allocation Methods - Contiguous  An allocation method refers to how disk blocks are allocated for files:  Contiguous allocation – each file occupies set of contiguous blocks  Best performance in most cases  Simple – only starting location (block #) and length (number of blocks) are required  Problems include finding space for file, knowing file size, external fragmentation, need for compaction off-line (downtime) or on-line
  • 107. 12.107 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Contiguous Allocation  Mapping from logical to physical: divide the logical address LA by the block size, LA / 512 = Q with remainder R  Block to be accessed = Q + starting address  Displacement into block = R
  • 108. 12.108 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Extent-Based Systems  Many newer file systems (i.e., Veritas File System) use a modified contiguous allocation scheme  Extent-based file systems allocate disk blocks in extents  An extent is a contiguous set of disk blocks  Extents are allocated for file allocation  A file consists of one or more extents
  • 109. 12.109 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Allocation Methods - Linked  Linked allocation – each file a linked list of blocks  File ends at nil pointer  No external fragmentation  Each block contains pointer to next block  No compaction needed  Free space management system called when new block needed  Improve efficiency by clustering blocks into groups, but this increases internal fragmentation  Reliability can be a problem  Locating a block can take many I/Os and disk seeks
  • 110. 12.110 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Allocation Methods – Linked (Cont.)  FAT (File Allocation Table) variation  Beginning of volume has table, indexed by block number  Much like a linked list, but faster on disk and cacheable  New block allocation simple
  • 111. 12.111 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Linked Allocation  Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk; the first word of each 512-word block holds the pointer to the next block, leaving 511 words for data  Mapping: LA / 511 = Q with remainder R  Block to be accessed is the Qth block in the linked chain of blocks representing the file  Displacement into block = R + 1 (skipping the pointer word)
  • 112. 12.112 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Linked Allocation
  • 113. 12.113 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition File-Allocation Table
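Traversing a FAT chain, as in the figure above, is a simple table walk. An in-memory sketch with a hypothetical table size and end-of-chain marker:

    #include <stdio.h>
    #include <stdint.h>

    #define FAT_EOF 0xFFFFFFFFu          /* hypothetical end-of-chain marker */

    uint32_t fat[1024];                  /* one entry per disk block, loaded
                                            from the start of the volume     */

    /* The directory entry stores only the first block; each FAT entry
       names the next block of the file. */
    void print_chain(uint32_t start)
    {
        for (uint32_t b = start; b != FAT_EOF; b = fat[b])
            printf("block %u\n", b);
    }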
  • 114. 12.114 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Allocation Methods - Indexed  Indexed allocation  Each file has its own index block(s) of pointers to its data blocks  Logical view index table
  • 115. 12.115 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Example of Indexed Allocation
  • 116. 12.116 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Indexed Allocation (Cont.)  Need index table  Random access  Dynamic access without external fragmentation, but have overhead of index block  Mapping from logical to physical in a file of maximum size 256K bytes and block size 512 bytes; we need only 1 block for the index table  LA / 512 = Q with remainder R  Q = displacement into index table  R = displacement into block
  • 117. 12.117 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Indexed Allocation – Mapping (Cont.)  Mapping from logical to physical in a file of unbounded length (block size of 512 words)  Linked scheme – Link blocks of index table (no limit on size)  LA / (512 x 511) = Q1 with remainder R1  Q1 = block of index table  R1 is used as follows: R1 / 512 = Q2 with remainder R2  Q2 = displacement into block of index table  R2 = displacement into block of file
  • 118. 12.118 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Indexed Allocation – Mapping (Cont.)  Two-level index (4K blocks could store 1,024 four-byte pointers in outer index -> 1,048,576 data blocks and file size of up to 4GB)  LA / (512 x 512) = Q1 with remainder R1  Q1 = displacement into outer index  R1 is used as follows: R1 / 512 = Q2 with remainder R2  Q2 = displacement into block of index table  R2 = displacement into block of file
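The decomposition above is plain integer division and remainder. A small sketch using the 512 x 512 geometry from this slide with an arbitrary logical address:

    #include <stdio.h>

    int main(void)
    {
        unsigned long la = 1234567;              /* hypothetical logical address  */
        unsigned long q1 = la / (512UL * 512);   /* slot in the outer index       */
        unsigned long r1 = la % (512UL * 512);
        unsigned long q2 = r1 / 512;             /* slot in the inner index block */
        unsigned long r2 = r1 % 512;             /* displacement into data block  */

        printf("outer=%lu inner=%lu offset=%lu\n", q1, q2, r2);
        return 0;
    }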
  • 119. 12.119 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Indexed Allocation – Mapping (Cont.)
  • 120. 12.120 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Combined Scheme: UNIX UFS More index blocks than can be addressed with 32-bit file pointer 4K bytes per block, 32-bit addresses
  • 121. 12.121 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Performance  Best method depends on file access type  Contiguous great for sequential and random  Linked good for sequential, not random  Declare access type at creation -> select either contiguous or linked  Indexed more complex  Single block access could require 2 index block reads then data block read  Clustering can help improve throughput, reduce CPU overhead
  • 122. 12.122 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Performance (Cont.)  Adding instructions to the execution path to save one disk I/O is reasonable  Intel Core i7 Extreme Edition 990x (2011) at 3.46 GHz = 159,000 MIPS  http://en.wikipedia.org/wiki/Instructions_per_second  Typical disk drive at 250 I/Os per second  159,000 MIPS / 250 = 630 million instructions during one disk I/O  Fast SSD drives provide 60,000 IOPS  159,000 MIPS / 60,000 = 2.65 million instructions during one disk I/O
  • 123. 12.123 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Free-Space Management  File system maintains free-space list to track available blocks/clusters  (Using term “block” for simplicity)  Bit vector or bit map (n blocks, bits 0 .. n-1): bit[i] = 1  block[i] free; bit[i] = 0  block[i] occupied  Block number calculation: (number of bits per word) * (number of 0-value words) + offset of first 1 bit  CPUs have instructions to return offset within word of first “1” bit
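The first-free-block calculation on this slide, sketched with 32-bit words; __builtin_ctz is the GCC/Clang builtin that maps to the hardware find-first-one instruction:

    #include <stdint.h>

    #define WORD_BITS 32

    /* bit = 1 means block free, 0 means occupied, as on the slide.
       Returns the first free block number, or -1 if the disk is full. */
    long first_free(const uint32_t *map, long nwords)
    {
        for (long w = 0; w < nwords; w++)
            if (map[w] != 0)        /* skip the all-zero (fully used) words */
                return w * WORD_BITS + __builtin_ctz(map[w]);
        return -1;
    }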
  • 124. 12.124 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Free-Space Management (Cont.)  Bit map requires extra space  Example: block size = 4KB = 2^12 bytes, disk size = 2^40 bytes (1 terabyte), so n = 2^40 / 2^12 = 2^28 bits (or 32MB); with clusters of 4 blocks -> 8MB of memory  Easy to get contiguous files
  • 125. 12.125 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Linked Free Space List on Disk  Linked list (free list)  Cannot get contiguous space easily  No waste of space  No need to traverse the entire list (if # free blocks recorded)
  • 126. 12.126 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Free-Space Management (Cont.)  Grouping  Modify linked list to store address of next n-1 free blocks in first free block, plus a pointer to next block that contains free-block-pointers (like this one)  Counting  Because space is frequently contiguously used and freed, with contiguous allocation, extents, or clustering  Keep address of first free block and count of following free blocks  Free space list then has entries containing addresses and counts
  • 127. 12.127 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Free-Space Management (Cont.)  Space Maps  Used in ZFS  Consider meta-data I/O on very large file systems  Full data structures like bit maps couldn’t fit in memory -> thousands of I/Os  Divides device space into metaslab units and manages metaslabs  Given volume can contain hundreds of metaslabs  Each metaslab has associated space map  Uses counting algorithm  But records to log file rather than file system  Log of all block activity, in time order, in counting format  Metaslab activity -> load space map into memory in balanced-tree structure, indexed by offset  Replay log into that structure  Combine contiguous free blocks into single entry
  • 128. 12.128 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Efficiency and Performance  Efficiency dependent on:  Disk allocation and directory algorithms  Types of data kept in file’s directory entry  Pre-allocation or as-needed allocation of metadata structures  Fixed-size or varying-size data structures
  • 129. 12.129 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Efficiency and Performance (Cont.)  Performance  Keeping data and metadata close together  Buffer cache – separate section of main memory for frequently used blocks  Synchronous writes sometimes requested by apps or needed by OS  No buffering / caching – writes must hit disk before acknowledgement  Asynchronous writes more common, buffer-able, faster  Free-behind and read-ahead – techniques to optimize sequential access  Reads frequently slower than writes
  • 130. 12.130 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Page Cache  A page cache caches pages rather than disk blocks using virtual memory techniques and addresses  Memory-mapped I/O uses a page cache  Routine I/O through the file system uses the buffer (disk) cache  This leads to the following figure
  • 131. 12.131 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition I/O Without a Unified Buffer Cache
  • 132. 12.132 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Unified Buffer Cache  A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O to avoid double caching  But which caches get priority, and what replacement algorithms to use?
  • 133. 12.133 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition I/O Using a Unified Buffer Cache
  • 134. 12.134 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Recovery  Consistency checking – compares data in directory structure with data blocks on disk, and tries to fix inconsistencies  Can be slow and sometimes fails  Use system programs to back up data from disk to another storage device (magnetic tape, other magnetic disk, optical)  Recover lost file or disk by restoring data from backup
  • 135. 12.135 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Log Structured File Systems  Log structured (or journaling) file systems record each metadata update to the file system as a transaction  All transactions are written to a log  A transaction is considered committed once it is written to the log (sequentially)  Sometimes to a separate device or section of disk  However, the file system may not yet be updated  The transactions in the log are asynchronously written to the file system structures  When the file system structures are modified, the transaction is removed from the log  If the file system crashes, all remaining transactions in the log must still be performed  Faster recovery from crash, removes chance of inconsistency of metadata
  • 136. 12.136 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition The Sun Network File System (NFS)  An implementation and a specification of a software system for accessing remote files across LANs (or WANs)  The implementation is part of the Solaris and SunOS operating systems running on Sun workstations using an unreliable datagram protocol (UDP/IP) and Ethernet
  • 137. 12.137 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition NFS (Cont.)  Interconnected workstations viewed as a set of independent machines with independent file systems, which allows sharing among these file systems in a transparent manner  A remote directory is mounted over a local file system directory  The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory  Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided  Files in the remote directory can then be accessed in a transparent manner  Subject to access-rights accreditation, potentially any file system (or directory within a file system), can be mounted remotely on top of any local directory
  • 138. 12.138 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition NFS (Cont.)  NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specification is independent of these media  This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces  The NFS specification distinguishes between the services provided by a mount mechanism and the actual remote-file-access services
  • 139. 12.139 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Three Independent File Systems
  • 140. 12.140 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Mounting in NFS Mounts Cascading mounts
  • 141. 12.141 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition NFS Mount Protocol  Establishes initial logical connection between server and client  Mount operation includes name of remote directory to be mounted and name of server machine storing it  Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine  Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them  Following a mount request that conforms to its export list, the server returns a file handle—a key for further accesses  File handle – a file-system identifier, and an inode number to identify the mounted directory within the exported file system  The mount operation changes only the user’s view and does not affect the server side
  • 142. 12.142 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition NFS Protocol  Provides a set of remote procedure calls for remote file operations. The procedures support the following operations:  searching for a file within a directory  reading a set of directory entries  manipulating links and directories  accessing file attributes  reading and writing files  NFS servers are stateless; each request has to provide a full set of arguments (NFS V4 is just becoming available – very different, stateful)  Modified data must be committed to the server’s disk before results are returned to the client (lose advantages of caching)  The NFS protocol does not provide concurrency-control mechanisms
  • 143. 12.143 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Three Major Layers of NFS Architecture  UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors)  Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types  The VFS activates file-system-specific operations to handle local requests according to their file-system types  Calls the NFS protocol procedures for remote requests  NFS service layer – bottom layer of the architecture  Implements the NFS protocol
  • 144. 12.144 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Schematic View of NFS Architecture
  • 145. 12.145 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition NFS Path-Name Translation  Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode  To make lookup faster, a directory name lookup cache on the client’s side holds the vnodes for remote directory names
  • 146. 12.146 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition NFS Remote Operations  Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files)  NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance  File-blocks cache – when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes  Cached file blocks are used only if the corresponding cached attributes are up to date  File-attribute cache – the attribute cache is updated whenever new attributes arrive from the server  Clients do not free delayed-write blocks until the server confirms that the data have been written to disk
  • 147. 12.147 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Example: WAFL File System  Used on Network Appliance “Filers” – distributed file system appliances  “Write-anywhere file layout”  Serves up NFS, CIFS, http, ftp  Random I/O optimized, write optimized  NVRAM for write caching  Similar to Berkeley Fast File System, with extensive modifications
  • 148. 12.148 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition The WAFL File Layout
  • 149. 12.149 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Snapshots in WAFL
  • 150. Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition End of Chapter 12
  • 151. Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Chapter 13: I/O Systems
  • 152. 13.152 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Chapter 13: I/O Systems  Overview  I/O Hardware  Application I/O Interface  Kernel I/O Subsystem  Transforming I/O Requests to Hardware Operations  STREAMS  Performance
  • 153. 13.153 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Objectives  Explore the structure of an operating system’s I/O subsystem  Discuss the principles of I/O hardware and its complexity  Provide details of the performance aspects of I/O hardware and software
  • 154. 13.154 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Overview  I/O management is a major component of operating system design and operation  Important aspect of computer operation  I/O devices vary greatly  Various methods to control them  Performance management  New types of devices frequent  Ports, busses, device controllers connect to various devices  Device drivers encapsulate device details  Present uniform device-access interface to I/O subsystem
  • 155. 13.155 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition I/O Hardware  Incredible variety of I/O devices  Storage  Transmission  Human-interface  Common concepts – signals from I/O devices interface with computer  Port – connection point for device  Bus – daisy chain or shared direct access  PCI bus common in PCs and servers, PCI Express (PCIe)  expansion bus connects relatively slow devices  Controller (host adapter) – electronics that operate port, bus, device  Sometimes integrated  Sometimes separate circuit board (host adapter)  Contains processor, microcode, private memory, bus controller, etc.  Some talk to a per-device controller with its own bus controller, microcode, memory, etc.
  • 156. 13.156 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition A Typical PC Bus Structure
  • 157. 13.157 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition I/O Hardware (Cont.)  I/O instructions control devices  Devices usually have registers where device driver places commands, addresses, and data to write, or read data from registers after command execution  Data-in register, data-out register, status register, control register  Typically 1-4 bytes, or FIFO buffer  Devices have addresses, used by  Direct I/O instructions  Memory-mapped I/O  Device data and command registers mapped to processor address space  Especially for large address spaces (graphics)
  • 158. 13.158 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Device I/O Port Locations on PCs (partial)
  • 159. 13.159 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Polling  For each byte of I/O 1. Read busy bit from status register until 0 2. Host sets read or write bit and, if a write, copies data into data-out register 3. Host sets command-ready bit 4. Controller sets busy bit, executes transfer 5. Controller clears busy bit, error bit, command-ready bit when transfer done  Step 1 is busy-wait cycle to wait for I/O from device  Reasonable if device is fast  But inefficient if device slow  CPU switches to other tasks?  But if a cycle is missed, data may be overwritten / lost
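The handshake above, as a busy-wait sketch over hypothetical memory-mapped registers (the layout and bit positions are invented for illustration; real controllers define their own):

    #include <stdint.h>

    #define ST_BUSY   0x01               /* status: controller busy    */
    #define CT_WRITE  0x01               /* control: this is a write   */
    #define CT_CMDRDY 0x02               /* control: command-ready bit */

    struct dev_regs {
        volatile uint8_t status;
        volatile uint8_t control;
        volatile uint8_t data_out;       /* host -> device data register */
    };

    void poll_write_byte(struct dev_regs *r, uint8_t byte)
    {
        while (r->status & ST_BUSY)      /* step 1: busy-wait until idle */
            ;
        r->data_out = byte;              /* step 2: write bit + data     */
        r->control = CT_WRITE | CT_CMDRDY;  /* step 3: command-ready     */
        while (r->status & ST_BUSY)      /* steps 4-5: controller sets busy,
                                            transfers, then clears the bits */
            ;
    }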
  • 160. 13.160 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Interrupts  Polling can happen in 3 instruction cycles  Read status, logical-and to extract status bit, branch if not zero  How to be more efficient if non-zero infrequently?  CPU Interrupt-request line triggered by I/O device  Checked by processor after each instruction  Interrupt handler receives interrupts  Maskable to ignore or delay some interrupts  Interrupt vector to dispatch interrupt to correct handler  Context switch at start and end  Based on priority  Some nonmaskable  Interrupt chaining if more than one device at same interrupt number
  • 161. 13.161 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Interrupt-Driven I/O Cycle
  • 162. 13.162 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Intel Pentium Processor Event-Vector Table
  • 163. 13.163 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Interrupts (Cont.)  Interrupt mechanism also used for exceptions  Terminate process, crash system due to hardware error  Page-fault handler executes on a memory-access error  System call executes via trap to trigger kernel to execute request  Multi-CPU systems can process interrupts concurrently  If operating system designed to handle it  Used for time-sensitive processing; frequent, so must be fast
  • 164. 13.164 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Direct Memory Access  Used to avoid programmed I/O (one byte at a time) for large data movement  Requires DMA controller  Bypasses CPU to transfer data directly between I/O device and memory  OS writes DMA command block into memory  Source and destination addresses  Read or write mode  Count of bytes  Writes location of command block to DMA controller  Bus mastering of DMA controller – grabs bus from CPU  Cycle stealing from CPU but still much more efficient  When done, interrupts to signal completion  Version that is aware of virtual addresses can be even more efficient - DVMA
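The command block the OS writes into memory might look roughly like this; the field layout is illustrative, since each DMA controller defines its own:

    #include <stdint.h>

    struct dma_command {
        uint64_t source;          /* source address                      */
        uint64_t destination;     /* destination address                 */
        uint32_t count;           /* number of bytes to transfer         */
        uint32_t mode;            /* read-from-device or write-to-device */
    };
    /* The driver writes this block's address to the DMA controller's
       command register; the controller masters the bus, moves the data
       with no CPU involvement, and raises an interrupt when done. */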
  • 165. 13.165 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Six Step Process to Perform DMA Transfer
  • 166. 13.166 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Application I/O Interface  I/O system calls encapsulate device behaviors in generic classes  Device-driver layer hides differences among I/O controllers from kernel  New devices talking already-implemented protocols need no extra work  Each OS has its own I/O subsystem structures and device driver frameworks  Devices vary in many dimensions  Character-stream or block  Sequential or random-access  Synchronous or asynchronous (or both)  Sharable or dedicated  Speed of operation  read-write, read only, or write only
  • 167. 13.167 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition A Kernel I/O Structure
  • 168. 13.168 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Characteristics of I/O Devices
  • 169. 13.169 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Characteristics of I/O Devices (Cont.)  Subtleties of devices handled by device drivers  Broadly I/O devices can be grouped by the OS into  Block I/O  Character I/O (Stream)  Memory-mapped file access  Network sockets  For direct manipulation of I/O device specific characteristics, usually an escape / back door  Unix ioctl() call to send arbitrary bits to a device control register and data to device data register
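One concrete example of the ioctl() escape hatch: asking the terminal driver for its window size with the TIOCGWINSZ request.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>

    int main(void)
    {
        struct winsize ws;                        /* filled in by the driver */
        if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == -1) {
            perror("ioctl");
            return 1;
        }
        printf("%d rows x %d cols\n", ws.ws_row, ws.ws_col);
        return 0;
    }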
  • 170. 13.170 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Block and Character Devices  Block devices include disk drives  Commands include read, write, seek  Raw I/O, direct I/O, or file-system access  Memory-mapped file access possible  File mapped to virtual memory and clusters brought via demand paging  DMA  Character devices include keyboards, mice, serial ports  Commands include get(), put()  Libraries layered on top allow line editing
  • 171. 13.171 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Network Devices  Varying enough from block and character to have own interface  Linux, Unix, Windows and many others include socket interface  Separates network protocol from network operation  Includes select() functionality  Approaches vary widely (pipes, FIFOs, streams, queues, mailboxes)
  • 172. 13.172 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Clocks and Timers  Provide current time, elapsed time, timer  Normal resolution about 1/60 second  Some systems provide higher-resolution timers  Programmable interval timer used for timings, periodic interrupts  ioctl() (on UNIX) covers odd aspects of I/O such as clocks and timers
  • 173. 13.173 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Nonblocking and Asynchronous I/O  Blocking - process suspended until I/O completed  Easy to use and understand  Insufficient for some needs  Nonblocking - I/O call returns as much as available  User interface, data copy (buffered I/O)  Implemented via multi-threading  Returns quickly with count of bytes read or written  select() to find if data ready then read() or write() to transfer  Asynchronous - process runs while I/O executes  Difficult to use  I/O subsystem signals process when I/O completed
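The select()-then-read() pattern from this slide, sketched for standard input with a two-second timeout:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/select.h>

    int main(void)
    {
        fd_set readfds;
        struct timeval tv = { 2, 0 };            /* give up after 2 seconds */
        char buf[256];

        FD_ZERO(&readfds);
        FD_SET(STDIN_FILENO, &readfds);

        int n = select(STDIN_FILENO + 1, &readfds, NULL, NULL, &tv);
        if (n > 0) {                             /* data ready: read() won't block */
            ssize_t got = read(STDIN_FILENO, buf, sizeof buf);
            printf("read %zd bytes\n", got);
        } else if (n == 0) {
            printf("timed out\n");
        }
        return 0;
    }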
  • 174. 13.174 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Two I/O Methods Synchronous Asynchronous
  • 175. 13.175 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Vectored I/O  Vectored I/O allows one system call to perform multiple I/O operations  For example, Unix readv() accepts a vector of multiple buffers to read into (writev() writes from such a vector)  This scatter-gather method is better than multiple individual I/O calls  Decreases context switching and system call overhead  Some versions provide atomicity  Avoids, for example, worry about multiple threads changing data as reads / writes occur
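A short readv() sketch: one system call scatters incoming bytes across two separate buffers in order.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/uio.h>

    int main(void)
    {
        char header[16], body[64];
        struct iovec iov[2] = {
            { .iov_base = header, .iov_len = sizeof header },
            { .iov_base = body,   .iov_len = sizeof body   },
        };

        /* Fills header first, then body -- two read()s' worth of work,
           one kernel crossing. writev() is the gather-side counterpart. */
        ssize_t n = readv(STDIN_FILENO, iov, 2);
        printf("read %zd bytes in one call\n", n);
        return 0;
    }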
  • 176. 13.176 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Kernel I/O Subsystem  Scheduling  Some I/O request ordering via per-device queue  Some OSs try fairness  Some implement Quality of Service (e.g., IPQOS)  Buffering - store data in memory while transferring between devices  To cope with device speed mismatch  To cope with device transfer size mismatch  To maintain “copy semantics”  Double buffering – two copies of the data  Kernel and user  Varying sizes  Full / being processed and not-full / being used  Copy-on-write can be used for efficiency in some cases
  • 177. 13.177 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Device-status Table
  • 178. 13.178 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Sun Enterprise 6000 Device-Transfer Rates
  • 179. 13.179 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Kernel I/O Subsystem  Caching - faster device holding copy of data  Always just a copy  Key to performance  Sometimes combined with buffering  Spooling - hold output for a device  If device can serve only one request at a time  e.g., printing  Device reservation - provides exclusive access to a device  System calls for allocation and de-allocation  Watch out for deadlock
  • 180. 13.180 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Error Handling  OS can recover from disk read, device unavailable, transient write failures  Retry a read or write, for example  Some systems more advanced – Solaris FMA, AIX  Track error frequencies, stop using device with increasing frequency of retry-able errors  Most return an error number or code when I/O request fails  System error logs hold problem reports
  • 181. 13.181 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition I/O Protection  User process may accidentally or purposefully attempt to disrupt normal operation via illegal I/O instructions  All I/O instructions defined to be privileged  I/O must be performed via system calls  Memory-mapped and I/O port memory locations must be protected too
  • 182. 13.182 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Use of a System Call to Perform I/O
  • 183. 13.183 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Kernel Data Structures  Kernel keeps state info for I/O components, including open file tables, network connections, character device state  Many, many complex data structures to track buffers, memory allocation, “dirty” blocks  Some use object-oriented methods and message passing to implement I/O  Windows uses message passing  Message with I/O information passed from user mode into kernel  Message modified as it flows through to device driver and back to process  Pros / cons?
  • 184. 13.184 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition UNIX I/O Kernel Structure
  • 185. 13.185 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Power Management  Not strictly domain of I/O, but much is I/O related  Computers and devices use electricity, generate heat, frequently require cooling  OSes can help manage and improve use  Cloud computing environments move virtual machines between servers  Can end up evacuating whole systems and shutting them down  Mobile computing has power management as first class OS aspect
  • 186. 13.186 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Power Management (Cont.)  For example, Android implements  Component-level power management  Understands relationship between components  Build device tree representing physical device topology  System bus -> I/O subsystem -> {flash, USB storage}  Device driver tracks state of device, whether in use  Unused component – turn it off  All devices in tree branch unused – turn off branch  Wake locks – like other locks but prevent sleep of device when lock is held  Power collapse – put a device into very deep sleep  Marginal power use  Only awake enough to respond to external stimuli (button press, incoming call)
  • 187. 13.187 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition I/O Requests to Hardware Operations  Consider reading a file from disk for a process:  Determine device holding file  Translate name to device representation  Physically read data from disk into buffer  Make data available to requesting process  Return control to process
  • 188. 13.188 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Life Cycle of An I/O Request
  • 189. 13.189 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition STREAMS  STREAM – a full-duplex communication channel between a user-level process and a device in Unix System V and beyond  A STREAM consists of:  STREAM head interfaces with the user process  driver end interfaces with the device  zero or more STREAM modules between them  Each module contains a read queue and a write queue  Message passing is used to communicate between queues  Flow control option to indicate available or busy  Asynchronous internally, synchronous where user process communicates with stream head
  • 190. 13.190 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition The STREAMS Structure
  • 191. 13.191 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Performance  I/O a major factor in system performance:  Demands CPU to execute device driver, kernel I/O code  Context switches due to interrupts  Data copying  Network traffic especially stressful
  • 192. 13.192 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Intercomputer Communications
  • 193. 13.193 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Improving Performance  Reduce number of context switches  Reduce data copying  Reduce interrupts by using large transfers, smart controllers, polling  Use DMA  Use smarter hardware devices  Balance CPU, memory, bus, and I/O performance for highest throughput  Move user-mode processes / daemons to kernel threads
  • 194. 13.194 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition Device-Functionality Progression
  • 195. Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9th Edition End of Chapter 13