UNIT-3
ADVANCED FILE I/O
ADVANCED FILE I/O
Scatter/gather I/O:
• Allows a single call to read data from or write data to many
buffers at once.
• Useful for bunching together fields of different data
structures to form one I/O transaction.
Epoll:
• Improves on the poll() and select() system calls
• Useful when hundreds of file descriptors need to be
polled from a single thread.
Memory-mapped I/O:
• Maps a file into memory, allowing file I/O to occur via
simple memory manipulation.
• Useful for certain patterns of I/O.
ADVANCED FILE I/O
File advice:
• Allows a process to provide hints to the kernel on the
process's intended uses for a file, enabling improved
I/O performance.
Asynchronous I/O:
• Allows a process to issue I/O requests without
waiting for them to complete
• Useful for juggling heavy I/O workloads without the
use of threads.
Scatter/Gather I/O
• Scatter/gather I/O is a method of input and output
where a single system call writes to a vector of
buffers from a single data stream, or, alternatively,
reads into a vector of buffers from a single data
stream.
• This type of I/O is so named because the data is
scattered into or gathered from the given vector of
buffers.
• An alternative name for this approach to input and
output is vectored I/O.
Scatter/Gather I/O
• Scatter/gather I/O provides several advantages over
linear I/O methods:
– More natural coding pattern
• If your data is naturally segmented—say, the fields of a predefined
structure—vectored I/O allows for intuitive manipulation.
– Efficiency
• A single vectored I/O operation can replace multiple linear I/O
operations.
– Performance
• In addition to a reduction in the number of issued system calls, a
vectored I/O implementation can provide improved performance
over a linear I/O implementation via internal optimizations.
– Atomicity
• In contrast with multiple linear I/O operations, a process can
execute a single vectored I/O operation with no risk of interleaving
I/O from another process.
Scatter/Gather I/O
• readv() and writev():
– The readv() function reads count segments from
the file descriptor fd into the buffers described by
iov.
– The writev() function writes at most count
segments from the buffers described by iov into
the file descriptor fd:
– Syntax for readv:
#include <sys/uio.h>
ssize_t readv (int fd, const struct iovec *iov, int count);
– Syntax for writev:
ssize_t writev (int fd, const struct iovec *iov, int count);
Scatter/Gather I/O
• The functions readv() and writev() both use the concept of
an I/O vector. The <sys/uio.h> include file defines
struct iovec as follows:
struct iovec {
    void *iov_base;   /* pointer to the start of the buffer */
    size_t iov_len;   /* size of the buffer in bytes */
};
Scatter/Gather I/O
• The struct iovec defines one vector element.
Normally, this structure is used as an array of
multiple elements.
• For each transfer element, the pointer member
iov_base points to a buffer that is receiving data
for readv or is transmitting data for writev.
• The member iov_len in each case determines the
maximum receive length and the actual write length,
respectively.
Scatter/Gather I/O
• The readv() and writev() functions behave the same
as read() and write(), respectively, except that
multiple buffers are read from or written to.
• On success, readv() and writev() return the number
of bytes read or written, respectively. This number
should be the sum of all count iov_len values.
• On error, the system calls return −1 and set errno as
appropriate.
Scatter/Gather I/O
• The following code sample demonstrates the use
of writev():

#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main (void)
{
    char *str0 = "hello ";
    char *str1 = "world\n";
    struct iovec iov[2];
    ssize_t nwritten;

    iov[0].iov_base = str0;
    iov[0].iov_len = strlen (str0);
    iov[1].iov_base = str1;
    iov[1].iov_len = strlen (str1);

    /* gather both buffers into a single write */
    nwritten = writev (STDOUT_FILENO, iov, 2);
    return nwritten == -1 ? 1 : 0;
}
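• As a complementary sketch (not from the original slides; the
buffer sizes are arbitrary assumptions), the following program
uses readv() to scatter a single read across two buffers:

#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

int main (void)
{
    char hdr[16], body[64];          /* hypothetical segment sizes */
    struct iovec iov[2];
    ssize_t nread;

    iov[0].iov_base = hdr;
    iov[0].iov_len = sizeof (hdr);
    iov[1].iov_base = body;
    iov[1].iov_len = sizeof (body);

    /* fill hdr first; any remaining bytes spill into body */
    nread = readv (STDIN_FILENO, iov, 2);
    if (nread == -1)
        perror ("readv");
    else
        printf ("read %zd bytes\n", nread);
    return 0;
}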
Event Poll
• User processes can efficiently monitor and control
multiple streams with two system calls: poll and the
I_SETSIG ioctl command.
• Recognizing the limitations of both poll() and select(),
the 2.6 Linux kernel introduced the event poll (epoll)
facility.
• epoll solves the fundamental performance problem
shared by both of them and adds several new
features.
Event Poll
• Both poll() and select() require the full list of file
descriptors to watch on each invocation
• The kernel must then walk the list of each file
descriptor to be monitored
• epoll decouples registration from the actual wait: one
system call initializes an epoll context, another
adds monitored file descriptors to or removes them
from the context, and a third performs the actual
event wait.
• #include <sys/epoll.h>
• int epoll_create1 (int flags);
• int epoll_create (int size);
Event Poll
• A successful call to epoll_create1() instantiates a new
epoll instance and returns a file descriptor associated
with the instance
• On error, the call returns −1 and sets errno to one of
the following:
• EINVAL: Invalid flags parameter.
• EMFILE: The user has reached their limit on the
total number of open files.
• ENFILE: The system has reached its limit on the
total number of open files.
• ENOMEM: Insufficient memory was available to
complete the operation.
Controlling Epoll
• The epoll_ctl() system call can be used to add file
descriptors to and remove file descriptors from a given
epoll context
• #include <sys/epoll.h>
int epoll_ctl (int epfd, int op, int fd, struct
epoll_event *event);
• A successful call to epoll_ctl() controls the epoll instance
associated with the file descriptor epfd.
• The parameter op specifies the operation to be taken
against the file associated with fd.
Controlling Epoll
• Valid values for the op parameter:
• EPOLL_CTL_ADD: Add a monitor on the file associated
with the file descriptor fd to the epoll instance
associated with epfd, per the events defined in event.
• EPOLL_CTL_DEL: Remove a monitor on the file
associated with the file descriptor fd from the epoll
instance associated with epfd.
• EPOLL_CTL_MOD: Modify an existing monitor of fd with
the updated events specified by event.
Controlling Epoll
• The events field in the epoll_event structure specifies which
events to monitor on the given file descriptor:
• EPOLLERR: An error condition occurred on the file. This
event is always monitored, even if it’s not specified.
• EPOLLET: Enables edge-triggered behavior for the monitor of
the file. The default behavior is level-triggered.
• EPOLLHUP: A hangup occurred on the file. This event is
always monitored, even if it’s not specified.
• EPOLLIN: The file is available to be read from without
blocking.
• EPOLLOUT: The file is available to be written to without
blocking.
• EPOLLPRI: There is urgent out-of-band data available to
read.
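• The following minimal sketch (the helper name, the monitored
descriptor, and the event-array size of 64 are assumptions) ties
the three steps together, using epoll_wait() to perform the
actual event wait:

#include <stdio.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

int wait_on (int fd)
{
    struct epoll_event event, events[MAX_EVENTS];
    int epfd, nr, i;

    epfd = epoll_create1 (0);                 /* step 1: create the context */
    if (epfd == -1) {
        perror ("epoll_create1");
        return -1;
    }

    event.data.fd = fd;
    event.events = EPOLLIN;                   /* watch for readability */
    if (epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &event) == -1) {   /* step 2 */
        perror ("epoll_ctl");
        return -1;
    }

    /* step 3: block until at least one watched file is ready */
    nr = epoll_wait (epfd, events, MAX_EVENTS, -1);
    for (i = 0; i < nr; i++)
        printf ("fd %d is ready\n", events[i].data.fd);
    return nr;
}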
Mapping Files into Memory
• As an alternative to standard file I/O, the kernel
provides an interface that allows an application to map
a file into memory, meaning that there is a one-to-one
correspondence between a memory address and a word
in the file.
• mmap():
• A call to mmap() asks the kernel to map len bytes of the
object represented by the file descriptor fd, starting at
offset bytes into the file, into memory.
• If addr is included, it indicates a preference to use that
starting address in memory
Mapping Files into Memory
• #include <sys/mman.h>
• void * mmap (void *addr, size_t len, int prot, int flags,
int fd, off_t offset);
• The addr parameter offers a suggestion to the kernel of
where best to map the file (most users pass 0).
• The prot parameter describes the desired memory
protection of the mapping.
• PROT_READ: The pages may be read.
• PROT_WRITE: The pages may be written.
• PROT_EXEC: The pages may be executed.
Mapping Files into Memory
• Return values and error codes
• On success, a call to mmap() returns the location of the mapping.
• On failure, the call returns MAP_FAILED and sets errno
appropriately.
• Possible errno values include:
• EACCES: The given file descriptor is not a regular file, or the mode
with which it was opened conflicts with prot or flags.
• EAGAIN: The file has been locked via a file lock.
• EBADF: The given file descriptor is not valid.
• EINVAL: One or more of the parameters addr, len, or offset
are invalid.
• ENFILE: The system-wide limit on open files has been reached.
• ENODEV: The filesystem on which the file to map resides does not
support memory mapping.
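• As a minimal sketch of memory-mapped I/O (assumptions: the file
is a nonempty regular file that fits in one mapping, and
MAP_PRIVATE is the standard flag requesting a private mapping),
the following counts newlines in a file through plain pointer
access:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

long count_newlines (const char *path)
{
    struct stat sb;
    char *p;
    off_t i;
    long n = 0;
    int fd;

    fd = open (path, O_RDONLY);
    if (fd == -1) { perror ("open"); return -1; }
    if (fstat (fd, &sb) == -1) { perror ("fstat"); return -1; }

    /* map the whole file; file I/O now becomes memory access */
    p = mmap (NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror ("mmap"); return -1; }
    close (fd);    /* the mapping remains valid after close() */

    for (i = 0; i < sb.st_size; i++)
        if (p[i] == '\n')
            n++;

    munmap (p, sb.st_size);
    return n;
}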
Advice for Normal File I/O
• Linux provides two interfaces for giving the kernel file I/O
advice: posix_fadvise() and readahead().
• The posix_fadvise() System Call:
• The first advice interface, as its name alludes, is
standardized by POSIX 1003.1-2003:
• #include <fcntl.h>
• int posix_fadvise (int fd, off_t offset, off_t len, int
advice);
• A call to posix_fadvise() provides the kernel with the
hint advice on the file descriptor fd in the interval
[offset,offset+len).
Advice for Normal File I/O
• One of the following should be provided for advice:
• POSIX_FADV_NORMAL: The application has no specific advice to
give on this range of the file. It should be treated as normal.
• POSIX_FADV_RANDOM: The application intends to access the
data in the specified range in a random (nonsequential)
order.
• POSIX_FADV_SEQUENTIAL: The application intends to access the
data in the specified range sequentially, from lower to higher
addresses.
• POSIX_FADV_WILLNEED: The application intends to access the
data in the specified range in the near future.
• POSIX_FADV_NOREUSE: The application intends to access the
data in the specified range in the near future, but only once.
• POSIX_FADV_DONTNEED: The application does not intend to
access the pages in the specified range in the near future.
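• For example, a small sketch (the helper name is hypothetical)
advising the kernel of an impending sequential read of an entire
file; a len of 0 extends the advice to the end of the file:

#include <fcntl.h>
#include <stdio.h>

int advise_sequential (int fd)
{
    /* offset 0 with len 0 covers the whole file; the kernel may
       respond by reading ahead aggressively */
    int ret = posix_fadvise (fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (ret)
        fprintf (stderr, "posix_fadvise: error %d\n", ret);
    return ret;
}

• Note that posix_fadvise() returns the error code directly
rather than setting errno.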
Synchronized, Synchronous, and
Asynchronous Operations
• A synchronized operation is more restrictive and safer
than a merely synchronous operation.
• The terms synchronous and asynchronous refer to
whether I/O operations wait for some event (e.g.,
storage of the data) before returning.
• The terms synchronized and nonsynchronized,
meanwhile, specify exactly what event must occur (e.g.,
writing the data to disk).
• Normally, Unix write operations are synchronous and
nonsynchronized; read operations are synchronous and
synchronized.
Synchronized, Synchronous, and
Asynchronous Operations
• Asynchronous I/O:
• Performing asynchronous I/O requires kernel support at the very lowest
layers
• POSIX 1003.1-2003 defines the aio interfaces, which Linux fortunately
implements
#include <aio.h>

/* asynchronous I/O control block */
struct aiocb {
    int aio_fildes;                 /* file descriptor */
    int aio_lio_opcode;             /* operation to perform */
    int aio_reqprio;                /* request priority offset */
    volatile void *aio_buf;         /* pointer to buffer */
    size_t aio_nbytes;              /* length of operation */
    struct sigevent aio_sigevent;   /* signal number and value */
    /* internal, private members follow... */
};
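• A minimal sketch of submitting one asynchronous read and
polling for its completion (the helper is hypothetical; note
that the POSIX aiocb also has a public aio_offset field not
shown above):

#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

ssize_t async_read (int fd, char *buf, size_t len)
{
    struct aiocb cb;

    memset (&cb, 0, sizeof (cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;               /* read from the start of the file */

    if (aio_read (&cb) == -1) {      /* returns without waiting for I/O */
        perror ("aio_read");
        return -1;
    }

    while (aio_error (&cb) == EINPROGRESS)
        ;                            /* real code would do useful work here */

    return aio_return (&cb);         /* bytes read, or -1 on error */
}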
I/O Schedulers and I/O Performance
• In a modern system, the relative performance gap
between disks and the rest of the system is quite large
• The worst component of disk performance is the
process of moving the read/write head from one part of
the disk to another, an operation known as a seek.
• A single disk seek can average over 8 milliseconds: still a
small number, to be sure, but 25 million times longer
than a single processor cycle.
I/O Schedulers and I/O Performance
• It is inefficient to send I/O requests to the disk in the order
in which they are issued.
• Therefore, modern operating system kernels implement
I/O schedulers, which work to minimize the number and
size of disk seeks by manipulating the order in which I/O
requests are serviced and the times at which they are
serviced.
• Disk Addressing:
• Hard disks address their data using the familiar
geometry-based addressing of cylinders, heads, and
sectors, or CHS addressing.
I/O Schedulers and I/O Performance
• Cylinder-head-sector (CHS) is an early method for giving
addresses to each physical block of data on a hard disk
drive.
I/O Schedulers and I/O Performance
• To locate a specific unit of data on a disk, the drive’s
logic requires three pieces of information:
• the cylinder, head, and sector values
• The hard disk knows what platter, what track, and what
sector to look in for the data.
• It can position the read/write head of the correct
platter over the correct track and read from or write to
the requisite sector.
I/O Schedulers and I/O Performance
• Modern hard disks do not force computers to
communicate with their disks in terms of cylinders,
heads, and sectors.
• Instead, contemporary hard drives map a unique block
number (also called a physical block or device block)
over each cylinder/head/sector triplet; effectively, a
block maps to a specific sector.
• Modern operating systems can then address hard drives
using these block numbers, a process known as logical
block addressing (LBA), and the hard drive internally
translates the block number into the correct CHS
address.
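• The translation itself is simple arithmetic. Assuming a
geometry of hpc heads per cylinder and spt sectors per track,
the conventional mapping is LBA = (C × hpc + H) × spt + (S − 1):

/* classic CHS-to-LBA translation; hpc and spt describe the drive geometry */
unsigned long chs_to_lba (unsigned long c, unsigned long h, unsigned long s,
                          unsigned long hpc, unsigned long spt)
{
    /* sectors are numbered from 1, hence the s - 1 */
    return (c * hpc + h) * spt + (s - 1);
}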
I/O Schedulers and I/O Performance
• The Life of an I/O Scheduler:
• I/O schedulers perform two basic operations: merging
and sorting.
• Merging is the process of taking two or more adjacent
I/O requests and combining them into a single request.
• Consider two requests, one to read from disk block 5,
and another to read from disk blocks 6 through 7.
• These requests can be merged into a single request to
read from disk blocks 5 through 7.
• The total amount of I/O might be the same, but the
number of I/O operations is reduced by half.
I/O Schedulers and I/O Performance
• Sorting: The process of arranging pending I/O requests
in ascending block order.
• Given I/O operations to blocks 52, 109, and 7
• The I/O scheduler would sort these requests into the
ordering 7, 52, and 109.
• If a request was then issued to block 81, it would be
inserted between the requests to blocks 52 and 109.
• The I/O scheduler would then dispatch the requests to
the disk in the order that they exist in the queue: 7,
then 52, then 81, and finally 109.
• In this manner, the disk head’s movements are
minimized.
I/O Schedulers and I/O Performance
• Helping Out Reads:
• Each read request must return up-to-date data.
• If the requested data is not in the page cache, the reading
process must block until the data can be read from disk, a
potentially lengthy operation.
• We call this performance impact read latency.
• Consider reading every file in a directory.
• The application opens the first file, reads a chunk of it, waits
for data, reads another chunk, and so on, until the entire file
is read.
• Then the application starts again, on the next file.
• The requests become serialized: a subsequent request
cannot be issued until the current request completes.
I/O Schedulers and I/O Performance
• The Deadline I/O Scheduler:
• The main goal of the Deadline scheduler is to guarantee
a start service time for a request.
• It does so by imposing a deadline on all I/O operations
to prevent starvation of requests.
• It also maintains two deadline queues, in addition to the
sorted queues (both read and write).
• Before serving the next request, the deadline scheduler
decides which queue to use.
• Read queues are given a higher priority, because
processes usually block on read operations.
I/O Schedulers and I/O Performance
• The Anticipatory I/O Scheduler:
• Consider a system undergoing heavy write activity.
• Every time a read request is submitted, the I/O
scheduler rushes to handle it: the disk seeks over to the
read's data, performs the read, and then seeks back to
resume the writes.
• The preference toward read requests is a good thing,
but the resulting pair of seeks is detrimental to global
disk throughput.
• The Anticipatory I/O scheduler aims to continue to
provide excellent read latency while also providing
excellent global throughput.
I/O Schedulers and I/O Performance
• The Anticipatory I/O Scheduler:
• The Anticipatory I/O scheduler starts with the Deadline I/O
scheduler as its base.
• When a read request is issued, it is handled as usual, within
its usual expiration period.
• After the request is submitted, however, the Anticipatory I/O
scheduler does not immediately seek back and return to
handling other requests.
• Instead, it does absolutely nothing for a few
milliseconds (six milliseconds by default).
• In those few milliseconds, any requests issued to an
adjacent area of the disk are immediately handled.
• After the waiting period elapses, the Anticipatory I/O
scheduler seeks back to where it left off and continues
handling the previous requests.
I/O Schedulers and I/O Performance
• The CFQ I/O Scheduler:
• The Complete Fair Queuing (CFQ) I/O scheduler was
designed for specialized workloads but, in practice,
provides good performance across multiple workloads.
• Each process is assigned its own queue, and each queue
is assigned a time slice.
• The I/O scheduler visits each queue in a round-robin
fashion, servicing requests from the queue until the
queue’s time slice is exhausted, or until no more
requests remain.
• The CFQ I/O Scheduler will then sit idle for a brief
period (by default, 10 ms), waiting for a new request on
the queue.
I/O Schedulers and I/O Performance
• The Noop I/O Scheduler:
• The Noop I/O Scheduler is the most basic of the
available schedulers.
• It performs no sorting whatsoever, only basic merging.
• It is used for specialized devices that do not require (or
that perform) their own request sorting.
I/O Schedulers and I/O Performance
• Selecting and Configuring Your I/O Scheduler:
• The default I/O scheduler is selectable at boot time via
the elevator kernel command-line parameter.
• Valid options are as (anticipatory), cfq, deadline, and noop.
• The I/O scheduler is also runtime-selectable on a
per-device basis via /sys/block/[device]/queue/scheduler
• Ex: to set the device hda to the CFQ I/O Scheduler:
• # echo cfq > /sys/block/hda/queue/scheduler
• Changing any of these settings requires root privileges.
I/O Schedulers and I/O Performance
• Optimizing I/O Performance:
• Scheduling I/O in user space:
• The I/O scheduler does its job, sorting and merging the
requests before sending them out to the disk.
• If an application is generating many requests, particularly
requests for data all over the disk, it can benefit from
sorting the requests before submitting them, ensuring
they reach the I/O scheduler in the desired order.
• User-space applications can sort based on:
• The full path
• The inode number
• The physical disk block of the file
I/O Schedulers and I/O Performance
• Sorting by path:
• Sorting by the path name is the easiest, yet least
effective, method.
• Due to the layout algorithms used by most file systems,
the files in each directory, and thus the directories
sharing a parent directory, tend to be adjacent on disk.
• It is certainly true that two files in the same directory
have a better chance of being located near each other
than two files in radically different parts of the file
system.
I/O Schedulers and I/O Performance
• Sorting by inode:
• Inodes are Unix constructs that contain the metadata
associated with individual files.
• Each file has exactly one inode, which contains
information such as the file's size, permissions, owner,
and so on.
• Sorting by inode is better than sorting by path, because
file i's inode number < file j's inode number
• implies, in general, that:
physical blocks of file i < physical blocks of file j
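• A sketch of the approach (the request structure is
hypothetical): stat() each file to fetch its inode number, then
qsort() the pending requests into ascending inode order before
issuing the I/O:

#include <stdlib.h>
#include <sys/stat.h>

struct file_req {
    const char *path;
    ino_t inode;
};

static int by_inode (const void *a, const void *b)
{
    ino_t ia = ((const struct file_req *) a)->inode;
    ino_t ib = ((const struct file_req *) b)->inode;
    return (ia > ib) - (ia < ib);
}

void sort_requests (struct file_req *reqs, size_t n)
{
    struct stat sb;
    size_t i;

    for (i = 0; i < n; i++)
        if (stat (reqs[i].path, &sb) == 0)
            reqs[i].inode = sb.st_ino;

    /* issue I/O in this order so requests reach the scheduler presorted */
    qsort (reqs, n, sizeof (struct file_req), by_inode);
}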
I/O Schedulers and I/O Performance
• Sorting by physical block:
• Each file is broken up into logical blocks, which are the
smallest allocation units of a file system.
• The size of a logical block is file system dependent.
• Each logical block maps to a single physical block.
• We can thus find the number of logical blocks in a file,
determine what physical blocks they map to, and sort
based on that.
• The kernel provides a method for obtaining the physical
disk block from the logical block number of a file.
• This is done via the ioctl() system call with the FIBMAP request:
I/O Schedulers and I/O Performance
ret = ioctl (fd, FIBMAP, &block);
if (ret < 0)
perror ("ioctl");
• fd is the file descriptor of the file in question
• block is the logical block whose physical block we want
to determine.
• On successful return, block is replaced with the physical
block number.
• The logical blocks passed in are zero-indexed and
file-relative.
• If a file is made up of eight logical blocks, valid values
are 0 through 7.
I/O Schedulers and I/O Performance
• Finding the logical-to-physical-block mapping is thus a
two-step process.
• First, we must determine the number of blocks in a
given file.
• This is done via the stat() system call.
• Second, for each logical block, we must issue an ioctl()
request to find the corresponding physical block.
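• Putting the two steps together, a sketch along these lines
(assumptions: FIBMAP requires root privileges; fstat() on the
already open descriptor stands in for stat(); and st_blocks
counts 512-byte units, which may differ from the file system's
logical block size):

#include <linux/fs.h>     /* FIBMAP */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>

/* step 1: number of blocks in the file, via fstat() */
static int get_nr_blocks (int fd)
{
    struct stat buf;

    if (fstat (fd, &buf) == -1) {
        perror ("fstat");
        return -1;
    }
    return buf.st_blocks;
}

/* step 2: physical block backing a given logical block, via FIBMAP */
static int get_block (int fd, int logical_block)
{
    if (ioctl (fd, FIBMAP, &logical_block) == -1) {
        perror ("ioctl");
        return -1;
    }
    return logical_block;   /* ioctl() replaced it with the physical block */
}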
