Inter-Process Communication: Pipes and FIFOs (named pipes), Semaphores, Messages, Shared Memory Regions, Sockets
Pipes One-way flow of data, created with the | operator, e.g.: $ ls | more instead of: $ ls > temp followed by $ more < temp. The shell statement is much shorter and simpler, and there is no need to create temporary regular files, which must be deleted later.
System calls for using a pipe: pipe(), read(), write(). POSIX: half-duplex pipes. System V Release 4: full-duplex pipes. Linux: each of the two file descriptors is one-way, but it is not necessary to close one of them before using the other.
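A minimal user-space sketch of these three calls, with a single process writing into the pipe and reading its own data back (the helper name pipe_roundtrip is ours, and error handling is abbreviated):

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Returns the number of bytes read back through the pipe. */
ssize_t pipe_roundtrip(char *buf, size_t len)
{
    int fd[2];                      /* fd[0]: read channel, fd[1]: write channel */
    if (pipe(fd) == -1)
        return -1;

    const char msg[] = "hello";
    write(fd[1], msg, sizeof(msg)); /* small write: fits in the pipe buffer */
    close(fd[1]);                   /* closing the write end signals EOF */

    ssize_t n = read(fd[0], buf, len);
    close(fd[0]);
    return n;
}
```

Closing the write channel before reading is what lets read() return instead of blocking forever once the buffered data is consumed.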
‘ls | more’ example When the command shell interprets ‘ls | more’, it: 1. Invokes the pipe() system call; let us assume that pipe() returns the file descriptors 3 (the pipe's read channel) and 4 (the write channel). 2. Invokes the fork() system call twice. 3. Invokes the close() system call twice to release file descriptors 3 and 4.
‘ls | more’ example The first child process, which must execute the ls program, performs the following operations: 1. Invokes dup2(4,1) to copy file descriptor 4 to file descriptor 1. From now on, file descriptor 1 refers to the pipe's write channel. 2. Invokes the close() system call twice to release file descriptors 3 and 4. 3. Invokes the execve() system call to execute the /bin/ls program. By default, such a program writes its output to the file having file descriptor 1 (the standard output), that is, it writes into the pipe.
‘ls | more’ example The second child process must execute the more program; therefore, it performs the following operations: 1. Invokes dup2(3,0) to copy file descriptor 3 to file descriptor 0. From now on, file descriptor 0 refers to the pipe's read channel. 2. Invokes the close() system call twice to release file descriptors 3 and 4. 3. Invokes the execve() system call to execute /bin/more. By default, that program reads its input from the file having file descriptor 0 (the standard input); that is, it reads from the pipe.
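The whole shell sequence above can be sketched in user space. To make the result checkable, ls | more is replaced by echo hello | wc -c, an extra pipe captures wc's output for the parent, and execlp() stands in for execve() (it searches PATH rather than taking a full pathname); the fork/dup2/close steps otherwise follow the slides:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "echo hello | wc -c" and returns the count wc printed. */
int shell_pipeline(void)
{
    int pfd[2];   /* the pipe between the two commands */
    int out[2];   /* extra pipe so the parent can capture wc's output */
    pipe(pfd);
    pipe(out);

    if (fork() == 0) {            /* first child: echo (plays the role of ls) */
        dup2(pfd[1], 1);          /* stdout -> pipe's write channel */
        close(pfd[0]); close(pfd[1]);
        close(out[0]); close(out[1]);
        execlp("echo", "echo", "hello", (char *)NULL);
        _exit(127);
    }
    if (fork() == 0) {            /* second child: wc (plays the role of more) */
        dup2(pfd[0], 0);          /* stdin <- pipe's read channel */
        dup2(out[1], 1);          /* stdout -> capture pipe */
        close(pfd[0]); close(pfd[1]);
        close(out[0]); close(out[1]);
        execlp("wc", "wc", "-c", (char *)NULL);
        _exit(127);
    }

    /* parent: release both pipe ends, as in step 3 of the shell */
    close(pfd[0]); close(pfd[1]);
    close(out[1]);

    char buf[32] = {0};
    read(out[0], buf, sizeof(buf) - 1);
    close(out[0]);
    wait(NULL); wait(NULL);
    return atoi(buf);             /* "hello\n" is 6 bytes */
}
```

Note how every process closes the pipe descriptors it does not use; a forgotten write descriptor would keep wc from ever seeing end-of-file.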
Wrapper functions The popen() function receives two parameters: the pathname of an executable file (filename) and a type string specifying the direction of the data transfer. It returns the pointer to a FILE data structure. The popen() function essentially performs the following operations: 1. Creates a new pipe by making use of the pipe() system call 2. Forks a new process, which in turn executes the following operations: a. If type is r, duplicates the file descriptor associated with the pipe's write channel as file descriptor 1 (standard output); otherwise, if type is w, duplicates the file descriptor associated with the pipe's read channel as file descriptor 0 (standard input) b. Closes the file descriptors returned by pipe() c. Invokes the execve() system call to execute the program specified by filename 3. If type is r, closes the file descriptor associated with the pipe's write channel; otherwise, if type is w, closes the file descriptor associated with the pipe's read channel 4. Returns the address of the FILE file pointer that refers to whichever file descriptor for the pipe is still open
Wrapper functions The  pclose( )  function, which receives the file pointer returned by  popen( )  as its   parameter, simply invokes the  wait4( )  system call and waits for the termination of the   process created by  popen( ) .
Pipe Data Structures For each pipe, the kernel creates: an inode object, whose u field contains the pipe_inode_info structure (the i_size field stores the current pipe size; the i_atomic_write semaphore serializes write operations); two file objects, one for reading and one for writing; and a pipe buffer.
Creating a pipe The pipe() system call is serviced by the sys_pipe() function, which in turn invokes the do_pipe() function. In order to create a new pipe, do_pipe() performs the following operations: 1. Allocates a file object and a file descriptor for the read channel of the pipe, sets the flag field of the file object to O_RDONLY, and initializes the f_op field with the address of the read_pipe_fops table. 2. Allocates a file object and a file descriptor for the write channel of the pipe, sets the flag field of the file object to O_WRONLY, and initializes the f_op field with the address of the write_pipe_fops table. 3. Invokes the get_pipe_inode() function, which allocates and initializes an inode object for the pipe. This function also allocates a page frame for the pipe buffer and stores its address in the base field of the pipe_inode_info structure. 4. Allocates a dentry object and uses it to link together the two file objects and the inode object. 5. Returns the two file descriptors to the User Mode process.
Destroying a pipe Whenever a process invokes the close() system call on a file descriptor associated with a pipe, the kernel executes the fput() function on the corresponding file object, which decrements the usage counter. If the counter becomes 0, the function invokes the release method of the file operations. The pipe_read_release() and pipe_write_release() functions set the readers and writers fields, respectively, of the pipe_inode_info structure to 0. Each function then invokes the pipe_release() function, which wakes up any processes sleeping in the pipe's wait queue so that they can recognize the change in the pipe state. Moreover, the function checks whether both the readers and writers fields are equal to 0; in this case, it releases the page frame containing the pipe buffer.
Reading from a pipe A process wishing to get data from a pipe issues a read() system call, specifying as its file descriptor the descriptor associated with the pipe's read channel. The kernel ends up invoking the read method found in the file operation table associated with the proper file object. In the case of a pipe, the entry for the read method in the read_pipe_fops table points to the pipe_read() function, which performs the following operations: 1. Determines whether the pipe size, which is stored in the inode's i_size field, is 0. In this case, determines whether the function must return or the process must be blocked while waiting until another process writes some data into the pipe. The type of I/O operation (blocking or nonblocking) is specified by the O_NONBLOCK flag in the f_flags field of the file object. If necessary, invokes the interruptible_sleep_on() function to suspend the current process after having inserted it in the wait queue to which the wait field of the pipe_inode_info data structure points.
Reading from a pipe 2. Checks the  lock  field of the  pipe_inode_info  data structure. If it is not null, another   process is currently accessing the pipe; in this case, either suspends the current process   or immediately terminates the system call, depending on the type of read operation   (blocking or   nonblocking). 3. Increments the  lock  field. 4. Copies the requested number of bytes (or the number of available bytes, if the buffer   size is too small) from the pipe's buffer to the user address space. 5. Decrements the  lock  field. 6. Invokes  wake_up_interruptible( )  to wake up all processes sleeping on the pipe's   wait queue. 7. Returns the number of bytes copied into the user address space.
Writing into a pipe A process wishing to put data into a pipe issues a write() system call, specifying as its file descriptor the descriptor associated with the pipe's write channel. The kernel satisfies this request by invoking the write method of the proper file object; the corresponding entry in the write_pipe_fops table points to the pipe_write() function, which performs the following operations: 1. Checks whether the pipe has at least one reading process. If not, sends a SIGPIPE signal to the current process and returns the -EPIPE error code. 2. Releases the i_sem semaphore of the pipe's inode, which was acquired by the sys_write() function, and acquires the i_atomic_write semaphore of the same inode. The i_sem semaphore prevents multiple processes from starting write operations on a file, and thus on the pipe.
Writing into a pipe 3. Checks whether the number of bytes to be written is within the pipe's buffer size: a.  If so, the write operation must be atomic. Therefore, checks whether the buffer   has enough free space to store all bytes to be written. b. If the number of bytes is greater than the buffer size, the operation can start as   long as there is any free space at all. Therefore, checks for at least 1 free byte. 4. If the buffer does not have enough free space and the write operation is blocking,   inserts the current process into the pipe's wait queue and suspends it until some data is   read from the pipe. Notice that the  i_atomic_write  semaphore is not released, so no   other process can start a write operation on the buffer. If the write operation is   nonblocking, returns the  -EAGAIN  error code.
Writing into a pipe 5. Checks the  lock  field of the  pipe_inode_info  data structure. If it is not null, another   process is currently reading the pipe, so either suspends the current process or   immediately terminates the write depending on whether the write operation is blocking   or nonblocking. 6. Increments the  lock  field. 7. Copies the requested number of bytes (or the number of free bytes if the pipe size is   too small) from the user address space to the pipe's buffer. 8. If there are bytes yet to be written, goes to step 4. 9. After all requested data is written, decrements the  lock  field. 10. Invokes  wake_up_interruptible( )  to wake up all processes sleeping on the pipe's   wait queue. 11. Releases the  i_atomic_write  semaphore and acquires the  i_sem  semaphore (so that   sys_write( )  can safely release the latter). 12. Returns the number of bytes written into the pipe's buffer.
FIFOs A process creates a FIFO by issuing a mknod() system call, passing to it as parameters the pathname of the new FIFO and the value S_IFIFO (0x1000) logically ORed with the permission bit mask of the new file. POSIX introduces a system call named mkfifo() specifically to create a FIFO. This call is implemented in Linux, as in System V Release 4, as a C library function that invokes mknod(). Once created, a FIFO can be accessed through the usual open(), read(), write(), and close() system calls.
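A user-space sketch of the FIFO lifecycle described above: mkfifo() creates the node, then two processes move data through it with the ordinary file system calls. The path /tmp/demo_fifo is an arbitrary choice for this example:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes through the FIFO, parent reads; returns bytes read. */
ssize_t fifo_roundtrip(char *buf, size_t len)
{
    const char *path = "/tmp/demo_fifo";
    unlink(path);                       /* remove any stale node first */
    if (mkfifo(path, 0600) == -1)
        return -1;

    if (fork() == 0) {                  /* child: the writer */
        int fd = open(path, O_WRONLY);  /* blocks until a reader opens */
        write(fd, "ping", 5);           /* includes the terminating NUL */
        close(fd);
        _exit(0);
    }

    int fd = open(path, O_RDONLY);      /* blocks until the writer opens */
    ssize_t n = read(fd, buf, len);
    close(fd);
    wait(NULL);
    unlink(path);                       /* the node persists until removed */
    return n;
}
```

Unlike an anonymous pipe, the FIFO node stays in the filesystem after both ends close, which is why the sketch removes it explicitly.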
fifo_open( ) 1. The fifo_open() function examines the values of the readers and writers fields in the pipe_inode_info data structure.
fifo_open( ) 2. The function further determines specialized behavior for the set of file operations to be used by setting the f_op field of the file object to the address of some predefined tables shown in Table 18-5. 3. Finally, the function checks whether the base field of the pipe_inode_info data structure is NULL; in this case, it gets a free page frame for the FIFO's kernel buffer and stores its address in base.
System V IPC System V IPC denotes a set of system calls that allows a User Mode process to: synchronize itself with other processes by means of semaphores; send messages to other processes or receive messages from them; share a memory area with other processes. IPC data structures are created dynamically when a process requests an IPC resource (a semaphore, a message queue, or a shared memory segment). Each IPC resource is persistent: unless explicitly released by a process, it is kept in memory. An IPC resource may be used by any process, including those that do not share the ancestor that created the resource. Each new resource is identified by a 32-bit IPC key, which is similar to a file pathname in the system's directory tree. Each IPC resource also has a 32-bit IPC identifier, which is somewhat similar to the file descriptor associated with an open file. When two or more processes wish to communicate through an IPC resource, they all refer to the IPC identifier of the resource.
Using an IPC resource By invoking the semget(), msgget(), or shmget() functions, an IPC identifier is derived from the corresponding IPC key; the identifier is then used to access the resource. If there is no IPC resource already associated with the IPC key, a new resource is created. If everything goes right, the function returns a positive IPC identifier; otherwise, it returns one of the error codes illustrated in Table 18-6.
Using an IPC resource The ftok() function attempts to create a new key from a file pathname and an 8-bit project identifier passed as parameters. It does not guarantee, however, a unique key number. Alternatively, one process issues a semget(), msgget(), or shmget() function specifying IPC_PRIVATE as its IPC key; a new IPC resource is thus allocated, and the process can either communicate its IPC identifier to the other processes in the application or fork the other processes itself. The IPC_CREAT flag specifies that the IPC resource must be created, if it does not already exist. The IPC_EXCL flag specifies that the function must fail if the resource already exists and the IPC_CREAT flag is set.
Using an IPC resource Each IPC identifier is computed by combining a slot usage sequence number s relative to the resource type, an arbitrary slot index i for the allocated resource, and the value chosen in the kernel for the maximum number of allocatable resources M, where 0 ≤ i < M: IPC identifier = s × M + i. At every allocation, a free slot index i is chosen and the slot usage sequence number s, which is initialized to 0, is incremented; this ensures that an identifier is not immediately reused when a slot is recycled after a deallocation.
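The identifier arithmetic above can be illustrated directly; M = 128 here is just an example value, and the helper names are ours:

```c
#include <assert.h>

enum { M = 128 };   /* maximum allocatable resources (illustrative) */

/* Pack sequence number s and slot index i into an IPC identifier. */
int ipc_id(int s, int i) { return s * M + i; }

/* Recover the slot index and sequence number by remainder and division. */
int ipc_slot(int id)     { return id % M; }
int ipc_seq(int id)      { return id / M; }
```

Because s grows over time, two resources that happen to reuse the same slot i still get distinct identifiers.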
Using an IPC resource The semctl(), msgctl(), and shmctl() functions may be used to handle IPC resources. The IPC_SET subcommand allows a process to change the owner's user and group identifiers and the permission bit mask in the ipc_perm data structure. The IPC_STAT and IPC_INFO subcommands retrieve some information concerning a resource. The IPC_RMID subcommand releases an IPC resource. A process may acquire or release an IPC semaphore by issuing the semop() function. When a process wants to send or receive an IPC message, it uses the msgsnd() and msgrcv() functions, respectively. A process attaches and detaches a shared memory segment in its address space by means of the shmat() and shmdt() functions, respectively.
IPC semaphores IPC semaphores  are   counters used to provide controlled access to shared data structures for multiple processes. The semaphore value is positive if the protected resource is available, and negative or  0  if the   protected resource is currently not available.  A process that wants to access the resource   decrements by 1 the semaphore value.  It is allowed to use the resource only if the old value   was positive; otherwise, the process waits until the semaphore becomes positive. When a   process relinquishes a protected resource, it increments its semaphore value by 1; in doing so,   any other process waiting for the semaphore is woken up.
IPC semaphores Each IPC semaphore is a set of one or more semaphore values, called primitive semaphores, not just a single value as for kernel semaphores. The number of primitive semaphores in each IPC semaphore must be specified as a parameter of the semget() function when the resource is being allocated, but it cannot be greater than SEMMSL (usually 32). When a process chooses to use this mechanism, the resulting operations are called undoable semaphore operations. When the process dies, all of its IPC semaphores can revert to the values they would have had if the process had never started its operations.
IPC semaphores To access one or more resources protected by an IPC semaphore, a process: 1. Invokes the semget() wrapper function to get the IPC semaphore identifier, specifying as the parameter the IPC key of the IPC semaphore that protects the shared resources. If the process wants to create a new IPC semaphore, it also specifies the IPC_CREAT flag (or uses the IPC_PRIVATE key) and the number of primitive semaphores required. 2. Invokes the semop() wrapper function to test and decrement all primitive semaphore values involved. If all the tests succeed, the decrements are performed, the function terminates, and the process is allowed to access the protected resources. If some semaphores are in use, the process is usually suspended until some other process releases the resources. The function receives as parameters the IPC semaphore identifier, an array of numbers specifying the operations to be atomically performed on the primitive semaphores, and the number of such operations. Optionally, the process may specify the SEM_UNDO flag, which instructs the kernel to reverse the operations should the process exit without releasing the primitive semaphores. 3. When relinquishing the protected resources, invokes the semop() function again to atomically increment all primitive semaphores involved. 4. Optionally, invokes the semctl() wrapper function, specifying in its parameter the IPC_RMID flag to remove the IPC semaphore from the system.
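The four steps above can be sketched with a single primitive semaphore. Passing a plain int as the fourth semctl() argument is a Linux/glibc convenience (POSIX formally requires a caller-defined union semun), and the function name sem_demo is ours:

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Returns the semaphore's value after acquire+release, or -1 on error. */
int sem_demo(void)
{
    /* Step 1: create a one-semaphore set (IPC_PRIVATE: no key collision). */
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    semctl(id, 0, SETVAL, 1);                   /* resource available */

    /* Step 2: acquire; step 3: release. SEM_UNDO makes them undoable. */
    struct sembuf down = { 0, -1, SEM_UNDO };
    struct sembuf up   = { 0, +1, SEM_UNDO };
    semop(id, &down, 1);
    semop(id, &up, 1);

    int val = semctl(id, 0, GETVAL);            /* should be back to 1 */

    /* Step 4: remove the IPC semaphore from the system. */
    semctl(id, 0, IPC_RMID);
    return val;
}
```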
IPC semaphore data structures
IPC semaphore data structures semary array: statically allocated; contains SEMMNI entries (usually 128). Each element in the array can assume one of the following values: IPC_UNUSED (-1): no IPC resource refers to this slot. IPC_NOID (-2): the IPC resource is being allocated or destroyed. Otherwise, the address of a dynamically allocated memory area containing the IPC semaphore resource. The index of an array element is the slot index i. When a new IPC resource must be allocated, the kernel scans the array and uses the first element (slot) containing the value IPC_UNUSED.
IPC semaphore data structures The first locations of the memory area containing the IPC semaphore store a descriptor of type struct semid_ds. All other locations in the memory area store several sem data structures, each with two fields: semval (the value of the semaphore's counter) and sempid (the PID of the last process that accessed the semaphore).
Undoable semaphore operations Processes may require undoable operations by specifying the SEM_UNDO flag in the semop() function. A sem_undo data structure contains the IPC identifier of the semaphore and an array of integers representing the changes to the primitive semaphores' values caused by all undoable operations of the process. sem_undo structures are linked in two lists: a per-process list, used when a process terminates, and a per-semaphore list, used when a process invokes the semctl() function to force an explicit value into a primitive semaphore, and when an IPC semaphore is destroyed.
IPC semaphore data structures The queue of pending requests:
IPC Messages A message is composed of: a fixed-size  header,  which is a data structure of type  msg a variable-length  text
IPC message queue data structures
Sending and Receiving Messages msgsnd( )  function; parameters : The IPC identifier of the destination message queue The size of the message text The address of a User Mode buffer that contains the message type immediately followed by the message text msgrcv( )  function; parameters: The IPC identifier of the IPC message queue resource The pointer to a User Mode buffer to which the message type and message text should be copied The size of this buffer A value  t  that specifies what message should be retrieved
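A sketch of a send/receive round trip on a private queue, showing the header-plus-text layout of the user buffer; struct demo_msg and msg_demo are names invented for the example:

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

struct demo_msg {
    long mtype;        /* the message type: must be > 0 */
    char mtext[32];    /* the message text */
};

/* Sends one message of type 1, retrieves it by type; returns its length. */
ssize_t msg_demo(char *out, size_t len)
{
    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    struct demo_msg m = { .mtype = 1 };
    strcpy(m.mtext, "ping");
    msgsnd(id, &m, sizeof(m.mtext), 0);         /* size = text only */

    struct demo_msg r;
    ssize_t n = msgrcv(id, &r, sizeof(r.mtext), 1, 0);  /* t = 1 */

    strncpy(out, r.mtext, len);
    msgctl(id, IPC_RMID, NULL);                 /* remove the queue */
    return n;
}
```

Note that the size parameters count only the text; the long type at the start of the buffer is not included.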
IPC Shared Memory shmget() function, shmat() function, shmdt() function. The shmctl() function accesses the contents of the shmid_ds data structure named u inside the shmid_kernel structure, which is the IPC shared memory descriptor.
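A sketch of the full shmget()/shmat()/shmdt()/shmctl() cycle on a private segment (the function name shm_demo is ours):

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Writes a string into a fresh segment, reads it back; returns 1 on match. */
int shm_demo(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return 0;

    char *p = shmat(id, NULL, 0);     /* attach at a kernel-chosen address */
    if (p == (char *)-1)
        return 0;

    strcpy(p, "shared");              /* first touch: demand paging maps a frame */
    int ok = strcmp(p, "shared") == 0;

    shmdt(p);                         /* detach from this address space */
    shmctl(id, IPC_RMID, NULL);       /* segment is freed once fully detached */
    return ok;
}
```

In a real application a second process would shmat() the same identifier and see the same bytes; here one process plays both roles to keep the sketch checkable.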
IPC shared memory data structures
Demand paging for IPC shared memory segments A "Page Fault" occurs when a process tries to access a location of a shared memory segment whose page is not yet present. The exception handler invokes the do_no_page() function, which in turn invokes the nopage method; the page table entry is set to the address returned by it, and the usage counter of the page frame is increased.
Swapping out pages of IPC shared memory segments try_to_swap_out() function: the page frame is not released to the buddy system; its usage counter is decremented by 1 and the corresponding entry in the calling process's page table is cleared. The do_try_to_free_pages() function periodically invokes the shm_swap() function, which iteratively scans all descriptors referenced by the shm_segs array; for each descriptor, it examines all page frames assigned to the segment. If the usage counter of a page frame is found to be 1, that page is swapped out to disk and the swapped-out page identifier is saved in the shm_pages array.
LINUX SCHEDULING (version 2.6.x)
Linux is a multitasking kernel that allows more than one process to exist at any given time A kernel’s scheduler enforces a thread scheduling policy, including when, for how long, and in some cases where (on Symmetric Multiprocessing (SMP) systems) threads can execute. Normally the scheduler runs in its own thread, which is woken up by a timer interrupt. Otherwise it is invoked via a system call or another kernel thread that wishes to yield the CPU.  A thread will be allowed to execute for a certain amount of time , then a context switch to the scheduler thread will occur, followed by another context switch to a thread of the scheduler’s choice.
CPU and I/O-bound Threads CPU bound: threads spend a lot of time doing CPU computation, e.g., a video encoder. I/O bound: threads spend a lot of time waiting for relatively slow I/O operations, e.g., vi (a text editor).
Linux Kernel 2.6 Ingo Molnar designed this scheduler (the 2.6 O(1) scheduler) with the desktop and the server market in mind, and as a result desktop performance is much improved in Linux distributions based on 2.6.x kernels.
Linux Scheduling Goals Efficiency : Every algorithm is O(1) regardless of number of running processes. Provide good interactive performance. Provide fairness : No process should find itself starved of timeslice for any reasonable amount of time. Likewise, no process should receive an unfairly high amount of timeslice. Soft RT Scheduling : RT tasks are assigned special scheduling modes and the scheduler gives them priority over any other task on the system.
POLICY Determines what runs when! Responsible for optimally utilizing processor time and for the overall feel of the system.
Linux Policy CPU bound processes are scheduled less frequently but are given longer timeslice. I/O bound processes are scheduled for short periods and more frequently. Leading to improved process response time  at the cost of process throughput.
There are two key data structures in the Linux 2.6.8.1 scheduler that allow it to perform its duties in O(1) time, and its design revolves around them: runqueues and priority arrays.
RUNQUEUE The runqueue keeps track of all runnable tasks assigned to a particular CPU; one runqueue is created and maintained for each CPU in a system. Each runqueue contains two priority arrays: tasks on a CPU begin in one priority array, the active one, and as they run out of their timeslices they are moved to the expired priority array. During the move, a new timeslice is calculated. When there are no more runnable tasks in the active priority array, it is simply swapped with the expired priority array (simply updating two pointers). The runqueue also keeps track of a CPU's special thread information (idle thread, migration thread).
RUNQUEUE DATA STRUCTURE
struct runqueue {
    spinlock_t lock;                       /* spinlock protecting this runqueue */
    unsigned long nr_running;              /* number of runnable tasks */
    unsigned long nr_switches;             /* number of context switches */
    unsigned long expired_timestamp;       /* time of last array swap */
    unsigned long nr_uninterruptible;      /* tasks in uninterruptible sleep */
    struct task_struct *curr;              /* this processor's currently running task */
    struct task_struct *idle;              /* this processor's idle task */
    struct mm_struct *prev_mm;             /* mm_struct of the last running task */
    struct prio_array *active;             /* pointer to the active priority array */
    struct prio_array *expired;            /* pointer to the expired priority array */
    struct prio_array arrays[2];           /* the actual priority arrays */
    int prev_cpu_load[NR_CPUS];            /* load on each processor */
    struct task_struct *migration_thread;  /* migration thread for this processor */
    struct list_head migration_queue;      /* migration queue for this processor */
    atomic_t nr_iowait;                    /* number of tasks waiting on I/O */
};
PRIORITY ARRAY This data structure is the basis for most of the Linux 2.6.8.1 scheduler's advantageous behavior, in particular its O(1) (constant) time performance. The Linux 2.6.8.1 scheduler always schedules the highest priority task on a system, and if multiple tasks exist at the same priority level, they are scheduled round-robin with each other. Priority arrays make finding the highest priority task in a system a constant-time operation, and also make round-robin behavior within priority levels possible in constant time. Furthermore, using two priority arrays in unison (the active and expired priority arrays) makes transitions between timeslice epochs a constant-time operation. An epoch is the time between when all runnable tasks begin with a fresh timeslice and when all runnable tasks have used up their timeslices.
Priority Array Data Structure unsigned int nr_active: the number of active tasks in the priority array. unsigned long bitmap[BITMAP_SIZE]: the bitmap representing the priorities for which active tasks exist in the priority array, stored as five 32-bit words. For example, if there are three active tasks, two at priority 0 and one at priority 5, then bits 0 and 5 are set in this bitmap. Because the number of priorities is fixed, searching for the first set bit takes constant time.
struct list_head queue[MAX_PRIO]: an array of linked lists, with one list for each priority level. In Linux kernel 2.6.x, MAX_PRIO is 140.
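The constant-time lookup the bitmap enables can be sketched as follows; find_first_prio is an illustrative stand-in for the kernel's sched_find_first_bit(), and __builtin_ctz assumes a GCC/Clang compiler:

```c
#include <assert.h>

enum { MAX_PRIO = 140, WORDS = (MAX_PRIO + 31) / 32 };  /* 5 x 32-bit words */

/* Returns the lowest set bit index (highest priority), or MAX_PRIO if none. */
int find_first_prio(const unsigned int bitmap[WORDS])
{
    for (int w = 0; w < WORDS; w++)        /* at most 5 iterations: O(1) */
        if (bitmap[w])
            return w * 32 + __builtin_ctz(bitmap[w]);
    return MAX_PRIO;
}
```

Because WORDS is a compile-time constant, the loop bound does not grow with the number of tasks; the head of queue[find_first_prio(bitmap)] is then the next task to run.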
Priorities STATIC PRIORITY All tasks have a static priority, often called a nice value. On Linux, nice values range from -20 to 19, with higher values meaning lower priority. By default, tasks start with a static priority of 0, but that priority can be changed via the nice() system call. Apart from its initial value and modifications via the nice() system call, the scheduler never changes a task's static priority. Static priority is the mechanism through which users can modify a task's priority, and the scheduler respects the user's input. A task's static priority is stored in its static_prio variable.
Priorities DYNAMIC PRIORITY Scheduler rewards I/O-bound tasks and punishes CPU-bound tasks by adding or subtracting from a task’s static priority. This adjusted priority is dynamic priority and is stored in p->prio. Maximum Priority bonus & penalty is 5.
I/O-bound vs. CPU-bound Heuristics To determine how much penalty or bonus has to be given to a process scheduler keeps track of how long the process sleeps as opposed to running during its time slice. When a task is woken up from sleep, its total sleep time is added to its sleep_avg variable (limiting the max value to MAX_SLEEP_AVG) When a task gives up the CPU, voluntarily or involuntarily, the time the current task spent running is subtracted from its sleep_avg.
Calculating dynamic priority The sleep_avg value is used to determine the dynamic priority, as it is an index of how much the process is I/O bound or CPU bound. effective_prio() first checks whether the task is a real-time task; if so, no bonus is applied. Otherwise it maps the sleep_avg value to a bonus in the range -5 to +5, which is then subtracted from the static priority to get the dynamic priority.
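A simplified sketch of that mapping, with illustrative constants rather than the kernel's exact macros (MAX_SLEEP_AVG is assumed here to be 10 seconds in milliseconds, and both function names are ours):

```c
#include <assert.h>

enum { MAX_SLEEP_AVG = 10000, MAX_BONUS = 10 };

/* Maps sleep_avg in [0, MAX_SLEEP_AVG] linearly onto a bonus in [-5, +5]. */
int sleep_bonus(int sleep_avg)
{
    return sleep_avg * MAX_BONUS / MAX_SLEEP_AVG - MAX_BONUS / 2;
}

/* Dynamic priority: a lower value means a higher priority. */
int dynamic_prio(int static_prio, int sleep_avg)
{
    return static_prio - sleep_bonus(sleep_avg);
}
```

A task that never sleeps gets the full -5 bonus (i.e., a +5 penalty to its priority value), while a task that sleeps most of the time gets the full +5 reward.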
Calculating Timeslice Timeslice is calculated by simply scaling a task's static priority onto the possible timeslice range and making sure a certain minimum and maximum timeslice is enforced (task_timeslice()). The higher the task's static priority (the lower the task's static_prio value), the larger the timeslice it gets.
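A simplified linear scaling in the spirit of task_timeslice(); the timeslice bounds and the function name are illustrative, not the exact kernel values:

```c
#include <assert.h>

enum { MIN_TIMESLICE = 10, MAX_TIMESLICE = 200,   /* milliseconds (illustrative) */
       MAX_RT_PRIO = 100, MAX_PRIO = 140 };

/* Scales static_prio (100..139 for normal tasks) onto the timeslice range:
 * static_prio 100 (nice -20) gets the maximum, 139 (nice +19) the minimum. */
int task_timeslice_sketch(int static_prio)
{
    int span = (MAX_PRIO - 1) - MAX_RT_PRIO;   /* 39 priority steps */
    int off  = (MAX_PRIO - 1) - static_prio;   /* 0 .. 39 */
    return MIN_TIMESLICE + (MAX_TIMESLICE - MIN_TIMESLICE) * off / span;
}
```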
An interactive task's timeslice may be broken up into chunks, based on the TIMESLICE_GRANULARITY value in the scheduler. The function scheduler_tick() (called every 1 msec) checks to see if the currently running task has been taking the CPU from other tasks of the same dynamic priority for too long (TIMESLICE_GRANULARITY). If a task has been running for TIMESLICE_GRANULARITY and a task of the same dynamic priority exists, a round-robin switch to another task of the same dynamic priority is made.
Reinsertion of interactive task If a task is sufficiently interactive, when it exhausts its timeslice it will not be inserted into the expired array but instead reinserted back into the active array (so long as nothing is starving in the expired priority array, which is checked by EXPIRED_STARVING(), called from scheduler_tick()).
Fairness when forking new tasks When new tasks (threads or processes) are forked, the functions wake_up_forked_thread() and wake_up_forked_process() decrease the sleep_avg of both parents and children. This prevents highly interactive tasks from spawning other highly interactive tasks.
Interactivity Credits Every task maintains an interactive_credit variable. Tasks get an interactive credit when they sleep for a long time, and lose an interactive credit when they run for a long time. If a task has less than -100 interactivity credits, it is considered to have low interactivity credit. Low interactivity credit tasks waking from uninterruptible sleep are limited in their sleep_avg rise, since they are probably CPU hogs waiting on I/O. High interactivity credit tasks get less run time subtracted from their sleep_avg, in order to prevent them from losing interactive status too quickly.
SLEEPING & WAKING TASKS
Wait queue A wait queue is essentially a list of tasks waiting for some event or condition to occur. When that event occurs, code controlling that event will tell the waitqueue to wake up all of the tasks in it.
Going to Sleep 1. Create a wait queue via DECLARE_WAITQUEUE(). 2. Add the task to the wait queue via add_wait_queue(); when the event occurs, the code controlling it will call wake_up() on the queue. 3. Mark the task as sleeping, either TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE. 4. Call schedule(). 5. When the task wakes up, mark it TASK_RUNNING and remove it from the wait queue via remove_wait_queue().
Interruptible & Uninterruptible states Uninterruptible state: ignores any signal until the event it was originally waiting for occurs. Interruptible state: responds to the signal immediately (though the response won't necessarily be to die, as the user probably wants).
Main Scheduling Function Picks a new task to run and switches to it. Called whenever a task wishes to give up the CPU voluntarily (often through the sys_sched_yield() system call), and when scheduler_tick() sets the TIF_NEED_RESCHED flag on a task because it has run out of timeslice. scheduler_tick() is a function called during every system time tick, via a clock interrupt. It checks the state of the currently running task and other tasks in a CPU's runqueue to see if scheduling is necessary.

Ipc in linux

  • 7. Wrapper functions The popen( ) function receives two parameters: the pathname of an executable file and a type string specifying the direction of the data transfer. It returns a pointer to a FILE data structure. The popen( ) function essentially performs the following operations: 1. Creates a new pipe by making use of the pipe( ) system call 2. Forks a new process, which in turn executes the following operations: a. If type is r , duplicates the file descriptor associated with the pipe's write channel as file descriptor 1 (standard output); otherwise, if type is w , duplicates the file descriptor associated with the pipe's read channel as file descriptor 0 (standard input) b. Closes the file descriptors returned by pipe( ) c. Invokes the execve( ) system call to execute the program specified by filename 3. If type is r , closes the file descriptor associated with the pipe's write channel; otherwise, if type is w , closes the file descriptor associated with the pipe's read channel 4. Returns the address of the FILE file pointer that refers to whichever file descriptor for the pipe is still open
  • 8. Wrapper functions The pclose( ) function, which receives the file pointer returned by popen( ) as its parameter, simply invokes the wait4( ) system call and waits for the termination of the process created by popen( ) .
  • 9. Pipe Data Structures For each pipe, the kernel creates: an inode object, whose u field contains the pipe_inode_info structure; the inode's i_size field stores the current pipe size and its i_atomic_write semaphore serializes write operations; two file objects, one for reading and one for writing; and a pipe buffer.
  • 10. Creating a pipe The pipe( ) system call is serviced by the sys_pipe( ) function, which in turn invokes the do_pipe( ) function. In order to create a new pipe, do_pipe( ) performs the following operations: 1. Allocates a file object and a file descriptor for the read channel of the pipe, sets the flag field of the file object to O_RDONLY , and initializes the f_op field with the address of the read_pipe_fops table. 2. Allocates a file object and a file descriptor for the write channel of the pipe, sets the flag field of the file object to O_WRONLY , and initializes the f_op field with the address of the write_pipe_fops table. 3. Invokes the get_pipe_inode( ) function, which allocates and initializes an inode object for the pipe. This function also allocates a page frame for the pipe buffer and stores its address in the base field of the pipe_inode_info structure. 4. Allocates a dentry object and uses it to link together the two file objects and the inode object. 5. Returns the two file descriptors to the User Mode process.
  • 11. Destroying a pipe Whenever a process invokes the close( ) system call on a file descriptor associated with a pipe, the kernel executes the fput( ) function on the corresponding file object, which decrements the usage counter. If the counter becomes 0, the function invokes the release method of the file operations. The pipe_read_release( ) and pipe_write_release( ) functions set the readers and writers fields, respectively, of the pipe_inode_info structure to 0. Each function then invokes the pipe_release( ) function, which wakes up any processes sleeping in the pipe's wait queue so that they can recognize the change in the pipe state. Moreover, the function checks whether both the readers and writers fields are equal to 0; in this case, it releases the page frame containing the pipe buffer.
  • 12. Reading from a pipe A process wishing to get data from a pipe issues a read( ) system call, specifying as its file descriptor the descriptor associated with the pipe's read channel. The kernel ends up invoking the read method found in the file operation table associated with the proper file object. In the case of a pipe, the entry for the read method in the read_pipe_fops table points to the pipe_read( ) function, which performs the following operations: 1. Determines whether the pipe size, which is stored in the inode's i_size field, is 0. In this case, determines whether the function must return or the process must be blocked while waiting until another process writes some data into the pipe. The type of I/O operation (blocking or nonblocking) is specified by the O_NONBLOCK flag in the f_flags field of the file object. If necessary, invokes the interruptible_sleep_on( ) function to suspend the current process after having inserted it in the wait queue to which the wait field of the pipe_inode_info data structure points.
  • 13. Reading from a pipe 2. Checks the lock field of the pipe_inode_info data structure. If it is not null, another process is currently accessing the pipe; in this case, either suspends the current process or immediately terminates the system call, depending on the type of read operation (blocking or nonblocking). 3. Increments the lock field. 4. Copies the requested number of bytes (or the number of available bytes, if the buffer size is too small) from the pipe's buffer to the user address space. 5. Decrements the lock field. 6. Invokes wake_up_interruptible( ) to wake up all processes sleeping on the pipe's wait queue. 7. Returns the number of bytes copied into the user address space.
  • 14. Writing into a pipe A process wishing to put data into a pipe issues a write( ) system call, specifying as its file descriptor the descriptor associated with the pipe's write channel. The kernel satisfies this request by invoking the write method of the proper file object; the corresponding entry in the write_pipe_fops table points to the pipe_write( ) function, which performs the following operations: 1. Checks whether the pipe has at least one reading process. If not, sends a SIGPIPE signal to the current process and returns the -EPIPE value. 2. Releases the i_sem semaphore of the pipe's inode, which was acquired by the sys_write( ) function, and acquires the i_atomic_write semaphore of the same inode. The i_sem semaphore prevents multiple processes from starting write operations on a file, and thus on the pipe.
  • 15. Writing into a pipe 3. Checks whether the number of bytes to be written is within the pipe's buffer size: a. If so, the write operation must be atomic. Therefore, checks whether the buffer has enough free space to store all bytes to be written. b. If the number of bytes is greater than the buffer size, the operation can start as long as there is any free space at all. Therefore, checks for at least 1 free byte. 4. If the buffer does not have enough free space and the write operation is blocking, inserts the current process into the pipe's wait queue and suspends it until some data is read from the pipe. Notice that the i_atomic_write semaphore is not released, so no other process can start a write operation on the buffer. If the write operation is nonblocking, returns the -EAGAIN error code.
  • 16. Writing into a pipe 5. Checks the lock field of the pipe_inode_info data structure. If it is not null, another process is currently reading the pipe, so either suspends the current process or immediately terminates the write depending on whether the write operation is blocking or nonblocking. 6. Increments the lock field. 7. Copies the requested number of bytes (or the number of free bytes if the pipe size is too small) from the user address space to the pipe's buffer. 8. If there are bytes yet to be written, goes to step 4. 9. After all requested data is written, decrements the lock field. 10. Invokes wake_up_interruptible( ) to wake up all processes sleeping on the pipe's wait queue. 11. Releases the i_atomic_write semaphore and acquires the i_sem semaphore (so that sys_write( ) can safely release the latter). 12. Returns the number of bytes written into the pipe's buffer.
  • 17. FIFOs A process creates a FIFO by issuing a mknod( ) system call, passing to it as parameters the pathname of the new FIFO and the value S_IFIFO ( 0x1000 ) logically ORed with the permission bit mask of the new file. POSIX introduces a system call named mkfifo( ) specifically to create a FIFO. This call is implemented in Linux, as in System V Release 4, as a C library function that invokes mknod( ). Once created, a FIFO can be accessed through the usual open( ), read( ), write( ), and close( ) system calls.
  • 18. Fifo_open( ) The fifo_open( ) function examines the values of the readers and writers fields in the pipe_inode_info data structure.
  • 19. Fifo_open( ) 2. The function further determines specialized behavior for the set of file operations to be used, by setting the f_op field of the file object to the address of one of the predefined tables shown in Table 18-5. 3. Finally, the function checks whether the base field of the pipe_inode_info data structure is NULL; if so, it gets a free page frame for the FIFO's kernel buffer and stores its address in base.
  • 20. System V IPC It denotes a set of system calls that allows a User Mode process to: Synchronize itself with other processes by means of semaphores Send messages to other processes or receive messages from them Share a memory area with other processes IPC data structures are created dynamically when a process requests an IPC resource (a semaphore, a message queue, or a shared memory segment). Each IPC resource is persistent: unless explicitly released by a process, it is kept in memory. An IPC resource may be used by any process, including those that do not share the ancestor that created the resource. Each new resource is identified by a 32-bit IPC key , which is similar to the file pathname in the system's directory tree. Each IPC resource also has a 32-bit IPC identifier , which is somewhat similar to the file descriptor associated with an open file. When two or more processes wish to communicate through an IPC resource, they all refer to the IPC identifier of the resource.
  • 21. Using an IPC resource By invoking the semget( ) , msgget( ) , or shmget( ) functions, the IPC identifier is derived from the corresponding IPC key and is then used to access the resource. If there is no IPC resource already associated with the IPC key, a new resource is created. If everything goes right, the function returns a positive IPC identifier; otherwise, it returns one of the error codes illustrated in Table 18-6.
  • 22. Using an IPC resource The ftok( ) function attempts to create a new key from a file pathname and an 8-bit project identifier passed as parameters. It does not guarantee, however, a unique key number. Alternatively, one process issues a semget( ) , msgget( ) , or shmget( ) function specifying IPC_PRIVATE as its IPC key. A new IPC resource is thus allocated, and the process can either communicate its IPC identifier to the other processes in the application or fork the other processes itself. The IPC_CREAT flag specifies that the IPC resource must be created, if it does not already exist. The IPC_EXCL flag specifies that the function must fail if the resource already exists and the IPC_CREAT flag is set.
  • 23. Using an IPC resource Each IPC identifier is computed by combining a slot usage sequence number s relative to the resource type, an arbitrary slot index i for the allocated resource, and the value M chosen in the kernel for the maximum number of allocatable resources, where 0 ≤ i < M: IPC identifier = s × M + i At every allocation, the slot index i is incremented. At every deallocation, the slot usage sequence number s , which is initialized to 0, is incremented by 1, while i decreases.
  • 24. Using an IPC resource The semctl( ) , msgctl( ) , and shmctl( ) functions may be used to handle IPC resources. The IPC_SET subcommand allows a process to change the owner's user and group identifiers and the permission bit mask in the ipc_perm data structure. The IPC_STAT and IPC_INFO subcommands retrieve some information concerning a resource. The IPC_RMID subcommand releases an IPC resource. A process may acquire or release an IPC semaphore by issuing the semop( ) function. When a process wants to send or receive an IPC message, it uses the msgsnd( ) and msgrcv( ) functions, respectively. A process attaches and detaches a shared memory segment in its address space by means of the shmat( ) and shmdt( ) functions, respectively.
  • 25. IPC semaphores IPC semaphores are counters used to provide controlled access to shared data structures for multiple processes. The semaphore value is positive if the protected resource is available, and negative or 0 if the protected resource is currently not available. A process that wants to access the resource decrements by 1 the semaphore value. It is allowed to use the resource only if the old value was positive; otherwise, the process waits until the semaphore becomes positive. When a process relinquishes a protected resource, it increments its semaphore value by 1; in doing so, any other process waiting for the semaphore is woken up.
  • 26. IPC semaphores Each IPC semaphore is a set of one or more semaphore values, called primitive semaphores, not just a single value as for kernel semaphores. The number of primitive semaphores in each IPC semaphore must be specified as a parameter of the semget( ) function when the resource is being allocated, but it cannot be greater than SEMMSL (usually 32). When a process chooses to use this mechanism, the resulting operations are called undoable semaphore operations. When the process dies, all of its IPC semaphores can revert to the values they would have had if the process had never started its operations.
  • 27. IPC semaphores To access one or more resources protected by an IPC semaphore, a process: 1. Invokes the semget( ) wrapper function to get the IPC semaphore identifier, specifying as the parameter the IPC key of the IPC semaphore that protects the shared resources. If the process wants to create a new IPC semaphore, it also specifies the IPC_CREAT or IPC_PRIVATE flag and the number of primitive semaphores required. 2. Invokes the semop( ) wrapper function to test and decrement all primitive semaphore values involved. If all the tests succeed, the decrements are performed, the function terminates, and the process is allowed to access the protected resources. If some semaphores are in use, the process is usually suspended until some other process releases the resources. The function receives as parameters the IPC semaphore identifier, an array of numbers specifying the operations to be atomically performed on the primitive semaphores, and the number of such operations. Optionally, the process may specify the SEM_UNDO flag, which instructs the kernel to reverse the operations should the process exit without releasing the primitive semaphores. 3. When relinquishing the protected resources, invokes the semop( ) function again to atomically increment all primitive semaphores involved. 4. Optionally, invokes the semctl( ) wrapper function, specifying in its parameter the IPC_RMID flag to remove the IPC semaphore from the system.
  • 28. IPC semaphore data structures
  • 29. IPC semaphore data structures semary array: Statically allocated Includes SEMMNI values (usually 128) Each element in the array can assume one of the following values: IPC_UNUSED (-1): no IPC resource refers to this slot. IPC_NOID (-2): the IPC resource is being allocated or destroyed. The address of a dynamically allocated memory area containing the IPC semaphore resource. Index number of array represents the slot index i. When a new IPC resource must be allocated, the kernel scans the array and uses the first array element (slot) containing the value IPC_UNUSED .
  • 30. IPC semaphore data structures The first locations of the memory area containing the IPC semaphore store a descriptor of type struct semid_ds . All other locations in the memory area store several sem data structures with two fields: semval (the value of the semaphore's counter) and sempid (the PID of the last process that accessed the semaphore).
  • 31. Undoable semaphore operations Processes may request undoable operations by specifying the SEM_UNDO flag in the semop( ) function. The sem_undo data structure contains the IPC identifier of the semaphore and an array of integers representing the changes to the primitive semaphores' values caused by all undoable operations. Each sem_undo structure is linked in two lists: a per-process list, used when a process terminates, and a per-semaphore list, used when a process invokes the semctl( ) function to force an explicit value into a primitive semaphore and when an IPC semaphore is destroyed.
  • 32. IPC semaphore data structures The queue of pending requests:
  • 33. IPC Messages A message is composed of: a fixed-size header, which is a data structure of type msg a variable-length text
  • 34. IPC message queue data structures
  • 35. IPC message queue data structures
  • 36. Sending and Receiving Messages msgsnd( ) function; parameters : The IPC identifier of the destination message queue The size of the message text The address of a User Mode buffer that contains the message type immediately followed by the message text msgrcv( ) function; parameters: The IPC identifier of the IPC message queue resource The pointer to a User Mode buffer to which the message type and message text should be copied The size of this buffer A value t that specifies what message should be retrieved
  • 37. IPC Shared Memory shmget( ) function shmat( ) function shmdt( ) function shmctl( ) function: accesses the contents of the shmid_ds data structure, named u , inside the shmid_kernel structure, which is the IPC shared memory descriptor.
  • 38. IPC shared memory data structures
  • 39. IPC Shared Memory Data Structures
  • 40. Demand paging for IPC shared memory segments A page fault occurs when a process tries to access a location of a shared memory segment whose page is not present. The exception handler invokes the do_no_page( ) function, which invokes the nopage method; the page table entry is set to the address it returns, and the usage counter of the page frame is increased.
  • 41. Swapping out pages of IPC shared memory segments try_to_swap_out( ) function: the page frame is not released to the Buddy system; its usage counter is decremented by 1, and the corresponding entry in the calling process's page table is cleared. The do_try_to_free_pages( ) function periodically invokes the shm_swap( ) function, which iteratively scans all descriptors referenced by the shm_segs array; for each descriptor, it examines all page frames assigned to the segment; if the usage counter of a page frame is found to be 1, that page is swapped out to disk and the swapped-out page identifier is saved in the shm_pages array.
  • 43. Linux is a multitasking kernel that allows more than one process to exist at any given time. A kernel's scheduler enforces a thread scheduling policy, including when, for how long, and in some cases where (on Symmetric Multiprocessing (SMP) systems) threads can execute. Normally the scheduler runs in its own thread, which is woken up by a timer interrupt. Otherwise it is invoked via a system call or another kernel thread that wishes to yield the CPU. A thread will be allowed to execute for a certain amount of time, then a context switch to the scheduler thread will occur, followed by another context switch to a thread of the scheduler's choice.
  • 44. CPU and I/O-bound Threads CPU Bound: Threads spend a lot of time doing CPU computation, e.g., a video encoder. I/O Bound: Threads spend a lot of time waiting for relatively slow I/O operations, e.g., vi (a text editor).
  • 45. Linux Kernel 2.6 Ingo Molnar designed this scheduler with the desktop and the server market in mind, and as a result desktop performance is much improved in Linux distributions based on 2.6.x kernels.
  • 46. Linux Scheduling Goals Efficiency : Every algorithm is O(1) regardless of number of running processes. Provide good interactive performance. Provide fairness : No process should find itself starved of timeslice for any reasonable amount of time. Likewise, no process should receive an unfairly high amount of timeslice. Soft RT Scheduling : RT tasks are assigned special scheduling modes and the scheduler gives them priority over any other task on the system.
  • 47. POLICY Determines what runs when!!! Responsible for optimally utilizing processor time & overall feel of the system.
  • 48. Linux Policy CPU bound processes are scheduled less frequently but are given longer timeslice. I/O bound processes are scheduled for short periods and more frequently. Leading to improved process response time at the cost of process throughput.
  • 49. There are two key data structures in the Linux 2.6.8.1 scheduler that allow for it to perform its duties in O(1) time, and its design revolves around them – runqueues and priority arrays.
  • 50. RUNQUEUE A runqueue keeps track of all runnable tasks assigned to a particular CPU. As such, one runqueue is created and maintained for each CPU in a system. Each runqueue contains two priority arrays: tasks on a CPU begin in one priority array, the active one, and as they run out of their timeslices they are moved to the expired priority array. During the move, a new timeslice is calculated. When there are no more runnable tasks in the active priority array, it is simply swapped with the expired priority array (simply updating two pointers). The runqueue also keeps track of a CPU's special thread information (idle thread, migration thread).
  • 51. RUNQUEUE DATA STRUCTURE struct runqueue { spinlock_t lock; /* spin lock that protects this runqueue */ unsigned long nr_running; /* number of runnable tasks */ unsigned long nr_switches; /* number of context switches */ unsigned long expired_timestamp; /* time of last array swap */ unsigned long nr_uninterruptible; /* number of tasks in uninterruptible sleep */ struct task_struct *curr; /* this processor's currently running task */ struct task_struct *idle; /* this processor's idle task */
  • 52. struct mm_struct *prev_mm; /* mm_struct of the last running task */ struct prio_array *active; /* pointer to the active priority array */ struct prio_array *expired; /* pointer to the expired priority array */ struct prio_array arrays[2]; /* the actual priority arrays */ int prev_cpu_load[NR_CPUS]; /* load on each processor */ struct task_struct *migration_thread; /* the migration thread on this processor */ struct list_head migration_queue; /* the migration queue for this processor */ atomic_t nr_iowait; /* number of tasks waiting on I/O */ }
  • 53. PRIORITY ARRAY This data structure is the basis for most of the Linux 2.6.8.1 scheduler's advantageous behavior, in particular its O(1) (constant) time performance. The Linux 2.6.8.1 scheduler always schedules the highest priority task on a system, and if multiple tasks exist at the same priority level, they are scheduled round-robin with each other. Priority arrays make finding the highest priority task in a system a constant-time operation, and also make round-robin behavior within priority levels possible in constant time. Furthermore, using two priority arrays in unison (the active and expired priority arrays) makes transitions between timeslice epochs a constant-time operation. An epoch is the time between when all runnable tasks begin with a fresh timeslice and when all runnable tasks have used up their timeslices.
  • 54. Priority Array Datastructure unsigned int nr_active The number of active tasks in the priority array. unsigned long bitmap[BITMAP_SIZE] The bitmap representing the priorities for which active tasks exist in the priority array. It is stored as five 32-bit words. For example, if there are three active tasks, two at priority 0 and one at priority 5, then bits 0 and 5 should be set in this bitmap. Because the number of priorities is constant, the time needed to search for the first set bit is also constant.
  • 55. struct list_head queue[MAX_PRIO] — an array of linked lists, one list for each of the MAX_PRIO priority levels. In the Linux 2.6.x kernel, MAX_PRIO is 140.
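Picking the next task thus reduces to finding the lowest set bit in the bitmap (the kernel uses an optimized sched_find_first_bit() for this, often backed by a hardware find-first-bit instruction). A portable, unoptimized sketch of the idea, with the lowest bit index corresponding to the highest priority:

```c
#define MAX_PRIO     140
#define BITMAP_WORDS 5    /* 5 x 32-bit words cover 140 priority bits */

/* Return the index of the lowest set bit (= highest priority level
 * with a runnable task), or MAX_PRIO if the bitmap is empty. */
int find_first_prio(const unsigned int bitmap[BITMAP_WORDS])
{
    for (int w = 0; w < BITMAP_WORDS; w++) {
        if (bitmap[w] == 0)
            continue;                      /* no tasks in these 32 levels */
        for (int b = 0; b < 32; b++)
            if (bitmap[w] & (1u << b))
                return w * 32 + b;
    }
    return MAX_PRIO;                       /* no runnable task at all */
}
```

Since BITMAP_WORDS is a fixed constant independent of the number of tasks, the scan above is O(1) in the task count, which is the property the slide describes.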
  • 56. Priorities STATIC PRIORITY All tasks have a static priority, often called a nice value. On Linux, nice values range from -20 to 19, with higher values meaning lower priority. By default, tasks start with a static priority of 0, but that priority can be changed via the nice() system call. Apart from its initial value and modifications via the nice() system call, the scheduler never changes a task’s static priority. Static priority is the mechanism through which users can modify a task’s priority, and the scheduler respects the user’s input. A task’s static priority is stored in its static_prio variable.
  • 57. Priorities DYNAMIC PRIORITY The scheduler rewards I/O-bound tasks and punishes CPU-bound tasks by adding to or subtracting from a task’s static priority. This adjusted priority is the dynamic priority, and it is stored in p->prio. The maximum priority bonus or penalty is 5.
  • 58. I/O-bound vs. CPU-bound Heuristics To determine how much bonus or penalty to give a process, the scheduler keeps track of how long the process sleeps as opposed to how long it runs during its timeslice. When a task is woken up from sleep, its total sleep time is added to its sleep_avg variable (capped at MAX_SLEEP_AVG). When a task gives up the CPU, voluntarily or involuntarily, the time it spent running is subtracted from its sleep_avg.
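The sleep_avg bookkeeping can be sketched as two small update functions. The cap value below is an assumed placeholder, not the real MAX_SLEEP_AVG constant; only the add-on-wake / subtract-on-run shape follows the slide:

```c
#define MAX_SLEEP_AVG 10000L   /* assumed cap, in arbitrary time units */

/* Credited when a task wakes: add its sleep time, capped at MAX_SLEEP_AVG. */
long credit_sleep(long sleep_avg, long slept)
{
    sleep_avg += slept;
    if (sleep_avg > MAX_SLEEP_AVG)
        sleep_avg = MAX_SLEEP_AVG;
    return sleep_avg;
}

/* Charged when a task gives up the CPU: subtract its run time, floored at 0. */
long charge_run(long sleep_avg, long ran)
{
    sleep_avg -= ran;
    if (sleep_avg < 0)
        sleep_avg = 0;
    return sleep_avg;
}
```

A task that mostly sleeps drifts toward the cap (I/O-bound), while one that mostly runs drifts toward zero (CPU-bound), which is exactly the signal the next slide turns into a priority bonus.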
  • 59. Calculating dynamic priority The sleep_avg value is used to determine the dynamic priority, since it is an index of how I/O-bound or CPU-bound a process is. effective_prio() first checks whether the task is a real-time task; RT tasks get no bonus. Otherwise, it maps the sleep_avg value to a bonus in the range -5 to +5, which is subtracted from the static priority to obtain the dynamic priority.
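The mapping from sleep_avg to dynamic priority might be sketched as follows. The constants (the sleep_avg cap, the RT cutoff of 100) and the exact linear scaling are assumptions for illustration; only the overall shape — RT tasks exempt, a -5..+5 bonus subtracted from the static priority, result clamped to the valid range — follows the slides:

```c
#define MAX_PRIO      140
#define MAX_RT_PRIO   100     /* priorities below this are real-time (assumed cutoff) */
#define MAX_SLEEP_AVG 10000L  /* assumed cap, matching the earlier sketch */
#define MAX_BONUS     5

/* Hypothetical analogue of effective_prio(): derive the dynamic
 * priority from static priority and the task's sleep_avg. */
int effective_prio_sketch(int static_prio, long sleep_avg, int is_rt)
{
    if (is_rt)
        return static_prio;   /* RT tasks get no interactivity bonus */

    /* Scale sleep_avg linearly onto 0..2*MAX_BONUS, then shift to -5..+5. */
    int bonus = (int)(sleep_avg * MAX_BONUS * 2 / MAX_SLEEP_AVG) - MAX_BONUS;
    int prio = static_prio - bonus;

    /* Clamp into the non-RT priority range. */
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

A fully I/O-bound task (sleep_avg at the cap) gains the full +5 bonus, a fully CPU-bound task (sleep_avg of 0) takes the full -5 penalty, and a balanced task keeps its static priority unchanged.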
  • 60. Calculating Timeslice The timeslice is calculated by simply scaling a task’s static priority onto the possible timeslice range, while enforcing a certain minimum and maximum timeslice (task_timeslice()). The higher the task’s static priority (the lower the task’s static_prio value), the larger the timeslice it gets.
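A linear scaling of this kind might look as below. The minimum and maximum timeslice values are assumed placeholders, not the kernel's actual constants; the point is only that the highest static priority maps to the largest timeslice and the lowest to the smallest:

```c
#define MIN_TIMESLICE 10    /* ms, assumed */
#define MAX_TIMESLICE 200   /* ms, assumed */
#define MAX_USER_PRIO 40    /* nice -20..19 maps onto user priority 0..39 */

/* user_prio: 0 = nice -20 (highest static priority), 39 = nice +19 (lowest).
 * Hypothetical analogue of task_timeslice(): scale user priority onto
 * the [MIN_TIMESLICE, MAX_TIMESLICE] range. */
int task_timeslice_sketch(int user_prio)
{
    return MAX_TIMESLICE -
           (MAX_TIMESLICE - MIN_TIMESLICE) * user_prio / (MAX_USER_PRIO - 1);
}
```

The endpoints of the scale are enforced by construction: user priority 0 yields MAX_TIMESLICE and user priority 39 yields MIN_TIMESLICE, with everything between interpolated linearly.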
  • 61. An interactive task’s timeslice may be broken up into chunks, based on the TIMESLICE_GRANULARITY value in the scheduler. The function scheduler_tick() (called every 1 ms) checks whether the currently running task has been taking the CPU from other tasks of the same dynamic priority for too long (TIMESLICE_GRANULARITY). If a task has been running for TIMESLICE_GRANULARITY and a task of the same dynamic priority exists, a round-robin switch to another task of the same dynamic priority is made.
  • 62. Reinsertion of interactive tasks If a task is sufficiently interactive, when it exhausts its timeslice it is not inserted into the expired array but is instead reinserted into the active array, as long as nothing is starving in the expired priority array (checked by EXPIRED_STARVING(), which is called from scheduler_tick()).
  • 63. Fairness when forking new tasks When new tasks (threads or processes) are forked, the functions wake_up_forked_thread() and wake_up_forked_process() decrease the sleep_avg of both parent and child. This prevents highly interactive tasks from spawning other highly interactive tasks.
  • 64. Interactivity Credits Every task maintains an interactive_credit variable. Tasks gain an interactivity credit when they sleep for a long time, and lose one when they run for a long time. A task with fewer than -100 interactivity credits is considered to have low interactivity credit. Low-credit tasks waking from uninterruptible sleep have their sleep_avg rise limited, since they are probably CPU hogs that happen to be waiting on I/O. High-credit tasks get less run time subtracted from their sleep_avg, to prevent them from losing interactive status too quickly.
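The low/high classification amounts to two threshold checks on interactive_credit. The kernel expresses these as macros; the sketch below uses plain functions and assumes a symmetric limit of 100 based on the -100 figure on the slide:

```c
#define CREDIT_LIMIT 100   /* assumed threshold, per the slide's -100 figure */

/* A task below -CREDIT_LIMIT credits is treated as low-interactivity:
 * its sleep_avg gains on wake from uninterruptible sleep are limited. */
int low_credit(int interactive_credit)
{
    return interactive_credit < -CREDIT_LIMIT;
}

/* A task above +CREDIT_LIMIT credits is treated as highly interactive:
 * less run time is subtracted from its sleep_avg. */
int high_credit(int interactive_credit)
{
    return interactive_credit > CREDIT_LIMIT;
}
```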
  • 66. Wait queue A wait queue is essentially a list of tasks waiting for some event or condition to occur. When that event occurs, code controlling that event will tell the waitqueue to wake up all of the tasks in it.
  • 67. Going to Sleep
1. Create a wait queue via DECLARE_WAITQUEUE().
2. Add the task to the wait queue via add_wait_queue(). (When the event occurs, wake_up() wakes the tasks on the queue.)
3. Mark the task as sleeping, either TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
4. Call schedule().
5. When the task wakes up, mark it TASK_RUNNING and remove it from the wait queue via remove_wait_queue().
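The steps above rely on kernel APIs (task states, schedule()) that cannot run in user space, but a toy user-space analogue may still make the enqueue/wake-up flow concrete. Every name with the _sim suffix is invented for this sketch and does not exist in the kernel:

```c
#define MAX_WAITERS 8

struct waiter {
    int task_id;
    int running;   /* 1 = "TASK_RUNNING", 0 = sleeping */
};

/* A wait queue is essentially just a list of waiting tasks. */
struct wait_queue_sim {
    struct waiter *entries[MAX_WAITERS];
    int n;
};

/* Analogue of add_wait_queue() plus marking the task as sleeping. */
void add_wait_queue_sim(struct wait_queue_sim *q, struct waiter *w)
{
    w->running = 0;            /* task goes to sleep */
    q->entries[q->n++] = w;    /* no overflow check: sketch only */
}

/* Analogue of wake_up(): mark every queued waiter runnable, empty the queue. */
void wake_up_sim(struct wait_queue_sim *q)
{
    for (int i = 0; i < q->n; i++)
        q->entries[i]->running = 1;
    q->n = 0;
}
```

The code controlling the event calls the wake-up side; the sleeping side in the real kernel would additionally call schedule() so another task can run while it waits.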
  • 68. Interruptible & Uninterruptible states Uninterruptible state: the task ignores any signal until the event it was originally waiting for occurs. Interruptible state: the task responds to a signal immediately (though its response won’t necessarily be to die, as the user probably wants).
  • 69. Main Scheduling Function Picks a new task to run and switches to it. Called whenever a task wishes to give up the CPU voluntarily (often through the sys_sched_yield() system call), and when scheduler_tick() sets the TIF_NEED_RESCHED flag on a task because it has run out of timeslice. scheduler_tick() is called on every system timer tick, via a clock interrupt. It checks the state of the currently running task and of the other tasks in a CPU’s runqueue to see if scheduling is necessary.