Debasis Das
Mutual Exclusion
CS 704D Advanced OS 2
Complexities
In distributed systems
 Absence of shared memory
 Inter-node communication delays can be considerable
 Global system state cannot be observed by constituent
machines due to communication delays, component
failures, absence of shared memory
 Many more failure modes exist, yet fail-soft operation remains a goal
Some Considerations
 Policies and strategies developed for a distributed system can also be applied in the uniprocessor case
 However, policies and strategies developed for the uniprocessor case cannot be extended directly to the distributed case
 The uniprocessor behaviour can be simulated by adding a central resource allocator
 This concentrates traffic on the central allocator, and the system fails when the allocator fails
 Election of a successor would then be needed
Required Assumptions
 Messages exchanged by a pair of communicating
processes need to be received in the same order as they
were generated (pipelining property)
 Every message is received without errors, no duplicates
 The underlying network ensures all nodes are fully
connected. Any node can communicate with every
other node
Desirable Properties
of Algorithms
 All nodes should have equal amount of information
 Each node makes decisions on the basis of local
information. The algorithm should ensure that nodes
make consistent & coherent decisions
 All nodes reach decisions through about equal effort
 Failure of a node should not cause a complete breakdown. The ability of the remaining nodes to reach a decision and access the resources should not be affected
Time & Ordering of Events
Happened Before Relationship
 A logical clock needs to respect the happened-before relation (->), defined as follows
   If a and b are events in the same process and a comes before b, then a->b
   If a is the sending of a message by one process and b is the receipt of that message by another process, then a->b
   The relation is transitive: if a->b and b->c, then a->c
   If neither a->b nor b->a holds, a and b are said to be concurrent
Time & Ordering of Events
Logical Clock Properties
 If a->b then C(a) < C(b)
 This clock condition is satisfied if
   For events a and b in a process Pi, if a comes before b then Ci(a) < Ci(b)
   If a is the sending of message m by process Pi and b is the receipt of m by process Pj, then Ci(a) < Cj(b)
Time & Ordering of Events
Logical Clock Implementation
 Process Pi increments its clock Ci between successive events
 A message m sent at event a is time stamped with T(m) = Ci(a)
 On receiving m, process Pj sets its clock to max(Cj, T(m)) + 1, so that the receive event is stamped later than both T(m) and Pj's previous event
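The clock rules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name `LamportClock` and its method names are assumptions made for the example. On receipt it uses max(Cj, T(m)) + 1, which guarantees that the receive event is stamped strictly later than the send.

```python
class LamportClock:
    """Logical clock for one process (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock between successive events.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the message carries the new value.
        return self.tick()

    def receive(self, msg_timestamp):
        # Move past both the local clock and the message time stamp.
        self.time = max(self.time, msg_timestamp) + 1
        return self.time


p_i, p_j = LamportClock(), LamportClock()
for _ in range(5):
    p_i.tick()            # five local events at Pi
t_m = p_i.send()          # event a: Pi sends m with T(m) = 6
c_b = p_j.receive(t_m)    # event b: Pj receives m
print(t_m, c_b)           # 6 7
```

Pj had seen no events of its own, yet its receive event is stamped 7, so Ci(a) < Cj(b) holds.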
Total Ordering
 a => b (a precedes b in the total order) only when
   Ci(a) < Cj(b), or
   Ci(a) = Cj(b) and Pi “less than” Pj
 A simple way to implement the “less than” relation on processes is to assign a unique number to each process and define Pi “less than” Pj to mean i < j
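A total order of this kind amounts to comparing (clock value, process number) pairs. The short Python sketch below shows the idea; the event labels and clock values are made up for illustration.

```python
# Each event is (clock value, process number, label); the values are made up.
events = [(3, 2, "b"), (3, 1, "a"), (1, 3, "c")]

# Order by clock value first, then break ties with the process number.
ordered = sorted(events, key=lambda e: (e[0], e[1]))
labels = [e[2] for e in ordered]
print(labels)  # ['c', 'a', 'b']
```

Events a and b carry the same clock value 3, so the process numbers (1 before 2) decide their order.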
Lamport’s Algorithm
 Initiator i: Process Pi requires exclusive access to a resource. It places the time stamped request (Ti, i), where Ti = Ci, on its own queue and sends it to all the other processes
 Other processes (j, j not= i): When Pj receives the request, it places the request on its own queue and sends a reply with time stamp (Tj, j) to process Pi
 Pi is allowed access only when
   Pi’s request is at the front of its queue, and
   All replies are time stamped later than Pi’s request
 Pi releases the resource by sending a suitably time stamped release message to all the other processes
 On receiving the release, Pj removes Pi’s request from its request queue
Cost: 3(N-1) messages per critical section entry; works best on bus-based systems where broadcast costs are minimal
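The steps of Lamport's algorithm can be tried out in a small single-threaded simulation. The sketch below is only an illustration under simplifying assumptions: messages are delivered instantly and in order by direct function calls, and the names `Process`, `request`, `may_enter` and `release` are invented for the example.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.queue = []      # pending requests as (time stamp, pid), kept sorted
        self.reply_ts = {}   # peer pid -> time stamp of its latest reply

    def stamp(self, received=0):
        # Lamport clock: move past the local clock and any received stamp.
        self.clock = max(self.clock, received) + 1
        return self.clock

    def enqueue(self, req):
        self.queue.append(req)
        self.queue.sort()    # order by time stamp, ties broken by pid


def request(procs, i, log):
    """Pi queues its request, sends it to every peer and collects replies."""
    me = procs[i]
    req = (me.stamp(), i)
    me.enqueue(req)
    for j, peer in procs.items():
        if j == i:
            continue
        log.append(("request", i, j))
        peer.stamp(req[0])
        peer.enqueue(req)
        reply = peer.stamp()
        log.append(("reply", j, i))
        me.stamp(reply)
        me.reply_ts[j] = reply
    return req


def may_enter(procs, req):
    """Own request at the head of the queue and all replies stamped later."""
    me = procs[req[1]]
    peers = [j for j in procs if j != me.pid]
    return me.queue[0] == req and all(
        me.reply_ts.get(j, 0) > req[0] for j in peers)


def release(procs, req, log):
    """Pi leaves the critical section; every process drops the request."""
    i = req[1]
    ts = procs[i].stamp()
    for j, peer in procs.items():
        if j != i:
            log.append(("release", i, j))
            peer.stamp(ts)
        peer.queue.remove(req)


procs = {pid: Process(pid) for pid in (1, 2, 3)}
log = []
r1 = request(procs, 1, log)
r2 = request(procs, 2, log)
first = (may_enter(procs, r1), may_enter(procs, r2))   # (True, False)
release(procs, r1, log)
second = may_enter(procs, r2)                          # True
# Messages used for P1's entry: its requests, the replies to it, its releases.
p1_messages = [m for m in log
               if (m[0] == "reply" and m[2] == 1)
               or (m[0] != "reply" and m[1] == 1)]
print(first, second, len(p1_messages))  # (True, False) True 6
```

With N = 3, P1's entry costs 6 messages, matching 3(N-1). P2 is held back until P1's release removes the earlier request from the head of P2's queue.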
Ricart-Agrawala Algorithm
 Initiator i: Process Pi requires exclusive access to a resource. It sends the time stamped request (Ti, i), where Ti = Ci, to all the other processes
 Other processes (j, j not= i): When Pj receives the request, it reacts as follows
   If Pj is not requesting the resource, it sends a time stamped reply
   If Pj also needs the resource and its own request’s time stamp precedes Pi’s time stamp, Pi’s request is retained (the reply is deferred); otherwise a time stamped reply is returned
 Pi is allowed access only when it has received a reply from every other process
 Pi releases the resource by sending a reply to each request it has retained
Cost: 2(N-1) messages per critical section entry
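The deferred-reply idea can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions (instant, in-order delivery by direct calls; the names `Node`, `start_request`, `on_request`, `release` and `broadcast` are invented for the example), in which a requester may enter once every other process has replied.

```python
class Node:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.request = None   # own request as (time stamp, pid), if any
        self.deferred = []    # requesters whose reply is being withheld
        self.replies = set()  # peers that have replied to our request

    def start_request(self):
        self.clock += 1
        self.request = (self.clock, self.pid)
        self.replies = set()
        return self.request

    def on_request(self, req):
        """Return True if a reply is sent at once, False if it is deferred."""
        self.clock = max(self.clock, req[0]) + 1
        if self.request is not None and self.request < req:
            self.deferred.append(req[1])   # our own request has priority
            return False
        return True

    def release(self):
        """Leave the critical section; return the peers owed a reply."""
        self.request = None
        owed, self.deferred = self.deferred, []
        return owed


def broadcast(nodes, sender, req, sent):
    for pid, peer in nodes.items():
        if pid == sender:
            continue
        sent.append(("request", sender, pid))
        if peer.on_request(req):
            sent.append(("reply", pid, sender))
            nodes[sender].replies.add(pid)


nodes = {pid: Node(pid) for pid in (1, 2, 3)}
sent = []
r1 = nodes[1].start_request()   # (1, 1)
r2 = nodes[2].start_request()   # (1, 2): same time stamp, larger pid
broadcast(nodes, 1, r1, sent)
broadcast(nodes, 2, r2, sent)
p1_can_enter = len(nodes[1].replies) == 2    # True
p2_can_enter = len(nodes[2].replies) == 2    # False: P1 deferred its reply
for pid in nodes[1].release():
    sent.append(("reply", 1, pid))
    nodes[pid].replies.add(1)
p2_after_release = len(nodes[2].replies) == 2  # True
print(p1_can_enter, p2_can_enter, p2_after_release, len(sent))
# True False True 8
```

Two critical section entries cost 8 messages in total, that is 2(N-1) = 4 each for N = 3; no separate release broadcast is needed because the deferred reply doubles as the release.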
Distributed Shared Memory
 A software abstraction over the loosely coupled
systems
 Provides a shared memory kind of operation over the
underlying IPC/RPC mechanisms
 Can be implemented in OS kernel or runtime system
 Also known as Distributed Shared Virtual Memory
System (DSVM)
 The shared space exists only virtually
DSM Architecture
[Figure: DSM architecture. Labels recovered from the slide: Distributed Shared Memory Layer; Memory Mapping and CPU(s), repeated for three nodes; Communication Network.]
DSM Architecture
 Unlike in tightly coupled systems, this shared memory is entirely virtual
 The shared space is partitioned into blocks
 Each node's local memory is treated as a large local cache
 If the requested data is not available locally, a network block fault is generated
 The OS sends a message to the node holding the block and has it migrated to the node where the fault occurred
 Data may be replicated locally
 The configuration varies with the replication and migration policies used
Design issues
 Granularity (block size): smaller blocks mean more faults and more traffic; larger blocks suit jobs with higher locality
 Structure: the layout of the shared data, which depends on the application
 Coherence & access synchronization: similar to the cache coherence problem in shared-memory multiprocessor systems
 Data location & access: which data is replicated and how it is located
 Replacement strategy
 Thrashing
 Heterogeneity
Granularity
Block Size Selection Factors
 Transfer overhead: larger block sizes are favored because the overhead of transferring a small block is not much different from that of a large one
 Paging overhead: this also favors larger block sizes, provided the application has a high locality of reference
 Directory size: smaller blocks mean a larger directory and a larger management overhead
 Thrashing: likely to increase with larger block sizes
 False sharing: larger block sizes increase its probability, which in turn leads to more thrashing
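A small calculation shows why false sharing becomes more likely as the block size grows. In the Python sketch below the two addresses are hypothetical and stand for unrelated variables used by different nodes; `block_of` is an illustrative helper that maps an address to its block number.

```python
def block_of(address, block_size):
    # Block number that contains the given byte address.
    return address // block_size


x_addr, y_addr = 100, 900   # hypothetical addresses of two unrelated variables

shared = {}
for block_size in (256, 1024, 4096):
    shared[block_size] = block_of(x_addr, block_size) == block_of(y_addr, block_size)
    print(block_size, shared[block_size])
# 256 False
# 1024 True
# 4096 True
```

With 256-byte blocks the two variables live in different blocks. From 1024 bytes upward they share one block, so writes to either variable make the whole block move or be invalidated even though the nodes never touch the same data.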
Page Size as Block Size
 Page size is preferred as the DSM block size
 Advantages are
   Existing page fault hardware can be used as the block fault mechanism, and memory coherence can be handled in the page fault handlers
   Access control can be managed with the existing memory mapping system
   If the page size is less than the packet size, transferring a page adds no extra network overhead
   Page size has proved, over time, to be a suitable unit as far as memory contention is concerned
Structure
Structure of Shared Memory Space
 Approaches to structuring
   No structure: a linear array of memory; easy to design
   By data type: granularity is per variable; complex to handle
   As a database: organized as a tuple space (associative memory); primitives need to be added to the language, and access to shared data is not transparent
Consistency Models
 Strict consistency
 Sequential consistency
 Causal consistency
 Pipelined random access memory consistency
 Processor consistency
 Weak consistency
 Release consistency
Strict Consistency Model
 The value returned by a read of a memory address is always the value of the most recent write to that address
 Writes become visible to all nodes immediately
 Needs an absolute ordering of memory read/write operations, which requires a global time (to define “most recent”)
 Nearly impossible to implement
Sequential Consistency Model
 All processes should see the same ordering of reads and writes
 The exact interleaving does not matter, as long as every process sees the same one
 No memory operation is started until earlier operations have completed
 Acceptable for most applications
Causal Consistency Model
 Operations are seen in the same (correct) order by all processes when they are causally related
 If w2 follows w1 and is causally related to it, then w1, w2 is the order every process should see
 Operations that are not causally related may be seen in different orders by different processes
Pipelined RAM Consistency Model
 All writes of a single process are seen by other processes in the order in which they were issued (as in a pipeline)
 However, writes by different processes may appear in different orders to different processes
 For example, (w11, w12) and (w21, w22) can be seen by one process as (w11, w12) followed by (w21, w22) and by another as (w21, w22) followed by (w11, w12)
 Simple to implement
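The PRAM requirement can be phrased as a simple check: an observed sequence is acceptable if it keeps each process's own writes in issue order, whatever the interleaving between processes. The Python sketch below is illustrative only; the function name `respects_program_order` and the three sample views are assumptions made for the example.

```python
def respects_program_order(observed, program_order):
    """True if `observed` keeps each process's writes in issue order."""
    for writes in program_order.values():
        positions = [observed.index(w) for w in writes]
        if positions != sorted(positions):
            return False
    return True


program_order = {"P1": ["w11", "w12"], "P2": ["w21", "w22"]}

view_a = ["w11", "w12", "w21", "w22"]   # P1's writes first
view_b = ["w21", "w11", "w22", "w12"]   # interleaved, per-process order kept
view_c = ["w12", "w11", "w21", "w22"]   # P1's writes swapped

results = [respects_program_order(v, program_order)
           for v in (view_a, view_b, view_c)]
print(results)  # [True, True, False]
```

Two observers may hold `view_a` and `view_b` at the same time under PRAM consistency; only `view_c`, which reorders one process's own writes, is ruled out.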
Processor Consistency Model
 Adds memory coherence to the PRAM model
 That is, all writes to any particular memory location must be seen by all processes in the same order
Weak Consistency Model
 Changes in memory need to be made visible only after a set of changes has been completed (for example, at the end of a critical section)
 Isolated accesses to a shared variable are rare; usually there are several accesses in a burst and then none at all
 The difficulty is that the system does not know when to show the changes
 Application programmers can take care of this through a synchronization variable
 Necessarily
   All accesses to a sync variable must follow the strongest consistency used (sequential)
   All pending writes must be completed before an access to a sync variable is allowed
   All previous accesses to sync variables must be completed before another access is allowed
Release Consistency Model
 The weak consistency model requires, at every access to a sync variable, that both
   All changes made by the process are propagated to all other nodes, and
   All changes made at other nodes are propagated to the process's node
 Separate acquire and release sync operations are used so that only one of the two steps above needs to be done at each
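The split between acquire and release can be sketched as follows. This is a deliberately simplified Python model, not how any particular DSM system does it: a shared dictionary `home` stands in for the rest of the system, no locking is modelled, and the class and method names are assumptions made for the example.

```python
class Node:
    """One DSM node under a simplified release-consistency scheme."""

    def __init__(self, home):
        self.home = home     # dict standing in for the rest of the system
        self.local = {}      # this node's view of shared memory
        self.pending = {}    # writes made since the last release

    def acquire(self):
        # Entering a critical section: pull in changes made elsewhere.
        self.local.update(self.home)

    def write(self, name, value):
        self.local[name] = value
        self.pending[name] = value   # not yet visible to other nodes

    def release(self):
        # Leaving a critical section: push out this node's changes.
        self.home.update(self.pending)
        self.pending.clear()


home = {"x": 0, "y": 0}
a, b = Node(home), Node(home)

a.acquire()
a.write("x", 1)
a.write("y", 2)
b.acquire()
before = dict(b.local)   # a has not released yet
a.release()
b.acquire()
after = dict(b.local)    # a's writes arrive together
print(before, after)     # {'x': 0, 'y': 0} {'x': 1, 'y': 2}
```

Acquire only pulls changes in and release only pushes them out, so each sync operation does half of the work that a weak-consistency sync access would do.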
Discussion of Models
 The strict consistency model is difficult to implement and is almost never implemented
 The sequential consistency model is the most commonly used
 Causal, PRAM, processor, weak and release consistency are implemented in many DSM systems; programmers need to intervene
 Weak and release consistency provide explicit sync variables to help with consistency
Implementing Sequential Consistency Model
 The implementation of sequential consistency depends on whether replication and/or migration of blocks is allowed
 Migration/replication strategies
   Non-replicated, non-migrating blocks (NRNMBs)
   Non-replicated, migrating blocks (NRMBs)
   Replicated, migrating blocks (RMBs)
   Replicated, non-migrating blocks (RNMBs)
NRNMB
 All requests for a block are routed through the OS and MMU to the single copy of the block, which is not replicated and never moves
 Can cause
   A bottleneck, because memory accesses are serialized
   Parallelism is not possible
NRMB
 No copies are kept; when required, the entire block is moved to the node that needs it
 Advantages
   No communication cost once the block has migrated, since all further accesses are local
   Applications can take advantage of locality; applications with high locality perform better
 Disadvantages
   Prone to thrashing
   No advantage from parallelism
Data Locating in NRMB
 Broadcast
   When a fault happens, a request is broadcast and the current owner sends the block
   Broadcasts cause communication overhead
 Centralized server
   The request is sent to the server, which asks the node holding the block to send it to the requesting node and updates its location information
 Fixed distributed server
   The fault handler finds which server the block is mapped to, sends the request to that server and gets the block
 Dynamic distributed server
   The fault handler looks up the probable owner locally and goes to that node, where it finds either the block or another probable owner; once it gets the block, the location information is updated
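The dynamic distributed server scheme can be pictured as following a chain of probable-owner hints. The Python sketch below is one possible reading, under stated assumptions: the hint table is a plain dictionary, the `owner` argument stands in for each node's local knowledge of whether it holds the block, and every node visited is pointed at the requester once the block has migrated there. The function name `fetch_block` and the node names are invented for the example.

```python
def fetch_block(requester, probable_owner, owner):
    """Follow probable-owner hints from the requester to the real owner.

    Returns the list of nodes visited. Afterwards the block is assumed to
    have migrated to the requester, so every visited node's hint is updated.
    """
    path = [requester]
    node = requester
    while node != owner:
        if len(path) > len(probable_owner):
            raise RuntimeError("probable-owner hints form a cycle")
        node = probable_owner[node]      # forward the request one hop
        path.append(node)
    for visited in path:
        probable_owner[visited] = requester   # requester is the new owner
    return path


# Hypothetical hints: A thinks B owns the block, B thinks C, C thinks D.
hints = {"A": "B", "B": "C", "C": "D", "D": "D"}
path = fetch_block("A", hints, owner="D")
print(path)    # ['A', 'B', 'C', 'D']
print(hints)   # {'A': 'A', 'B': 'A', 'C': 'A', 'D': 'A'}
```

The first lookup takes three hops, but the updated hints mean that a later fault at B, C or D reaches the new owner A in a single hop.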
RMB
 Replication is required to increase parallelism
 Reads can be done locally; writes have overheads
 Systems with a high read/write ratio can amortize the write overhead over many reads
 Maintaining coherence across the replicated blocks is an issue
 Two basic protocols are used
   Write-invalidate
   Write-update
Coherence Protocols
 Write-invalidate
   On a write fault, the fault handler copies the block from one of the nodes to its own node
   It invalidates all the other copies, then writes the data
   If another node needs the block afterwards, the updated block is replicated
 Write-update
   On a write fault, the block is copied to the local node and the data is updated
   The address and the new data are sent to all the replicas
   Operation resumes after all the writes are done
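The write-invalidate steps can be sketched for a single block as follows. This Python model is an illustration only: it tracks the copies of one block in a dictionary, ignores messaging and concurrency, and uses invented names (`Block`, `read`, `write`).

```python
class Block:
    """One shared block under a simplified write-invalidate protocol."""

    def __init__(self, value, owner):
        self.owner = owner
        self.copies = {owner: value}   # node -> locally held value

    def read(self, node):
        if node not in self.copies:
            # Read fault: replicate the block from the current owner.
            self.copies[node] = self.copies[self.owner]
        return self.copies[node]

    def write(self, node, value):
        # Write fault: every other copy is invalidated before the write.
        invalidated = sorted(n for n in self.copies if n != node)
        self.copies = {node: value}
        self.owner = node
        return invalidated


block = Block(0, owner="A")
block.read("B")
block.read("C")                      # A, B and C now hold copies
dropped = block.write("B", 42)       # B writes; A and C are invalidated
value_at_a = block.read("A")         # A faults and fetches the new value
print(dropped, value_at_a, sorted(block.copies))
# ['A', 'C'] 42 ['A', 'B']
```

Only an invalidation reaches the other holders at write time; the new data travels later, and only to nodes that actually read the block again.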
Comparison
 Write-update typically needs a global sequencer to make sure all nodes see writes in the same sequence
 Also, the updates are full writes at every replica
 Together these add a significant communication overhead
 Write-invalidate needs none of that, just an invalidation signal
 Write-invalidate is thus the more often used method
Data Locating in RMB Strategy
 The owner of a block needs to be located: the node that most recently had write access
 The nodes that hold a valid copy also need to be tracked
 Use one of the following
   Broadcasting
   Centralized server algorithm
   Fixed distributed server algorithm
   Dynamic distributed server algorithm
RNMB
 Replicas are maintained but blocks do not migrate
 Consistency is maintained by updating all the replicas
by a write update like process
Data Locating in RNMB Strategy
 Replica locations do not change
 Replicas are kept consistent
 Read requests can go to any node that has the data block
 Writes go through a global sequencer
Munin: A Release Consistent DSM System
 Structure: a collection of shared variables
 Each shared variable goes on a separate memory page
 acquireLock and releaselock are used for synchronization
 A different consistency protocol is applied to each type of shared variable used in the system
   Read-only, migratory, write-shared, producer-consumer, result, reduction and conventional
Replacement Strategy
 Shared memory blocks are replicated and/or migrated, so two questions need to be decided
   Which block should be replaced
   Where the replaced block should go
Blocks to Replace
 Usage based vs. non-usage based
 Fixed space vs. variable space
 Unused
 Nil
 Read only
 Read-owned
 Writable
Place for Replacement Block
 Using secondary store locally
 Using the memory space of other nodes: the block is stored in free memory space at some other node. Free memory status needs to be exchanged, which can be piggybacked on normal communication messages
Thrashing
Thrashing Situations
 DSM allows migration, so blocks migrating back and forth leads to thrashing
   Data blocks keep migrating between nodes due to interleaved accesses by processes
   Read-only blocks are repeatedly invalidated soon after replication
Thrashing Reduction Strategies
 Application-controlled locks
 Locking a block to a node for a time t; deciding t can be very difficult
 Tuning the coherence strategy to the usage pattern; the transparency of the memory system is compromised
Other Approaches to DSM
Approaches
 Data caching managed by the OS
 Data Caching managed by MMUs
 Data Caching managed by the language run time
system
Heterogeneous DSM
Features of Heterogeneous DSM
 Data conversion
   Structuring the DSM as a collection of source language objects
   Allowing only one type of data in a block (has complications)
     Memory fragmentation
     Compilation issues
     The entire page is converted although only a small part may be used before transfer
     Not transparent; user-provided conversion may be required
Advantages of DSM
Advantages
 Simpler abstraction
 Better portability of distributed applications
 Better performance of some applications
 Flexible communication environment
 Ease of process migration

More Related Content

PDF
Inter Process Communication
PPTX
Inter-Process communication using pipe in FPGA based adaptive communication
PPT
IPC mechanisms in windows
PPTX
Inter process communication
PDF
Inter process communication using Linux System Calls
PPT
Ipc in linux
PPT
Inter process communication
PDF
Inter Process Communication
Inter-Process communication using pipe in FPGA based adaptive communication
IPC mechanisms in windows
Inter process communication
Inter process communication using Linux System Calls
Ipc in linux
Inter process communication

What's hot (20)

PDF
unix interprocess communication
PDF
PDF
Implementation of Pipe in Linux
PDF
Inter process communication
PPT
Ipc ppt
PPTX
Linux process management
PPTX
Pipes in Windows and Linux.
PDF
System Calls
PDF
PDF
Implementation of FIFO in Linux
PDF
Unit II - 2 - Operating System - Threads
PPTX
Linux Memory Management
DOCX
Unix system calls
PDF
Linux Network Management
PPTX
Process management in linux
PDF
Functional Programming with LISP
PPTX
Unit 7
DOCX
Linux syllabus
PDF
Part 04 Creating a System Call in Linux
PDF
Linux System Monitoring basic commands
unix interprocess communication
Implementation of Pipe in Linux
Inter process communication
Ipc ppt
Linux process management
Pipes in Windows and Linux.
System Calls
Implementation of FIFO in Linux
Unit II - 2 - Operating System - Threads
Linux Memory Management
Unix system calls
Linux Network Management
Process management in linux
Functional Programming with LISP
Unit 7
Linux syllabus
Part 04 Creating a System Call in Linux
Linux System Monitoring basic commands
Ad

Viewers also liked (13)

PPT
Ranjitbanshpal
PPT
multi processors
PPT
Cgmm presentation on distributed multimedia systems
PPTX
Multiprocessors(performance and synchronization issues)
PPT
4.file service architecture
PPTX
Distributed shred memory architecture
PPTX
Distributed systems and scalability rules
PDF
Distributed Systems Naming
PPT
Chapter 8 distributed file systems
ODP
Distributed shared memory shyam soni
PPT
3. distributed file system requirements
PPT
distributed shared memory
PPTX
Fault tolerance in distributed systems
Ranjitbanshpal
multi processors
Cgmm presentation on distributed multimedia systems
Multiprocessors(performance and synchronization issues)
4.file service architecture
Distributed shred memory architecture
Distributed systems and scalability rules
Distributed Systems Naming
Chapter 8 distributed file systems
Distributed shared memory shyam soni
3. distributed file system requirements
distributed shared memory
Fault tolerance in distributed systems
Ad

Similar to Cs704 d distributedmutualexcclusion&memory (20)

PPTX
Cs 704 d set2
PPTX
Cs 704 d set4distributedcomputing-1funda
PPTX
Apos week 1 4
PPTX
Cs704 d distributedschedulingetc.
PPT
Advanced Operating System, Distributed Operating System
PPT
Introduction distributed system modernss
PPT
Distributed OPERATING SYSTEM FOR BACHELOR OF BUSINESS INFORMATION TECHNOLOGY
PPTX
Cs 704 d set3
PDF
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
PPTX
Distributed operating system
PDF
Design of Parallel and HPC, Lecture: Memory Models
PDF
Distributed Operating System_4
PDF
DistributedOSintro.pdf from CSE Distributed operating system
PPTX
AABBCCDDOPERATING_SYSTEM_PARA_SUBIR.pptx
PPTX
Operating-System-(1-3 group) Case study on windows Mac and linux among variou...
PDF
KA 5 - Lecture 1 - Parallel Processing.pdf
PPT
Executing Multiple Thread on Modern Processor
PDF
OS_MD_1.pdffffffffffffffffffffffffffffffffffffff
PDF
OS_MD_1.pdf
PDF
istributed system
Cs 704 d set2
Cs 704 d set4distributedcomputing-1funda
Apos week 1 4
Cs704 d distributedschedulingetc.
Advanced Operating System, Distributed Operating System
Introduction distributed system modernss
Distributed OPERATING SYSTEM FOR BACHELOR OF BUSINESS INFORMATION TECHNOLOGY
Cs 704 d set3
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
Distributed operating system
Design of Parallel and HPC, Lecture: Memory Models
Distributed Operating System_4
DistributedOSintro.pdf from CSE Distributed operating system
AABBCCDDOPERATING_SYSTEM_PARA_SUBIR.pptx
Operating-System-(1-3 group) Case study on windows Mac and linux among variou...
KA 5 - Lecture 1 - Parallel Processing.pdf
Executing Multiple Thread on Modern Processor
OS_MD_1.pdffffffffffffffffffffffffffffffffffffff
OS_MD_1.pdf
istributed system

More from Debasis Das (20)

PPTX
Developing robust &amp; enterprise io t applications
PPTX
IoT: An Introduction and Getting Started Session
PPTX
Development eco-system in free-source for io t
PPTX
Microprocessors & microcontrollers- The design Context
PPT
Management control systems jsb 606 part4
PPT
Management control systems jsb 606 part3
PPT
Management control systems jsb 606 part2
PPTX
Management control systems jsb 606 part1
PPT
Computers for management jsb 1072003 ver
PPTX
Trends in education management
PPTX
Ei502microprocessorsmicrtocontrollerspart4 8051 Microcontroller
PPTX
Ei502microprocessorsmicrtocontrollerspart5 sixteen bit8086 1
PPTX
Ei502 microprocessors & micrtocontrollers part3hardwareinterfacing
PPTX
Ei502 microprocessors & micrtocontrollers part 2(instructionset)
PPTX
Ei502 microprocessors & micrtocontrollers part 1
PPTX
It802 d mobilecommunicationspart4
PPTX
It802 d mobilecommunicationspart3
PPTX
It 802 d_Mobile Communications_part 2
PPTX
It 802 d_Mobile Communications_part 2
PPT
It 802 d_mobile_communicationsSomeHistory
Developing robust &amp; enterprise io t applications
IoT: An Introduction and Getting Started Session
Development eco-system in free-source for io t
Microprocessors & microcontrollers- The design Context
Management control systems jsb 606 part4
Management control systems jsb 606 part3
Management control systems jsb 606 part2
Management control systems jsb 606 part1
Computers for management jsb 1072003 ver
Trends in education management
Ei502microprocessorsmicrtocontrollerspart4 8051 Microcontroller
Ei502microprocessorsmicrtocontrollerspart5 sixteen bit8086 1
Ei502 microprocessors & micrtocontrollers part3hardwareinterfacing
Ei502 microprocessors & micrtocontrollers part 2(instructionset)
Ei502 microprocessors & micrtocontrollers part 1
It802 d mobilecommunicationspart4
It802 d mobilecommunicationspart3
It 802 d_Mobile Communications_part 2
It 802 d_Mobile Communications_part 2
It 802 d_mobile_communicationsSomeHistory

Recently uploaded (20)

PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPTX
sap open course for s4hana steps from ECC to s4
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Big Data Technologies - Introduction.pptx
PDF
Electronic commerce courselecture one. Pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Unlocking AI with Model Context Protocol (MCP)
Programs and apps: productivity, graphics, security and other tools
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Digital-Transformation-Roadmap-for-Companies.pptx
MYSQL Presentation for SQL database connectivity
Understanding_Digital_Forensics_Presentation.pptx
sap open course for s4hana steps from ECC to s4
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Per capita expenditure prediction using model stacking based on satellite ima...
Advanced methodologies resolving dimensionality complications for autism neur...
Diabetes mellitus diagnosis method based random forest with bat algorithm
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
Big Data Technologies - Introduction.pptx
Electronic commerce courselecture one. Pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Encapsulation_ Review paper, used for researhc scholars

Cs704 d distributedmutualexcclusion&memory

  • 2. Mutual Exclusion CS 704D Advanced OS 2
  • 3. Complexities In distributed systems  Absence of shared memory  Inter-node communication delays can be considerable  Global system state cannot be observed by constituent machines due to communication delays, component failures, absence of shared memory  Many more modes of failures yet fail soft is a goal CS 704D Advanced OS 3
  • 4. Some Considerations  Policies/strategies developed for a distributed system can be made applicable in a uniprocessor case  However, policies/strategies developed for uniprocessor case cannot be extended to distributed case  Same can be simulated by adding a central resources allocator  Increase traffic to central allocator, the system will fail when the allocator fails  Election of a successor would be needed CS 704D Advanced OS 4
  • 5. Required Assumptions  Messages exchanged by a pair of communicating processes need to be received in the same order as they were generated (pipelining property)  Every message is received without errors, no duplicates  The underlying network ensures all nodes are fully connected. Any node can communicate with every other node CS 704D Advanced OS 5
  • 6. Desirable Properties of Algorithms  All nodes should have equal amount of information  Each node makes decisions on the basis of local information. The algorithm should ensure that nodes make consistent & coherent decisions  All nodes reach decisions through about equal effort  Failure of a node should not cause complete break down. The ability of reaching a decision and accessing the resources should not be affected CS 704D Advanced OS 6
  • 7. Time & Ordering of Events Happened Before Relationship  Logical clock needs to ensure  If a and b are events in the same process and a comes before b then a->b  If event a is a representation of sending of a message and b is that of receiving of message in another process the a->b  It is a transitive relationship; that is if a->b and b->c then a->c  If a and b has no happened before relationship the a and b are said to be concurrent CS 704D Advanced OS 7
  • 8. Time & Ordering of Events Logical Clock Properties  If a->b the C(a) < C(b)  Clock condition is satisfied if  If a and b are events in a process Pi and if a comes before b the Ci(a)< Ci(b)  If a is an event sending message m by process Pi and b is the receipt of message by process Pj then Ci(a) <Cj(b) CS 704D Advanced OS 8
  • 9. Time & Ordering of Events Logical Clock Implementation  Process Pi increments the clock Ci between successive events  Message m needs to be time stamped so that T(m)=ci(a)  Receiving process adjusts clock such that it is max of (Cj+1, Tm) CS 704D Advanced OS 9
  • 10. Total Ordering  ab only when  Ci(a) “less than” Cj(b) or  Ci(a) = Cj(b) and Pi “less than” Pj  Simple way to implement “less than” relation would be to assign a unique number to each process and define the “less than” such that i < j. CS 704D Advanced OS 10
  • 11. Lamport’s Algorithm  Initiator i: Process Pi requires an exclusive access to a resource. Sends time stamped message request (Ti, i) where Ti = Ci to all the other processes.  Other processes(j, j not= i): When Pj receives the request, places the request on its own queue, send a reply with time stamp (Tj, j) to Process Pi  Pi is allowed access only when  Pi request is in front of the queue and  All replies are time stamped later that the Pi time stamp  Pi sends a release message by sending a release message, time stamped suitably  Pj removes Pi request from it request queue Cost: 3 (N-1) messages, works best on bus based system where broadcast costs are minimal CS 704D Advanced OS 11
  • 12. Ricart-Agarwala Algorithm  Initiator i: Process Pi requires an exclusive access to a resource. Sends time stamped message request (Ti, i) where Ti = Ci to all the other processes.  Other processes(j, j not= i): When Pj receives the request reacts as follows,  If Pj is not requesting the resource, it sends a time stamped reply  If Pj needs the resource and the time stamp precedes the Pi’s time stamp Pi’s request is retained, else a time stamped reply is returned.  Pi is allowed access only when  Pi request is in front of the queue and  All replies are time stamped later that the Pi time stamp  Pi sends a releases resource by sending a release message, for each pending resources Cost: 2(N-1) messages CS 704D Advanced OS 12
  • 13. Distributed Shared Memory  A software abstraction over the loosely coupled systems  Provides a shared memory kind of operation over the underlying IPC/RPC mechanisms  Can be implemented in OS kernel or runtime system  Also known as Distributed Shared Virtual Memory System (DSVM)  The shared space exists only virtually CS 704D Advanced OS 13
  • 14. DSM Architecture CS 704D Advanced OS 14 Distributed Shared Memory Layer Memory Mapping CPU(s) Memory Mapping CPU(s) Memory Mapping CPU(s) Communication Network
  • 15. DSM Architecture  Unlike tightly coupled systems, this shared memory is entirely virtual  Partitioned into blocks  Local memory is treated as large local caches  If the data requested is not available locally a network fault is generated  OS, through a message, requests the node holding the block and gets it migrated to the node where fault occurred  Data may be replicated locally  Configuration varies depending on what kind of replication, migration policies are used CS 704D Advanced OS 15
  • 16. Design issues  Granularity (block size): Smaller size, higher faults, traffic; larger blocks mean jobs with higher locality  Structure: Layout of data, depends on application  Coherence & access synchronization: Like the cache situation in a uniprocessor system  Data Location & access: what data to be replicated, located  Replacement strategy  Thrashing  Heterogeneity CS 704D Advanced OS 16
  • 18. Block Size Selection Factors  Large block sizes favored as overheads to transfer smaller blocks and larger one not too different  Paging overhead- paging overheads also favors larger block sizes, application should thus have larger locality of reference  Directory size-smaller block larger directory, larger management overhead  Thrashing- thrashing is likely to increase with larger block size  False sharing-larger block sizes increases probability. Consequence, higher thrashing CS 704D Advanced OS 18
  • 19. Page Size as Block Size  Page size is preferred as the DSM block size  Advantages are  Existing page fault hardware can be used as block fault mechanism. Memory coherence can be handled in page fault handlers  Access control can be managed with existing memory mapping systems  If page size is less than packet size, no extra overhead  Page size proved to be, over time, the right unit as far as memory contention CS 704D Advanced OS 19
  • 21. Structure of Shared Memory Space  Approaches to structuring  No structure: a linear array of memory, easy to design  By data type: granularity per variable, complex to handle  As database: as tuple space, associative memory, primitives need to be added to languages, non transparent access to shared data
  • 22. Consistency Models
  • 23. Consistency Models  Strict consistency  Sequential consistency  Causal consistency  Pipelined random access memory consistency  Processor consistency  Weak consistency  Release consistency
  • 24. Strict Consistency Model  Value read of a memory address is the same as the latest write at that address  Writes become visible to all nodes  Needs absolute ordering of memory read/write operations, a global time required (to define most recent)  Nearly impossible to implement
  • 25. Sequential Consistency Model  All processes should see the same ordering of reads and writes  Exact interleaving does not matter  No memory operation is started unless earlier operations have completed  Acceptable in most applications
  • 26. Causal Consistency Model  Operations are seen in the same (correct) order when they are causally related  If w2 follows w1 and the two are causally related, then w1, w2 is the order every process should see  Operations that are not causally related may be seen in different orders
  • 27. Pipelined RAM Consistency Model  All writes of a single process are seen in the same order by other processes (as in a pipeline)  However, writes by different processes may appear in different orders  (w11, w12) and (w21, w22) can be seen as (w11, w12) followed by (w21, w22) or as (w21, w22) followed by (w11, w12)  Simple to implement
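The PRAM rule can be checked mechanically with a minimal sketch. The function name `respects_program_order` and the string labels for writes are assumptions made here for illustration, and the sketch assumes every write appears exactly once in the observed sequence.

```python
def respects_program_order(observed, writes_by_process):
    """Check that `observed` keeps each process's writes in issue order."""
    for writes in writes_by_process.values():
        positions = [observed.index(w) for w in writes]
        if positions != sorted(positions):
            return False
    return True


# Two processes, each issuing two writes (labels follow the slide).
writes = {"P1": ["w11", "w12"], "P2": ["w21", "w22"]}

print(respects_program_order(["w11", "w12", "w21", "w22"], writes))
print(respects_program_order(["w21", "w22", "w11", "w12"], writes))
print(respects_program_order(["w12", "w11", "w21", "w22"], writes))
```

Both orderings named on the slide pass the check, and an interleaving such as w21, w11, w22, w12 also passes because each process's own order is preserved. Only the third call fails, since it shows w12 before w11.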
  • 28. Processor Consistency Model  Adds memory coherence to the PRAM model  That is, all writes to a particular memory location should be seen by all processes in the same order, which maintains memory coherence
  • 29. Weak Consistency Model  Changes in memory can be made visible after a set of changes has happened (for example, at the end of a critical section)  Isolated access to a variable is rare; usually there will be several accesses and then none at all  The difficulty is that the system would not know when to show the changes  Application programmers can take care of this through a synchronization variable  Requirements  All accesses to the sync variable must follow the strongest consistency (sequential)  All pending writes must be completed before access to the sync variable is allowed  All previous accesses to the sync variable must be completed before another access is allowed
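A minimal single-threaded sketch of the synchronization-variable idea follows. The class `WeakNode`, its `sync` method, and the plain dictionary standing in for the rest of the DSM are all assumptions made for illustration; a real system would exchange messages between nodes.

```python
class WeakNode:
    """One node whose writes become visible only at a sync access."""

    def __init__(self, shared):
        self.shared = shared        # stands in for the rest of the DSM
        self.local = dict(shared)   # local copy of the shared data
        self.pending = {}           # writes not yet made visible

    def write(self, key, value):
        self.local[key] = value
        self.pending[key] = value

    def read(self, key):
        return self.local.get(key)

    def sync(self):
        # Complete all pending writes, then refresh the local copy.
        self.shared.update(self.pending)
        self.pending.clear()
        self.local = dict(self.shared)


shared = {"x": 0}
a, b = WeakNode(shared), WeakNode(shared)
a.write("x", 1)
print(b.read("x"))   # b has not synchronized yet
a.sync()
b.sync()
print(b.read("x"))
```

Node b keeps reading its stale value until both nodes have passed through `sync`, which is the point where the application tells the system to show the changes.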
  • 30. Release Consistency Model  At a synchronization access, the weak consistency model requires that  All changes made by a process are propagated to all nodes  All changes at other nodes are propagated to the process's node  Separate acquire and release operations are used for synchronization, so that only one of the two propagations above needs to be done at each
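The split between acquire and release can be sketched as below. `ReleaseNode` is a hypothetical class written for illustration, the dictionary `shared` stands in for the other nodes, and the mutual exclusion that a real acquire would provide is omitted.

```python
class ReleaseNode:
    """One node that pulls on acquire and pushes on release."""

    def __init__(self, shared):
        self.shared = shared   # stands in for the other nodes
        self.local = {}
        self.dirty = set()     # keys written since the last release

    def acquire(self):
        # Entering the critical section: only pull changes made elsewhere.
        self.local = dict(self.shared)

    def write(self, key, value):
        self.local[key] = value
        self.dirty.add(key)

    def release(self):
        # Leaving the critical section: only push this node's changes.
        for key in self.dirty:
            self.shared[key] = self.local[key]
        self.dirty.clear()


shared = {"x": 0}
p, q = ReleaseNode(shared), ReleaseNode(shared)
p.acquire()
p.write("x", 5)
p.release()
q.acquire()
print(q.local["x"])
```

Each operation performs one direction of propagation only, which is where the saving over a single sync access comes from.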
  • 31. Discussion of Models  Strict consistency model is difficult to implement, almost never implemented  Sequential consistency model is most commonly used  Causal, PRAM, processor, weak and release consistency are the ones implemented in many DSM systems; programmers need to intervene  Weak and release consistency provide explicit sync variables to help with the consistency
  • 32. Implementing Sequential Consistency Model  Implementing sequential consistency depends on what replication/migration is allowed  Migration/Replication strategies  Non replicated, non migrating blocks (NRNMBs)  Non replicated, migrating blocks (NRMBs)  Replicated, migrating blocks (RMBs)  Replicated, non migrating blocks (RNMBs)
  • 33. NRNMB  All requests to a block are routed through the OS and MMU to the single copy of the block, which is not replicated and does not move anywhere  Consequences  Bottleneck because memory accesses are serialized  Parallelism is not possible
  • 34. NRMB  No copies; if required, the entire block may be moved to the node that requires it  Advantages  Once a block has migrated, all accesses to it are local and incur no communication cost  Applications can take advantage of locality; applications with high locality will perform better  Disadvantages  Prone to thrashing  No advantage of parallelism
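The effect of locality on a single migrating block can be shown with a minimal sketch. `MigratingBlock` and `run` are hypothetical names, and the access traces are made up: each character is the node that touches the block next.

```python
class MigratingBlock:
    """A single non-replicated block that moves to whichever node faults."""

    def __init__(self, owner):
        self.owner = owner
        self.migrations = 0

    def access(self, node):
        # Accessing a block held elsewhere faults and migrates it here.
        if node != self.owner:
            self.owner = node
            self.migrations += 1


def run(trace, first_owner="A"):
    """Replay a trace of accessing nodes and count block migrations."""
    block = MigratingBlock(first_owner)
    for node in trace:
        block.access(node)
    return block.migrations


print(run("AAABBB"))   # high locality
print(run("ABABAB"))   # interleaved accesses
```

The two traces contain the same number of accesses per node, but the interleaved one moves the block on almost every access, which is the thrashing behavior listed as a disadvantage.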
  • 35. Data Locating in NRMB  Broadcast  When a fault happens, a request is broadcast and the current owner sends the block  Broadcasts cause communication overheads  Centralized server  Request is sent to the server, the server asks the node holding the block to send it to the requesting node and updates the location information  Fixed distributed server  Fault handler finds the mapping of the block to a specific server, sends the request and gets the block  Dynamic distributed server  Fault causes a local search for the probable owner; the request goes to that node, which either holds the block or points to another probable owner; once the block is obtained, the location info is updated
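The dynamic distributed server search can be sketched as a walk along probable-owner hints. The function `locate_owner` and the hint table are assumptions for illustration; the sketch also assumes the chain of hints always ends at the true owner, which is marked by pointing at itself, and it leaves out the block transfer and ownership change that would follow.

```python
def locate_owner(start, probable_owner):
    """Follow probable-owner hints from `start` until the true owner is found."""
    path = [start]
    node = start
    while probable_owner[node] != node:   # the owner points at itself
        node = probable_owner[node]
        path.append(node)
    # Update hints so later searches from these nodes take one hop.
    for visited in path:
        probable_owner[visited] = node
    return node, path


# Hypothetical hints: A thinks B has the block, B thinks C, and C owns it.
hints = {"A": "B", "B": "C", "C": "C"}
owner, path = locate_owner("A", hints)
print(owner, path, hints)
```

Updating the hints along the visited path is one possible design choice; it shortens later searches at the cost of a few extra table writes.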
  • 36. RMB  Replication is required to increase parallelism  Reads can be done locally, writes have overheads  Systems with a high read/write ratio can apportion the write overhead over many reads  Maintaining coherence across the replicas of a block is an issue  Two basic protocols used are  Write-invalidate  Write-update
  • 37. Coherence Protocols  Write-invalidate  On a write fault, the fault handler copies the block from one of the nodes to its own  Invalidates all the other copies, then writes the data  If another node needs the block now, the updated block is replicated  Write-update  On a write fault, copy the block to the local node, update the data  Send the address & new data to all the replicas  Operation resumes after all the writes are done
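The two protocols can be contrasted in a minimal sketch that tracks which nodes hold a valid copy. The class `Block` and its methods are hypothetical, the whole block is reduced to a single value, and message passing between nodes is not modeled.

```python
class Block:
    """Tracks which nodes hold a valid replica of one block."""

    def __init__(self, nodes, value=0):
        # Every node starts with a valid replica of the block.
        self.copies = {n: value for n in nodes}

    def write_invalidate(self, writer, value):
        # Discard every other replica, then write locally.
        self.copies = {writer: value}

    def write_update(self, writer, value):
        # Send the new data to every replica, including the writer's.
        for node in self.copies:
            self.copies[node] = value

    def read(self, node, owner):
        # A node without a valid copy faults and fetches it from the owner.
        if node not in self.copies:
            self.copies[node] = self.copies[owner]
        return self.copies[node]


b = Block(["A", "B", "C"])
b.write_invalidate("A", 7)
print(b.copies)            # only the writer still holds a copy
print(b.read("B", "A"))    # B faults and re-fetches the updated block
b.write_update("B", 9)
print(b.copies)            # every remaining replica sees the new value
```

Write-invalidate leaves one valid copy and defers the cost to later readers, while write-update pays for a full write at every replica on each update.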
  • 38. Comparison  Write-update typically needs a global sequencer to make sure all nodes see writes in the same sequence  Also, the operations are full writes  Together these add a significant communication overhead  Write-invalidate does not need all that, just an invalidation signal  Write-invalidate is thus the more often used method
  • 39. Data Locating in RMB Strategy  Owner of a block needs to be located, that is, the most recent node which had write access  Nodes that have a valid copy need to be tracked  Use one of the following  Broadcasting  Centralized server algorithm  Fixed distributed server algorithm  Dynamic distributed server algorithm
  • 40. RNMB  Replicas are maintained but blocks do not migrate  Consistency is maintained by updating all the replicas through a write-update-like process
  • 41. Data Locating in RNMB Strategy  Replica locations do not change  Replicas are kept consistent  Read requests can go to any node that has the data block  Writes go through a global sequencer
  • 42. Munin: A Release Consistent DSM System  Structure: a collection of shared variables  Each shared variable goes to a separate memory page  acquireLock and releaseLock are used  A different consistency protocol is applied for each type of shared variable used in the system  Read-only, migratory, write-shared, producer-consumer, result, reduction and conventional
  • 44. Replacement Strategy  Shared memory blocks are replicated and/or migrated so two strategies need to be decided  Block to be replaced  Where should the replaced block go
  • 45. Blocks to Replace  Usage based vs. non-usage based  Fixed space vs. variable space  Unused  Nil  Read only  Read-owned  Writable
  • 46. Place for Replacement Block  Using secondary store locally  Using memory space of other nodes: store the block in free memory space at some other node. Free memory space status needs to be exchanged, piggybacking on normal communication messages
  • 48. Thrashing Situations  DSM allows migration, so migration back and forth leads to thrashing  Data blocks keep migrating between nodes due to interleaved accesses by processes  Read-only blocks are repeatedly invalidated soon after replication
  • 49. Thrashing Reduction Strategies  Application controlled locks  Locking a block to a node for a time t; deciding t could be a very difficult issue  Tune the coherence strategy to the usage pattern; transparency of the memory system is compromised
  • 50. Other Approaches to DSM
  • 51. Approaches  Data caching managed by the OS  Data Caching managed by MMUs  Data Caching managed by the language run time system
  • 52. Heterogeneous DSM
  • 53. Features of Heterogeneous DSM  Data Conversion  Structuring DSM as a collection of source language objects  Allowing one type of data in a block only (has complications)  Memory fragmentation  Compilation issues  The entire page is converted even though only a small part may be used before it is transferred  Not transparent, user provided conversion may be required
  • 54. Advantages of DSM
  • 55. Advantages  Simpler abstraction  Better portability of distributed applications  Better performance of some Systems  Flexible communications environment  Ease of process migration