CS 704D: Distributed Mutual Exclusion & Memory

Mutual Exclusion
CS 704D Advanced OS 2

Complexities
In distributed systems
 Absence of shared memory
 Inter-node communication delays can be considerable
 Global system state cannot be observed by constituent
machines due to communication delays, component
failures, absence of shared memory
 Many more failure modes, yet fail-soft behavior remains a goal

Some Considerations
 Policies/strategies developed for a distributed system
can also be applied in the uniprocessor case
 However, policies/strategies developed for the
uniprocessor case cannot be extended directly to the distributed
case
 The uniprocessor approach can be simulated by adding a central resource
allocator
 This increases traffic to the central allocator, and the system fails
when the allocator fails
 Election of a successor would then be needed

Required Assumptions
 Messages exchanged by a pair of communicating
processes need to be received in the same order as they
were generated (pipelining property)
 Every message is received without errors, no duplicates
 The underlying network ensures all nodes are fully
connected. Any node can communicate with every
other node

Desirable Properties
of Algorithms
 All nodes should have equal amount of information
 Each node makes decisions on the basis of local
information. The algorithm should ensure that nodes
make consistent & coherent decisions
 All nodes reach decisions through about equal effort
 Failure of a node should not cause a complete
breakdown. The ability of the remaining nodes to reach a decision and access
the resources should not be affected

Time & Ordering of Events
Happened Before Relationship
 Logical clock needs to ensure
 If a and b are events in the same process and a comes
before b then a->b
 If event a is the sending of a message
and b is the receipt of that message in another process
then a->b
 It is a transitive relationship; that is if a->b and b->c
then a->c
 If neither a->b nor b->a holds, then a and
b are said to be concurrent

Logical Clock Properties
 If a->b then C(a) < C(b)
 Clock condition is satisfied if
 If a and b are events in a process Pi and a comes before
b then Ci(a) < Ci(b)
 If a is the sending of message m by process Pi and b is
the receipt of that message by process Pj then Ci(a) < Cj(b)

Logical Clock Implementation
 Process Pi increments the clock Ci between successive
events
 Message m, sent at event a, needs to be time stamped so that
T(m) = Ci(a)
 Receiving process Pj adjusts its clock to
max(Cj, T(m)) + 1, so that the receipt is stamped later than T(m)
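
The clock rules above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the class name `LamportClock` and its method names are assumptions made for the example, and the receive rule used is max(Cj, T(m)) + 1, which guarantees the receipt is stamped later than the send.

```python
class LamportClock:
    """Logical clock for one process (illustrative names, not from the slides)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock between successive events.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the result is the message time stamp T(m).
        return self.tick()

    def receive(self, msg_time):
        # The receipt must be stamped later than the local clock and T(m).
        self.time = max(self.time, msg_time) + 1
        return self.time


p_i, p_j = LamportClock(), LamportClock()
p_i.tick()               # a local event on Pi
t_m = p_i.send()         # event a: Pi sends m, T(m) = Ci(a)
c_b = p_j.receive(t_m)   # event b: Pj receives m
print(t_m, c_b)          # Ci(a) < Cj(b), as the clock condition requires
```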

Total Ordering
 a => b (a precedes b in the total order) only when
 Ci(a) “less than” Cj(b) or
 Ci(a) = Cj(b) and Pi “less than” Pj
 Simple way to implement the “less than” relation on processes would be
to assign a unique number to each process and define
Pi “less than” Pj to mean i < j.
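
As a small sketch of this tie-breaking rule, events can be represented as (clock value, process number) pairs. The function name `precedes` and the sample events are hypothetical; Python's built-in tuple comparison happens to apply the same lexicographic rule.

```python
def precedes(a, b):
    """a and b are (clock_value, process_id) pairs; True if a comes first."""
    (clock_a, pid_a), (clock_b, pid_b) = a, b
    return clock_a < clock_b or (clock_a == clock_b and pid_a < pid_b)


# Hypothetical events from three processes numbered 0, 1, 2.
events = [(5, 2), (3, 1), (5, 1), (3, 0)]
ordered = sorted(events)  # tuple ordering matches the rule in precedes()
print(ordered)
```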

Lamport’s Algorithm
 Initiator i: Process Pi requires exclusive access to a resource. It places the
request (Ti, i), where Ti = Ci, on its own queue and sends it as a time-stamped
message to all the other processes.
 Other processes (j, j ≠ i): When Pj receives the request, it places the
request on its own queue and sends a reply with time stamp (Tj, j) to
process Pi
 Pi is allowed access only when
 Pi's request is at the front of its queue and
 All replies are time stamped later than Pi's time stamp
 Pi releases the resource by sending a suitably time-stamped release message
to all the other processes
 Pj removes Pi's request from its request queue
Cost: 3(N-1) messages; works best on bus-based systems where broadcast
costs are minimal
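
The message flow can be sketched as a single-threaded simulation. This is only an illustration under strong assumptions: messages are delivered instantly and in order (the assumptions stated earlier), and the names `Node`, `deliver`, `request`, `can_enter`, and `release` are invented for the example.

```python
import heapq

messages = []  # log of (kind, sender, receiver), used only to count traffic


class Node:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.clock = 0
        self.queue = []         # heap of (time stamp, pid) requests
        self.latest = {}        # pid -> time stamp of last message from pid
        self.my_request = None

    def stamp(self, received=0):
        # Logical clock update for a local, send, or receive event.
        self.clock = max(self.clock, received) + 1
        return self.clock


def deliver(nodes, kind, src, dst, ts):
    # Assumption: delivery is immediate, reliable, and in order.
    messages.append((kind, src, dst))
    nodes[dst].stamp(ts)
    nodes[dst].latest[src] = ts


def request(nodes, i):
    me = nodes[i]
    me.my_request = (me.stamp(), i)
    heapq.heappush(me.queue, me.my_request)
    for other in nodes:
        if other.pid != i:
            deliver(nodes, "REQUEST", i, other.pid, me.my_request[0])
            heapq.heappush(other.queue, me.my_request)
            deliver(nodes, "REPLY", other.pid, i, other.stamp())


def can_enter(node):
    # Own request at the front, and a later-stamped message from everyone.
    if node.my_request is None or node.queue[0] != node.my_request:
        return False
    others = [p for p in range(node.n) if p != node.pid]
    return all(node.latest.get(p, 0) > node.my_request[0] for p in others)


def release(nodes, i):
    me = nodes[i]
    ts = me.stamp()
    for node in nodes:
        node.queue.remove(me.my_request)
        heapq.heapify(node.queue)
        if node.pid != i:
            deliver(nodes, "RELEASE", i, node.pid, ts)
    me.my_request = None


nodes = [Node(pid, 3) for pid in range(3)]
request(nodes, 0)
request(nodes, 1)
first = (can_enter(nodes[0]), can_enter(nodes[1]))   # only P0 may enter
release(nodes, 0)
second = can_enter(nodes[1])                         # now P1 may enter
release(nodes, 1)
print(first, second, len(messages))
```

With N = 3 and two complete request/release cycles, the log holds 2 x 3(N-1) = 12 messages, matching the stated cost.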

Ricart-Agrawala Algorithm
 Initiator i: Process Pi requires exclusive access to a resource. It sends a
time-stamped request message (Ti, i), where Ti = Ci, to all the other
processes.
 Other processes (j, j ≠ i): When Pj receives the request, it reacts as
follows:
 If Pj is not requesting the resource, it sends a time-stamped reply
 If Pj needs the resource and its own request's time stamp precedes Pi's time stamp,
Pi's request is retained (deferred); else a time-stamped reply is returned.
 Pi is allowed access only when it has received replies from all the other
processes
 Pi releases the resource by sending a reply for each
pending (deferred) request
Cost: 2(N-1) messages
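
A minimal simulation of the deferred-reply idea follows. It assumes instant, in-order delivery and a single thread of control; all names (`Node`, `reply`, `request`, `can_enter`, `release`) are hypothetical and chosen only for this sketch.

```python
messages = []  # log of (kind, sender, receiver), used only to count traffic


class Node:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.request = None    # (time stamp, pid) while requesting or in use
        self.replies = set()   # pids that have replied to the current request
        self.deferred = []     # pids whose requests are held until release

    def stamp(self, received=0):
        self.clock = max(self.clock, received) + 1
        return self.clock


def reply(nodes, src, dst):
    messages.append(("REPLY", src, dst))
    nodes[dst].stamp(nodes[src].stamp())
    nodes[dst].replies.add(src)


def request(nodes, i):
    me = nodes[i]
    me.request, me.replies = (me.stamp(), i), set()
    for other in nodes:
        if other.pid == i:
            continue
        messages.append(("REQUEST", i, other.pid))
        other.stamp(me.request[0])
        if other.request is not None and other.request < me.request:
            other.deferred.append(i)    # Pj's own request precedes: defer
        else:
            reply(nodes, other.pid, i)  # Pj is idle or has a later request


def can_enter(nodes, i):
    me = nodes[i]
    return me.request is not None and len(me.replies) == len(nodes) - 1


def release(nodes, i):
    me = nodes[i]
    me.request = None
    pending, me.deferred = me.deferred, []
    for j in pending:
        reply(nodes, i, j)              # the release is the deferred reply


nodes = [Node(pid) for pid in range(3)]
request(nodes, 0)
request(nodes, 1)
first = (can_enter(nodes, 0), can_enter(nodes, 1))   # P1 waits on P0's reply
release(nodes, 0)
second = can_enter(nodes, 1)
release(nodes, 1)
print(first, second, len(messages))
```

Two complete cycles with N = 3 produce 2 x 2(N-1) = 8 messages, since no separate release message is sent.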

Distributed Shared Memory
 A software abstraction over the loosely coupled
systems
 Provides a shared memory kind of operation over the
underlying IPC/RPC mechanisms
 Can be implemented in OS kernel or runtime system
 Also known as Distributed Shared Virtual Memory
System (DSVM)
 The shared space exists only virtually

DSM Architecture
[Figure: DSM architecture. Three nodes, each showing Memory, Mapping, and CPU(s), sit beneath a common Distributed Shared Memory Layer and are connected by a Communication Network.]

DSM Architecture
 Unlike tightly coupled systems, this shared memory is
entirely virtual
 Partitioned into blocks
 Local memory of each node is treated as a large cache
 If the data requested is not available locally, a network block fault
is generated
 OS, through a message, requests the block from the node holding it
and gets it migrated to the node where the fault occurred
 Data may be replicated locally
 Configuration varies depending on what kind of
replication, migration policies are used

Design issues
 Granularity (block size): smaller blocks mean more faults and more
traffic; larger blocks suit jobs with higher locality
 Structure: Layout of data, depends on application
 Coherence & access synchronization: Like the cache coherence
situation in a multiprocessor system
 Data location & access: how data is located and what data is
replicated
 Replacement strategy
 Thrashing
 Heterogeneity

Granularity

Block Size Selection Factors
 Large block sizes are favored, as the overheads of transferring
a small block and a larger one are not too different
 Paging overhead: paging overheads also favor larger
block sizes; the application should thus have high locality
of reference
 Directory size: smaller blocks mean a larger directory and larger
management overhead
 Thrashing: thrashing is likely to increase with larger
block size
 False sharing: larger block sizes increase its probability.
Consequence: higher thrashing
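
To see why larger blocks make false sharing more likely, the sketch below maps two unrelated variables to block numbers for several block sizes. The addresses and block sizes are made-up values for illustration.

```python
def block_of(address, block_size):
    # Block number that contains the given byte address.
    return address // block_size


# Hypothetical addresses of two unrelated variables used by different nodes.
addr_x, addr_y = 1000, 3000

same_block = {}
for block_size in (256, 1024, 4096):
    same_block[block_size] = (
        block_of(addr_x, block_size) == block_of(addr_y, block_size)
    )
print(same_block)
```

Only at the largest block size do both variables fall into one block, so writes to either one would then invalidate or move the block that the other node is using.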

Page Size as Block Size
 Page size is preferred as the DSM block size
 Advantages are
 Existing page fault hardware can be used as block fault
mechanism. Memory coherence can be handled in page
fault handlers
 Access control can be managed with existing memory
mapping systems
 If the page size does not exceed the packet size, transferring a page adds no extra network overhead
 Page size has proved, over time, to be the right unit as far as
memory contention is concerned

Structure

Structure of Shared Memory
Space
 Approaches to structuring
 No structure: a linear array of memory, easy to design
 By data type: granularity per variable, complex to
handle
 As database: as tuple space, associative memory,
primitives need to be added to languages, non
transparent access to shared data

Consistency Models
 Strict consistency
 Sequential consistency
 Causal consistency
 Pipelined random access memory consistency
 Processor consistency
 Weak consistency
 Release consistency

Strict Consistency Model
 The value returned by a read of a memory address is always the value of the
most recent write to that address
 Writes become visible to all nodes instantaneously
 Needs absolute ordering of memory read/write
operations; a global time is required (to define most
recent)
 Nearly impossible to implement

Sequential Consistency Model
 All processes should see the same order of all memory reads and
writes
 Exact interleaving does not matter
 No memory operation is started unless earlier
operations have completed
 Acceptable in most applications

Causal Consistency Model
 Operations are seen in the same (correct) order by all processes when
they are causally related
 If w2 follows w1 and is causally related to it, then w1, w2 is the
order every process should see
 Operations that are not causally related may be seen in different orders
by different processes

Pipelined RAM Consistency Model
 All writes of a single process are seen in the same order
by other processes (as in a pipeline)
 However, writes by different processes may appear in
different orders to different processes.
 For example, (w11, w12) and (w21, w22) can be seen as (w11, w12)
followed by (w21, w22) by one process and as (w21, w22) followed by
(w11, w12) by another
 Simple to implement
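
The PRAM condition can be checked mechanically: an observed sequence is acceptable if it keeps each process's own writes in program order. The helper below is a sketch; the function name `respects_pram` is hypothetical, and it assumes every write appears exactly once in the observed sequence.

```python
def respects_pram(observed, writes_by_process):
    """True if `observed` keeps each process's writes in program order."""
    for writes in writes_by_process.values():
        positions = [observed.index(w) for w in writes]
        if positions != sorted(positions):
            return False
    return True


writes = {"P1": ["w11", "w12"], "P2": ["w21", "w22"]}

view_a = respects_pram(["w11", "w12", "w21", "w22"], writes)
view_b = respects_pram(["w21", "w22", "w11", "w12"], writes)
view_c = respects_pram(["w12", "w11", "w21", "w22"], writes)  # P1 reordered
print(view_a, view_b, view_c)
```

Two observers may hold `view_a` and `view_b` at the same time under PRAM consistency, which is exactly what sequential consistency would forbid.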

Processor Consistency Model
 Adds memory coherence to the PRAM model
 That is, all processes should see the writes to any
particular memory location in the same order, which
maintains memory coherence

Weak Consistency Model
 Changes in memory need to be made visible only after a set of changes has happened (for example, at the end of a critical
section)
 Isolated access to a variable is rare; usually there will be several accesses and then
none at all
 Difficulty is that the system would not know when to show the changes
 Application programmers can take care of this through a synchronization variable
 Necessarily
 All accesses to sync variables must follow the strongest consistency (sequential)
 All pending writes must be completed before access to a sync variable is allowed
 All previous accesses to sync variables must be completed before another access is allowed

Release Consistency Model
 Weak consistency model requires, at each access to a sync variable, that
 All changes made by a process are propagated to all
nodes
 All changes at other nodes are propagated to the
process's node
 Separate acquire and release operations are used for sync so that only
one of the operations above needs to be done at each: acquire brings in
changes made elsewhere, release propagates local changes
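
The split between acquire and release can be sketched with a toy model. Everything here is an assumption made for illustration: `SharedStore` stands in for the rest of the DSM system, `Node` keeps a local cache, and the mutual exclusion that a real acquire would also provide is not modeled.

```python
class SharedStore:
    """Stand-in for the state held by the other nodes (hypothetical)."""

    def __init__(self):
        self.data = {}


class Node:
    def __init__(self, store):
        self.store = store
        self.cache = {}     # local copy of shared data
        self.pending = {}   # local writes not yet propagated

    def acquire(self):
        # Entering a critical section: bring in changes made at other nodes.
        self.cache = dict(self.store.data)

    def write(self, key, value):
        self.cache[key] = value
        self.pending[key] = value

    def read(self, key):
        return self.cache.get(key)

    def release(self):
        # Leaving a critical section: propagate local changes outward.
        self.store.data.update(self.pending)
        self.pending.clear()


store = SharedStore()
a, b = Node(store), Node(store)

a.acquire()
a.write("x", 1)
stale = b.read("x")   # b has not acquired, so it need not see the write
a.release()
b.acquire()
fresh = b.read("x")   # visible after a's release and b's acquire
print(stale, fresh)
```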

Discussion of Models
 Strict consistency model is difficult to implement,
almost never implemented
 Sequential consistency model is most commonly used
 Causal, PRAM, processor, weak and release
consistency are the ones implemented in many DSM
systems, programmers need to intervene
 Weak and release consistency provides explicit sync
variables to help with the consistency

Implementing Sequential
Consistency Model
 Implementing sequential consistency would depend
on what replication/ migration are allowed
 Migration/Replication strategies
 Non replicated, non migrating blocks (NRNMBs)
 Non replicated, migrating blocks (NRMBs)
 Replicated, migrating blocks (RMBs)
 Replicated, non migrating blocks (RNMBs)

NRNMB
 All requests to a block are routed through the OS and
MMU to this one block, which is not replicated and does
not move anywhere
 Can cause
 Bottleneck because of serializing of memory accesses
 Parallelism is not possible

NRMB
 No copies; if required, the entire block is moved to
the node that requires it
 Advantages
 No communication costs once a block is held locally; all accesses to it are local
 Applications can take advantage of locality; applications
with high locality will perform better
 Disadvantages
 Prone to thrashing
 No advantage of parallelism

Data Locating in NRMB
 Broadcast
 Fault happens, a request is broadcast, current owner sends
the block
 Broadcast causes communication overheads
 Centralized server
 Request is sent to the server, the server asks the node holding the
block to send it to the requesting node and updates the location
information
 Fixed distributed server
 Fault handler finds the mapping of the block to a specific server, sends
the request and gets the block
 Dynamic distributed server
 Fault causes a local lookup of the probable owner; the request goes to that node,
which either holds the block or names another probable owner; the requester gets the block and updates its info

RMB
 Replication is required to increase parallelism
 Reads can be done locally; writes incur overheads
 Systems with a high read/write ratio can amortize the write
overhead over many reads
 Maintaining coherence across the replicated blocks is
an issue
 Two basic protocols used are
 Write-invalidate
 Write-update

Coherence Protocols
 Write-invalidate
 On write fault, the fault handler copies the block from
one of the nodes to its own
 Invalidates all the other copies, then writes the data
 If another node needs it now, the updated block is
replicated
 Write-update
 On write fault, copy block to local node, update data
 Send address & new data to all the replicas
 Operation resumes after all the writes are done
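
The difference between the two protocols can be sketched with a toy replicated block. The class `ReplicatedBlock`, its method names, and the `INVALID` marker are invented for this example; message passing, sequencing, and fault handling are deliberately left out.

```python
INVALID = object()  # marker for an invalidated copy


class ReplicatedBlock:
    def __init__(self, nodes, value):
        self.copies = {node: value for node in nodes}

    def write_invalidate(self, writer, value):
        # Only an invalidation signal goes to the other replicas.
        for node in self.copies:
            if node != writer:
                self.copies[node] = INVALID
        self.copies[writer] = value

    def write_update(self, writer, value):
        # The new data itself is sent to every replica.
        for node in self.copies:
            self.copies[node] = value

    def read(self, node, owner):
        # A read of an invalid copy re-replicates the block from the owner.
        if self.copies[node] is INVALID:
            self.copies[node] = self.copies[owner]
        return self.copies[node]


inv = ReplicatedBlock(["A", "B", "C"], 0)
inv.write_invalidate("A", 7)
invalid_nodes = [n for n, v in inv.copies.items() if v is INVALID]
refetched = inv.read("B", owner="A")

upd = ReplicatedBlock(["A", "B", "C"], 0)
upd.write_update("A", 7)
print(invalid_nodes, refetched, upd.copies)
```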

Comparison
 Write-update typically needs a global sequencer to
make sure all nodes see writes in the same sequence
 Also, the operations are full writes
 Together these cause a significant communication
overhead
 Write-invalidate does not need all that, just an
invalidation signal
 Write-invalidate is thus the more often used method

Data Locating in RMB Strategy
 Owner of a block needs to be located, that is, the most recent
node which had write access
 Nodes that have a valid copy will need to be tracked
 Use one of the following
 Broadcasting
 Centralized server algorithm
 Fixed distributed server algorithm
 Dynamic distributed server algorithm

RNMB
 Replicas are maintained but blocks do not migrate
 Consistency is maintained by updating all the replicas
by a write update like process

Data Locating in RNMB Strategy
 Replica locations do not change
 Replicas are kept consistent
 Read requests can go to any node that has the data
block
 Writes through global sequencer

Munin: A Release Consistent DSM
System
 Structure: a collection of shared variables
 Each shared variable goes to a separate memory page
 acquireLock and releaseLock are used for synchronization
 Different consistency protocol is applied for different
types of shared variable used in the system
 Read-only, migratory, write-shared, producer-consumer,
result, reduction and conventional

Replacement Strategy
 Shared memory blocks are replicated and/or migrated
so two strategies need to be decided
 Block to be replaced
 Where should the replaced block go

Blocks to Replace
 Replacement algorithms are classified as
 Usage based vs. non-usage based
 Fixed space vs. variable space
 Replacement priority by block state, highest first
 Unused
 Nil
 Read only
 Read-owned
 Writable

Place for Replacement Block
 Using secondary store locally
 Using memory space of other nodes: store in free
memory space at some other node. Free memory space
status needs to be exchanged, piggybacked on normal
communication messages

Thrashing

Thrashing Situations
 DSM allows migration, so migration back and forth leads
to thrashing
 Data blocks keep migrating between nodes due to interleaved
accesses by processes
 Read-only blocks are repeatedly invalidated soon after
replication
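
How interleaved accesses turn into thrashing can be shown by counting block migrations for two access patterns. The function name `count_migrations` and the access traces are hypothetical; the model assumes a single non-replicated block that moves to whichever node touches it.

```python
def count_migrations(accesses, start):
    """Count how often a migrating block has to move between nodes."""
    location, moves = start, 0
    for node in accesses:
        if node != location:
            location, moves = node, moves + 1
    return moves


interleaved = ["A", "B"] * 4          # A and B alternate on every access
batched = ["A"] * 4 + ["B"] * 4       # same accesses, grouped by node

moves_interleaved = count_migrations(interleaved, start="A")
moves_batched = count_migrations(batched, start="A")
print(moves_interleaved, moves_batched)
```

The same eight accesses cost seven migrations when interleaved and one when batched, which is the effect the reduction strategies try to achieve.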

Thrashing Reduction Strategies
 Application-controlled locks
 Locking a block to a node for a minimum time t; deciding t
could be a very difficult issue
 Tune the coherence strategy to the usage pattern;
transparency of the memory system is compromised

Other Approaches to DSM

Approaches
 Data caching managed by the OS
 Data Caching managed by MMUs
 Data Caching managed by the language run time
system

Heterogeneous DSM

Features of Heterogeneous DSM
 Data Conversion
 Structuring DSM as a collection of source language objects
 Allowing one type of data in a block only (has
complications)
 Memory fragmentation
 Compilation issues
 Entire page is converted even though only a small part may be used before
the page is transferred again
 Not transparent, user provided conversion may be required

Advantages of DSM

Advantages
 Simpler abstraction
 Better portability of distributed applications
 Better performance of some Systems
 Flexible communications environment
 Ease of process migration
