Synchronization in Distributed Systems
Chapter 6
Guide to Synchronization Lectures
• Synchronization in shared memory systems
• Event ordering in distributed systems
– Logical time, logical clocks, timestamps
• Mutual exclusion in distributed systems
– Centralized, decentralized, etc.
– Election algorithms
• Data race detection in multithreaded
programs
Background
• Synchronization: coordination of actions
between processes.
• Processes are usually asynchronous (they operate independently of events in other processes)
• Sometimes they need to cooperate/synchronize
– For mutual exclusion
– For event ordering (was message x from process P
sent before or after message y from process Q?)
Introduction
• Synchronization in centralized systems is
primarily accomplished through shared
memory
– Event ordering is clear because all events are
timed by the same clock
• Synchronization in distributed systems is
harder
– No shared memory
– No common clock
Clock Synchronization
• Some applications rely on event ordering
to be successful
– Event ordering is easier if you can accurately
time-stamp events, but in a distributed system
the clocks may not always be synchronized
• Is it possible to synchronize clocks in a
distributed system?
Physical Clocks
• Physical clock example: counter + holding
register + oscillating quartz crystal
– The counter is decremented at each oscillation
– Counter interrupts when it reaches zero
– Reloads from the holding register
– Interrupt = clock tick (often 60 times/second)
• Software clock: counts interrupts
– This value represents the time elapsed since some predetermined epoch (Jan 1, 1970 for UNIX systems; Jan 1, 1601 for Microsoft Windows)
– Can be converted to normal clock times
Clock Skew
• Clock skew (offset): the difference between the times on two different clocks
• Clock drift: the gradual divergence of a clock from actual time, because its oscillator runs slightly fast or slow
• Ordinary quartz clocks drift by ~1 sec in 11-12 days (a drift rate of about 10⁻⁶ sec/sec)
• High-precision quartz clocks have a somewhat lower drift rate
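The "1 sec in 11-12 days" figure follows directly from the 10⁻⁶ sec/sec drift rate. A minimal arithmetic check, assuming the drift rate stays constant (the function name `days_to_drift` is hypothetical):

```python
# Drift rate quoted above for an ordinary quartz clock (assumed constant).
DRIFT_RATE = 1e-6                  # seconds of error per second of real time
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_drift(offset_seconds: float, drift_rate: float = DRIFT_RATE) -> float:
    """Days a free-running clock needs to accumulate `offset_seconds` of error."""
    return offset_seconds / drift_rate / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{days_to_drift(1.0):.2f} days")   # 11.57 days
```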
Various Ways of Measuring Time*
• The sun
– Mean solar second – gradually getting longer as Earth’s rotation slows.
• International Atomic Time (TAI)
– Atomic clocks are based on transitions of the cesium
atom
– Atomic second = value of solar second at some fixed
time (no longer accurate)
• Coordinated Universal Time (UTC)
– Based on TAI seconds, but more accurately reflects sun
time (inserts leap seconds to synchronize atomic second
with solar second)
Getting the Correct (UTC) Time*
• WWV radio station or similar stations in
other countries (accurate to +/- 10 msec)
• UTC services provided by Earth satellites (accurate to 0.5 msec)
• GPS (Global Positioning System)
(accurate to 20-35 nanoseconds)
Clock Synchronization Algorithms*
• In a distributed system, one machine may have a WWV receiver, and some technique is used to keep all the other machines in sync with this value.
• Or, no machine has access to an external
time source and some technique is used
to keep all machines synchronized with
each other, if not with “real” time.
Clock Synchronization Algorithms
• Network Time Protocol (NTP):
– Objective: to keep all clocks in a system synchronized to UTC (1-50 msec accuracy; less accurate over WANs)
– Uses a hierarchy of passive time servers
• The Berkeley Algorithm:
– Objective: to keep all clocks in a system synchronized to
each other (internal synchronization)
– Uses active time servers that poll machines periodically
• Reference broadcast synchronization (RBS)
– Objective: to keep all clocks in a wireless system
synchronized to each other
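The averaging step of the Berkeley algorithm (the time daemon polls every machine, averages the reported times, and tells each machine how much to adjust) can be sketched as below. This assumes the reported times have already been corrected for network delay; the function name and machine names are hypothetical.

```python
def berkeley_adjustments(daemon_time: float,
                         reported: dict[str, float]) -> dict[str, float]:
    """Averaging step of a Berkeley-style time daemon (sketch).

    Assumes each machine's reported time has already been corrected for
    network delay. Returns the correction every machine, including the
    daemon itself, should apply to reach the average.
    """
    times = {"daemon": daemon_time, **reported}
    average = sum(times.values()) / len(times)
    return {name: average - t for name, t in times.items()}


if __name__ == "__main__":
    # Times in minutes past midnight: daemon 3:00, machines at 2:50 and 3:25.
    print(berkeley_adjustments(180, {"m1": 170, "m2": 205}))
    # {'daemon': 5.0, 'm1': 15.0, 'm2': -20.0}
```

A negative correction (m2 here) would in practice be applied by slowing that clock down rather than setting it back, since time must not run backward.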
Three Philosophies of Clock Synchronization
• Try to keep all clocks synchronized to
“real” time as closely as possible
• Try to keep all clocks synchronized to
each other, even if they vary somewhat
from UTC time
• Try to synchronize enough so that
interacting processes can agree upon an
event order.
– Refer to these “clocks” as logical clocks
6.2 Logical Clocks
• Observation: if two processes (running on
separate processors) do not interact, it
doesn’t matter if their clocks are not
synchronized.
• Observation: When processes do interact,
they are usually interested in event order,
instead of exact event time.
• Conclusion: Logical clocks are sufficient
for many applications
Formalization
• The distributed system consists of n
processes, p1, p2, …pn (e.g., an MPI group)
• Each pi executes on a separate processor
• No shared memory
• Each pi has a state si
• Process execution: a sequence of events
– Changes to the local state
– Message Send or Receive
Two Versions
• Lamport’s logical clocks: synchronize logical clocks
– Can be used to determine a total ordering among a set of events, although the order doesn’t necessarily reflect causal relations between events.
• Vector clocks: can capture the causal
relationships between events.
Lamport’s Logical Time
• Lamport defined a “happens-before” relation between events in a distributed system.
• "Events" are defined by the application. The
granularity may be as coarse as a
procedure or as fine-grained as a single
instruction.
Happened Before Relation (a → b)
• a → b (pages 244-245) if:
– a and b are in the same [sequential] process and a occurs before b,
– a is the sending of a message and b is the receipt of that message in a different process, or
– transitivity: if a → b and b → c, then a → c
• If a → b, then a and b are causally related; i.e., event a potentially has a causal effect on event b.
Concurrent Events
• Happens-before defines a partial order of
events in a distributed system.
• Some events can’t be placed in the order
• a and b are concurrent (a || b) if !(a → b) and !(b → a).
• If a and b aren’t connected by the
happened-before relation, there’s no way
one could affect the other.
Logical Clocks
• Needed: method to assign a “timestamp” to
event a (call it C(a)), even in the absence of a
global clock
• The method must guarantee that the clocks
have certain properties, in order to reflect the
definition of happens-before.
• Define a clock (event counter), Ci, at each
process (processor) Pi.
• When an event a occurs, its timestamp ts(a) =
C(a), the local clock value at the time the event
takes place.
Correctness Conditions
• If a and b are in the same process, and a → b, then C(a) < C(b)
• If a is the event of sending a message from Pi, and b is the event of receiving the message by Pj, then Ci(a) < Cj(b).
• The value of C must be increasing (time
doesn’t go backward).
– Corollary: any clock corrections must be
made by adding a positive number to a time.
Implementation Rules
• Between any two successive events a & b in
Pi, increment the local clock (Ci = Ci + 1)
– thus Ci(b) = Ci(a) + 1
• When a message m is sent from Pi, set its timestamp ts(m) to Ci, the time of the send event after following the previous step.
• When the message is received at Pj, the local time must be greater than ts(m). The rule is Cj = max{Cj, ts(m)} + 1.
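The three implementation rules fit in a small class. This is a minimal single-process sketch (the class and method names are hypothetical); message transport is left to the caller, who passes ts(m) from sender to receiver.

```python
class LamportClock:
    """Lamport logical clock for a single process (sketch of the rules above)."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        """Rule 1: increment the clock between successive events."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Rule 2: a send is an event; its (incremented) value becomes ts(m)."""
        self.time += 1
        return self.time

    def receive(self, ts_m: int) -> int:
        """Rule 3: Cj = max{Cj, ts(m)} + 1, so the receipt follows the send."""
        self.time = max(self.time, ts_m) + 1
        return self.time


if __name__ == "__main__":
    p1, p2 = LamportClock(), LamportClock()
    p1.local_event()            # C1 = 1
    ts = p1.send()              # C1 = 2, ts(m) = 2
    print(p2.receive(ts))       # C2 = max{0, 2} + 1 = 3
```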
Lamport’s Logical Clocks (2)
Figure 6-9. (a) Three processes, each with its own clock.
The clocks “run” at different rates.
• Event a: P1 sends m1 to P2 at t = 6.
• Event b: P2 receives m1 at t = 16.
• If C(a) is the time m1 was sent, and C(b) is the time m1 is received, do C(a) and C(b) satisfy the correctness conditions?
Lamport’s Logical Clocks (3)
Figure 6-9. (b) Lamport’s algorithm corrects the clocks.
• Event c: P3 sends m3 to P2 at t = 60.
• Event d: P2 receives m3 at t = 56.
• Do C(c) and C(d) satisfy the conditions?
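Plugging the figure's numbers into the send/receive correctness condition and Lamport's receive rule answers both questions. A minimal sketch, assuming (as the figure states) that P2's clock reads 56 when m3 arrives; the helper names are hypothetical.

```python
def satisfies_condition(send_time: int, receive_time: int) -> bool:
    """Correctness condition: C_i(send) < C_j(receive)."""
    return send_time < receive_time


def lamport_receive(local_time: int, ts_m: int) -> int:
    """Lamport's receive rule: C_j = max{C_j, ts(m)} + 1."""
    return max(local_time, ts_m) + 1


if __name__ == "__main__":
    # m1: sent at 6 by P1, received at 16 by P2.
    print(satisfies_condition(6, 16))        # True
    # m3: sent at 60 by P3, arrives when P2's clock reads 56.
    print(satisfies_condition(60, 56))       # False
    print(lamport_receive(56, 60))           # 61
```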
Figure 6-10. The positioning of Lamport’s logical clocks in distributed systems (clock management handled as a middleware operation).
• Sending: the application sends message mi; the middleware adjusts the local clock, timestamps mi, and sends it over the network layer.
• Receiving: message mi is received from the network layer; the middleware adjusts the local clock and delivers mi to the application.
Figure 5.3 (Advanced Operating Systems, Singhal and Shivaratri): How Lamport’s logical clocks advance
[Figure: process P1 with events e11 through e17 and process P2 with events e21 through e25; eij represents event j on processor i]
Which events are causally related?
Which events are concurrent?
A Total Ordering Rule
(does not guarantee causality)
• A total ordering of events can be obtained
if we ensure that no two events happen at
the same time (have the same timestamp).
• Why? So all processors can agree on an
unambiguous order.
• How? Attach process number to low-order
end of time, separated by decimal point;
e.g., event at time 40 at process P1 is 40.1, event at time 40 at process P2 is 40.2
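The "time.process" rule amounts to comparing (Lamport time, process number) pairs lexicographically. A minimal sketch (the type and function names are hypothetical): storing the pair instead of a decimal also avoids a collision the decimal encoding has once there are ten or more processes, since 40.10 (P10) and 40.1 (P1) are the same number.

```python
from typing import NamedTuple


class TotalStamp(NamedTuple):
    """Lamport time extended with the process number to break ties."""
    lamport_time: int
    process_id: int


def total_order(stamps: list[TotalStamp]) -> list[TotalStamp]:
    # Tuples compare field by field: first by time, then by process number.
    return sorted(stamps)


if __name__ == "__main__":
    events = [TotalStamp(41, 1), TotalStamp(40, 2), TotalStamp(40, 1)]
    for s in total_order(events):
        print(f"{s.lamport_time}.{s.process_id}")   # 40.1, 40.2, 41.1
```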
Figure 5.3 (Singhal and Shivaratri)
[Figure: process P1 with events e11 through e17 and process P2 with events e21 through e25]
What is the total ordering of the events in these two processes?
Example: Total Order Multicast
• Consider a banking database, replicated
across several sites.
• Queries are processed at the
geographically closest replica
• We need to be able to guarantee that DB
updates are seen in the same order
everywhere
Totally Ordered Multicast
Update 1: Process 1 at Site A adds $100 to an account (initial value = $1000).
Update 2: Process 2 at Site B increments the account by 1%.
Without synchronization, it’s possible that replica 1 = $1111 and replica 2 = $1110.
• Message 1: add $100.00
Message 2: increment account by 1%
• The replica that sees the messages in the
order m1, m2 will have a final balance of
$1111
• The replica that sees the messages in the
order m2, m1 will have a final balance of
$1110
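The two final balances can be reproduced by applying the same two updates in both orders. A minimal sketch, assuming balances are kept in integer cents to avoid floating-point rounding (the function and update names are hypothetical):

```python
def apply_updates(balance_cents: int, order: list[str]) -> int:
    """Apply the two bank updates in the given order (integer cents)."""
    for update in order:
        if update == "m1":              # add $100.00
            balance_cents += 10_000
        elif update == "m2":            # increment by 1% (exact for these values)
            balance_cents = balance_cents * 101 // 100
        else:
            raise ValueError(f"unknown update {update!r}")
    return balance_cents


if __name__ == "__main__":
    print(apply_updates(100_000, ["m1", "m2"]))   # 111100 cents = $1111.00
    print(apply_updates(100_000, ["m2", "m1"]))   # 111000 cents = $1110.00
```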
The Problem
• Site 1 has a final account balance of $1,111 after both transactions complete, and Site 2 has a final balance of $1,110.
• Which is “right”? Either, from the standpoint
of consistency.
• Problem: lack of consistency.
– Both values should be the same
• Solution: make sure both sites see/process
all messages in the same order.
Implementing Total Order
• Assumptions:
– Updates are multicast to all sites, including
(conceptually) the sender
– All messages from a single sender arrive in
the order in which they were sent
– No messages are lost
– Messages are time-stamped with Lamport
clock values.
Implementation
• When a process receives a message, put
it in a local message queue, ordered by
timestamp.
• Multicast an acknowledgement to all sites
• Each ack has a timestamp larger than the
timestamp on the message it
acknowledges
• The message queue at each site will
eventually be in the same order
Implementation
• Deliver a message to the application only when
the following conditions are true:
– The message is at the head of the queue
– The message has been acknowledged by all other
receivers. This guarantees that no update messages
with earlier timestamps are still in transit.
• Acknowledgements are deleted when the
message they acknowledge is processed.
• Since all queues have the same order, all sites
process the messages in the same order.
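The queue-and-acknowledge procedure can be sketched for a single site as below. This is an illustration under the assumptions listed above (FIFO, lossless channels; every update and ack multicast to all sites, including the sender), not a full protocol; all class, method, and payload names are hypothetical, and the network is simulated by calling methods directly.

```python
import heapq

MsgId = tuple[int, int]   # (Lamport timestamp, sender id)


class TotalOrderReplica:
    """One site of a totally ordered multicast (a sketch, not a full protocol)."""

    def __init__(self, site_id: int, all_sites: set[int]) -> None:
        self.site_id = site_id
        self.all_sites = frozenset(all_sites)
        self.clock = 0
        self.queue: list[MsgId] = []               # min-heap ordered by (ts, sender)
        self.payloads: dict[MsgId, str] = {}
        self.acks: dict[MsgId, set[int]] = {}      # message id -> sites that acked it
        self.delivered: list[str] = []

    def next_send_timestamp(self) -> int:
        """Timestamp for a new update multicast by this site (a send event)."""
        self.clock += 1
        return self.clock

    def receive_update(self, ts: int, sender: int, payload: str) -> int:
        """Queue an update and return the timestamp of the ack to multicast."""
        self.clock = max(self.clock, ts) + 1       # receive event
        msg_id = (ts, sender)
        heapq.heappush(self.queue, msg_id)
        self.payloads[msg_id] = payload
        self.acks.setdefault(msg_id, set())
        self.clock += 1                            # send-ack event, so ack ts > ts
        self._deliver_ready()
        return self.clock

    def receive_ack(self, msg_id: MsgId, ack_ts: int, acker: int) -> None:
        self.clock = max(self.clock, ack_ts) + 1
        self.acks.setdefault(msg_id, set()).add(acker)
        self._deliver_ready()

    def _deliver_ready(self) -> None:
        # Deliver only the head of the queue, and only once every site has acked it.
        while self.queue and self.acks.get(self.queue[0], set()) >= self.all_sites:
            msg_id = heapq.heappop(self.queue)
            self.delivered.append(self.payloads.pop(msg_id))
            del self.acks[msg_id]                  # acks are discarded on delivery


def demo() -> tuple[list[str], list[str]]:
    """Two sites receive the two bank updates in opposite orders."""
    a, b = TotalOrderReplica(1, {1, 2}), TotalOrderReplica(2, {1, 2})
    ts1, ts2 = a.next_send_timestamp(), b.next_send_timestamp()
    acks = [((ts1, 1), a.receive_update(ts1, 1, "add $100"), 1),
            ((ts2, 2), a.receive_update(ts2, 2, "add 1%"), 1),
            ((ts2, 2), b.receive_update(ts2, 2, "add 1%"), 2),
            ((ts1, 1), b.receive_update(ts1, 1, "add $100"), 2)]
    for site in (a, b):
        for msg_id, ack_ts, acker in acks:
            site.receive_ack(msg_id, ack_ts, acker)
    return a.delivered, b.delivered
```

In `demo()` both updates carry Lamport time 1, so the sender id breaks the tie, and both sites end up delivering "add $100" before "add 1%" even though they received the updates in opposite orders.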
Causality
• Causally related events:
– Event a may causally affect event b if a → b
– Events a and b are causally related if either a → b or b → a.
– If neither of the above relations hold, then
there is no causal relation between a & b. We
say that a || b (a and b are concurrent)
Vector Clock Rationale
• Lamport clocks limitation:
– If a → b, then C(a) < C(b), but
– If C(a) < C(b), then we only know that either a → b or a || b, i.e., !(b → a)
• In other words, you cannot look at the clock
values of events on two different processors
and decide which one “happens before”.
• Lamport clocks do not capture causality
Lamport’s Logical Clocks (3)
Figure 6-12. Suppose we add a message to the scenario in Fig. 6.12(b).
• Tsnd(m1) < Tsnd(m3’): (6) < (32)
• Does this mean send(m1) → send(m3’)?
But …
• Tsnd(m1) < Tsnd(m2’): (6) < (20)
• Does this mean send(m1) → send(m2’)?
Figure 5.4 (Lamport timestamps in parentheses; time runs left to right, one line per process)
P1: e11 (1), e12 (2)
P2: e21 (1), e22 (3)
P3: e31 (1), e32 (2), e33 (3)
A message sent at e12 is received at e22.
C(e11) < C(e22) and C(e11) < C(e32), but while e11 → e22, we cannot say e11 → e32, since there is no causal path connecting them. So, with Lamport clocks we can guarantee that if C(a) < C(b) then !(b → a), but by looking at the clock values alone we cannot say whether or not the events are causally related.
Vector Clocks – How They Work
• Each processor keeps a vector of values,
instead of a single value.
• VCi is the clock at process i; it has a component
for each process in the system.
– VCi[i] corresponds to Pi‘s local “time”.
– VCi[j] represents Pi’s knowledge of the “time” at Pj (the # of events that Pi knows have occurred at Pj)
• Each processor knows its own “time” exactly,
and updates the values of other processors’
clocks based on timestamps received in
messages.
Implementation Rules
• IR1: Increment VCi[i] before each new event.
• IR2: When process i sends a message m it sets
m’s (vector) timestamp to VCi (after incrementing
VCi[i])
• IR3: When a process receives a message it does
a component-by-component comparison of the
message timestamp to its local time and picks the
maximum of the two corresponding components.
Adjust local components accordingly.
• Then deliver the message to the application.
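Rules IR1 to IR3 can be sketched as a small class. This assumes processes are indexed from 0 (so P1 is index 0) and treats a receipt as an event, so IR1 also increments the receiver's own component; with that convention the sketch reproduces the values in Figures 5.4 and 5.5. The class and method names are hypothetical.

```python
class VectorClock:
    """Vector clock for process `pid` in a system of `n` processes (0-indexed)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v = [0] * n

    def local_event(self) -> list[int]:
        self.v[self.pid] += 1                      # IR1
        return list(self.v)

    def send(self) -> list[int]:
        self.v[self.pid] += 1                      # IR1
        return list(self.v)                        # IR2: ts(m) is a copy of VCi

    def receive(self, ts_m: list[int]) -> list[int]:
        self.v[self.pid] += 1                      # IR1: the receipt is an event too
        self.v = [max(a, b) for a, b in zip(self.v, ts_m)]   # IR3: component-wise max
        return list(self.v)


if __name__ == "__main__":
    p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
    print(p1.local_event())        # e11: [1, 0, 0]
    ts = p1.send()                 # e12: [2, 0, 0]
    print(p2.local_event())        # e21: [0, 1, 0]
    print(p2.receive(ts))          # e22: [2, 2, 0]
```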
Review
• Physical clocks: hard to keep synchronized
• Logical clocks: can provide some notion of
relative event occurrence
• Lamport’s logical time
– happened-before relation defines causal relations
– Lamport clocks don’t capture causality
– total ordering relation
– use in establishing totally ordered multicast
• Vector clocks
– Unlike Lamport clocks, vector clocks capture causality
– Have a component for each process in the system
Figure 5.5. Singhal and Shivaratri: vector clock values. In a 3-process system, VC(Pi) = (vc1, vc2, vc3).
P1: e11 (1, 0, 0), e12 (2, 0, 0), e13 (3, 0, 0), e14 (4, 5, 2)
P2: e21 (0, 1, 0), e22 (2, 2, 0), e23 (2, 3, 1), e24 (2, 4, 2), e25 (2, 5, 2)
P3: e31 (0, 0, 1), e32 (0, 0, 2), e33 (0, 0, 3)
Establishing Causal Order
• When Pi sends a message m to Pj, Pj knows
– How many events occurred at Pi before m was sent
– How many relevant events occurred at other sites before
m was sent (relevant = “happened-before”)
• In Figure 5.5, VC(e24) = (2, 4, 2). Two events in P1
and two events in P3 “happened before” e24.
– Even though P1 and P3 may have executed other events,
they don’t have a causal effect on e24.
Happened Before/Causally Related Events - Vector Clock Definition
• a → b iff ts(a) < ts(b) (a happens before b iff the timestamp of a is less than the timestamp of b)
• Events a and b are causally related if ts(a) < ts(b) or ts(b) < ts(a)
• Otherwise, we say the events are concurrent.
• Any pair of events that satisfy the vector clock
definition of happens-before will also satisfy the
Lamport definition, and vice-versa.
Comparing Vector Timestamps
• Less than: ts(a) < ts(b) iff at least one
component of ts(a) is strictly less than the
corresponding component of ts(b) and all
other components of ts(a) are either less
than or equal to the corresponding
component in ts(b).
• (3, 3, 5) ≤ (3, 4, 5), (3, 3, 3) = (3, 3, 3), (3, 3, 5) ≥ (3, 2, 4), (3, 3, 5) || (4, 2, 5).
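The comparison rules above translate into a short function that classifies a pair of vector timestamps as before, after, equal, or concurrent. A minimal sketch (the function name `compare` is hypothetical):

```python
def compare(a: tuple[int, ...], b: tuple[int, ...]) -> str:
    """Return '<', '>', '=' or '||' for two vector timestamps of equal length."""
    if len(a) != len(b):
        raise ValueError("vector timestamps must have the same length")
    a_le_b = all(x <= y for x, y in zip(a, b))   # every component of a <= b
    b_le_a = all(y <= x for x, y in zip(a, b))   # every component of b <= a
    if a_le_b and b_le_a:
        return "="
    if a_le_b:
        return "<"      # all <=, and at least one strictly less
    if b_le_a:
        return ">"
    return "||"         # neither dominates: concurrent


if __name__ == "__main__":
    print(compare((3, 3, 5), (3, 4, 5)))   # <
    print(compare((3, 3, 3), (3, 3, 3)))   # =
    print(compare((3, 3, 5), (3, 2, 4)))   # >
    print(compare((3, 3, 5), (4, 2, 5)))   # ||
```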
Figure 5.4 (vector timestamps; components are ordered P1, P2, P3)
P1: e11 (1, 0, 0), e12 (2, 0, 0)
P2: e21 (0, 1, 0), e22 (2, 2, 0)
P3: e31 (0, 0, 1), e32 (0, 0, 2), e33 (0, 0, 3)
ts(e11) = (1, 0, 0) and ts(e32) = (0, 0, 2), which shows that the two events are concurrent.
ts(e11) = (1, 0, 0) and ts(e22) = (2, 2, 0), which shows that e11 → e22.
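Using the timestamps read off Figure 5.4, a few lines show the contrast drawn earlier: Lamport values put e11 before both e22 and e32, while vector timestamps separate the causal pair from the concurrent one. A minimal sketch with hypothetical helper names:

```python
# Timestamps read off Figure 5.4 (vector index 0, 1, 2 = P1, P2, P3).
LAMPORT = {"e11": 1, "e22": 3, "e32": 2}
VECTOR = {"e11": (1, 0, 0), "e22": (2, 2, 0), "e32": (0, 0, 2)}


def vc_before(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """Vector-clock happens-before: a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def vc_concurrent(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    return a != b and not vc_before(a, b) and not vc_before(b, a)


if __name__ == "__main__":
    # Lamport values order both pairs the same way...
    print(LAMPORT["e11"] < LAMPORT["e22"], LAMPORT["e11"] < LAMPORT["e32"])  # True True
    # ...but only the vector timestamps separate causality from concurrency.
    print(vc_before(VECTOR["e11"], VECTOR["e22"]))      # True
    print(vc_concurrent(VECTOR["e11"], VECTOR["e32"]))  # True
```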
Causal Ordering of Messages
An Application of Vector Clocks
• Premise: Deliver a message only if
messages that causally precede it have
already been received
– i.e., if send(m1) → send(m2), then it should be true that receive(m1) → receive(m2) at each site.
– If messages are not related (send(m1) ||
send(m2)), delivery order is not of interest.
Compare to Total Order
• Totally ordered multicast (TOM) is
stronger (more inclusive) than causal
ordering (COM).
– TOM orders all messages, not just those that
are causally related.
– “Weaker” COM is often what is needed.
Enforcing Causal Communication
• Clocks are adjusted only when sending or receiving messages; i.e., these are the only events of interest.
• Send m: Pi increments VCi[i] by 1 and
applies timestamp, ts(m).
• Receive m: Pi compares VCi to ts(m); set
VCi[k] to max{VCi[k] , ts(m)[k]} for each k,
k ≠ i.
Message Delivery Conditions
• Suppose: Pj receives message m from Pi
• Middleware delivers m to the application iff
– ts(m)[i] = VCj[i] + 1
• all previous messages from Pi have been delivered
– ts(m)[k] ≤ VCj[k] for all k ≠ i
• Pj has received all messages that Pi had seen before it sent message m.
• In other words, if a message m is received
from Pi, you should also have received every
message that Pi received before it sent m;
e.g.,
– if m is sent by P1 and ts(m) is (3, 4, 0) and you
are P3, you should already have received exactly
2 messages from P1 and at least 4 from P2
– if m is sent by P2 and ts(m) is (4, 5, 1, 3) and if
you are P3 and VC3 is (3, 3, 4, 3) then you need
to wait for a fourth message from P2 and at least
one more message from P1.
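The two delivery conditions can be written as one predicate and checked against the second example above. This sketch assumes processes numbered from P1 map to index 0 (so P2 is index 1) and that VCj[k] counts messages from Pk already delivered at Pj; the function name is hypothetical.

```python
def can_deliver(ts_m: list[int], sender: int, vc_j: list[int]) -> bool:
    """Delivery test at Pj for message m from the given sender (0-based index).

    vc_j[k] counts the messages from Pk already delivered at Pj.
    """
    # m must be the next message expected from the sender ...
    if ts_m[sender] != vc_j[sender] + 1:
        return False
    # ... and Pj must have seen everything the sender had seen when it sent m.
    return all(ts_m[k] <= vc_j[k] for k in range(len(ts_m)) if k != sender)


if __name__ == "__main__":
    # Second example above: m from P2 (index 1), receiver P3 with VC3 = (3, 3, 4, 3).
    print(can_deliver([4, 5, 1, 3], 1, [3, 3, 4, 3]))   # False: must wait
    print(can_deliver([4, 5, 1, 3], 1, [4, 4, 4, 3]))   # True after one more from P1 and P2
```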
Figure 6-13. Enforcing causal communication (clocks are incremented only on message send; before sending or receiving any messages, each process’s own clock is (0, 0, …, 0)).
• P0 sends m with ts(m) = (1, 0, 0). P1 receives m (VC1 = (1, 0, 0)) before sending m* with ts(m*) = (1, 1, 0); P0’s clock reaches (1, 1, 0) once it delivers m*.
• P2 (VC2 = (0, 0, 0)) must wait for delivery of m before delivering m*. After m, VC2 = (1, 0, 0); after m*, VC2 = (1, 1, 0).
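The scenario in Figure 6-13 can be replayed with a receiver that buffers messages until the delivery conditions hold. A minimal sketch, assuming the figure's convention that clocks advance only on send (so the receiver only merges timestamps, using 0-based indices P0 to P2); the class and message names are hypothetical.

```python
class CausalReceiver:
    """Buffers messages at Pj until the causal-delivery conditions hold (sketch)."""

    def __init__(self, n: int) -> None:
        self.vc = [0] * n                          # vc[k]: messages from Pk delivered
        self.pending: list[tuple[str, int, list[int]]] = []
        self.delivered: list[str] = []

    def _deliverable(self, sender: int, ts_m: list[int]) -> bool:
        return ts_m[sender] == self.vc[sender] + 1 and all(
            ts_m[k] <= self.vc[k] for k in range(len(ts_m)) if k != sender)

    def receive(self, name: str, sender: int, ts_m: list[int]) -> None:
        self.pending.append((name, sender, ts_m))
        progress = True
        while progress:                            # one delivery may unblock others
            progress = False
            for msg in list(self.pending):
                msg_name, msg_sender, msg_ts = msg
                if self._deliverable(msg_sender, msg_ts):
                    self.pending.remove(msg)
                    self.vc = [max(a, b) for a, b in zip(self.vc, msg_ts)]
                    self.delivered.append(msg_name)
                    progress = True


if __name__ == "__main__":
    p2 = CausalReceiver(3)
    p2.receive("m*", 1, [1, 1, 0])   # arrives first: held back, m not yet delivered
    print(p2.delivered, p2.vc)       # [] [0, 0, 0]
    p2.receive("m", 0, [1, 0, 0])    # delivers m, which unblocks m*
    print(p2.delivered, p2.vc)       # ['m', 'm*'] [1, 1, 0]
```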
History
• ISIS and Horus were middleware systems
that supported the building of distributed
environments through virtually
synchronous process groups
• Provided both totally ordered and causally
ordered message delivery.
– “Lightweight Causal and Atomic Group Multicast”
– Birman, K., Schiper, A., Stephenson, P., ACM Transactions on
Computer Systems, Vol 9, No. 3, August 1991, pp 272-314.
Location of Message Delivery
• Problems if located in middleware:
– Message ordering captures only potential causality;
no way to know if two messages from the same
source are actually dependent.
– Causality from other sources is not captured.
• End-to-end argument: the application is better
equipped to know which messages are causally
related.
• But … developers are now forced to do more
work; re-inventing the wheel.

More Related Content

PPT
Clock synchronization in distributed system
PPT
Chapter 10
PDF
Clock.pdf
PPTX
3. syncro. in distributed system
PPT
clock synchronization in Distributed System
PDF
6.Distributed Operating Systems
PPT
Time-Synchronization-ds14.pptmmmmmmmmmmmmmmmmmmmmmmmmmmm
Clock synchronization in distributed system
Chapter 10
Clock.pdf
3. syncro. in distributed system
clock synchronization in Distributed System
6.Distributed Operating Systems
Time-Synchronization-ds14.pptmmmmmmmmmmmmmmmmmmmmmmmmmmm

Similar to dokumen.tips_synchronization-in-distributed-systems-chapter-6.ppt (20)

PPT
Chapter Five: Introduction to Syncho.pptduction to Syncho.ppt
PPT
Chapter Five Synchonization distributed Sytem.ppt
PPT
Chap 5
PDF
Synchonization in Distributed Systems.pdf
PPTX
Cross cutting concerns should be logically centralized DRY ,but it may appear...
PPT
Chapter 5-Synchronozation.ppt
PPTX
Synchronization Pradeep K Sinha
PPTX
Physical and Logical Clocks
PPTX
Lesson 05 - Time in Distrributed System.pptx
PPTX
Synchronization
PPT
Chapter 6-Synchronozation2.ppt
PPT
09-time+synch.ppt
PPTX
DC UNIT 1 cs 3551 DISTRIBUTED COMPUTING.pptx
PPT
CS6601-Unit 4 Distributed Systems
PPTX
Unit iii-Synchronization
PPT
Distributed System
PPT
L12.FA20.ppt
PPT
Ds ppt imp.
Chapter Five: Introduction to Syncho.pptduction to Syncho.ppt
Chapter Five Synchonization distributed Sytem.ppt
Chap 5
Synchonization in Distributed Systems.pdf
Cross cutting concerns should be logically centralized DRY ,but it may appear...
Chapter 5-Synchronozation.ppt
Synchronization Pradeep K Sinha
Physical and Logical Clocks
Lesson 05 - Time in Distrributed System.pptx
Synchronization
Chapter 6-Synchronozation2.ppt
09-time+synch.ppt
DC UNIT 1 cs 3551 DISTRIBUTED COMPUTING.pptx
CS6601-Unit 4 Distributed Systems
Unit iii-Synchronization
Distributed System
L12.FA20.ppt
Ds ppt imp.
Ad

More from samaghorab (19)

PPTX
Lec-4-CSS-AdvancedAdvanced.Advanced.Advanced..pptx
PPTX
Web Development_Sec6_kkkkkkkkkkkkkkkkkkkkkkkkkJS.pptx
PDF
Web Development_Sec6_Java secriptvvvvv.pdf
PPTX
L2Web development development development.pptx
PPTX
L2Web-intro to web development in html,css.pptx
PPTX
lec+5+_part+1 cloud .pptx
PPTX
Lecture dkjdljfklllllllllllllllllllllllllllllllllllllllllllll
PDF
Lec+3-Introduction-to-Distributed-Systems.pdf
PDF
Lecture-1-2-+(1).pdf
PPT
Lec_1_Integration.ppt
PPTX
L2Web.pptx
PDF
Chapter+3+-+Normalization.pdf
PDF
L6.pdf
PDF
Lecture-1-2-+(1).pdf
PDF
Intro_to_data_base.pdf
PPT
5941981.ppt
PPTX
programs+ifelse&+for.pptx
PPTX
Bioinformatics-Lec+4-DNADamage-and-Repair.pptx
PPTX
Python_Session05_Homeworkquestions.pptx
Lec-4-CSS-AdvancedAdvanced.Advanced.Advanced..pptx
Web Development_Sec6_kkkkkkkkkkkkkkkkkkkkkkkkkJS.pptx
Web Development_Sec6_Java secriptvvvvv.pdf
L2Web development development development.pptx
L2Web-intro to web development in html,css.pptx
lec+5+_part+1 cloud .pptx
Lecture dkjdljfklllllllllllllllllllllllllllllllllllllllllllll
Lec+3-Introduction-to-Distributed-Systems.pdf
Lecture-1-2-+(1).pdf
Lec_1_Integration.ppt
L2Web.pptx
Chapter+3+-+Normalization.pdf
L6.pdf
Lecture-1-2-+(1).pdf
Intro_to_data_base.pdf
5941981.ppt
programs+ifelse&+for.pptx
Bioinformatics-Lec+4-DNADamage-and-Repair.pptx
Python_Session05_Homeworkquestions.pptx
Ad

Recently uploaded (20)

PPT
Mechanical Engineering MATERIALS Selection
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
Structs to JSON How Go Powers REST APIs.pdf
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
Strings in CPP - Strings in C++ are sequences of characters used to store and...
DOCX
573137875-Attendance-Management-System-original
PPTX
Welding lecture in detail for understanding
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Well-logging-methods_new................
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Mechanical Engineering MATERIALS Selection
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Operating System & Kernel Study Guide-1 - converted.pdf
Foundation to blockchain - A guide to Blockchain Tech
Embodied AI: Ushering in the Next Era of Intelligent Systems
Structs to JSON How Go Powers REST APIs.pdf
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Strings in CPP - Strings in C++ are sequences of characters used to store and...
573137875-Attendance-Management-System-original
Welding lecture in detail for understanding
bas. eng. economics group 4 presentation 1.pptx
Well-logging-methods_new................
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Model Code of Practice - Construction Work - 21102022 .pdf
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...

dokumen.tips_synchronization-in-distributed-systems-chapter-6.ppt

  • 2. Guide to Synchronization Lectures • Synchronization in shared memory systems • Event ordering in distributed systems – Logical time, logical clocks, time stamps, • Mutual exclusion in distributed systems – Centralized, decentralized, etc. – Election algorithms • Data race detection in multithreaded programs
  • 3. Background • Synchronization: coordination of actions between processes. • Processes are usually asynchronous, (operate independent of events in other processes) • Sometimes need to cooperate/synchronize – For mutual exclusion – For event ordering (was message x from process P sent before or after message y from process Q?)
  • 4. Introduction • Synchronization in centralized systems is primarily accomplished through shared memory – Event ordering is clear because all events are timed by the same clock • Synchronization in distributed systems is harder – No shared memory – No common clock
  • 5. Clock Synchronization • Some applications rely on event ordering to be successful – Event ordering is easier if you can accurately time-stamp events, but in a distributed system the clocks may not always be synchronized • Is it possible to synchronize clocks in a distributed system?
  • 6. Physical Clocks • Physical clock example: counter + holding register + oscillating quartz crystal – The counter is decremented at each oscillation – Counter interrupts when it reaches zero – Reloads from the holding register – Interrupt = clock tick (often 60 times/second) • Software clock: counts interrupts – This value represents number of seconds since some predetermined time (Jan 1,1970 for UNIX systems; beginning of the Gregorian calendar for Microsoft) – Can be converted to normal clock times
  • 7. Clock Skew • Clock skew(offset): the difference between the times on two different clocks • Clock drift : the difference between a clock and actual time • Ordinary quartz clocks drift by ~ 1sec in 11-12 days. (10-6 secs/sec) • High precision quartz clocks drift rate is somewhat better
  • 8. Various Ways of Measuring Time* • The sun – Mean solar second – gradually getting longer as earth’s rotation slows. • International Atomic Time (TAI) – Atomic clocks are based on transitions of the cesium atom – Atomic second = value of solar second at some fixed time (no longer accurate) • Universal Coordinated Time (UTC) – Based on TAI seconds, but more accurately reflects sun time (inserts leap seconds to synchronize atomic second with solar second)
  • 9. Getting the Correct (UTC) Time* • WWV radio station or similar stations in other countries (accurate to +/- 10 msec) • UTC services provided by earth satellites (accurate to .5 msec) • GPS (Global Positioning System) (accurate to 20-35 nanoseconds)
  • 10. Clock Synchronization Algorithms* • In a distributed system one machine may have a WWV receiver and some technique is used to keep all the other machines in synch with this value. • Or, no machine has access to an external time source and some technique is used to keep all machines synchronized with each other, if not with “real” time.
  • 11. Clock Synchronization Algorithms • Network Time Protocol (NTP): – Objective: to keep all clocks in a system synchronized to UTC time (1-50 msec accuracy) – not so good in WAN – Uses a hierarchy of passive time servers • The Berkeley Algorithm: – Objective: to keep all clocks in a system synchronized to each other (internal synchronization) – Uses active time servers that poll machines periodically • Reference broadcast synchronization (RBS) – Objective: to keep all clocks in a wireless system synchronized to each other
  • 12. Three Philosophies of Clock Synchronization • Try to keep all clocks synchronized to “real” time as closely as possible • Try to keep all clocks synchronized to each other, even if they vary somewhat from UTC time • Try to synchronize enough so that interacting processes can agree upon an event order. – Refer to these “clocks” as logical clocks
  • 13. 6.2 Logical Clocks • Observation: if two processes (running on separate processors) do not interact, it doesn’t matter if their clocks are not synchronized. • Observation: When processes do interact, they are usually interested in event order, instead of exact event time. • Conclusion: Logical clocks are sufficient for many applications
  • 14. Formalization • The distributed system consists of n processes, p1, p2, …pn (e.g, a MPI group) • Each pi executes on a separate processor • No shared memory • Each pi has a state si • Process execution: a sequence of events – Changes to the local state – Message Send or Receive
  • 15. Two Versions • Lamport’s logical clocks: synchronizes logical clocks – Can be used to determine an absolute ordering among a set of events although the order doesn’t necessarily reflect causal relations between events. • Vector clocks: can capture the causal relationships between events.
  • 16. Lamport’s Logical Time • Lamport defined a “happens-before” relation between events in a process. • "Events" are defined by the application. The granularity may be as coarse as a procedure or as fine-grained as a single instruction.
  • 17. Happened Before Relation (a  b) • a  b: (page 244-245) – in the same [sequential] process, – send, receive in different processes, (messages) – transitivity: if a  b and b  c, then a  c • If a  b, then a and b are causally related; i.e., event a potentially has a causal effect on event b.
  • 18. Concurrent Events • Happens-before defines a partial order of events in a distributed system. • Some events can’t be placed in the order • a and b are concurrent (a || b) if !(a  b) and !(b  a). • If a and b aren’t connected by the happened-before relation, there’s no way one could affect the other.
  • 19. Logical Clocks • Needed: method to assign a “timestamp” to event a (call it C(a)), even in the absence of a global clock • The method must guarantee that the clocks have certain properties, in order to reflect the definition of happens-before. • Define a clock (event counter), Ci, at each process (processor) Pi. • When an event a occurs, its timestamp ts(a) = C(a), the local clock value at the time the event takes place.
  • 20. Correctness Conditions • If a and b are in the same process, and a  b then C (a) < C (b) • If a is the event of sending a message from Pi, and b is the event of receiving the message by Pj, then Ci (a) < Cj (b). • The value of C must be increasing (time doesn’t go backward). – Corollary: any clock corrections must be made by adding a positive number to a time.
  • 21. Implementation Rules • Between any two successive events a & b in Pi, increment the local clock (Ci = Ci + 1) – thus Ci(b) = Ci(a) + 1 • When a message m is sent from Pi, set its time-stamp tsm to Ci, the time of the send event after following previous step. • When the message is received at Pj the local time must be greater than tsm . The rule is (Cj = max{Cj, tsm} + 1).
  • 22. Lamport’s Logical Clocks (2) Figure 6-9. (a) Three processes, each with its own clock. The clocks “run” at different rates. Event a: P1 sends m1 to P2 at t = 6, Event b: P2 receives m1 at t = 16. If C(a) is the time m1 was sent, and C(b) is the time m1 is received, do C(a) and C(b) satisfy the correctness conditions ?
  • 23. Lamport’s Logical Clocks (3) Figure 6-9. (b) Lamport’s algorithm corrects the clocks. Event c: P3 sends m3 to P2 at t = 60 Event d: P2 receives m3 at t = 56 Do C(c) and C(d) satisfy the conditions?
  • 24. Application Layer Application sends message mi Adjust local clock, Timestamp mi Middleware sends message Network Layer Message mi is received Adjust local clock Deliver mi to application Middleware layer Figure 6-10. The positioning of Lamport’s logical clocks in distributed systems Handling clock management as a middleware operation
  • 25. Figure 5.3 (Advanced Operating Systems,Singhal and Shivaratri) How Lamport’s logical clocks advance e11 e12 e13 e14 e15 e16 e17 e21 e22 e23 e24 e25 P1 P2 Which events are causally related? Which events are concurrent? eij represents event j on processor i
  • 26. A Total Ordering Rule (does not guarantee causality) • A total ordering of events can be obtained if we ensure that no two events happen at the same time (have the same timestamp). • Why? So all processors can agree on an unambiguous order. • How? Attach the process number to the low-order end of the time, separated by a decimal point; e.g., an event at time 40 at process P1 is 40.1, and an event at time 40 at process P2 is 40.2.
  • 27. Figure 5.3 (Singhal and Shivaratri) [Figure: events e11 through e17 on P1 and e21 through e25 on P2.] What is the total ordering of the events in these two processes?
  • 28. Example: Total Order Multicast • Consider a banking database, replicated across several sites. • Queries are processed at the geographically closest replica • We need to be able to guarantee that DB updates are seen in the same order everywhere
  • 29. Totally Ordered Multicast • Update 1: Process 1 at Site A adds $100 to an account (initial value = $1000). • Update 2: Process 2 at Site B increments the account by 1%. • Without synchronization, it’s possible that replica 1 = $1111 and replica 2 = $1110.
  • 30. • Message 1: add $100.00 • Message 2: increment account by 1% • The replica that sees the messages in the order m1, m2 will have a final balance of $1111. • The replica that sees the messages in the order m2, m1 will have a final balance of $1110.
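The arithmetic behind the two replica balances, as a quick check. `Decimal` avoids binary floating-point rounding. The helper names are illustrative.

```python
from decimal import Decimal

CENTS = Decimal("0.01")


def add_100(balance: Decimal) -> Decimal:
    """Message 1: add $100.00."""
    return balance + Decimal("100.00")


def add_interest(balance: Decimal) -> Decimal:
    """Message 2: increment the account by 1%, rounded to cents."""
    return (balance * Decimal("1.01")).quantize(CENTS)


initial = Decimal("1000.00")
m1_then_m2 = add_interest(add_100(initial))   # (1000 + 100) * 1.01
m2_then_m1 = add_100(add_interest(initial))   # 1000 * 1.01 + 100
print(m1_then_m2, m2_then_m1)  # 1111.00 1110.00
```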
  • 31. The Problem • Site 1 has a final account balance of $1,111 after both transactions complete, and Site 2 has a final balance of $1,110. • Which is “right”? Either, from the standpoint of consistency. • Problem: lack of consistency. – Both values should be the same. • Solution: make sure both sites see/process all messages in the same order.
  • 32. Implementing Total Order • Assumptions: – Updates are multicast to all sites, including (conceptually) the sender – All messages from a single sender arrive in the order in which they were sent – No messages are lost – Messages are time-stamped with Lamport clock values.
  • 33. Implementation • When a process receives a message, put it in a local message queue, ordered by timestamp. • Multicast an acknowledgement to all sites • Each ack has a timestamp larger than the timestamp on the message it acknowledges • The message queue at each site will eventually be in the same order
  • 34. Implementation • Deliver a message to the application only when the following conditions are true: – The message is at the head of the queue – The message has been acknowledged by all other receivers. This guarantees that no update messages with earlier timestamps are still in transit. • Acknowledgements are deleted when the message they acknowledge is processed. • Since all queues have the same order, all sites process the messages in the same order.
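The queue-and-acknowledge scheme above can be simulated in a few dozen lines of Python. This is a sketch under the slides' assumptions: channels are FIFO and lossless, and every update is multicast to all sites including the sender. It adds two assumptions of its own. First, ties between equal Lamport timestamps are broken by process number, as in the total ordering rule on slide 26. Second, each site waits for an acknowledgement from every site, itself included, before delivering. `Site` and `simulate` are hypothetical names, and the random scheduler only stands in for network delays.

```python
import heapq
import random
from collections import defaultdict, deque


class Site:
    """One replica: a Lamport clock plus a timestamp-ordered hold-back queue."""

    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.clock = 0
        self.queue = []                   # heap of (ts, sender_pid, payload)
        self.acks = defaultdict(set)      # (ts, sender_pid) -> pids that acked it
        self.delivered = []               # payloads handed to the application

    def tick(self, seen=0):
        # Lamport rule: max(local, received timestamp) + 1; seen=0 for local events.
        self.clock = max(self.clock, seen) + 1
        return self.clock

    def try_deliver(self):
        # Deliver the head of the queue only once every site (including this one)
        # has acknowledged it; repeat, since the next head may now be ready too.
        while self.queue:
            ts, origin, payload = self.queue[0]
            if len(self.acks[(ts, origin)]) < self.n:
                break
            heapq.heappop(self.queue)
            del self.acks[(ts, origin)]
            self.delivered.append(payload)


def simulate(updates, n=3, seed=0):
    """Run the protocol over FIFO, lossless channels with random interleaving."""
    rng = random.Random(seed)
    sites = [Site(p, n) for p in range(n)]
    channels = defaultdict(deque)         # (src, dst) -> FIFO queue of messages

    def multicast(src, msg):
        for dst in range(n):              # includes the sender itself
            channels[(src, dst)].append(msg)

    pending = list(updates)               # (sender_pid, payload) not yet sent
    while pending or any(channels.values()):
        busy = [key for key, q in channels.items() if q]
        if pending and (not busy or rng.random() < 0.3):
            sender, payload = pending.pop(0)
            multicast(sender, ("update", sites[sender].tick(), sender, payload))
            continue
        src, dst = rng.choice(busy)
        kind, ts, origin, body = channels[(src, dst)].popleft()
        site = sites[dst]
        site.tick(ts)
        if kind == "update":
            heapq.heappush(site.queue, (ts, origin, body))
            # The ack's timestamp exceeds ts because the clock has just passed ts.
            multicast(dst, ("ack", site.tick(), dst, (ts, origin)))
        else:
            site.acks[body].add(origin)   # body is the (ts, sender) key being acked
        site.try_deliver()
    return [s.delivered for s in sites]


orders = simulate([(0, "add $100"), (1, "add 1% interest")], n=3, seed=7)
print(orders)   # every site lists the two updates in the same order
```

Different seeds change which site sees which update first. In every run, each site's `delivered` list comes out identical, which is the property the banking example needs.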
  • 35. Causality • Causally related events: – Event a may causally affect event b if a → b. – Events a and b are causally related if either a → b or b → a. – If neither of the above relations holds, then there is no causal relation between a and b. We say that a || b (a and b are concurrent).
  • 36. Vector Clock Rationale • Lamport clocks limitation: – If a → b then C(a) < C(b), but – if C(a) < C(b) then we only know that either a → b or a || b, i.e., b ↛ a. • In other words, you cannot look at the clock values of events on two different processors and decide which one “happens before”. • Lamport clocks do not capture causality.
  • 37. Lamport’s Logical Clocks (3) Figure 6-12. Suppose we add a message to the scenario in Fig. 6.12(b). • Tsnd(m1) < Tsnd(m3’): (6) < (32). • Does this mean send(m1) → send(m3’)? But … • Tsnd(m1) < Tsnd(m2’): (6) < (20). • Does this mean send(m1) → send(m2’)?
  • 38. Figure 5.4 [Space-time diagram of P1, P2, P3 with Lamport clock values: e11 (1), e12 (2); e21 (1), e22 (3); e31 (1), e32 (2), e33 (3).] C(e11) < C(e22) and C(e11) < C(e32), but while e11 → e22, we cannot say e11 → e32, since there is no causal path connecting them. So, with Lamport clocks we can guarantee that if C(a) < C(b) then b ↛ a, but by looking at the clock values alone we cannot say whether or not the events are causally related.
  • 39. Vector Clocks – How They Work • Each processor keeps a vector of values, instead of a single value. • VCi is the clock at process i; it has a component for each process in the system. – VCi[i] corresponds to Pi’s local “time”. – VCi[j] represents Pi’s knowledge of the “time” at Pj (the number of events that Pi knows have occurred at Pj). • Each processor knows its own “time” exactly, and updates the values of other processors’ clocks based on timestamps received in messages.
  • 40. Implementation Rules • IR1: Increment VCi[i] before each new event. • IR2: When process i sends a message m it sets m’s (vector) timestamp to VCi (after incrementing VCi[i]) • IR3: When a process receives a message it does a component-by-component comparison of the message timestamp to its local time and picks the maximum of the two corresponding components. Adjust local components accordingly. • Then deliver the message to the application.
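IR1 to IR3 can be sketched as a small Python class. It makes two assumptions. Processes are indexed from 0. A receive counts as a new event, so IR1 also applies after the component-wise maximum of IR3; this matches values such as e22 = (2, 2, 0) in Figure 5.5. `VectorClock` and its methods are illustrative names.

```python
from typing import List


class VectorClock:
    """Vector clock for process `pid` in an n-process system (a sketch)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.vc: List[int] = [0] * n

    def local_event(self) -> List[int]:
        # IR1: increment own component before each new event.
        self.vc[self.pid] += 1
        return list(self.vc)

    def send(self) -> List[int]:
        # IR2: the send is an event; m's timestamp is VCi after IR1.
        return self.local_event()

    def receive(self, ts: List[int]) -> List[int]:
        # IR3: component-wise maximum of local clock and message timestamp ...
        self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
        # ... and the receive is itself an event, so IR1 applies too.
        return self.local_event()


p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
p1.local_event()            # e11: [1, 0, 0]
ts = p1.send()              # e12: [2, 0, 0]
p2.local_event()            # e21: [0, 1, 0]
print(p2.receive(ts))       # e22: [2, 2, 0]
```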
  • 41. Review • Physical clocks: hard to keep synchronized • Logical clocks: can provide some notion of relative event occurrence • Lamport’s logical time – happened-before relation defines causal relations – logical clocks – don’t capture causality – total ordering relation – use in establishing totally ordered multicast • Vector clocks – Unlike Lamport clocks, vector clocks capture causality – Have a component for each process in the system
  • 42. Figure 5.5 (Singhal and Shivaratri): vector clock values in a 3-process system, where VC(Pi) = (vc1, vc2, vc3). [Figure: P1: e11 (1, 0, 0), e12 (2, 0, 0), e13 (3, 0, 0), e14 (4, 5, 2); P2: e21 (0, 1, 0), e22 (2, 2, 0), e23 (2, 3, 1), e24 (2, 4, 2), e25 (2, 5, 2); P3: e31 (0, 0, 1), e32 (0, 0, 2), e33 (0, 0, 3).]
  • 43. Establishing Causal Order • When Pi sends a message m to Pj, Pj knows – How many events occurred at Pi before m was sent – How many relevant events occurred at other sites before m was sent (relevant = “happened-before”) • In Figure 5.5, VC(e24) = (2, 4, 2). Two events in P1 and two events in P3 “happened before” e24. – Even though P1 and P3 may have executed other events, they don’t have a causal effect on e24.
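To find the causal past of e24, list every event whose vector timestamp is component-wise ≤ VC(e24). The Python sketch below assumes the per-event timestamps read off Figure 5.5. The assignment of labels to values is reconstructed from the flattened figure text and is consistent with VC(e24) = (2, 4, 2). `FIG_5_5` and `causal_past` are hypothetical names.

```python
# Vector timestamps as read from Figure 5.5 (P1, P2, P3 components).
FIG_5_5 = {
    "e11": (1, 0, 0), "e12": (2, 0, 0), "e13": (3, 0, 0), "e14": (4, 5, 2),
    "e21": (0, 1, 0), "e22": (2, 2, 0), "e23": (2, 3, 1),
    "e24": (2, 4, 2), "e25": (2, 5, 2),
    "e31": (0, 0, 1), "e32": (0, 0, 2), "e33": (0, 0, 3),
}


def causal_past(event: str) -> list:
    """Events whose timestamps are component-wise <= ts(event), i.e. that happened before it."""
    target = FIG_5_5[event]
    return sorted(
        name for name, ts in FIG_5_5.items()
        if name != event and all(x <= y for x, y in zip(ts, target))
    )


print(causal_past("e24"))
# ['e11', 'e12', 'e21', 'e22', 'e23', 'e31', 'e32']
```

Only e11 and e12 from P1 and e31 and e32 from P3 appear. e13 and e33 are excluded even though they exist, which is the point made above.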
  • 44. Happened Before/Causally Related Events - Vector Clock Definition • a → b iff ts(a) < ts(b) (a happens before b iff the timestamp of a is less than the timestamp of b) • Events a and b are causally related if – ts(a) < ts(b) or – ts(b) < ts(a) • Otherwise, we say the events are concurrent. • Any pair of events that satisfy the vector clock definition of happens-before will also satisfy the Lamport definition, and vice-versa.
  • 45. Comparing Vector Timestamps • Less than: ts(a) < ts(b) iff at least one component of ts(a) is strictly less than the corresponding component of ts(b), and all other components of ts(a) are less than or equal to the corresponding components of ts(b). • Examples: (3, 3, 5) ≤ (3, 4, 5), (3, 3, 3) = (3, 3, 3), (3, 3, 5) ≥ (3, 2, 4), (3, 3, 5) || (4, 2, 5).
  • 46. Figure 5.4 [Space-time diagram of P1, P2, P3 with vector timestamps: e11 (1, 0, 0), e12 (2, 0, 0); e21 (0, 1, 0), e22 (2, 2, 0); e31 (0, 0, 1), e32 (0, 0, 2), e33 (0, 0, 3).] ts(e11) = (1, 0, 0) and ts(e32) = (0, 0, 2), which shows that the two events are concurrent. ts(e11) = (1, 0, 0) and ts(e22) = (2, 2, 0), which shows that e11 → e22.
  • 47. Causal Ordering of Messages: An Application of Vector Clocks • Premise: deliver a message only if messages that causally precede it have already been received – i.e., if send(m1) → send(m2), then it should be true that receive(m1) → receive(m2) at each site. – If messages are not related (send(m1) || send(m2)), delivery order is not of interest.
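The comparison rule translates directly into code. This minimal sketch reports the strict relation. The function name `compare` and its string results are illustrative. It checks the four example pairs on this slide.

```python
from typing import Sequence


def compare(a: Sequence[int], b: Sequence[int]) -> str:
    """Return '<', '>', '=' or '||' for two vector timestamps of equal length."""
    if len(a) != len(b):
        raise ValueError("vector timestamps must have the same length")
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return "="
    if a_le_b:
        return "<"    # ts(a) < ts(b): a happened before b
    if b_le_a:
        return ">"    # ts(b) < ts(a): b happened before a
    return "||"       # neither dominates: a and b are concurrent


examples = [((3, 3, 5), (3, 4, 5)), ((3, 3, 3), (3, 3, 3)),
            ((3, 3, 5), (3, 2, 4)), ((3, 3, 5), (4, 2, 5))]
for a, b in examples:
    print(a, compare(a, b), b)
# (3, 3, 5) < (3, 4, 5)
# (3, 3, 3) = (3, 3, 3)
# (3, 3, 5) > (3, 2, 4)
# (3, 3, 5) || (4, 2, 5)
```

The same check separates the Figure 5.4 pairs that Lamport values cannot. `compare((1, 0, 0), (0, 0, 2))` gives `'||'` even though the Lamport values satisfy 1 < 2. `compare((1, 0, 0), (2, 2, 0))` gives `'<'`.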
  • 48. Compare to Total Order • Totally ordered multicast (TOM) is stronger (more inclusive) than causal ordering (COM). – TOM orders all messages, not just those that are causally related. – “Weaker” COM is often what is needed.
  • 49. Enforcing Causal Communication • Clocks are adjusted only when sending or receiving messages; i.e., these are the only events of interest. • Send m: Pi increments VCi[i] by 1 and applies timestamp ts(m). • Receive m: Pi compares VCi to ts(m) and sets VCi[k] to max{VCi[k], ts(m)[k]} for each k, k ≠ i.
  • 50. Message Delivery Conditions • Suppose Pj receives message m from Pi. • Middleware delivers m to the application iff – ts(m)[i] = VCj[i] + 1 • all previous messages from Pi have been delivered – ts(m)[k] ≤ VCj[k] for all k ≠ i • Pj has received all messages that Pi had seen before it sent message m.
  • 51. • In other words, if a message m is received from Pi, you should also have received every message that Pi received before it sent m; e.g., – if m is sent by P1 and ts(m) is (3, 4, 0) and you are P3, you should already have received exactly 2 messages from P1 and at least 4 from P2 – if m is sent by P2 and ts(m) is (4, 5, 1, 3) and if you are P3 and VC3 is (3, 3, 4, 3) then you need to wait for a fourth message from P2 and at least one more message from P1.
  • 52. Figure 6-13. Enforcing Causal Communication [Figure: P0 sends message m with timestamp (1, 0, 0). P1 receives m, so VC1 = (1, 0, 0), and then sends m* with timestamp (1, 1, 0). P2 receives m* while VC2 = (0, 0, 0), before m has arrived.] P1 received message m from P0 before sending message m* to P2, so P2 must wait for delivery of m before delivering m*; VC2 then becomes (1, 0, 0) and, after m*, (1, 1, 0). (Each process increments its own clock only on message send.) Before sending or receiving any messages, each process’s clock is (0, 0, …, 0).
  • 53. History • ISIS and Horus were middleware systems that supported the building of distributed environments through virtually synchronous process groups • Provided both totally ordered and causally ordered message delivery. – “Lightweight Causal and Atomic Group Multicast” – Birman, K., Schiper, A., Stephenson, P., ACM Transactions on Computer Systems, Vol. 9, No. 3, August 1991, pp. 272-314.
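The delivery conditions from slide 50 and the hold-back behaviour of Figure 6-13 can be sketched together in Python. The sketch makes several assumptions. Each message carries its sender's index. Indices are 0-based, matching Figure 6-13's P0 to P2 but shifted by one from slide 51's P1, P2, and so on. As in the figure, the receiver merges its vector clock when a message is delivered, not when it arrives. `deliverable` and `CausalReceiver` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def deliverable(ts: Sequence[int], vc: Sequence[int], sender: int) -> bool:
    """Slide 50's conditions for delivering m from P_sender at a receiver with clock vc."""
    # ts(m)[i] = VCj[i] + 1: m is the next message from the sender.
    next_from_sender = ts[sender] == vc[sender] + 1
    # ts(m)[k] <= VCj[k] for k != i: we have seen everything the sender had seen.
    caught_up = all(ts[k] <= vc[k] for k in range(len(vc)) if k != sender)
    return next_from_sender and caught_up


class CausalReceiver:
    """Holds back messages until they are causally deliverable (illustrative)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.vc: List[int] = [0] * n
        self.held: List[Tuple[int, Tuple[int, ...], str]] = []  # (sender, ts, payload)
        self.delivered: List[str] = []

    def on_receive(self, sender: int, ts: Sequence[int], payload: str) -> None:
        self.held.append((sender, tuple(ts), payload))
        progress = True
        while progress:  # one delivery may unblock messages held earlier
            progress = False
            for item in list(self.held):
                s, t, p = item
                if deliverable(t, self.vc, s):
                    self.held.remove(item)
                    # Merge on delivery: VCj[k] = max(VCj[k], ts(m)[k]).
                    self.vc = [max(a, b) for a, b in zip(self.vc, t)]
                    self.delivered.append(p)
                    progress = True


# Figure 6-13: P2 receives m* (sent by P1 at (1, 1, 0)) before m (sent by P0 at (1, 0, 0)).
p2 = CausalReceiver(pid=2, n=3)
p2.on_receive(sender=1, ts=(1, 1, 0), payload="m*")
print(p2.delivered, p2.vc)  # [] [0, 0, 0]  (m* is held back)
p2.on_receive(sender=0, ts=(1, 0, 0), payload="m")
print(p2.delivered, p2.vc)  # ['m', 'm*'] [1, 1, 0]
```

To check the slide 51 examples, shift the indices down by one. `deliverable((3, 4, 0), (2, 4, 0), sender=0)` is True: two messages from P1 and four from P2 have already been delivered. `deliverable((4, 5, 1, 3), (3, 3, 4, 3), sender=1)` is False until the fourth message from P2 and one more from P1 arrive.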
  • 54. Location of Message Delivery • Problems if located in middleware: – Message ordering captures only potential causality; no way to know if two messages from the same source are actually dependent. – Causality from other sources is not captured. • End-to-end argument: the application is better equipped to know which messages are causally related. • But … developers are now forced to do more work; re-inventing the wheel.