Distributed Systems
(3rd Edition)
Chapter 01:
Introduction
Version: March 20, 2022
Introduction: What is a distributed system?
Distributed System
Definition
A distributed system is a collection of autonomous computing
elements that appears to its users as a single coherent system.
Characteristic features
► Autonomous computing elements, also referred to as
nodes, be they hardware devices or software processes.
► Single coherent system: users or applications perceive a single system ⇒ nodes need to collaborate.
Introduction: What is a distributed system? Characteristic 1: Collection of autonomous computing elements
Collection of autonomous nodes
Independent behavior
Each node is autonomous and will thus have its own notion of
time: there is no global clock. Leads to fundamental
synchronization and coordination problems.
Collection of nodes
► How to manage group membership?
► How to know that you are indeed communicating
with an authorized (non)member?
Introduction: What is a distributed system? Characteristic 1: Collection of autonomous computing elements
Organization
Overlay network
Each node in the collection communicates only with other nodes
in the system, its neighbors. The set of neighbors may be
dynamic, or may even be known only implicitly (i.e., requires a
lookup).
Overlay types
Well-known example of overlay networks: peer-to-peer systems.
Structured: each node has a well-defined set of neighbors with
whom it can communicate (tree, ring).
Unstructured: each node has references to randomly selected
other nodes from the system.
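To make the two overlay types concrete, here is a toy sketch in Python (illustrative only; real overlays also handle joining, departures, failures, and lookups):

```python
# Toy sketch: choosing neighbors in a structured (ring) versus an
# unstructured (random) overlay of n nodes, identified as 0..n-1.
import random

def ring_neighbors(node_id: int, n: int) -> list[int]:
    # Structured overlay: a well-defined neighbor set, here the predecessor
    # and successor on a ring.
    return [(node_id - 1) % n, (node_id + 1) % n]

def random_neighbors(node_id: int, n: int, k: int = 3) -> list[int]:
    # Unstructured overlay: references to k randomly selected other nodes.
    others = [i for i in range(n) if i != node_id]
    return random.sample(others, k)

print(ring_neighbors(0, n=10))    # [9, 1]
print(random_neighbors(0, n=10))  # e.g., [4, 7, 2]
```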
Introduction: What is a distributed system? Characteristic 2: Single coherent system
Coherent system
Essence
The collection of nodes as a whole operates the same, no
matter where, when, and how interaction between a user and
the system takes place.
Examples
► An end user cannot tell where a computation is taking
place
► Where data is exactly stored should be irrelevant to an
application
► Whether or not data has been replicated is completely hidden
Keyword is distribution transparency
The snag: partial failures
It is inevitable that at any time only a part of the distributed system fails. Hiding such partial failures and their recovery is often very difficult and, in general, impossible.
Introduction: What is a distributed system? Middleware and distributed systems
Middleware: the OS of distributed systems
[Figure: applications A, B, and C run on four networked computers (Computer 1-4), each with its own local OS (Local OS 1-4); a distributed-system layer (middleware) spans all machines and offers the same interface everywhere.]
What does it contain?
Commonly used components and functions that need
not be implemented by applications separately.
Introduction: Design goals
What do we want to achieve?
► Support sharing of
resources
► Distribution transparency
► Openness
► Scalability
Introduction: Design goals
Supporting resource sharing
Sharing resources
Canonical examples
► Cloud-based shared storage and files
► Peer-to-peer assisted multimedia streaming
► Shared mail services (think of outsourced mail systems)
► Shared Web hosting (think of content distribution
networks)
Observation
“The network is the computer”
(quote from John Gage, then at Sun Microsystems)
Introduction: Design goals
Making distribution transparent
Distribution transparency
Types
Access: Hide differences in data representation and how an object is accessed
Location: Hide where an object is located
Relocation: Hide that an object may be moved to another location while in use
Migration: Hide that an object may move to another location
Replication: Hide that an object is replicated
Concurrency: Hide that an object may be shared by several independent users
Failure: Hide the failure and recovery of an object
Introduction: Design goals
Making distribution transparent
Degree of transparency
Observation
Aiming at full distribution transparency may be too much:
► There are communication latencies that cannot be hidden
► Completely hiding failures of networks and nodes is
(theoretically and practically) impossible
► You cannot distinguish a slow computer from a failing
one
► You can never be sure that a server actually performed
an operation before a crash
► Full transparency will cost performance, exposing the distribution of the system:
► Keeping replicas exactly up-to-date with the master takes time
► Immediately flushing write operations to disk for fault tolerance
Introduction: Design goals
Making distribution transparent
Degree of transparency
Exposing distribution may be good
► Making use of location-based services (finding your
nearby friends)
► When dealing with users in different time zones
► When it makes it easier for a user to understand what’s
going on (when e.g., a server does not respond for a long
time, report it as failing).
Conclusion
Distribution transparency is a nice goal, but achieving it is a different story, and it should often not even be aimed at.
Introduction: Design goals
Being open
Openness of distributed systems
What are we talking about?
Be able to interact with services from other open systems,
irrespective of the underlying environment:
► Systems should conform to well-defined interfaces
► Systems should easily interoperate
► Systems should support portability of applications
► Systems should be easily extensible
Introduction: Design goals
Being open
Policies versus mechanisms
Implementing openness: policies
► What level of consistency do we require for client-cached
data?
► Which operations do we allow downloaded code to
perform?
► Which QoS requirements do we adjust in the face of
varying bandwidth?
► What level of secrecy do we require for communication?
Implementing openness: mechanisms
► Allow (dynamic) setting of caching policies
► Support different levels of trust for mobile code
► Provide adjustable QoS parameters per data stream
► Offer different encryption algorithms
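One way to picture this separation in code is the following minimal sketch (all names are illustrative): the caching mechanism stays fixed, while the policy that decides how stale an entry may be is a parameter that can be set, or swapped, dynamically.

```python
# Minimal sketch: a client-side cache (mechanism) with a pluggable
# staleness policy. All names and thresholds are illustrative.
import time
from typing import Callable

class Cache:
    """Mechanism: store and look up entries; defer the keep/drop decision to a policy."""
    def __init__(self, keep: Callable[[str, float], bool]):
        self.keep = keep                                   # the policy, set dynamically
        self.entries: dict[str, tuple[object, float]] = {}

    def put(self, key: str, value: object) -> None:
        self.entries[key] = (value, time.time())

    def get(self, key: str):
        if key in self.entries:
            value, stored_at = self.entries[key]
            if self.keep(key, time.time() - stored_at):    # ask the policy
                return value
            del self.entries[key]                          # too stale: drop it
        return None

# Policies: how much inconsistency do we accept for client-cached data?
strict = lambda key, age: age < 1.0      # entries older than 1 second are dropped
relaxed = lambda key, age: age < 60.0    # up to a minute of staleness is fine

cache = Cache(keep=relaxed)              # same mechanism; swap in `strict` when needed
```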
Introduction: Design goals
Being open
On strict separation
Observation
The stricter the separation between policy and mechanism, the more we need to ensure proper mechanisms, potentially leading to many configuration parameters and complex management.
Finding a balance
Hard coding policies often simplifies management and reduces
complexity at the price of less flexibility. There is no obvious
solution.
Introduction: Design goals
Being scalable
Scale in distributed systems
Observation
Many developers of modern distributed systems easily use the
adjective “scalable” without making clear why their system
actually scales.
At least three components
► Number of users and/or processes (size scalability)
► Maximum distance between nodes (geographical
scalability)
► Number of administrative domains (administrative
scalability)
Observation
Most systems account only, to a certain extent, for size scalability.
Often a solution: multiple powerful servers operating
independently in parallel. Today, the challenge still lies in
geographical and administrative scalability.
Introduction: Design goals
Being scalable
Size scalability
Root causes for scalability problems with centralized solutions
► The computational capacity, limited by the CPUs
► The storage capacity, including the transfer rate between CPUs and disks
► The network between the user and the centralized service
Introduction: Design goals
Being scalable
Formal analysis
A centralized service can be modeled as a simple queuing system
[Figure: requests arrive in a queue and are handled by a single process, which returns a response.]
Assumptions and notations
► The queue has infinite capacity ⇒ arrival rate of requests is
not influenced by current queue length or what is being
processed.
► Arrival rate of requests: λ
► Processing capacity of the service: µ requests per second
Fraction of time having k requests in the system:
p_k = (1 − λ/µ)(λ/µ)^k
Introduction: Design goals
Being scalable
Formal analysis
Utilization U of a service is the fraction of time that it is busy:
U = ∑_{k>0} p_k = 1 − p_0 = λ/µ  ⇒  p_k = (1 − U)U^k

Average number of requests in the system:
N = ∑_{k≥0} k·p_k = ∑_{k≥0} k·(1 − U)U^k = (1 − U) ∑_{k≥0} k·U^k = (1 − U)U/(1 − U)^2 = U/(1 − U)

Average throughput:
X = U·µ + (1 − U)·0 = (λ/µ)·µ = λ
(the first term is the server at work, the second the server idle)
Introduction: Design goals
Being scalable
Formal analysis
Response time: the total time taken to process a request after submission:
R = N/X = S/(1 − U), so R/S = 1/(1 − U),
with S = 1/µ being the service time.
Observations
► If U is small, the response-to-service-time ratio is close to 1: a request is processed immediately
► If U goes up to 1, the system comes to a grinding halt.
Solution: decrease S.
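A small sketch of these formulas (the numbers are only illustrative; they assume the single-server queue above with arrival rate λ and capacity µ):

```python
# Sketch of the single-server queuing analysis above.
# lam = arrival rate (requests/s), mu = processing capacity (requests/s).
def service_metrics(lam: float, mu: float) -> dict:
    U = lam / mu                          # utilization
    if U >= 1.0:
        raise ValueError("the queue grows without bound when lambda >= mu")
    return {
        "U": U,
        "N": U / (1.0 - U),               # average number of requests in the system
        "X": lam,                         # average throughput
        "R": (1.0 / mu) / (1.0 - U),      # response time R = S / (1 - U)
        "R/S": 1.0 / (1.0 - U),           # response-to-service-time ratio
    }

# Illustrative only: as U approaches 1, the response time explodes.
for lam in (10, 50, 90, 99):
    print(lam, service_metrics(lam, mu=100))
```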
Introduction: Design goals
Being scalable
Problems with geographical scalability
► Cannot simply go from LAN to WAN: many distributed
systems assume synchronous client-server interactions: client
sends request and waits for an answer. Latency may easily
prohibit this scheme.
► WAN links are often inherently unreliable: simply moving
streaming video from LAN to WAN is bound to fail.
► Lack of multipoint communication, so that a simple search
broadcast cannot be deployed. Solution is to develop
separate naming and directory services (having their own
scalability problems).
Introduction: Design goals
Being scalable
Problems with administrative scalability
Essence
Conflicting policies concerning usage (and thus
payment), management, and security
Examples
► Computational grids: share expensive resources
between different domains.
► Shared equipment: how to control, manage, and use a shared radio telescope constructed as a large-scale shared sensor network?
Exception: several peer-to-peer networks
► File-sharing systems (based, e.g., on BitTorrent)
► Peer-to-peer telephony (Skype)
► Peer-assisted audio streaming (Spotify)
Note: end users collaborate and not administrative entities.
Introduction: Design goals
Being scalable
Techniques for scaling
Hide communication latencies
► Make use of asynchronous communication
► Have a separate handler for incoming responses
► Problem: not every application fits this model
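A minimal sketch of the asynchronous approach (illustrative only): requests are fired off without blocking, and a separate handler deals with each response as it arrives.

```python
# Illustrative sketch: hide communication latency with asynchronous requests
# and a separate handler for incoming responses. The fake I/O is made up.
import asyncio

async def remote_call(request: str) -> str:
    await asyncio.sleep(0.5)            # stand-in for network latency
    return f"result for {request}"

def handle_response(result: str) -> None:
    print("handled:", result)           # separate handler for incoming responses

async def main() -> None:
    # Issue several requests without waiting for each answer individually.
    tasks = [asyncio.create_task(remote_call(f"req-{i}")) for i in range(3)]
    for finished in asyncio.as_completed(tasks):
        handle_response(await finished)

asyncio.run(main())
```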
Introduction: Design goals
Being scalable
Techniques for scaling
Facilitate the solution by moving computations to the client
[Figure: filling in a Web form with and without client-side code. Without it, each keystroke in the FIRST NAME / LAST NAME / E-MAIL fields is sent to the server, which checks the form and processes it; with code shipped to the client, the client checks the form itself and only the completed form is sent to the server for processing.]
Introduction: Design goals
Being scalable
Techniques for scaling
Partition data and computations across multiple machines
► Move computations to clients (Java applets)
► Decentralized naming services (DNS)
► Decentralized information systems (WWW)
Introduction: Design goals
Being scalable
Techniques for scaling
Replication and caching: make copies of data available at different machines
► Replicated file servers and databases
► Mirrored Web sites
► Web caches (in browsers and proxies)
► File caching (at server and client)
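A tiny illustration of the catch discussed next (illustrative only): as soon as data is copied to several machines, an update to one copy makes the replicas diverge unless every modification is propagated.

```python
# Toy example: two replicas of the same record on different machines.
replica_A = {"price": 100}
replica_B = dict(replica_A)       # a cached/replicated copy elsewhere

replica_A["price"] = 120          # update applied to only one copy

print(replica_A == replica_B)     # False: the copies are now inconsistent
# Keeping them identical on every modification requires propagating each
# update to all copies, i.e., global synchronization.
```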
Introduction: Design goals
Being scalable
Scaling: The problem with replication
Applying replication is easy, except for one thing
► Having multiple copies (cached or replicated) leads to inconsistencies: modifying one copy makes that copy different from the rest.
► Always keeping copies consistent and in a general way
requires global synchronization on each modification.
► Global synchronization precludes large-scale solutions.
Observation
If we can tolerate inconsistencies, we may reduce the need for
global synchronization, but tolerating inconsistencies is
application dependent.
Introduction: Design goals
Pitfalls
Developing distributed systems: Pitfalls
Observation
Many distributed systems are needlessly complex, caused by mistakes that required patching later on. False assumptions are often made.
False (and often hidden) assumptions
► The network is reliable
► The network is secure
► The network is homogeneous
► The topology does not change
► Latency is zero
► Bandwidth is infinite
► Transport cost is zero
► There is one administrator
Introduction: Types of distributed systems
Three types of distributed systems
► High performance distributed computing systems
► Distributed information systems
► Distributed systems for pervasive computing
Introduction: Types of distributed systems
High performance distributed computing
Parallel computing
Observation
High-performance distributed computing started with
parallel computing
Multiprocessor and multicore versus multicomputer
[Figure: in a multiprocessor/multicore machine, the processors (P) share memory (M) through an interconnect; in a multicomputer, each processor has its own private memory, and the machines communicate only via the interconnect.]
Introduction: Types of distributed systems
High performance distributed computing
Distributed shared memory systems
Observation
Multiprocessors are relatively easy to program in comparison to
multicomputers, yet have problems when increasing the
number of processors (or cores). Solution: Try to implement a
shared-memory model on top of a multicomputer.
Example through virtual-memory techniques
Map all main-memory pages (from different processors) into one
single virtual address space. If process at processor A addresses a
page P located at processor B, the OS at A traps and fetches P from
B, just as it would if P had been located on local disk.
Problem
Performance of distributed shared memory could never compete
with that of multiprocessors, and failed to meet the expectations
of programmers. It has been widely abandoned by now.
Introduction: Types of distributed systems
High performance distributed computing
Cluster computing
Essentially a group of high-end systems connected through a LAN
► Homogeneous: same OS, near-identical hardware
► Single managing node
[Figure: a typical cluster: a master node, reachable via a remote-access network, runs the management application and the parallel libraries on its local OS; the compute nodes each run a component of the parallel application on their local OS and are connected by a standard network plus a separate high-speed network.]
Introduction: Types of distributed systems
High performance distributed computing
Grid computing
The next step: lots of nodes from everywhere
► Heterogeneous
► Dispersed across several organizations
► Can easily span a wide-area network
Note
To allow for collaborations, grids generally use virtual
organizations. In essence, this is a grouping of users (or better:
their IDs) that will allow for authorization on resource allocation.
Introduction: Types of distributed systems
High performance distributed computing
Architecture for grid computing
[Figure: layered grid architecture: the application layer sits on top of the collective layer, which builds on the connectivity and resource layers, which in turn sit on top of the fabric layer.]
The layers
Fabric: Provides interfaces to local resources (for querying state and capabilities, locking, etc.)
Connectivity: Communication/transaction protocols, e.g., for moving data between resources; also various authentication protocols.
Resource: Manages a single resource, such as creating processes or reading data.
Collective: Handles access to multiple resources: discovery, scheduling, replication.
Application: Contains the actual grid applications in a single organization.
Introduction: Types of distributed systems
High performance distributed computing
Cloud computing
[Figure: cloud computing organized in four layers, with example services:
Hardware — datacenters: CPU, memory, disk, bandwidth.
Infrastructure (IaaS): computation (VM), storage (block, file): Amazon EC2, Amazon S3.
Platform (PaaS): software frameworks (Java/Python/.Net), storage (databases): MS Azure, Google App Engine.
Software (SaaS): Web services, multimedia, business apps: Google Docs, Gmail, YouTube, Flickr.]
Introduction: Types of distributed systems
High performance distributed computing
Cloud computing
Make a distinction between four layers
► Hardware: Processors, routers, power and cooling
systems. Customers normally never get to see these.
► Infrastructure: Deploys virtualization techniques. Evolves
around allocating and managing virtual storage devices and
virtual servers.
► Platform: Provides higher-level abstractions for storage and
such. Example: Amazon S3 storage system offers an API for
(locally created) files to be organized and stored in so-called
buckets.
► Application: Actual applications, such as office suites (text
processors, spreadsheet applications, presentation
applications). Comparable to the suite of apps shipped with
OSes.
Introduction: Types of distributed systems
High performance distributed computing
Is cloud computing cost-effective?
Observation
An important reason for the success of cloud computing is that it
allows organizations to outsource their IT infrastructure: hardware
and software. Essential question: is outsourcing also cheaper?
Approach
► Consider enterprise applications, modeled as a
collection of components, each component Ci requiring Ni
servers.
► The application now becomes a directed graph, with a vertex representing a component and an arc (i, j) representing data flowing from Ci to Cj.
► Two associated weights per arc: T_{i,j}, the number of transactions per time unit that cause a data flow from Ci to Cj, and S_{i,j}, the amount of data associated with them.
Introduction: Types of distributed systems
High performance distributed computing
Is cloud computing cost-effective?
Migration plan
Figure out, for each component Ci, how many ni of its Ni servers should migrate, such that the monetary benefits, reduced by the additional costs for Internet communication, are maximal.
Requirements migration plan
1. Policy constraints are met.
2. Additional latencies do not violate specific delay constraints.
3. All transactions continue to operate correctly; requests or data
are not lost during a transaction.
Introduction: Types of distributed systems
High performance distributed computing
Computing benefits
Monetary savings
► Bc : benefits of migrating a compute-intensive component
► Mc : total number of migrated compute-intensive components
► Bs : benefits of migrating a storage-intensive component
►Ms : total number of migrated storage-intensive components
Obviously, total benefits are: Bc ·Mc + Bs ·Ms
Introduction: Types of distributed systems
High performance distributed computing
Internet costs
Traffic to/from the cloud
Tr_{local,inet} = ∑_{Ci} (T_{user,i}·S_{user,i} + T_{i,user}·S_{i,user})
► T_{user,i}: transactions per time unit causing data flow from the user to Ci
► S_{user,i}: amount of data associated with T_{user,i}
Introduction: Types of distributed systems
High performance distributed computing
Rate of transactions after migration
Some notations
► Ci,local : set of servers of Ci that continue locally.
► Ci,cloud : set of servers of Ci that are placed in the cloud.
► Assume the traffic distribution is the same for local and cloud servers
Note that |Ci,cloud |= ni . Let fi = ni /Ni , and si a server of Ci .
T*_{i,j} =
(1 − f_i)·(1 − f_j)·T_{i,j}   when s_i ∈ C_{i,local} and s_j ∈ C_{j,local}
(1 − f_i)·f_j·T_{i,j}         when s_i ∈ C_{i,local} and s_j ∈ C_{j,cloud}
f_i·(1 − f_j)·T_{i,j}         when s_i ∈ C_{i,cloud} and s_j ∈ C_{j,local}
f_i·f_j·T_{i,j}               when s_i ∈ C_{i,cloud} and s_j ∈ C_{j,cloud}
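A small sketch of this split (made-up numbers; it simply applies the four cases above to one component pair):

```python
# Sketch: how the transaction rate T_ij between components Ci and Cj is split
# after migrating fractions f_i and f_j of their servers to the cloud.
def split_rate(T_ij: float, f_i: float, f_j: float) -> dict:
    return {
        ("local", "local"): (1 - f_i) * (1 - f_j) * T_ij,
        ("local", "cloud"): (1 - f_i) * f_j * T_ij,
        ("cloud", "local"): f_i * (1 - f_j) * T_ij,
        ("cloud", "cloud"): f_i * f_j * T_ij,
    }

# Example: migrate half of Ci's servers and all of Cj's servers.
print(split_rate(T_ij=1000, f_i=0.5, f_j=1.0))
# The ("local","cloud") and ("cloud","local") parts are the traffic that now
# crosses the Internet between the local data center and the cloud.
```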
Introduction: Types of distributed systems
High performance distributed computing
Overall Internet costs
Notations
► cost_{local,inet}: per-unit Internet costs for the local part
► cost_{cloud,inet}: per-unit Internet costs for the cloud
Costs and traffic before and after migration:

Tr*_{local,inet} = ∑_{Ci,local, Cj,local} (T*_{i,j}·S*_{i,j} + T*_{j,i}·S*_{j,i}) + ∑_{Cj,local} (T*_{user,j}·S*_{user,j} + T*_{j,user}·S*_{j,user})

Tr*_{cloud,inet} = ∑_{Ci,cloud, Cj,cloud} (T*_{i,j}·S*_{i,j} + T*_{j,i}·S*_{j,i}) + ∑_{Cj,cloud} (T*_{user,j}·S*_{user,j} + T*_{j,user}·S*_{j,user})

costs = cost_{local,inet}·(Tr*_{local,inet} − Tr_{local,inet}) + cost_{cloud,inet}·Tr*_{cloud,inet}
Introduction: Types of distributed systems
Distributed information systems
Integrating applications
Situation
Organizations were confronted with many networked applications, but achieving interoperability was painful.
Basic approach
A networked application is one that runs on a server making
its services available to remote clients. Simple integration:
clients combine requests for (different) applications; send that
off; collect responses, and present a coherent result to the
user.
Next step
Allow direct application-to-application communication,
leading to Enterprise Application Integration.
Introduction: Types of distributed systems
Distributed information systems
Example EAI: (nested) transactions
Transaction primitives
BEGIN TRANSACTION: Mark the start of a transaction
END TRANSACTION: Terminate the transaction and try to commit
ABORT TRANSACTION: Kill the transaction and restore the old values
READ: Read data from a file, a table, or otherwise
WRITE: Write data to a file, a table, or otherwise

[Figure: a nested transaction with two subtransactions, one on the airline database and one on the hotel database: two different (independent) databases.]
Issue: all-or-nothing
► Atomic: happens indivisibly (seemingly)
► Consistent: does not violate system invariants
► Isolated: no mutual interference
► Durable: a commit means the changes are permanent
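A self-contained toy sketch of the all-or-nothing issue (illustrative only; a real system would use the transaction primitives of the two database systems rather than plain dictionaries):

```python
# Toy: book a flight and a hotel room on two independent "databases"; either
# both updates become permanent or both are rolled back.
class Abort(Exception):
    pass

def book_trip(airline_db: dict, hotel_db: dict, seats: int, rooms: int) -> None:
    backup = (dict(airline_db), dict(hotel_db))     # BEGIN: remember the old values
    try:
        if airline_db["free_seats"] < seats or hotel_db["free_rooms"] < rooms:
            raise Abort("not enough capacity")
        airline_db["free_seats"] -= seats           # subtransaction: airline database
        hotel_db["free_rooms"] -= rooms             # subtransaction: hotel database
        # END TRANSACTION: both updates are kept together.
    except Abort:
        airline_db.clear(); airline_db.update(backup[0])   # ABORT: restore old values
        hotel_db.clear(); hotel_db.update(backup[1])
        raise

airline = {"free_seats": 2}
hotel = {"free_rooms": 0}
try:
    book_trip(airline, hotel, seats=1, rooms=1)
except Abort:
    print("aborted, state restored:", airline, hotel)
```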
Introduction: Types of distributed systems
Distributed information systems
TPM: Transaction Processing Monitor
[Figure: a client application sends a request to the TP monitor, which issues sub-requests to three servers and collects their replies within a single transaction before replying to the client.]
Observation
In many cases, the data involved in a transaction is distributed
across several servers. A TP Monitor is responsible for
coordinating the execution of a transaction.
Introduction: Types of distributed systems
Distributed information systems
Middleware and EAI
[Figure: client applications and server-side applications all communicate through a common communication-middleware layer.]
Middleware offers communication facilities for integration
Remote Procedure Call (RPC): A request is sent via a local procedure call, packaged as a message, processed, answered by a return message, and the result handed back as the return value of the call.
Message-Oriented Middleware (MOM): Messages are sent to a logical contact point (published) and forwarded to the applications that subscribed to them.
Introduction: Types of distributed systems
Distributed information systems
How to integrate applications
File transfer: Technically simple, but not flexible:
► Figure out file format and layout
► Figure out file management
► Update propagation, and update notifications.
Shared database: Much more flexible, but still requires a common data scheme and risks becoming a bottleneck.
Remote procedure call: Effective when execution of a series
of actions is needed.
Messaging: RPCs require caller and callee to be up and running
at the same time. Messaging allows decoupling in time and
space.
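A tiny sketch of the time decoupling that messaging provides (illustrative only; a real deployment would use a message broker rather than an in-process queue):

```python
# With RPC, caller and callee must both be running at the same time; with
# messaging, a queue buffers messages until the receiver picks them up.
import queue

mailbox: "queue.Queue[str]" = queue.Queue()   # stand-in for a message broker

def send(msg: str) -> None:
    mailbox.put(msg)               # succeeds even if the receiver is down

def receive_all() -> list:
    out = []
    while not mailbox.empty():     # the receiver drains the queue whenever it is up
        out.append(mailbox.get())
    return out

send("update customer 42")         # the sender continues immediately
send("invoice order 7")
print(receive_all())               # the receiver handles both messages later
```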
Introduction: Types of distributed systems
Pervasive systems
Distributed pervasive systems
Observation
The emerging next generation of distributed systems, in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user’s environment.
Three (overlapping) subtypes
► Ubiquitous computing systems: pervasive and
continuously present, i.e., there is a continuous interaction
between system and user.
► Mobile computing systems: pervasive, but emphasis is on the
fact that devices are inherently mobile.
► Sensor (and actuator) networks: pervasive, with emphasis on
the actual (collaborative) sensing and actuation of the
environment.
Introduction: Types of distributed systems
Pervasive systems
Ubiquitous systems
Core elements
1. (Distribution) Devices are networked, distributed, and
accessible in a transparent manner
2. (Interaction) Interaction between users and devices is
highly unobtrusive
3. (Context awareness) The system is aware of a user’s context
in order to optimize interaction
4. (Autonomy) Devices operate autonomously without
human intervention, and are thus highly self-managed
5. (Intelligence) The system as a whole can handle a wide range
of dynamic actions and interactions
Introduction: Types of distributed systems
Pervasive systems
Mobile computing
Distinctive features
► A myriad of different mobile devices (smartphones, tablets, GPS devices, remote controls, active badges).
► Mobile implies that a device’s location is expected to change
over time ⇒ change of local services, reachability, etc.
Keyword: discovery.
► Communication may become more difficult: no stable route,
but also perhaps no guaranteed connectivity ⇒ disruption-
tolerant networking.
Introduction: Types of distributed systems
Pervasive systems
Mobility patterns
Issue
What is the relationship between information dissemination and
human mobility? Basic idea: an encounter allows for the exchange
of information (pocket-switched networks).
A successful strategy
► Alice’s world consists of friends and strangers.
► If Alice wants to get a message to Bob: hand it out to all
her friends
► Friend passes message to Bob at first encounter
Observation
This strategy works because (apparently) there are relatively
closed communities of friends.
Introduction: Types of distributed systems
Pervasive systems
Community detection
Issue
How to detect your community without having global
knowledge?
Gradually build your list
1. Node i maintains familiar set Fi and community set Ci , initially
both empty.
2. Node i adds j to Ci when |Fj ∩ Ci| / |Fj| > λ
3. Merge two communities when |Ci ∩Cj |> γ|Ci ∪Cj |
Experiments show that λ = γ = 0.6 is good.
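A compact sketch of these two rules (illustrative; in a real system the familiar sets Fi would be built up from actual encounters):

```python
# Sketch of the community-detection rules above, with lambda = gamma = 0.6.
LAMBDA = 0.6   # admission threshold
GAMMA = 0.6    # merge threshold

def maybe_add(C_i: set, F_j: set, j: str) -> None:
    # Rule 2: node i adds j to its community C_i when |F_j ∩ C_i| / |F_j| > lambda.
    if F_j and len(F_j & C_i) / len(F_j) > LAMBDA:
        C_i.add(j)

def should_merge(C_i: set, C_j: set) -> bool:
    # Rule 3: merge two communities when |C_i ∩ C_j| > gamma * |C_i ∪ C_j|.
    return len(C_i & C_j) > GAMMA * len(C_i | C_j)

# Toy data: node "d" is familiar only with members of a's community, so it is admitted.
C_a = {"a", "b", "c"}
F_d = {"b", "c"}
maybe_add(C_a, F_d, "d")
print(C_a)                                               # {'a', 'b', 'c', 'd'}
print(should_merge({"a", "b", "c"}, {"b", "c", "x"}))    # False: overlap 2, threshold 2.4
```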
Introduction: Types of distributed systems
Pervasive systems
How mobile are people?
Experimental results
Tracing 100,000 cell-phone users during six months leads to:
[Figure: log-log plot of the probability of observing a given displacement (roughly 1 to 1000): the probability drops from about 10^-2 for small displacements to 10^-6 for large ones.]
Moreover: people tend to return to the same place after 24, 48,
or 72 hours ⇒ we’re not that mobile.
Introduction: Types of distributed systems
Pervasive systems
Sensor networks
Characteristics
The nodes to which sensors are attached are:
► Many (10s-1000s)
► Simple (small memory/compute/communication
capacity)
► Often battery-powered (or even battery-less)
Introduction: Types of distributed systems
Pervasive systems
Sensor networks as distributed databases
Two extremes
[Figure: at one extreme, sensor data is sent directly to the operator's site, which stores and processes everything; at the other extreme, each sensor can process and store data, so the operator's query is pushed into the sensor network and the sensors send back only the answers.]
Introduction: Types of distributed systems
Pervasive systems
Duty-cycled networks
Issue
Many sensor networks need to operate on a strict energy
budget: introduce duty cycles
Definition
A node is active during Tactive time units and then suspended for Tsuspended units, after which it becomes active again. Duty cycle τ:
τ = Tactive / (Tactive + Tsuspended)
Typical duty cycles are 10–30%, but can also be lower than 1%.
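For example (illustrative numbers only): a node that is awake for Tactive = 30 ms and then suspended for Tsuspended = 2970 ms has τ = 30/3000 = 1%.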
Introduction: Types of distributed systems
Pervasive systems
Keeping duty-cycled networks in sync
Issue
If duty cycles are low, sensor nodes may not wake up at the same
time anymore and become permanently disconnected: they are
active during different, nonoverlapping time slots.
Solution
► Each node A adopts a cluster ID CA, being a number.
► Let a node send a join message during its suspended period.
► When A receives a join message from B and CA < CB, it sends a join message to its neighbors (in cluster CA) before joining B.
► When CA > CB, it sends a join message to B during B’s active period.
Note
Once a join message reaches a whole cluster, merging two
clusters is very fast. Merging means: re-adjust clocks.

More Related Content

PPT
chapter 1- introduction to distributed system.ppt
PDF
نظم موزعة Distributed systems slides.01.pdf
PPTX
distributed_systems_introduction_chapter_1.pptx
PPTX
Intro to distributed systems
PPTX
Chapter-1-IntroDistributeddffsfdfsdf-1.pptx
PPT
Lecture 2 - Definition and Goals of a Distributed System.ppt
PPTX
Lecture 9.pptx
PDF
Distributed Network, Computing, and System
chapter 1- introduction to distributed system.ppt
نظم موزعة Distributed systems slides.01.pdf
distributed_systems_introduction_chapter_1.pptx
Intro to distributed systems
Chapter-1-IntroDistributeddffsfdfsdf-1.pptx
Lecture 2 - Definition and Goals of a Distributed System.ppt
Lecture 9.pptx
Distributed Network, Computing, and System

Similar to Distributed Systems (3rd Edition)Introduction (20)

PDF
intro_to_dis.pdf
PDF
Introduction to Distributed Systems
PPTX
Chapter 1-Introduction to distributed system.pptx
PPT
Chap 01 lecture 1distributed computer lecture
PDF
distributed system chapter one introduction to distribued system.pdf
PPT
Chapter 1-Introduction.ppt
PDF
Introduction to paralle and distributed computing
PPTX
Distributed Systems.pptx
PPT
Intorduction Distributed and Parallel Computing.ppt
DOCX
Distributed system Tanenbaum chapter 1,2,3,4 notes
PPTX
01 - Introduction to Distributed Systems
PPTX
Lect 1 Distributed System.pptx
PDF
chapter 1-Introductionkkkclll;;;x;lc,.pdf
PPTX
Distributed Computing Introduction01.pptx
PPTX
Introduction to Distributed System
PDF
Lecture 1 distriubted computing
PPT
Chapter One.ppt
PPT
Chapter 1-Introduction.ppt
PDF
distributed system original.pdf
PPTX
Distributed Systems for Data Sciene and Analytics
intro_to_dis.pdf
Introduction to Distributed Systems
Chapter 1-Introduction to distributed system.pptx
Chap 01 lecture 1distributed computer lecture
distributed system chapter one introduction to distribued system.pdf
Chapter 1-Introduction.ppt
Introduction to paralle and distributed computing
Distributed Systems.pptx
Intorduction Distributed and Parallel Computing.ppt
Distributed system Tanenbaum chapter 1,2,3,4 notes
01 - Introduction to Distributed Systems
Lect 1 Distributed System.pptx
chapter 1-Introductionkkkclll;;;x;lc,.pdf
Distributed Computing Introduction01.pptx
Introduction to Distributed System
Lecture 1 distriubted computing
Chapter One.ppt
Chapter 1-Introduction.ppt
distributed system original.pdf
Distributed Systems for Data Sciene and Analytics
Ad

Recently uploaded (20)

PDF
higher edu open stores 12.5.24 (1).pdf forreal
PDF
Honda Dealership SNS Evaluation pdf/ppts
PDF
Volvo EC290C NL EC290CNL Excavator Service Repair Manual Instant Download.pdf
PDF
How Much does a Volvo EC290C NL EC290CNL Weight.pdf
PDF
industrial engineering and safety system
PPTX
1. introduction-to-bvcjdhjdfffffffffffffffffffffffffffffffffffmicroprocessors...
PPTX
Gayatri Cultural Educational Society.pptx
PPTX
Paediatric History & Clinical Examination.pptx
PPTX
Fire Fighting Unit IV industrial safety.pptx
PPTX
UNIT-2(B) Organisavtional Appraisal.pptx
PDF
Volvo EC290C NL EC290CNL excavator weight.pdf
PPTX
Robot_ppt_YRG[1] [Read-Only]bestppt.pptx
PPT
ACCOMPLISHMENT REPOERTS AND FILE OF GRADE 12 2021.ppt
PPT
Kaizen for Beginners and how to implement Kaizen
PDF
EC300D LR EC300DLR - Volvo Service Repair Manual.pdf
PDF
Caterpillar CAT 312B L EXCAVATOR (2KW00001-UP) Operation and Maintenance Manu...
PPTX
laws of thermodynamics with complete explanation
PDF
Volvo EC20C Excavator Step-by-step Maintenance Instructions pdf
PDF
Renesas R-Car_Cockpit_overview210214-Gen4.pdf
PDF
Volvo EC20C Excavator Service maintenance schedules.pdf
higher edu open stores 12.5.24 (1).pdf forreal
Honda Dealership SNS Evaluation pdf/ppts
Volvo EC290C NL EC290CNL Excavator Service Repair Manual Instant Download.pdf
How Much does a Volvo EC290C NL EC290CNL Weight.pdf
industrial engineering and safety system
1. introduction-to-bvcjdhjdfffffffffffffffffffffffffffffffffffmicroprocessors...
Gayatri Cultural Educational Society.pptx
Paediatric History & Clinical Examination.pptx
Fire Fighting Unit IV industrial safety.pptx
UNIT-2(B) Organisavtional Appraisal.pptx
Volvo EC290C NL EC290CNL excavator weight.pdf
Robot_ppt_YRG[1] [Read-Only]bestppt.pptx
ACCOMPLISHMENT REPOERTS AND FILE OF GRADE 12 2021.ppt
Kaizen for Beginners and how to implement Kaizen
EC300D LR EC300DLR - Volvo Service Repair Manual.pdf
Caterpillar CAT 312B L EXCAVATOR (2KW00001-UP) Operation and Maintenance Manu...
laws of thermodynamics with complete explanation
Volvo EC20C Excavator Step-by-step Maintenance Instructions pdf
Renesas R-Car_Cockpit_overview210214-Gen4.pdf
Volvo EC20C Excavator Service maintenance schedules.pdf
Ad

Distributed Systems (3rd Edition)Introduction

  • 1. Distributed Systems (3rd Edition) Chapter 01: Introduction Version: March 20, 2022
  • 2. 2 / Introduction: What is a distributed system? Distributed System Definition A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system. Characteristic features ► Autonomous computing elements, also referred to as nodes, be they hardware devices or software processes. ► Single coherent system: users or applications perceive a single system ⇒ nodes need to collaborate.
  • 3. 3 / Introduction: What is a distributed system? Characteristic 1: Collection of autonomous computing elements Collection of autonomous nodes Independent behavior Each node is autonomous and will thus have its own notion of time: there is no global clock. Leads to fundamental synchronization and coordination problems. Collection of nodes ► How to manage group membership? ► How to know that you are indeed communicating with an authorized (non)member?
  • 4. 4 / Introduction: What is a distributed system? Characteristic 1: Collection of autonomous computing elements Organization Overlay network Each node in the collection communicates only with other nodes in the system, its neighbors. The set of neighbors may be dynamic, or may even be known only implicitly (i.e., requires a lookup). Overlay types Well-known example of overlay networks: peer-to-peer systems. Structured: each node has a well-defined set of neighbors with whom it can communicate (tree, ring). Unstructured: each node has references to randomly selected other nodes from the system.
  • 5. 5 / Introduction: What is a distributed system? Characteristic 2: Single coherent system Coherent system Essence The collection of nodes as a whole operates the same, no matter where, when, and how interaction between a user and the system takes place. Examples ► An end user cannot tell where a computation is taking place ► Where data is exactly stored should be irrelevant to an application ► If or not data has been replicated is completely hidden Keyword is distribution transparency The snag: partial failures It is inevitable that at any time only a part of the distributed system fails. Hiding partial failures and their recovery is often very difficult and in general impossible to hide.
  • 6. Introduction: What is a distributed system? Middleware and distributed systems Middleware: the OS of distributed systems Local OS 1 Local OS 2 Local OS 3 Local OS 4 Appl. A Application B Appl. C Distributed-system layer (middleware) Computer 3 Computer 4 Same interf ace everywhere Computer 1 Computer 2 Network 6 / What does it contain? Commonly used components and functions that need not be implemented by applications separately.
  • 7. 7 / Introduction: Design goals What do we want to achieve? ► Support sharing of resources ► Distribution transparency ► Openness ► Scalability
  • 8. 8 / Introduction: Design goals Supporting resource sharing Sharing resources Canonical examples ► Cloud-based shared storage and files ► Peer-to-peer assisted multimedia streaming ► Shared mail services (think of outsourced mail systems) ► Shared Web hosting (think of content distribution networks) Observation “The network is the computer” (quote from John Gage, then at Sun Microsystems)
  • 9. Types of distribution 9 / Introduction: Design goals Making distribution transparent Distribution transparency Types Transparency Description Access Hide differences in data representation and how an object is accessed Location Hide where an object is located Relocation Hide that an object may be moved to another location while in use Migration Hide that an object may move to another location Replication Hide that an object is replicated Concurrency Hide that an object may be shared by several independent users Failure Hide the failure and recovery of an object
  • 10. Degree of distribution 10 / Introduction: Design goals Making distribution transparent Degree of transparency Observation Aiming at full distribution transparency may be too much:
  • 11. Degree of distribution 10 / Introduction: Design goals Making distribution transparent Degree of transparency Observation Aiming at full distribution transparency may be too much: ► There are communication latencies that cannot be hidden
  • 12. Degree of distribution 10 / Introduction: Design goals Making distribution transparent Degree of transparency Observation Aiming at full distribution transparency may be too much: ► There are communication latencies that cannot be hidden ► Completely hiding failures of networks and nodes is (theoretically and practically) impossible ► You cannot distinguish a slow computer from a failing one ► You can never be sure that a server actually performed an operation before a crash
  • 13. Degree of distribution 10 / Introduction: Design goals Making distribution transparent Degree of transparency Observation Aiming at full distribution transparency may be too much: ► There are communication latencies that cannot be hidden ► Completely hiding failures of networks and nodes is (theoretically and practically) impossible ► You cannot distinguish a slow computer from a failing one ► You can never be sure that a server actually performed an operation before a crash ► Full transparency will cost performance, exposing distribution of the system ► Keeping replicas exactly up-to-date with the master takes time ► Immediately flushing write operations to disk for fault tolerance
  • 14. Degree of distribution 11 / Introduction: Design goals Making distribution transparent Degree of transparency Exposing distribution may be good ► Making use of location-based services (finding your nearby friends) ► When dealing with users in different time zones ► When it makes it easier for a user to understand what’s going on (when e.g., a server does not respond for a long time, report it as failing).
  • 15. Degree of distribution 11 / Introduction: Design goals Making distribution transparent Degree of transparency Exposing distribution may be good ► Making use of location-based services (finding your nearby friends) ► When dealing with users in different time zones ► When it makes it easier for a user to understand what’s going on (when e.g., a server does not respond for a long time, report it as failing). Conclusion Distribution transparency is a nice a goal, but achieving it is a different story, and it should often not even be aimed at.
  • 16. Interoperability, composability, and 12 / Introduction: Design goals Being open Openness of distributed systems What are we talking about? Be able to interact with services from other open systems, irrespective of the underlying environment: ► Systems should conform to well-defined interfaces ► Systems should easily interoperate ► Systems should support portability of applications ► Systems should be easily extensible
  • 17. Separating policy from 13 / Introduction: Design goals Being open Policies versus mechanisms Implementing openness: policies ► What level of consistency do we require for client-cached data? ► Which operations do we allow downloaded code to perform? ► Which QoS requirements do we adjust in the face of varying bandwidth? ► What level of secrecy do we require for communication? Implementing openness: mechanisms ► Allow (dynamic) setting of caching policies ► Support different levels of trust for mobile code ► Provide adjustable QoS parameters per data stream ► Offer different encryption algorithms
  • 18. Separating policy from 14 / Introduction: Design goals Being open On strict separation Observation The stricter the separation between policy and mechanism, the more we need to make ensure proper mechanisms, potentially leading to many configuration parameters and complex management. Finding a balance Hard coding policies often simplifies management and reduces complexity at the price of less flexibility. There is no obvious solution.
  • 19. Scalability 15 / Introduction: Design goals Being scalable Scale in distributed systems Observation Many developers of modern distributed systems easily use the adjective “scalable” without making clear why their system actually scales.
  • 20. Scalability 15 / Introduction: Design goals Being scalable Scale in distributed systems Observation Many developers of modern distributed systems easily use the adjective “scalable” without making clear why their system actually scales. At least three components ► Number of users and/or processes (size scalability) ► Maximum distance between nodes (geographical scalability) ► Number of administrative domains (administrative scalability)
  • 21. Scalability 15 / Introduction: Design goals Being scalable Scale in distributed systems Observation Many developers of modern distributed systems easily use the adjective “scalable” without making clear why their system actually scales. At least three components ► Number of users and/or processes (size scalability) ► Maximum distance between nodes (geographical scalability) ► Number of administrative domains (administrative scalability) Observation Most systems account only, to a certain extent, for size scalability. Often a solution: multiple powerful servers operating independently in parallel. Today, the challenge still lies in geographical and administrative scalability.
  • 22. Scalability 16 / Introduction: Design goals Being scalable Size scalability Root causes for scalability problems with centralized solutions ► The computational capacity, limited by the CPUs ► The storage capacity, including the transfer rate between CPUs and disks ► The network between the user and the centralized service
  • 23. Introduction: Design goals Being scalable Formal analysis A centralized service can be modeled as a simple queuing system Requests Response Queue Process Assumptions and notations ► The queue has infinite capacity ⇒ arrival rate of requests is not influenced by current queue length or what is being processed. ► Arrival rate requests: λ ► Processing capacity service: µ requests per second Fraction of time having k requests in the system pk = 1 − λ λ µ Scalability 17 / k
  • 24. Introduction: Design goals Being scalable Formal analysis Utilization U of a service is the fraction of time that it is busy k > 0 µ k 0 k U = ∑ p = 1 −p = λ ⇒ p = (1 −U)U k Average number of requests in the system k ≥0 k ≥0 k ≥0 N = ∑ k ·pk = ∑ k ·(1−U)Uk = (1−U) ∑ k ·Uk = (1 −U)U = U (1 −U)2 1 −U Average throughput server at work Scalability 18 / server idle µ X = U · µ + (1 − U ) ·0 = λ · µ = λ
  • 25. Introduction: Design goals Being scalable Formal analysis Response time: total time take to process a request after submission R = N = S * R = 1 X 1 −U S 1 −U with S = 1 being the service time. Scalability 19 / µ Observations ► If U is small, response-to-service time is close to 1: a request is immediately processed ► If U goes up to 1, the system comes to a grinding halt. Solution: decrease S.
  • 26. Scalability 20 / Introduction: Design goals Being scalable Problems with geographical scalability ► Cannot simply go from LAN to WAN: many distributed systems assume synchronous client-server interactions: client sends request and waits for an answer. Latency may easily prohibit this scheme. ► WAN links are often inherently unreliable: simply moving streaming video from LAN to WAN is bound to fail. ► Lack of multipoint communication, so that a simple search broadcast cannot be deployed. Solution is to develop separate naming and directory services (having their own scalability problems).
  • 27. Scalability 21 / Introduction: Design goals Being scalable Problems with administrative scalability Essence Conflicting policies concerning usage (and thus payment), management, and security Examples ► Computational grids: share expensive resources between different domains. ► Shared equipment: how to control, manage, and use a shared radio telescope constructed as large-scale shared sensor network? Exception: several peer-to-peer networks ► File-sharing systems (based, e.g., on BitTorrent) ► Peer-to-peer telephony (Skype) ► Peer-assisted audio streaming (Spotify) Note: end users collaborate and not administrative entities.
  • 28. Scaling 22 / Introduction: Design goals Being scalable Techniques for scaling Hide communication latencies ► Make use of asynchronous communication ► Have separate handler for incoming response ► Problem: not every application fits this model
  • 29. Introduction: Design goals Being scalable Techniques for scaling Facilitate solution by moving computations to client M A A R T E N FIRST NAME LAST NAME E-MAIL MAARTEN VAN STEEN MVS@VAN-STEEN.NET Serv er Client Check f orm Process form Serv er Client Check f orm Process f orm MVS@VAN-STEEN.NET Scaling 23 / FIRS T N A M E MAARTEN LAST NAME VAN STEEN E-MAIL MAARTEN VAN STEEN MVS@VAN- STEEN.NET
  • 30. Scaling 24 / Introduction: Design goals Being scalable Techniques for scaling Partition data and computations across multiple machines ► Move computations to clients (Java applets) ► Decentralized naming services (DNS) ► Decentralized information systems (WWW)
  • 31. Scaling 25 / Introduction: Design goals Being scalable Techniques for scaling Replication and caching: Make copies of data available at different machines ► Replicated file servers and databases ► Mirrored Web sites ► Web caches (in browsers and proxies) ► File caching (at server and client)
  • 32. Scaling 26 / Introduction: Design goals Being scalable Scaling: The problem with replication Applying replication is easy, except for one thing
  • 33. Scaling 26 / Introduction: Design goals Being scalable Scaling: The problem with replication Applying replication is easy, except for one thing ► Having multiple copies (cached or replicated), leads to inconsistencies: modifying one copy makes that copy different from the rest.
  • 34. Scaling 26 / Introduction: Design goals Being scalable Scaling: The problem with replication Applying replication is easy, except for one thing ► Having multiple copies (cached or replicated), leads to inconsistencies: modifying one copy makes that copy different from the rest. ► Always keeping copies consistent and in a general way requires global synchronization on each modification.
  • 35. Scaling 26 / Introduction: Design goals Being scalable Scaling: The problem with replication Applying replication is easy, except for one thing ► Having multiple copies (cached or replicated), leads to inconsistencies: modifying one copy makes that copy different from the rest. ► Always keeping copies consistent and in a general way requires global synchronization on each modification. ► Global synchronization precludes large-scale solutions.
  • 36. Scaling 26 / Introduction: Design goals Being scalable Scaling: The problem with replication Applying replication is easy, except for one thing ► Having multiple copies (cached or replicated), leads to inconsistencies: modifying one copy makes that copy different from the rest. ► Always keeping copies consistent and in a general way requires global synchronization on each modification. ► Global synchronization precludes large-scale solutions. Observation If we can tolerate inconsistencies, we may reduce the need for global synchronization, but tolerating inconsistencies is application dependent.
  • 46. 27 / Introduction: Design goals Pitfalls
Developing distributed systems: Pitfalls
Observation: Many distributed systems are needlessly complex because of mistakes that had to be patched later on. Many of these mistakes stem from false assumptions.
False (and often hidden) assumptions:
► The network is reliable
► The network is secure
► The network is homogeneous
► The topology does not change
► Latency is zero
► Bandwidth is infinite
► Transport cost is zero
► There is one administrator
  • 47. 28 / Introduction: Types of distributed systems Three types of distributed systems ► High performance distributed computing systems ► Distributed information systems ► Distributed systems for pervasive computing
  • 48. Introduction: Types of distributed systems High performance distributed computing Parallel computing 29 /
Observation: High-performance distributed computing started with parallel computing.
Multiprocessor and multicore versus multicomputer
[Figure: in a multiprocessor, processors share memory modules through a single interconnect; in a multicomputer, each processor has its own private memory and the processors communicate over the interconnect.]
  • 49. 30 / Introduction: Types of distributed systems High performance distributed computing Distributed shared memory systems
Observation: Multiprocessors are relatively easy to program compared to multicomputers, yet run into problems when the number of processors (or cores) increases. Solution: try to implement a shared-memory model on top of a multicomputer.
Example through virtual-memory techniques: map all main-memory pages (from the different processors) into one single virtual address space. If a process at processor A addresses a page P located at processor B, the OS at A traps and fetches P from B, just as it would if P had been located on local disk.
Problem: the performance of distributed shared memory could never compete with that of multiprocessors and failed to meet programmers' expectations. It has by now been widely abandoned.
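A minimal sketch of the virtual-memory idea described above (the `DSMNode` class and `fetch_over_network` helper are illustrative names, not from a real DSM implementation): touching a page that resides on another node triggers a fault that fetches the page over the network first.

```python
class DSMNode:
    def __init__(self, node_id, page_table):
        self.node_id = node_id
        self.page_table = page_table   # page number -> node currently holding it
        self.local_pages = {}          # page number -> page contents

    def read(self, page_no, offset):
        if page_no not in self.local_pages:
            self._page_fault(page_no)  # would normally be a hardware trap
        return self.local_pages[page_no][offset]

    def _page_fault(self, page_no):
        owner = self.page_table[page_no]
        # Fetch the page from the remote node, as if paging it in from disk.
        self.local_pages[page_no] = fetch_over_network(owner, page_no)
        self.page_table[page_no] = self.node_id

def fetch_over_network(owner_node, page_no):
    # Placeholder for the actual message exchange with the owning node.
    return bytearray(4096)

table = {0: "B"}                 # page 0 initially resides at node "B"
a = DSMNode("A", table)
print(a.read(0, 0))              # triggers the page-fault path, then returns byte 0
```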
  • 50. Introduction: Types of distributed systems High performance distributed computing Cluster computing 31 /
Essentially a group of high-end systems connected through a LAN:
► Homogeneous: same OS, near-identical hardware
► Single managing node
[Figure: a master node, reachable via a remote-access network and running the management application and parallel libraries, controls compute nodes that each run a component of a parallel application; the nodes are connected by both a standard network and a high-speed network.]
  • 51. Grid 32 / Introduction: Types of distributed systems High performance distributed computing Grid computing The next step: lots of nodes from everywhere ► Heterogeneous ► Dispersed across several organizations ► Can easily span a wide-area network Note To allow for collaborations, grids generally use virtual organizations. In essence, this is a grouping of users (or better: their IDs) that will allow for authorization on resource allocation.
  • 52. Introduction: Types of distributed systems High performance distributed computing Architecture for grid computing Grid 33 /
[Figure: layered grid architecture, from bottom to top: fabric layer, connectivity layer, resource layer, collective layer, applications.]
The layers
► Fabric: provides interfaces to local resources (for querying state and capabilities, locking, etc.)
► Connectivity: communication/transaction protocols, e.g., for moving data between resources; also various authentication protocols.
► Resource: manages a single resource, such as creating processes or reading data.
► Collective: handles access to multiple resources: discovery, scheduling, replication.
► Application: contains actual grid applications in a single organization.
  • 53. Introduction: Types of distributed systems High performance distributed computing Cloud computing Cloud 34 /
[Figure: cloud computing organized into four layers, each of the upper three offered "as a service":]
Layer | Offered as | Provides | Examples
Application | Software-as-a-Service | Web services, multimedia, business apps | Google Docs, Gmail, YouTube, Flickr
Platform | Platform-as-a-Service | Software frameworks (Java/Python/.Net), storage (databases) | MS Azure, Google App Engine
Infrastructure | Infrastructure-as-a-Service | Computation (VM), storage (block, file) | Amazon EC2, Amazon S3
Hardware | — | Datacenters: CPU, memory, disk, bandwidth | —
  • 54. Cloud 35 / Introduction: Types of distributed systems High performance distributed computing Cloud computing
Make a distinction between four layers:
► Hardware: processors, routers, power and cooling systems. Customers normally never get to see these.
► Infrastructure: deploys virtualization techniques; revolves around allocating and managing virtual storage devices and virtual servers.
► Platform: provides higher-level abstractions for storage and such. Example: the Amazon S3 storage system offers an API for (locally created) files to be organized and stored in so-called buckets.
► Application: actual applications, such as office suites (text processors, spreadsheet applications, presentation applications). Comparable to the suite of apps shipped with OSes.
  • 55. Cloud 36 / Introduction: Types of distributed systems High performance distributed computing Is cloud computing cost-effective?
Observation: An important reason for the success of cloud computing is that it allows organizations to outsource their IT infrastructure: hardware and software. Essential question: is outsourcing also cheaper?
Approach
► Consider enterprise applications, modeled as a collection of components, with each component Ci requiring Ni servers.
► The application then becomes a directed graph, with a vertex representing a component and an arc (i, j) representing data flowing from Ci to Cj.
► Two associated weights per arc:
► Ti,j : the number of transactions per time unit that cause data to flow from Ci to Cj
► Si,j : the amount of data associated with Ti,j
  • 56. Cloud 37 / Introduction: Types of distributed systems High performance distributed computing Is cloud computing cost-effective?
Migration plan: figure out, for each component Ci, how many ni of its Ni servers should migrate, such that the monetary benefits, reduced by the additional costs for Internet communication, are maximal.
Requirements for the migration plan
1. Policy constraints are met.
2. Additional latencies do not violate specific delay constraints.
3. All transactions continue to operate correctly; requests or data are not lost during a transaction.
  • 57. Cloud 38 / Introduction: Types of distributed systems High performance distributed computing Computing benefits
Monetary savings
► Bc : benefits of migrating a compute-intensive component
► Mc : total number of migrated compute-intensive components
► Bs : benefits of migrating a storage-intensive component
► Ms : total number of migrated storage-intensive components
Obviously, the total benefits are: Bc · Mc + Bs · Ms
  • 58. Cloud 39 / Introduction: Types of distributed systems High performance distributed computing Internet costs
Traffic to/from the cloud (before migration):
Trlocal,inet = ∑Ci (Tuser,i · Suser,i + Ti,user · Si,user)
► Tuser,i : transactions per time unit causing data to flow from the user to Ci
► Suser,i : amount of data associated with Tuser,i
  • 59. Introduction: Types of distributed systems High performance distributed computing Rate of transactions after migration Cloud 40 /
Some notations
► Ci,local : set of servers of Ci that continue to run locally
► Ci,cloud : set of servers of Ci that are placed in the cloud
► Assume the traffic distribution is the same for local and cloud servers
Note that |Ci,cloud| = ni . Let fi = ni / Ni, and let si be a server of Ci. Then:
T*i,j = (1 − fi) · (1 − fj) · Ti,j   when si ∈ Ci,local and sj ∈ Cj,local
T*i,j = (1 − fi) · fj · Ti,j         when si ∈ Ci,local and sj ∈ Cj,cloud
T*i,j = fi · (1 − fj) · Ti,j         when si ∈ Ci,cloud and sj ∈ Cj,local
T*i,j = fi · fj · Ti,j               when si ∈ Ci,cloud and sj ∈ Cj,cloud
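A small sketch of the split above (the function name and example numbers are illustrative): given the migrated fractions fi and fj, the original rate Ti,j is divided over the four placement combinations, and the four parts add up to Ti,j again.

```python
# Split the original transaction rate T_ij over the four placement combinations,
# given the migrated fractions f_i = n_i / N_i and f_j = n_j / N_j.

def split_transaction_rate(T_ij, f_i, f_j):
    return {
        ("local", "local"): (1 - f_i) * (1 - f_j) * T_ij,
        ("local", "cloud"): (1 - f_i) * f_j * T_ij,
        ("cloud", "local"): f_i * (1 - f_j) * T_ij,
        ("cloud", "cloud"): f_i * f_j * T_ij,
    }

# Example: 100 transactions/s between C_i and C_j, half of C_i's and a quarter
# of C_j's servers migrated.
rates = split_transaction_rate(100, 0.5, 0.25)
assert abs(sum(rates.values()) - 100) < 1e-9   # the four parts add up to T_ij
```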
  • 60. Cloud 41 / Introduction: Types of distributed systems High performance distributed computing Overall Internet costs
Notations
► costlocal,inet : per-unit Internet costs for the local part
► costcloud,inet : per-unit Internet costs for the cloud part
Traffic after migration
Tr*local,inet = ∑Ci,local, Cj,local (T*i,j · S*i,j + T*j,i · S*j,i) + ∑Cj,local (T*user,j · S*user,j + T*j,user · S*j,user)
Tr*cloud,inet = ∑Ci,cloud, Cj,cloud (T*i,j · S*i,j + T*j,i · S*j,i) + ∑Cj,cloud (T*user,j · S*user,j + T*j,user · S*j,user)
Additional Internet costs
costs = costlocal,inet · (Tr*local,inet − Trlocal,inet) + costcloud,inet · Tr*cloud,inet
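A worked sketch of how the pieces combine; all numbers below are hypothetical and chosen only for illustration, not taken from the book: total migration benefits Bc·Mc + Bs·Ms minus the additional Internet costs computed from the traffic values.

```python
B_c, M_c = 40.0, 3     # benefit per migrated compute-intensive component, count
B_s, M_s = 25.0, 2     # benefit per migrated storage-intensive component, count

cost_local_inet, cost_cloud_inet = 0.01, 0.02    # per-unit traffic costs
Tr_local_before = 5_000                          # local Internet traffic before migration
Tr_local_after, Tr_cloud_after = 6_500, 3_000    # traffic after migration

extra_costs = (cost_local_inet * (Tr_local_after - Tr_local_before)
               + cost_cloud_inet * Tr_cloud_after)
net_benefit = B_c * M_c + B_s * M_s - extra_costs
print(net_benefit)   # a migration plan is attractive only if this is positive
                     # (and the plan's policy, latency, and correctness constraints hold)
```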
  • 61. Introduction: Types of distributed systems Distributed information systems Integrating applications Situation Organizations confronted with many networked applications, but achieving interoperability was painful. Basic approach A networked application is one that runs on a server making its services available to remote clients. Simple integration: clients combine requests for (different) applications; send that off; collect responses, and present a coherent result to the user. Next step Allow direct application-to-application communication, leading to Enterprise Application Integration. 42 /
  • 62. Introduction: Types of distributed systems Distributed information systems Example EAI: (nested) transactions 43 /
Transaction primitives:
Primitive | Description
BEGIN TRANSACTION | Mark the start of a transaction
END TRANSACTION | Terminate the transaction and try to commit
ABORT TRANSACTION | Kill the transaction and restore the old values
READ | Read data from a file, a table, or otherwise
WRITE | Write data to a file, a table, or otherwise
ACID properties:
► Atomic: happens indivisibly (seemingly)
► Consistent: does not violate system invariants
► Isolated: no mutual interference
► Durable: once committed, changes are permanent
[Figure: a distributed (nested) transaction consisting of subtransactions against two different, independent databases, e.g., an airline database and a hotel database; the issue is all-or-nothing execution.]
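A deliberately simplified, in-memory sketch of the primitives in the table above (the `Transaction` class and the list-based "databases" are illustrative, not a real nested-transaction or TP-monitor API): booking a flight and a hotel either both succeed or neither survives.

```python
class Transaction:
    def __init__(self):
        self.writes = []          # (database, record) pairs written so far

    def write(self, db, record):  # WRITE, remembered so it can be undone
        db.append(record)
        self.writes.append((db, record))

    def commit(self):             # END TRANSACTION: keep all changes
        self.writes.clear()

    def abort(self):              # ABORT TRANSACTION: restore the old values
        for db, record in reversed(self.writes):
            db.remove(record)
        self.writes.clear()

airline_db, hotel_db = [], []
t = Transaction()                              # BEGIN TRANSACTION
try:
    t.write(airline_db, "seat AMS->JFK")       # update on the airline database
    t.write(hotel_db, "room, 3 nights")        # update on the hotel database
    t.commit()
except Exception:
    t.abort()                                  # neither reservation survives a failure
```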
  • 63. Introduction: Types of distributed systems Distributed information systems TPM: Transaction Processing Monitor Distributed transaction 44 /
[Figure: a client application sends a request to a TP monitor, which issues requests to several servers, collects their replies within a single transaction, and returns a reply to the client.]
Observation: In many cases, the data involved in a transaction is distributed across several servers. A TP monitor is responsible for coordinating the execution of a transaction.
  • 64. Introduction: Types of distributed systems Distributed information systems Middleware and EAI Enterprise application 45 /
[Figure: client-side and server-side applications integrated through a communication-middleware layer.]
Middleware offers communication facilities for integration:
Remote Procedure Call (RPC): a request is issued as a local procedure call, packaged as a message, processed at the callee, answered through a reply message, and the result is returned as the return value of the call.
Message-Oriented Middleware (MOM): messages are sent to a logical contact point (published) and forwarded to the applications that subscribed to them.
  • 65. Enterprise application 46 / Introduction: Types of distributed systems Distributed information systems How to integrate applications
File transfer: technically simple, but not flexible:
► Figure out file format and layout
► Figure out file management
► Update propagation and update notifications
Shared database: much more flexible, but still requires a common data scheme and risks becoming a bottleneck.
Remote procedure call: effective when the execution of a series of actions is needed.
Messaging: RPCs require caller and callee to be up and running at the same time; messaging allows decoupling in time and space.
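A toy sketch of the time decoupling that messaging provides (the `Broker` class is an illustrative in-memory stand-in, not a real MOM product): the sender can publish even when no subscriber is currently running, and the message is picked up later.

```python
from collections import defaultdict, deque

class Broker:
    def __init__(self):
        self.queues = defaultdict(deque)   # topic -> pending messages

    def publish(self, topic, message):
        self.queues[topic].append(message) # stored until someone consumes it

    def consume(self, topic):
        q = self.queues[topic]
        return q.popleft() if q else None

broker = Broker()
broker.publish("orders", {"id": 1, "item": "book"})   # subscriber not yet running
# ... later, the receiving application starts and picks up the message:
print(broker.consume("orders"))
```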
  • 69. 47 / Introduction: Types of distributed systems Pervasive systems Distributed pervasive systems
Observation: An emerging next generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user's environment.
Three (overlapping) subtypes:
► Ubiquitous computing systems: pervasive and continuously present, i.e., there is continuous interaction between system and user.
► Mobile computing systems: pervasive, but the emphasis is on the fact that devices are inherently mobile.
► Sensor (and actuator) networks: pervasive, with the emphasis on the actual (collaborative) sensing and actuation of the environment.
  • 70. Ubiquitous computing 48 / Introduction: Types of distributed systems Pervasive systems Ubiquitous systems Core elements 1. (Distribution) Devices are networked, distributed, and accessible in a transparent manner 2. (Interaction) Interaction between users and devices is highly unobtrusive 3. (Context awareness) The system is aware of a user’s context in order to optimize interaction 4. (Autonomy) Devices operate autonomously without human intervention, and are thus highly self-managed 5. (Intelligence) The system as a whole can handle a wide range of dynamic actions and interactions
  • 71. Mobile computing 49 / Introduction: Types of distributed systems Pervasive systems Mobile computing
Distinctive features
► A myriad of different mobile devices (smartphones, tablets, GPS devices, remote controls, active badges).
► Mobile implies that a device's location is expected to change over time ⇒ change of local services, reachability, etc. Keyword: discovery.
► Communication may become more difficult: no stable route, but also perhaps no guaranteed connectivity ⇒ disruption-tolerant networking.
  • 72. Mobile computing 50 / Introduction: Types of distributed systems Pervasive systems Mobility patterns Issue What is the relationship between information dissemination and human mobility? Basic idea: an encounter allows for the exchange of information (pocket-switched networks). A successful strategy ► Alice’s world consists of friends and strangers. ► If Alice wants to get a message to Bob: hand it out to all her friends ► Friend passes message to Bob at first encounter Observation This strategy works because (apparently) there are relatively closed communities of friends.
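A small sketch of the friend-based strategy just described (node names, the friend set, and the encounter handler are illustrative, not from the original study): Alice hands the message to her friends at each contact, and whoever meets Bob first delivers it.

```python
friends_of_alice = {"carol", "dave"}
carriers = {"alice"}                      # nodes currently holding the message

def encounter(a, b, destination="bob"):
    """Called whenever nodes a and b meet (a pocket-switched contact)."""
    if destination in (a, b) and carriers & {a, b}:
        return "delivered"
    # Hand the message to a friend of Alice's who does not have it yet.
    for x, y in ((a, b), (b, a)):
        if x in carriers and y in friends_of_alice:
            carriers.add(y)
    return "carried by " + ", ".join(sorted(carriers))

print(encounter("alice", "carol"))   # carol now also carries the message
print(encounter("carol", "bob"))     # delivered at the first encounter with Bob
```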
  • 73. Mobile computing 51 / Introduction: Types of distributed systems Pervasive systems Community detection
Issue: How to detect your community without having global knowledge?
Gradually build your list
1. Node i maintains a familiar set Fi and a community set Ci, initially both empty.
2. Node i adds j to Ci when |Fj ∩ Ci| > λ · |Fj|
3. Merge two communities when |Ci ∩ Cj| > γ · |Ci ∪ Cj|
Experiments show that λ = γ = 0.6 is good.
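A direct sketch of rules 2 and 3 above (the dictionaries of familiar sets F and community sets C are assumed to be built up elsewhere, e.g., from encounter durations):

```python
LAMBDA = 0.6
GAMMA = 0.6

def maybe_admit(i, j, F, C):
    # Rule 2: node i adds j to its community when enough of j's familiar set
    # already overlaps with i's community.
    if len(F[j] & C[i]) > LAMBDA * len(F[j]):
        C[i].add(j)

def maybe_merge(i, j, C):
    # Rule 3: merge the two communities when their overlap is large enough.
    if len(C[i] & C[j]) > GAMMA * len(C[i] | C[j]):
        merged = C[i] | C[j]
        C[i] = C[j] = merged

# Tiny example
F = {"i": {"a", "b"}, "j": {"a", "b", "c"}}
C = {"i": {"i", "a", "b"}, "j": {"j", "c"}}
maybe_admit("i", "j", F, C)    # |{a,b}| = 2 > 0.6 * 3, so j joins i's community
print(C["i"])
```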
  • 74. Introduction: Types of distributed systems Pervasive systems How mobile are people? Mobile computing 52 /
Experimental results: Tracing 100,000 cell-phone users during six months leads to:
[Figure: log-log plot of probability versus displacement, for displacements up to roughly 1000 and probabilities down to 10^-6.]
Moreover: people tend to return to the same place after 24, 48, or 72 hours ⇒ we're not that mobile.
  • 75. Sensor 53 / Introduction: Types of distributed systems Pervasive systems Sensor networks Characteristics The nodes to which sensors are attached are: ► Many (10s-1000s) ► Simple (small memory/compute/communication capacity) ► Often battery-powered (or even battery-less)
  • 76. Introduction: Types of distributed systems Pervasive systems Sensor networks as distributed databases Sensor 54 /
Two extremes:
► Sensor data is sent directly to the operator's site.
► Each sensor can process and store data; the operator's site sends queries into the sensor network and the sensors send only answers.
  • 77. Sensor 55 / Introduction: Types of distributed systems Pervasive systems Duty-cycled networks
Issue: Many sensor networks need to operate on a strict energy budget: introduce duty cycles.
Definition: A node is active during Tactive time units, and then suspended for Tsuspended units, to become active again. The duty cycle τ is:
τ = Tactive / (Tactive + Tsuspended)
Typical duty cycles are 10–30%, but can also be lower than 1%.
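A quick check of the duty-cycle formula above, with illustrative timing values:

```python
def duty_cycle(t_active, t_suspended):
    return t_active / (t_active + t_suspended)

print(duty_cycle(30, 270))    # 0.1   -> a typical 10% duty cycle
print(duty_cycle(1, 199))     # 0.005 -> below 1%, as mentioned for extreme cases
```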
  • 78. Sensor 56 / Introduction: Types of distributed systems Pervasive systems Keeping duty-cycled networks in sync
Issue: If duty cycles are low, sensor nodes may not wake up at the same time anymore and become permanently disconnected: they are active during different, nonoverlapping time slots.
Solution
► Each node A adopts a cluster ID CA, being a number.
► Let a node send a join message during its suspended period.
► When A receives a join message from B and CA < CB, it sends a join message to its neighbors (in cluster CA) before joining B.
► When CA > CB, it sends a join message to B during B's active period.
Note: Once a join message reaches a whole cluster, merging two clusters is very fast. Merging means: re-adjust clocks.
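A minimal sketch of the cluster-merging rule above (the `Node` class and its message handling are illustrative; real nodes would also handle radio timing and avoid re-forwarding in cyclic neighbor graphs):

```python
class Node:
    def __init__(self, name, cluster_id, neighbors=None):
        self.name = name
        self.cluster_id = cluster_id
        self.neighbors = neighbors or []

    def on_join(self, sender):
        if self.cluster_id < sender.cluster_id:
            # Tell the rest of my (old) cluster before joining the sender's cluster.
            for n in self.neighbors:
                n.on_join(sender)
            self.cluster_id = sender.cluster_id   # merging: adopt the new cluster
        elif self.cluster_id > sender.cluster_id:
            # Invite the sender's cluster to join mine (during its active period).
            sender.on_join(self)
        # equal IDs: already in the same cluster, nothing to do

a = Node("A", cluster_id=3)
b = Node("B", cluster_id=7)
a.on_join(b)                   # CA < CB: A (and its cluster) joins B's cluster
print(a.cluster_id)            # 7
```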