2. 2
Introduction and Definition
Before the mid-80s, computers were
– very expensive (hundreds of thousands or even millions of dollars)
– very slow (a few thousand instructions per second)
– not connected among themselves
After the mid-80s, two major developments occurred:
– Cheap and powerful microprocessor-based computers appeared
– Computer networks
• LANs at speeds ranging from 10 to 1000 Mbps
• WANs at speeds ranging from 64 Kbps to gigabits/sec
2/24/2021
3. 3
Consequence
Feasibility of using a large network of computers to
work for the same application; this is in contrast to
the old centralized systems where there was a single
computer with its peripherals
4. 4
Definition
A distributed system is:
a collection of independent computers that appears to
its users as a single coherent system (computer)
(Tanenbaum & Van Steen)
This definition has two aspects:
1. Hardware: autonomous machines
2. Software: a single system view for the users
5. 5
Other Definitions
A distributed system is a system designed to support the
development of applications and services which can
exploit a physical architecture consisting of multiple,
autonomous processing elements that do not share
primary memory but cooperate by sending asynchronous
messages over a communication network (Blair &
Stefani)
A distributed system is one that stops you getting any work
done when a machine you’ve never even heard of crashes
(Leslie Lamport)
6. 6
Paradigmatic Examples
The Web over the Internet.
Mobile telephony over cellular networks.
Electronic funds transfer systems over special
purpose networks
For example, between bank accounts, on credit
card purchases, via cash machines.
7. 7
Other examples abound:
Email,
Instant messaging,
Videoconferencing,
Multiuser gaming,
Home entertainment systems,
Global positioning systems,
etc.
8. 8
Why Distribute Systems?
Constructing a distributed system can be motivated in
many ways:
By making continuously evolving, remote resources
accessible for sharing,
By opening proprietary processes to external
interaction in order to foster cooperation,
By leading to better performance/cost ratios,
By scaling effectively and efficiently if demand for
resources changes significantly,
By attaining high levels of reliability and availability.
9. 9
The Benefit of Scale
Ultimately, more is more.
Interconnecting many systems has increased our
ability to tackle problems that centralized systems in
sequential mode cannot solve efficiently.
More users can do more work of a more valuable
nature more effectively and more efficiently with
distributed systems than with centralized ones.
10. 10
What is a Distributed System?
A distributed system is one in which
– independent, self-sufficient,
– often heterogeneous and autonomous,
– spatially separated
components must use a common interconnection to
exchange information in order to
• coordinate their actions and
• allow the whole to appear to its users as
a single coherent system.
11. 11
Independent, Self-Sufficient, Autonomous, Heterogeneous
By independent, self-sufficient we mean that each
component has its own
processor
state (i.e., memory)
resource control and management (e.g., operating
system)
By autonomous we mean that each component may change or
be changed of its own accord (i.e., without
prior agreement or notification).
By heterogeneous we mean that different components may
have different capabilities (e.g., performance).
12. 12
Independent, Self-Sufficient, Autonomous, Heterogeneous
There are many sources of heterogeneity:
Different hardware
Different software
Different software interfaces
The above in combination, and more.
Such differences may cause interacting components to drift
further apart in time.
Also, failures cause components to have to deal with a gap in
their knowledge of the current system state.
Given a system, the more spatially distant its components, the
more representative of a distributed system it is.
13. 13
Characteristics of Distributed Systems
differences between the computers and the way they
communicate are hidden from users.
users and applications can interact with a distributed
system in a consistent and uniform way regardless of
location.
distributed systems should be easy to expand and scale.
a distributed system is normally continuously available,
even if there may be partial failures
14. 14
Why Distributed?
Resource and Data Sharing
printers, databases, multimedia servers, ...
Availability, Reliability
the loss of some instances can be hidden
Scalability, Extensibility
the system grows with demand (e.g., extra servers)
Performance
huge power (CPU, memory, ...) available
Inherent distribution, communication
organizational distribution, e-mail, video
15. 15
Problems of Distribution
Concurrency, Security
clients must not disturb each other
Privacy
e.g., when building a preference profile
unwanted communication such as spam
Partial failure
we often do not know where the error is (e.g., RPC)
Location, Migration, Replication
clients must be able to find their servers
Heterogeneity
hardware, platforms, languages, management
16. 16
Organization of a Distributed System
To support heterogeneous computers and networks and
to provide a single-system view, a distributed system is
often organized by means of a layer of software called
middleware that extends over multiple machines
17. 17
A distributed system organized as middleware; note that the
middleware layer extends over multiple machines
18. 18
Goals of a Distributed System
A distributed system should
– easily connect users with resources (printers, computers,
storage facilities, data, files, Web pages, ...)
reasons: economics, to collaborate and exchange
information
– be transparent: hide the fact that the resources and
processes are distributed across multiple computers
– be open
– be scalable
19. 19
Transparency in a Distributed System
a distributed system that is able to present itself to users
and applications as if it were only a single computer
system is said to be transparent
20. 20
Different forms of transparency in a distributed system
Transparency | Description
Access | Hide differences in data representation (endianness, file naming, ...) and how a resource is accessed
Location | Hide where a resource is physically located; where is http://www.prenhall.com/index.html? (naming)
Migration | Hide that a resource may move to another location
Relocation | Hide that a resource may be moved to another location while in use; e.g., mobile users using their wireless laptops
Replication | Hide that a resource is replicated
Concurrency | Hide that a resource may be shared by several competitive users; a resource must be left in a consistent state
Failure | Hide the failure and recovery of a resource
Persistence | Hide whether a (software) resource is in memory or on disk
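Access transparency includes hiding differences in data representation such as endianness. The following is a minimal sketch, assuming a made-up record of two integers (the names `encode_record` and `decode_record` are illustrative, not from the slides), of how a sender and receiver can agree on network byte order with Python's `struct` module so that neither side needs to know the other's native format.

```python
import struct

def encode_record(account_id, balance_cents):
    """Serialize two integers in network (big-endian) byte order."""
    # "!" selects network byte order with no padding;
    # "I" is a 4-byte unsigned int, "q" an 8-byte signed int.
    return struct.pack("!Iq", account_id, balance_cents)

def decode_record(payload):
    """Inverse of encode_record; works the same on any host architecture."""
    return struct.unpack("!Iq", payload)

# Hypothetical record used only for illustration.
wire = encode_record(42, 150_000)
decoded = decode_record(wire)
```

Because the byte order is fixed by the format string rather than by the host CPU, a little-endian and a big-endian machine exchange exactly the same bytes.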
22. 22
Openness in a Distributed System
A distributed system should be open.
we need well-defined interfaces.
Interoperability
components of different origin can communicate
Portability
components work on different platforms
Another goal of an open distributed system is that it should be
flexible and extensible; easy to configure the system out of
different components; easy to add new components, replace
existing ones
An Open Distributed System is a system that offers services
according to standard rules that describe the syntax and
semantics of those services; e.g., protocols in networks
23. 23
Scalability in Distributed Systems
a distributed system should be scalable
size: adding more users and resources to the system
geographically: users and resources may be far apart
administratively: should be easy to manage even if it spans
many administrative organizations
24. 24
Scalability problems
[Table: examples of scalability limitations]
Scaling Techniques
• how to solve scaling problems
• the problem is mainly performance, and arises as a result
of limitations in the capacity of servers and networks (for
geographical scalability)
• three possible solutions: hiding communication latencies,
distribution, and replication
25. 25
a. Hide Communication Latencies
try to avoid waiting for responses to remote service
requests
let the requester do other useful work
i.e., construct requesting applications that use only
asynchronous communication instead of synchronous
communication; when a reply arrives the application is
interrupted
good for batch processing and parallel applications but
not for interactive applications
for interactive applications, move part of the job to the
client to reduce communication; e.g., filling in a form and
checking the entries
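The idea of letting the requester do other work while a remote request is outstanding can be sketched as follows. This is only an illustration: `remote_request` is a hypothetical stand-in that simulates network latency with a sleep, and a thread pool plays the role of the asynchronous communication layer.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_request(x):
    """Stand-in for a remote service call; the sleep simulates network latency."""
    time.sleep(0.2)
    return x * x

def local_work():
    """Useful work the requester can do while the reply is in flight."""
    return sum(range(1000))

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(remote_request, 7)   # returns immediately
    local_result = local_work()                # overlaps with the remote call
    remote_result = pending.result()           # block only when the reply is needed
```

A synchronous version would call `remote_request(7)` directly and sit idle for the whole round trip before starting `local_work()`.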
26. 26
(a) a server checking the correctness of field entries
(b) a client doing the job
• e.g., shipping code is now supported in Web applications using Java
Applets
27. 27
b. Distribution
– e.g., DNS (Domain Name System)
– divide the name space into zones
[Figure: an example of dividing the DNS name space into zones]
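How dividing a name space into zones spreads the lookup work can be sketched with a toy resolver. This is not the real DNS protocol: the zone layout, server names, and the address (taken from the documentation range 192.0.2.0/24) are all assumptions made for illustration.

```python
# Each zone is served by its own (hypothetical) name server, modeled as a dict.
# A zone either holds the record or delegates to the server of a child zone.
ZONES = {
    "root": {"delegate": {"org": "org-server", "com": "com-server"}},
    "org-server": {"delegate": {"example.org": "example-server"}},
    "com-server": {"delegate": {}},
    "example-server": {"records": {"www.example.org": "192.0.2.10"}},
}

def resolve(name):
    """Walk from the root zone down the delegation chain; return (address, path)."""
    server, path = "root", ["root"]
    while True:
        zone = ZONES[server]
        if name in zone.get("records", {}):
            return zone["records"][name], path
        # Follow the delegation whose suffix matches the name, if any.
        nxt = next((srv for suffix, srv in zone.get("delegate", {}).items()
                    if name == suffix or name.endswith("." + suffix)), None)
        if nxt is None:
            return None, path
        server = nxt
        path.append(server)

address, servers_contacted = resolve("www.example.org")
missing, servers_for_missing = resolve("www.example.com")
```

No single server holds the whole name space; each one answers only for its own zone and refers the rest onward.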
28. 28
c. Replication
Replicate components across a distributed system to
increase availability and for load balancing, leading to
better performance
Decided by the owner of a resource
Caching (a special form of replication) also reduces
communication latency; decided by the user
But, caching and replication may lead to consistency
problems
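The consistency problem introduced by caching can be shown with a small simulation. The classes `Origin` and `CachingClient` and the `price` key are hypothetical; the explicit `invalidate` call stands in for whatever consistency mechanism a real system would use.

```python
class Origin:
    """The authoritative copy of a resource, controlled by its owner."""
    def __init__(self):
        self.data = {"price": 100}

class CachingClient:
    """A client-side cache: fast reads, but copies can become stale."""
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def read(self, key):
        if key not in self.cache:          # miss: one remote access
            self.cache[key] = self.origin.data[key]
        return self.cache[key]             # hit: no communication at all

    def invalidate(self, key):
        self.cache.pop(key, None)          # forces the next read to refetch

origin = Origin()
client = CachingClient(origin)
first = client.read("price")       # fetched from the origin
origin.data["price"] = 120         # the owner updates the original
stale = client.read("price")       # the cache still returns the old value
client.invalidate("price")
fresh = client.read("price")       # consistent again after invalidation
```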
29. 29
Types of Distributed Systems
There are three types of distributed systems:
• Distributed computing systems
• Distributed information systems
• Distributed pervasive/embedded systems
30. 30
1. Distributed Computing Systems
Used for high-performance computing tasks
Two types: Cluster computing and Grid computing
Cluster Computing
A collection of similar workstations or PCs
(homogeneous), closely connected by means of a
high-speed LAN
Each node runs the same operating system
Used for parallel programming in which a single
compute-intensive program is run in parallel on
multiple machines
31. 31
[Figure: an example of a cluster computing system]
a master node runs middleware (containing
libraries for parallel programs) and controls the other
compute nodes;
it allocates tasks, provides an interface to users, etc.
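The master/compute-node split can be sketched as below. This is only an analogy: worker threads in one process stand in for compute nodes, and the function names `master` and `compute_node` are hypothetical. A real cluster would typically use a message-passing library across machines.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_node(chunk):
    """Work done by one compute node: here, a partial sum of squares."""
    return sum(v * v for v in chunk)

def master(data, n_nodes=4):
    """Split the job, hand one chunk to each node, and combine the results."""
    chunks = [data[i::n_nodes] for i in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as nodes:
        partials = list(nodes.map(compute_node, chunks))
    return sum(partials)

total = master(list(range(1, 101)))
```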
32. 32
Grid Computing
“Resource sharing and coordinated problem solving in
dynamic, multi-institutional virtual organizations” (I. Foster)
high degree of heterogeneity: no assumptions are made
concerning hardware, operating systems, networks,
administrative domains, security policies, etc.
33. 33
2. Distributed Information Systems
Problem: many networked applications exist, but making
them interoperate is difficult
At the lowest level: wrap a number of requests into a
single larger request and have it executed as a
distributed transaction
How to let applications communicate directly with
each other, i.e., Enterprise Application Integration
(EAI)
34. 34
e.g., Assume the following banking operation
withdraw an amount x from account 1
deposit the amount x to account 2
what happens if there is a problem after the first activity is
carried out?
group the two operations into one transaction; either both
are carried out or neither
we need a way to roll back when a transaction is not
completed
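The banking example can be sketched with an in-memory SQLite database, whose connection object commits or rolls back a group of updates as a unit. The table layout, starting balances, and the `fail_midway` flag are assumptions made for illustration; the simulated crash is raised between the withdrawal and the deposit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            if fail_midway:
                raise RuntimeError("simulated crash after the withdrawal")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

failed_ok = transfer(conn, 1, 2, 30, fail_midway=True)
after_failure = balances(conn)      # withdrawal was rolled back
succeeded_ok = transfer(conn, 1, 2, 30)
after_success = balances(conn)      # both updates applied
```

In both outcomes the total amount of money across the two accounts is unchanged.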
35. 35
Properties of transactions, often referred to as ACID
1. Atomic: to the outside world, the transaction happens indivisibly; a
transaction either happens completely or not at all; intermediate
states are not seen by other processes
2. Consistent: the transaction does not violate system invariants;
e.g., in an internal transfer in a bank, the amount of money in the
bank must be the same as it was before the transfer (the law of
conservation of money); this may be violated for a brief period of
time, but the violation is not visible to other processes
3. Isolated or Serializable: concurrent transactions do not interfere
with each other; if two or more transactions are running at the
same time, the final result must look as though all transactions run
sequentially in some order
4. Durable: once a transaction commits, the changes are permanent
36. 36
Enterprise Application Integration
how to integrate applications independently of their
databases
transaction systems rely on request/reply
how can applications communicate with each other
[Figure: middleware as a communication facilitator in enterprise application
integration]
37. 37
There are different communication models
RPC (Remote Procedure Call)
RMI (Remote Method Invocation)
MOM (Message-Oriented Middleware)
3. Distributed Pervasive Systems
The distributed systems discussed so far are characterized
by their stability; fixed nodes having high-quality connection
to a network
There are also mobile and embedded computing devices
with wireless connections
38. 38
Three requirements for pervasive applications
Embrace contextual changes: a device is aware that its
environment may change all the time
Encourage ad hoc composition: devices are used in
different ways by different users
Recognize sharing as the default: devices join a system to
access or provide information
Examples of pervasive systems
Home Systems
Electronic Health Care Systems
Sensor Networks
39. 39
Hardware and Software Concepts
Hardware Concepts
• Different classification schemes exist
– multiprocessors - with shared memory
– multicomputers - that do not share memory
» can be homogeneous or heterogeneous
41. 41
Multiprocessors - Shared Memory
the shared memory has to be coherent - a value
written by one processor must be the value subsequently
read by any other processor
performance problem for bus-based organization since the
bus will be overloaded as the number of processors
increases
the solution is to add a high-speed cache memory
between the processors and the bus to hold the most
recently accessed words; may result in incoherent
memory
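Why private caches can leave memory incoherent can be seen in a small simulation. The classes below are hypothetical and deliberately omit any coherence protocol; the manual flush and invalidation at the end show the steps such a protocol would automate.

```python
class Memory:
    """Shared main memory holding one word at address 0."""
    def __init__(self):
        self.words = {0: 1}

class CachedCPU:
    """A processor with a private write-back cache and no coherence protocol."""
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory.words[addr]   # miss: go over the bus
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value    # only the local cache is updated

    def flush(self, addr):
        self.memory.words[addr] = self.cache[addr]       # write back to memory

mem = Memory()
cpu_a, cpu_b = CachedCPU(mem), CachedCPU(mem)
cpu_b.read(0)                 # B caches the old value 1
cpu_a.write(0, 2)             # A updates only its own cache
seen_by_b = cpu_b.read(0)     # B still sees the old value: incoherent
cpu_a.flush(0)                # A writes the new value back
cpu_b.cache.pop(0, None)      # B's stale copy is invalidated
coherent_value = cpu_b.read(0)
```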
42. 42
[Figure: a bus-based multiprocessor]
bus-based multiprocessors are difficult to scale even with
caches
two possible solutions: crossbar switch and omega network
43. 43
Crossbar switch
divide memory into modules and connect them to the
processors with a crossbar switch
at every intersection, a crosspoint switch can be opened or closed
to establish a connection
problem: expensive; with n CPUs and n memories, n² switches
are required
44. 44
Omega network
use switches with multiple input and output lines
drawback: high latency because of several switching stages
between the CPU and memory
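The cost and latency trade-off between the two interconnects can be made concrete with a short calculation. The omega network figures assume it is built from 2x2 switches and that n is a power of two; the slides only say that the switches have multiple input and output lines, so this is one common configuration rather than the only one.

```python
import math

def crossbar_switches(n):
    """n CPUs and n memory modules need one crosspoint per (CPU, memory) pair."""
    return n * n

def omega_stages(n):
    """Number of switching stages between a CPU and a memory (2x2 switches)."""
    stages = int(math.log2(n))
    if 2 ** stages != n:
        raise ValueError("n must be a power of two")
    return stages

def omega_switches(n):
    """Each of the log2(n) stages contains n/2 two-by-two switches."""
    return (n // 2) * omega_stages(n)

# Switch counts for a few system sizes: (n, crossbar, omega, stages)
comparison = [(n, crossbar_switches(n), omega_switches(n), omega_stages(n))
              for n in (8, 64, 1024)]
```

The crossbar grows quadratically while the omega network grows as (n/2)·log2(n), but every memory access in the omega network passes through log2(n) stages, which is the source of its higher latency.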
45. 45
Software Concepts
– OSs in relation to distributed systems
• Tightly-coupled systems, referred to as distributed OSs
(DOS)
– the OS tries to maintain a single, global view of the resources
it manages
– used for multiprocessors and homogeneous multicomputers
• Loosely-coupled systems, referred to as network OSs
(NOS)
– a collection of computers each running its own OS; they work
together to make their services and resources available to
others
– used for heterogeneous multicomputers
– Middleware: to enhance the services of NOSs so that better
support for distribution transparency is provided
46. 46
Summary of main issues
an overview of DOSs, NOSs, and middleware
System | Description | Main Goal
DOS | Tightly-coupled operating system for multiprocessors and homogeneous multicomputers | Hide and manage hardware resources
NOS | Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) | Offer local services to remote clients
Middleware | Additional layer atop NOS implementing general-purpose services | Provide distribution transparency
47. 47
Distributed Operating Systems
two types
– multiprocessor operating system: to manage the resources of
a multiprocessor
– multicomputer operating system: for homogeneous
multicomputers
Uniprocessor Operating Systems
[Figure: separating applications from operating system code through a
microkernel]
48. 48
• Multiprocessor Operating Systems
extended uniprocessor operating systems to support
multiple processors having access to a shared memory
A protection mechanism is required for concurrent access
to guarantee consistency.
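A protection mechanism for concurrent access to shared memory can be sketched with a mutual-exclusion lock. Threads here play the role of processors sharing one memory, and `SharedCounter` is a hypothetical example of a shared data structure.

```python
import threading

class SharedCounter:
    """A word in shared memory, guarded by a mutual-exclusion lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:        # only one thread at a time may update the value
            self.value += 1

def worker(counter, iterations):
    for _ in range(iterations):
        counter.increment()

counter = SharedCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the read-modify-write in `increment` could interleave between threads and lose updates.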
49. 49
• Multicomputer Operating Systems
[Figure: general structure of a multicomputer operating system]
processors cannot share memory; instead communication is
through message passing
each node has its own
– kernel for managing local resources
– separate module for handling interprocess communication
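Communication by message passing instead of shared memory can be sketched as follows. A thread with a private table stands in for a remote node, and two queues stand in for the network; the message format `(operation, key)` is an assumption made for illustration.

```python
import queue
import threading

def server_node(inbox, outbox):
    """Owns a private table; other nodes can reach it only via messages."""
    table = {"x": 41}                  # local memory, never accessed directly by others
    while True:
        msg = inbox.get()
        if msg == "stop":
            break
        op, key = msg
        if op == "incr":
            table[key] += 1
        outbox.put(table[key])         # reply with the current value

requests, replies = queue.Queue(), queue.Queue()
node = threading.Thread(target=server_node, args=(requests, replies))
node.start()

requests.put(("incr", "x"))            # ask the node to update its own state
after_incr = replies.get(timeout=5)
requests.put(("read", "x"))
after_read = replies.get(timeout=5)

requests.put("stop")
node.join()
```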
50. 50
Distributed Shared Memory Systems
how to emulate shared memories on distributed systems to
provide a virtual shared memory
page-based distributed shared memory (DSM) - use the virtual
memory capabilities of each individual node
[Figure: pages of address space distributed among four machines]
51. 51
[Figure: situation after CPU 1 references page 10]
[Figure: situation if page 10 is read-only and replication is used]
read-only pages can be easily replicated
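The two situations in the figures, a page moving to the referencing machine versus a read-only page being replicated, can be sketched with a toy model. The `DSM` class and its bookkeeping are hypothetical; a real page-based DSM relies on the node's virtual memory hardware to trap the fault.

```python
class DSM:
    """Toy page-based distributed shared memory over several machines."""
    def __init__(self, placement, read_only=()):
        # placement maps page number -> machine initially holding the page
        self.holders = {page: {m} for page, m in placement.items()}
        self.read_only = set(read_only)
        self.faults = 0

    def reference(self, machine, page):
        """A machine touches a page; a miss triggers a (simulated) page fault."""
        if machine in self.holders[page]:
            return "local"
        self.faults += 1
        if page in self.read_only:
            self.holders[page].add(machine)    # replicate: both copies stay valid
            return "replicated"
        self.holders[page] = {machine}         # migrate: the single copy moves
        return "migrated"

# Page 10 starts on machine 2 and is then referenced by CPU 1.
writable = DSM({10: 2})
writable_outcome = writable.reference(1, 10)

readonly = DSM({10: 2}, read_only={10})
readonly_outcome = readonly.reference(1, 10)
second_access = readonly.reference(1, 10)      # now served locally
```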
52. 52
• Network Operating Systems
– possibly heterogeneous underlying hardware
– constructed from a collection of uniprocessor systems, each with
its own operating system and connected to each other in a
computer network
[Figure: general structure of a network operating system]
53. 53
• Services offered by network operating systems
– remote login (rlogin)
– remote file copy (rcp)
– shared file systems through file servers
54. 54
Middleware
a distributed operating system is not intended to handle a
collection of independent computers but provides
transparency and ease of use
a network operating system does not provide a view of a
single coherent system but is scalable and open
combine the scalability and openness of network operating
systems and the transparency and ease of use of
distributed operating systems
this is achieved through middleware, another layer of
software
56. 56
• different middleware models exist
– treat every resource as a file; just as in UNIX
– through Remote Procedure Calls (RPCs) - calling a procedure on
a remote machine
– distributed object invocation
middleware services
access transparency: by hiding the low-level message
passing
naming: such as a URL in the WWW
distributed transactions: by allowing multiple read and
write operations to occur atomically
security
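How RPC-style middleware hides low-level message passing can be sketched with a pair of stubs. Everything here is hypothetical: the `add` procedure, the JSON message format, and the `transport` callable, which in this sketch calls the server stub directly instead of sending bytes over a network.

```python
import json

def add(a, b):
    """A procedure exported by the (hypothetical) server."""
    return a + b

PROCEDURES = {"add": add}

def server_dispatch(request_bytes):
    """Server stub: unmarshal the request, call the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

class ClientStub:
    """Client stub: makes a remote procedure look like a local method call."""
    def __init__(self, transport):
        self._transport = transport    # any callable that carries bytes to the server

    def __getattr__(self, proc):
        def call(*args):
            request = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
            reply = self._transport(request)
            return json.loads(reply.decode("utf-8"))["result"]
        return call

remote = ClientStub(server_dispatch)
rpc_result = remote.add(2, 3)          # reads like a local call
```

The caller never sees the request and reply messages, which is the access transparency the middleware is meant to provide.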
57. 57
Middleware and Openness
in an open middleware-based distributed system, the protocols
used by each middleware layer should be the same, as well as
the interfaces they offer to applications