2. Potential Benefits, Limits and Costs of Parallel Programming
Amdahl's Law
• Amdahl's Law states that potential program speedup is defined by the fraction of code (P) that can be parallelized:
speedup = 1 / (1 - P)
3. Potential Benefits, Limits and Costs of Parallel Programming
Amdahl's Law
[Figure: speedup when introducing more processors]
5. Amdahl's Law
• If none of the code can be parallelized, P = 0 and the speedup = 1 (no speedup).
• If all of the code is parallelized, P = 1 and the speedup is infinite (in theory).
• If 50% of the code can be parallelized, maximum speedup = 2, meaning the code will run twice as fast.
• Introducing the number of processors performing the parallel fraction of work, the relationship can be modeled by:
speedup = 1 / (P/N + S)
• where P = parallel fraction, N = number of processors and S = serial fraction.
6. Amdahl's Law
• It soon becomes obvious that there are limits to the scalability of parallelism.
• For example:
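A minimal sketch (added here for illustration, not part of the original slides) that evaluates the formula above for a few parallel fractions, showing how the achievable speedup flattens out as N grows:

def amdahl_speedup(p, n):
    # Speedup predicted by Amdahl's Law: 1 / (P/N + S), with S = 1 - P.
    s = 1.0 - p
    return 1.0 / (p / n + s)

if __name__ == "__main__":
    fractions = (0.50, 0.90, 0.95, 0.99)
    print("N".ljust(8) + "".join(f"P={p:.2f}".ljust(10) for p in fractions))
    for n in (10, 100, 1000, 10000, 100000):
        row = "".join(f"{amdahl_speedup(p, n):.2f}".ljust(10) for p in fractions)
        print(str(n).ljust(8) + row)

For P = 0.95 the speedup never passes 1/0.05 = 20 no matter how large N becomes, which is the limit the quote on the next slide refers to.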
7.
• "Famous" quote: You can spend a lifetime getting 95% of your code to be parallel, and never achieve better than 20x speedup no matter how many processors you use!
• However, certain problems demonstrate increased performance by increasing the problem size.
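A small illustrative sketch of that last point (the timings below are made-up numbers, not from the slides): if the serial portion stays fixed while the parallelizable work grows with the problem size, the effective parallel fraction P rises and the achievable speedup limit rises with it.

def parallel_fraction(serial_time, parallel_time):
    # Effective parallel fraction when serial work is fixed and parallel work grows.
    return parallel_time / (serial_time + parallel_time)

if __name__ == "__main__":
    serial_time = 15.0                           # fixed serial portion (arbitrary units)
    for parallel_time in (85.0, 340.0, 1360.0):  # parallel portion grows with problem size
        p = parallel_fraction(serial_time, parallel_time)
        print(f"parallel work {parallel_time:7.1f}  P = {p:.3f}  speedup limit = {1 / (1 - p):.1f}x")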
8. Complexity
• In general, parallel applications are more complex than corresponding serial applications.
• Not only do you have multiple instruction streams executing at the same time, but you also have data flowing between them.
• The costs of complexity are measured in programmer time in virtually every aspect of the software development cycle:
  - Design
  - Coding
  - Debugging
  - Tuning
  - Maintenance
• Adhering to "good" software development practices is essential when developing parallel applications.
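For a concrete sense of the added moving parts, here is a minimal sketch (the function names and the work being summed are illustrative, not from the slides) of a serial sum next to a parallel version in which several worker processes compute partial results that must then flow back to the parent:

from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    # One "instruction stream": sums the squares in its own index range.
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def serial_total(n):
    return partial_sum((0, n))

def parallel_total(n, workers=4):
    # Extra work the serial version never needs: partition the range,
    # create the workers, and collect the partial results flowing back.
    step = n // workers
    chunks = [(k * step, n if k == workers - 1 else (k + 1) * step)
              for k in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    n = 1_000_000
    assert serial_total(n) == parallel_total(n)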
9. Portability
• Thanks to standardization in several APIs, such as MPI, OpenMP and POSIX threads, portability issues with parallel programs are not as serious as in years past.
• All the usual portability issues associated with serial programs apply to parallel programs.
• Even though standards exist for several APIs, implementations will differ in a number of details, sometimes to the point of requiring code modifications in order to achieve portability.
• Operating systems can play a key role in code portability issues.
• Hardware architectures are characteristically highly variable and can affect portability.
10. Resource Requirements
• The primary intent of parallel programming is to decrease execution wall clock time; however, in order to accomplish this, more CPU time is required.
• For example, a parallel code that runs in 1 hour on 8 processors actually uses 8 hours of CPU time.
• The amount of memory required can be greater for parallel codes than serial codes, due to the need to replicate data and for overheads associated with parallel support libraries and subsystems.
• For short running parallel programs, there can be a decrease in performance compared to a similar serial implementation.
• The overhead costs associated with setting up the parallel environment, task creation, communications and task termination can comprise a significant portion of the total execution time for short runs.
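A rough sketch of why short runs suffer, using a simple assumed cost model (the two-second overhead and the run times are invented for illustration): the fixed setup, task creation, communication and termination cost dominates when the computation itself is brief.

def parallel_wall_clock(serial_seconds, n_procs, overhead_seconds=2.0):
    # Assumed model: fixed parallel overhead plus the serial work split across N processors.
    return overhead_seconds + serial_seconds / n_procs

if __name__ == "__main__":
    for serial_seconds in (1.0, 4000.0):   # a very short run vs. a long run
        t_par = parallel_wall_clock(serial_seconds, n_procs=8)
        print(f"serial {serial_seconds:7.1f}s  parallel on 8 procs {t_par:7.2f}s  "
              f"speedup {serial_seconds / t_par:5.2f}x")

With these made-up numbers the one-second job actually runs slower in parallel, while the long job approaches the ideal 8x.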
11. Scalability
• High performance computing (HPC) clusters are able to solve big problems using a large number of processors.
• This is also known as parallel computing, where many processors work simultaneously to produce exceptional computational power and to significantly reduce the total computational time.
• In such scenarios, scalability or scaling is widely used to indicate the ability of hardware and software to deliver greater computational power when the amount of resources is increased.
12. Scalability
• For HPC clusters, it is important that they are scalable; in other words, the capacity of the whole system can be proportionally increased by adding more hardware.
• For software, scalability is sometimes referred to as parallelization efficiency: the ratio between the actual speedup and the ideal speedup obtained when using a certain number of processors.
13. Software scalability
• The speedup in parallel computing can be straightforwardly defined as
speedup = t1 / tN,
where t1 is the computational time for running the software using one processor, and tN is the computational time running the same software with N processors.
• Ideally, we would like software to have a linear speedup that is equal to the number of processors (speedup = N), as that would mean that every processor would be contributing 100% of its computational power.
• Unfortunately, this is a very challenging goal for real applications to attain.
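A minimal sketch (the timing values are placeholders, not measurements from the slides) computing the speedup defined above together with the parallelization efficiency from slide 12, i.e. the actual speedup divided by the ideal speedup N:

def speedup(t1, tn):
    # speedup = t1 / tN
    return t1 / tn

def efficiency(t1, tn, n):
    # parallelization efficiency = actual speedup / ideal speedup (N)
    return speedup(t1, tn) / n

if __name__ == "__main__":
    t1 = 1200.0                               # placeholder time on 1 processor (seconds)
    timings = {2: 630.0, 8: 175.0, 32: 55.0}  # placeholder times on N processors (seconds)
    for n, tn in timings.items():
        print(f"N={n:<3} speedup={speedup(t1, tn):6.2f}  efficiency={efficiency(t1, tn, n):5.2f}")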
14. Software scalability
• In 1967, Amdahl pointed out that the speedup is limited by the fraction of the serial part of the software that is not amenable to parallelization.
• Amdahl's law can be formulated as follows:
speedup = 1 / (s + p / N)
• Amdahl's law states that, for a fixed problem, the upper limit of speedup is determined by the serial fraction of the code.
• This is called strong scaling and can be explained by the following example.
15. Strong scalability
• Consider a program that takes 20 hours to run using a single processor core.
• If a particular part of the program, which takes one hour to execute, cannot be parallelized (s = 1/20 = 0.05), and if the code that takes up the remaining 19 hours of execution time can be parallelized (p = 1 - s = 0.95), then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour.
• Hence, the theoretical speedup is limited to at most 20 times (as N → ∞, speedup = 1/s = 20). As such, the parallelization efficiency decreases as the amount of resources increases.
• For this reason, parallel computing with many processors is useful only for highly parallelized programs.
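A short sketch working through this example (the processor counts are arbitrary choices): with s = 0.05 the speedup creeps toward 20 while the efficiency falls away.

def amdahl(s, n):
    # Strong-scaling speedup for serial fraction s on n processors.
    return 1.0 / (s + (1.0 - s) / n)

if __name__ == "__main__":
    s = 0.05   # the one non-parallelizable hour out of twenty
    for n in (8, 64, 512, 4096):
        sp = amdahl(s, n)
        print(f"N={n:<5} speedup={sp:6.2f}  efficiency={sp / n:6.3f}")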
16. Strong scalability
Amdahl's law gives the upper limit of speedup for a problem of fixed size. This seems to be a bottleneck for parallel computing; if one would like to gain a 500 times speedup on 1000 processors, Amdahl's law requires that the proportion of the serial part cannot exceed 0.1%.
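To see where the 0.1% figure comes from, set the target speedup in the formula from slide 14 and solve for the serial fraction s (a worked check added here, not part of the original text):

500 = 1 / (s + (1 - s)/1000)
s + (1 - s)/1000 = 1/500 = 0.002
s (1 - 1/1000) = 0.002 - 0.001 = 0.001
s = 0.001 / 0.999 ≈ 0.001, i.e. roughly 0.1%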
17. Scalability
• The ability of a parallel program's performance to scale is a result of a number of interrelated factors. Simply adding more processors is rarely the answer.
• The algorithm may have inherent limits to scalability. At some point, adding more resources causes performance to decrease. This is a common situation with many parallel applications.
• Hardware factors play a significant role in scalability. Examples:
  - Memory-CPU bus bandwidth on an SMP machine
  - Communications network bandwidth
  - Amount of memory available on any given machine or set of machines
  - Processor clock speed
• Parallel support libraries and subsystems software