PARALLEL
PROCESSING
CONCEPTS
Prof. Shashikant V. Athawale
Assistant Professor | Computer Engineering
Department | AISSMS College of Engineering,
Kennedy Road, Pune , MH, India - 411001
Contents
2
 Introduction to Parallel Computing
 Motivating Parallelism
 Scope of Parallel Computing
 Parallel Programming Platforms
 Implicit Parallelism
 Trends in Microprocessor and Architectures
 Limitations of Memory System Performance
 Dichotomy of Parallel Computing Platforms
 Physical Organization of Parallel Platforms
 Communication Costs in Parallel Machines
 Scalable design principles
 Architectures: N-wide superscalar architectures
 Multi-core architectures.
Introduction to Parallel
Computing
3
A parallel computer is a “Collection of processing
elements that communicate and co-operate to solve large
problems fast”.
Processing of multiple tasks simultaneously on
multiple processors is called parallel processing.
What is Parallel Computing?
Traditionally, software has been written for serial computation:
To be run on a single computer having a single Central Processing Unit (CPU)
What is Parallel Computing?
In the simplest sense, parallel computing is the simultaneous use of
multiple compute resources to solve a computational problem.
Serial Vs Parallel Computing
Figure: serial computing alternates Fetch/Store and Compute steps on a single
processor, while parallel computing adds a Communicate step among processors
that cooperate on the problem (a cooperative game).
Motivating Parallelism
7
The role of parallelism in accelerating computing
speeds has been recognized for several decades.
Its role in providing multiplicity of datapaths and
increased access to storage elements has been
significant in commercial applications.
The scalable performance and lower cost of parallel
platforms are reflected in the wide variety of applications that use them.
8
Developing parallel hardware and software has traditionally
been time and effort intensive.
If one is to view this in the context of rapidly improving
uniprocessor speeds, one is tempted to question the need for
parallel computing.
Uniprocessor performance gains, however, are themselves constrained
by a number of fundamental physical and computational limitations.
The emergence of standardized parallel programming
environments, libraries, and hardware has significantly
reduced the time to (parallel) solution.
In short
9
1. Overcome limits to serial computing
2. Limits on increasing transistor density
3. Limits to data transmission speed
4. Faster turn-around time
5. Solve larger problems
Scope of Parallel Computing
10
 Parallel computing has a great impact on a wide range of
applications:
 Commercial
 Scientific
 Turnaround time should be minimal
 High performance
 Resource management
 Load balancing
 Dynamic library
 Minimal network congestion and latency
Applications
 Commercial computing.
- Weather forecasting
- Remote sensors, Image processing
- Process optimization, operations research.
 Scientific and Engineering applications.
- Computational chemistry
- Molecular modelling
- Structural mechanics
 Business applications.
- E-Governance
- Medical Imaging
 Internet applications.
- Internet server
- Digital Libraries
11
Parallel Programming
Platforms
12
 The main objective is to give the programmer sufficient
detail to write efficient code on a variety of platforms.
 Performance of various parallel
algorithms.
Implicit Parallelism
A programming language is said to be
implicitly parallel if its compiler or interpreter
can recognize opportunities for
parallelization and implement them without
being told to do so.
13
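As an illustrative sketch (not from the slides; the file and function name saxpy are hypothetical), the plain C loop below can be parallelized by an auto-parallelizing or auto-vectorizing compiler, for example gcc at -O3 with -ftree-vectorize, without any directive from the programmer, which is the essence of implicit parallelism:

    /* saxpy.c (hypothetical example): nothing in this code mentions
       parallelism; the compiler is free to recognize that the iterations
       are independent and map them onto SIMD lanes or multiple cores. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }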
Implicitly parallel programming
languages
 Examples of implicitly parallel programming languages:
 Microsoft Axum
 MATLAB's M-code
 ZPL
 Laboratory Virtual Instrument Engineering
Workbench (LabVIEW)
 NESL
 SISAL
 High-Performance Fortran (HPF)
14
Dichotomy of Parallel
Computing Platforms
 We first explore a dichotomy based on the logical and
physical organization of parallel platforms.
 The logical organization refers to a programmer's
view of the platform while the physical organization
refers to the actual hardware organization of the
platform.
 The two critical components of parallel computing
from a programmer's perspective are ways of
expressing parallel tasks and mechanisms for
specifying interaction between these tasks.
 The former is sometimes also referred to as the
control structure and the latter as the communication
model.
15
Control Structure of Parallel Platforms
16
Parallel tasks can be specified at various levels of granularity. At
one extreme, each program in a set of programs can be viewed as a
single parallel task; at the other extreme, individual instructions within a program
can be viewed as parallel tasks. Between these extremes lies a
range of models for specifying the control structure of programs
and the corresponding architectural support for them.
Parallelism from single instruction on multiple processors
Consider the following code segment that adds two vectors:

for (i = 0; i < 1000; i++)
    c[i] = a[i] + b[i];

In this example, the iterations of the loop are independent
of each other; i.e., c[0] = a[0] + b[0], c[1] = a[1] + b[1], etc., can all be
executed independently of each other. Consequently, if there is a mechanism for executing the same
instruction, in this case an add, on all the processors with appropriate data, we
could execute this loop much faster.
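One common way to make that parallelism explicit (not shown on the slide, and assuming a compiler with OpenMP support, e.g. gcc -fopenmp) is an OpenMP work-sharing directive that splits the independent iterations across the available processors:

    #include <omp.h>
    #define N 1000

    int main(void)
    {
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

        /* The iterations are independent, so the runtime may divide
           the index range 0..N-1 among the processors/cores. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        return 0;
    }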
A typical SIMD architecture (a) and a typical MIMD
architecture (b).
17
Figure: A typical SIMD architecture (a) and a typical MIMD architecture (b).
Executing a conditional statement on an SIMD computer
with four processors: (a) the conditional statement; (b) the
execution of the statement in two steps.
18
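The figure itself is not reproduced here; the sketch below (plain C, with illustrative variable names and the commonly used condition B == 0) shows the idea: an SIMD machine serializes the two branches into two steps, masking off the processing elements for which a step does not apply:

    #include <stdio.h>

    int main(void)
    {
        int A[4] = {4, 6, 8, 10}, B[4] = {0, 2, 0, 5}, C[4];
        int active[4];               /* stands in for the hardware activity mask */

        /* Step 1: only PEs whose data satisfies the condition stay active. */
        for (int pe = 0; pe < 4; pe++) active[pe] = (B[pe] == 0);
        for (int pe = 0; pe < 4; pe++) if (active[pe]) C[pe] = A[pe];

        /* Step 2: flip the mask; the remaining PEs execute the else-branch. */
        for (int pe = 0; pe < 4; pe++) active[pe] = !active[pe];
        for (int pe = 0; pe < 4; pe++) if (active[pe]) C[pe] = A[pe] / B[pe];

        for (int pe = 0; pe < 4; pe++) printf("C[%d] = %d\n", pe, C[pe]);
        return 0;
    }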
Communication Model of Parallel Platforms
19
Shared-Address-Space Platforms
Typical shared-address-space architectures: (a) Uniform-memory-access
shared-address-space computer; (b) Uniform-memory-access shared-
address-space computer with caches and memories; (c) Non-uniform-
memory-access shared-address-space computer with local memory only.
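As a minimal sketch of the shared-address-space model (not from the slides; the thread count, data, and function names are illustrative), the POSIX threads program below lets two threads read a shared array directly and combine partial results through a shared variable protected by a mutex:

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000
    #define NTHREADS 2

    static int data[N];                 /* shared address space: visible to all threads */
    static long total = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *partial_sum(void *arg)
    {
        int id = *(int *)arg;
        long local = 0;
        for (int i = id; i < N; i += NTHREADS)   /* interleaved partition of indices */
            local += data[i];
        pthread_mutex_lock(&lock);               /* interaction through shared memory */
        total += local;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t th[NTHREADS];
        int ids[NTHREADS];
        for (int i = 0; i < N; i++) data[i] = 1;
        for (int t = 0; t < NTHREADS; t++) {
            ids[t] = t;
            pthread_create(&th[t], NULL, partial_sum, &ids[t]);
        }
        for (int t = 0; t < NTHREADS; t++) pthread_join(th[t], NULL);
        printf("sum = %ld\n", total);
        return 0;
    }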
Message-Passing Platforms
20
The logical machine view of a message-passing platform
consists of p processing nodes.
Instances include clustered workstations and non-shared-address-
space multicomputers.
On such platforms, interactions between processes running
on different nodes must be accomplished using messages,
hence the name message passing.
This exchange of messages is used to transfer data, work,
and to synchronize actions among the processes.
In its most general form, message-passing paradigms
support execution of a different program on each of the p
nodes.
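A minimal message-passing sketch (not from the slides; it assumes an MPI installation, and the tag and payload are illustrative): process 0 sends a value to process 1, and all interaction happens through explicit messages rather than shared memory:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;                                   /* data to transfer */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process 1 received %d from process 0\n", value);
        }

        MPI_Finalize();
        return 0;                          /* run with, e.g., mpirun -np 2 ./a.out */
    }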
Physical Organization of
Parallel Platforms
21
Architecture of an Ideal Parallel Computer
Exclusive-read, exclusive-write (EREW) PRAM. In this class,
access to a memory location is exclusive. No concurrent read or
write operations are allowed.
Concurrent-read, exclusive-write (CREW) PRAM. In this class,
multiple read accesses to a memory location are allowed, but
multiple write accesses are serialized.
Exclusive-read, concurrent-write (ERCW) PRAM. Multiple write
accesses are allowed to a memory location, but multiple read
accesses are serialized.
Concurrent-read, concurrent-write (CRCW) PRAM. This class
allows multiple read and write accesses to a common memory
location. This is the most powerful PRAM model.
Interconnection Networks for Parallel Computers
▹ Interconnection networks can be classified
as static or dynamic. Static networks consist of point-
to-point communication links among processing nodes
and are also referred to as direct networks. Dynamic networks, in contrast,
are built using switches and communication links and are also referred to
as indirect networks. Figure: Classification
of interconnection networks: (a) a static network; and (b) a dynamic network.
22
Network Topology
23
Linear Arrays
Linear arrays: (a) with no wraparound links; (b) with
wraparound link.
Two and three dimensional meshes: (a) 2-D mesh with no
wraparound; (b) 2-D mesh with wraparound link (2-D
torus); and (c) a 3-D mesh with no wraparound.
24
Construction of hypercubes from hypercubes of lower
dimension.
25
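The construction figure is not reproduced here; the small C sketch below (the function name are_neighbors is illustrative) captures the underlying labeling rule: node labels in a d-dimensional hypercube are d-bit numbers, two nodes are connected iff their labels differ in exactly one bit, and a (d+1)-cube is built from two d-cubes by prefixing one copy with 0 and the other with 1 and linking equal labels:

    #include <stdio.h>

    /* Two hypercube nodes are neighbors iff their labels differ in exactly one bit. */
    static int are_neighbors(unsigned a, unsigned b)
    {
        unsigned diff = a ^ b;
        return diff != 0 && (diff & (diff - 1)) == 0;   /* exactly one bit set */
    }

    int main(void)
    {
        printf("%d\n", are_neighbors(0x2, 0x3));   /* 1: 010 and 011 differ in one bit */
        printf("%d\n", are_neighbors(0x2, 0x7));   /* 0: 010 and 111 differ in two bits */
        return 0;
    }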
Tree-Based Networks
26
Complete binary tree networks: (a) a static tree network;
and (b) a dynamic tree network.
Scalable Design principles
❖ Avoid single points of failure.
❖ Scale horizontally, not vertically.
❖ Push work as far away from the core as possible.
❖ API first.
❖ Cache everything, always.
❖ Provide data only as fresh as needed.
❖ Design for maintenance and automation.
❖ Prefer asynchronous over synchronous interaction.
❖ Strive for statelessness.
N-wide superscalar architecture:
❖ A superscalar architecture is called N-wide if it can
fetch and dispatch N instructions in every cycle.
Multi-core architectures:
❖ Many cores fit on a single processor socket.
❖ Also called a chip multiprocessor (CMP).
❖ These cores run in parallel.
❖ The architecture of a multicore processor enables
communication between all available cores to ensure that
processing tasks are divided and assigned accurately.
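As an illustrative sketch (not from the slides; it assumes a compiler with OpenMP support), the program below asks how many threads, typically one per core, the runtime has created and divides a reduction across them, which is the usual way such core-level parallelism is exploited in practice:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        long sum = 0;
        #pragma omp parallel
        {
            #pragma omp single
            printf("running on %d threads (typically one per core)\n",
                   omp_get_num_threads());

            /* Each thread sums a share of the iterations; the partial
               sums are combined by the reduction clause. */
            #pragma omp for reduction(+:sum)
            for (int i = 0; i < 1000000; i++)
                sum += i;
        }
        printf("sum = %ld\n", sum);
        return 0;
    }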
THANK YOU!
31