Parallel Design and Programming
Von Neumann Architecture
•Comprised of four main components:
–Memory
–Control Unit
–Arithmetic Logic Unit
–Input / Output
•Read/write, random access memory is used to store both program instructions and data
–Program instructions are coded data which tell the computer to do something
–Data is simply information to be used by the program
•Control unit fetches instructions/data from memory, decodes the instructions and
then sequentially coordinates operations to accomplish the programmed task.
•Arithmetic Logic Unit performs basic arithmetic operations
•Input/Output is the interface to the human operator
Parallel computers still follow this basic design, just with the units multiplied; the basic,
fundamental architecture remains the same (a minimal sketch of the fetch-decode-execute cycle follows below).
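To make the fetch-decode-execute cycle concrete, here is a minimal toy sketch in Python; the opcodes, addresses and instruction format are invented purely for illustration and do not correspond to any real machine's instruction set.

    # Toy von Neumann machine: a single memory holds both instructions and data.
    # The "control unit" fetches and decodes each instruction in sequence; the
    # "ALU" performs the arithmetic. Layout and opcodes are hypothetical.
    memory = {
        0: ("LOAD", 10),    # acc <- mem[10]
        1: ("ADD", 11),     # acc <- acc + mem[11]   (ALU operation)
        2: ("STORE", 12),   # mem[12] <- acc
        3: ("HALT", None),
        10: 7, 11: 35,      # data stored in the same memory as the program
    }

    pc, acc = 0, 0                     # program counter and accumulator
    while True:
        opcode, operand = memory[pc]   # fetch and decode
        pc += 1                        # sequential execution
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc = acc + memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            break

    print(memory[12])   # -> 42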
Flynn's Classical Taxonomy
Flynn's taxonomy distinguishes multi-processor computer
architectures according to how they can be classified along the
two independent dimensions of Instruction Stream and Data
Stream. Each of these dimensions can have only one of two
possible states: Single or Multiple.
•The matrix below defines the 4 possible classifications
according to Flynn:

                          Single Data    Multiple Data
    Single Instruction       SISD            SIMD
    Multiple Instruction     MISD            MIMD
Single Instruction, Single Data (SISD):
1.A serial (non-parallel) computer
2.Single Instruction: Only one instruction stream is being acted on by the CPU
during any one clock cycle
3.Single Data: Only one data stream is being used as input during any one clock
cycle
4.Deterministic execution
5.This is the oldest type of computer
6.Examples: older generation mainframes, minicomputers, workstations and single
processor/core PCs.
Single Instruction, Multiple Data (SIMD):
•A type of parallel computer
•Single Instruction: All processing units execute the same instruction at any given clock cycle
•Multiple Data: Each processing unit can operate on a different data element
•Best suited for specialized problems characterized by a high degree of regularity, such as
graphics/image processing.
•Synchronous (lockstep) and deterministic execution
•Two varieties: Processor Arrays and Vector Pipelines
•Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD
instructions and execution units.
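As a rough illustration of the SIMD idea (assuming NumPy is installed), the snippet below applies a single logical instruction, an element-wise add, to a large array of data elements; NumPy's compiled loops are typically mapped onto the CPU's SIMD/vector execution units.

    import numpy as np

    # One instruction (element-wise add), many data elements: the SIMD pattern.
    a = np.arange(1_000_000, dtype=np.float32)
    b = np.ones(1_000_000, dtype=np.float32)

    c = a + b   # the same operation applied to every element at once

    # Equivalent scalar (SISD-style) loop for comparison, one element per step:
    # c = np.empty_like(a)
    # for i in range(len(a)):
    #     c[i] = a[i] + b[i]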
Multiple Instruction, Single Data (MISD)
•A type of parallel computer
•Multiple Instruction: Each processing unit operates on the data independently
via separate instruction streams.
•Single Data: A single data stream is fed into multiple processing units.
•Few (if any) actual examples of this class of parallel computer have ever
existed.
•Some conceivable uses might be:
–multiple cryptography algorithms attempting to crack a single coded
message.
Multiple Instruction, Multiple Data (MIMD):
•A type of parallel computer
•Multiple Instruction: Every processor may be executing a different instruction stream
•Multiple Data: Every processor may be working with a different data stream
•Execution can be synchronous or asynchronous, deterministic or non-deterministic
•Currently, the most common type of parallel computer - most modern supercomputers fall
into this category.
•Examples: most current supercomputers, networked parallel computer clusters and
"grids", multi-processor SMP computers, multi-core PCs.
•Note: many MIMD architectures also include SIMD execution sub-components
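A minimal MIMD-style sketch on a multi-core PC, using Python's multiprocessing module: each worker process executes its own instruction stream on its own data, asynchronously. The two task functions are made up for illustration.

    from multiprocessing import Process, Queue

    # MIMD sketch: different processors run different instruction streams
    # on different data streams.
    def sum_squares(data, out):
        out.put(("sum_squares", sum(x * x for x in data)))

    def count_evens(data, out):
        out.put(("count_evens", sum(1 for x in data if x % 2 == 0)))

    if __name__ == "__main__":
        results = Queue()
        workers = [
            Process(target=sum_squares, args=(range(1_000), results)),
            Process(target=count_evens, args=(range(2_000), results)),
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        for _ in workers:
            print(results.get())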
Some General Parallel Terminology
•Supercomputing / High Performance Computing (HPC) : Using the
world's fastest and largest computers to solve large problems.
•Node : A standalone "computer in a box". Usually comprised of
multiple CPUs/processors/cores, memory, network interfaces, etc.
Nodes are networked together to comprise a supercomputer.
•CPU / Socket / Processor / Core :
In the past, a CPU (Central Processing Unit) was a singular execution
component for a computer.
Then, multiple CPUs were incorporated into a node.
Then, individual CPUs were subdivided into multiple "cores", each
being a unique execution unit.
CPUs with multiple cores are sometimes called "sockets" -
vendor dependent. The result is a node with multiple CPUs, each
containing multiple cores.
•Task : A logically discrete section of computational work. A task is typically a
program or program-like set of instructions that is executed by a processor. A
parallel program consists of multiple tasks running on multiple processors.
•Pipelining : Breaking a task into steps performed by different processor units,
with inputs streaming through, much like an assembly line; a type of parallel
computing.
•Shared Memory : From a strictly hardware point of view, describes a computer
architecture where all processors have direct (usually bus based) access to
common physical memory. In a programming sense, it describes a model where
parallel tasks all have the same "picture" of memory and can directly address and
access the same logical memory locations regardless of where the physical
memory actually exists.
•Distributed Memory : In hardware, refers to network-based access to
physical memory that is not common. As a programming model, tasks can only
logically "see" local machine memory and must use communications to access
memory on other machines where other tasks are executing (a short sketch
contrasting the shared- and distributed-memory models follows this list).
•Symmetric Multi-Processor (SMP) : Shared memory hardware architecture
where multiple processors share a single address space and have equal access to
all resources.
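As mentioned under Distributed Memory, here is a short sketch contrasting the two programming models: threads in one process share a single address space and update the same logical location, while a separate process sees only its own memory and must send results back through an explicit message (a pipe here). Function and variable names are illustrative only.

    import threading
    from multiprocessing import Process, Pipe

    counter = [0]                      # shared-memory model: one address space
    lock = threading.Lock()

    def bump():
        for _ in range(10_000):
            with lock:                 # synchronization guards the shared location
                counter[0] += 1

    def partial_sum(lo, hi, conn):     # distributed-memory model: local data only,
        conn.send(sum(range(lo, hi)))  # results exchanged by explicit messages
        conn.close()

    if __name__ == "__main__":
        # Shared memory: four threads directly update the same logical location.
        threads = [threading.Thread(target=bump) for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("shared-memory total:", counter[0])          # 40000

        # Distributed memory: a separate process computes locally and sends back.
        parent_end, child_end = Pipe()
        p = Process(target=partial_sum, args=(0, 1_000, child_end))
        p.start()
        print("message-passing result:", parent_end.recv())  # 499500
        p.join()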
•Synchronization : The coordination of parallel tasks in real time, very often
associated with communications. Often implemented by establishing a synchronization
point within an application where a task may not proceed further until the other tasks
reach the same or logically equivalent point (see the barrier sketch after this list).
•Massively Parallel : Refers to the hardware that comprises a given parallel system -
having many processing elements. The meaning of "many" keeps increasing, but
currently, the largest parallel computers are comprised of processing elements
numbering in the hundreds of thousands to millions.
•Embarrassingly Parallel : Solving many similar, but independent tasks
simultaneously; little to no need for coordination between the tasks.
•Scalability : Refers to a parallel system's (hardware and/or software) ability to
demonstrate a proportionate increase in parallel speedup with the addition of more
resources. Factors that contribute to scalability include:
–Hardware - particularly memory-CPU bandwidth and network communication
properties
–Application algorithm
–Parallel overhead
–Characteristics of your specific application
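A minimal sketch of the synchronization point described in the Synchronization entry, using threading.Barrier from the Python standard library: no task proceeds to its second phase until every task has reached the barrier. The phase names are invented for illustration.

    import threading

    N_TASKS = 4
    barrier = threading.Barrier(N_TASKS)   # the synchronization point

    def task(tid):
        print(f"task {tid}: finished phase 1")
        barrier.wait()                      # block until every task arrives here
        print(f"task {tid}: starting phase 2")

    threads = [threading.Thread(target=task, args=(i,)) for i in range(N_TASKS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()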
Costs of Parallel Programming
•Amdahl's Law states that potential program speedup is defined by the fraction
of code (P) that can be parallelized:

speedup = 1 / (1 - P)
•If none of the code can be parallelized, P = 0 and the speedup = 1 (no speedup).
•If all of the code is parallelized, P = 1 and the speedup is infinite (in theory).
•If 50% of the code can be parallelized, maximum speedup = 2, meaning the
code will run twice as fast.
•Introducing the number of processors performing the parallel fraction of work,
the relationship can be modeled by:
speedup = 1 / (P/N + S)

where P = parallel fraction, N = number of processors and S = serial fraction.
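The formula can be checked numerically with a small sketch (the function name is arbitrary):

    def amdahl_speedup(p, n):
        """Maximum speedup for parallel fraction p on n processors (Amdahl's Law)."""
        s = 1.0 - p                  # serial fraction
        return 1.0 / (p / n + s)

    print(amdahl_speedup(0.5, 1_000_000))   # ~2: 50% parallel code caps out near 2x
    print(amdahl_speedup(0.95, 8))          # ~5.9
    print(amdahl_speedup(0.95, 1_000))      # ~19.6: even 95% parallel code saturates

The last line illustrates the point of the section: the serial fraction, however small, bounds the speedup no matter how many processors are added.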
