Parallel and Distributed Computing
Unit 1
Parallel Computing
Parallel Computing: Resources
Parallel Computing: The Computational Problem
Why Parallel Computing?
Limitations of Serial Computing
Concepts and Terminology
Basic Design
Flynn’s Classical Taxonomy
Flynn Matrix
SISD
SIMD
MISD
MIMD
Parallel Terminology
Parallel Computer Memory Architectures
Shared Memory
Cache-Only Memory Architecture (COMA)
• In this memory architecture, only cache memories are
present; no main memory is employed, either as a central
shared memory (as in UMA machines) or as a distributed
main memory (as in NUMA and CC-NUMA computers).
Shared Memory: Advantages and Disadvantages
Distributed Memory
Distributed Memory: Advantages and Disadvantages
Hybrid Distributed-Shared Memory
Parallel Programming Models
Shared Memory Model
Threads Model
Threads Model Implementations
Threads Model: OpenMP
Message Passing Model
Message Passing Model Implementations: MPI
Data Parallel Model
Other Models
Hybrid
Single Program Multiple Data (SPMD)
Multiple Program Multiple Data
Designing Parallel Programs
Automatic vs. Manual Parallelization
Understanding the Problem and the Program
Example of a Non-Parallelizable Problem
Partitioning
Partitioning Data
Functional Decomposition
Communications
Synchronization
Data Dependencies
Loop-Carried Data Dependence
How to Handle Data Dependencies?
Load Balancing
How to Achieve Load Balance?
Granularity
Fine-grain Parallelism
Coarse-grain Parallelism
Which is Best?
I/O
Options: Reduce overall I/O as much as possible
Limits and Costs of Parallel Programming
Performance Analysis and Tuning
Systolic Architecture
Example
• Thus, with the systolic architecture, the result is obtained in
2n+1 time steps.
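The worked example on the preceding slides is not recoverable from the text, so the following is a separate, hypothetical illustration of how a systolic array operates. It is a minimal Python sketch, assuming a linear array of n processing elements (PEs) computing y = Ax, in which PE j keeps x[j] stationary and each row's partial sum moves one PE to the right per clock step. The function name `systolic_matvec` and this particular data flow are assumptions made for illustration; for this design the step count is 2n - 1, which need not match the 2n+1 figure of the slides' own example.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = A x (hypothetical design).

    PE j keeps x[j] stationary. Row i's partial sum enters PE 0 at step i and
    moves one PE to the right per step, so a[i][j] must reach PE j at step
    i + j (the usual skewed input pattern). Returns (y, number_of_steps).
    """
    n = len(x)
    regs = [None] * n   # regs[j]: (row, partial_sum) latched by PE j on the last step
    y = [None] * n
    step = 0
    finished = 0
    while finished < n:
        new_regs = [None] * n
        for j in range(n):
            if j == 0:
                # A new row enters at the left edge with a zero partial sum.
                incoming = (step, 0) if step < n else None
            else:
                # Otherwise PE j takes what its left neighbour latched last step.
                incoming = regs[j - 1]
            if incoming is not None:
                i, partial = incoming
                # One multiply-accumulate per PE per step.
                new_regs[j] = (i, partial + a[i][j] * x[j])
        # A partial sum leaving the last PE is a finished element of y.
        if new_regs[n - 1] is not None:
            i, total = new_regs[n - 1]
            y[i] = total
            finished += 1
        regs = new_regs  # all PEs update together (synchronous clock)
        step += 1
    return y, step


if __name__ == "__main__":
    y, steps = systolic_matvec([[1, 2], [3, 4]], [5, 6])
    print(y, steps)  # [17, 39] 3
```

Data enters at one edge and is pumped rhythmically through neighbouring PEs, each doing one multiply-accumulate per step; no PE ever touches main memory during the computation, which is the essential systolic property.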
RISC and CISC Architectures
CISC: History
CISC: Architecture
Characteristics of CISC
Properties of CISC
Advantages
Disadvantages
RISC: History
RISC: Architecture
Characteristics of RISC
Properties
Advantages
Disadvantages
Vector Processor
• A vector processor is a central processing unit that can
execute an operation on a complete input vector with a single
instruction. More specifically, it is a unit of hardware
resources that processes a sequential set of similar data
items in memory using a single instruction.
• The elements of a vector are stored at successive memory
addresses. This is why a vector processor is said to process
data sequentially.
• It has a single control unit but multiple execution units
that perform the same operation on different elements of
the vector.
• Unlike a scalar processor, which operates on only a single
pair of data items at a time, a vector processor operates on
multiple pairs of data. Scalar code can be converted into
vector code; this conversion is known as vectorization.
Vector processing therefore allows one instruction to operate
on multiple data elements.
• Such instructions are called single instruction, multiple
data (SIMD) or vector instructions. Recent CPUs make use of
vector processing because it offers advantages over scalar
processing.
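Vectorization can be illustrated with a small counting model. This is a sketch under stated assumptions: Python does not itself emit vector instructions here, `vadd` is a hypothetical stand-in for one vector add instruction, and `MVL` (the number of elements one vector register holds) is an assumed value. The point is only the instruction count: the scalar loop issues one add per element pair, while the vectorized form issues a single instruction for the whole vector.

```python
MVL = 64  # hypothetical maximum vector length (elements per vector register)


def scalar_add(a, b):
    """Scalar code: one add instruction is issued for every pair of elements."""
    c = [0] * len(a)
    issued = 0
    for i in range(len(a)):
        c[i] = a[i] + b[i]
        issued += 1
    return c, issued


def vadd(va, vb):
    """Model of ONE vector instruction: the same operation on every element pair."""
    if len(va) != len(vb) or len(va) > MVL:
        raise ValueError("operands must be equal length and fit in one vector register")
    return [x + y for x, y in zip(va, vb)]


def vectorized_add(a, b):
    """Vectorized form of the same loop: a single vadd replaces the whole loop."""
    return vadd(a, b), 1


if __name__ == "__main__":
    a, b = list(range(8)), [10] * 8
    c_scalar, n_scalar = scalar_add(a, b)
    c_vector, n_vector = vectorized_add(a, b)
    print(c_scalar == c_vector, n_scalar, n_vector)  # True 8 1
```

Both versions compute the same result; what changes is that the vector form moves the loop into hardware, so instruction fetch and decode happen once instead of once per element.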
Architecture and Working
• The functional units of a vector computer are as follows:
• Instruction processing unit (IPU)
• Vector register
• Scalar register
• Scalar processor
• Vector instruction controller
• Vector access controller
• Vector processor
• Because a vector processor has several functional pipes, it
can execute instructions over many operands concurrently. Both
data and instructions reside in memory at their respective
locations, and the instruction processing unit (IPU) fetches
each instruction from memory.
• Once an instruction is fetched, the IPU determines whether it
is a scalar or a vector instruction. A scalar instruction is
passed to the scalar registers and scalar processor, where
scalar processing is performed.
• A vector instruction is instead fed to the vector instruction
controller, which first decodes it and then determines the
address of the vector operand in memory.
• The vector instruction controller then signals the vector
access controller to request that operand. The vector access
controller fetches the operand from memory and supplies it to
the vector registers so that it can be processed by the vector
processor.
• When multiple vector instructions are present, the vector
instruction controller passes them to the task system. If the
task system indicates that a vector task is very long, the
processor divides the task into subvectors.
• These subvectors are fed to the vector processor, which uses
several pipelines to execute the instruction on the operands
fetched from memory at the same time.
• The various vector instructions are scheduled by the vector instruction controller.
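The dispatch flow described in these bullets can be sketched as a toy Python model. Everything here is a hypothetical simplification: `Instr`, `run`, the two-operation `OPS` table, and the subvector length `MVL` are invented for illustration, memory is a dictionary of named operands, and the model only shows the routing (scalar path vs. vector path) and the division of a long vector into subvectors, not real timing.

```python
from dataclasses import dataclass

MVL = 4  # hypothetical number of elements the pipelines handle per subvector pass

OPS = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}


@dataclass
class Instr:
    op: str          # "add" or "mul"
    is_vector: bool  # what the IPU decides after fetching the instruction
    src1: str        # memory names of the two operands and the result
    src2: str
    dst: str


def run(program, memory):
    """Toy model of the scalar/vector dispatch flow; returns an execution trace."""
    trace = []
    for ins in program:                      # IPU fetches instructions in order
        f = OPS[ins.op]
        if not ins.is_vector:
            # Scalar path: handled by the scalar registers / scalar processor.
            memory[ins.dst] = f(memory[ins.src1], memory[ins.src2])
            trace.append(("scalar", ins.op))
            continue
        # Vector path: the vector instruction controller resolves the operand
        # addresses and the vector access controller fetches both operands.
        a, b = memory[ins.src1], memory[ins.src2]
        result = []
        # A long vector task is divided into subvectors of at most MVL elements.
        for start in range(0, len(a), MVL):
            chunk = zip(a[start:start + MVL], b[start:start + MVL])
            result.extend(f(x, y) for x, y in chunk)
            trace.append(("vector", ins.op, start))
        memory[ins.dst] = result
    return trace


if __name__ == "__main__":
    mem = {"s": 2, "t": 3, "u": 0, "A": list(range(10)), "B": [1] * 10, "C": []}
    prog = [Instr("mul", False, "s", "t", "u"), Instr("add", True, "A", "B", "C")]
    for entry in run(prog, mem):
        print(entry)
    print(mem["u"], mem["C"])
```

With ten-element vectors and `MVL = 4`, the single vector add is executed as three subvector passes (starting at elements 0, 4, and 8), which is the "divide a long task into subvectors" step in miniature.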
Very Long Instruction Word (VLIW) Architecture
• The limitations of superscalar processors become prominent as
instruction scheduling grows more complex. The problems of
exploiting the intrinsic parallelism in the instruction stream,
hardware complexity, cost, and branch-instruction issue are
addressed by an instruction set architecture called Very Long
Instruction Word (VLIW), implemented in VLIW machines.
• VLIW exploits instruction-level parallelism (ILP): the program
itself, as produced by the compiler, controls which instructions
execute in parallel.
• Other architectures improve processor performance with one of
the following methods: pipelining (breaking each instruction
into subparts), superscalar execution (executing instructions
independently in different parts of the processor), or
out-of-order execution (executing instructions in an order
different from program order). Each of these methods adds
considerably to the complexity of the hardware.
• VLIW architecture deals with this by relying on the compiler,
which decides the parallel flow of instructions and resolves
conflicts between them. This increases compiler complexity but
greatly reduces hardware complexity.
Features
• Processors in this architecture have multiple functional units
and fetch very long instruction words from the instruction cache.
• Multiple independent operations are grouped together into a
single VLIW instruction and are issued in the same clock cycle.
• Each operation is assigned an independent functional unit.
• All the functional units share a common register file.
• Instruction words are typically 64 to 1024 bits long, depending
on the number of execution units and the code length required to
control each unit.
• Instruction scheduling and parallel dispatch of the word are
done statically by the compiler.
• The compiler checks for dependencies before scheduling parallel
execution of the instructions.
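The compiler's role (dependency checking, then static packing of independent operations into one word) can be sketched as a greedy list scheduler. This is a minimal sketch, not any particular compiler's algorithm, and it rests on stated assumptions: straight-line code in which every register is written only once (so only read-after-write dependencies matter), a one-cycle latency for every operation, and functional units that can each execute any operation. The names `Op`, `schedule`, and `encode` are hypothetical.

```python
from typing import NamedTuple


class Op(NamedTuple):
    name: str
    dst: str
    srcs: tuple


def schedule(ops, width):
    """Greedy static scheduling of straight-line, single-assignment code into bundles.

    Assumptions: each register is written once, every op takes one cycle,
    and any of the `width` functional units can execute any op.
    """
    bundles = []   # bundles[c] = ops issued together in clock cycle c
    ready = {}     # register -> first cycle in which its value can be read
    for op in ops:
        # Dependency check: wait until every source operand has been produced.
        cycle = max((ready.get(r, 0) for r in op.srcs), default=0)
        # Place the op in the first such cycle that still has a free slot.
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(op)
        ready[op.dst] = cycle + 1
    return bundles


def encode(bundles, width):
    """Pad every bundle with NOPs to the fixed word width."""
    return [[op.name for op in b] + ["nop"] * (width - len(b)) for b in bundles]


if __name__ == "__main__":
    prog = [
        Op("add t1", "t1", ("a", "b")),
        Op("mul t2", "t2", ("c", "d")),
        Op("add t3", "t3", ("t1", "t2")),
        Op("sub t4", "t4", ("e", "f")),
        Op("mul t5", "t5", ("t3", "t4")),
    ]
    for word in encode(schedule(prog, width=3), width=3):
        print(word)
    # ['add t1', 'mul t2', 'sub t4']
    # ['add t3', 'nop', 'nop']
    # ['mul t5', 'nop', 'nop']
```

The independent `sub t4` is hoisted into the first word, while the dependent operations must wait. The NOP padding in the later words also illustrates the "increased program code size" disadvantage: nine slots are encoded for five useful operations.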
Block Diagram
Advantages
• Reduces hardware complexity.
• Reduces power consumption because of the reduced hardware
complexity.
• Since the compiler takes care of dependency checking, decoding,
and instruction issue, the hardware becomes much simpler.
Disadvantages
• Requires complex compilers, which are hard to design.
• Increases program code size.
• Requires larger memory bandwidth and register-file bandwidth.
Superpipelined Architecture
• Superpipelining breaks the stages of a given pipeline into
smaller stages (making the pipeline deeper) in order to shorten
the clock period, thereby increasing instruction throughput by
keeping more instructions in flight at a time.
• Superpipelining is an alternative approach to achieving greater
performance. It exploits the fact that many pipeline stages
perform work that requires less than half a clock cycle.
Superscalar vs. Superpipelined Structure
• Superscalar machines can issue several instructions per cycle.
Superpipelined machines issue only one instruction per cycle, but
their cycle times are shorter than the time required for any
operation. Both techniques exploit instruction-level parallelism,
which is limited in many applications.
• Superscalar attempts to increase performance by executing
multiple instructions in parallel. Superpipelining seeks to
improve the sequential instruction rate, while superscalar seeks
to improve the parallel instruction rate. Most modern processors
are both superscalar and superpipelined.
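The difference can be made concrete with the usual idealized timing model. Assuming a base pipeline of k stages with cycle time τ, N instructions finish in (k + N − 1)τ. Assuming each stage is split into m sub-stages with cycle time τ/m and ignoring latch overhead, a superpipelined machine needs (mk + N − 1)τ/m. Assuming m parallel pipelines and no dependencies or structural hazards, a superscalar machine of degree m needs (k + ⌈N/m⌉ − 1)τ. The sketch below just evaluates these formulas; the function names are hypothetical.

```python
import math


def pipelined(n, k, tau=1.0):
    """Baseline: k-stage pipeline, one instruction issued per cycle of length tau."""
    return (k + n - 1) * tau


def superpipelined(n, k, m, tau=1.0):
    """Each stage split into m sub-stages: cycle shrinks to tau/m, one issue per cycle."""
    return (m * k + n - 1) * (tau / m)


def superscalar(n, k, m, tau=1.0):
    """m parallel k-stage pipelines: up to m independent instructions issued per cycle."""
    return (k + math.ceil(n / m) - 1) * tau


if __name__ == "__main__":
    n, k, m = 6, 4, 2
    print(pipelined(n, k), superpipelined(n, k, m), superscalar(n, k, m))  # 9.0 6.5 6.0
```

For six instructions on a four-stage pipeline, both degree-2 techniques beat the baseline; the superpipelined machine still starts one instruction per (shorter) cycle, while the superscalar machine starts two per full cycle. In real code, dependencies between instructions reduce both gains, which is the ILP limit mentioned above.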
RAM Model of Computation
PRAM Properties
PRAM Models
Assumptions
More Details on the PRAM Model
PRAM CW
EXAMPLE