Parallel and Distributed
Computing
CST342-3
Vajira Thambawita
Learning Outcomes
At the end of the course, the students will be able to
• define Parallel Algorithms
• recognize parallel speedup and performance analysis
• identify task decomposition techniques
• perform Parallel Programming
• apply acceleration strategies for algorithms
Contents
• Sequential Computing, History of Parallel Computation, Flynn's
Taxonomy, Processes, Threads, Pipelines, Parallel Models, Shared
Memory, UMA, NUMA, CC-NUMA, Ring, Mesh, and Hypercube topologies,
Cost and Complexity Analysis of Interconnection Networks, Task
Partitioning, Data Decomposition, Task Mapping, Tasks and
Decomposition, Processes and Mapping, Processes versus
Processors, Granularity, Processing Elements, Speedup, Efficiency,
Overhead, Practicals: Introduction to the Pthread Library, CUDA Programming,
MPICH; Introduction to Distributed Computing, Centralized Systems,
Comparison, Minicomputer and Workstation Models, Process Pool,
Analysis, Distributed OS, Remote Procedure Call (RPC), Sun RPC,
Distributed Resource Management, Fault Tolerance
References
• Grama, A., Gupta, A., Karypis, G. and Kumar, V., 2003, Introduction to
Parallel Computing, 2nd Edition, Addison Wesley
Optional References:
• CUDA Toolkit Documentation
• Introduction to Parallel Computing, Second Edition By Ananth Grama,
Anshul Gupta, George Karypis, Vipin Kumar
• Programming on Parallel Machines, Norm Matloff
• Introduction to High Performance Computing for Scientists and
Engineers, Georg Hager, Gerhard Wellein
Evaluation
• Continuous Assessment:
• 60% - Lab assignments, Tutorials, Quizzes
• End Semester Examination:
• 40% - a 2-hour or 3-hour paper
Knowledge
• Data structures and algorithms
• C programming
History of computing
Four decades of computing
• Batch Era
• Time sharing Era
• Desktop Era
• Network Era
Batch era
• Batch processing is the execution of a series of programs on a
computer without manual intervention
• The term originated in the days when users entered
programs on punch cards
Time-sharing Era
• Time-sharing is the sharing of a computing
resource among many users by means of
multiprogramming and multi-tasking
• Systems were developed to support multiple
users at the same time
Desktop Era
• Personal Computers (PCs)
• Connected via wide area networks (WANs)
Network Era
• Systems with:
• Shared memory
• Distributed memory
• Examples of parallel computers: Intel iPSC, nCUBE
FLYNN's taxonomy of computer
architecture
Two types of information flow into a processor:
 Instructions
 Data
What are instructions and data?
FLYNN's taxonomy of computer
architecture
1. single-instruction single-data streams (SISD)
2. single-instruction multiple-data streams (SIMD)
3. multiple-instruction single-data streams (MISD)
4. multiple-instruction multiple-data streams (MIMD)
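The SISD/SIMD distinction is easiest to see in code. Below is a minimal C sketch (assuming x86 SSE intrinsics; any vector ISA makes the same point, and the function names are made up for the illustration) of the same array addition written twice: as a plain scalar loop, where each instruction operates on one data element at a time (SISD-style execution), and with intrinsics, where one instruction operates on four data elements at once (SIMD).

#include <xmmintrin.h> /* SSE intrinsics (assumed ISA for illustration) */

/* SISD style: one instruction, one data element per iteration */
void add_sisd(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* SIMD style: one instruction (_mm_add_ps) processes four elements */
void add_simd(const float *a, const float *b, float *c, int n) {
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));
    }
    for (; i < n; i++) /* scalar tail for leftover elements */
        c[i] = a[i] + b[i];
}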
Parallel computing?
Serial computing
Parallel computing?
Parallel Computers
• All stand-alone computers today are parallel from a hardware
perspective
Parallel Computers
• Networks connect multiple stand-alone computers (nodes) to make
larger parallel computer clusters.
Why Use Parallel Computing?
• SAVE TIME AND/OR MONEY:
Why Use Parallel Computing?
• SOLVE LARGER / MORE COMPLEX PROBLEMS
Grand Challenge Problems?
Why Use Parallel Computing?
• PROVIDE CONCURRENCY
Why Use Parallel Computing?
• TAKE ADVANTAGE OF NON-LOCAL RESOURCES:
Why Use Parallel Computing?
• MAKE BETTER USE OF UNDERLYING PARALLEL HARDWARE
• Modern computers, even laptops, are parallel in architecture with multiple
processors/cores
BACK to Flynn's Classical Taxonomy
Single Instruction Single Data
(SISD)
• A serial (non-parallel) computer
• This is the oldest type of computer
Examples: UNIVAC1, IBM 360, CRAY-1, CDC 7600, PDP-1
Single Instruction Multiple Data
(SIMD)
Examples: ILLIAC IV, MasPar, Cray X-MP, Cray Y-MP, Cell Processor (GPU)
Multiple Instruction Single Data
(MISD)
Example: the Space Shuttle flight control computers
Multiple Instruction Multiple Data
(MIMD)
Examples: IBM POWER5, HP/Compaq AlphaServer, Intel IA-32, AMD Opteron
What are we going to learn?
Shared Memory System
• A shared memory system typically accomplishes
interprocessor coordination through a global memory shared
by all processors.
• Ex: Server systems, GPGPU
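As a concrete sketch of shared-memory coordination, the short Pthreads program below (a minimal example using the Pthread library listed in the course practicals; the worker function and thread count are assumptions for the illustration) has four threads update one global counter, with a mutex coordinating access to the shared variable.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

long counter = 0; /* global memory shared by all threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* coordinates access to it */

void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++; /* every thread reads and writes the same variable */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    printf("counter = %ld\n", counter); /* 400000: all updates land in one shared memory */
    return 0;
}

Compiled with, e.g., gcc demo.c -lpthread. Without the mutex, the threads would race on the shared counter, which is exactly the coordination problem shared-memory systems must solve.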
Message Passing System
(Distributed Memory)
• These systems combine local memory and a processor
at each node of the
interconnection network
• There is no global memory
• Message passing is used to move data from
one local memory to another
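In contrast, here is a minimal message-passing sketch using MPI (the course practicals list MPICH; this example assumes any standard MPI implementation). Rank 0 holds a value in its own local memory and must send it explicitly, because rank 1 cannot read rank 0's memory.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42; /* exists only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value); /* data copied into rank 1's local memory */
    }
    MPI_Finalize();
    return 0;
}

Built with mpicc and run with, e.g., mpiexec -n 2 ./a.out.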
Limits and Costs of Parallel Programming
• Amdahl's Law:
Amdahl's Law states that potential program speedup is defined by the
fraction of code (P) that can be parallelized:
Speedup = 1 / (1 − P)
• If none of the code can be parallelized, P = 0 and the speedup = 1 (no
speedup).
• If all of the code is parallelized, P = 1 and the speedup is infinite (in
theory).
Limits and Costs of Parallel Programming
• If 50% of the code can be parallelized, the maximum speedup = 2,
meaning the code can run at most twice as fast.
Limits and Costs of Parallel Programming
• Introducing the number of processors performing the parallel fraction
of work, the relationship can be modeled by:
speedup = 1 / (P / N + S)
• where P = parallel fraction, N = number of processors, and S = serial
fraction (S = 1 − P)
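To see how the serial fraction caps the benefit, the small C sketch below (P = 0.90 is an assumed example value) tabulates this speedup for growing N; it approaches, but never exceeds, 1/S = 10.

#include <stdio.h>

int main(void) {
    double P = 0.90;    /* parallel fraction (assumed example value) */
    double S = 1.0 - P; /* serial fraction */
    for (int N = 1; N <= 1024; N *= 2)
        printf("N = %4d  speedup = %6.2f\n", N, 1.0 / (P / N + S)); /* Amdahl's Law */
    return 0;
}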
Limits and Costs of Parallel Programming
Next
• Parallel Computer Memory Architectures