Chapter 18
Parallel Processing
(Multiprocessing)
It's All About Increasing Performance
• Processor performance can be measured by the
rate at which it executes instructions
MIPS rate = f * IPC (Millions of Instructions per Second)
— f is the processor clock frequency, in MHz
— IPC is the average Instructions Per Cycle
• Increase performance by
—increasing clock frequency and
—increasing the number of instructions that complete per cycle
– May be reaching practical limits
+ Complexity
+ Power consumption
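A quick worked example of the MIPS formula (the clock rate and IPC values below are assumed, chosen only for illustration):

```c
#include <stdio.h>

/* MIPS rate = f * IPC, with f in MHz.
 * The values below are hypothetical, used only to illustrate the formula. */
int main(void) {
    double f_mhz = 3000.0; /* assumed 3 GHz clock = 3000 MHz */
    double ipc   = 1.5;    /* assumed average instructions per cycle */
    double mips  = f_mhz * ipc;
    printf("MIPS rate = %.0f MHz * %.1f IPC = %.0f MIPS\n", f_mhz, ipc, mips);
    return 0;
}
```

With these assumed numbers the processor sustains 4,500 million instructions per second.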
Computer Organizations
Multiprogramming and Multiprocessing
Taxonomy of Parallel Processor Architectures
Multiple Processor Organization
• SISD - Single instruction, single data stream
• SIMD - Single instruction, multiple data stream
• MISD - Multiple instruction, single data stream
• MIMD - Multiple instruction, multiple data stream
SISD - Single Instruction, Single Data Stream
• Single processor
• Single instruction stream
• Data stored in single memory
• Uni-processor
(Structure: Control Unit → Processing Unit → Memory Unit)
SIMD - Single Instruction, Multiple Data Stream
• Single machine instruction
• Number of processing elements
• Each processing element has associated data memory
• Each instruction is simultaneously executed on a different set of data by different processors
Typical Application - Vector and Array processors
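As one concrete illustration (not from the slides), the sketch below uses x86 SSE intrinsics: a single instruction, `_mm_add_ps`, adds four pairs of floats at once, i.e. one instruction stream applied to multiple data elements. The arrays and values are invented for the example.

```c
#include <immintrin.h> /* x86 SSE intrinsics */
#include <stdio.h>

int main(void) {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    __m128 va = _mm_loadu_ps(a);    /* load four floats into one register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb); /* four additions, one instruction */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.0f ", c[i]);      /* prints: 11 22 33 44 */
    printf("\n");
    return 0;
}
```

Vector and array processors apply the same single-instruction, multiple-data idea at much larger scale.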
MISD - Multiple Instruction, Single Data Stream
• One sequence of data
• A set of processors
• Each processor executes a different instruction sequence
Not much practical application
MIMD - Multiple Instruction, Multiple Data Stream
• Set of processors
• Simultaneously execute different instruction sequences
• Different sets of data
— SMPs (Symmetric Multiprocessors)
— NUMA systems (Non-uniform Memory Access)
— Clusters (Groups of “partnering” computers)
Shared memory (SMP or NUMA) vs. distributed memory (Clusters)
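A minimal sketch of MIMD execution on a shared-memory machine using POSIX threads (the two tasks and their data sets are invented for illustration): each thread runs a different instruction sequence on a different data set at the same time.

```c
#include <pthread.h>
#include <stdio.h>

/* Two different instruction streams (sum vs. max) over two different
 * data sets, running simultaneously: the MIMD model in miniature. */
static int data_a[4] = {1, 2, 3, 4};
static int data_b[4] = {7, 5, 9, 2};

static void *sum_task(void *arg) {
    (void)arg;
    int s = 0;
    for (int i = 0; i < 4; i++) s += data_a[i];
    printf("sum of data_a = %d\n", s);
    return NULL;
}

static void *max_task(void *arg) {
    (void)arg;
    int m = data_b[0];
    for (int i = 1; i < 4; i++)
        if (data_b[i] > m) m = data_b[i];
    printf("max of data_b = %d\n", m);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_task, NULL);
    pthread_create(&t2, NULL, max_task, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Compile with `-pthread`; on an SMP or NUMA system the two threads can run on different processors.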
MIMD - Overview
• General-purpose processors
• Each can execute all of the necessary instructions
• Further classified by method of processor
communication & memory access
MIMD - Tightly Coupled
• Processors share memory
• Communicate via that shared memory
Symmetric Multiprocessor (SMP)
- Share single memory or pool
- Shared bus to access memory
- Memory access time to a given area of memory is
approximately the same for each processor
Nonuniform memory access (NUMA)
- Access times to different regions of memory may differ
Block Diagram of Tightly Coupled Multiprocessor
MIMD - Loosely Coupled
Clusters
• Collection of independent
uniprocessors
• Interconnected to form a cluster
• Communication via fixed path or network
connections
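A minimal sketch of loosely coupled, message-based communication, assuming an MPI installation (the value exchanged is illustrative): the two processes may run on different cluster nodes and share no memory, so data moves only in explicit messages.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42; /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Launched with, for example, `mpirun -np 2 ./a.out`, the two ranks may be placed on separate machines in the cluster.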
Symmetric Multiprocessors (SMP)
• A stand-alone “computer” with the following characteristics:
—Two or more similar processors of comparable
capacity
– All processors can perform the same functions (hence
symmetric)
—Processors share same memory and I/O access
– Memory access time is approximately the same for each
processor (time shared bus or multi-port memory)
—Processors are connected by a bus or other internal
connection
—System controlled by integrated operating system
– Providing interaction between processors
– Providing interaction at job, task, file and data element
levels
SMP Advantages
• Performance
— If some work can be done in parallel
• Availability
— Since all processors can perform the same
functions, failure of a single processor
does not halt the system
• Incremental growth
— User can enhance performance by adding
additional processors
• Scaling
— Vendors can offer range of products based
on number of processors
Symmetric Multiprocessor Organization
Time Shared Bus (vs Multiport memory)
• Simplest form
• Structure and interface similar to single
processor system
• The following features are provided:
—Addressing - distinguishes modules on the bus
—Arbitration - any module can temporarily be bus master
—Time sharing - if one module has the bus, others must
wait and may have to suspend (modeled in the sketch below)
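A rough software model of that time-sharing rule (my own sketch; a real bus arbiter is hardware, and the module behavior here is invented): a mutex stands in for the bus, so only the module holding it is master while the others wait.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t bus = PTHREAD_MUTEX_INITIALIZER; /* "the bus" */

static void bus_transfer(int module_id) {
    pthread_mutex_lock(&bus);   /* request the bus; wait until arbitration is won */
    printf("module %d is bus master\n", module_id);
    /* ... address and data phases would happen here ... */
    pthread_mutex_unlock(&bus); /* release the bus for the next master */
}

static void *module(void *arg) {
    bus_transfer(*(int *)arg);
    return NULL;
}

int main(void) {
    pthread_t t[3];
    int id[3] = {0, 1, 2};
    for (int i = 0; i < 3; i++) pthread_create(&t[i], NULL, module, &id[i]);
    for (int i = 0; i < 3; i++) pthread_join(t[i], NULL);
    return 0;
}
```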
Time Shared Bus - Advantages and Disadvantages
Advantages:
• Simplicity
• Flexibility
• Reliability
Disadvantages
• Performance limited by bus cycle time
• Each processor must have local cache
— Reduce number of bus accesses
• Leads to problems with cache coherence
Operating System Issues
• Simultaneous concurrent processes
• Scheduling
• Synchronization
• Memory management
• Reliability and fault tolerance
• Cache Coherence
Cache Coherence
• Problem - multiple copies of same data
in different caches
• Can result in an inconsistent view of
memory
—Write back policy can lead to inconsistency
—Write through can also give problems unless
caches monitor memory traffic
• MESI Protocol (Modified - Exclusive - Shared - Invalid)
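A software analogy of the problem (my own illustration, not from the slides): each thread keeps a private "cached" copy of a shared location; with a delayed write-back and no coherence mechanism, the reader keeps an out-of-date value while memory has moved on.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int memory_x = 0; /* the shared "main memory" location */

/* Writer: loads x into its "cache", modifies only the cached copy
 * (write-back policy), and writes it back to memory later. */
static void *writer(void *arg) {
    (void)arg;
    int cached_x = memory_x;
    cached_x = 42;
    sleep(1);            /* write-back is delayed */
    memory_x = cached_x;
    return NULL;
}

/* Reader: cached x before the writer's update and keeps using it.
 * (Synchronization is deliberately omitted -- that is the point.) */
static void *reader(void *arg) {
    (void)arg;
    int cached_x = memory_x;
    sleep(2);
    printf("reader's cached x = %d, memory x = %d\n", cached_x, memory_x);
    return NULL;
}

int main(void) {
    pthread_t r, w;
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return 0;
}
```

The typical output, `reader's cached x = 0, memory x = 42`, is exactly the inconsistent view of memory described above.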
Software Solution to Cache Coherence
Compiler and operating system deal with the problem
• Overhead transferred to compile time
• Design complexity transferred from
hardware to software
— Analyze code to determine safe periods for
caching shared variables
— However, software tends to (must) make
conservative decisions
— Inefficient cache utilization
Hardware Solution to Cache Coherence
Cache coherence hardware protocols
• Dynamic recognition of potential
problems
• Run time solution
—More efficient use of cache
• Transparent to programmer / Compiler
Implemented with:
• Directory protocols
• Snoopy protocols
Directory & Snoopy Protocols
Directory Protocols
Effective in large scale systems with complex interconnection
schemes
• Collect and maintain information about copies of data in cache
— Directory stored in main memory
• Requests are checked against directory
— Appropriate transfers are performed
Creates central bottleneck
Snoopy Protocols
Suited to bus-based multiprocessors
• Distribute cache coherence responsibility among cache
controllers
• Cache recognizes that a line is shared
• Updates announced to other caches
Increases bus traffic
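A minimal sketch of the bookkeeping a directory protocol might keep per memory line (field names, the 32-processor limit, and the example values are assumptions for illustration, not any specific machine's format):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical directory entry, stored in main memory, for one line.
 * It records which caches hold a copy, so a write only needs to
 * invalidate (or fetch from) those caches. */
typedef struct {
    uint32_t sharers; /* bit i set => processor i's cache holds a copy   */
    bool     dirty;   /* true => exactly one cache holds a modified copy */
    uint8_t  owner;   /* valid only when dirty: which cache owns it      */
} directory_entry;

/* On a write by processor p, every other sharer must be invalidated. */
static uint32_t caches_to_invalidate(const directory_entry *e, int p) {
    return e->sharers & ~(1u << p);
}

int main(void) {
    directory_entry e = { .sharers = 0x0000000Bu, .dirty = false, .owner = 0 };
    /* Caches 0, 1 and 3 share the line; a write by processor 1 must
     * invalidate caches 0 and 3, i.e. mask 0x00000009. */
    printf("invalidate mask = 0x%08X\n", caches_to_invalidate(&e, 1));
    return 0;
}
```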
Snoopy Protocols
• Write Update Protocol (Write Broadcast)
—Multiple readers and writers
—Updated word is distributed to all other processors
• Write Invalidate Protocol (MESI)
—Multiple readers, one writer
—When a write is required, all other caches of the line
are invalidated
—Writing processor then has exclusive (cheap) access
until line is required by another processor
MESI Protocol - State of every line is marked as Modified,
Exclusive, Shared or Invalid
- two bits are included with each cache tag
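A simplified sketch of MESI state transitions for one cache line (the event names and simplifications are mine; for instance, a read miss is assumed to find the line in another cache and therefore loads it as Shared rather than Exclusive):

```c
#include <stdio.h>

/* The four MESI states -- encoded in two bits per line in hardware. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

/* A few representative events as seen by one cache (names are illustrative). */
typedef enum {
    LOCAL_READ,  /* this processor reads the line            */
    LOCAL_WRITE, /* this processor writes the line           */
    SNOOP_READ,  /* another cache's read appears on the bus  */
    SNOOP_WRITE  /* another cache's write/invalidate appears */
} bus_event;

static mesi_state next_state(mesi_state s, bus_event e) {
    switch (e) {
    case LOCAL_READ:  return (s == INVALID) ? SHARED : s;
    case LOCAL_WRITE: return MODIFIED;  /* other copies, if any, are invalidated */
    case SNOOP_READ:  return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
    case SNOOP_WRITE: return INVALID;   /* our copy is now stale */
    }
    return s;
}

int main(void) {
    mesi_state s = INVALID;
    s = next_state(s, LOCAL_READ);  /* INVALID  -> SHARED   */
    s = next_state(s, LOCAL_WRITE); /* SHARED   -> MODIFIED */
    s = next_state(s, SNOOP_READ);  /* MODIFIED -> SHARED (dirty data written back) */
    s = next_state(s, SNOOP_WRITE); /* SHARED   -> INVALID  */
    printf("final state = %d (0 = INVALID)\n", s);
    return 0;
}
```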
Processor Designs
• Pipelined ALU
—Within operations
—Across operations
• Parallel ALUs
• Parallel processors