Parallel Processing, Lecture 47
CSE 211, Computer Organization and Architecture Harjeet Kaur, CSE/IT
Overview
- Parallel Processing
- Pipelining
- Characteristics of Multiprocessors
- Interconnection Structures
- Interprocessor Arbitration
- Interprocessor Communication and Synchronization
Coupling of Processors
Tightly Coupled System
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicate through a common shared memory
- Shared memory system
Loosely Coupled System
- Tasks or processors do not communicate in a synchronized fashion
- Communicate by passing message packets
- Overhead for data exchange is high
- Distributed memory system
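A minimal Python sketch of the tightly coupled case, assuming threads stand in for processors and a dictionary entry stands in for a word of common shared memory. The names `run_shared_memory`, `shared`, and `worker` are illustrative and do not come from the lecture.

```python
import threading


def run_shared_memory(num_workers=4, increments=1000):
    """Let several workers update one shared location under a lock."""
    shared = {"total": 0}      # stands in for a word in common shared memory
    lock = threading.Lock()    # synchronization guarding the shared word

    def worker():
        for _ in range(increments):
            with lock:         # only one worker updates the word at a time
                shared["total"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]


if __name__ == "__main__":
    print(run_shared_memory())
```

Every worker reads and writes the same location directly, so no data is copied between workers; the lock supplies the synchronization that tight coupling relies on.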
Granularity of Parallelism
Coarse-grain
- A task is broken into a handful of pieces, each of which is executed by a powerful processor
- Processors may be heterogeneous
- Computation/communication ratio is very high
Medium-grain
- Tens to a few thousand pieces
- Processors typically run the same code
- Computation/communication ratio is often hundreds or more
Fine-grain
- Thousands to perhaps millions of small pieces, executed by very small, simple processors or through pipelines
- Processors typically have instructions broadcast to them
- Computation/communication ratio is often near unity
[Figure: SHARED MEMORY (processors reach memory through a network) versus DISTRIBUTED MEMORY (processor/memory pairs linked by a network)]
Shared (Global) Memory
- A Global Memory Space accessible by all processors
- Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
- All memory units are associated with processors
- To retrieve information from another processor's memory, a message must be sent to that processor
Uniform Memory
- All processors take the same time to reach all memory locations
Nonuniform (NUMA) Memory
- Access time varies with the location of the memory relative to the processor
Shared Memory Multiprocessors
[Figure: processors (P) connected to memory modules (M) through an interconnection network: buses, multistage IN, or crossbar switch]
Characteristics
All processors have equally direct access to one large memory address space
Example systems
- Bus and cache-based systems: Sequent Balance, Encore Multimax
- Multistage IN-based systems: Ultracomputer, Butterfly, RP3, HEP
- Crossbar switch-based systems: C.mmp, Alliant FX/8
Limitations
Memory access latency; Hot spot problem
Message-Passing Multiprocessors
Characteristics
- Interconnected computers
- Each processor has its own memory and communicates via message passing
Example systems
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III
Limitations
- Communication overhead; hard to program
[Figure: processors (P), each with its own memory (M), linked by point-to-point connections through a message-passing network]
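A minimal Python sketch of a remote read in a message-passing system, assuming queues stand in for point-to-point links and a dictionary stands in for a node's private memory. The names `node`, `remote_read`, `P0`, and `P1` are illustrative only.

```python
import queue
import threading


def node(local_memory, inbox, outboxes):
    """Serve read requests against this node's private memory until told to stop."""
    while True:
        msg = inbox.get()
        if msg is None:                 # stop signal
            break
        reply_to, address = msg
        # The owner reads its own memory and sends the value back as a message.
        outboxes[reply_to].put(local_memory[address])


def remote_read(address):
    """P0 reads a word held in P1's memory by exchanging two messages."""
    inboxes = {"P0": queue.Queue(), "P1": queue.Queue()}
    p1_memory = {0: 11, 1: 22, 2: 33}   # private to P1; P0 never touches it
    server = threading.Thread(target=node, args=(p1_memory, inboxes["P1"], inboxes))
    server.start()
    inboxes["P1"].put(("P0", address))  # request message: P0 -> P1
    value = inboxes["P0"].get()         # reply message:   P1 -> P0
    inboxes["P1"].put(None)
    server.join()
    return value


if __name__ == "__main__":
    print(remote_read(1))
```

One read costs a request and a reply, which illustrates the communication overhead listed under Limitations.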
Interconnection Structure
* Time-Shared Common Bus
* Multiport Memory
* Crossbar Switch
* Multistage Switching Network
* Hypercube System
Bus
All processors (and memory) are connected to a common bus or buses
- Memory access is fairly uniform, but not very scalable
BUS
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements
Operations of Bus
M3 wishes to communicate with S5
[1] M3 sends signals (address) on the bus that cause S5 to respond
[2] M3 sends data to S5, or S5 sends data to M3 (determined by the command line)
Master Device: Device that initiates and controls the communication
Slave Device: Responding device
Multiple-master buses
-> Bus conflict
-> need bus arbitration
[Figure: devices M3, S7, M6, S5, M4, and S2 attached to a common bus]
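A minimal Python sketch of the two-step bus operation, assuming each slave holds a single register. The slide does not state an arbitration policy, so "lowest-numbered master wins" is an assumption here; the classes `Slave` and `Bus` are hypothetical names.

```python
class Slave:
    """Responding device with one data register (a simplification)."""

    def __init__(self, address):
        self.address = address
        self.register = 0


class Bus:
    def __init__(self, slaves):
        self.slaves = {s.address: s for s in slaves}
        self.owner = None

    def arbitrate(self, requesting_masters):
        # Assumed policy: the lowest-numbered requesting master gets the bus.
        self.owner = min(requesting_masters)
        return self.owner

    def transfer(self, master, address, command, data=None):
        if master != self.owner:
            raise PermissionError(f"M{master} does not own the bus")
        slave = self.slaves[address]    # step 1: the address selects the slave
        if command == "write":          # step 2: the command line sets direction
            slave.register = data
            return None
        if command == "read":
            return slave.register
        raise ValueError(f"unknown command: {command}")


if __name__ == "__main__":
    bus = Bus([Slave(2), Slave(5), Slave(7)])
    bus.arbitrate([6, 3, 4])            # M3, M4, and M6 all request the bus
    bus.transfer(3, 5, "write", 42)     # M3 sends data to S5
    print(bus.transfer(3, 5, "read"))   # S5 sends data to M3
```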
System Bus Structure for Multiprocessor
[Figure: three processor groups, each with a CPU, local memory, and a system bus controller on its own local bus (two groups also have an IOP), connected through the SYSTEM BUS to the common shared memory]
Multiport Memory
Multiport Memory Module
- Each port serves a CPU
Memory Module Control Logic
- Each memory module has control logic
- Resolves memory module conflicts using fixed priority among CPUs
Advantages
- Multiple paths -> high transfer rate
Disadvantages
- Memory control logic
- Large number of cables and connections
[Figure: CPU 1 through CPU 4, each connected to a port on memory modules MM 1 through MM 4]
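A minimal Python sketch of fixed-priority conflict resolution at the memory modules. The slide does not give the priority order, so treating CPU 1 as highest is an assumption; `resolve_conflicts` is a hypothetical name.

```python
def resolve_conflicts(requests, priority=(1, 2, 3, 4)):
    """Grant each memory module to the highest-priority CPU requesting it.

    requests: dict mapping CPU number -> requested memory module number
    priority: CPU numbers from highest to lowest priority (assumed order)
    Returns a dict mapping memory module -> CPU granted access this cycle.
    """
    granted = {}
    for cpu in priority:
        module = requests.get(cpu)
        if module is not None and module not in granted:
            granted[module] = cpu   # the first (highest-priority) requester wins
    return granted


if __name__ == "__main__":
    # CPU 1 and CPU 2 both want MM 2; CPU 2 must wait for a later cycle.
    print(resolve_conflicts({1: 2, 2: 2, 3: 4, 4: 1}))
```

Requests to different modules are all granted in the same cycle, which is the source of the high transfer rate; only requests that collide on one module are serialized.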
Crossbar Switch
[Figure: crossbar switch connecting CPU1 through CPU4 to memory modules MM1 through MM4]
Multistage Switching Network
Interstage Switch (2 x 2: inputs A and B, outputs 0 and 1)
- A connected to 0
- A connected to 1
- B connected to 0
- B connected to 1
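A minimal Python sketch of the 2 x 2 switch settings listed on this slide. The slide does not say what happens when A and B request the same output; blocking B in that case is an assumption, and `interchange_switch` is a hypothetical name.

```python
def interchange_switch(request_a=None, request_b=None):
    """Model a 2 x 2 switch with inputs A, B and outputs 0, 1.

    Each request is 0, 1, or None (no request).
    Returns a dict mapping output terminal -> connected input.
    """
    connections = {}
    if request_a is not None:
        connections[request_a] = "A"
    # Assumption: if both inputs want the same output, A wins and B is blocked.
    if request_b is not None and request_b not in connections:
        connections[request_b] = "B"
    return connections


if __name__ == "__main__":
    print(interchange_switch(0, 1))   # A connected to 0, B connected to 1
    print(interchange_switch(1, 1))   # conflict on output 1
```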
Multistage Interconnection Network
[Figure: Binary Tree with 2 x 2 Switches. Processors P1 and P2 reach destinations 000 through 111 through three levels of switches, each with outputs 0 and 1]
[Figure: 8x8 Omega Switching Network. Inputs 0 through 7 connect to outputs 000 through 111]
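A minimal Python sketch of how a path through the 8x8 omega network can be traced. The slide shows only the figure; the routing rule used here (a perfect shuffle before each stage, then each switch selects output 0 or 1 from one destination bit, most significant bit first) is the usual destination-tag scheme and is stated as an assumption. `omega_route` is a hypothetical name.

```python
def omega_route(source, destination, bits=3):
    """Trace one path through an omega network with 2**bits inputs.

    Returns the line number occupied after each of the `bits` stages.
    """
    size = 1 << bits
    position = source
    path = []
    for stage in range(bits):
        # Perfect shuffle between stages: rotate the line number left by one bit.
        position = ((position << 1) | (position >> (bits - 1))) & (size - 1)
        # The 2 x 2 switch sends the request to output 0 or 1 according to
        # the destination bit for this stage (most significant bit first).
        dest_bit = (destination >> (bits - 1 - stage)) & 1
        position = (position & ~1) | dest_bit
        path.append(position)
    return path


if __name__ == "__main__":
    print(omega_route(0b010, 0b110))
```

After the last stage the line number equals the destination address, whatever the source, because each stage replaces one source bit with one destination bit.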
Hypercube Interconnection
- p = 2^n processors
- Processors are conceptually on the corners of an n-dimensional hypercube, and each is directly connected to its n neighboring nodes
- Degree = n
[Figure: one-cube (nodes 0, 1), two-cube (nodes 00, 01, 10, 11), and three-cube (nodes 000 through 111)]
n-dimensional hypercube (binary n-cube)
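A minimal Python sketch of hypercube addressing, assuming the usual binary labeling in which two nodes are neighbors when their addresses differ in exactly one bit. The routing order (correct differing bits from the lowest dimension up) is one common choice, not something the slide specifies; `neighbors` and `route` are hypothetical names.

```python
def neighbors(node, n):
    """Return the n nodes directly connected to `node` in a binary n-cube."""
    return [node ^ (1 << k) for k in range(n)]   # flip one address bit each


def route(source, destination, n):
    """Return the nodes visited from source to destination.

    Assumed scheme: XOR the two addresses, then cross one dimension for
    every bit that differs, starting with the lowest dimension.
    """
    path = [source]
    current = source
    diff = source ^ destination
    for k in range(n):
        if diff & (1 << k):
            current ^= 1 << k
            path.append(current)
    return path


if __name__ == "__main__":
    print([format(x, "03b") for x in neighbors(0b000, 3)])
    print([format(x, "03b") for x in route(0b000, 0b111, 3)])
```

The number of hops equals the number of bits in which the two addresses differ, so no route in an n-cube needs more than n hops.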

