Lecture 47

Parallel Processing
CSE 211, Computer Organization and Architecture Harjeet Kaur, CSE/IT
Overview
- Parallel Processing
- Pipelining
- Characteristics of Multiprocessors
- Interconnection Structures
- Interprocessor Arbitration
- Interprocessor Communication and Synchronization

Coupling of Processors
Tightly Coupled System
- Tasks and/or processors communicate in a highly synchronized fashion
- Processors communicate through a common shared memory
- Shared memory system
Loosely Coupled System
- Tasks or processors do not communicate in a synchronized fashion
- Processors communicate by passing messages in packets
- Overhead for data exchange is high
- Distributed memory system
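The two coupling styles can be contrasted with a small Python sketch. It is only an analogy under stated assumptions: Python threads always share one address space, so the "loosely coupled" version merely imitates message passing by having workers send results through a `queue.Queue` instead of writing to a shared variable. The function names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Tightly coupled analogy: workers update one shared total under a lock."""
    total = [0]                       # stands in for the common shared memory
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:                    # synchronized access to the shared data
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(chunks):
    """Loosely coupled analogy: workers send their result as a message."""
    mailbox = queue.Queue()           # stands in for the message-passing network

    def worker(chunk):
        mailbox.put(("partial", sum(chunk)))   # one message per worker

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = 0
    for _ in chunks:
        _tag, value = mailbox.get()   # receiver assembles the partial results
        total += value
    return total


data = [[1, 2, 3], [4, 5], [6]]
print(shared_memory_sum(data), message_passing_sum(data))  # 21 21
```

Both versions compute the same answer; the difference is whether workers coordinate through shared data (needing a lock) or only through explicit messages.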

Granularity of Parallelism
Coarse-grain
- A task is broken into a handful of pieces, each of which is executed by a powerful processor
- Processors may be heterogeneous
- Computation/communication ratio is very high
Medium-grain
- Tens to a few thousand pieces
- Processors typically run the same code
- Computation/communication ratio is often hundreds or more
Fine-grain
- Thousands to perhaps millions of small pieces, executed by very small, simple processors or through pipelines
- Processors typically have instructions broadcast to them
- Computation/communication ratio is often near unity
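How the computation/communication ratio falls as the pieces get smaller can be seen with a toy cost model. The model is an assumption made for illustration: summing `n_items` numbers split into `n_pieces` pieces, where each piece does about `n_items / n_pieces` additions and sends one partial sum. The function name is hypothetical.

```python
def granularity_profile(n_items, n_pieces):
    """Toy model (assumption): comp/comm ratio when summing n_items in n_pieces.

    Each piece performs about n_items / n_pieces additions (computation)
    and sends exactly one partial sum (communication).
    """
    computation_per_piece = n_items / n_pieces
    communication_per_piece = 1
    return computation_per_piece / communication_per_piece


for label, pieces in [("coarse", 4), ("medium", 1_000), ("fine", 1_000_000)]:
    ratio = granularity_profile(1_000_000, pieces)
    print(f"{label:>6}: {pieces:>9} pieces, comp/comm ratio = {ratio:,.0f}")
```

With one million items, a handful of pieces gives a very high ratio (250,000), a thousand pieces gives about a thousand, and a million pieces drives the ratio to 1, matching the coarse, medium, and fine categories above.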

[Figure: SHARED MEMORY (processors connected through a network to a common memory) vs. DISTRIBUTED MEMORY (processor/memory nodes connected through a network)]
Shared (Global) Memory
- A Global Memory Space accessible by all processors
- Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
- All memory units are associated with processors
- To retrieve information from another processor's memory a message must be sent there
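Uniform Memory Access (UMA)
- All processors take the same time to reach all memory locations
Nonuniform Memory Access (NUMA)
- Memory access time depends on the location: a processor reaches its local memory faster than memory attached to other processors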
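The practical effect of UMA vs. NUMA can be shown with a simple two-level model of expected access time. The model and the latencies (100 ns local, 400 ns remote) are hypothetical values chosen only to illustrate the arithmetic, not measurements of any real machine.

```python
def average_access_time(local_fraction, t_local_ns, t_remote_ns):
    """Expected access time under a two-level memory model (assumption)."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns


# UMA: every location costs the same, so locality does not matter.
print(average_access_time(0.5, 100, 100))    # 100.0
# NUMA (hypothetical latencies): locality changes the average cost.
print(average_access_time(0.75, 100, 400))   # 175.0
print(average_access_time(0.5, 100, 400))    # 250.0
```

Under NUMA, placing data near the processor that uses it is what keeps the average close to the local latency.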

Shared Memory Multiprocessors
[Figure: processors P and memory modules M connected through an interconnection network (buses, multistage IN, or crossbar switch)]
Characteristics
All processors have equally direct access to one large memory address space
Example systems
- Bus and cache-based systems: Sequent Balance, Encore Multimax
- Multistage IN-based systems: Ultracomputer, Butterfly, RP3, HEP
- Crossbar switch-based systems: C.mmp, Alliant FX/8
Limitations
Memory access latency; Hot spot problem

Message-Passing Multiprocessors
Characteristics
- Interconnected computers
- Each processor has its own memory, and processors communicate via message passing
Example systems
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III
Limitations
- Communication overhead; hard to program
[Figure: processor/memory pairs (P, M) connected by a message-passing network of point-to-point connections]

Interconnection Structures
* Time-Shared Common Bus
* Multiport Memory
* Crossbar Switch
* Multistage Switching Network
* Hypercube System
Bus
All processors (and memory) are connected to a common bus or buses
- Memory access is fairly uniform, but not very scalable

BUS
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements
Operations of Bus
M3 wishes to communicate with S5
[1] M3 sends signals (address) on the bus that cause S5 to respond
[2] M3 sends data to S5 or S5 sends data to M3 (determined by the command line)
Master Device: Device that initiates and controls the communication
Slave Device: Responding device
Multiple-master buses
-> Bus conflict
-> need bus arbitration
[Figure: devices M3, S7, M6, S5, M4, and S2 attached to a common bus]
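The two-step bus operation and the need for arbitration among multiple masters can be modeled in a few lines. This is a sketch under assumptions: the class names, the `arbitrate`/`transfer` methods, and the fixed-priority order used to resolve a bus conflict are all hypothetical; real buses implement this in hardware signals.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    name: str
    address: int
    storage: dict = field(default_factory=dict)


@dataclass
class Bus:
    slaves: list
    log: list = field(default_factory=list)

    def arbitrate(self, requesting, priority):
        """Fixed-priority arbitration (assumed scheme): first requester in priority wins."""
        for master in priority:
            if master in requesting:
                return master
        return None  # no master wants the bus

    def transfer(self, master, address, command, offset, data=None):
        if command not in ("read", "write"):
            raise ValueError(f"unknown command {command!r}")
        # Step [1]: the master puts an address on the bus; the matching slave responds.
        slave = next((s for s in self.slaves if s.address == address), None)
        if slave is None:
            raise LookupError(f"no slave responds to address {address}")
        self.log.append((master, slave.name, command))
        # Step [2]: the command line selects the direction of the transfer.
        if command == "write":           # master -> slave
            slave.storage[offset] = data
            return None
        return slave.storage.get(offset)  # read: slave -> master


bus = Bus([Slave("S2", 2), Slave("S5", 5), Slave("S7", 7)])
master = bus.arbitrate({"M3", "M4"}, priority=["M3", "M4", "M6"])  # M3 wins
bus.transfer(master, address=5, command="write", offset=0, data=42)  # M3 -> S5
print(bus.transfer(master, address=5, command="read", offset=0))    # 42
```

Only the master granted the bus drives the address lines, which is why a multiple-master bus needs arbitration before step [1].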

System Bus Structure for Multiprocessor
[Figure: three processor modules, each with a CPU, local memory, a system bus controller, and (on two of them) an IOP, all on a local bus; the system bus controllers connect the local buses to a common SYSTEM BUS that also holds the common shared memory]

Multiport Memory
Multiport Memory Module
- Each port serves a CPU
Memory Module Control Logic
- Each memory module has control logic
- Resolves memory access conflicts using fixed priority among CPUs
Advantages
- Multiple paths -> high transfer rate
Disadvantages
- Memory control logic
- Large number of cables and connections
[Figure: CPU 1 to CPU 4, each connected through its own port to every memory module MM 1 to MM 4]
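The fixed-priority conflict resolution done by each module's control logic can be sketched as one memory cycle. The priority order (CPU 1 highest) and the function name are assumptions for illustration.

```python
def multiport_cycle(requests, priority):
    """One memory cycle of a multiport memory (sketch).

    requests: dict mapping CPU -> memory module it wants this cycle.
    priority: CPUs in fixed priority order, highest first (assumed order).
    Returns a dict mapping module -> CPU granted access; other CPUs wait.
    """
    granted = {}
    for cpu in priority:                  # scan CPUs in fixed priority order
        module = requests.get(cpu)
        if module is not None and module not in granted:
            granted[module] = cpu         # the module serves one port per cycle
    return granted


requests = {"CPU 1": "MM 2", "CPU 2": "MM 2", "CPU 3": "MM 4", "CPU 4": "MM 1"}
print(multiport_cycle(requests, ["CPU 1", "CPU 2", "CPU 3", "CPU 4"]))
```

CPU 1 and CPU 2 both want MM 2, so the higher-priority CPU 1 wins and CPU 2 waits; CPU 3 and CPU 4 proceed in parallel on other modules, which is the "multiple paths" advantage.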

Crossbar Switch
[Figure: crossbar switch connecting CPU1 to CPU4 with memory modules MM1 to MM4]
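A crossbar has a crosspoint switch at every intersection of a CPU path and a memory-module path, so n CPUs and m modules need n·m crosspoints. Each module can be connected to only one CPU at a time, but CPUs addressing different modules transfer simultaneously. The sketch below closes crosspoints for one cycle; serving conflicting requests in CPU-index order is an assumption, and the function name is hypothetical.

```python
def crossbar_setting(requests, n_cpus, n_modules):
    """Return an n_cpus x n_modules matrix of closed crosspoints for one cycle.

    requests: dict mapping CPU index -> requested module index.
    Conflicts are resolved in CPU-index order (an assumption).
    """
    closed = [[False] * n_modules for _ in range(n_cpus)]
    busy = set()
    for cpu in range(n_cpus):
        module = requests.get(cpu)
        if module is None or module in busy:
            continue                      # idle CPU, or module already connected
        closed[cpu][module] = True        # close the crosspoint at (cpu, module)
        busy.add(module)
    return closed


setting = crossbar_setting({0: 0, 1: 2, 2: 2, 3: 3}, n_cpus=4, n_modules=4)
for row in setting:
    print("".join("X" if c else "." for c in row))
print("crosspoints:", len(setting) * len(setting[0]))   # 16
```

Three transfers proceed at once; CPU 2 is blocked only because CPU 1 already holds MM 3 (index 2).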

Multistage Switching Network
[Figure: 2 x 2 interstage switch with inputs A, B and outputs 0, 1, shown in four states: A connected to 0, A connected to 1, B connected to 0, B connected to 1]

Multistage Interconnection Network
[Figure: binary tree with 2 x 2 switches connecting processors P1 and P2 to destinations 000 through 111]
[Figure: 8 x 8 Omega switching network connecting inputs 0 through 7 to outputs 000 through 111]
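Routing through an omega network is commonly done with destination-tag routing: at each stage a 2 x 2 switch looks at one bit of the destination address, most significant bit first, and sends the request to its upper output on 0 or its lower output on 1. The sketch assumes the standard omega layout, a perfect shuffle (rotate the line number left by one bit) before each stage; the function name is hypothetical.

```python
def omega_route(source, dest, n_bits=3):
    """Trace destination-tag routing through a 2**n_bits-line omega network.

    Returns a list of (stage, switch index, output taken) tuples.
    """
    size = 1 << n_bits
    if not (0 <= source < size and 0 <= dest < size):
        raise ValueError("source and dest must be valid line numbers")
    mask = size - 1
    pos = source
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit line number left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # Destination tag: examine dest bits from most to least significant.
        bit = (dest >> (n_bits - 1 - stage)) & 1
        switch = pos >> 1                 # which 2 x 2 switch in this stage
        pos = (pos & ~1) | bit            # 0 -> upper output, 1 -> lower output
        path.append((stage, switch, "lower" if bit else "upper"))
    assert pos == dest                    # the tag always delivers to dest
    return path


print(omega_route(2, 5))  # source 010 to destination 101
```

Because each stage fixes one destination bit, no routing table is needed; the destination address itself steers the request.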

Hypercube Interconnection
- p = 2^n processors
- processors are conceptually on the corners of a
n-dimensional hypercube, and each is directly
connected to the n neighboring nodes
- Degree = n
[Figure: one-cube, two-cube, and three-cube, with nodes labeled by binary addresses (0, 1; 00 to 11; 000 to 111)]
n-dimensional hypercube (binary n-cube)
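Because node addresses are n-bit numbers and neighbors differ in exactly one bit, hypercube connectivity and routing reduce to bit operations. The sketch below lists a node's n neighbors and routes by correcting differing address bits lowest bit first (e-cube routing, one common scheme, assumed here); the number of hops equals the number of 1 bits in source XOR destination. Function names are hypothetical.

```python
def neighbors(node, n):
    """The n nodes adjacent to `node` in a binary n-cube: flip one address bit."""
    return [node ^ (1 << i) for i in range(n)]


def ecube_route(source, dest, n):
    """Route by correcting differing address bits, lowest bit first (e-cube)."""
    path = [source]
    node = source
    diff = source ^ dest                  # 1 bits mark dimensions to traverse
    for i in range(n):
        if diff & (1 << i):
            node ^= 1 << i                # move along dimension i
            path.append(node)
    return path


print([format(v, "03b") for v in neighbors(0b000, 3)])        # ['001', '010', '100']
print([format(v, "03b") for v in ecube_route(0b000, 0b111, 3)])  # ['000', '001', '011', '111']
```

Every node has exactly n neighbors (degree = n), and any message needs at most n hops in a p = 2^n node cube.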
