Multiprocessors
Mr. A. B. Shinde
Electronics Engineering
Contents…
 Symmetric and distributed shared-memory architectures
 Cache coherence issues
 Performance issues
 Synchronization issues
 Models of memory consistency
 Interconnection networks:
 buses,
 crossbars, and
 multi-stage switches
Taxonomy of Parallel Architectures
 In 1966, Flynn proposed a simple model for categorizing all
computers.
 He used the parallelism in the instruction and data streams and
placed all computers into one of four categories:
 Single Instruction stream, Single Data stream (SISD)
 Single Instruction stream, Multiple Data streams (SIMD)
 Multiple Instruction streams, Single Data stream (MISD)
 Multiple Instruction streams, Multiple Data streams (MIMD)
SISD
 This category is the uniprocessor.
 In computing, SISD refers to a computer architecture in which a single processor (a uniprocessor) executes a single instruction stream, operating on data stored in a single memory.
 This corresponds to the von Neumann architecture.
 Instruction fetching and pipelined execution of instructions are common examples found in most modern SISD computers.
SISD
 This is the oldest style of computer architecture, and still one of the most important: all personal computers fit within this category.
 Single instruction refers to the fact that there is only one instruction stream being acted on by the CPU during any one clock tick;
 Single data means, analogously, that one and only one data stream is employed as input during any one clock tick.
SIMD
 The same instruction is executed by multiple processors using
different data streams.
 SIMD computers exploit data-level parallelism by applying the same
operations to multiple items of data in parallel.
 Each processor has its own data memory (hence multiple data), but
there is a single instruction memory and control processor, which
fetches and dispatches instructions.
SIMD
 SIMD machines are capable of applying
the exact same instruction stream to
multiple streams of data
simultaneously.
 This type of architecture is perfectly
suited to achieving very high
processing rates, as the data can be
split into many different independent
pieces, and the multiple instruction units
can all operate on them at the same time.
SIMD
SIMD Processable Patterns vs. SIMD Unprocessable Patterns
Example: Brightness Computation by SIMD Operations
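The brightness example lends itself to a short sketch. Below is a hedged C version using x86 SSE2 intrinsics from <emmintrin.h> (the function name and pixel layout are illustrative assumptions): a single instruction adds the same brightness offset to 16 pixels at once, saturating at 255.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Add 'delta' to every 8-bit pixel, 16 pixels per instruction:
 * one instruction stream applied to many data items (SIMD). */
void brighten(unsigned char *pixels, int n, unsigned char delta) {
    __m128i d = _mm_set1_epi8((char)delta);            /* 16 copies of delta */
    int i;
    for (i = 0; i + 16 <= n; i += 16) {
        __m128i p = _mm_loadu_si128((__m128i *)(pixels + i));
        p = _mm_adds_epu8(p, d);                       /* saturating add on 16 bytes */
        _mm_storeu_si128((__m128i *)(pixels + i), p);
    }
    for (; i < n; i++) {                               /* scalar tail */
        int v = pixels[i] + delta;
        pixels[i] = (unsigned char)(v > 255 ? 255 : v);
    }
}
```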
MISD
 In computing, MISD is a type of parallel
computing architecture where many
functional units perform different
operations on the same data.
 Pipeline architectures belong to this
type.
 Fault-tolerant computers executing the
same instructions redundantly in order to
detect and mask errors, in a manner
known as task replication, may be
considered to belong to this type.
 Not many instances of this
architecture exist, as MIMD and
SIMD are often more appropriate for
common data parallel techniques.
MISD
 Another example of a MISD process is one carried out routinely at the United Nations.
 When a delegate speaks in a language of his/her choice, the speech is simultaneously translated into a number of other languages for the benefit of the other delegates present. Thus the delegate's speech (a single data stream) is being processed by a number of translators (processors), yielding different results.
No commercial multiprocessor of this type has been built to date.
MIMD
 Each processor fetches its own instructions
and operates on its own data.
 MIMD computers exploit thread-level
parallelism, since multiple threads operate
in parallel.
 In general, thread-level parallelism is
more flexible than data-level parallelism
and thus more generally applicable.
 Machines using MIMD have a number of
processors that function asynchronously
and independently.
 At any time, different processors may be
executing different instructions on different
pieces of data.
MIMD
 Two other factors have also contributed to the rise of the MIMD
multiprocessors:
1. MIMDs offer flexibility. With the correct hardware and software
support, MIMDs can function as single-user multiprocessors.
2. MIMDs can build on the cost-performance advantages of off-the-
shelf processors. Multicore chips leverage the design investment in a
single processor core by replicating it.
Shared-Memory Multiprocessor
Basic structure of a centralized shared-memory multiprocessor
Distributed-Memory Multiprocessor
Basic architecture of a distributed-memory multiprocessor
Symmetric Shared-Memory Architectures
 Symmetric shared-memory machines usually support the caching of
both shared and private data.
 Private data are used by a single processor, while shared data are
used by multiple processors.
 When a private item is cached, its location is migrated to the cache,
reducing the average access time as well as the memory bandwidth
required.
Symmetric Shared-Memory Architectures
 When shared data are cached, the shared value may be replicated in
multiple caches.
 In addition to the reduction in access latency and required memory
bandwidth, this replication also provides a reduction in contention that
may exist for shared data items.
 Caching of shared data, however, introduces a new problem: cache coherence.
Multiprocessor Cache Coherence
 The cache coherence problem arises because the view of memory held by two different processors is through their individual caches; without any additional precautions, the two processors could end up seeing two different values.

The figure illustrates the problem and shows how two different processors can have two different values for the same location.
This difficulty is generally referred to as the cache coherence problem.
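The figure itself does not survive in this text dump; the following commented sketch reconstructs the standard scenario it depicts (write-back caches with no coherence mechanism; the values are illustrative):

```c
/* Shared location X, initially 1 in memory.
 *
 * Time  Event                    Cache A   Cache B   Memory[X]
 *  1    CPU A reads X               1         -          1
 *  2    CPU B reads X               1         1          1
 *  3    CPU A stores 0 into X       0         1          1
 *
 * After time 3, a read of X by CPU B still returns the stale
 * value 1: the two processors see different values for X.     */
```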
Multiprocessor Cache Coherence
 A memory system is coherent if any read of a data item returns the most recently written value of that data item.
 This definition is vague and simplistic; the reality is much more complex.
 This simple definition contains two different aspects of memory system behavior, both of which are critical to writing correct shared-memory programs.
 The first aspect, called coherence, defines what values can be returned by a read.
 The second aspect, called consistency, determines when a written value will be returned by a read.
Multiprocessor Cache Coherence
 A memory system is coherent if
1. A read by a processor P to a location X that follows a write by P to X,
with no writes of X by another processor occurring between the write and
the read by P, always returns the value written by P.
2. A read by a processor to location X that follows a write by another
processor to X returns the written value if the read and write are
sufficiently separated in time and no other writes to X occur between the
two accesses.
3. Writes to the same location are serialized; that is, two writes to the
same location by any two processors are seen in the same order by
all processors.
 For example: If the values 1 and then 2 are written to a location,
processors can never read the value of the location as 2 and then later
read it as 1.
Multiprocessor Cache Coherence
 The question of when a written value will be seen is also important.
 We cannot require that a read of X instantaneously see the value written
for X by some other processor.
 For example: A write of X on one processor precedes a read of X on
another processor by a very small time, it may be impossible to
ensure that the read returns the value of the data written, since the
written data may not even have left the processor at that point.
 The issue of exactly when a written value must be seen by a reader is
defined by a memory consistency model.
Multiprocessor Cache Coherence
 Coherence and consistency are complementary:
 Coherence defines the behavior of reads and writes to the same
memory location, while
 Consistency defines the behavior of reads and writes with respect to
accesses to other memory locations.
Multiprocessor Cache Coherence
 Coherence and Consistency are complementary:
 Make the following two assumptions:
 First: A write does not complete (and allow the next write to occur) until
all processors have seen the effect of that write.
 Second: The processor does not change the order of any write with
respect to any other memory access.
 These two conditions mean that, if a processor writes location A
followed by location B, any processor that sees the new value of B
must also see the new value of A.
 These restrictions allow the processor to reorder reads, but force the processor to finish writes in program order.
Basic Schemes for Enforcing Coherence
 The coherence problems for multiprocessors and for I/O are similar in origin but have different characteristics.
 In I/O, multiple data copies are rare, whereas a program running on multiple processors will normally have copies of the same data in several caches.
 In a coherent multiprocessor, the caches provide both migration and
replication of shared data items.
 Coherent caches provide migration, since a data item can be moved
to a local cache and used there in a transparent fashion.
 This migration reduces both the latency to access a shared data item
and the bandwidth demand on the shared memory.
Basic Schemes for Enforcing Coherence
 Coherent caches also provide replication for shared data, since the
caches make a copy of the data item in the local cache.
 Replication reduces both latency of access and contention for a
read shared data item.
 Supporting this migration and replication is critical to performance in
accessing shared data.
 Small-scale multiprocessors adopt a hardware solution by
introducing a protocol to maintain coherent caches.
 The protocols used to maintain coherence for multiple processors
are called cache coherence protocols.
 Key to implementing a cache coherence protocol is tracking the state of
any sharing of a data block.
Basic Schemes for Enforcing Coherence
 There are two classes of protocols, which use different techniques
to track the sharing status:
 Directory based:
 The sharing status of a block of physical memory is kept in just one
location, called the directory.
 Directory-based coherence has slightly higher implementation
overhead than snooping, but it can scale to larger processor counts.
 The Sun T1 design uses directories.
Basic Schemes for Enforcing Coherence
 There are two classes of protocols, which use different techniques
to track the sharing status:
 Snooping:
 Every cache that has a copy of the data from a block of physical
memory also has a copy of the sharing status of the block.
 The caches are all accessible via some broadcast medium (a bus or
switch), and all cache controllers monitor or snoop on the medium to
determine whether or not they have a copy of a block that is requested
on a bus or switch access.
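The slides do not name a specific snooping protocol; as a hedged illustration, here is a minimal C sketch of the per-block sharing state an MSI-style write-invalidate snooping controller tracks (the state names and handler functions are assumptions, not from the deck):

```c
/* Per-block sharing state in an MSI-style snooping cache. */
typedef enum {
    INVALID,   /* no valid copy in this cache                  */
    SHARED,    /* clean copy; other caches may hold it too     */
    MODIFIED   /* dirty, exclusive copy; memory is out of date */
} BlockState;

/* Snooped a write (invalidate) for a block this cache holds:
 * any local copy becomes stale. */
BlockState snoop_remote_write(BlockState s) {
    (void)s;
    return INVALID;
}

/* Snooped a read for a block this cache holds MODIFIED:
 * supply or write back the dirty data, then downgrade. */
BlockState snoop_remote_read(BlockState s) {
    if (s == MODIFIED)
        return SHARED;   /* after writing the block back */
    return s;
}
```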
Performance of Symmetric Shared-Memory
 In a multiprocessor using a snoopy coherence protocol, several
different phenomena combine to determine performance.
 The overall cache performance is a combination of the behavior of
uniprocessor cache miss traffic and the traffic caused by
communication, which results in invalidations and subsequent cache
misses.
 Changing the processor count, cache size, and block size can affect
these two components of the miss rate.
 The misses arising from interprocessor communication, called coherence misses, can be broken into two separate sources.
Performance of Symmetric Shared-Memory
 Coherence misses have two separate sources:
 First source is the true sharing misses that arise from the
communication of data through the cache coherence mechanism.
 In an invalidation based protocol, the first write by a processor to a
shared cache block causes an invalidation to establish ownership
of that block.
 Additionally, when another processor attempts to read a modified
word in that cache block, a miss occurs and the resultant block is
transferred.
 Both these misses are classified as true sharing misses since they
directly arise from the sharing of data among processors.
Performance of Symmetric Shared-Memory
 Coherence misses have two separate sources:
 The second source, false sharing, arises from the use of an invalidation-based coherence algorithm with a single valid bit per cache block.
 False sharing occurs when a block is invalidated because some word in the block, other than the one being read, is written into.
 If the word being written and the word read are different and the
invalidation does not cause a new value to be communicated, but
only causes an extra cache miss, then it is a false sharing miss.
 In a false sharing miss, the block is shared, but no word in the cache is actually shared.
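To make this concrete, here is a hedged C sketch (the 64-byte block size, iteration counts, and names are assumptions): two threads repeatedly write different words that happen to share one cache block, so each write invalidates the other core's copy even though no word is truly shared; padding each counter onto its own block removes the coherence misses.

```c
#include <pthread.h>

/* Two independent counters in one cache block: every write by one
 * thread invalidates the other core's copy -- false sharing. */
struct { long a; long b; } hot;

/* Fix: pad so each counter occupies its own (assumed 64-byte) block. */
struct { long a; char pad[64 - sizeof(long)]; long b; } cold;

void *bump_a(void *arg) {
    (void)arg;
    for (long i = 0; i < 10000000; i++) hot.a++;
    return 0;
}

void *bump_b(void *arg) {
    (void)arg;
    for (long i = 0; i < 10000000; i++) hot.b++;
    return 0;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, 0, bump_a, 0);
    pthread_create(&t2, 0, bump_b, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return 0;
}
```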
Distributed Shared Memory
 A snooping protocol requires communication with all caches on
every cache miss, including writes of potentially shared data.
 The absence of any centralized data structure that tracks the state
of the caches is the fundamental advantage of a snooping-based
scheme.
Distributed Shared Memory
 For example:
 With 16 processors, a block size of 64 bytes, and a 512 KB data
cache, the total bus bandwidth demand (ignoring stall cycles) for the
four programs in the scientific/technical workload, ranges from about 4
GB/sec to about 170 GB/sec.
 In comparison, the memory bandwidth of the highest-performance
centralized shared-memory 16-way multiprocessor in 2006 was 2.4
GB/sec per processor.
 In 2006, multiprocessors with a distributed-memory model were available with over 12 GB/sec per processor to the nearest memory.
Distributed Shared Memory
 We can increase the memory bandwidth and interconnection
bandwidth by distributing the memory as shown in figure;
 This immediately separates local memory traffic from remote
memory traffic, reducing the bandwidth demands on the memory
system and on the interconnection network.
Distributed Shared Memory
 Unless we eliminate the need for the coherence protocol to broadcast on every cache miss, distributing the memory will gain little in performance.
 The alternative to a snoop-based coherence protocol is a directory
protocol.
 A directory keeps the state of every block that may be cached.
 Information in the directory includes which caches have copies of the
block, whether it is dirty, and so on.
 A directory protocol can also be used to reduce the bandwidth demands in a centralized shared-memory machine.
Distributed Shared Memory
 The simplest directory implementations associate an entry in the directory with each memory block.
 In such implementations, the amount of information is proportional
to the product of the number of memory blocks and the number of
processors.
 This overhead is not a problem for multiprocessors with fewer than about 200 processors, because the directory overhead with a reasonable block size will be tolerable.
 For larger multiprocessors, we need methods to allow the directory
structure to be efficiently scaled.
 The methods used either try to keep information for fewer blocks or
try to keep fewer bits per entry by using individual bits to stand for a
small collection of processors.
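As a hedged sketch of the full-bit-vector organization just described (field names and widths are illustrative assumptions), each memory block gets one directory entry holding its sharing state plus a presence bit per processor:

```c
#include <stdint.h>

#define NPROCS 64            /* illustrative processor count */

/* Directory entry kept per memory block (full bit-vector scheme). */
typedef enum { UNCACHED, SHARED_CLEAN, EXCLUSIVE_DIRTY } DirState;

typedef struct {
    DirState state;
    uint64_t sharers;        /* bit i set => processor i holds a copy */
} DirEntry;

/* Storage grows as (#memory blocks) x (#processors) presence bits,
 * which is why larger machines keep information for fewer blocks
 * or fewer bits per entry, as described above. */
```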
Distributed Shared Memory
 To prevent the directory from becoming the bottleneck, the directory
is distributed along with the memory, so that different directory
accesses can go to different directories, just as different memory
requests go to different memories.
 A distributed directory retains the characteristic that the sharing
status of a block is always in a single known location.
 This property allows the coherence protocol to avoid broadcast.
Synchronization
Synchronization: The Basics
 Synchronization mechanisms are typically built with user-level
software routines that rely on hardware-supplied synchronization
instructions.
 For smaller multiprocessors, the key hardware capability is an uninterruptible instruction sequence capable of atomically retrieving and changing a value. Software synchronization mechanisms are then constructed using this capability.
 Lock and Unlock are the synchronization operations.
 Lock and Unlock can be used to create mutual exclusion, as well as
to implement more complex synchronization mechanisms.
Synchronization: The Basics
 Synchronization mechanisms are typically built with user-level
software routines that rely on hardware-supplied synchronization
instructions.
 In larger-scale multiprocessors, synchronization can become a
performance bottleneck because contention introduces additional
delays and because latency is potentially greater in such a
multiprocessor.
Synchronization: The Basics
 Basic Hardware Primitives:
 The key ability required to implement synchronization in a multiprocessor is a set of hardware primitives able to atomically read and modify a memory location.
 Without such a capability, the cost of building basic synchronization primitives will be too high and will increase as the processor count increases.
Synchronization: The Basics
 Basic Hardware Primitives:
 These hardware primitives are the basic building blocks that are
used to build a wide variety of user-level synchronization
operations, including things such as locks and barriers.
 In general, architects do not expect users to employ the basic hardware primitives, but instead expect that the primitives will be used by system programmers to build a synchronization library.
Synchronization: The Basics
 Basic Hardware Primitives:
 One typical operation for building synchronization operations is the
atomic exchange, which interchanges a value in a register for a value in
memory.
 Assume that we want to build a simple lock where the value 0 is used
to indicate that the lock is free and 1 is used to indicate that the lock
is unavailable.
 A processor tries to set the lock by exchanging a 1, which is in a register, with the memory address corresponding to the lock.
 The value returned from the exchange instruction is 1 if some other processor had already claimed access and 0 otherwise.
 In the latter case, the value is also changed to 1, preventing any competing exchange from also retrieving a 0.
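A minimal C sketch of such a lock. The deck describes the atomic exchange abstractly; here the GCC/Clang __atomic_exchange_n builtin stands in for the hardware primitive (that choice is an assumption):

```c
/* Spin lock built on atomic exchange: 0 = free, 1 = held. */
static int lock = 0;

void acquire(void) {
    /* Atomically swap a 1 into the lock; the returned old value
     * says whether some other processor already held it. */
    while (__atomic_exchange_n(&lock, 1, __ATOMIC_ACQUIRE) == 1)
        ;  /* spin: exchange returned 1, so the lock was taken */
}

void release(void) {
    __atomic_store_n(&lock, 0, __ATOMIC_RELEASE);  /* free the lock */
}
```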
Models of Memory Consistency
Models of Memory Consistency
 Cache coherence ensures that multiple processors see a consistent
view of memory.
 Since processors communicate through shared variables, the question arises: In what order must a processor observe the data writes of another processor?
 A processor “observes” the writes of another processor through its reads.
Models of Memory Consistency
 Consider two code segments from processes P1 and P2…
Assume that the processes are running on different processors, and that locations A and B are originally cached by both processors with the initial value of 0.
If writes always take immediate effect and are immediately seen by other processors, it will be impossible for both if statements (labelled L1 and L2) to evaluate their conditions as true.
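The code segments themselves do not survive in this text dump; the following reconstructs the standard example the discussion assumes (the names A, B, L1, and L2 come from the slide text):

```c
int A = 0, B = 0;   /* both locations start at 0, cached by both processors */

void P1(void) {           /* runs on processor 1 */
    A = 1;
    if (B == 0) {         /* L1 */
        /* ... */
    }
}

void P2(void) {           /* runs on processor 2 */
    B = 1;
    if (A == 0) {         /* L2 */
        /* ... */
    }
}
```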
Models of Memory Consistency
 The question is, Should this behavior be allowed, and if so, under
what conditions?
 The most straightforward model for memory consistency is called
sequential consistency.
 Sequential consistency requires that the result of any execution be
the same as if the memory accesses executed by each processor
were kept in order and the accesses among different processors were
arbitrarily interleaved.
 Sequential consistency eliminates the possibility of some
nonobvious execution (previous example) because the assignments
must be completed before the if statements are initiated.
Models of Memory Consistency
 The question is, Should this behavior be allowed, and if so, under
what conditions?
 The simplest way to implement sequential consistency is to require a
processor to delay the completion of any memory access until all the
invalidations caused by that access are completed.
 Memory consistency involves operations among different variables:
The two accesses that must be ordered are actually to different memory
locations.
 In our example, we must delay the read of A or B (A == 0 or B == 0) until
the previous write has completed (B = 1 or A = 1).
Models of Memory Consistency
 Relaxed Consistency Models:
 The key idea in relaxed consistency models is to allow reads and
writes to complete out of order, but to use synchronization
operations to enforce ordering, so that a synchronized program
behaves as if the processor were sequentially consistent.
 There are a variety of relaxed models that are classified according to
what read and write orderings they relax.
 We specify the orderings by a set of rules of the form X→Y, meaning that
operation X must complete before operation Y is done.
 Sequential consistency requires maintaining all four possible orderings:
R→W, R→R, W→R, and W→W.
Models of Memory Consistency
 The relaxed models are defined by which of the four orderings they relax:
1. Relaxing the W→R ordering yields a model known as total store
ordering or processor consistency. Because this ordering retains
ordering among writes, many programs that operate under sequential
consistency operate under this model, without additional synchronization.
2. Relaxing the W→W ordering yields a model known as partial store
order.
3. Relaxing the R→W and R→R orderings yields a variety of models
including weak ordering, the PowerPC consistency model, and
release consistency, depending on the details of the ordering
restrictions and how synchronization operations enforce ordering.
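A hedged C11 sketch of the relaxed-consistency idea (using <stdatomic.h>; the variable names are illustrative): the data accesses themselves are relaxed and may be reordered, while release/acquire operations on a flag enforce ordering only at the synchronization point, so the synchronized program behaves as expected.

```c
#include <stdatomic.h>

atomic_int data  = 0;
atomic_int ready = 0;

/* Producer: the release store on 'ready' orders the earlier data
 * write before it (restores W->W at the synchronization point). */
void producer(void) {
    atomic_store_explicit(&data, 42, memory_order_relaxed);
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: the acquire load on 'ready' orders the later data
 * read after it (restores R->R at the synchronization point). */
int consumer(void) {
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;  /* spin until the producer publishes */
    return atomic_load_explicit(&data, memory_order_relaxed);  /* 42 */
}
```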
Models of Memory Consistency
 Finally,
 At the present time, many multiprocessors being built support some sort of relaxed consistency model.
 Since synchronization is highly multiprocessor specific, the expectation is that most programmers will use standard synchronization libraries.
 With speculation, much of the performance advantage of relaxed consistency models can be obtained with sequential or processor consistency.
 A remaining question for relaxed consistency concerns the role of the compiler and its ability to optimize memory accesses to potentially shared variables.
Interconnection Networks
Interconnection Networks
 A bus is a communication pathway connecting two or more devices.
 Bus is a shared transmission medium.
 Multiple devices are connected to the bus, and a signal transmitted
by any one device is available for reception by all other devices
attached to the bus.
 If two devices transmit during the same time period, their signals will
overlap and become garbled. Thus, only one device at a time can
successfully transmit the data.
Interconnection Networks
 Typically, a bus consists of multiple communication pathways, or
lines.
 Each line is capable of transmitting signals representing binary 1 and
binary 0.
 For example:
 An 8-bit unit of data can be transmitted over eight bus lines.
 A bus that connects major computer components (processor,
memory, I/O) is called a system bus.
 The most common computer interconnection structures are based on the
use of one or more system buses.
Interconnection Networks
 Bus Structure:
Bus Interconnection Scheme
Interconnection Networks
 Bus Structure: Data Bus
 The data lines provide a path for moving data among system modules; collectively, these lines are called the data bus.
 The data bus may consist of 32, 64, 128, or even more separate lines, the number of lines being referred to as the width of the data bus.
 Because each line can carry only 1 bit at a time, the number of lines determines how many bits can be transferred at a time.
 The width of the data bus is a major factor in determining overall system performance.
Interconnection Networks
 Bus Structure: Address Bus
 The address lines are used to designate the source or destination of
the data on the data bus.
 For example: If the processor wishes to read a word (8, 16, or 32 bits) of
data from memory, it puts the address of the desired word on the address
lines.
 The width of the address bus determines the maximum possible
memory capacity of the system.
Interconnection Networks
 Bus Structure: Address Bus
 The address lines are also used to address I/O ports.
 Typically, the higher-order bits are used to select a particular module
on the bus, and the lower-order bits select a memory location or I/O
port within the module.
 For example: On an 8-bit address bus, address 01111111 and below
might reference locations in a memory module (module 0) with 128
words of memory, and address 10000000 and above refer to devices
attached to an I/O module (module 1).
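A small C sketch of the decoding this example implies (the shift and mask follow directly from the 1-bit module select over an 8-bit address; the function name is illustrative):

```c
#include <stdint.h>

/* 8-bit address bus: the high-order bit selects the module,
 * the low 7 bits select a word or port within it (128 each). */
void decode(uint8_t addr, unsigned *module, unsigned *offset) {
    *module = addr >> 7;        /* 0 = memory module, 1 = I/O module */
    *offset = addr & 0x7F;      /* memory word or I/O port number    */
    /* e.g. 0x5A -> module 0, memory word 0x5A;
     *      0x83 -> module 1, I/O port 3. */
}
```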
Interconnection Networks
 Bus Structure: Control Bus
 The control lines are used to control the access to and the use of
the data and address lines.
 Since, the data and address lines are shared by all components,
controlling their use becomes crucial.
 Control signals transmit both command and timing information
among system modules.
 Timing signals indicate the validity of data and address information, and command signals specify operations to be performed.
Interconnection Networks
 Typical control lines include
 Memory write: Causes data on the bus to be written into the addressed
location
 Memory read: Causes data from the addressed location to be placed on
the bus
 I/O write: Causes data on the bus to be output to the addressed I/O port
 I/O read: Causes data from the addressed I/O port to be placed on the
bus
 Transfer ACK: Indicates that data have been accepted from or placed
on the bus
Interconnection Networks
 Typical control lines include
 Bus request: Indicates that a module needs to gain control of the bus
 Bus grant: Indicates that a requesting module has been granted control
of the bus
 Interrupt request: Indicates that an interrupt is pending
 Interrupt ACK: Acknowledges that the pending interrupt has been
recognized
 Clock: Is used to synchronize operations
 Reset: Initializes all modules
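Purely as a hedged summary of the two lists above, the typical control lines can be collected in a C enum (the identifiers and encodings are illustrative, not a real bus standard):

```c
/* Typical control-bus signals, one identifier per line above. */
typedef enum {
    MEM_WRITE, MEM_READ,    /* memory transfers          */
    IO_WRITE, IO_READ,      /* I/O port transfers        */
    TRANSFER_ACK,           /* data accepted or placed   */
    BUS_REQ, BUS_GRANT,     /* bus arbitration           */
    INT_REQ, INT_ACK,       /* interrupt handshaking     */
    CLOCK, RESET            /* timing and initialization */
} ControlLine;
```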
Interconnection Networks
Physical Realization of a Bus Architecture
Interconnection Networks
 Multiple-Bus Hierarchies:
 If a great number of devices are connected to the bus, performance will
suffer. There are two main causes:
 1. In general, the more devices attached to the bus, the greater the
bus length and hence the greater the propagation delay.
This delay determines the time it takes for devices to coordinate the use
of the bus.
When control of the bus passes from one device to another frequently,
these propagation delays can noticeably affect performance.
Interconnection Networks
 Multiple-Bus Hierarchies:
 If a great number of devices are connected to the bus, performance will
suffer. There are two main causes:
 2. The bus may become a bottleneck as the aggregate data transfer
demand approaches the capacity of the bus.
If the aggregate data rate grows beyond the capacity of the bus, the bus becomes a bottleneck.
The data rates generated by attached devices (graphics and video controllers, network interfaces) are growing rapidly, so this bottleneck problem is increasingly likely to be faced.
Interconnection Networks
Traditional bus architecture
Interconnection Networks
High-performance architecture
This presentation is published only for educational purpose
shindesir.pvp@gmail.com