Copyright © 2012, Elsevier Inc. All rights reserved.
Chapter 5: Multiprocessors and Thread-Level Parallelism
Computer Architecture: A Quantitative Approach, Fifth Edition
Introduction
• Thread-level parallelism
  - Have multiple program counters
  - Uses the MIMD model
  - Targeted for tightly coupled shared-memory multiprocessors
  - For n processors, need n threads
  - Amount of computation assigned to each thread = grain size (see the sketch below)
  - Threads can be used for data-level parallelism, but the overheads may outweigh the benefit
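To make grain size concrete, here is a minimal pthreads sketch (not from the slides; the array size, thread count, and helper names are illustrative): the work of one loop is split into n chunks, one per thread, and the size of each chunk is the grain.

/* Hypothetical sketch: one thread per processor, grain = N / NTHREADS elements. */
#include <pthread.h>
#include <stdio.h>

#define N        (1 << 20)
#define NTHREADS 4                  /* "for n processors, need n threads" */

static double data[N];
static double partial[NTHREADS];    /* one partial sum per thread */

struct range { int lo, hi, id; };

static void *worker(void *arg)
{
    struct range *r = arg;
    double sum = 0.0;
    for (int i = r->lo; i < r->hi; i++)   /* this loop body is the thread's grain */
        sum += data[i];
    partial[r->id] = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct range r[NTHREADS];
    int grain = N / NTHREADS;             /* amount of computation assigned to each thread */

    for (int i = 0; i < N; i++) data[i] = 1.0;

    for (int t = 0; t < NTHREADS; t++) {
        r[t] = (struct range){ t * grain, (t + 1) * grain, t };
        pthread_create(&tid[t], NULL, worker, &r[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("sum = %f\n", total);
    return 0;
}

With a large grain, the cost of creating and joining threads is amortized over many iterations; with a very small grain that overhead can outweigh the benefit, as the slide notes.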
Types
• Symmetric multiprocessors (SMP)
  - Small number of cores
  - Share a single memory with uniform memory latency
• Distributed shared memory (DSM)
  - Memory distributed among processors
  - Non-uniform memory access/latency (NUMA)
  - Processors connected via direct (switched) and non-direct (multi-hop) interconnection networks
Cache Coherence
• In a centralized shared-memory architecture, processors may see different values for the same memory location through their private caches.
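A toy software model of the problem (an illustration only, not how real hardware behaves in detail): two private write-back caches each pull in a copy of location X, one processor then updates its copy, and with no coherence mechanism the other processor keeps reading the stale value.

/* Toy model of incoherent private caches (illustrative only). */
#include <stdio.h>

static int memory_X = 0;              /* shared memory location X */

struct cache_line { int valid; int value; };

static void cache_read(struct cache_line *c, const char *who)
{
    if (!c->valid) { c->value = memory_X; c->valid = 1; }   /* miss: fill from memory */
    printf("%s reads X = %d\n", who, c->value);
}

static void cache_write(struct cache_line *c, int v, const char *who)
{
    c->value = v; c->valid = 1;       /* write-back: memory not updated, no invalidation sent */
    printf("%s writes X = %d (other copies untouched)\n", who, v);
}

int main(void)
{
    struct cache_line a = {0}, b = {0};
    cache_read(&a, "CPU A");          /* A caches X = 0 */
    cache_read(&b, "CPU B");          /* B caches X = 0 */
    cache_write(&a, 1, "CPU A");      /* A's copy becomes 1 */
    cache_read(&b, "CPU B");          /* B still sees the stale 0: the caches are incoherent */
    return 0;
}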
Cache Coherence
• Coherence
  - A read by any processor must return the most recently written value
  - Writes to the same location by any two processors are seen in the same order by all processors
• Consistency
  - Determines when a written value will be returned by a read
  - If a processor writes location A followed by location B, any processor that sees the new value of B must also see the new value of A
Enforcing Coherence
• Coherent caches provide:
  - Migration: movement of data
  - Replication: multiple copies of data
• Cache coherence protocols
  - Directory based
    - Sharing status of each block kept in one location
  - Snooping
    - Each core tracks sharing status of each block
Snoopy Coherence Protocols
• Write invalidate
  - On write, invalidate all other copies
  - Use the bus itself to serialize writes
    - A write cannot complete until bus access is obtained
• Write update
  - On write, update all copies
Snoopy Coherence Protocols
• Locating an item when a read miss occurs
  - In a write-back cache, the updated value must be sent to the requesting processor
• Cache lines are marked as shared or exclusive/modified
  - Only writes to shared lines need an invalidate broadcast
    - After this, the line is marked as exclusive
Snoopy Coherence Protocols
• Complications for the basic MSI protocol:
  - Operations are not atomic
    - E.g., detect miss, acquire bus, receive a response
    - Creates the possibility of deadlock and races
    - One solution: the processor that sends an invalidate can hold the bus until the other processors receive the invalidate
• Extensions:
  - Add an exclusive state to indicate a clean block held in only one cache (MESI protocol); see the sketch after this list
    - Avoids the need to broadcast an invalidate on a write to such a block
  - Add an owned state (MOESI): the owning cache supplies the block on a miss and is responsible for writing it back
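The sketch below shows, in simplified C, how the extra exclusive state pays off: a write to an EXCLUSIVE (clean, private) line can proceed silently, whereas a write to a SHARED line must broadcast an invalidate. This is a hand-rolled simplification with hypothetical bus helpers, not the protocol tables from the book.

/* Simplified MESI reaction of one cache line to its own processor's requests. */
#include <stdio.h>

enum mesi { INVALID, SHARED, EXCLUSIVE, MODIFIED };

/* Stand-ins for real bus transactions (hypothetical helpers). */
static void bus_read(void)      { puts("  bus: read miss placed on bus"); }
static void bus_read_excl(void) { puts("  bus: read-exclusive / invalidate broadcast"); }

static enum mesi cpu_write(enum mesi s)
{
    switch (s) {
    case MODIFIED:              /* already dirty and private: silent write hit */
        return MODIFIED;
    case EXCLUSIVE:             /* clean but private: no invalidate needed (the MESI win) */
        return MODIFIED;
    case SHARED:                /* other copies may exist: must broadcast an invalidate */
        bus_read_excl();
        return MODIFIED;
    case INVALID:               /* write miss: fetch the block with ownership */
        bus_read_excl();
        return MODIFIED;
    }
    return INVALID;
}

static enum mesi cpu_read(enum mesi s, int others_have_copy)
{
    if (s != INVALID) return s; /* read hit: state unchanged */
    bus_read();                 /* read miss */
    return others_have_copy ? SHARED : EXCLUSIVE;
}

int main(void)
{
    enum mesi line = cpu_read(INVALID, /*others_have_copy=*/0);  /* -> EXCLUSIVE */
    line = cpu_write(line);     /* EXCLUSIVE -> MODIFIED with no bus traffic */
    printf("final state: %d (MODIFIED)\n", (int)line);
    return 0;
}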
Coherence Protocols: Extensions
• Shared memory bus and snooping bandwidth become the bottleneck for scaling symmetric multiprocessors; extensions that help:
  - Duplicating tags
  - Placing the directory in the outermost (shared) cache
  - Using crossbars or point-to-point networks with banked memory
Coherence Protocols
• AMD Opteron:
  - Memory directly connected to each multicore chip in a NUMA-like organization
  - Implements the coherence protocol using point-to-point links
  - Uses explicit acknowledgements to order operations
Performance of Symmetric Shared-Memory Multiprocessors
• Coherence influences the cache miss rate
• Coherence misses
  - True sharing misses
    - Write to a shared block (transmission of invalidation)
    - Read of an invalidated block
  - False sharing misses (illustrated in the sketch after this list)
    - Read of an unmodified word in an invalidated block
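The following sketch (hypothetical sizes and names) exhibits false sharing directly: two threads increment distinct counters that sit in the same cache line, so every increment invalidates the other core's copy even though no word is truly shared; padding the counters onto separate lines removes those coherence misses.

/* False sharing sketch: compare the two layouts by timing each run. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000L

/* Both counters land in the same cache line -> false sharing. */
struct { volatile long a; volatile long b; } packed_ctr;

/* Padding pushes the counters onto different cache lines (64-byte lines assumed). */
struct { volatile long a; char pad[64]; volatile long b; } padded_ctr;

static void *bump_a_packed(void *x) { (void)x; for (long i = 0; i < ITERS; i++) packed_ctr.a++; return NULL; }
static void *bump_b_packed(void *x) { (void)x; for (long i = 0; i < ITERS; i++) packed_ctr.b++; return NULL; }
static void *bump_a_padded(void *x) { (void)x; for (long i = 0; i < ITERS; i++) padded_ctr.a++; return NULL; }
static void *bump_b_padded(void *x) { (void)x; for (long i = 0; i < ITERS; i++) padded_ctr.b++; return NULL; }

static void run(void *(*fa)(void *), void *(*fb)(void *), const char *label)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, fa, NULL);
    pthread_create(&t2, NULL, fb, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%s done (time this phase to see the difference)\n", label);
}

int main(void)
{
    run(bump_a_packed, bump_b_packed, "same cache line (false sharing)");
    run(bump_a_padded, bump_b_padded, "separate cache lines (padded)");
    return 0;
}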
Performance Study: Commercial Workload
(The figures with the measured results for the commercial workload are not reproduced here.)
Directory Protocols
• The directory keeps track of every block
  - Which caches have each block
  - Dirty status of each block
• Implement in a shared L3 cache
  - Keep a bit vector of size = number of cores for each block in L3 (see the sketch after this list)
  - Not scalable beyond a shared L3
• Implement in a distributed fashion (distributed shared memory with directory-based coherence)
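A minimal sketch of what such a directory entry might hold, assuming at most 64 cores so the presence bits fit in one 64-bit word; the field and helper names are illustrative, not from the text.

/* Hypothetical directory entry: one per memory block tracked alongside the shared L3. */
#include <stdint.h>
#include <stdbool.h>

#define MAX_CORES 64                       /* bit vector size = number of cores */

enum dir_state { DIR_UNCACHED, DIR_SHARED, DIR_MODIFIED };

struct dir_entry {
    enum dir_state state;                  /* uncached / shared / modified */
    uint64_t       sharers;                /* one presence bit per core (the "set of node IDs") */
    uint8_t        owner;                  /* meaningful only in DIR_MODIFIED ("owner node ID") */
};

static inline void dir_add_sharer(struct dir_entry *e, unsigned core)
{
    e->sharers |= (uint64_t)1 << core;
}

static inline bool dir_is_sharer(const struct dir_entry *e, unsigned core)
{
    return (e->sharers >> core) & 1;
}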
Directory Protocols
• For each block, the directory maintains a state:
  - Shared
    - One or more nodes have the block cached; the value in memory is up to date
    - Keeps the set of node IDs
  - Uncached
    - No node has a copy
  - Modified
    - Exactly one node has a copy of the cache block; the value in memory is out of date
    - Keeps the owner node ID
• The directory maintains block states and sends invalidation messages
Messages
(The accompanying figure listing the directory protocol messages is not reproduced here.)
Directory Protocols
• For an uncached block:
  - Read miss
    - The requesting node is sent the requested data and is made the only sharing node; the block is now shared
  - Write miss
    - The requesting node is sent the requested data and becomes the only sharing node; the block is now exclusive
• For a shared block:
  - Read miss
    - The requesting node is sent the requested data from memory and is added to the sharing set
  - Write miss
    - The requesting node is sent the value, all nodes in the sharing set are sent invalidate messages, the sharing set then contains only the requesting node, and the block is now exclusive
Directory Protocols
• For an exclusive block:
  - Read miss
    - The owner is sent a data fetch message; the block becomes shared; the owner sends the data to the directory, where it is written back to memory; the sharing set contains the old owner and the requestor
  - Data write-back
    - The block becomes uncached and the sharing set is empty
  - Write miss
    - A message is sent to the old owner to invalidate its copy and send the value to the directory; the requestor becomes the new owner; the block remains exclusive
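Pulling the last two slides together, a heavily simplified directory controller could dispatch on the block state as sketched below. The message helpers are hypothetical placeholders for interconnect transactions, and real protocols add transient states, acknowledgements, and race handling that are omitted here.

/* Hypothetical directory controller: handles read/write misses per block state. */
#include <stdint.h>
#include <stdio.h>

enum dir_state { DIR_UNCACHED, DIR_SHARED, DIR_MODIFIED };

struct dir_entry { enum dir_state state; uint64_t sharers; unsigned owner; };

/* Stand-ins for interconnect messages (names are illustrative). */
static void send_data(unsigned node)       { printf("  data value reply -> node %u\n", node); }
static void send_invalidate(unsigned node) { printf("  invalidate       -> node %u\n", node); }
static void send_fetch(unsigned node)      { printf("  fetch/invalidate -> owner %u\n", node); }

static void handle_read_miss(struct dir_entry *e, unsigned req)
{
    switch (e->state) {
    case DIR_UNCACHED:                       /* requester becomes the only sharer */
    case DIR_SHARED:
        send_data(req);
        e->sharers |= 1ULL << req;
        e->state = DIR_SHARED;
        break;
    case DIR_MODIFIED:                       /* fetch from owner, write back, then share */
        send_fetch(e->owner);
        send_data(req);
        e->sharers = (1ULL << e->owner) | (1ULL << req);
        e->state = DIR_SHARED;
        break;
    }
}

static void handle_write_miss(struct dir_entry *e, unsigned req)
{
    if (e->state == DIR_SHARED)              /* invalidate every current sharer */
        for (unsigned n = 0; n < 64; n++)
            if (((e->sharers >> n) & 1) && n != req)
                send_invalidate(n);
    if (e->state == DIR_MODIFIED)            /* old owner must invalidate and forward the data */
        send_fetch(e->owner);
    send_data(req);
    e->sharers = 1ULL << req;                /* sharing set = requester only */
    e->owner = req;
    e->state = DIR_MODIFIED;                 /* block is now exclusive/modified */
}

int main(void)
{
    struct dir_entry e = { DIR_UNCACHED, 0, 0 };
    puts("node 1 read miss:");  handle_read_miss(&e, 1);
    puts("node 2 write miss:"); handle_write_miss(&e, 2);
    puts("node 3 read miss:");  handle_read_miss(&e, 3);
    return 0;
}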
Synchronization
• Basic building blocks:
  - Atomic exchange
    - Swaps a register with a memory location
  - Test-and-set
    - Sets the location under a condition
  - Fetch-and-increment
    - Returns the original value from memory and increments it in memory
  - These require a memory read and write in a single uninterruptible instruction
  - Load linked / store conditional
    - If the contents of the memory location specified by the load linked are changed before the store conditional to the same address, the store conditional fails
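In C11 these building blocks correspond roughly to operations in <stdatomic.h>; the sketch below shows that correspondence, with a compare-and-swap retry loop standing in for the load-linked/store-conditional pattern (an analogy, not the ISA primitives themselves).

/* Rough C11 counterparts of the synchronization building blocks. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int lock_word = 0;
static atomic_int counter   = 0;

int main(void)
{
    /* Atomic exchange: swap a value with a memory location. */
    int old = atomic_exchange(&lock_word, 1);

    /* Fetch-and-increment: read the original value and bump memory atomically. */
    int before = atomic_fetch_add(&counter, 1);

    /* LL/SC-style retry loop, expressed with compare-and-swap:
     * reread, compute, and retry if the location changed in between. */
    int expected = atomic_load(&counter);
    while (!atomic_compare_exchange_weak(&counter, &expected, expected * 2))
        ;   /* on failure, expected is refreshed with the current value */

    printf("old=%d before=%d counter=%d\n", old, before, atomic_load(&counter));
    return 0;
}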
Implementing Locks
• Spin lock
  - If no coherence:

        DADDUI R2,R0,#1
lockit: EXCH   R2,0(R1)     ;atomic exchange
        BNEZ   R2,lockit    ;already locked?

  - If coherence:

lockit: LD     R2,0(R1)     ;load of lock
        BNEZ   R2,lockit    ;not available - spin
        DADDUI R2,R0,#1     ;load locked value
        EXCH   R2,0(R1)     ;swap
        BNEZ   R2,lockit    ;branch if lock wasn't 0
Implementing Locks
• Advantage of this scheme: while waiting, the processor spins on a locally cached copy of the lock, which reduces memory and bus traffic
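A C11 rendering of the same idea, as a sketch rather than the book's code: spin on ordinary loads of the locally cached lock word, and only attempt the atomic exchange (the operation that generates bus traffic) once the lock appears free.

/* Test-and-test-and-set spin lock using C11 atomics (illustrative sketch). */
#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;   /* 0 = free, 1 = held */

static void spin_lock(spinlock_t *l)
{
    for (;;) {
        /* Spin on the cached copy: plain loads cause no invalidation traffic. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
        /* Lock looks free: try the atomic exchange (this is the bus transaction). */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0)
            return;                                  /* previous value was 0: we got it */
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}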
Models of Memory Consistency: An Introduction
Processor 1:
A=0
…
A=1
if (B==0) …

Processor 2:
B=0
…
B=1
if (A==0) …

• It should be impossible for both if statements to evaluate to true
  - Delayed write invalidate?
• Sequential consistency:
  - The result of execution should be the same as if:
    - Accesses on each processor were kept in order
    - Accesses on different processors were arbitrarily interleaved
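The slide's two-processor program is the classic store-buffering litmus test. A hedged C11 version is below: with the default sequentially consistent atomics, the "both loads see 0" outcome should never be observed, while replacing them with memory_order_relaxed operations would permit it on hardware that reorders a store past a later load.

/* Store-buffering litmus test (sketch). With seq_cst atomics the
 * "r1==0 && r2==0" outcome is forbidden. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int A, B;
static int r1, r2;

static void *p1(void *x) { (void)x;
    atomic_store(&A, 1);                 /* A = 1            */
    r1 = atomic_load(&B);                /* if (B == 0) ...  */
    return NULL;
}
static void *p2(void *x) { (void)x;
    atomic_store(&B, 1);                 /* B = 1            */
    r2 = atomic_load(&A);                /* if (A == 0) ...  */
    return NULL;
}

int main(void)
{
    int both_zero = 0;
    for (int i = 0; i < 100000; i++) {
        atomic_store(&A, 0);
        atomic_store(&B, 0);
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0) both_zero++;   /* forbidden under sequential consistency */
    }
    printf("both if-conditions true in %d runs\n", both_zero);
    return 0;
}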
Implementing Sequential Consistency
• To implement sequential consistency, delay completion of every memory access until all invalidations caused by that access have completed
  - Reduces performance!
• Alternatives:
  - Program-enforced synchronization to force the write on one processor to occur before the read on the other processor
    - Requires a synchronization object for A and another for B
    - “Unlock” after the write
    - “Lock” before the read
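One way to read the "unlock after write / lock before read" suggestion, sketched with POSIX mutexes (an interpretation, not the book's code): guarding A with its own lock makes the write visible to any reader that subsequently acquires the same lock; B would be handled symmetrically with a second lock.

/* Program-enforced synchronization (sketch): the write is "published" by the
 * unlock and the reader picks it up under the lock. */
#include <pthread.h>
#include <stdio.h>

static int A;
static pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;   /* sync object for A */

static void *writer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lockA);
    A = 1;                               /* write ...                        */
    pthread_mutex_unlock(&lockA);        /* ... then "unlock" after the write */
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lockA);          /* "lock" before the read */
    int a = A;
    pthread_mutex_unlock(&lockA);
    printf("reader saw A = %d\n", a);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, writer, NULL);
    pthread_create(&t2, NULL, reader, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}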
Relaxed Consistency Models
• Rules:
  - X → Y: operation X must complete before operation Y is done
  - Sequential consistency requires maintaining all orderings: R → W, R → R, W → R, W → W
• Relax W → R
  - “Total store ordering”
• Relax W → W
  - “Partial store order”
• Relax R → W and R → R
  - “Weak ordering” and “release consistency”
Relaxed Consistency Models
• The consistency model is multiprocessor specific
• Programmers will often implement explicit synchronization anyway
• Speculation gives much of the performance advantage of relaxed models while preserving sequential consistency
  - Basic idea: if an invalidation arrives for a result that has not yet been committed, use speculation recovery