Cache coherence and consistency models in multiprocessor architecture
Computer Architecture
Authors: Piscione Pietro, Villardita Alessio
Degree: Computer Engineering, A.Y. 2014/2015
Introduction
● Multiprocessor architecture overview
● Coherence vs. Consistency
○ Coherence protocols
○ Snooping and Directory models
○ Consistency models
○ Sequential Consistency
Why multiprocessor architecture?
● More throughput
● More efficiency
● Clock frequency wall
● Shared memory
● Distributed memory
The bus is the bottleneck
● More processors (16 threads)
● More cache memory (20 MB L3)
● More complexity (1.86 billion transistors)
For comparison, the i7-990X (2011): 12 threads, 12 MB cache, 1.17 billion transistors.
Processor-Memory Performance gap
[Figure: processor-memory performance gap; annotations 1.25x, 1.52x, 1.20x]
Cache design factors
Traditionally, memory hierarchy designers focused on:
● Optimizing average memory access time
● Miss rate
● Miss penalty
More recently:
● Power consumption has become a major consideration
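The three traditional factors are tied together by the usual textbook relation: average memory access time = hit time + miss rate × miss penalty. A minimal sketch follows; the function name and all numbers are hypothetical and chosen only for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same time unit as the inputs."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
baseline = amat(1.0, 0.05, 100.0)
# Same cache with the miss rate halved (for instance, by making it larger).
improved = amat(1.0, 0.025, 100.0)
print(baseline, improved)
```

With a large miss penalty, even a small change in miss rate dominates the result, which is why miss rate and miss penalty get so much design attention.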
Miss rate vs cache size (SPEC CPU2000)
[Figure: miss rate broken down into compulsory, capacity and conflict misses]
Cache (in)Coherence
[Figure: processors Pi and Pj; incoherence occurs]
Consistency and coherence
They are different:
● Cache coherence model specifies HOW
memory accesses are coordinated among CPUs.
● Cache consistency model specifies WHEN a
memory write shows up at another CPU.
Cache coherence: definition
“For any given memory location, at any given (logical) time, there is either a single core that may write it (and that may also read it) or some number of cores that may read it.”
Two fundamental invariants:
● Single-Writer-Multiple-Reader (SWMR)
● Data-Value
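Cache coherence: epochs
● A given memory location’s lifetime is divided into epochs
● SWMR alone is not enough: the Data-Value invariant is also needed (the value at the start of an epoch must equal the value at the end of the last read-write epoch)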
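The two invariants can be checked mechanically over a list of epochs for one memory location. This is only a sketch: the tuple layout `(kind, cores, start_value, end_value)` and the function name are assumptions made for this example, not part of any real tool.

```python
def check_epochs(epochs, initial_value=0):
    """Check one location's epochs, given in logical-time order.

    Each epoch is (kind, cores, start_value, end_value), where kind is
    "RW" (read-write) or "RO" (read-only). Layout is hypothetical.
    """
    last_rw_end = initial_value
    for kind, cores, start_value, end_value in epochs:
        # SWMR: a read-write epoch has exactly one core.
        if kind == "RW" and len(cores) != 1:
            return "SWMR violated"
        # SWMR: nobody writes during a read-only epoch, so the value is unchanged.
        if kind == "RO" and start_value != end_value:
            return "SWMR violated"
        # Data-Value: an epoch starts with the value left by the last RW epoch.
        if start_value != last_rw_end:
            return "Data-Value violated"
        if kind == "RW":
            last_rw_end = end_value
    return "ok"

good = [("RW", {"C0"}, 0, 5), ("RO", {"C1", "C2"}, 5, 5), ("RW", {"C2"}, 5, 9)]
stale = [("RW", {"C0"}, 0, 5), ("RO", {"C1"}, 0, 0)]  # C1 still sees the old value
print(check_epochs(good), "/", check_epochs(stale))
```

The `stale` trace satisfies SWMR (one writer, then one reader) yet still returns a Data-Value violation, which is the case the second bullet refers to.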
Coherence Controller
● Accepts loads and stores from the core and returns load values to it
● Initiates a coherence transaction when a cache miss occurs, by issuing a coherence request for the block requested by the core
● Receives coherence requests and coherence responses that must be processed
Coherence controller behavior
Coherence Protocols: basics
When a write occurs on a specific address, what’s
next? Two alternatives:
● Write invalidate (most common): invalidate all
other copies
● Write update (broadcast): update all the cached
copies
Invalidate vs. Update protocols
Invalidate:
● One message to achieve coherence
● Significantly less bandwidth
● Easy to implement
Update:
● Lower read latency
● Larger messages
● More bandwidth
● More complex implementations
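The bandwidth trade-off can be illustrated with a rough traffic count for one pattern: a core writes the same shared block several times, then another core reads it. Everything here is an assumption for illustration (message sizes, function names, and the simplification that an invalidate carries only an address while an update carries address plus one word).

```python
ADDR_BYTES = 8    # assumed size of an address-only message
WORD_BYTES = 8    # assumed size of the written word
BLOCK_BYTES = 64  # assumed cache block size

def invalidate_traffic(num_writes):
    """Bus bytes for num_writes writes by one core, then one read by another."""
    if num_writes == 0:
        return 0
    # Only the first write sends an invalidate; later writes hit locally.
    write_bytes = ADDR_BYTES
    # The reader's copy was invalidated, so it misses: request + whole block.
    read_bytes = ADDR_BYTES + BLOCK_BYTES
    return write_bytes + read_bytes

def update_traffic(num_writes):
    """Every write broadcasts address + data; the reader then hits locally."""
    return num_writes * (ADDR_BYTES + WORD_BYTES)

for n in (1, 10):
    print(n, invalidate_traffic(n), update_traffic(n))
```

Under these assumptions, update is cheaper when reads closely follow single writes, while invalidate wins as soon as a core writes the block repeatedly between reads by others.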
Coherence Protocols: basics
● Directory based: the sharing status of each physical memory block is stored in one centralized location
● Snooping: every cache that holds a copy of a block of physical memory tracks its sharing status
Snooping protocol: main features
● Distributed architecture
● Message broadcasting
● Limited scalability
● Total order of coherence requests across all blocks
● Interconnection network must serialize these requests into some total order
Snooping protocol: Write Invalidate
● Write to shared data:
○ An invalidate is sent to all caches, which snoop and invalidate any copy
● Read miss:
○ Write-through: memory is always up-to-date
○ Write-back: the cache holding the modified copy is forced to update main memory, and the requester then snoops that value
Can use a separate invalidate bus for write traffic
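A toy model of write-invalidate snooping with write-back caches may help make the two cases concrete. This is a sketch under strong assumptions: one word per block, only two line states ("M" modified, "S" shared), a loop over caches standing in for the bus broadcast, and class and method names invented for this example.

```python
class Bus:
    def __init__(self):
        self.memory = {}   # addr -> value in main memory
        self.caches = []   # every cache attached to the bus

class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # addr -> [state, value], state is "M" or "S"
        bus.caches.append(self)

    def snoop_invalidate(self, addr):
        line = self.lines.pop(addr, None)
        if line and line[0] == "M":
            self.bus.memory[addr] = line[1]   # write back before dropping

    def snoop_read(self, addr):
        line = self.lines.get(addr)
        if line and line[0] == "M":
            self.bus.memory[addr] = line[1]   # forced write-back
            line[0] = "S"

    def write(self, addr, value):
        line = self.lines.get(addr)
        if line is None or line[0] != "M":
            # Write to (possibly) shared data: broadcast an invalidate.
            for other in self.bus.caches:
                if other is not self:
                    other.snoop_invalidate(addr)
        self.lines[addr] = ["M", value]

    def read(self, addr):
        if addr not in self.lines:
            # Read miss: other caches snoop it and write back a modified copy.
            for other in self.bus.caches:
                if other is not self:
                    other.snoop_read(addr)
            self.lines[addr] = ["S", self.bus.memory.get(addr, 0)]
        return self.lines[addr][1]

bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
c0.write(0x40, 1)
print(c1.read(0x40))   # c0 writes back, c1 reads the value from memory
c1.write(0x40, 2)      # invalidates c0's shared copy
print(c0.read(0x40))   # c0 misses and gets the new value
```

A real protocol would also arbitrate for the bus and handle multi-word blocks; those parts are deliberately left out here.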
Snooping protocol: Write Update
● Write to shared data:
○ Broadcast on the bus; processors snoop and update their copies
● Read miss:
○ Memory is always up-to-date
● Higher bandwidth (transmit data + address), but lower latency for readers (looks like a write-through cache)
Directory protocol: basic idea
● Global view of cache states
● Centralized in the directory
● Unicast messages
● Better scalability
When a directory receives a message, what happens?
It either replies or forwards.
Directory protocol: basic idea
Possible cases:
● One request-reply
● One request -> K forwards -> K replies
● Point-to-point ordering
Directory protocol: example
1. Requestor sends GetM to the directory
2. Directory sends an Ack Count to the requestor
3. Directory sends K Invalidate messages to the sharers
4. Each sharer sends an AckInv to the requestor
5. Once all K AckInv messages have arrived, the requestor modifies the block
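The five steps can be traced as an ordered message list. This is a simplified sketch: the directory is a plain dictionary from block to sharer set, the function name is invented, and data transfer, races and timeouts are ignored.

```python
def get_m(directory, block, requestor):
    """Return the (source, destination, message) trace for a GetM on `block`.

    `directory` maps block -> set of sharers (a hypothetical representation).
    """
    sharers = sorted(directory.get(block, set()) - {requestor})
    trace = [(requestor, "Directory", "GetM")]                          # step 1
    trace.append(("Directory", requestor, f"AckCount={len(sharers)}"))  # step 2
    for sharer in sharers:
        trace.append(("Directory", sharer, "Invalidate"))               # step 3
    for sharer in sharers:
        trace.append((sharer, requestor, "AckInv"))                     # step 4
    # Step 5: with all acknowledgments counted, the requestor owns the block.
    directory[block] = {requestor}
    return trace

directory = {"B": {"C1", "C2", "C3"}}
for message in get_m(directory, "B", "C0"):
    print(message)
```

Note that every message is unicast: the directory contacts only the K sharers it has recorded, which is where the scalability advantage over broadcast comes from.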
Directory state
● Coarse directories
● Limited pointer directory
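The slide only names these two organizations. Assuming the commonly used definitions (a coarse directory keeps one bit per group of K caches, a limited pointer directory keeps a few cache identifiers per block), the per-block storage can be compared against a full bit vector. Function names and parameter values below are hypothetical.

```python
def full_bit_vector_bits(num_caches):
    """One presence bit per cache."""
    return num_caches

def coarse_bits(num_caches, group_size):
    """One presence bit per group of `group_size` caches (rounded up)."""
    return -(-num_caches // group_size)

def limited_pointer_bits(num_caches, num_pointers):
    """`num_pointers` cache identifiers, each wide enough to name any cache."""
    pointer_width = (num_caches - 1).bit_length()
    return num_pointers * pointer_width

n = 64
print(full_bit_vector_bits(n), coarse_bits(n, 4), limited_pointer_bits(n, 4))
```

Both schemes trade precision for space: a coarse entry may cause invalidates to caches that hold no copy, and a limited pointer entry needs a fallback when there are more sharers than pointers.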
Distributed directory
Snooping vs. Directory coherence
Snooping Solution (Snoopy Bus):
● Send all requests for data to all processors (broadcast)
● Scaling limited by cache miss & write traffic saturating the
bus
Directory-Based Schemes:
● Send point-to-point requests to processors (unicast)
● Keep track of what is being shared in a directory
● Distributed memory => distributed directory (reducing
bottlenecks)
Hybrid Designs
There are protocols that combine aspects of:
● Snooping and directory protocols
● Invalidate and update protocols
These designs aim to combine the advantages of both approaches.
Consistency model: definition
(aka memory consistency model, or memory model)
● A specification of the allowed behavior of multithreaded programs executing with shared memory
● Multiple correct behaviors are usually allowed
One fundamental reason:
● Out-of-order execution
Cache (in)consistency
Should r2 always be set to NEW?
NO!
Core Might Reorder Memory Accesses
Sequential execution model (von Neumann):
● Usually, operations to the same address execute in the original program order.
Possible reorderings (to different addresses):
● Store-Store: non-FIFO write buffer
● Load-Load
● Load-Store and Store-Load: local bypass
Multiple executions allowed → Non-determinism
[Figure: write buffer (stores S2, S7, S1) and read R1]
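The question of whether r2 is always NEW can be explored by enumerating interleavings. The program itself is not in the slide text, so the one below is an assumption: core 1 executes S1 (data = NEW) then S2 (flag = SET), and core 2 executes L1 (r1 = flag) then L2 (r2 = data). The sketch compares program order on core 1 with a Store-Store reordering.

```python
from itertools import permutations

def outcomes(core1_order, core2_order):
    """Run every interleaving that keeps each core's given order.

    Returns the set of (r1, r2) results. The four operations are a
    hypothetical message-passing example, not taken from the slides.
    """
    ops = {
        "S1": lambda mem, regs: mem.update(data="NEW"),
        "S2": lambda mem, regs: mem.update(flag="SET"),
        "L1": lambda mem, regs: regs.update(r1=mem["flag"]),
        "L2": lambda mem, regs: regs.update(r2=mem["data"]),
    }
    results = set()
    for perm in permutations(core1_order + core2_order):
        # Skip interleavings that break either core's own order.
        if [op for op in perm if op in core1_order] != core1_order:
            continue
        if [op for op in perm if op in core2_order] != core2_order:
            continue
        mem = {"data": "OLD", "flag": "UNSET"}
        regs = {}
        for op in perm:
            ops[op](mem, regs)
        results.add((regs["r1"], regs["r2"]))
    return results

in_order = outcomes(["S1", "S2"], ["L1", "L2"])
reordered = outcomes(["S2", "S1"], ["L1", "L2"])  # Store-Store reordering
print(("SET", "OLD") in in_order, ("SET", "OLD") in reordered)
```

When core 1's stores stay in program order, seeing the flag set implies seeing the new data; once the two stores may swap, core 2 can observe the flag set and still read the old data.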
Sequential consistency: basic idea
● “The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.” (Lamport, 1979)
● Memory order must respect program order
● Every load gets its value from the last store before it (in global memory order) to the same address
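The two rules (memory order respects program order; each load returns the last store before it) can be turned into a small checker for a candidate global order. The operation layout `(core, seq, kind, addr, value)` and the helper names are assumptions for this sketch; the example program is the classic two-core case where each core stores to one variable and then loads the other.

```python
from itertools import permutations

def respects_sc_rules(memory_order, initial=0):
    """Check one candidate global order of (core, seq, kind, addr, value) ops."""
    last_seq = {}
    memory = {}
    for core, seq, kind, addr, value in memory_order:
        # Rule 1: each core's operations appear in program order.
        if seq <= last_seq.get(core, -1):
            return False
        last_seq[core] = seq
        if kind == "store":
            memory[addr] = value
        # Rule 2: a load returns the last store before it to the same address.
        elif memory.get(addr, initial) != value:
            return False
    return True

def allowed_by_sc(ops):
    """An outcome is allowed if SOME global order satisfies both rules."""
    return any(respects_sc_rules(order) for order in permutations(ops))

# Core 0: x = 1; r1 = y.   Core 1: y = 1; r2 = x.
both_zero = [(0, 0, "store", "x", 1), (0, 1, "load", "y", 0),
             (1, 0, "store", "y", 1), (1, 1, "load", "x", 0)]
one_sees = [(0, 0, "store", "x", 1), (0, 1, "load", "y", 0),
            (1, 0, "store", "y", 1), (1, 1, "load", "x", 1)]
print(allowed_by_sc(both_zero), allowed_by_sc(one_sees))
```

The outcome where both loads return 0 has no valid global order, so sequential consistency forbids it, whereas a core with Store-Load reordering could produce it.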
Sequential consistency: Atomicity
● Instructions that atomically perform a “read–modify–write” (e.g. “test-and-set”) are needed
● Simplistic approach: the core effectively locks the memory system → sacrifices performance
● Aggressive approach: the “test-and-set” only needs to appear atomic in the total order
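A spin lock built on test-and-set shows why the read and the write must be a single atomic step. Python has no hardware test-and-set, so in this sketch a `threading.Lock` stands in for the atomicity the hardware would provide (in effect the "simplistic" approach); the class and function names are invented for illustration.

```python
import threading
import time

class TestAndSetFlag:
    def __init__(self):
        self._value = 0
        self._atomic = threading.Lock()  # models hardware atomicity of the RMW

    def test_and_set(self):
        """Atomically return the old value and set the flag to 1."""
        with self._atomic:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        with self._atomic:
            self._value = 0

def run(num_threads=4, increments=200):
    flag = TestAndSetFlag()
    counter = [0]

    def worker():
        for _ in range(increments):
            while flag.test_and_set() == 1:
                time.sleep(0)      # spin, yielding to other threads
            counter[0] += 1        # critical section
            flag.clear()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter[0]

print(run())
```

If the read and the write in `test_and_set` were separate steps, two threads could both read 0 and both enter the critical section.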
Sequential consistency: simple implementation
Sequential (in)consistency: solved
Inconsistency can no longer occur
Conclusions
Which protocol is the best?
It depends on:
● Technology
● Architecture
● Purposes and applications
