Coherence and consistency models in multiprocessor architecture

Cache coherence and consistency models in multiprocessor architecture
Computer Architecture
Authors:
Piscione Pietro
Villardita Alessio
Degree: Computer Engineering
A.Y. 2014/2015

Introduction
● Multiprocessor architecture overview
● Coherence vs. Consistency
○ Coherence protocols
○ Snooping and Directory models
○ Consistency models
○ Sequential Consistency

Why multiprocessor architecture?
● Clock frequency wall
● More throughput
● More efficiency
● Shared memory
● Distributed memory
The bus is the bottleneck

● More processors (16 threads)
● More cache memory (20 MB L3)
● More complexity (1.86 billion transistors)
For comparison, i7-990X (2011): 12 threads, 12 MB cache, 1.17 billion transistors.

Processor-Memory Performance Gap
[Figure: processor vs. memory performance over time; growth factors annotated on the plot: 1.25x, 1.52x, 1.20x]

Cache design factors
Traditionally, memory hierarchy designers have focused on:
● Optimizing the average memory access time
● Reducing the miss rate
● Reducing the miss penalty
More recently:
● Power consumption has become a major consideration
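Average memory access time ties these factors together: AMAT = hit time + miss rate × miss penalty. The minimal Python sketch below computes it. The function name and the example numbers (1-cycle hit, 5% miss rate, 100-cycle penalty) are illustrative assumptions, not figures from these slides.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty.

    Times are in any consistent unit (e.g. clock cycles); miss_rate is a
    fraction in [0, 1].
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical L1 parameters: 1-cycle hit, 5% misses, 100-cycle miss penalty.
baseline = amat(1, 0.05, 100)
# With these numbers, halving the miss rate or halving the penalty helps equally.
fewer_misses = amat(1, 0.025, 100)
cheaper_misses = amat(1, 0.05, 50)
print(baseline, fewer_misses, cheaper_misses)
```

The equal gains are a property of this particular example; in practice, techniques that cut the miss rate and ones that cut the miss penalty have very different hardware and power costs.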

Miss rate vs. cache size (SPEC CPU2000)
[Figure: miss rate vs. cache size, broken down into compulsory, capacity, and conflict misses]

Cache (in)Coherence
[Figure: processors Pi and Pj; label: "Incoherence occurs"]
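A minimal sketch of how incoherence arises, assuming (this is not the authors' figure) that Pi and Pj each have a private write-back cache and there is no coherence protocol at all. The class name is hypothetical.

```python
class WriteBackCache:
    """A private write-back cache with deliberately NO coherence protocol."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                      # address -> cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # write-back: memory is not updated yet


memory = {"X": 0}
p_i = WriteBackCache(memory)
p_j = WriteBackCache(memory)

p_i.read("X")            # both processors now cache X = 0
p_j.read("X")
p_i.write("X", 1)        # Pi updates only its own copy
print(p_i.read("X"), p_j.read("X"), memory["X"])  # Pj and memory are stale
```

After the write, Pi sees 1 while Pj (and main memory) still hold 0: two copies of the same location disagree, which is exactly what a coherence protocol must prevent.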

Consistency and coherence
They are different:
● Cache coherence model specifies HOW memory accesses are coordinated among CPUs.
● Cache consistency model specifies WHEN a memory write shows up at another CPU.

Cache coherence: definition
Two fundamental invariants:
● Single-Writer-Multiple-Reader (SWMR): “For any given memory location, at any given (logical) time, there is either a single core that may write it (and that may also read it) or some number of cores that may read it.”
● Data-Value

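Cache coherence: epochs
● A given memory location's lifetime is divided into epochs: each epoch is either read-write (one core) or read-only (any number of cores)
● SWMR alone is not enough: the Data-Value invariant is also needed (the value of the location at the start of an epoch must equal its value at the end of the last read-write epoch)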
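The two invariants can be checked mechanically over a sequence of epochs for one location. This is a minimal sketch under simplifying assumptions: each epoch is summarized by its kind, the cores involved, and the location's value at its start and end; the `Epoch` type and `check_invariants` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epoch:
    kind: str            # "RW": one core may read and write; "RO": cores may only read
    cores: frozenset     # cores that access the location during this epoch
    start_value: int     # value of the location when the epoch begins
    end_value: int       # value of the location when the epoch ends


def check_invariants(epochs, initial_value):
    """Return True if a sequence of epochs for ONE location obeys SWMR and Data-Value."""
    last_rw_value = initial_value          # value left by the most recent RW epoch
    for e in epochs:
        if e.kind not in ("RW", "RO"):
            raise ValueError(f"unknown epoch kind: {e.kind}")
        # SWMR: a read-write epoch has exactly one core; a read-only epoch writes nothing.
        if e.kind == "RW" and len(e.cores) != 1:
            return False
        if e.kind == "RO" and e.start_value != e.end_value:
            return False
        # Data-Value: every epoch starts with the value the last RW epoch ended with.
        if e.start_value != last_rw_value:
            return False
        if e.kind == "RW":
            last_rw_value = e.end_value
    return True


coherent = [
    Epoch("RO", frozenset({1, 2}), 0, 0),   # cores 1 and 2 read 0
    Epoch("RW", frozenset({1}), 0, 5),      # core 1 alone writes 5
    Epoch("RO", frozenset({1, 2}), 5, 5),   # both now read 5
]
# Obeys SWMR (one writer, then a reader) but core 2 reads a stale 0.
stale = [
    Epoch("RW", frozenset({1}), 0, 5),
    Epoch("RO", frozenset({2}), 0, 0),
]
print(check_invariants(coherent, 0), check_invariants(stale, 0))
```

The `stale` trace is the reason SWMR alone is insufficient: the access permissions are correct, yet the value propagated between epochs is wrong.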

Coherence controller behavior
● Accepts loads and stores from the core and returns load values to it
● Initiates a coherence transaction when a cache miss occurs, by issuing a coherence request for the block requested by the core
● Receives and processes coherence requests and coherence responses
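A minimal sketch of these duties, assuming a textbook MSI protocol with stable states only (a real controller also needs transient states while a request is outstanding, and reaches S or M only after the response arrives). The request names GetS/GetM follow the directory example in these slides; `OtherGetS`, `OtherGetM`, and the class name are hypothetical.

```python
# (state, event) -> (action, next state), stable MSI states only.
MSI = {
    ("I", "Load"):      ("issue GetS", "S"),
    ("I", "Store"):     ("issue GetM", "M"),
    ("S", "Load"):      ("hit", "S"),
    ("S", "Store"):     ("issue GetM", "M"),
    ("M", "Load"):      ("hit", "M"),
    ("M", "Store"):     ("hit", "M"),
    ("I", "OtherGetS"): ("none", "I"),
    ("I", "OtherGetM"): ("none", "I"),
    ("S", "OtherGetS"): ("none", "S"),
    ("S", "OtherGetM"): ("invalidate", "I"),
    ("M", "OtherGetS"): ("send data, write back", "S"),
    ("M", "OtherGetM"): ("send data", "I"),
}


class CoherenceController:
    """Per-cache controller with a core-side and a network-side entry point."""

    def __init__(self):
        self.state = {}    # block -> "M" | "S" | "I" (absent means "I")
        self.outbox = []   # coherence requests issued to the interconnect

    def on_core(self, block, op):
        """Handle a Load or Store from the core; issue a request on a miss."""
        action, nxt = MSI[(self.state.get(block, "I"), op)]
        if action.startswith("issue"):
            self.outbox.append((action.split()[1], block))
        self.state[block] = nxt
        return action

    def on_network(self, block, event):
        """Handle a coherence request coming from another core."""
        action, nxt = MSI[(self.state.get(block, "I"), event)]
        self.state[block] = nxt
        return action


ctrl = CoherenceController()
first = ctrl.on_core("A", "Load")          # miss: issues GetS, I -> S
second = ctrl.on_core("A", "Load")         # hit in S
third = ctrl.on_network("A", "OtherGetM")  # another core wants to write: S -> I
print(first, second, third, ctrl.outbox)
```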

Coherence Protocols: basics
When a write occurs to a specific address, what happens next? Two alternatives:
● Write invalidate (most common): invalidate all other copies
● Write update (broadcast): update all the cached copies

Invalidate vs. Update protocols
Invalidate:
● One message to achieve coherence
● Significantly less bandwidth
● Easy to implement
Update:
● Lower read latency
● Larger messages
● More bandwidth
● More complex implementations
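The bandwidth trade-off can be illustrated with a deliberately simplified counting model (an assumption, not a measurement): one core writes a block repeatedly while a number of other cores start with a copy, and those cores either re-read the block after every write or never touch it again. `bus_transactions` is a hypothetical helper.

```python
def bus_transactions(protocol: str, writes: int, sharers: int, reread: bool) -> int:
    """Bus transactions when one core writes a block `writes` times while
    `sharers` other cores start with a copy (simplified model)."""
    if protocol == "update":
        # Every write broadcasts address + data; readers keep hitting locally.
        return writes
    if protocol == "invalidate":
        if reread:
            # Each write invalidates the sharers, and each sharer then read-misses.
            return writes * (1 + sharers)
        # Only the first write needs an invalidation; later writes hit in M.
        return 1 if writes > 0 else 0
    raise ValueError(f"unknown protocol: {protocol}")


for reread in (False, True):
    inv = bus_transactions("invalidate", writes=10, sharers=3, reread=reread)
    upd = bus_transactions("update", writes=10, sharers=3, reread=reread)
    print(f"reread={reread}: invalidate={inv}, update={upd}")
```

Counting transactions hides message size: every update carries data, which is why update needs more bandwidth even when the counts look similar, while invalidate wins clearly when one core writes many times in a row.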

Coherence Protocols: basics
● Directory based: the sharing status of each block of physical memory is stored in one location (the directory)
● Snooping: every cache holding a copy of a block of physical memory tracks that block's sharing status

Snooping protocol: main features
● Distributed architecture
● Broadcast messages
● Not very scalable
● Total order of coherence requests across all blocks
● The interconnection network must serialize these requests into some total order

Snooping protocol: Write Invalidate
● Write to shared data:
○ An invalidate is sent to all caches, which snoop and invalidate any copy
● Read miss:
○ Write-through: memory is always up to date
○ Write-back: force the cache holding the modified copy to update main memory, then obtain that value
● A separate invalidate bus can be used for write traffic
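The write-back case can be sketched as an MSI protocol on a shared bus. Everything here is a simplifying assumption: one-word blocks, atomic bus transactions (which give the total order mentioned above), no transient states, and hypothetical class names.

```python
class Bus:
    """Atomic snooping bus: every request is seen by all other caches in one total order."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast(self, requester, kind, addr):
        data = None
        for cache in self.caches:
            if cache is not requester:
                supplied = cache.snoop(kind, addr)
                if supplied is not None:   # a cache in M supplies the block
                    data = supplied
        return self.memory[addr] if data is None else data


class SnoopingCache:
    """Write-back cache with MSI states; each line is [state, value]."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        line = self.lines.get(addr)
        if line is None or line[0] == "I":          # read miss
            self.lines[addr] = ["S", self.bus.broadcast(self, "GetS", addr)]
        return self.lines[addr][1]

    def write(self, addr, value):
        line = self.lines.get(addr)
        if line is None or line[0] != "M":          # need exclusive ownership
            self.bus.broadcast(self, "GetM", addr)  # other copies get invalidated
        self.lines[addr] = ["M", value]

    def snoop(self, kind, addr):
        line = self.lines.get(addr)
        if line is None or line[0] == "I":
            return None
        was_modified = line[0] == "M"
        if was_modified:
            self.bus.memory[addr] = line[1]         # write the dirty block back
        line[0] = "I" if kind == "GetM" else "S"    # invalidate on GetM, downgrade on GetS
        return line[1] if was_modified else None


memory = {"X": 0}
bus = Bus(memory)
p0, p1 = SnoopingCache(bus), SnoopingCache(bus)
p0.read("X")
p1.read("X")               # both caches hold X in S
p0.write("X", 42)          # p1's copy is invalidated
value_seen = p1.read("X")  # read miss: p0 writes back and supplies 42
print(value_seen, memory["X"], p0.lines["X"][0], p1.lines["X"][0])
```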

Snooping protocol: Write Update
● Write to shared data:
○ Broadcast on the bus; processors snoop and update their copies
● Read miss:
○ Memory is always up to date
● Higher bandwidth (data + address are transmitted), but lower latency for readers (behaves like a write-through cache)
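A minimal write-update sketch, assuming one-word blocks, an atomic bus, and memory updated on every broadcast (the write-through-like behavior described above). The classes are hypothetical; the counter makes the extra bus traffic visible.

```python
class UpdateBus:
    """Shared bus for a write-update protocol; counts bus transactions."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.transactions = 0

    def broadcast_write(self, writer, addr, value):
        self.transactions += 1
        self.memory[addr] = value                # memory stays up to date
        for cache in self.caches:
            if cache is not writer and addr in cache.lines:
                cache.lines[addr] = value        # snoopers update their copies

    def fetch(self, addr):
        self.transactions += 1
        return self.memory[addr]


class UpdateCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                          # addr -> value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:               # read miss: served by memory
            self.lines[addr] = self.bus.fetch(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)


memory = {"X": 0}
bus = UpdateBus(memory)
a, b = UpdateCache(bus), UpdateCache(bus)
b.read("X")              # b caches X (1 transaction)
a.write("X", 7)          # broadcast: b's copy becomes 7 (1 transaction)
seen = b.read("X")       # hit: the reader pays no miss latency
print(seen, memory["X"], bus.transactions)
```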

Directory protocol: basic idea
● Global view of cache states
● Centralized in the directory
● Unicast messages
● Better scalability
When a directory receives a message, what happens?
Reply or Forward

Directory protocol: basic idea
Possible cases:
● One request -> one reply
● One request -> K forwards -> K replies
● Point-to-point ordering

Directory protocol: example
1. Requestor sends GetM to Directory
2. Directory sends Ack Count to Requestor
3. Directory sends K Invalidate messages to sharers
4. Sharers send an AckInv to the requestor
5. After collecting all K AckInv messages, the requestor modifies the block
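The five steps can be replayed with a FIFO message queue. This is a sketch under assumptions: a single block in the Shared state, in-order delivery, and no data transfer or directory state update modeled; `getm_transaction` and the message tuples are hypothetical.

```python
from collections import deque


def getm_transaction(requester, sharers):
    """Replay steps 1-5 for a block in state Shared; returns (log, writer)."""
    queue = deque([("GetM", requester, "Dir", None)])   # (type, source, dest, payload)
    log = []
    expected_acks = None
    received_acks = 0
    writer = None
    while queue:
        kind, src, dst, payload = queue.popleft()
        log.append((kind, src, dst))
        if kind == "GetM":
            # Directory: ack count to the requestor, one Inv per other sharer.
            targets = sorted(s for s in sharers if s != src)
            queue.append(("AckCount", "Dir", src, len(targets)))
            for s in targets:
                queue.append(("Inv", "Dir", s, src))     # payload: who to ack
        elif kind == "AckCount":
            expected_acks = payload
        elif kind == "Inv":
            queue.append(("AckInv", dst, payload, None))  # sharer drops its copy
        elif kind == "AckInv":
            received_acks += 1
        # Step 5: write only once the count is known and every ack has arrived.
        if writer is None and expected_acks is not None and received_acks == expected_acks:
            writer = requester
    return log, writer


log, writer = getm_transaction("P1", {"P1", "P2", "P3"})
for message in log:
    print(message)
print("writer:", writer, "messages:", len(log))
```

With K = 2 other sharers the transaction costs 2 + 2K = 6 point-to-point messages; the ack count lets the requestor know when it is safe to write without the directory collecting the acks itself.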

Directory state
● Coarse directories
● Limited pointer directory
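Both organizations shrink the sharer information kept per block. The sketch below compares the size of the sharer field for a full bit vector, a coarse vector (one bit per group of cores), and a limited-pointer entry (a few explicit core IDs). The group size of 4, the 4 pointers, and the function name are arbitrary assumptions; state bits are ignored.

```python
import math


def sharer_bits(n_cores: int, scheme: str, group_size: int = 4, pointers: int = 4) -> int:
    """Bits per directory entry used to record sharers (state bits ignored)."""
    if scheme == "full":
        return n_cores                           # one presence bit per core
    if scheme == "coarse":
        # One bit per group of cores: an invalidation goes to the whole group.
        return math.ceil(n_cores / group_size)
    if scheme == "limited":
        # A few explicit core IDs; needs a fallback (e.g. broadcast) on overflow.
        return pointers * math.ceil(math.log2(n_cores))
    raise ValueError(f"unknown scheme: {scheme}")


for n in (16, 64, 1024):
    print(n, [sharer_bits(n, s) for s in ("full", "coarse", "limited")])
```

The full vector grows linearly with the core count, while limited pointers grow only logarithmically; the price is imprecision (coarse) or overflow handling (limited).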

Snooping vs. Directory coherence
Snooping Solution (Snoopy Bus):
● Send all requests for data to all processors (broadcast)
● Scaling is limited by cache-miss and write traffic saturating the bus
Directory-Based Schemes:
● Send point-to-point requests to processors (unicast)
● Keep track of what is being shared in a directory
● Distributed memory => distributed directory (reduces bottlenecks)
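A rough way to see the scaling difference is to count how many message deliveries one GetM causes. In this simplified, assumed model every other cache must snoop a broadcast, whereas a directory only contacts the K current sharers; `deliveries_per_getm` is a hypothetical helper.

```python
def deliveries_per_getm(n_cores: int, k_sharers: int, scheme: str) -> int:
    """Message deliveries caused by one GetM (simplified counting model)."""
    if scheme == "snooping":
        # The broadcast must be snooped by every other cache, sharer or not.
        return n_cores - 1
    if scheme == "directory":
        # GetM + AckCount + K invalidations + K AckInv, all point-to-point.
        return 2 + 2 * k_sharers
    raise ValueError(f"unknown scheme: {scheme}")


for n in (4, 16, 64, 256):
    print(n, deliveries_per_getm(n, 2, "snooping"), deliveries_per_getm(n, 2, "directory"))
```

With few cores the broadcast is actually cheaper, which is why snooping remains attractive for small systems; as the core count grows, snoop traffic grows with it while directory traffic depends only on the number of sharers.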

Hybrid Designs
Some protocols combine aspects of:
● Snooping and directory protocols
● Invalidate and update protocols
to get the advantages of both solutions.

Consistency model: definition
(aka memory consistency model, or memory model)
● A specification of the allowed behavior of multithreaded programs executing with shared memory
● Multiple correct behaviors are usually allowed
One fundamental:
● Out-of-order execution

Cache (in)consistency
Should r2 always be set to NEW?
NO!
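The slide's example is not reproduced here, so the sketch below assumes the classic flag/data pattern: C1 stores data = NEW and then flag = SET, while C2 spins on flag and then loads data into r2. It enumerates every interleaving and keeps the runs where C2 saw SET. The program, values, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical program:
#   C1: S1: data = NEW;  S2: flag = SET
#   C2: L1: r1 = flag (spin until SET);  L2: r2 = data
C2_OPS = [("flag", "r1"), ("data", "r2")]


def outcomes(c1_stores):
    """All (r1, r2) pairs for every interleaving of C1's stores (in the given
    order) with C2's loads (always in program order)."""
    n = len(c1_stores) + len(C2_OPS)
    results = set()
    for c1_slots in combinations(range(n), len(c1_stores)):
        mem = {"data": "OLD", "flag": "UNSET"}
        regs, i1, i2 = {}, 0, 0
        for pos in range(n):
            if pos in c1_slots:
                addr, value = c1_stores[i1]
                i1 += 1
                mem[addr] = value
            else:
                addr, reg = C2_OPS[i2]
                i2 += 1
                regs[reg] = mem[addr]
        results.add((regs["r1"], regs["r2"]))
    return results


def r2_after_flag(c1_stores):
    # Keep only runs where C2 saw SET, i.e. where its spin loop would exit.
    return {r2 for r1, r2 in outcomes(c1_stores) if r1 == "SET"}


in_order = [("data", "NEW"), ("flag", "SET")]    # S1 then S2
reordered = [("flag", "SET"), ("data", "NEW")]   # store-store reordering
print(sorted(r2_after_flag(in_order)), sorted(r2_after_flag(reordered)))
```

If the stores stay in program order, r2 is always NEW; once the hardware may reorder S1 and S2, C2 can leave the loop and still read OLD.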

Core Might Reorder Memory Accesses
Sequential execution model (von Neumann):
● Usually, operations to the same address execute in the
original program order.
Possible reorderings (to different addresses):
● Store-Store: non-FIFO write buffer
● Load-Load
● Load-Store and Store-Load: local bypass
Multiple executions allowed → Non-Determinism
[Figure: a core's write buffer holding stores (S1, S2, S7) and a read (R1)]
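Store-Load reordering through a write buffer can be demonstrated with a small exhaustive search. The sketch assumes a per-core FIFO write buffer: stores enter the buffer, a load first checks its own buffer (forwarding) and otherwise bypasses it to read memory, and buffered stores drain to memory at any time. The litmus test and names are hypothetical.

```python
# Hypothetical store-buffering test; x and y start at 0:
#   Core 0: x = 1; r1 = y        Core 1: y = 1; r2 = x
PROGRAMS = {
    0: [("store", "x", 1), ("load", "y", "r1")],
    1: [("store", "y", 1), ("load", "x", "r2")],
}


def write_buffer_outcomes():
    """Explore every execution with one FIFO write buffer per core and
    return the set of reachable (r1, r2) results."""
    results = set()

    def explore(pc, buffers, memory, regs):
        if all(pc[c] == len(PROGRAMS[c]) for c in PROGRAMS) and not any(buffers.values()):
            results.add((regs["r1"], regs["r2"]))
            return
        for c in PROGRAMS:
            if pc[c] < len(PROGRAMS[c]):              # option 1: run the next instruction
                op, addr, arg = PROGRAMS[c][pc[c]]
                new_buf = {k: list(v) for k, v in buffers.items()}
                new_regs = dict(regs)
                if op == "store":
                    new_buf[c].append((addr, arg))    # store waits in the buffer
                else:
                    own = [v for a, v in new_buf[c] if a == addr]
                    new_regs[arg] = own[-1] if own else memory[addr]  # forward or bypass
                explore({**pc, c: pc[c] + 1}, new_buf, dict(memory), new_regs)
            if buffers[c]:                             # option 2: drain the oldest store
                new_buf = {k: list(v) for k, v in buffers.items()}
                addr, value = new_buf[c].pop(0)
                explore(dict(pc), new_buf, {**memory, addr: value}, dict(regs))

    explore({0: 0, 1: 0}, {0: [], 1: []}, {"x": 0, "y": 0}, {})
    return results


print(sorted(write_buffer_outcomes()))
```

The outcome (0, 0) appears: each load bypasses its core's own pending store, so both cores read the old values. Several executions produce the same result, which is the non-determinism the slide refers to.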

Sequential consistency: basic idea
● “The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.” (Lamport, 1979)
● Memory order must respect program order
● Every load gets its value from the last store before it (in global memory order)
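The definition can be turned directly into an enumerator: generate every total order that preserves each core's program order and let each load read the latest store in that order. The sketch uses a hypothetical store-buffering litmus test; the names are illustrative.

```python
from itertools import combinations

# Hypothetical litmus test (x = y = 0 initially):
#   Core A: x = 1; r1 = y        Core B: y = 1; r2 = x
PROG_A = [("store", "x", 1), ("load", "y", "r1")]
PROG_B = [("store", "y", 1), ("load", "x", "r2")]


def sc_outcomes(prog_a, prog_b):
    """Enumerate every total order that keeps each core's program order
    and let each load read the latest store in that order."""
    n = len(prog_a) + len(prog_b)
    results = set()
    for a_slots in combinations(range(n), len(prog_a)):
        memory, regs = {"x": 0, "y": 0}, {}
        ia = ib = 0
        for pos in range(n):
            if pos in a_slots:
                kind, addr, arg = prog_a[ia]
                ia += 1
            else:
                kind, addr, arg = prog_b[ib]
                ib += 1
            if kind == "store":
                memory[addr] = arg
            else:
                regs[arg] = memory[addr]
        results.add((regs["r1"], regs["r2"]))
    return results


print(sorted(sc_outcomes(PROG_A, PROG_B)))
```

Under SC the result (0, 0) is impossible: it would require each load to precede the other core's store while each store precedes its own load, a cycle that no single total order can satisfy.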

Sequential consistency: Atomicity
● Instructions are needed that atomically perform a “read–modify–write” (e.g., “test-and-set”)
● Simplistic approach: the core effectively locks the memory system → sacrifices performance
● Aggressive approach: it is only necessary that the “test-and-set” appears atomically (its load and store consecutive) in the total order
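Why atomicity matters can be shown by enumerating sequentially consistent interleavings of a lock acquire. Two hypothetical cores each try to take a lock that starts free (0). When test-and-set is split into a separate load and store, both cores can see the lock free; a single indivisible step lets exactly one win. The operation names are illustrative, not a real ISA.

```python
from itertools import combinations


def possible_acquirers(ops_per_core):
    """Two cores each run `ops_per_core` once on a lock that starts free (0).
    Returns every possible count of cores that saw 0, i.e. believe they got the lock."""
    n_ops = len(ops_per_core)
    results = set()
    for core0_slots in combinations(range(2 * n_ops), n_ops):
        lock, old, pc = 0, [None, None], [0, 0]
        for pos in range(2 * n_ops):
            core = 0 if pos in core0_slots else 1
            op = ops_per_core[pc[core]]
            pc[core] += 1
            if op == "load":             # non-atomic test: read the lock
                old[core] = lock
            elif op == "store":          # non-atomic set: write 1
                lock = 1
            elif op == "test_and_set":   # atomic RMW: read and write in one step
                old[core], lock = lock, 1
        results.add(sum(1 for v in old if v == 0))
    return results


split = possible_acquirers(["load", "store"])   # test and set as two accesses
atomic = possible_acquirers(["test_and_set"])   # one indivisible access
print(sorted(split), sorted(atomic))
```

Even under SC, the split version can let both cores enter the critical section; only the atomic read-modify-write guarantees mutual exclusion.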

Sequential consistency: simple implementation

Sequential (in)consistency: solved
Inconsistency cannot occur anymore

Conclusions
Which protocol is the best?
It depends on:
● Technology
● Architecture
● Purposes and applications
