Tongping Liu, Charlie Curtsinger, Emery Berger
DTHREADS: Efficient Deterministic Multithreading
Insanity: Doing the same thing over and over again and expecting different results.
2
In the Beginning…
3
There was the Core.
4
And it was Good.
5
It gave us our Daily Speed.
6
Until the Apocalypse.
7
And the Speed was no Moore.
8
And then came a False Prophet…
9
10
Want speed?
11
I BRING YOU THE GIFT OF PARALLELISM!
12
color = ; row = 0; // globals
void nextStripe(){
for (c = 0; c < Width; c++)
drawBox (c,row,color);
color = (color == )?  : ;
row++;
}
for (n = 0; n < 9; n++)
pthread_create(t[n], nextStripe);
for (n = 0; n < 9; n++)
pthread_join(t[n]);
JUST USE THREADS…
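The stripe code above shares color and row across nine threads without any synchronization, so its output depends on scheduling. As a point of comparison, here is a minimal sketch of the conventional pthreads fix, a mutex around the shared state. This is not how DTHREADS works. The names WIDTH, STRIPES, COLOR_A, COLOR_B, grid, draw_box, and draw_flag are assumptions made for illustration, and the grid array stands in for the screen.

```c
#include <assert.h>
#include <pthread.h>

enum { WIDTH = 8, STRIPES = 9 };         /* assumed grid size */
enum color { COLOR_A, COLOR_B };         /* placeholder colors */

static enum color grid[STRIPES][WIDTH];  /* stands in for the screen */
static enum color color = COLOR_A;       /* shared globals, as on the slide */
static int row = 0;
static pthread_mutex_t stripe_lock = PTHREAD_MUTEX_INITIALIZER;

static void draw_box(int c, int r, enum color col) { grid[r][c] = col; }

/* Thread body: reading and updating (row, color) is one critical section. */
static void *next_stripe(void *arg) {
    (void)arg;
    pthread_mutex_lock(&stripe_lock);
    assert(row < STRIPES);
    for (int c = 0; c < WIDTH; c++)
        draw_box(c, row, color);
    color = (color == COLOR_A) ? COLOR_B : COLOR_A;
    row++;
    pthread_mutex_unlock(&stripe_lock);
    return NULL;
}

/* Draws all stripes; returns 0 on success, -1 if a thread could not be started. */
int draw_flag(void) {
    pthread_t t[STRIPES];
    int started = 0;

    color = COLOR_A;
    row = 0;
    for (; started < STRIPES; started++)
        if (pthread_create(&t[started], NULL, next_stripe, NULL) != 0)
            break;
    for (int n = 0; n < started; n++)
        pthread_join(t[n], NULL);
    return started == STRIPES ? 0 : -1;
}
```

With the lock in place the rows always alternate colors, whichever thread happens to draw each row. The cost is that the drawing is fully serialized, and forgetting the lock brings the race back.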
13
14
15
16
17
18
pthreads
race conditions
atomicity violations
deadlock
order violations
19
Salvation?
20
21
pthreads
race conditions
atomicity violations
deadlock
order violations
DTHREADS
deterministic
race conditions
atomicity violations
deadlock
order violations
22
DTHREADS Enables…
Race-free Executions
Replay Debugging w/o Logging
Replicated State Machines
23
DTHREADS: Efficient Determinism
Usually faster than the state of the art
[Bar chart: runtime relative to pthreads (y-axis, 0 to 6) for CoreDet, dthreads, and pthreads; off-scale bar labels 8.4 and 7.8]
24
DTHREADS: Efficient Determinism
Generally as fast or faster than pthreads
[Bar chart: runtime relative to pthreads (y-axis, 0 to 6) for CoreDet, dthreads, and pthreads; off-scale bar labels 8.4 and 7.8]
25
DTHREADS: Easy to Use
% g++ myprog.cpp -lpthread
[On the slide, the "p" in the link flag is its own text box, separate from "-l" and "thread"]
26
Isolation
shared address space (threads) vs. disjoint address spaces (processes)
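To make the isolation point concrete, here is a small sketch showing that a write made in a forked child process never reaches the parent's copy of the same global. It uses only standard POSIX calls (fork, waitpid, _exit). The names counter and isolation_demo are assumptions made for illustration, and this is not the DTHREADS runtime itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;  /* ordinary global: private to each process after fork() */

/* Forks a child that increments `counter` and reports its view via its exit status.
   Stores the child's view in *child_view and returns the parent's view of
   `counter` once the child has finished, or -1 on error. */
int isolation_demo(int *child_view) {
    counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {        /* child: writes to its own private copy */
        counter++;
        _exit(counter);    /* exit status carries the child's value */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *child_view = WEXITSTATUS(status);
    return counter;        /* parent: unchanged by the child's write */
}
```

The child sees its own increment while the parent still sees the old value. That is why a runtime built on processes needs an explicit step to make one "thread's" writes visible to the others.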
27
Performance: Processes vs. Threads
[Line chart: normalized execution time (y-axis, 0.0 to 1.4) against thread execution time in ms (x-axis, 1 to 1024, doubling at each tick) for threads and processes]
28
“Shared Memory”
31
Snapshot pages before modifications
“Shared Memory”
32
Write back diffs
“Shared Memory”
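The two steps named on these slides, snapshotting a page before it is modified and later writing back only the differences, can be sketched as follows. This is a simplified byte-by-byte illustration under stated assumptions, not the DTHREADS implementation. The names SKETCH_PAGE_SIZE, snapshot_page, and write_back_diffs are hypothetical, and the 4096-byte page size is assumed.

```c
#include <stddef.h>
#include <string.h>

enum { SKETCH_PAGE_SIZE = 4096 };  /* assumed page size for this sketch */

/* Step 1: before the first write, keep a pristine copy (a "twin") of the page. */
void snapshot_page(unsigned char *twin, const unsigned char *working) {
    memcpy(twin, working, SKETCH_PAGE_SIZE);
}

/* Step 2: at commit time, copy into the shared page only the bytes that this
   "thread" changed relative to its twin. Returns the number of bytes written. */
size_t write_back_diffs(unsigned char *shared,
                        const unsigned char *working,
                        const unsigned char *twin) {
    size_t written = 0;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (working[i] != twin[i]) {
            shared[i] = working[i];
            written++;
        }
    }
    return written;
}
```

Because only changed bytes are copied, two "threads" that modify different bytes of the same page can both commit without overwriting each other's updates.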
33
Update in Deterministic Time & Order
[Diagram: “Thread” 1, “Thread” 2, and “Thread” 3 alternate between Parallel and Serial phases; the labeled synchronization points are mutex_lock, cond_wait, and pthread_create]
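One way to picture updating in a deterministic order is a token passed between workers in a fixed sequence: each worker does its private work in parallel, then commits to shared state only when it holds the token. The sketch below uses plain pthreads to illustrate that idea. The token scheme and the names NTHREADS, worker, token, shared_value, and run_round are assumptions of this sketch, since the slides do not show the actual mechanism.

```c
#include <pthread.h>

enum { NTHREADS = 3 };

static pthread_mutex_t token_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t token_moved = PTHREAD_COND_INITIALIZER;
static int token;          /* id of the worker currently allowed to commit */
static long shared_value;  /* order-sensitive shared state */

static void *worker(void *arg) {
    int id = *(const int *)arg;

    /* Parallel phase: private work only, no shared writes. */
    long digit = id + 1;

    /* Serial phase: wait for the token, commit, then pass the token on. */
    pthread_mutex_lock(&token_lock);
    while (token != id)
        pthread_cond_wait(&token_moved, &token_lock);
    shared_value = shared_value * 10 + digit;
    token++;
    pthread_cond_broadcast(&token_moved);
    pthread_mutex_unlock(&token_lock);
    return NULL;
}

/* Runs one round; returns the committed value, or -1 if a thread failed to start. */
long run_round(void) {
    pthread_t t[NTHREADS];
    int ids[NTHREADS];
    int started = 0;

    token = 0;
    shared_value = 0;
    for (; started < NTHREADS; started++) {
        ids[started] = started;
        if (pthread_create(&t[started], NULL, worker, &ids[started]) != 0)
            break;
    }
    for (int n = 0; n < started; n++)
        pthread_join(t[n], NULL);
    return started == NTHREADS ? shared_value : -1;
}
```

The committed value depends on the order of commits, yet every run produces the same result because the order is fixed by worker id rather than by the scheduler.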
34
DTHREADS performance analysis
[Bar chart: runtime relative to pthreads (y-axis, 0 to 4) for dthreads and pthreads]
35
The Culprit: False Sharing
[Diagram: Thread 1 on Core 1, Thread 2 on Core 2, and Main Memory; labeled "Invalidate"]
36
The Culprit: False Sharing
[Diagram: Thread 1 on Core 1, Thread 2 on Core 2, and Main Memory; labeled "Invalidate" and "20x"]
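False sharing arises when two threads write to different variables that happen to sit on the same cache line, so each write invalidates the other core's copy of that line. The sketch below contrasts a layout that invites the problem with a padded one. The 64-byte line size is an assumption (common on x86-64 but not universal), and the names CACHE_LINE, counters_shared, counters_padded, and same_cache_line are hypothetical. It illustrates memory layout only and does not attempt to reproduce the slide's 20x figure.

```c
#include <stdalign.h>
#include <stddef.h>

enum { CACHE_LINE = 64 };  /* assumed cache line size */

/* Adjacent fields: both counters will usually land on one cache line. */
struct counters_shared {
    long a;  /* written by thread 1 */
    long b;  /* written by thread 2 */
};

/* Each field starts its own cache line, at the cost of extra memory. */
struct counters_padded {
    alignas(CACHE_LINE) long a;
    alignas(CACHE_LINE) long b;
};

/* Returns nonzero if two byte offsets fall on the same cache line,
   assuming the enclosing object itself starts on a line boundary. */
int same_cache_line(size_t off1, size_t off2) {
    return off1 / CACHE_LINE == off2 / CACHE_LINE;
}
```

Padding is the manual fix in ordinary threaded code. It trades memory for fewer invalidations and relies on the programmer knowing which fields are written by which thread.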
37
DTHREADS: Eliminates False Sharing!
[Diagram: Process 1 on Core 1, Process 2 on Core 2, and Global State]
38
DTHREADS: Detailed Analysis
[Bar chart: runtime relative to pthreads (y-axis, 0 to 6) for three configurations: ordering only, isolation only, and dthreads]
39
DTHREADS: Scalable Determinism
[Bar chart: speedup of 8 cores over 2 cores (y-axis, 0 to 4) for CoreDet, dthreads, and pthreads]
42
DTHREADS
% g++ myprog.cpp -lpthread
[On the slide, the "p" in the link flag is its own text box, separate from "-l" and "thread"]
45


Editor's Notes

  • #2: In the beginning, there was the Core. And it was good.
  • #3: Casts out the demons of nondeterminism
  • #4: Highlight when same speed or faster.
  • #5: Highlight when same speed or faster.
  • #6: Obviously this doesn’t preserve shared memory semantics, so we need to commit changes made by one thread so they become visible to others.
  • #7: ADD ANIMATIONS: threads initially on one core then migrating, vs. processes spewed across cores
  • #8: ADD ANIMATIONS: threads initially on one core then migrating, vs. processes spewed across cores
  • #9: ADD ANIMATIONS: threads initially on one core then migrating, vs. processes spewed across cores
  • #11: It’s not *always* as fast as or faster than pthreads. Show the slow cases first, THEN HIGHLIGHT THE FASTER PARTS.
  • #36: The cache coherence protocol turns false sharing into an unpleasant performance problem.
  • #39: Panel 1 = what it does, panel 2 = how, panel 3 = efficient, panel 4 = easy to use