Lockless data structures
Sandeep Joshi (DC Engines)
Critical sections...
Stack::pop()
{
  lock.acquire()
  // return top value
  // move top to top->next
  lock.release()
}
HashTable::insert(element)
{
  lock.acquire()
  // find hash bucket and insert
  lock.release()
}
Critical sections
Critical sections are like transactions. They ensure invariants on data structures
continue to hold.
Critical sections can be protected by
1. Locks (default approach)
2. Lockless - use Atomic operations (compare and swap instruction) and
load-store fences
3. Hardware transactional memory ( See Intel xbegin, xend, xabort)
Like Pune traffic
Not all data structures are easy to make lockless
Lists
● Singly-linked, doubly-linked.
● Queue, Stack, Set
Unordered : Hash table (builds on the singly-linked list solution)
Ordered : Skip list (builds on the singly-linked list), Red-Black tree (requires localized
rebalancing), AVL tree (harder due to wider rebalancing).
Lockfree versus Waitfree
Concurrency levels
1. LockFree : the overall system progresses, but individual threads may see delay.
You see retries (e.g. a “while loop” which retries if the atomic operation failed)
2. WaitFree : every operation completes in a finite number of steps (e.g. a read
on a multi-versioned data structure)
The same data structure can have some operations which are lockfree, and others
which are waitfree.
Basic weapon (for this talk)
C++ : std::atomic<T> has compare_exchange_strong(T& expectedValue, T desiredValue)
Java : AtomicReference (and other Atomic types) has compareAndSet(V expectedValue,
V desiredValue)
C (GNU builtin) : __sync_val_compare_and_swap(T* ptr, T expectedValue, T desiredValue)
bool CAS(variable, expectedVal, desiredVal) {  // executes atomically in hardware
  if (variable == expectedVal) {
    variable = desiredVal
    return true
  } else { return false }
}
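The pseudocode above maps directly onto std::atomic. A minimal sketch (the `increment` helper is illustrative, not from the talk) showing the canonical CAS retry loop:

```cpp
#include <atomic>
#include <cassert>

// Lock-free increment: retry until no other thread changed the value
// between our load and our compare-and-swap.
void increment(std::atomic<int>& v) {
    int expected = v.load();
    // on failure, compare_exchange_weak reloads 'expected' for us
    while (!v.compare_exchange_weak(expected, expected + 1)) {}
}
```

This retry-on-failure shape is exactly the "while loop" that makes an operation lock-free rather than wait-free.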
What we will cover
1. Stack
2. Queue
3. RCU
For Lists, see Herb Sutter’s talk on “Lock-free programming” at CppCon 2014.
Stack
class Stack { atomic<Node*> top }
bool Stack::push(int key) {
  Node* newNode = new Node(key)
  do {
    oldHead = top
    newNode->next = oldHead
  } while (not top.compare_exchange(oldHead, newNode))
}
int Stack::pop() {
  do {
    oldHead = top
    nextNode = oldHead->next
  } while (not top.compare_exchange(oldHead, nextNode))
  return oldHead->key
}
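The pseudocode above is a Treiber stack. A compiling sketch follows; note it side-steps memory reclamation by leaking popped nodes, because safe delete needs the ABA/hazard-pointer machinery listed later under "Not covered":

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int key;
    Node* next;
    explicit Node(int k) : key(k), next(nullptr) {}
};

class Stack {
    std::atomic<Node*> top{nullptr};
public:
    void push(int key) {
        Node* n = new Node(key);
        n->next = top.load();
        // retry: on failure, n->next is reloaded with the current top
        while (!top.compare_exchange_weak(n->next, n)) {}
    }
    bool pop(int& key) {
        Node* old = top.load();
        // retry until we swing top from 'old' to 'old->next'
        while (old && !top.compare_exchange_weak(old, old->next)) {}
        if (!old) return false;  // stack was empty
        key = old->key;
        return true;             // 'old' is deliberately leaked (no safe delete)
    }
};
```

Both operations are lock-free: the system as a whole progresses, but a thread can lose the CAS race and retry.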
Stack
Problem : Every thread is doing read-modify-write on the same memory address
(Stack.top). The corresponding cache line keeps bouncing between CPU cores.
Solution : Find a way to match up simultaneous “push” and “pop” calls. Let the
two threads communicate without changing “Stack.top”.
Atomic exchange between 2 threads
State machine: EMPTY (value=nil) → WAITING (value=T1.val) → BUSY (value=T2.val)
1. T1 comes, sets its value, and waits.
2. T2 arrives, finds the value set. It atomically exchanges T1.val with T2.val and
changes the state to BUSY.
3. T1, who is waiting, reads T2.val, resets the state, and returns.
Use “compare and swap” to atomically exchange a value between two threads.
Define an Exchanger {
  state = empty, waiting, busy
  int value
}
Practical implementation in java.util.concurrent.Exchanger
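A C++ sketch of this state machine, assuming small non-negative values so that state and value can be packed into one 64-bit word (that packing is my simplification — it lets a single CAS update both fields, where java.util.concurrent.Exchanger uses a stamped reference):

```cpp
#include <atomic>
#include <cstdint>
#include <cassert>

enum : uint64_t { EMPTY = 0, WAITING = 1, BUSY = 2 };

struct Exchanger {
    // pack (state, value) into one word so one CAS changes both
    static uint64_t pack(uint64_t state, uint32_t value) {
        return (state << 32) | value;
    }
    std::atomic<uint64_t> slot{pack(EMPTY, 0)};

    int exchange(int myItem) {
        for (;;) {
            uint64_t cur = slot.load();
            uint64_t state = cur >> 32;
            if (state == EMPTY) {
                // T1's path: park my value and wait for a partner
                if (slot.compare_exchange_strong(cur, pack(WAITING, (uint32_t)myItem))) {
                    for (;;) {
                        uint64_t now = slot.load();
                        if ((now >> 32) == BUSY) {
                            slot.store(pack(EMPTY, 0));   // reset for reuse
                            return (int)(uint32_t)now;    // partner's item
                        }
                    }
                }
            } else if (state == WAITING) {
                // T2's path: swap my item in, take theirs, flip to BUSY
                if (slot.compare_exchange_strong(cur, pack(BUSY, (uint32_t)myItem)))
                    return (int)(uint32_t)cur;
            }
            // state == BUSY: another pair is mid-exchange; retry
        }
    }
};
```

A real implementation would also time out instead of spinning forever when no partner shows up.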
Stack + EliminationArray combination
EliminationArray is an array of Exchanger objects (E1, E2, E3, …, En) sitting
alongside stack.top.
● Every thread (push or pop) first checks in the EliminationArray for a
complementary thread.
● After a timeout, it falls back to Stack.push or Stack.pop.
What we will cover
1. Stack
2. Queue
3. RCU
Queues
Many dimensions to this problem
1. SPSC, SPMC, MPSC, MPMC (SPSC=Single Producer, Single Consumer)
2. Bounded vs unbounded
3. Blocking or nonblocking
4. Priorities, Intrusive, Ordering..
http://www.1024cores.net/home/lock-free-algorithms/queues
Queue with sentinel
● Empty queue : HEAD and TAIL both point to the Sentinel node.
● Enqueue : link the new node after TAIL.
● Dequeue : return the next node’s value and turn that node into the new
sentinel; the old sentinel is deleted.
Unbounded SPSC (*incomplete)
SPSC_Queue { atomic<Node*> Head, Tail; }
enqueue(T elem) {
  Node* newNode = new Node(elem)
  Tail->next = newNode
  Tail.store(newNode)
}
dequeue(T& returnElem) {
  if (Head->next == null) { throw Empty; }
  Node* oldHead = Head;
  returnElem = Head->next->value;
  Head.store(Head->next)
  delete oldHead;
}
Head = Tail = new Node()
First node is always the Sentinel.
Dequeue always returns the value of the next node.
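A compiling version of the slide's sketch, filling in the parts it marks "*incomplete" (the class shape and names are mine). Safe only for exactly one producer thread and one consumer thread:

```cpp
#include <atomic>
#include <cassert>

template <typename T>
class SPSCQueue {
    struct Node {
        T value{};
        std::atomic<Node*> next{nullptr};
    };
    std::atomic<Node*> head;  // consumer side: current sentinel
    std::atomic<Node*> tail;  // producer side: last node
public:
    SPSCQueue() {
        Node* sentinel = new Node();  // first node is always the sentinel
        head.store(sentinel);
        tail.store(sentinel);
    }
    void enqueue(const T& elem) {     // producer only
        Node* n = new Node();
        n->value = elem;
        tail.load()->next.store(n);   // publish the node...
        tail.store(n);                // ...then advance tail
    }
    bool dequeue(T& out) {            // consumer only
        Node* sentinel = head.load();
        Node* next = sentinel->next.load();
        if (next == nullptr) return false;  // empty
        out = next->value;            // dequeue returns value of the next node
        head.store(next);             // 'next' becomes the new sentinel
        delete sentinel;              // only the consumer ever frees nodes
        return true;
    }
};
```

Because producer and consumer touch disjoint ends (tail vs head), no CAS retry loop is needed at all.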
Bounded SPSC
ProducerConsumerQueue
● atomic<int> readIndex
● Item records[size] // sits between the two indices, which helps avoid cache line sharing
● atomic<int> writeIndex
enqueue(Item newElem) {
  int freeSlot = writeIndex.load()
  if ((freeSlot + 1) % size != readIndex.load()) {
    records[freeSlot] = newElem
    writeIndex.store((freeSlot + 1) % size)
  }
}
dequeue(Item& returnElem) {
  int curSlot = readIndex.load()
  if (curSlot != writeIndex.load()) {
    returnElem = records[curSlot]
    readIndex.store((curSlot + 1) % size)
  }
}
Based on the Facebook folly library (ProducerConsumerQueue)
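The ring buffer above, as a compiling sketch (names and the template shape are mine, in the spirit of folly::ProducerConsumerQueue). One slot is kept permanently empty so that "full" and "empty" are distinguishable:

```cpp
#include <atomic>
#include <cstddef>
#include <cassert>

template <typename Item, size_t Size>
class BoundedSPSC {
    std::atomic<size_t> readIndex{0};
    Item records[Size];               // separates the indices in memory
    std::atomic<size_t> writeIndex{0};
public:
    bool enqueue(const Item& elem) {  // producer only
        size_t freeSlot = writeIndex.load();
        size_t nextSlot = (freeSlot + 1) % Size;
        if (nextSlot == readIndex.load()) return false;  // full
        records[freeSlot] = elem;
        writeIndex.store(nextSlot);   // publish after the write
        return true;
    }
    bool dequeue(Item& out) {         // consumer only
        size_t curSlot = readIndex.load();
        if (curSlot == writeIndex.load()) return false;  // empty
        out = records[curSlot];
        readIndex.store((curSlot + 1) % Size);
        return true;
    }
};
```

Usable capacity is Size - 1: with Size slots, the producer refuses the write that would make writeIndex catch up to readIndex.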
What we will cover
1. Stack
2. Queue
3. RCU
Multiple readers, one writer (with locks)
READER
1. Take Read lock
2. Safely read pointer and act
3. Release read lock
WRITER
1. Take write lock
2. Free pointer
3. Release write lock
This is the conventional approach
Multiple readers, one writer (RCU)
READER
1. Record new reader
2. Safely access the pointer
3. Inform reader finished
WRITER
1. Switch the pointer
2. Ensure all readers gone (Drain the queue in Grace period)
3. Free pointer
Multiple readers, one writer (RCU)
READER
1. Record new reader (rcu_read_lock)
2. Safely access the pointer
3. Inform reader finished (rcu_read_unlock)
WRITER
1. Switch the pointer (rcu_assign_pointer(ptr,val))
2. Ensure all readers gone (synchronize_rcu)
3. Free pointer
RCU (Read copy update)
On preemptible Linux kernels
1. Preemption is disabled for the Reader on calling “rcu_read_lock()”
2. Writer runs on every CPU core when “synchronize_rcu()” is called, to ensure all
readers have completed.
On real-time Linux kernels : Introduce two queues (current and next) to record the
Readers that were present before and after Writer started.
Userspace RCU : Same API now available for use in userspace
(https://github.com/urcu)
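A deliberately crude user-space analogue of the reader/writer protocol above, just to make the three numbered steps concrete. The shared reader counter stands in for rcu_read_lock/unlock and the drain loop for synchronize_rcu; real RCU exists precisely to avoid this shared counter (readers stay contention-free), and this naive drain can also be held up by readers that arrived after the pointer switch:

```cpp
#include <atomic>
#include <cassert>

std::atomic<int*> gptr{new int(1)};   // the RCU-protected pointer
std::atomic<int>  active_readers{0};  // crude stand-in for per-CPU tracking

int reader() {
    active_readers.fetch_add(1);      // 1. record new reader
    int v = *gptr.load();             // 2. safely access the pointer
    active_readers.fetch_sub(1);      // 3. inform reader finished
    return v;
}

void writer(int newVal) {
    int* old = gptr.exchange(new int(newVal));  // 1. switch the pointer
    while (active_readers.load() != 0) {}       // 2. wait out the grace period
    delete old;                                 // 3. free the old copy
}
```

The key property carries over: readers never block the writer's pointer switch, and the writer only waits before freeing.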
Some tricks used in lockless programming
1. Sentinels
2. Unused bits in 64-bit pointers
3. Lazy delete
4. Two (or more) bottlenecks better than one
5. Padding to avoid false cache line sharing
Trick 1 : Sentinels
Sentinel node is pre-allocated and never deleted.
Head and tail point to Sentinel when List or
Queue is empty.
This helps because when List/Queue transitions
from empty to non-empty or vice-versa, you don’t
have to update two variables atomically which
can get tricky.
class Queue {
  Node *head, *tail;
};
head = tail = new Node(sentinel)
Trick 2 : Unused bits in pointer
Addresses on Intel x86-64 and ARM64 are limited to 48 bits. The unused higher
16 bits can be used to store a “marker” with every pointer. This allows you to use the
“compare-and-swap” instruction to atomically change “pointer + custom info”.
Facebook Folly C++ library : PackedSyncPtr and DiscriminatedPtr exploit this.
Java has AtomicMarkableReference, AtomicStampedReference.
Caveat : The number of unused bits may shrink with newer processors.
Intel also has “CMPXCHG16B” to manipulate 128-bit values.
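Hypothetical helpers showing the packing (the caveat above applies: under 57-bit addressing such as Intel LA57 there are fewer free bits, so treat 48 as an assumption):

```cpp
#include <cstdint>
#include <cassert>

// Pack a 16-bit marker into the unused upper bits of a 48-bit pointer.
inline uint64_t tag_ptr(void* p, uint16_t tag) {
    return reinterpret_cast<uint64_t>(p) | (static_cast<uint64_t>(tag) << 48);
}
inline uint16_t ptr_tag(uint64_t tagged) {
    return static_cast<uint16_t>(tagged >> 48);
}
inline void* untag_ptr(uint64_t tagged) {
    // shift left then arithmetic-shift right: drops the tag and
    // sign-extends bit 47 back to a canonical address
    return reinterpret_cast<void*>(static_cast<int64_t>(tagged << 16) >> 16);
}
```

A 64-bit CAS on the tagged word then updates pointer and marker together, which is exactly what AtomicStampedReference offers in Java.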
Trick 3 : Lazy delete
The updater sets a marker bit (“deleted = true”) on the Node.
A marked Node is skipped during traversal until it is safe to delete it.
Trick 4 : Two bottlenecks better than one
Cache line bouncing is reduced if threads can spin (i.e. do CAS) on multiple
variables instead of one, as seen in the Stack + EliminationArray example earlier.
The same applies to the WaitQueue below. Each thread adds its own node to the
WaitQueue and spins on a local variable inside its Node until woken up by its predecessor.
Wait Queue : T1’s node → T2’s node → T3’s node
Trick 5 : Padding to avoid false cache line sharing
class Queue {
  atomic<int> head;
  char cache_line_pad[CACHE_LINE_SIZE]; // e.g. 64 bytes
  atomic<int> tail; // keeps head and tail on separate cache lines
}
https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads
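The same padding can be expressed with alignas in modern C++ (the struct name is mine; 64 is an assumption matching the common x86-64 line size, and C++17's std::hardware_destructive_interference_size is the portable spelling):

```cpp
#include <atomic>
#include <cassert>

// alignas pushes each index onto its own cache line, so the producer
// writing tail never invalidates the consumer's line holding head.
struct PaddedQueue {
    alignas(64) std::atomic<int> head{0};
    alignas(64) std::atomic<int> tail{0};
};
```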
Locks vs Lockless
● Locks can increase context switches
● Lockless can increase cache line contention
Which option performs better depends on several factors...
Language support
Golang : Philosophy is to “share memory by communicating instead of
communicating by sharing memory”. But see the “sync/atomic” package.
Java : “volatile” variables ensure sequential consistency. See “java.util.concurrent”
and sun.misc.Unsafe.compareAndSwapObject()
C++ : std::atomic provides multiple levels of consistency
1. sequential consistency.
2. acquire, release, consume (not discussed today).
3. relaxed.
Who uses lockless ?
1. Early adopters were desktop audio drivers [1]
2. MemSQL : pervasive use of lockfree data structures
3. Couchbase : Nitro storage engine
4. DataDomain (EMC) : lockfree doubly linked list
5. Facebook Folly library
6. java.util.concurrent (Doug Lea)
7. Linux kernel (other mechanisms besides RCU)
[1] http://www.rossbencina.com/code/lockfree
Not covered
1. ABA problem and Hazard pointers
2. Weaker memory models
3. Concurrent Skip List, Hash tables, Trees
4. Underlying Memory allocation also needs to be lockfree (e.g. Streamflow)
References
1. Herlihy, et al. The Art of Multiprocessor Programming
2. McKenney, Paul. Is Parallel Programming Hard, And, If So, What Can You
Do About It?
3. http://1024cores.net
4. http://preshing.com
5. http://www.rdrop.com/~paulmck/
6. http://www.rossbencina.com/code/lockfree