11 Concurrent Stacks and Elimination
11.1 Introduction
The Stack<T> class is a collection of items (of type T) that provides push()
and pop() methods satisfying the last-in-first-out (LIFO) property: the last item
pushed is the first popped. This chapter considers how to implement concurrent
stacks. At first glance, stacks seem to provide little opportunity for concurrency,
because push() and pop() calls seem to need to synchronize at the top of the
stack.
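The chapter does not spell out the Stack<T> interface itself. As a minimal sketch (not given explicitly in the text), it might look as follows; EmptyException is the exception type thrown by the pop() implementations later in the chapter.

// Minimal sketch of the Stack<T> interface assumed throughout this chapter.
// Not given explicitly in the text; EmptyException is the type thrown by pop()
// in the implementations that follow.
public interface Stack<T> {
  void push(T value);              // push value; the last item pushed is the first popped (LIFO)
  T pop() throws EmptyException;   // remove and return the most recently pushed item
}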
Surprisingly, perhaps, stacks are not inherently sequential. In this chapter, we
show how to implement concurrent stacks that can achieve a high degree of par-
allelism. As a first step, we consider how to build a lock-free stack in which pushes
and pops synchronize at a single location.
11.2 An Unbounded Lock-Free Stack
Fig. 11.1 shows a concurrent LockFreeStack class, whose code appears in
Figs. 11.2, 11.3 and 11.4. The lock-free stack is a linked list, where the top field
points to the first node (or null if the stack is empty.) For simplicity, we usually
assume it is illegal to add a null value to a stack.
A pop() call that tries to remove an item from an empty stack throws an excep-
tion. A push() method creates a new node (Line 13), and then calls tryPush()
to try to swing the top reference from the current top-of-stack to its succes-
sor. If tryPush() succeeds, push() returns, and if not, the tryPush() attempt
is repeated after backing off. The pop() method calls tryPop(), which uses
compareAndSet() to try to remove the first node from the stack. If it succeeds,
it returns the node, otherwise it returns null. (It throws an exception if the stack
Figure 11.1 A Lock-free stack. In Part (a) a thread pushes value a into the stack by applying
a compareAndSet() to the top field. In Part (b) a thread pops value a from the stack by
applying a compareAndSet() to the top field.
1 public class LockFreeStack<T> {
2 AtomicReference<Node> top = new AtomicReference<Node>(null);
3 static final int MIN_DELAY = ...;
4 static final int MAX_DELAY = ...;
5 Backoff backoff = new Backoff(MIN_DELAY, MAX_DELAY);
6
7 protected boolean tryPush(Node node){
8 Node oldTop = top.get();
9 node.next = oldTop;
10 return(top.compareAndSet(oldTop, node));
11 }
12 public void push(T value) {
13 Node node = new Node(value);
14 while (true) {
15 if (tryPush(node)) {
16 return;
17 } else {
18 backoff.backoff();
19 }
20 }
21 }
Figure 11.2 The LockFreeStack<T> class: in the push() method, threads alternate between
trying to alter the top reference by calling tryPush(), and backing off using the Backoff class
from Fig. 7.5 of Chapter 7.
is empty.) The tryPop() method is called until it succeeds, at which point pop()
returns the value from the removed node.
As we have seen in Chapter 7, one can significantly reduce contention at the
top field using exponential backoff (see Fig. 7.5 of Chapter 7). Accordingly, both
1 public class Node {
2 public T value;
3 public Node next;
4 public Node(T value) {
5 this.value = value;
6 next = null;
7 }
8 }
Figure 11.3 Lock-free stack list node.
1 protected Node tryPop() throws EmptyException {
2 Node oldTop = top.get();
3 if (oldTop == null) {
4 throw new EmptyException();
5 }
6 Node newTop = oldTop.next;
7 if (top.compareAndSet(oldTop, newTop)) {
8 return oldTop;
9 } else {
10 return null;
11 }
12 }
13 public T pop() throws EmptyException {
14 while (true) {
15 Node returnNode = tryPop();
16 if (returnNode != null) {
17 return returnNode.value;
18 } else {
19 backoff.backoff();
20 }
21 }
22 }
Figure 11.4 The LockFreeStack<T> class: The pop() method alternates between trying to
change the top field and backing off.
the push() and pop() methods back off after an unsuccessful call to tryPush()
or tryPop().
This implementation is lock-free because a thread fails to complete a push() or
pop() method call only if there were infinitely many successful calls that modified
the top of the stack. The linearization point of both the push() and the pop()
methods is the successful compareAndSet(), or the throwing of the exception,
in Line 3, in case of a pop() on an empty stack. Note that the compareAndSet()
call by pop() does not have an ABA problem (see Chapter 10) because the Java
garbage collector ensures that a node cannot be reused by one thread, as long as
that node is accessible to another thread. Designing a lock-free stack that avoids
the ABA problem without a garbage collector is left as an exercise.
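As a usage illustration (a sketch, not from the text), the following program runs a producer and a consumer thread against a single LockFreeStack<Integer>; it assumes the LockFreeStack, Backoff, and EmptyException classes from Figs. 11.2-11.4 are available.

public class LockFreeStackDemo {
  public static void main(String[] args) throws InterruptedException {
    final LockFreeStack<Integer> stack = new LockFreeStack<Integer>();
    Thread producer = new Thread(() -> {
      for (int i = 0; i < 1000; i++)
        stack.push(i);               // lock-free push; backs off and retries on CAS failure
    });
    Thread consumer = new Thread(() -> {
      int popped = 0;
      while (popped < 1000) {
        try {
          stack.pop();               // lock-free pop
          popped++;
        } catch (EmptyException e) {
          // stack momentarily empty; try again
        }
      }
    });
    producer.start();
    consumer.start();
    producer.join();
    consumer.join();
  }
}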
11.3 Elimination
The LockFreeStack implementation scales poorly, not so much because the
stack’s top field is a source of contention, but primarily because it is a sequen-
tial bottleneck: method calls can proceed only one after the other, ordered by
successful compareAndSet() calls applied to the stack’s top field.
Although exponential backoff can significantly reduce contention, it does
nothing to alleviate the sequential bottleneck. To make the stack parallel, we
exploit this simple observation: if a push() is immediately followed by a pop(),
the two operations cancel out, and the stack’s state does not change. It is as if
both operations never happened. If one could somehow cause concurrent pairs
of pushes and pops to cancel, then threads calling push() could exchange val-
ues with threads calling pop(), without ever modifying the stack itself. These two
calls would eliminate one another.
As depicted in Fig. 11.5, threads eliminate one another through an
EliminationArray in which threads pick random array entries to try to meet
complementary calls. Pairs of complementary push() and pop() calls exchange
values and return. A thread whose call cannot be eliminated, either because it has
failed to find a partner, or found a partner with the wrong kind of method call
(such as a push() meeting a push()), can either try again to eliminate at a new
location, or can access the shared LockFreeStack. The combined data structure,
array, and shared stack, is linearizable because the shared stack is linearizable,
and the eliminated calls can be ordered as if they happened at the point in which
they exchanged values.
We can use the EliminationArray as a backoff scheme on a shared
LockFreeStack. Each thread first accesses the LockFreeStack, and if it fails
Figure 11.5 The EliminationBackoffStack<T> class. Each thread selects a random loca-
tion in the array. If thread A’s pop() and B’s push() calls arrive at the same location at about
the same time, then they exchange values without accessing the shared LockFreeStack.
Thread C that does not meet another thread eventually pops the shared LockFreeStack.
to complete its call (that is, the compareAndSet() attempt fails), it attempts to
eliminate its call using the array instead of simply backing off. If it fails to elimi-
nate itself, it calls the LockFreeStack again, and so on. We call this structure an
EliminationBackoffStack.
11.4 The Elimination Backoff Stack
Here is how to construct an EliminationBackoffStack, a lock-free linearizable
stack implementation.
We are reminded of a story about two friends who are discussing poli-
tics on election day, each trying, to no avail, to convince the other to switch
sides.
Finally, one says to the other: “Look, it’s clear that we are unalterably opposed
on every political issue. Our votes will surely cancel out. Why not save ourselves
some time and both agree to not vote today?”
The other agrees enthusiastically and they part.
Shortly after that, a friend of the first one who had heard the conversation says,
“That was a sporting offer you made.”
“Not really,” says the second. “This is the third time I’ve done this today.”
The principle behind our construction is the same. We wish to allow threads
with pushes and pops to coordinate and cancel out, but must avoid a situation in
which a thread can make a sporting offer to more than one other thread. We do
so by implementing the EliminationArray using coordination structures called
exchangers, objects that allow exactly two threads (and no more) to rendezvous
and exchange values.
We already saw how to exchange values using locks in the synchronous queue
of Chapter 10. Here, we need a lock-free exchange, one in which threads spin
rather than block, as we expect them to wait only for very short durations.
11.4.1 A Lock-Free Exchanger
A LockFreeExchanger<T> object permits two threads to exchange values of
type T. If thread A calls the object’s exchange() method with argument a, and
B calls the same object’s exchange() method with argument b, then A’s call will
return value b and vice versa. On a high level, the exchanger works by having the
first thread arrive to write its value, and spin until a second arrives. The second
then detects that the first is waiting, reads its value, and signals the exchange.
They each have now read the other’s value, and can return. The first thread’s call
may time out if the second does not show up, allowing it to proceed and leave the
exchanger if it is unable to exchange a value within a reasonable duration.
1 public class LockFreeExchanger<T> {
2 static final int EMPTY = ..., WAITING = ..., BUSY = ...;
3 AtomicStampedReference<T> slot = new AtomicStampedReference<T>(null, 0);
4 public T exchange(T myItem, long timeout, TimeUnit unit)
5 throws TimeoutException {
6 long nanos = unit.toNanos(timeout);
7 long timeBound = System.nanoTime() + nanos;
8 int[] stampHolder = {EMPTY};
9 while (true) {
10 if (System.nanoTime() > timeBound)
11 throw new TimeoutException();
12 T yrItem = slot.get(stampHolder);
13 int stamp = stampHolder[0];
14 switch(stamp) {
15 case EMPTY:
16 if (slot.compareAndSet(yrItem, myItem, EMPTY, WAITING)) {
17 while (System.nanoTime() < timeBound){
18 yrItem = slot.get(stampHolder);
19 if (stampHolder[0] == BUSY) {
20 slot.set(null, EMPTY);
21 return yrItem;
22 }
23 }
24 if (slot.compareAndSet(myItem, null, WAITING, EMPTY)) {
25 throw new TimeoutException();
26 } else {
27 yrItem = slot.get(stampHolder);
28 slot.set(null, EMPTY);
29 return yrItem;
30 }
31 }
32 break;
33 case WAITING:
34 if (slot.compareAndSet(yrItem, myItem, WAITING, BUSY))
35 return yrItem;
36 break;
37 case BUSY:
38 break;
39 default: // impossible
40 ...
41 }
42 }
43 }
44 }
Figure 11.6 The LockFreeExchanger<T> Class.
The LockFreeExchanger<T> class appears in Fig. 11.6. It has a single
AtomicStampedReference<T> field, slot (see Chapter 10, Pragma 10.6.1). The
exchanger has three possible states: EMPTY, BUSY, or WAITING. The reference's
stamp records the exchanger's state (Line 14). The exchanger's main loop
continues until the timeout limit
passes, when it throws an exception (Line 10). In the meantime, a thread reads
the state of the slot (Line 12) and proceeds as follows:
If the state is EMPTY, then the thread tries to place its item in the slot and set
the state to WAITING using a compareAndSet() (Line 16). If it fails, then some
other thread succeeds and it retries. If it was successful (Line 17), then its item
is in the slot and the state is WAITING, so it spins, waiting for another thread
to complete the exchange. If another thread shows up, it will take the item in
the slot, replace it with its own, and set the state to BUSY (Line 19), indicat-
ing to the waiting thread that the exchange is complete. The waiting thread
will consume the item and reset the state to EMPTY. Resetting to EMPTY can be
done using a simple write because the waiting thread is the only one that can
change the state from BUSY to EMPTY (Line 20). If no other thread shows up,
the waiting thread needs to reset the state of the slot to EMPTY. This change
requires a compareAndSet() because other threads might be attempting to
exchange by setting the state from WAITING to BUSY (Line 24). If the call is suc-
cessful, it raises a timeout exception. If, however, the call fails, some exchang-
ing thread must have shown up, so the waiting thread completes the exchange
(Line 26).
If the state is WAITING, then some thread is waiting and the slot contains its
item. The thread takes the item, and tries to replace it with its own by changing
the state from WAITING to BUSY using a compareAndSet() (Line 34). It may
fail if another thread succeeds, or the other thread resets the state to EMPTY
following a timeout. If so, the thread must retry. If it does succeed changing
the state to BUSY, then it can return the item.
If the state is BUSY then two other threads are currently using the slot for an
exchange and the thread must retry (Line 37).
Notice that the algorithm allows the inserted item to be null, something
used later in the elimination array construction. There is no ABA problem
because the compareAndSet() call that changes the state never inspects the item.
A successful exchange’s linearization point occurs when the second thread to
arrive changes the state from WAITING to BUSY (Line 34). At this point both
exchange() calls overlap, and the exchange is committed to being successful. An
unsuccessful exchange’s linearization point occurs when the timeout exception is
thrown.
The algorithm is lock-free because overlapping exchange() calls with suffi-
cient time to exchange will fail only if other exchanges are repeatedly succeeding.
Clearly, too short an exchange time can cause a thread never to succeed, so care
must be taken when choosing timeout durations.
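As a usage illustration (a sketch, not from the text), two threads can rendezvous on a single LockFreeExchanger<String> as follows; the 100-millisecond timeout is an arbitrary choice for the example, and a thread that misses its partner simply reports the timeout.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ExchangerDemo {
  public static void main(String[] args) {
    final LockFreeExchanger<String> exchanger = new LockFreeExchanger<String>();
    Runnable task = () -> {
      String mine = Thread.currentThread().getName();
      try {
        // Offer my name and wait (up to 100 ms) for the partner's name in return.
        String yours = exchanger.exchange(mine, 100, TimeUnit.MILLISECONDS);
        System.out.println(mine + " received " + yours);
      } catch (TimeoutException e) {
        System.out.println(mine + " found no partner in time");
      }
    };
    new Thread(task, "A").start();
    new Thread(task, "B").start();
  }
}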
11.4.2 The Elimination Array
An EliminationArray is implemented as an array of Exchanger objects of
maximal size capacity. A thread attempting to perform an exchange picks an
array entry at random, and calls that entry’s exchange() method, providing
1 public class EliminationArray<T> {
2 private static final int duration = ...;
3 LockFreeExchanger<T>[] exchanger;
4 Random random;
5 public EliminationArray(int capacity) {
6 exchanger = (LockFreeExchanger<T>[]) new LockFreeExchanger[capacity];
7 for (int i = 0; i < capacity; i++) {
8 exchanger[i] = new LockFreeExchanger<T>();
9 }
10 random = new Random();
11 }
12 public T visit(T value, int range) throws TimeoutException {
13 int slot = random.nextInt(range);
14 return (exchanger[slot].exchange(value, duration,
15 TimeUnit.MILLISECONDS));
16 }
17 }
Figure 11.7 The EliminationArray<T> class: in each visit, a thread can choose dynamically
the sub-range of the array from which it will randomly select a slot.
its own input as an exchange value with another thread. The code for the
EliminationArray appears in Fig. 11.7. The constructor takes as an argu-
ment the capacity of the array (the number of distinct exchangers). The
EliminationArray class provides a single method, visit(), which takes time-
out arguments. (Following the conventions used in the java.util.concurrent pack-
age, a timeout is expressed as a number and a time unit.) The visit() call takes
a value of type T and either returns the value input by its exchange partner,
or throws an exception if the timeout expires without exchanging a value with
another thread. At any point in time, each thread will select a random location in
a subrange of the array (Line 13). This subrange will be determined dynamically
based on the load on the data structure, and will be passed as a parameter to the
visit() method.
The EliminationBackoffStack is a subclass of LockFreeStack that over-
rides the push() and pop() methods, and adds an EliminationArray field.
Figs. 11.8 and 11.9 show the new push() and pop() methods. Upon failure of
a tryPush() or tryPop() attempt, instead of simply backing off, these methods
try to use the EliminationArray to exchange values (Lines 15 and 34). A push()
call calls visit() with its input value as argument, and a pop() call with null as
argument. Both push() and pop() have a thread-local RangePolicy object that
determines the EliminationArray subrange to be used.
When push() calls visit(), it selects a random array entry within its range
and attempts to exchange a value with another thread. If the exchange is suc-
cessful, the pushing thread checks whether the value was exchanged with a pop()
method (Line 18) by testing if the value exchanged was null. (Recall that pop()
always offers null to the exchanger while push() always offers a non-null value.)
Symmetrically, when pop() calls visit(), it attempts an exchange, and if the
1 public class EliminationBackoffStack<T> extends LockFreeStack<T> {
2 static final int capacity = ...;
3 EliminationArray<T> eliminationArray = new EliminationArray<T>(capacity);
4 static ThreadLocal<RangePolicy> policy = new ThreadLocal<RangePolicy>() {
5 protected synchronized RangePolicy initialValue() {
6 return new RangePolicy();
7 }
8 };
9 public void push(T value) {
10 RangePolicy rangePolicy = policy.get();
11 Node node = new Node(value);
12 while (true) {
13 if (tryPush(node)) {
14 return;
15 } else try {
16 T otherValue = eliminationArray.visit
17 (value, rangePolicy.getRange());
18 if (otherValue == null) {
19 rangePolicy.recordEliminationSuccess();
20 return; // exchanged with pop
21 }
22 } catch (TimeoutException ex) {
23 rangePolicy.recordEliminationTimeout();
24 }
25 }
26 }
27 }
Figure 11.8 The EliminationBackoffStack<T> class: this push() method overrides the
LockFreeStack push() method. Instead of using a simple Backoff class, it uses an
EliminationArray and a dynamic RangePolicy to select the subrange of the array within
which to eliminate.
28 public T pop() throws EmptyException {
29 RangePolicy rangePolicy = policy.get();
30 while (true) {
31 Node returnNode = tryPop();
32 if (returnNode != null) {
33 return returnNode.value;
34 } else try {
35 T otherValue = eliminationArray.visit(null, rangePolicy.getRange());
36 if (otherValue != null) {
37 rangePolicy.recordEliminationSuccess();
38 return otherValue;
39 }
40 } catch (TimeoutException ex) {
41 rangePolicy.recordEliminationTimeout();
42 }
43 }
44 }
Figure 11.9 The EliminationBackoffStack<T> class: this pop() method overrides the
LockFreeStack pop() method.
exchange is successful it checks (Line 36) whether the value was exchanged with
a push() call by checking whether it is not null.
It is possible that the exchange will be unsuccessful, either because no
exchange took place (the call to visit() timed out) or because the exchange
was with the same type of operation (such as a pop() with a pop()). For brevity,
we choose a simple approach to deal with such cases: we retry the tryPush() or
tryPop() calls (Lines 13 and 31).
One important parameter is the range of the EliminationArray from which
a thread selects an Exchanger location. A smaller range will allow a greater
chance of a successful collision when there are few threads, while a larger range
will lower the chances of threads waiting on a busy Exchanger (recall that an
Exchanger can only handle one exchange at a time). Thus, if few threads access
the array, they should choose smaller ranges, and as the number of threads
increase, so should the range. One can control the range dynamically using a
RangePolicy object that records both successful exchanges (as in Line 37) and
timeout failures (Line 40). We ignore exchanges that fail because the operations
do not match (such as push() with push()), because they account for a fixed frac-
tion of the exchanges for any given distribution of push() and pop() calls. One
simple policy is to shrink the range as the number of failures increases and vice
versa.
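The chapter uses a RangePolicy but never shows its code. The following is one possible sketch that matches the method names called in Figs. 11.8 and 11.9 and implements the simple policy just described: shrink the range on timeouts, grow it on successful eliminations. The MAX_RANGE bound is an assumption; in practice it should not exceed the EliminationArray capacity.

public class RangePolicy {
  static final int MAX_RANGE = ...;  // assumed bound; at most the EliminationArray capacity
  private int currentRange = 1;
  // RangePolicy objects are thread-local (Fig. 11.8), so no synchronization is needed.
  public int getRange() {
    return currentRange;
  }
  public void recordEliminationSuccess() {
    if (currentRange < MAX_RANGE)
      currentRange++;                // successes suggest high load: widen the subrange
  }
  public void recordEliminationTimeout() {
    if (currentRange > 1)
      currentRange--;                // timeouts suggest low load: narrow the subrange
  }
}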
There are many other possible policies. For example, one can devise a more
elaborate range selection policy, vary the delays on the exchangers dynamically,
add additional backoff delays before accessing the shared stack, and control
whether to access the shared stack or the array dynamically. We leave these as
exercises.
The EliminationBackoffStack is a linearizable stack: any successful push()
or pop() call that completes by accessing the LockFreeStack can be linearized at
the point of its LockFreeStack access. Any pair of eliminated push() and pop()
calls can be linearized when they collide. As noted earlier, the method calls com-
pleted through elimination do not affect the linearizability of those completed
in the LockFreeStack, because they could have taken effect in any state of the
LockFreeStack, and having taken effect, the state of the LockFreeStack would
not have changed.
Because the EliminationArray is effectively used as a backoff scheme, we
expect it to deliver performance comparable to the LockFreeStack at low loads.
Unlike the LockFreeStack, it has the potential to scale. As the load increases, the
number of successful eliminations will grow, allowing many operations to com-
plete in parallel. Moreover, contention at the LockFreeStack is reduced because
eliminated operations never access the stack.
11.5 Chapter Notes
The LockFreeStack is credited to Treiber [145]. Actually it predates Treiber’s
report in 1986. It was probably invented in the early 1970s to motivate the
CAS operation on the IBM 370. The EliminationBackoffStack is due to
Danny Hendler, Nir Shavit, and Lena Yerushalmi [57]. An efficient exchanger,
which quite interestingly uses an elimination array, was introduced by Doug Lea,
Michael Scott, and Bill Scherer [136]. A variant of this exchanger appears in the
Java Concurrency Package. The EliminationBackoffStack we present here is
modular, making use of exchangers, but somewhat inefficient. Mark Moir, Daniel
Nussbaum, Ori Shalev, and Nir Shavit present a highly effective implementation
of an EliminationArray [120].
11.6 Exercises
Exercise 126. Design an unbounded lock-based Stack<T> implementation based
on a linked list.
Exercise 127. Design a bounded lock-based Stack<T> using an array.
1. Use a single lock and a bounded array.
2. Try to make your algorithm lock-free. Where do you run into difficulty?
Exercise 128. Modify the unbounded lock-free stack of Section 11.2 to work
in the absence of a garbage collector. Create a thread-local pool of preallo-
cated nodes and recycle them. To avoid the ABA problem, consider using the
AtomicStampedReference<T> class from java.util.concurrent.atomic that encap-
sulates both a reference and an integer stamp.
Exercise 129. Discuss the backoff policies used in our implementation. Does it
make sense to use the same shared Backoff object for both pushes and pops in
our LockFreeStack<T> object? How else could we structure the backoff in space
and time in the EliminationBackoffStack<T>?
Exercise 130. Implement a stack algorithm assuming there is a bound, in any state
of the execution, on the total difference between the number of pushes and pops
to the stack.
Exercise 131. Consider the problem of implementing a bounded stack using an
array indexed by a top counter, initially zero. In the absence of concurrency, these
methods are almost trivial. To push an item, increment top to reserve an array
entry, and then store the item at that index. To pop an item, decrement top, and
return the item at the previous top index.
Clearly, this strategy does not work for concurrent implementations, because
one cannot make atomic changes to multiple memory locations. A single
synchronization operation can either increment or decrement the top counter,
but not both, and there is no way atomically to increment the counter and store
a value.
Nevertheless, Bob D. Hacker decides to solve this problem. He decides to adapt
the dual-data structure approach of Chapter 10 to implement a dual stack. His
DualStack<T> class splits push() and pop() methods into reservation and ful-
fillment steps. Bob’s implementation appears in Fig. 11.10.
The stack’s top is indexed by the top field, an AtomicInteger manipu-
lated only by getAndIncrement() and getAndDecrement() calls. Bob’s push()
1 public class DualStack<T> {
2 private class Slot {
3 boolean full = false;
4 volatile T value = null;
5 }
6 Slot[] stack;
7 int capacity;
8 private AtomicInteger top = new AtomicInteger(0); // array index
9 public DualStack(int myCapacity) {
10 capacity = myCapacity;
11 stack = (Slot[]) new Object[capacity];
12 for (int i = 0; i < capacity; i++) {
13 stack[i] = new Slot();
14 }
15 }
16 public void push(T value) throws FullException {
17 while (true) {
18 int i = top.getAndIncrement(); // reserve a slot
19 if (i > capacity - 1) { // is stack full?
20 throw new FullException();
21 } else if (i >= 0) { // reservation complete
22 stack[i].value = value;
23 stack[i].full = true; // push fulfilled
24 return;
25 }
26 // i < 0: failed pops passed 0; retry getAndIncrement()
27 }
28 }
29 public T pop() throws EmptyException {
30 while (true) {
31 int i = top.getAndDecrement();
32 if (i < 0) { // is stack empty?
33 throw new EmptyException();
34 } else if (i < capacity - 1) {
35 while (!stack[i].full){};
36 T value = stack[i].value;
37 stack[i].full = false;
38 return value; //pop fulfilled
39 }
40 }
41 }
42 }
Figure 11.10 Bob’s problematic dual stack.
method’s reservation step reserves a slot by applying getAndIncrement() to top.
Suppose the call returns index i. If i is in the range 0...capacity − 1, the reser-
vation is complete. In the fulfillment phase, push(x) stores x at index i in the
array, and raises the full flag to indicate that the value is ready to be read. The
value field must be volatile to guarantee that once the flag is raised, the value
has already been written to index i of the array.
If the index returned from push()’s getAndIncrement() is less than 0,
the push() method repeatedly retries getAndIncrement() until it returns an
index greater than or equal to 0. The index could be less than 0 due to
getAndDecrement() calls of failed pop() calls to an empty stack. Each such failed
getAndDecrement() decrements the top by one more past the 0 array bound.
If the index returned is greater than capacity −1, push() throws an exception
because the stack is full.
The situation is symmetric for pop(). It checks that the index is within the
bounds and removes an item by applying getAndDecrement() to top, returning
index i. If i is in the range 0...capacity−1, the reservation is complete. For the
fulfillment phase, pop() spins on the full flag of array slot i, until it detects that
the flag is true, indicating that the push() call is successful.
What is wrong with Bob’s algorithm? Is this an inherent problem or can you
think of a way to fix it?
Exercise 132. In Exercise 97 we ask you to implement the Rooms interface, repro-
duced in Fig. 11.11. The Rooms class manages a collection of rooms, indexed
from 0 to m (where m is a known constant). Threads can enter or exit any
room in that range. Each room can hold an arbitrary number of threads simul-
taneously, but only one room can be occupied at a time. The last thread to
leave a room triggers an onEmpty() handler, which runs while all rooms are
empty.
Fig. 11.12 shows an incorrect concurrent stack implementation.
1. Explain why this stack implementation does not work.
2. Fix it by adding calls to a two-room Rooms class: one room for pushing and
one for popping.
1 public interface Rooms {
2 public interface Handler {
3 void onEmpty();
4 }
5 void enter(int i);
6 boolean exit();
7 public void setExitHandler(int i, Rooms.Handler h);
8 }
Figure 11.11 The Rooms interface.
1 public class Stack<T> {
2 private AtomicInteger top;
3 private T[] items;
4 public Stack(int capacity) {
5 top = new AtomicInteger();
6 items = (T[]) new Object[capacity];
7 }
8 public void push(T x) throws FullException {
9 int i = top.getAndIncrement();
10 if (i >= items.length) { // stack is full
11 top.getAndDecrement(); // restore state
12 throw new FullException();
13 }
14 items[i] = x;
15 }
16 public T pop() throws EmptyException {
17 int i = top.getAndDecrement() - 1;
18 if (i < 0) { // stack is empty
19 top.getAndIncrement(); // restore state
20 throw new EmptyException();
21 }
22 return items[i];
23 }
24 }
Figure 11.12 Unsynchronized concurrent stack.
Exercise 133. This exercise is a follow-on to Exercise 132. Instead of having the
push() method throw FullException, exploit the push room’s exit handler to
resize the array. Remember that no thread can be in any room when an exit han-
dler is running, so (of course) only one exit handler can run at a time.

More Related Content

PPTX
Ppt on java basics
PDF
The Ring programming language version 1.10 book - Part 126 of 212
PPTX
chap4 ; Flow Control, Assertions, and Exception Handling (scjp/ocjp)
PPT
C Sharp Jn (3)
PPT
Loops
DOCX
Python unit 3 and Unit 4
PDF
Maxbox starter
PPTX
LISP:Loops In Lisp
Ppt on java basics
The Ring programming language version 1.10 book - Part 126 of 212
chap4 ; Flow Control, Assertions, and Exception Handling (scjp/ocjp)
C Sharp Jn (3)
Loops
Python unit 3 and Unit 4
Maxbox starter
LISP:Loops In Lisp

What's hot (20)

PDF
Folding Unfolded - Polyglot FP for Fun and Profit - Haskell and Scala Part 2 ...
PDF
The Ring programming language version 1.8 book - Part 24 of 202
PDF
Monad Fact #6
PDF
Ad hoc Polymorphism using Type Classes and Cats
PPT
Java Puzzlers
PPTX
C# Loops
PDF
State Monad
PDF
Java concepts and questions
PPTX
Programming Homework Help
PPT
Clean code _v2003
PDF
06 -working_with_strings
PDF
The Ring programming language version 1.9 book - Part 124 of 210
TXT
Indus Valley Partner aptitude questions and answers
PDF
The Ring programming language version 1.8 book - Part 117 of 202
PDF
non-strict functions, bottom and scala by-name parameters
PDF
The Ring programming language version 1.5.1 book - Part 18 of 180
PDF
Keygenning using the Z3 SMT Solver
PPTX
Operating System Engineering
PPTX
Java 8 features
PDF
Python-02| Input, Output & Import
Folding Unfolded - Polyglot FP for Fun and Profit - Haskell and Scala Part 2 ...
The Ring programming language version 1.8 book - Part 24 of 202
Monad Fact #6
Ad hoc Polymorphism using Type Classes and Cats
Java Puzzlers
C# Loops
State Monad
Java concepts and questions
Programming Homework Help
Clean code _v2003
06 -working_with_strings
The Ring programming language version 1.9 book - Part 124 of 210
Indus Valley Partner aptitude questions and answers
The Ring programming language version 1.8 book - Part 117 of 202
non-strict functions, bottom and scala by-name parameters
The Ring programming language version 1.5.1 book - Part 18 of 180
Keygenning using the Z3 SMT Solver
Operating System Engineering
Java 8 features
Python-02| Input, Output & Import
Ad

Similar to Concurrent Stacks and Elimination : The Art of Multiprocessor Programming : Notes (20)

PDF
Linked List Static and Dynamic Memory Allocation
PPT
Stacks, Queues, Deques
PPT
Chap 4 List of Data Structure.ppt
PPT
23 stacks-queues-deques
PPT
7 stacksqueues
PPTX
Data Structures and Agorithm: DS 06 Stack.pptx
PDF
Linked Lists: The Role of Locking : The Art of Multiprocessor Programming : N...
PPT
16-StacksQueuesCVCJUCGTCXYFRSTTIUGIUFTY.ppt
PPTX
Stack and Queue
PDF
stacks and queues
PPTX
Introduction to Concurrent Data Structures
PDF
Lockless
PPTX
sorting.pptx
PDF
JUnit PowerUp
PPT
lecture10trsgfchjvxgfzfdchgdchgcgshyjh.ppt
PPTX
U3.stack queue
PPT
Stack linked list
PDF
DSA (Data Structure and Algorithm) Questions
PDF
5 chapter3 list_stackqueuepart2
PPTX
My lecture stack_queue_operation
Linked List Static and Dynamic Memory Allocation
Stacks, Queues, Deques
Chap 4 List of Data Structure.ppt
23 stacks-queues-deques
7 stacksqueues
Data Structures and Agorithm: DS 06 Stack.pptx
Linked Lists: The Role of Locking : The Art of Multiprocessor Programming : N...
16-StacksQueuesCVCJUCGTCXYFRSTTIUGIUFTY.ppt
Stack and Queue
stacks and queues
Introduction to Concurrent Data Structures
Lockless
sorting.pptx
JUnit PowerUp
lecture10trsgfchjvxgfzfdchgdchgcgshyjh.ppt
U3.stack queue
Stack linked list
DSA (Data Structure and Algorithm) Questions
5 chapter3 list_stackqueuepart2
My lecture stack_queue_operation
Ad

More from Subhajit Sahu (20)

PDF
About TrueTime, Spanner, Clock synchronization, CAP theorem, Two-phase lockin...
PDF
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
PDF
Adjusting Bitset for graph : SHORT REPORT / NOTES
PDF
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
PDF
Adjusting primitives for graph : SHORT REPORT / NOTES
PDF
Experiments with Primitive operations : SHORT REPORT / NOTES
PDF
PageRank Experiments : SHORT REPORT / NOTES
PDF
Algorithmic optimizations for Dynamic Monolithic PageRank (from STICD) : SHOR...
PDF
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
PDF
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
PDF
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
PDF
Shared memory Parallelism (NOTES)
PDF
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
PDF
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
PDF
Application Areas of Community Detection: A Review : NOTES
PDF
Community Detection on the GPU : NOTES
PDF
Survey for extra-child-process package : NOTES
PDF
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
PDF
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
PDF
Fast Incremental Community Detection on Dynamic Graphs : NOTES
About TrueTime, Spanner, Clock synchronization, CAP theorem, Two-phase lockin...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Adjusting Bitset for graph : SHORT REPORT / NOTES
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Adjusting primitives for graph : SHORT REPORT / NOTES
Experiments with Primitive operations : SHORT REPORT / NOTES
PageRank Experiments : SHORT REPORT / NOTES
Algorithmic optimizations for Dynamic Monolithic PageRank (from STICD) : SHOR...
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
Shared memory Parallelism (NOTES)
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES
Application Areas of Community Detection: A Review : NOTES
Community Detection on the GPU : NOTES
Survey for extra-child-process package : NOTES
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up...
Fast Incremental Community Detection on Dynamic Graphs : NOTES

Recently uploaded (20)

PPT
FABRICATION OF MOS FET BJT DEVICES IN NANOMETER
PDF
Cableado de Controladores Logicos Programables
PPTX
1.pptxsadafqefeqfeqfeffeqfqeqfeqefqfeqfqeffqe
PPTX
PROGRAMMING-QUARTER-2-PYTHON.pptxnsnsndn
PPTX
KVL KCL ppt electrical electronics eee tiet
PDF
-DIGITAL-INDIA.pdf one of the most prominent
PPTX
Embedded for Artificial Intelligence 1.pptx
PPTX
Wireless and Mobile Backhaul Market.pptx
PPTX
5. MEASURE OF INTERIOR AND EXTERIOR- MATATAG CURRICULUM.pptx
PPTX
DEATH AUDIT MAY 2025.pptxurjrjejektjtjyjjy
PPTX
Lecture 3b C Library _ ESP32.pptxjfjfjffkkfkfk
PPTX
"Fundamentals of Digital Image Processing: A Visual Approach"
PPTX
Embeded System for Artificial intelligence 2.pptx
PPTX
making presentation that do no stick.pptx
PDF
Smarter Security: How Door Access Control Works with Alarms & CCTV
PPTX
Prograce_Present.....ggation_Simple.pptx
PPTX
Nanokeyer nano keyekr kano ketkker nano keyer
DOCX
A PROPOSAL ON IoT climate sensor 2.docx
PPTX
了解新西兰毕业证(Wintec毕业证书)怀卡托理工学院毕业证存档可查的
PPTX
code of ethics.pptxdvhwbssssSAssscasascc
FABRICATION OF MOS FET BJT DEVICES IN NANOMETER
Cableado de Controladores Logicos Programables
1.pptxsadafqefeqfeqfeffeqfqeqfeqefqfeqfqeffqe
PROGRAMMING-QUARTER-2-PYTHON.pptxnsnsndn
KVL KCL ppt electrical electronics eee tiet
-DIGITAL-INDIA.pdf one of the most prominent
Embedded for Artificial Intelligence 1.pptx
Wireless and Mobile Backhaul Market.pptx
5. MEASURE OF INTERIOR AND EXTERIOR- MATATAG CURRICULUM.pptx
DEATH AUDIT MAY 2025.pptxurjrjejektjtjyjjy
Lecture 3b C Library _ ESP32.pptxjfjfjffkkfkfk
"Fundamentals of Digital Image Processing: A Visual Approach"
Embeded System for Artificial intelligence 2.pptx
making presentation that do no stick.pptx
Smarter Security: How Door Access Control Works with Alarms & CCTV
Prograce_Present.....ggation_Simple.pptx
Nanokeyer nano keyekr kano ketkker nano keyer
A PROPOSAL ON IoT climate sensor 2.docx
了解新西兰毕业证(Wintec毕业证书)怀卡托理工学院毕业证存档可查的
code of ethics.pptxdvhwbssssSAssscasascc

Concurrent Stacks and Elimination : The Art of Multiprocessor Programming : Notes

  • 1. 11Concurrent Stacks and Elimination 11.1 Introduction The Stack<T> class is a collection of items (of type T) that provides push() and pop() methods satisfying the last-in-first-out (LIFO) property: the last item pushed is the first popped. This chapter considers how to implement concurrent stacks. At first glance, stacks seem to provide little opportunity for concurrency, because push() and pop() calls seem to need to synchronize at the top of the stack. Surprisingly, perhaps, stacks are not inherently sequential. In this chapter, we show how to implement concurrent stacks that can achieve a high degree of par- allelism. As a first step, we consider how to build a lock-free stack in which pushes and pops synchronize at a single location. 11.2 An Unbounded Lock-Free Stack Fig. 11.1 shows a concurrent LockFreeStack class, whose code appears in Figs. 11.2, 11.3 and 11.4. The lock-free stack is a linked list, where the top field points to the first node (or null if the stack is empty.) For simplicity, we usually assume it is illegal to add a null value to a stack. A pop() call that tries to remove an item from an empty stack throws an excep- tion. A push() method creates a new node (Line 13), and then calls tryPush() to try to swing the top reference from the current top-of-stack to its succes- sor. If tryPush() succeeds, push() returns, and if not, the tryPush() attempt is repeated after backing off. The pop() method calls tryPop(), which uses compareAndSet() to try to remove the first node from the stack. If it succeeds, it returns the node, otherwise it returns null. (It throws an exception if the stack The Art of Multiprocessor Programming. DOI: 10.1016/B978-0-12-397337-5.00011-3 Copyright © 2012 by Elsevier Inc. All rights reserved. 245
  • 2. 246 Chapter 11 Concurrent Stacks and Elimination A:pop() (b) ls top value value value A:push() (a) top value value value a a Figure 11.1 A Lock-free stack. In Part (a) a thread pushes value a into the stack by applying a compareAndSet() to the top field. In Part (b) a thread pops value a from the stack by applying a compareAndSet() to the top field. 1 public class LockFreeStack<T> { 2 AtomicReference<Node> top = new AtomicReference<Node>(null); 3 static final int MIN_DELAY = ...; 4 static final int MAX_DELAY = ...; 5 Backoff backoff = new Backoff(MIN_DELAY, MAX_DELAY); 6 7 protected boolean tryPush(Node node){ 8 Node oldTop = top.get(); 9 node.next = oldTop; 10 return(top.compareAndSet(oldTop, node)); 11 } 12 public void push(T value) { 13 Node node = new Node(value); 14 while (true) { 15 if (tryPush(node)) { 16 return; 17 } else { 18 backoff.backoff(); 19 } 20 } 21 } Figure 11.2 The LockFreeStack<T> class: in the push() method, threads alternate between trying to alter the top reference by calling tryPush(), and backing off using the Backoff class from Fig. 7.5 of Chapter 7. is empty.) The tryPop() method is called until it succeeds, at which point pop() returns the value from the removed node. As we have seen in Chapter 7, one can significantly reduce contention at the top field using exponential backoff (see Fig. 7.5 of Chapter 7). Accordingly, both
  • 3. 11.2 An Unbounded Lock-Free Stack 247 1 public class Node { 2 public T value; 3 public Node next; 4 public Node(T value) { 5 value = value; 6 next = null; 7 } 8 } Figure 11.3 Lock-free stack list node. 1 protected Node tryPop() throws EmptyException { 2 Node oldTop = top.get(); 3 if (oldTop == null) { 4 throw new EmptyException(); 5 } 6 Node newTop = oldTop.next; 7 if (top.compareAndSet(oldTop, newTop)) { 8 return oldTop; 9 } else { 10 return null; 11 } 12 } 13 public T pop() throws EmptyException { 14 while (true) { 15 Node returnNode = tryPop(); 16 if (returnNode != null) { 17 return returnNode.value; 18 } else { 19 backoff.backoff(); 20 } 21 } 22 } Figure 11.4 The LockFreeStack<T> class: The pop() method alternates between trying to change the top field and backing off. the push() and pop() methods back off after an unsuccessful call to tryPush() or tryPop(). This implementation is lock-free because a thread fails to complete a push() or pop() method call only if there were infinitely many successful calls that modified the top of the stack. The linearization point of both the push() and the pop() methods is the successful compareAndSet(), or the throwing of the exception, in Line 3, in case of a pop() on an empty stack. Note that the compareAndSet() call by pop() does not have an ABA problem (see Chapter 10) because the Java garbage collector ensures that a node cannot be reused by one thread, as long as that node is accessible to another thread. Designing a lock-free stack that avoids the ABA problem without a garbage collector is left as an exercise.
  • 4. 248 Chapter 11 Concurrent Stacks and Elimination 11.3 Elimination The LockFreeStack implementation scales poorly, not so much because the stack’s top field is a source of contention, but primarily because it is a sequen- tial bottleneck: method calls can proceed only one after the other, ordered by successful compareAndSet() calls applied to the stack’s top field. Although exponential backoff can significantly reduce contention, it does nothing to alleviate the sequential bottleneck. To make the stack parallel, we exploit this simple observation: if a push() is immediately followed by a pop(), the two operations cancel out, and the stack’s state does not change. It is as if both operations never happened. If one could somehow cause concurrent pairs of pushes and pops to cancel, then threads calling push() could exchange val- ues with threads calling pop(), without ever modifying the stack itself. These two calls would eliminate one another. As depicted in Fig. 11.5, threads eliminate one another through an EliminationArray in which threads pick random array entries to try to meet complementary calls. Pairs of complementary push() and pop() calls exchange values and return. A thread whose call cannot be eliminated, either because it has failed to find a partner, or found a partner with the wrong kind of method call (such as a push() meeting a push()), can either try again to eliminate at a new location, or can access the shared LockFreeStack. The combined data structure, array, and shared stack, is linearizable because the shared stack is linearizable, and the eliminated calls can be ordered as if they happened at the point in which they exchanged values. We can use the EliminationArray as a backoff scheme on a shared LockFreeStack. Each thread first accesses the LockFreeStack, and if it fails A:pop() top d e fB:push(b) C:pop() B:return() A:return(b) C:return(d) Figure 11.5 The EliminationBackoffStack<T> class. Each thread selects a random loca- tion in the array. If thread A’s pop() and B’s push() calls arrive at the same location at about the same time, then they exchange values without accessing the shared LockFreeStack. Thread C that does not meet another thread eventually pops the shared LockFreeStack.
  • 5. 11.4 The Elimination Backoff Stack 249 to complete its call (that is, the compareAndSet() attempt fails), it attempts to eliminate its call using the array instead of simply backing off. If it fails to elimi- nate itself, it calls the LockFreeStack again, and so on. We call this structure an EliminationBackoffStack. 11.4 The Elimination Backoff Stack Here is how to construct an EliminationBackoffStack, a lock-free linearizable stack implementation. We are reminded of a story about two friends who are discussing poli- tics on election day, each trying, to no avail, to convince the other to switch sides. Finally, one says to the other: “Look, it’s clear that we are unalterably opposed on every political issue. Our votes will surely cancel out. Why not save ourselves some time and both agree to not vote today?” The other agrees enthusiastically and they part. Shortly after that, a friend of the first one who had heard the conversation says, “That was a sporting offer you made.” “Not really,” says the second. “This is the third time I’ve done this today.” The principle behind our construction is the same. We wish to allow threads with pushes and pops to coordinate and cancel out, but must avoid a situation in which a thread can make a sporting offer to more than one other thread. We do so by implementing the EliminationArray using coordination structures called exchangers, objects that allow exactly two threads (and no more) to rendezvous and exchange values. We already saw how to exchange values using locks in the synchronous queue of Chapter 10. Here, we need a lock-free exchange, one in which threads spin rather than block, as we expect them to wait only for very short durations. 11.4.1 A Lock-Free Exchanger A LockFreeExchanger<T> object permits two threads to exchange values of type T. If thread A calls the object’s exchange() method with argument a, and B calls the same object’s exchange() method with argument b, then A’s call will return value b and vice versa. On a high level, the exchanger works by having the first thread arrive to write its value, and spin until a second arrives. The second then detects that the first is waiting, reads its value, and signals the exchange. They each have now read the other’s value, and can return. The first thread’s call may timeout if the second does not show up, allowing it to proceed and leave the exchanger, if it is unable to exchange a value within a reasonable duration.
  • 6. 250 Chapter 11 Concurrent Stacks and Elimination 1 public class LockFreeExchanger<T> { 2 static final int EMPTY = ..., WAITING = ..., BUSY = ...; 3 AtomicStampedReference<T> slot = new AtomicStampedReference<T>(null, 0); 4 public T exchange(T myItem, long timeout, TimeUnit unit) 5 throws TimeoutException { 6 long nanos = unit.toNanos(timeout); 7 long timeBound = System.nanoTime() + nanos; 8 int[] stampHolder = {EMPTY}; 9 while (true) { 10 if (System.nanoTime() > timeBound) 11 throw new TimeoutException(); 12 T yrItem = slot.get(stampHolder); 13 int stamp = stampHolder[0]; 14 switch(stamp) { 15 case EMPTY: 16 if (slot.compareAndSet(yrItem, myItem, EMPTY, WAITING)) { 17 while (System.nanoTime() < timeBound){ 18 yrItem = slot.get(stampHolder); 19 if (stampHolder[0] == BUSY) { 20 slot.set(null, EMPTY); 21 return yrItem; 22 } 23 } 24 if (slot.compareAndSet(myItem, null, WAITING, EMPTY)) { 25 throw new TimeoutException(); 26 } else { 27 yrItem = slot.get(stampHolder); 28 slot.set(null, EMPTY); 29 return yrItem; 30 } 31 } 32 break; 33 case WAITING: 34 if (slot.compareAndSet(yrItem, myItem, WAITING, BUSY)) 35 return yrItem; 36 break; 37 case BUSY: 38 break; 39 default: // impossible 40 ... 41 } 42 } 43 } 44 } Figure 11.6 The LockFreeExchanger<T> Class. The LockFreeExchanger<T> class appears in Fig. 11.6. It has a single AtomicStampedReference<T> field,1 slot. The exchanger has three possible states: EMPTY, BUSY, or WAITING. The reference’s stamp records the exchanger’s state (Line 14). The exchanger’s main loop continues until the timeout limit 1 See Chapter 10, Pragma 10.6.1.
  • 7. 11.4 The Elimination Backoff Stack 251 passes, when it throws an exception (Line 10). In the meantime, a thread reads the state of the slot (Line 12) and proceeds as follows: If the state is EMPTY, then the thread tries to place its item in the slot and set the state to WAITING using a compareAndSet() (Line 16). If it fails, then some other thread succeeds and it retries. If it was successful (Line 17), then its item is in the slot and the state is WAITING, so it spins, waiting for another thread to complete the exchange. If another thread shows up, it will take the item in the slot, replace it with its own, and set the state to BUSY (Line 19), indicat- ing to the waiting thread that the exchange is complete. The waiting thread will consume the item and reset the state to EMPTY. Resetting to EMPTY can be done using a simple write because the waiting thread is the only one that can change the state from BUSY to EMPTY (Line 20). If no other thread shows up, the waiting thread needs to reset the state of the slot to EMPTY. This change requires a compareAndSet() because other threads might be attempting to exchange by setting the state from WAITING to BUSY (Line 24). If the call is suc- cessful, it raises a timeout exception. If, however, the call fails, some exchang- ing thread must have shown up, so the waiting thread completes the exchange (Line 26). If the state is WAITING, then some thread is waiting and the slot contains its item. The thread takes the item, and tries to replace it with its own by changing the state from WAITING to BUSY using a compareAndSet() (Line 34). It may fail if another thread succeeds, or the other thread resets the state to EMPTY following a timeout. If so, the thread must retry. If it does succeed changing the state to BUSY, then it can return the item. If the state is BUSY then two other threads are currently using the slot for an exchange and the thread must retry (Line 37). Notice that the algorithm allows the inserted item to be null, something used later in the elimination array construction. There is no ABA problem because the compareAndSet() call that changes the state never inspects the item. A successful exchange’s linearization point occurs when the second thread to arrive changes the state from WAITING to BUSY (Line 34). At this point both exchange() calls overlap, and the exchange is committed to being successful. An unsuccessful exchange’s linearization point occurs when the timeout exception is thrown. The algorithm is lock-free because overlapping exchange() calls with suffi- cient time to exchange will fail only if other exchanges are repeatedly succeeding. Clearly, too short an exchange time can cause a thread never to succeed, so care must be taken when choosing timeout durations. 11.4.2 The Elimination Array An EliminationArray is implemented as an array of Exchanger objects of maximal size capacity. A thread attempting to perform an exchange picks an array entry at random, and calls that entry’s exchange() method, providing
  • 8. 252 Chapter 11 Concurrent Stacks and Elimination 1 public class EliminationArray<T> { 2 private static final int duration = ...; 3 LockFreeExchanger<T>[] exchanger; 4 Random random; 5 public EliminationArray(int capacity) { 6 exchanger = (LockFreeExchanger<T>[]) new LockFreeExchanger[capacity]; 7 for (int i = 0; i < capacity; i++) { 8 exchanger[i] = new LockFreeExchanger<T>(); 9 } 10 random = new Random(); 11 } 12 public T visit(T value, int range) throws TimeoutException { 13 int slot = random.nextInt(range); 14 return (exchanger[slot].exchange(value, duration, 15 TimeUnit.MILLISECONDS)); 16 } 17 } Figure 11.7 The EliminationArray<T> class: in each visit, a thread can choose dynamically the sub-range of the array from which it will will randomly select a slot. its own input as an exchange value with another thread. The code for the EliminationArray appears in Fig. 11.7. The constructor takes as an argu- ment the capacity of the array (the number of distinct exchangers). The EliminationArray class provides a single method, visit(), which takes time- out arguments. (Following the conventions used in the java.util.concurrent pack- age, a timeout is expressed as a number and a time unit.) The visit() call takes a value of type T and either returns the value input by its exchange partner, or throws an exception if the timeout expires without exchanging a value with another thread. At any point in time, each thread will select a random location in a subrange of the array (Line 13). This subrange will be determined dynamically based on the load on the data structure, and will be passed as a parameter to the visit() method. The EliminationBackoffStack is a subclass of LockFreeStack that over- rides the push() and pop() methods, and adds an EliminationArray field. Figs. 11.8 and 11.9 show the new push() and pop() methods. Upon failure of a tryPush() or tryPop() attempt, instead of simply backing off, these methods try to use the EliminationArray to exchange values (Lines 15 and 34). A push() call calls visit() with its input value as argument, and a pop() call with null as argument. Both push() and pop() have a thread-local RangePolicy object that determines the EliminationArray subrange to be used. When push() calls visit(), it selects a random array entry within its range and attempts to exchange a value with another thread. If the exchange is suc- cessful, the pushing thread checks whether the value was exchanged with a pop() method (Line 18) by testing if the value exchanged was null. (Recall that pop() always offers null to the exchanger while push() always offers a non-null value.) Symmetrically, when pop() calls visit(), it attempts an exchange, and if the
  • 9. 11.4 The Elimination Backoff Stack 253 1 public class EliminationBackoffStack<T> extends LockFreeStack<T> { 2 static final int capacity = ...; 3 EliminationArray<T> eliminationArray = new EliminationArray<T>(capacity); 4 static ThreadLocal<RangePolicy> policy = new ThreadLocal<RangePolicy>() { 5 protected synchronized RangePolicy initialValue() { 6 return new RangePolicy(); 7 } 8 9 public void push(T value) { 10 RangePolicy rangePolicy = policy.get(); 11 Node node = new Node(value); 12 while (true) { 13 if (tryPush(node)) { 14 return; 15 } else try { 16 T otherValue = eliminationArray.visit 17 (value, rangePolicy.getRange()); 18 if (otherValue == null) { 19 rangePolicy.recordEliminationSuccess(); 20 return; // exchanged with pop 21 } 22 } catch (TimeoutException ex) { 23 rangePolicy.recordEliminationTimeout(); 24 } 25 } 26 } 27 } Figure 11.8 The EliminationBackoffStack<T> class: this push() method overrides the LockFreeStack push() method. Instead of using a simple Backoff class, it uses an EliminationArray and a dynamic RangePolicy to select the subrange of the array within which to eliminate. 28 public T pop() throws EmptyException { 29 RangePolicy rangePolicy = policy.get(); 30 while (true) { 31 Node returnNode = tryPop(); 32 if (returnNode != null) { 33 return returnNode.value; 34 } else try { 35 T otherValue = eliminationArray.visit(null, rangePolicy.getRange()); 36 if (otherValue != null) { 37 rangePolicy.recordEliminationSuccess(); 38 return otherValue; 39 } 40 } catch (TimeoutException ex) { 41 rangePolicy.recordEliminationTimeout(); 42 } 43 } 44 } Figure 11.9 The EliminationBackoffStack<T> class: this pop() method overrides the LockFreeStack push() method.
  • 10. 254 Chapter 11 Concurrent Stacks and Elimination exchange is successful it checks (Line 36) whether the value was exchanged with a push() call by checking whether it is not null. It is possible that the exchange will be unsuccessful, either because no exchange took place (the call to visit() timed out) or because the exchange was with the same type of operation (such as a pop() with a pop()). For brevity, we choose a simple approach to deal with such cases: we retry the tryPush() or tryPop() calls (Lines 13 and 31). One important parameter is the range of the EliminationArray from which a thread selects an Exchanger location. A smaller range will allow a greater chance of a successful collision when there are few threads, while a larger range will lower the chances of threads waiting on a busy Exchanger (recall that an Exchanger can only handle one exchange at a time). Thus, if few threads access the array, they should choose smaller ranges, and as the number of threads increase, so should the range. One can control the range dynamically using a RangePolicy object that records both successful exchanges (as in Line 37) and timeout failures (Line 40). We ignore exchanges that fail because the operations do not match (such as push() with push()), because they account for a fixed frac- tion of the exchanges for any given distribution of push() and pop() calls. One simple policy is to shrink the range as the number of failures increases and vice versa. There are many other possible policies. For example, one can devise a more elaborate range selection policy, vary the delays on the exchangers dynamically, add additional backoff delays before accessing the shared stack, and control whether to access the shared stack or the array dynamically. We leave these as exercises. The EliminationBackoffStack is a linearizable stack: any successful push() or pop() call that completes by accessing the LockFreeStack can be linearized at the point of its LockFreeStack access. Any pair of eliminated push() and pop() calls can be linearized when they collide. As noted earlier, the method calls com- pleted through elimination do not affect the linearizability of those completed in the LockFreeStack, because they could have taken effect in any state of the LockFreeStack, and having taken effect, the state of the LockFreeStack would not have changed. Because the EliminationArray is effectively used as a backoff scheme, we expect it to deliver performance comparable to the LockFreeStack at low loads. Unlike the LockFreeStack, it has the potential to scale. As the load increases, the number of successful eliminations will grow, allowing many operations to com- plete in parallel. Moreover, contention at the LockFreeStack is reduced because eliminated operations never access the stack. 11.5 Chapter Notes The LockFreeStack is credited to Treiber [145]. Actually it predates Treiber’s report in 1986. It was probably invented in the early 1970s to motivate the
11.5 Chapter Notes

The LockFreeStack is credited to Treiber [145]. Actually, it predates Treiber's 1986 report: it was probably invented in the early 1970s to motivate the CAS operation on the IBM 370. The EliminationBackoffStack is due to Danny Hendler, Nir Shavit, and Lena Yerushalmi [57]. An efficient exchanger, which quite interestingly uses an elimination array, was introduced by Doug Lea, Michael Scott, and Bill Scherer [136]. A variant of this exchanger appears in the Java Concurrency Package. The EliminationBackoffStack we present here is modular, making use of exchangers, but somewhat inefficient. Mark Moir, Daniel Nussbaum, Ori Shalev, and Nir Shavit present a highly effective implementation of an EliminationArray [120].

11.6 Exercises

Exercise 126. Design an unbounded lock-based Stack<T> implementation based on a linked list.

Exercise 127. Design a bounded lock-based Stack<T> using an array.
1. Use a single lock and a bounded array.
2. Try to make your algorithm lock-free. Where do you run into difficulty?

Exercise 128. Modify the unbounded lock-free stack of Section 11.2 to work in the absence of a garbage collector. Create a thread-local pool of preallocated nodes and recycle them. To avoid the ABA problem, consider using the AtomicStampedReference<T> class from java.util.concurrent.atomic, which encapsulates both a reference and an integer stamp.

Exercise 129. Discuss the backoff policies used in our implementation. Does it make sense to use the same shared Backoff object for both pushes and pops in our LockFreeStack<T> object? How else could we structure the backoff in space and time in the EliminationBackoffStack<T>?

Exercise 130. Implement a stack algorithm assuming there is a bound, in any state of the execution, on the total difference between the number of pushes and pops to the stack.

Exercise 131. Consider the problem of implementing a bounded stack using an array indexed by a top counter, initially zero. In the absence of concurrency, these methods are almost trivial. To push an item, increment top to reserve an array entry, and then store the item at that index. To pop an item, decrement top, and return the item at the previous top index. Clearly, this strategy does not work for concurrent implementations, because one cannot make atomic changes to multiple memory locations. A single synchronization operation can either increment or decrement the top counter, but not both, and there is no way atomically to increment the counter and store a value.
Nevertheless, Bob D. Hacker decides to solve this problem by adapting the dual-data-structure approach of Chapter 10 to implement a dual stack. His DualStack<T> class splits the push() and pop() methods into reservation and fulfillment steps. Bob's implementation appears in Fig. 11.10. The stack's top is indexed by the top field, an AtomicInteger manipulated only by getAndIncrement() and getAndDecrement() calls.

1  public class DualStack<T> {
2    private class Slot {
3      boolean full = false;
4      volatile T value = null;
5    }
6    Slot[] stack;
7    int capacity;
8    private AtomicInteger top = new AtomicInteger(0); // array index
9    public DualStack(int myCapacity) {
10     capacity = myCapacity;
11     stack = new Slot[capacity];
12     for (int i = 0; i < capacity; i++) {
13       stack[i] = new Slot();
14     }
15   }
16   public void push(T value) throws FullException {
17     while (true) {
18       int i = top.getAndIncrement();
19       if (i >= capacity) { // is stack full?
20         throw new FullException();
21       } else if (i >= 0) { // slot i reserved
22         stack[i].value = value;
23         stack[i].full = true; // push fulfilled
24         return;
25       }
26     }
27   }
28   public T pop() throws EmptyException {
29     while (true) {
30       int i = top.getAndDecrement();
31       if (i <= 0) { // is stack empty?
32         throw new EmptyException();
33       } else if (i - 1 < capacity) { // slot i-1 reserved
34         while (!stack[i-1].full) {};
35         T value = stack[i-1].value;
36         stack[i-1].full = false;
37         return value; // pop fulfilled
38       }
39     }
40   }
41 }

Figure 11.10 Bob's problematic dual stack.
Bob's push() method's reservation step reserves a slot by applying getAndIncrement() to top. Suppose the call returns index i. If i is in the range 0...capacity − 1, the reservation is complete. In the fulfillment phase, push(x) stores x at index i in the array, and raises the full flag to indicate that the value is ready to be read. The value field must be volatile to guarantee that once the flag is raised, the value has already been written to index i of the array.

If the index returned from push()'s getAndIncrement() is less than 0, the push() method repeatedly retries getAndIncrement() until it returns an index greater than or equal to 0. The index could be less than 0 because of the getAndDecrement() calls of failed pop() calls on an empty stack: each such failed getAndDecrement() decrements top one more step past the 0 array bound. If the index returned is greater than capacity − 1, push() throws an exception because the stack is full.

The situation is symmetric for pop(). Its reservation step applies getAndDecrement() to top, returning index i; the item to be removed, if any, resides in slot i − 1. If i − 1 is in the range 0...capacity − 1, the reservation is complete. For the fulfillment phase, pop() spins on the full flag of array slot i − 1 until it detects that the flag is true, indicating that the matching push() call has completed.

What is wrong with Bob's algorithm? Is this an inherent problem, or can you think of a way to fix it?

Exercise 132. In Exercise 97 we ask you to implement the Rooms interface, reproduced in Fig. 11.11. The Rooms class manages a collection of rooms, indexed from 0 to m (where m is a known constant). Threads can enter or exit any room in that range. Each room can hold an arbitrary number of threads simultaneously, but only one room can be occupied at a time. The last thread to leave a room triggers an onEmpty() handler, which runs while all rooms are empty. Fig. 11.12 shows an incorrect concurrent stack implementation.
1. Explain why this stack implementation does not work.
2. Fix it by adding calls to a two-room Rooms class: one room for pushing and one for popping.

1 public interface Rooms {
2   public interface Handler {
3     void onEmpty();
4   }
5   void enter(int i);
6   boolean exit();
7   public void setExitHandler(int i, Rooms.Handler h);
8 }

Figure 11.11 The Rooms interface.
1  public class Stack<T> {
2    private AtomicInteger top;
3    private T[] items;
4    public Stack(int capacity) {
5      top = new AtomicInteger();
6      items = (T[]) new Object[capacity];
7    }
8    public void push(T x) throws FullException {
9      int i = top.getAndIncrement();
10     if (i >= items.length) { // stack is full
11       top.getAndDecrement(); // restore state
12       throw new FullException();
13     }
14     items[i] = x;
15   }
16   public T pop() throws EmptyException {
17     int i = top.getAndDecrement() - 1;
18     if (i < 0) { // stack is empty
19       top.getAndIncrement(); // restore state
20       throw new EmptyException();
21     }
22     return items[i];
23   }
24 }

Figure 11.12 Unsynchronized concurrent stack.

Exercise 133. This exercise is a follow-on to Exercise 132. Instead of having the push() method throw FullException, exploit the push room's exit handler to resize the array. Remember that no thread can be in any room when an exit handler is running, so (of course) only one exit handler can run at a time.