Racing To Win
Using Race Conditions to Build
Correct & Concurrent Software
Nathan Taylor | nathan.dijkstracula.net | @dijkstracula
Hi, I’m Nathan.
( @dijkstracula )
I’m an engineer at
A problem
A solution
Cache node
Process A: stack, heap, text
Process B: stack, heap, text
Process C: stack, heap, text
A Persistent, Shared-State Memory Allocator
uSlab
Slab allocation
[Animation: a slab is a row of fixed-size Object slots. Successive s_alloc() calls each hand out one Object; s_free() calls return Objects until every slot is free again.]
Allocation Protocol
• A request to allocate is followed by a response containing an object
• A request to free is followed by a response after the supplied object has been released

• Allocation requests must not respond with an already-allocated object
• A free request must not release an already-unallocated object
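The two safety rules above can be sketched as a small sequential allocator that tracks which objects are handed out. This is only an illustration under assumptions: the names checked_slab, checked_alloc, checked_free and the slab size of 8 are hypothetical and do not come from uSlab.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLAB_OBJECTS 8            /* hypothetical slab size */

typedef struct {
    bool allocated[SLAB_OBJECTS]; /* true while the object is handed out */
} checked_slab;

/* Allocate request: respond with the index of a free object, or -1 if the
   slab is exhausted. An already-allocated object is never returned. */
int checked_alloc(checked_slab *s) {
    for (int i = 0; i < SLAB_OBJECTS; i++) {
        if (!s->allocated[i]) {
            s->allocated[i] = true;
            return i;
        }
    }
    return -1;
}

/* Free request: returns false (a protocol violation) if the object is out
   of range or is not currently allocated. */
bool checked_free(checked_slab *s, int i) {
    if (i < 0 || i >= SLAB_OBJECTS || !s->allocated[i])
        return false;
    s->allocated[i] = false;
    return true;
}
```

Freeing the same index twice makes the second checked_free call return false, which corresponds to the "already-unallocated" rule.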
An Execution History
void foo() {
  obj *a = s_alloc();
  s_free(a);
  …
}
Time
A(allocate request)
B(allocate response)
A(free request)
B(free response)
An Execution History
Time
A(allocate request)
B(allocate request)
A(allocate response)
B(allocate response)
“X happened before Y” =>
“Y may observe X to have occurred”
A(allocate response)
A(allocate request)
B(allocate request)
B(allocate response)
A protocol violation!
Time
http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
A Sequential History
Time
A(allocate request)
A(allocate response)
A(free request)
A(free response)
B(allocate request)
B(allocate response)
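A sequential history pairs every request with an immediately following response from the same thread. The check below is a minimal sketch of that property, assuming a history is stored as an array of events; the event type and the is_sequential name are hypothetical, and pending (unanswered) requests are rejected for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REQUEST, RESPONSE } event_kind;

typedef struct {
    char thread;      /* 'A', 'B', ... */
    event_kind kind;
} event;

/* Returns true if the history is a sequence of request/response pairs,
   each pair issued by a single thread with nothing interleaved. */
bool is_sequential(const event *h, size_t n) {
    if (n % 2 != 0) return false;          /* a trailing request is pending */
    for (size_t i = 0; i < n; i += 2) {
        if (h[i].kind != REQUEST) return false;
        if (h[i + 1].kind != RESPONSE) return false;
        if (h[i + 1].thread != h[i].thread) return false;
    }
    return true;
}
```

The history A(request), B(request), A(response), B(response) fails this check because B's request lands between A's request and response.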
obj *allocate(slab *s) {
  obj *a = s->head;
  if (a == NULL) return NULL;
  s->head = a->next;
  return a;
}

void free(slab *s, obj *o) {
  o->next = s->head;
  s->head = o;
}
obj *allocate(slab *s) {
  lock(&allocator_lock);
  obj *a = s->head;
  if (a != NULL)
    s->head = a->next;
  unlock(&allocator_lock);   /* released on every path, including the empty slab */
  return a;
}

void free(slab *s, obj *o) {
  lock(&allocator_lock);
  o->next = s->head;
  s->head = o;
  unlock(&allocator_lock);
}
Test And Set Lock
[Flowchart: atomically fetch the old lock state and set the state to locked. Was the old state locked? Yes: try again. No: done.]
Test And Set Unlock
[Flowchart: atomically set the state to unlocked.]
typedef int spinlock;
#define LOCKED 1
#define UNLOCKED 0

void lock(spinlock *m) {
  while (atomic_tas(m, LOCKED) == LOCKED)
    snooze();
}

void unlock(spinlock *m) {
  atomic_store(m, UNLOCKED);
}

Many code examples derived from Concurrency Kit
http://concurrencykit.org
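For readers without atomic_tas() and snooze() at hand, the same test-and-set lock can be sketched with C11's atomic_flag. This is an assumption-laden illustration, not Concurrency Kit's implementation; tas_lock, tas_trylock, tas_lock_acquire and tas_lock_release are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag held; } tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One test-and-set attempt. atomic_flag_test_and_set returns the previous
   value, so "false" means the flag was clear and we now hold the lock. */
bool tas_trylock(tas_lock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until an attempt succeeds. */
void tas_lock_acquire(tas_lock *l) {
    while (!tas_trylock(l))
        ;   /* busy-wait; the slides call snooze() here */
}

/* Clear the flag so another thread's test-and-set can succeed. */
void tas_lock_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A second tas_trylock on a held lock returns false, which is the "old state was locked" branch of the flowchart.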
void lock(spinlock *m) {
  while (atomic_tas(m, LOCKED) == LOCKED)
    snooze();
}
TAS is embedded in Lock
Time
A(lock request)
  A(TAS request)
  A(TAS response)
A(lock response)
TAS & Store can’t be reordered
Time
A(lock request)
  A(TAS request)
  A(TAS response)
A(lock response)
B(unlock request)
  B(Store request)
  B(Store response)
B(unlock response)
All execution histories
⊇
All sequentially-consistent execution histories
⊇
All linearizable execution histories
Others can be reordered
Time
A(lock request)
  A(TAS request)
  A(TAS response)
A(lock response)
B(unlock request)
  B(Store request)
  B(Store response)
B(unlock response)
http://dl.acm.org/citation.cfm?id=69624.357207
Spinlock performance
[Figure: millions of lock acquisitions/sec (y-axis, 15 to 90) against thread count (x-axis, 1 to 16). Series: Test and Set, with labeled values 87.351 and 4.343. One panel shows the "Platonic ideal of a spinlock" for comparison; the last panel replots Test and Set on a log scale (1E+01 to 1E+08 acquisitions/sec).]
typedef int spinlock;
#define LOCKED 1
#define UNLOCKED 0

void lock(spinlock *m) {
  while (atomic_tas(m, LOCKED) == LOCKED) {
    /* spin on a plain load until the lock looks free, then retry the TAS */
    while (atomic_load(m) == LOCKED)
      snooze();
  }
}
Test-and-Test-and-Set
[Figure: locked alloc/free operations in 10 s (log scale, 10 to 100,000,000) against thread count (1 to 16). Series: Test and Set, T&T&S.]
typedef int spinlock;
#define LOCKED 1
#define UNLOCKED 0

void lock(spinlock *m) {
  unsigned long i, backoff = 1, exp = 0;

  while (atomic_tas(m, LOCKED) == LOCKED) {
    for (i = 0; i < backoff; i++)
      snooze();
    backoff = (1ULL << exp++);   /* double the wait after each failed attempt */
  }
}
TAS + backoff
[Figure: locked alloc/free operations in 10 s (10,000,000 to 60,000,000) against thread count (1 to 16). Series: Test and Set, T&T&S, TAS + EB.]
void lock(spinlock *m) {
  while (atomic_tas(m, LOCKED) == LOCKED)
    snooze();
}
[Animation: two threads execute lock() side by side on the same spinlock while global_lock changes from UNLOCKED to LOCKED.]
A function is lock-free if at all times
at least one thread is
guaranteed to be making
progress [in the function].
(Herlihy & Shavit)
// TODO: make this safe and scalable
obj *allocate(slab *s) {
  obj *a = s->head;
  if (a == NULL) return NULL;
  s->head = a->next;
  return a;
}

// TODO: make this safe and scalable
void free(slab *s, obj *o) {
  o->next = s->head;
  s->head = o;
}
Non-Blocking
Algorithms
Compare-And-Swap
[Flowchart: inputs are an old value, a new value, and a destination address. Compare the old value with *destination. If ≠: return false. If =: copy the new value to *destination and return true. The compare and the copy happen as one atomic step.]
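The flowchart maps directly onto C11's atomic_compare_exchange_strong. The wrapper below is a sketch that keeps the slides' argument order (old, new, destination); the cas name and its int-only signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* cas(old, new, ptr): if *ptr equals old_value, store new_value and return
   true; otherwise leave *ptr alone and return false. Both steps happen as
   one atomic operation. */
bool cas(int old_value, int new_value, atomic_int *ptr) {
    /* The C11 call also writes the observed value back into old_value on
       failure; this wrapper discards it to match the slides' interface. */
    return atomic_compare_exchange_strong(ptr, &old_value, new_value);
}
```

With x holding 5, cas(4, 9, &x) returns false and leaves x at 5, while cas(5, 9, &x) returns true and sets x to 9.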
Atomic i = i+1;
void atomic_inc(int *ptr) {
  int i, i_plus_one;
  do {
    i = *ptr;
    i_plus_one = i + 1;
  } while (!cas(i, i_plus_one, ptr));
}
Atomic i = (i+1) % 32;
void atomic_inc_mod_32(int *ptr) {
  int i, new_i;
  do {
    i = *ptr;
    new_i = i + 1;
    new_i = new_i % 32;
  } while (!cas(i, new_i, ptr));
}
TAS using CAS
void tas_loop(spinlock *m) {
  do {
    ;
  } while (!cas(UNLOCKED, LOCKED, m));
}
Read/Modify/Write
void atomic_inc_mod_32(int *ptr) {
  int i, new_i;
  do {
    i = *ptr;                     /* Read */
    new_i = fancy_function();     /* Modify */
  } while (!cas(i, new_i, ptr));  /* Write (or retry) */
}
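The same read/modify/write loop works for any update that can be computed from the value just read. As one more instance, here is a sketch of an atomic "raise to at least" operation using C11 atomics; fetch_max_sketch is a hypothetical name, not a standard library function.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically raise *ptr to at least `candidate`.
   Returns the value observed before any update. */
int fetch_max_sketch(atomic_int *ptr, int candidate) {
    int seen = atomic_load(ptr);                                    /* Read */
    while (seen < candidate &&                                      /* Modify: is an update needed? */
           !atomic_compare_exchange_weak(ptr, &seen, candidate)) {  /* Write (or retry) */
        /* On failure the call refreshed `seen` with the current value,
           so the loop condition re-evaluates against fresh data. */
    }
    return seen;
}
```

The weak compare-exchange may fail spuriously, which is harmless here because the loop simply retries.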
obj *allocate(slab *s) {
  obj *a, *b;
  do {
    a = s->head;
    if (a == NULL) return NULL;
    b = a->next;
  } while (!cas(a, b, &s->head));
  return a;
}
[Diagram: slab head → A → B → …. The CAS compares a against *(&s->head). If the head is still A, it is swung to B and A is returned. If another thread has changed the head in the meantime (for example to Z), the compare fails and the loop retries.]
void free(slab *s, obj *o) {
  obj *t;   /* declared outside the loop so the CAS in the condition can see it */
  do {
    t = s->head;
    o->next = t;
  } while (!cas(t, o, &s->head));
}
[Diagram: before, slab head → B → …; after freeing A, slab head → A → B → ….]
obj *allocate(slab *s) {
  obj *a, *b;
  do {
    a = s->head;
    if (a == NULL) return NULL;
    b = a->next;
  } while (!cas(a, b, &s->head));
  return a;
}
[Animation: starting from slab head → A → B → C, one caller steps through allocate() while other calls run against the same slab.]
The stepping caller reads a = A and b = B, and has not yet reached its CAS.
some_object = allocate(&shared_slab);   /* takes A; slab head → B → C */
another_obj = allocate(&shared_slab);   /* takes B; slab head → C */
free(&shared_slab, some_object);        /* returns A; slab head → A → C */
The stepping caller's cas(a, b, &s->head) now succeeds because the head is A again, leaving slab head → B → C even though B is still allocated.
The ABA Problem
“A reference about to be modified by a CAS
changes from a to b and back to a again. As a
result, the CAS succeeds even though its effect on
the data structure has changed and no longer has
the desired effect.” —Herlihy & Shavit, p. 235
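The interleaving can be replayed deterministically on one thread by running the first half of allocate() by hand. The sketch below assumes a non-atomic stand-in for cas() (safe only because nothing runs concurrently here) and hypothetical names release and replay_aba.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct obj { struct obj *next; } obj;
typedef struct { obj *head; } slab;

/* Single-threaded stand-in for an atomic CAS, same argument order as the slides. */
bool cas(obj *expected, obj *desired, obj **ptr) {
    if (*ptr != expected) return false;
    *ptr = desired;
    return true;
}

obj *allocate(slab *s) {
    obj *a, *b;
    do {
        a = s->head;
        if (a == NULL) return NULL;
        b = a->next;
    } while (!cas(a, b, &s->head));
    return a;
}

/* Called "release" here only to avoid clashing with the C library's free(). */
void release(slab *s, obj *o) {
    obj *t;
    do {
        t = s->head;
        o->next = t;
    } while (!cas(t, o, &s->head));
}

/* Returns true if the stale CAS succeeds and the head ends up pointing at
   an object that is still allocated. */
bool replay_aba(void) {
    obj A, B, C;
    slab s = { &A };
    A.next = &B; B.next = &C; C.next = NULL;

    obj *a = s.head;                   /* first caller reads a = A ... */
    obj *b = a->next;                  /* ... and b = B, then stalls */

    obj *some_object = allocate(&s);   /* takes A; head -> B -> C */
    obj *another_obj = allocate(&s);   /* takes B; head -> C */
    release(&s, some_object);          /* returns A; head -> A -> C */

    bool swapped = cas(a, b, &s.head); /* head is A again, so this succeeds */
    return swapped && s.head == another_obj;
}
```

After the replay, the free list starts at B while B is still in use, which violates the rule that an allocation must not respond with an already-allocated object.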
[Diagram: slab head → A → B → …, with a counter value (166) attached to the slab head.]
obj *allocate(slab *s) {
  slab orig, update;
  do {
    orig.gen = s->gen;
    orig.head = s->head;
    if (!orig.head) return NULL;
    update.gen = orig.gen + 1;
    update.head = orig.head->next;
  } while (!dcas(&orig, &update, s));
  return orig.head;
}
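Where a double-width CAS is unavailable, one common variation packs the generation and an object index into a single 64-bit word. The sketch below is that variation, not the dcas() version from the slides: tagged_slab, pack, tagged_alloc, tagged_free and the three-object slab are all assumptions, and unlike the slides it bumps the generation on free as well as on allocate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NIL  UINT32_MAX   /* "no object" marker */
#define NOBJ 3            /* hypothetical slab size */

typedef struct {
    uint32_t next[NOBJ];       /* next[i]: index that follows object i on the free list */
    _Atomic uint64_t head;     /* high 32 bits: generation, low 32 bits: head index */
} tagged_slab;

uint64_t pack(uint32_t gen, uint32_t idx) { return ((uint64_t)gen << 32) | idx; }
uint32_t gen_of(uint64_t v) { return (uint32_t)(v >> 32); }
uint32_t idx_of(uint64_t v) { return (uint32_t)v; }

uint32_t tagged_alloc(tagged_slab *s) {
    uint64_t orig = atomic_load(&s->head);
    for (;;) {
        uint32_t i = idx_of(orig);
        if (i == NIL) return NIL;
        uint64_t update = pack(gen_of(orig) + 1, s->next[i]);
        /* On failure, orig is refreshed and the loop recomputes update. */
        if (atomic_compare_exchange_weak(&s->head, &orig, update)) return i;
    }
}

void tagged_free(tagged_slab *s, uint32_t i) {
    uint64_t orig = atomic_load(&s->head);
    uint64_t update;
    do {
        s->next[i] = idx_of(orig);
        update = pack(gen_of(orig) + 1, i);
    } while (!atomic_compare_exchange_weak(&s->head, &orig, update));
}

/* Replays the ABA interleaving; returns true if the stale CAS is rejected. */
bool stale_cas_rejected(void) {
    tagged_slab s = { .next = {1, 2, NIL} };            /* A=0 -> B=1 -> C=2 */
    atomic_init(&s.head, pack(166, 0));

    uint64_t seen  = atomic_load(&s.head);              /* stalled caller sees (166, A) */
    uint64_t stale = pack(gen_of(seen) + 1, s.next[idx_of(seen)]);

    uint32_t a = tagged_alloc(&s);                      /* takes A */
    (void)tagged_alloc(&s);                             /* takes B */
    tagged_free(&s, a);                                 /* head index is A again, generation differs */

    return !atomic_compare_exchange_strong(&s.head, &seen, stale);
}
```

The head index returns to A, but the generation has moved on from 166, so the compare fails and the stalled caller must retry with fresh data.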
void free(slab *s, obj *o) {
  obj *t;   /* declared outside the loop so the CAS in the condition can see it */
  do {
    t = s->head;
    o->next = t;
  } while (!cas(t, o, &s->head));
}
[Diagram: slab head → A → B → …. Two threads each run obj *o = allocate(&shared_slab); afterwards slab head → B → …, yet both threads hold A.]
Memory barriers
  lock(&allocator_lock);
  obj *a = s->head;
  …
[With lock() inlined:]
  while (atomic_tas(m, LOCKED) == LOCKED)
    snooze();
  obj *a = s->head;
  …
  LDREX	
  R5,	
  [m]	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  ;	
  TAS:	
  fetch.	
  .	
  .	
  
	
  	
  STREXEQ	
  R5,	
  LOCKED,	
  [m]	
  ;	
  TAS:	
  .	
  .	
  .	
  and	
  set	
  
	
  	
  CMPEQ	
  R5,	
  #0	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  ;	
  Did	
  we	
  succeed?
	
  LDR	
  R0,	
  [R1,	
  4]	
  	
  	
  	
  	
  	
  	
  	
  	
  ;	
  a	
  =	
  s-­‐>head
	
  	
  BEQ	
  lock_done	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  ;	
  Yes:	
  we	
  are	
  all	
  done

	
  	
  BL	
  snooze	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  ;	
  No:	
  Call	
  snooze()…

	
  	
  B	
  lock_loop	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  ;	
  	
  	
  	
  	
  …then	
  loop	
  again

lock_done:	
  
	
  	
  B	
  LR	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  ;	
  return
;;;;	
  IN	
  lock()	
  
lock_loop:	
  
;;;;	
  IN	
  allocate()	
  
 	
  
  
McKenney , p. 504
 	
  obj *a = s->head;
  lock(&allocator_lock);

…
	
  	
 	
  lock(&allocator_lock);
  obj *a = s->head;

…
	
  	
 	
  lock(&allocator_lock);
  <---------->
  obj *a = s->head;

…
	
  	
 	
;;;; IN lock()
lock_loop:
    LDREX   R5, [m]             ; TAS: fetch...
    STREXEQ R5, LOCKED, [m]     ; TAS: ...and set
    CMPEQ   R5, #0              ; Did we succeed?
    BEQ     lock_done           ; Yes: we are all done
    BL      snooze              ; No: Call snooze()…
    B       lock_loop           ; …then loop again
lock_done:
    DMB                         ; Ensure all previous reads
                                ; have been completed
    B       LR                  ; return
;;;; IN unlock()
    MOV     R0, UNLOCKED
    DMB                         ; Ensure all previous reads have
                                ; been completed
    STR     R0, LR
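The barrier placement in lock() and unlock() can also be written portably. The following is a minimal sketch, assuming a C11 compiler with `<stdatomic.h>`; the names `ar_lock_t`, `ar_lock`, `ar_unlock`, and `ar_snooze` are hypothetical and do not come from the slides or from uSlab. Acquire ordering on the exchange plays the role of the barrier after `lock_done`, and release ordering on the store plays the role of the barrier before the unlocking store.

```c
#include <assert.h>
#include <stdatomic.h>

#define LOCKED   1
#define UNLOCKED 0

/* Hypothetical type name; the slides call this `spinlock`. */
typedef atomic_int ar_lock_t;

/* Placeholder for the slides' snooze(); a real one might pause or yield. */
static void ar_snooze(void) { }

/* Acquire: loads and stores that follow (such as a = s->head) may not be
 * reordered before the exchange that takes the lock. */
static void ar_lock(ar_lock_t *m) {
    while (atomic_exchange_explicit(m, LOCKED, memory_order_acquire) == LOCKED)
        ar_snooze();
}

/* Release: loads and stores inside the critical section may not be
 * reordered after the store that drops the lock. */
static void ar_unlock(ar_lock_t *m) {
    atomic_store_explicit(m, UNLOCKED, memory_order_release);
}
```

On ARM a compiler would typically lower these orderings to barrier or acquire/release instructions; on x86 the exchange is already a full barrier.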
  
nathan~$ cat /proc/cpuinfo | grep "physical.*0" | wc -l
16
nathan~$ cat /proc/cpuinfo | grep "model name" | uniq
model name : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
[Chart: Allocator performance. Y axis: millions of alloc/free pairs/sec (10 to 60); X axis: threads (1). Bars: T&S, T&S-EB, T&T&S, CAS, pthread_mutex. Values as extracted: 20.568, 22.392, 50.529, 51.234, 52.721]
[Chart: Allocator Throughput. Y axis: millions of alloc/free pairs/sec (10 to 60); X axis: threads (1 to 16). Series: TAS, T&T&S, TAS + EB, Concurrent Allocator, pthread]
[Charts: Allocator latency. Y axis: CPU cycles; X axis: threads]
https://github.com/fastly/uslab
The lyf so short,
the CAS so longe to lerne
• Cache coherency and NUMA architecture
• Transactional memory
#thoughtleadership
When is a race
a safe race?
“lock-free programming is
hard; let’s go ride bikes”?
• high-level performance necessitates an
understanding of low-level performance
• your computer is a distributed system
• (optional third answer: it’s real neato)
Come see us at the booth!
Nathan Taylor | nathan.dijkstracula.net | @dijkstracula
Thanks
credits, code, and additional material at
https://github.com/dijkstracula/Surge2015/


Racing To Win: Using Race Conditions to Build Correct and Concurrent Software

  • 1. Racing To Win Using Race Conditions to Build Correct & Concurrent Software Nathan Taylor | nathan.dijkstracula.net | @dijkstracula
  • 3. Hi, I’m Nathan. ( @dijkstracula )
  • 14. Cache node Process A stack heap text Process B stack heap text Process C stack heap text
  • 16. A Persistent, Shared- State Memory Allocator Cache node Process A stack heap text Process B stack heap text Process C stack heap text uSlab
  • 18. Slab allocation Object Object Object Object Object Object Object Object
  • 19. Object Object Object Object Object Object Object Object
  • 23. s_free(              ); s_free(              ); s_free(              ); Object ObjectObject Object Object Object Object Object
  • 24. Object Object Object Object Object Object Object Object
  • 25. Object Object Object Object Object Object Object Object
  • 26. Allocation Protocol
 • A request to allocate is followed by a response containing an object
 • A request to free is followed by a response after the supplied object has been released
 • Allocation requests must not respond with an already-allocated object
 • A free request must not release an already-unallocated object
  • 28. An Execution History void foo() {
 obj *a = s_alloc();
 s_free(a);
 …
 }
  • 29. An Execution History Time void foo() {
 obj *a = s_alloc();
 s_free(a);
 …
 } A(allocate request) B(allocate response) A(free request) B(free response)
  • 30. An Execution History Time A(allocate request) B(allocate request) A(allocate response) B(allocate response)
  • 31. An Execution History Time A(allocate request) B(allocate request) A(allocate response) B(allocate response) “X happened before Y” => “Y may observe X to have occurred”
  • 32. A(allocate response) A(allocate request) B(allocate request) B(allocate response) Time
  • 34. A(allocate response) A(allocate request) B(allocate request) B(allocate response) A protocol violation! Time
  • 35. Time A(allocate response) A(allocate request) B(allocate request) B(allocate response)
  • 39. A Sequential History Time A(allocate request) A(allocate response) A(free request) A(free response) B(allocate request) B(allocate response)
  • 40. A Sequential History Time A(allocate request) A(allocate response) { } A(free request) A(free response) { } B(allocate request) B(allocate response) { }
  • 43.
 obj *allocate(slab *s) {
     obj *a = s->head;
     if (a == NULL) return NULL;
     s->head = a->next;
     return a;
 }
 void free(slab *s, obj *o) {
     o->next = s->head;
     s->head = o;
 }
  • 44.
 obj *allocate(slab *s) {
     lock(&allocator_lock);
     obj *a = s->head;
     if (a == NULL) {
         unlock(&allocator_lock);  /* don't return with the lock held */
         return NULL;
     }
     s->head = a->next;
     unlock(&allocator_lock);
     return a;
 }
 void free(slab *s, obj *o) {
     lock(&allocator_lock);
     o->next = s->head;
     s->head = o;
     unlock(&allocator_lock);
 }
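A minimal runnable sketch of the locked allocator, assuming C11 atomics for the lock. The struct layouts, `slab_init`, and the name `slab_free` (chosen to avoid clashing with the C library's `free`) are assumptions for illustration and are not taken from uSlab.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define LOCKED   1
#define UNLOCKED 0

/* Assumed layouts: a free object only needs a link to the next free object. */
typedef struct obj  { struct obj *next; } obj;
typedef struct slab { obj *head; } slab;

static atomic_int allocator_lock = UNLOCKED;

static void lock(atomic_int *m)   { while (atomic_exchange(m, LOCKED) == LOCKED) { /* spin */ } }
static void unlock(atomic_int *m) { atomic_store(m, UNLOCKED); }

/* Hypothetical helper: thread n objects from a backing array onto the free list. */
static void slab_init(slab *s, obj *backing, size_t n) {
    s->head = NULL;
    for (size_t i = 0; i < n; i++) {
        backing[i].next = s->head;
        s->head = &backing[i];
    }
}

/* Pop the head of the free list, or return NULL when the slab is exhausted. */
static obj *allocate(slab *s) {
    lock(&allocator_lock);
    obj *a = s->head;
    if (a != NULL)
        s->head = a->next;
    unlock(&allocator_lock);   /* released on both the empty and non-empty path */
    return a;
}

/* Push an object back onto the free list. */
static void slab_free(slab *s, obj *o) {
    lock(&allocator_lock);
    o->next = s->head;
    s->head = o;
    unlock(&allocator_lock);
}
```

Because the free list is a stack, the most recently freed object is the next one handed out.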
  • 45. Was the State Locked? Yes Done No Atomic
  • 46. Fetch Old Lock State Set State Locked Was old State Locked? Yes Done No Atomic
  • 47. Fetch Old Lock State Set State Locked Was old State Locked? Yes Done No Atomic Test And Set Lock
  • 48. Test And Set Unlock Set State Unlocked Atomic
  • 49.
 typedef int spinlock;
 #define LOCKED 1
 #define UNLOCKED 0

 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED)
         snooze();
 }
 void unlock(spinlock *m) {
     atomic_store(m, UNLOCKED);
 }
 Many code examples derived from Concurrency Kit http://concurrencykit.org
  • 50.
 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED)
         snooze();
 }
 A(TAS request) A(TAS response) { }
  • 51. A(TAS request) A(TAS response) { } TAS is embedded in Lock
  • 52. A(TAS request) A(TAS response) { } A(lock request) A(lock response) Time TAS is embedded in Lock
  • 53. A(TAS request) A(TAS response) { } A(lock request) A(lock response) Time TAS & Store can’t be reordered
  • 54. A(TAS request) A(TAS response) { } A(lock request) A(lock response) Time B(unlock request) B(unlock response) B(Store request) B(Store response) { } TAS & Store can’t be reordered
  • 56. All execution histories All sequentially-consistent execution histories ⊇
  • 57. All execution histories All sequentially-consistent execution histories All ???able execution histories ⊇ ⊇
  • 58. All execution histories All sequentially-consistent execution histories All linearizable execution histories ⊇ ⊇
  • 59. A(TAS request) A(TAS response) { } A(lock request) A(lock response) Time Others can be reordered B(unlock request) B(unlock response) B(Store request) B(Store response) { }
  • 61.
 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED)
         snooze();
 }
 void unlock(spinlock *m) {
     atomic_store(m, UNLOCKED);
 }
  • 65. Spinlock performance [chart: millions of lock acquisitions/sec (15 to 90) vs. threads (1 to 16); series: Test and Set; labeled value: 87.351]
  • 66. Spinlock performance [chart: same axes; series: Platonic ideal of a spinlock]
  • 67. Spinlock performance [chart: same axes; series: Test and Set; labeled values: 87.351, 4.343]
  • 70.
 typedef int spinlock;
 #define LOCKED 1
 #define UNLOCKED 0

 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED)
         snooze();
 }
  • 72. Test-and-Test-and-Set
 typedef int spinlock;
 #define LOCKED 1
 #define UNLOCKED 0

 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED) {
         while (atomic_load(m) == LOCKED)  /* spin on a read, not another TAS */
             snooze();
     }
 }
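A sketch of test-and-test-and-set using C11 atomics, assuming `atomic_exchange` as the test-and-set primitive. The names `ttas_lock`, `ttas_acquire`, `ttas_release`, and `ttas_snooze` are hypothetical. The idea is that waiters spin on a plain load, which a core can usually satisfy from its own cache, and only attempt the more expensive exchange once the lock looks free.

```c
#include <assert.h>
#include <stdatomic.h>

#define LOCKED   1
#define UNLOCKED 0

typedef atomic_int ttas_lock;

/* Placeholder for the slides' snooze(). */
static void ttas_snooze(void) { }

static void ttas_acquire(ttas_lock *m) {
    /* Outer loop: the actual test-and-set attempt. */
    while (atomic_exchange(m, LOCKED) == LOCKED) {
        /* Inner loop: read-only wait until the lock is observed unlocked. */
        while (atomic_load(m) == LOCKED)
            ttas_snooze();
    }
}

static void ttas_release(ttas_lock *m) {
    atomic_store(m, UNLOCKED);
}
```

Observing the lock as unlocked does not guarantee the following exchange wins, which is why the outer loop is still needed.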
  • 76. TAS + backoff
 typedef int spinlock;
 #define LOCKED 1
 #define UNLOCKED 0

 void lock(spinlock *m) {
     unsigned long i, backoff, exp = 0;
     while (atomic_tas(m, LOCKED) == LOCKED) {
         backoff = (1ULL << exp++);  /* 1, 2, 4, ... snoozes; set before its first use */
         for (i = 0; i < backoff; i++)
             snooze();
     }
 }
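A sketch of test-and-set with exponential backoff in C11. The cap `MAX_BACKOFF_EXP` and the helper `eb_backoff` are assumptions added for illustration (the slide's version doubles without a bound); all `eb_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define LOCKED   1
#define UNLOCKED 0
#define MAX_BACKOFF_EXP 10   /* assumed cap: wait at most 1024 snoozes per retry */

typedef atomic_int eb_lock;

/* Placeholder for the slides' snooze(). */
static void eb_snooze(void) { }

/* Snoozes to wait after `failures` failed attempts: 1, 2, 4, ..., capped. */
static unsigned long eb_backoff(unsigned failures) {
    unsigned exp = failures < MAX_BACKOFF_EXP ? failures : MAX_BACKOFF_EXP;
    return 1UL << exp;
}

static void eb_acquire(eb_lock *m) {
    unsigned failures = 0;
    while (atomic_exchange(m, LOCKED) == LOCKED) {
        unsigned long wait = eb_backoff(failures++);
        for (unsigned long i = 0; i < wait; i++)
            eb_snooze();
    }
}

static void eb_release(eb_lock *m) {
    atomic_store(m, UNLOCKED);
}
```

Capping the exponent keeps the shift well defined and bounds how long a waiter can sleep past the moment the lock is released.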
  • 78. spinlock global_lock = UNLOCKED
 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED)
         snooze();
 }
 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED)
         snooze();
 }
  • 82. spinlock global_lock = LOCKED
 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED)
         snooze();
 }
 void lock(spinlock *m) {
     while (atomic_tas(m, LOCKED) == LOCKED)
         snooze();
 }
  • 88. A function is lock-free if at all times at least one thread is guaranteed to be making progress [in the function]. (Herlihy & Shavit)
  • 90.
 // TODO: make this safe and scalable
 obj *allocate(slab *s) {
     obj *a = s->head;
     if (a == NULL) return NULL;
     s->head = a->next;
     return a;
 }
 // TODO: make this safe and scalable
 void free(slab *s, obj *o) {
     o->next = s->head;
     s->head = o;
 }
  • 93. Compare-And-Swap [diagram: compare the value at * Destination Address with Old value]
  • 95. Compare-And-Swap [diagram: compare * Destination Address with Old value; if equal (=), copy New value to * Destination Address and return true; if not equal (≠), return false]
  • 96. Compare-And-Swap [diagram: as on slide 95, with the compare and the copy together marked Atomic]
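The compare-and-swap flow can be sketched with C11 atomics. This is an illustration under assumptions, not the deck's implementation: `cas` here is a hypothetical wrapper that keeps the slides' argument order (old value, new value, destination) on top of `atomic_compare_exchange_strong`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically: if *dest == old_value, store new_value and return true;
 * otherwise leave *dest alone and return false. */
static bool cas(int old_value, int new_value, atomic_int *dest) {
    /* The C11 call overwrites its `expected` argument with the observed
     * value on failure, so work on a local copy. */
    int expected = old_value;
    return atomic_compare_exchange_strong(dest, &expected, new_value);
}
```

A failed call tells the caller that some other thread changed the destination since the old value was read.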
  • 97. Atomic i = i+1;
 void atomic_inc(int *ptr) {
     int i, i_plus_one;
     do {
         i = *ptr;
         i_plus_one = i + 1;
     } while (!cas(i, i_plus_one, ptr));
 }
  • 101. Atomic i = (i+1) % 32;
 void atomic_inc_mod_32(int *ptr) {
     int i, new_i;
     do {
         i = *ptr;
         new_i = i + 1;
         new_i = new_i % 32;
     } while (!cas(i, new_i, ptr));
 }
  • 102. TAS using CAS
 void tas_loop(spinlock *m) {
     do {
         ;
     } while (!cas(UNLOCKED, LOCKED, m));
 }
  • 104. Read/Modify/Write
 void atomic_inc_mod_32(int *ptr) {
     int i, new_i;
     do {
         i = *ptr;                    /* Read */
         new_i = fancy_function();    /* Modify */
     } while (!cas(i, new_i, ptr));   /* Write */
                                      /* (or retry) */
 }
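The read/modify/write pattern, sketched as a runnable C11 function. It assumes `atomic_compare_exchange_weak` in place of the slides' `cas`; the modify step is the mod-32 increment from the earlier slide rather than `fancy_function()`.

```c
#include <assert.h>
#include <stdatomic.h>

static void atomic_inc_mod_32(atomic_int *ptr) {
    int i = atomic_load(ptr);              /* Read */
    int new_i;
    do {
        new_i = (i + 1) % 32;              /* Modify */
        /* Write, or retry: on failure the call refreshes `i` with the
         * value currently stored at *ptr, so the loop recomputes new_i. */
    } while (!atomic_compare_exchange_weak(ptr, &i, new_i));
}
```

The weak variant may fail spuriously, which is harmless here because the loop simply retries.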
  • 105. slab head A B …
 obj *allocate(slab *s) {
     obj *a, *b;
     do {
         a = s->head;
         if (a == NULL) return NULL;
         b = a->next;
     } while (!cas(a, b, &s->head));
     return a;
 }
  • 106. slab head A B … (allocate() code as on slide 105)
  • 107. A B … slab head (allocate() code as on slide 105)
  • 108. B … slab head A (allocate() code as on slide 105)
  • 109. allocate() code as on slide 105, with the CAS operands labeled a, b, and &s->head [diagram: slab head, A, B, …; Cmpr and *]
  • 110. Same code and labels [diagram: slab head, Z; Cmpr and]
  • 111. Same code and labels [diagram: slab head, Z, A, B; Cmpr and]
  • 112. Same code and labels [diagram: slab head, B, …; Cmpr and]
  • 113. B … slab head
 void free(slab *s, obj *o) {
     obj *t;  /* declared outside the loop so the while condition can see it */
     do {
         t = s->head;
         o->next = t;
     } while (!cas(t, o, &s->head));
 }
  • 114. slab head A B … (free() code as on slide 113)
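The CAS-based allocate() and free() can be put together as a runnable C11 sketch. The types `lf_obj` and `lf_slab` and the `lf_` function names are hypothetical, and `atomic_compare_exchange_weak` stands in for the slides' `cas`. This sketch deliberately does nothing about an object being allocated, reused, and freed again by another thread between the load of the head and the CAS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct lf_obj  { struct lf_obj *next; } lf_obj;
typedef struct lf_slab { _Atomic(lf_obj *) head; } lf_slab;

/* Push: link the object in front of the observed head, then try to swing
 * the head to it. On failure `t` is refreshed and the link is redone. */
static void lf_free(lf_slab *s, lf_obj *o) {
    lf_obj *t = atomic_load(&s->head);
    do {
        o->next = t;
    } while (!atomic_compare_exchange_weak(&s->head, &t, o));
}

/* Pop: read the head and its successor, then try to swing the head to the
 * successor. Returns NULL when the free list is empty. */
static lf_obj *lf_allocate(lf_slab *s) {
    lf_obj *a = atomic_load(&s->head);
    lf_obj *b;
    do {
        if (a == NULL) return NULL;
        b = a->next;   /* can be stale if `a` was popped and reused meanwhile */
    } while (!atomic_compare_exchange_weak(&s->head, &a, b));
    return a;
}
```

No thread ever waits on another here: a failed CAS means some other thread's CAS succeeded in the meantime.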
  • 115. A B C slab head
  • 116. A B C slab head
  • 117. obj  *allocate(slab  *s)  {
    obj  *a,  *b;
    do  {
        a  =  s-­‐>head;
        if  (a  ==  NULL)  return  NULL;
        b  =  a-­‐>next;
    }  while  (!cas(a,  b,  &s-­‐>head));
    return  a;
 } A B Cslab head
  • 118. obj  *allocate(slab  *s)  {
    obj  *a,  *b;
    do  {
        a  =  s-­‐>head;
        if  (a  ==  NULL)  return  NULL;
        b  =  a-­‐>next;
    }  while  (!cas(a,  b,  &s-­‐>head));
    return  a;
 } A B Cslab head
  • 121. [Diagram: slab head, A, B, C] Call: some_object = allocate(&shared_slab);
    obj *allocate(slab *s) {
        obj *a, *b;
        do {
            a = s->head;
            if (a == NULL) return NULL;
            b = a->next;
        } while (!cas(a, b, &s->head));
        return a;
    }
  • 123. [Diagram: slab head, B, C, A] Call: some_object = allocate(&shared_slab);
    obj *allocate(slab *s) {
        obj *a, *b;
        do {
            a = s->head;
            if (a == NULL) return NULL;
            b = a->next;
        } while (!cas(a, b, &s->head));
        return a;
    }
  • 125. [Diagram: slab head, B, C, A] Calls: another_obj = allocate(&shared_slab); some_object = allocate(&shared_slab);
    obj *allocate(slab *s) {
        obj *a, *b;
        do {
            a = s->head;
            if (a == NULL) return NULL;
            b = a->next;
        } while (!cas(a, b, &s->head));
        return a;
    }
  • 127. [Diagram: slab head, C, A, B] Calls: another_obj = allocate(&shared_slab); some_object = allocate(&shared_slab);
    obj *allocate(slab *s) {
        obj *a, *b;
        do {
            a = s->head;
            if (a == NULL) return NULL;
            b = a->next;
        } while (!cas(a, b, &s->head));
        return a;
    }
  • 129. [Diagram: slab head, B, C, A] Calls: another_obj = allocate(&shared_slab); free(&shared_slab, some_object);
    obj *allocate(slab *s) {
        obj *a, *b;
        do {
            a = s->head;
            if (a == NULL) return NULL;
            b = a->next;
        } while (!cas(a, b, &s->head));
        return a;
    }
  • 131. [Diagram: slab head, B, A, C] Calls: another_obj = allocate(&shared_slab); free(&shared_slab, some_object);
    obj *allocate(slab *s) {
        obj *a, *b;
        do {
            a = s->head;
            if (a == NULL) return NULL;
            b = a->next;
        } while (!cas(a, b, &s->head));
        return a;
    }
  • 135. [Diagram: slab head, B, B, C, A] Calls: free(&shared_slab, some_object); another_obj = allocate(&shared_slab);
    obj *allocate(slab *s) {
        obj *a, *b;
        do {
            a = s->head;
            if (a == NULL) return NULL;
            b = a->next;
        } while (!cas(a, b, &s->head));
        return a;
    }
  • 138. The ABA Problem “A reference about to be modified by a CAS changes from a to b and back to a again. As a result, the CAS succeeds even though its effect on the data structure has changed and no longer has the desired effect.” —Herlihy & Shavit, p. 235
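The interleaving behind that definition can be replayed deterministically in a single thread. The sketch below is an illustration under assumptions, not the talk's code: it uses C11 atomics for cas, the hypothetical names `slab_allocate`, `slab_free`, and `replay_aba`, and it hand-sequences what "thread 1" and "thread 2" would do rather than spawning real threads.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct obj { struct obj *next; } obj;
typedef struct slab { _Atomic(obj *) head; } slab;

bool cas(obj *expected, obj *desired, _Atomic(obj *) *addr) {
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

obj *slab_allocate(slab *s) {
    obj *a, *b;
    do {
        a = atomic_load(&s->head);
        if (a == NULL) return NULL;
        b = a->next;
    } while (!cas(a, b, &s->head));
    return a;
}

void slab_free(slab *s, obj *o) {
    obj *t;
    do {
        t = atomic_load(&s->head);
        o->next = t;
    } while (!cas(t, o, &s->head));
}

/* Replays the ABA interleaving by hand.
 * Returns true if thread 1's stale CAS succeeded. */
bool replay_aba(slab *s, obj *A, obj *B, obj *C) {
    /* Free list: head, then A, then B, then C. */
    C->next = NULL;
    B->next = C;
    A->next = B;
    atomic_store(&s->head, A);

    /* Thread 1 begins allocate(): it reads a and b, then is
     * preempted just before its CAS. */
    obj *a = atomic_load(&s->head);   /* a == A */
    obj *b = a->next;                 /* b == B */

    /* Thread 2 runs to completion in the meantime. */
    obj *some_object = slab_allocate(s);   /* takes A; list is B, C */
    obj *another_obj = slab_allocate(s);   /* takes B; list is C    */
    slab_free(s, some_object);             /* returns A; list is A, C */
    (void)another_obj;                     /* B is still in use */

    /* Thread 1 resumes. The head is A again, so the CAS succeeds
     * and installs B as the head, even though B is allocated. */
    return cas(a, b, &s->head);
}
```

After the replay the head of the free list is B, which thread 2 still holds, so the next allocation would hand out an object that is already in use.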
  • 139. [Diagram: slab head, A, B, …; counter value 166]
    obj *allocate(slab *s) {
        obj *a, *b;
        do {
            a = s->head;
            if (a == NULL) return NULL;
            b = a->next;
        } while (!cas(a, b, &s->head));
        return a;
    }
  • 140. [Diagram: slab head, A, B, …; counter value 166]
    obj *allocate(slab *s) {
        slab orig, update;
        do {
            orig.gen = s->gen;      /* s is a pointer, so use -> */
            orig.head = s->head;
            if (!orig.head) return NULL;
            update.gen = orig.gen + 1;
            update.head = orig.head->next;
        } while (!dcas(&orig, &update, s));
        return orig.head;
    }
  • 141. void free(slab *s, obj *o) {
        obj *t;   /* declared outside the loop so the while condition can see it */
        do {
            t = s->head;
            o->next = t;
        } while (!cas(t, o, &s->head));
    }
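A generation counter makes a stale snapshot detectable. The sketch below swaps in a different encoding from the slides' dcas: instead of a double-width compare-and-swap over a pointer and a counter, it packs a 32-bit generation and a 32-bit object index into one 64-bit word so an ordinary C11 compare-exchange suffices. All names here (`pack`, `gen_of`, `head_of`, `NIL`, `replay_with_generation`) are hypothetical, and the generation is assumed not to wrap during a single stalled operation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NIL UINT32_MAX   /* index meaning "no object" */

typedef struct obj { uint32_t next; } obj;   /* index of the next free object */

typedef struct slab {
    _Atomic uint64_t top;   /* high 32 bits: generation, low 32 bits: head index */
    obj *objs;              /* backing array the indices refer to */
} slab;

uint64_t pack(uint32_t gen, uint32_t head) { return ((uint64_t)gen << 32) | head; }
uint32_t gen_of(uint64_t w)  { return (uint32_t)(w >> 32); }
uint32_t head_of(uint64_t w) { return (uint32_t)w; }

/* Pop: bumping the generation means an old snapshot can never match again. */
uint32_t slab_allocate(slab *s) {
    uint64_t orig, update;
    do {
        orig = atomic_load(&s->top);
        if (head_of(orig) == NIL) return NIL;
        update = pack(gen_of(orig) + 1, s->objs[head_of(orig)].next);
    } while (!atomic_compare_exchange_strong(&s->top, &orig, update));
    return head_of(orig);
}

/* Push: the generation is carried over unchanged. */
void slab_free(slab *s, uint32_t i) {
    uint64_t orig, update;
    do {
        orig = atomic_load(&s->top);
        s->objs[i].next = head_of(orig);
        update = pack(gen_of(orig), i);
    } while (!atomic_compare_exchange_strong(&s->top, &orig, update));
}

/* Replays the ABA interleaving by hand; returns true if the stale CAS succeeded. */
bool replay_with_generation(void) {
    obj objs[3] = { {1}, {2}, {NIL} };   /* indices 0, 1, 2 play A, B, C */
    slab s;
    s.objs = objs;
    atomic_init(&s.top, pack(0, 0));     /* generation 0, head is A */

    /* Thread 1 snapshots (generation 0, head A) and computes its update. */
    uint64_t stale = atomic_load(&s.top);
    uint64_t update = pack(gen_of(stale) + 1, objs[head_of(stale)].next);

    /* Thread 2: allocate A, allocate B, free A. */
    uint32_t some_object = slab_allocate(&s);   /* generation 1, head B */
    uint32_t another_obj = slab_allocate(&s);   /* generation 2, head C */
    slab_free(&s, some_object);                 /* generation 2, head A */
    (void)another_obj;

    /* The head is A again, but the generation is 2 rather than 0. */
    return atomic_compare_exchange_strong(&s.top, &stale, update);
}
```

Here the stale compare-exchange fails, so thread 1 would loop, reread the head, and retry with fresh values.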
  • 142. obj *allocate(slab *s) {
        lock(&allocator_lock);
        obj *a = s->head;
        if (a == NULL) {
            unlock(&allocator_lock);   /* release the lock on the empty path too */
            return NULL;
        }
        s->head = a->next;
        unlock(&allocator_lock);
        return a;
    }

    void free(slab *s, obj *o) {
        lock(&allocator_lock);
        o->next = s->head;
        s->head = o;
        unlock(&allocator_lock);
    }
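For comparison, a lock-based allocator along these lines might look as follows with a POSIX mutex. This is a sketch under assumptions: the slides' lock() and unlock() are not shown, so `pthread_mutex_lock` and `pthread_mutex_unlock` stand in for them, and `slab_allocate` and `slab_free` are hypothetical names chosen to avoid clashing with the C library's free().

```c
#include <pthread.h>
#include <stddef.h>

typedef struct obj { struct obj *next; } obj;
typedef struct slab { obj *head; } slab;

/* One global lock guarding every slab, as on the slide. */
pthread_mutex_t allocator_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pop the head of the free list, or return NULL if it is empty. */
obj *slab_allocate(slab *s) {
    pthread_mutex_lock(&allocator_lock);
    obj *a = s->head;
    if (a != NULL)
        s->head = a->next;
    pthread_mutex_unlock(&allocator_lock);   /* released on both paths */
    return a;
}

/* Push an object back onto the free list. */
void slab_free(slab *s, obj *o) {
    pthread_mutex_lock(&allocator_lock);
    o->next = s->head;
    s->head = o;
    pthread_mutex_unlock(&allocator_lock);
}
```

Because the head is only read and written while the mutex is held, there is no window between reading a->next and updating the head, so the ABA interleaving cannot occur here.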
  • 143. [Diagram: slab head, A, B, …] Two concurrent calls: obj *o = allocate(&shared_slab); obj *o = allocate(&shared_slab);
  • 144. [Diagram: slab head, B, …; each of the two calls is labelled A] obj *o = allocate(&shared_slab); obj *o = allocate(&shared_slab);
  • 145. Memory barriers [Diagram: slab head, B, …; each of the two calls is labelled A] obj *o = allocate(&shared_slab); obj *o = allocate(&shared_slab);
  • 146. [Two threads side by side, each executing:]
    lock(&allocator_lock);
    obj *a = s->head;
    …
  • 147. [Two threads side by side, each executing:]
    while (atomic_tas(m, LOCKED) == LOCKED)
        snooze();
    obj *a = s->head;
    …
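The test-and-set loop above maps fairly directly onto C11's `atomic_flag`. In this sketch, `spinlock`, `spin_trylock`, `spin_lock`, and `spin_unlock` are hypothetical names, and POSIX `sched_yield()` is assumed as a stand-in for the slides' snooze().

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag locked; } spinlock;

/* Test-and-set returns the previous value of the flag:
 * false means the lock was free and is now ours. */
bool spin_trylock(spinlock *m) {
    return !atomic_flag_test_and_set(&m->locked);
}

/* Spin until the test-and-set observes the lock as free. */
void spin_lock(spinlock *m) {
    while (!spin_trylock(m))
        sched_yield();   /* stand-in for snooze() */
}

void spin_unlock(spinlock *m) {
    atomic_flag_clear(&m->locked);
}
```

A lock is initialized with `spinlock m = { ATOMIC_FLAG_INIT };`. The unqualified `atomic_flag_test_and_set` and `atomic_flag_clear` use sequentially consistent ordering, so this version already carries the ordering guarantees the following slides add by hand.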
  • 148. ;;;; IN lock()
    lock_loop:
        LDREX R5, [m]            ; TAS: fetch . . .
        STREXEQ R5, LOCKED, [m]  ; TAS: . . . and set
        CMPEQ R5, #0             ; Did we succeed?
        BEQ lock_done            ; Yes: we are all done
        BL snooze                ; No: Call snooze()…
        B lock_loop              ;     …then loop again
    lock_done:
        B LR                     ; return
    ;;;; IN allocate()
        LDR R0, [R1, 4]          ; a = s->head
  • 149. [Animation frames: the same listing, with the load from allocate() shown ahead of the lock() sequence]
    ;;;; IN allocate()
        LDR R0, [R1, 4]          ; a = s->head
    ;;;; IN lock()
    lock_loop:
        LDREX R5, [m]            ; TAS: fetch . . .
        STREXEQ R5, LOCKED, [m]  ; TAS: . . . and set
        CMPEQ R5, #0             ; Did we succeed?
        BEQ lock_done            ; Yes: we are all done
        BL snooze                ; No: Call snooze()…
        B lock_loop              ;     …then loop again
    lock_done:
        B LR                     ; return
  • 159. [Two threads side by side, each executing:]
    obj *a = s->head;
    lock(&allocator_lock);
    …
  • 160. [Two threads side by side, each executing:]
    lock(&allocator_lock);
    obj *a = s->head;
    …
  • 161. [Two threads side by side, each executing, with a barrier drawn between the two statements:]
    lock(&allocator_lock);
    <---------->
    obj *a = s->head;
    …
  • 163. ;;;; IN lock()
    lock_loop:
        LDREX R5, [m]            ; TAS: fetch . . .
        STREXEQ R5, LOCKED, [m]  ; TAS: . . . and set
        CMPEQ R5, #0             ; Did we succeed?
        BEQ lock_done            ; Yes: we are all done
        BL snooze                ; No: Call snooze()…
        B lock_loop              ;     …then loop again
    lock_done:
        DMB                      ; Ensure all previous reads
                                 ; have been completed
        B LR                     ; return
    ;;;; IN unlock()
        MOV R0, UNLOCKED
        DMB                      ; Ensure all previous reads have
                                 ; been completed
        STR R0, LR
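In portable C, the two DMB instructions correspond roughly to an acquire fence after taking the lock and a release fence before giving it up. The sketch below assumes C11 `<stdatomic.h>` and POSIX `sched_yield()` in place of snooze(); the type and function names are hypothetical. The atomic operations themselves are deliberately relaxed so that the fences do the ordering work, mirroring the assembly.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { UNLOCKED = 0, LOCKED = 1 };

typedef struct { atomic_int state; } spinlock;

bool spin_trylock(spinlock *m) {
    /* Test-and-set with no ordering of its own. */
    if (atomic_exchange_explicit(&m->state, LOCKED, memory_order_relaxed) == LOCKED)
        return false;
    /* Acquire fence, playing the role of the DMB at lock_done: accesses
     * after this point cannot be reordered ahead of taking the lock. */
    atomic_thread_fence(memory_order_acquire);
    return true;
}

void spin_lock(spinlock *m) {
    while (!spin_trylock(m))
        sched_yield();   /* stand-in for snooze() */
}

void spin_unlock(spinlock *m) {
    /* Release fence, playing the role of the DMB in unlock(): accesses
     * made while holding the lock complete before the lock appears free. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->state, UNLOCKED, memory_order_relaxed);
}
```

With the fences in place, a statement such as `obj *a = s->head;` placed after `spin_lock()` cannot be hoisted above the lock acquisition by the compiler or the CPU.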
  • 164. Allocator performance
    nathan~$ cat /proc/cpuinfo | grep "physical.*0" | wc -l
    16
    nathan~$ cat /proc/cpuinfo | grep "model name" | uniq
    model name : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
  • 166. [Chart: Allocator Throughput. Y axis: millions of alloc/free pairs per second (10 to 60). X axis: threads (1 to 16). Legend: TAS, T&T&S, TAS + EB, Concurrent Allocator, pthread]
  • 173. The lyf so short, the CAS so longe to lerne
    • Cache coherency and NUMA architecture
    • Transactional memory
  • 175. When is a race a safe race?
  • 180. “lock-free programming is hard; let’s go ride bikes”?
    • high-level performance necessitates an understanding of low-level performance
    • your computer is a distributed system
    • (optional third answer: it’s real neato)
  • 187. Come see us at the booth! Nathan Taylor | nathan.dijkstracula.net | @dijkstracula
    Thanks! Credits, code, and additional material at https://github.com/dijkstracula/Surge2015/