jvm/java: towards lock-
free concurrency
Arvind Kalyan
Engineer at LinkedIn
agenda
intro to concurrency & memory model on jvm
reordering -> barriers -> happens-before
jdk 8 concurrency primitives
volatile -> atomics, collections, explicit locks, fork/join
trends in this area (to make all of this practical)
lock-free, STM, TM
background
for control and performance, there are sometimes valid
reasons to use locks (like a mutex) for concurrency control
in most other situations, primitive synchronization
constructs inside individual modules lead to unreliable &
incorrect programs in most non-trivial systems composed
of such modules
the best practice, in the current state of things, is to write
single-threaded programs
‘automatic’ concurrency
there are platforms that take your single
threaded program and run it concurrently —
most web servers do this, for example
on the other hand, there are times when you
really must use multiple threads
practicality
concurrency control techniques have been studied for a
while, but since 2005 they have been studied intensely* to
make them more practical for more widespread (and safer) use
simpler software techniques, and also hardware level
support for those techniques are being developed
before we see how to write safe code using these new
techniques, let’s look into some basics
* https://scholar.google.com/scholar?as_ylo=2005&q=%22software+transactional+memory%22
why concurrency control?
when dealing with multiple threads,
concurrency control/synchronization is
necessary not only to guard critical sections
from multiple threads using a mutex…
but also to ensure that the memory updates
(through mutable variables) are made visible
to all threads ‘correctly’
memory model
as a platform, jvm guarantees that ‘correctly
synchronized’ programs have a very well
defined memory behavior
let’s look into the jvm memory model which
defines those guarantees
memory model
your code manipulates memory by using variables and
objects
the memory is separated by a few layers of caches from
the cpu
on a multi-core cpu when a write happens in one cpu’s
cache, we need to make it visible to other cpus as well
and then there is the topic of re-ordering…
* http://en.wikipedia.org/wiki/Memory_barrier
memory model
to improve performance, the hardware (cpu,
caches, …) reorders memory access using its
own memory model (set of rules)* dynamically
the visibility of a value in a memory location is
further complicated by the code reordering
performed by the compiler statically
* http://en.wikipedia.org/wiki/Memory_ordering
memory model
the static and dynamic reordering strive to
ensure an ‘as-if serial’ semantics
i.e., the program appears to be executing
sequentially as per the lines in your source
code
memory model
memory reordering is transparent in single-
threaded use-cases because of that as-if-
serial guarantee
but logic quickly falls apart and causes
surprises in incorrectly synchronized multi-
threaded programs
memory model
while jvm’s OOTA safety (out of thin air)
guarantees that a thread always reads a value
written by *some* thread, and not some value
out of thin air…
with all the reordering, it’s good to have a
slightly stronger guarantee …
the need for memory barriers
in the following code, say reader is called after writer
(from different threads)

class Reordering {
  int x = 0, y = 0;

  public void writer() {
    x = 1;
    y = 2;
  }

  public void reader() {
    int r1 = y;
    int r2 = x;
    // use r1 and r2
  }
}
in reader, even if r1 == 2, r2 can be 0 or 1
synchronization is needed if we want to control the
ordering (and ensure r2 == 1) using a memory barrier
memory barrier
the jvm memory model essentially defines the
relationship between the variables in your
code
the semantics also define a partial ordering on
the memory operations so certain actions are
guaranteed to ‘happen before’ others
happens-before
happens-before is a visibility guarantee for
memory provided through synchronization
such as locking, volatiles, atomics, etc
…and for completeness, through Thread
start() & join()
Concurrency control on jvm with
JDK 8
with that background, let’s look at some
specific tools & mechanisms available on the
jvm & jdk 8..
Concurrency control on jvm with
JDK 8
volatiles
atomics
concurrent collections/data-structures
synchronizers
fork/join framework
volatiles
volatiles are typically used as state variables
across threads
writing to & reading from a volatile is like releasing
and acquiring a monitor (lock), respectively
i.e., it guarantees a happens-before relationship
not just for other volatile variables but also for
non-volatile memory
volatiles
typical use of volatiles with reader and writer called from
different threads:

class VolatileExample {
  int x = 0;
  volatile boolean v = false;

  public void writer() {
    x = 42;
    v = true;
  }

  public void reader() {
    if (v == true) {
      // uses x - guaranteed to see 42.
    }
  }
}
the happens-before guarantee in jvm memory model makes it
simpler to reason about the value in x, even though x is non-
volatile!
code: https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
volatiles
guaranteeing happens-before relationship for
non-volatile memory is a performance
overhead, so like any other synchronization
primitive, it must be used judiciously
but it greatly simplifies the program by aligning
the dynamic and static reordering with most
programmers’ expectations
atomics
atomics* extend the notion of volatiles, and support
conditional updates
being an extension to volatiles, they guarantee
happens-before relationship on memory operations
the updates are performed through a CAS cpu
instruction
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/package-summary.html
atomics
atomics/cas allow designing non-blocking
algorithms where the critical section is around
a single variable
if there is more than one variable, other forms
of synchronization are needed
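to make the conditional-update idea concrete, here is a minimal sketch (class and method names are mine) of the CAS retry loop that methods like AtomicInteger.incrementAndGet() are built on:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // classic CAS retry loop: read, compute, attempt a conditional update,
    // and retry if another thread won the race in between
    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public int get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasCounter c = new CasCounter();
        Runnable task = () -> { for (int i = 0; i < 10_000; i++) c.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // 20000 - no updates lost, and no locks taken
    }
}
```

no thread ever holds a lock here; a loser of the race simply retries, which is the essence of the non-blocking algorithms mentioned above.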
CAS
JDK 8 uses CAS for ‘lock-free’ operation
at a high-level, it piggy backs on a cpu
provided CAS* instruction —like lock:cmpxchg on
x86
let’s see how jvm dynamically improves the
performance of the hardware provided CAS
*CAS: http://en.wikipedia.org/wiki/Compare-and-swap
CAS/atomics
CAS in recent cpu implementations doesn’t assert the lock# signal to gain
exclusive bus access, but rather relies on efficient cache-coherence
protocols* — unless the operand straddles a cache line
even if that helps CAS to scale on many-core systems, CAS still
adds a lot to local latency, sometimes nearly halting the cpu
to address that local latency, a biased-locking* approach is used
— where uncontended usage of atomics are recompiled
dynamically to not use CAS instructions!
* more about MESI: https://courses.engr.illinois.edu/cs232/sp2009/lectures/x24.pdf

* biased locking in jvm: https://blogs.oracle.com/dave/entry/biased_locking_in_hotspot
biased-locking
the biased-locking feature in jvm extends
beyond atomics, and generalizes to different
kinds of locking (monitor entry & exit) on the
jvm
atomics
before we move on, JDK 7 also provides
‘weakCompareAndSet’ atomic api, which relaxes the
happens-before ordering guarantee
relaxing the ordering makes it very hard to reason
about the program’s execution so its use is limited to
debugging counters, etc
there are better ways of doing this ‘fast’ — which
brings us to…
adders & accumulators
if we used atomics under high contention, the biased-
locking machinery would spend too much time revoking
the bias from a thread
in these high contention situations, adders* help
gather counts by actively reducing contention, and
‘gather’ the value only when sum() or longValue() is
called
* http://download.java.net/lambda/b78/docs/api/java/util/concurrent/atomic/LongAdder.html
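a minimal sketch of the adder idiom (class name is mine): under contention each thread bumps its own internal cell, and the total is gathered only when sum() is called:

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderDemo {
    public static void main(String[] args) throws InterruptedException {
        LongAdder hits = new LongAdder();
        // two writers hammering the counter; LongAdder spreads the
        // updates over internal cells instead of CASing one variable
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) hits.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // the 'gather' happens only here, at sum()
        System.out.println(hits.sum()); // 200000
    }
}
```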
concurrent collections
the JDK also comes with a handful of lock-
free collections
these help in correctly synchronizing larger
data sets than single variables
concurrent collections
ConcurrentHashMap (CHM) uses some of
the concepts listed so far and provides a lock-
free read, and a mostly lock-free write in java 8
relies on a good hashCode to reduce
collisions, after which it reverts to using a lock
for that bin
concurrent collections
CHM — in general — allows concurrent use of
a Map, which can be pretty useful, especially
to represent shared ‘mutating’ state
CHM, together with adders for example,
enable concurrent, lock-free, histogram
generation across threads
more about CHM here, of course: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html
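the histogram idiom mentioned above could be sketched like this (class and method names are mine):

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class Histogram {
    private final ConcurrentMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // computeIfAbsent is an atomic get-or-create for the bin;
    // LongAdder keeps the per-bin increments contention-friendly
    public void record(String key) {
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    public long count(String key) {
        LongAdder a = counts.get(key);
        return a == null ? 0L : a.sum();
    }

    public static void main(String[] args) {
        Histogram h = new Histogram();
        for (String w : Arrays.asList("a", "b", "a", "c", "a")) {
            h.record(w);
        }
        System.out.println(h.count("a")); // 3
    }
}
```

any number of threads can call record() concurrently without an explicit lock anywhere in this code.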
synchronizers
let’s look at some synchronization primitives…
(a.k.a. ‘source of bugs’)
synchronizers
2 major categories…
coarse-grained locks are usually less
performant, but are easy to code
and, fine-grained locking has potential for
higher performance, but is more error prone
synchronized
synchronized keyword is a coarse grained locking
scheme
you acquire & release locks at method or block level,
typically holding the lock longer than needed
translates directly to jvm synchronization (intrinsic) &
hardware monitor
so its use is currently discouraged (might change in java9)
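for illustration, a minimal sketch of the coarse-grained style (class name is mine): the intrinsic lock on ‘this’ is acquired at method entry and released at exit, held for the whole method even though only one field needs guarding:

```java
public class SynchronizedCounter {
    private long count;

    // coarse-grained: every caller serializes on the same intrinsic
    // monitor for the full duration of the method
    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }
}
```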
explicit locks
Locks* enable fine-grained locking
these extend intrinsic locks, and allow unconditional,
polled, timed & interruptible lock acquisition
allow ‘custom’ wait/notify queues (Condition*) on the
same lock
nice features, but …
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html

* http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html
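the timed, interruptible acquisition mode mentioned above can be sketched like this (class and method names are mine); intrinsic locks offer nothing comparable:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLocking {
    private final Lock lock = new ReentrantLock();
    private int value;

    // timed acquisition: back off instead of blocking forever,
    // which is one way to sidestep deadlock with multiple locks
    public boolean tryIncrement() throws InterruptedException {
        if (!lock.tryLock(50, TimeUnit.MILLISECONDS)) {
            return false; // couldn't get the lock in time; caller can retry later
        }
        try {
            value++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    public int get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```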
explicit locks
the developer needs to remember to release locks, so the
following style is encouraged:

Lock l = ...;
l.lock();
try {
  // access the resource protected by this lock
} finally {
  l.unlock();
}
it gets *very* complicated when we have to deal with
more than 1 lock
…source of all kinds of bugs & surprises
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html
ReentrantLock
an implementation of Lock described earlier
supports a fairness policy to deal with lock
starvation — ‘fair’, not ‘fast’
there is nothing special in this lock that makes it
‘reentrant’; on the jvm all intrinsic locks are held
per-thread and are reentrant, unlike POSIX locks,
which are held per-invocation
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html
a note about reentrancy
reentrancy helps encapsulate locking behavior & helps write
cleaner (oop) concurrent code
in simpler cases (a single ‘resource’ used across multiple
methods) this also helps avoid deadlocks:

class A {
  synchronized void run() {
    //..
  }
}

class B extends A {
  synchronized void run() {
    super.run();
  }
}
if intrinsic locks were not reentrant on jvm, the call to
super.run() would be deadlocked
ReentrantLock
ReentrantLock (not reentrancy in general)
has some issues so it must be used with
caution:
it can cause starvation in its default (unfair) mode,
and performs poorly when fairness is enabled
StampedLock
supports optimistic reads & lock upgrades
is not reentrant — needs the stamp, so not
usable across calls to unknown methods
for internal use in thread safe components,
where you fully understand the data, objects
& methods involved
StampedLock
for very short read-only code, optimistic
reads improve throughput by reducing
contention
useful when reading multiple fields of an
object from memory without locking
must call validate() later to ensure consistency
StampedLock
along with optimistic reads, the lock upgrade
capability enables many useful idioms:

StampedLock sl = new StampedLock();
double x, y;
..
double distanceFromOrigin() { // A read-only method
  long stamp = sl.tryOptimisticRead();
  double currentX = x, currentY = y; // read without locking
  if (!sl.validate(stamp)) {
    stamp = sl.readLock(); // upgrade to read-lock if values are dirty
    try {
      currentX = x;
      currentY = y;
    } finally {
      sl.unlockRead(stamp);
    }
  }
  return Math.sqrt(currentX * currentX + currentY * currentY);
}
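the write-upgrade side can be sketched following the idiom in the StampedLock javadoc (class and field names are mine):

```java
import java.util.concurrent.locks.StampedLock;

public class Point {
    private final StampedLock sl = new StampedLock();
    private double x, y;

    // lock-upgrade idiom: start with a read lock and convert it to a
    // write lock only when a write turns out to be necessary
    void moveIfAtOrigin(double newX, double newY) {
        long stamp = sl.readLock();
        try {
            while (x == 0.0 && y == 0.0) {
                long ws = sl.tryConvertToWriteLock(stamp);
                if (ws != 0L) {          // upgrade succeeded
                    stamp = ws;
                    x = newX;
                    y = newY;
                    break;
                } else {                 // upgrade failed: drop the read lock,
                    sl.unlockRead(stamp); // take the write lock, and re-check
                    stamp = sl.writeLock();
                }
            }
        } finally {
            sl.unlock(stamp);            // releases read or write stamp alike
        }
    }

    double xValue() {
        long stamp = sl.readLock();
        try {
            return x;
        } finally {
            sl.unlockRead(stamp);
        }
    }
}
```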
fork/join
unlike regular java.lang.Thread (which is mostly
based on POSIX threads), fork/join tasks never
‘block’
for simple tasks, the overhead of constructing and/or
managing a thread is more expensive than the task
itself
programming on fork/join, in essence, allows
frameworks to optimize such tasks ‘behind the scenes’
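as a sketch of how a task is decomposed for the framework (class name and threshold are mine): split the range until chunks are small, and let the pool's work-stealing scheduler run the subtasks:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int lo, hi;

    public SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;                 // small enough: just compute
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                          // schedule left half asynchronously
        return right.compute() + left.join(); // compute right half here, then join
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = 1;
        long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(total); // 100000
    }
}
```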
fork/join
going beyond performance, the framework does
nothing to ensure concurrency control
the framework is also only usable in the few
scenarios where the task can be easily decomposed
in a sense, this is not making it easier to create
correct (and fast) programs
lambdas & streams
framework available on jdk 8 for data-
processing workloads
looks ‘functional’ — but due to type-erasure
these aren't typed
‘look’ like anonymous inner class but are
fundamentally different from the ground-up —
enabling jvm optimizations for concurrency & gc
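a minimal sketch of the data-processing style (the numbers are arbitrary): the framework decides how to split and schedule the work on the common fork/join pool, with no explicit threads or locks in the code:

```java
import java.util.stream.LongStream;

public class StreamDemo {
    public static void main(String[] args) {
        // declarative pipeline: the runtime partitions the range and
        // merges the partial counts behind the scenes
        long evens = LongStream.rangeClosed(1, 1_000_000)
                               .parallel()
                               .filter(n -> n % 2 == 0)
                               .count();
        System.out.println(evens); // 500000
    }
}
```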
lock-free
we’ve looked at a few lock-free concepts at a
single-variable level, using CAS
and atomics, which rely on CAS
and optimizations to make CAS faster…
lock-free
but how do we write ‘real-world’ concurrent
applications using lock-free concepts?
i.e., more than just CAS?
lock-free
that brings us to software transactional
memory (STM)!
STM is to concurrency control, what garbage-
collection is to memory management
STM
brings DB transaction concept to regular
memory access
read & write ‘as-if’ there is no contention…
during commit time the system ensures sanity
under the hood
… no locks in the code!
STM
in low contention use-cases (i.e., well-
designed programs), the absence of
synchronization makes execution very fast!
even in poorly designed programs, the
absence of locks makes it easier to focus on
correctness
STM implementation
multiverse[1] is a popular jvm implementation of
STM (groovy and Scala/Akka use it in their STM)
in essence, multiverse implements multiversion
concurrency control (MVCC[2])
Clojure has a language built-in STM feature
[1] http://multiverse.codehaus.org/overview.html

[2] http://en.wikipedia.org/wiki/Multiversion_concurrency_control
STM & composability
the biggest benefit of STM is composability (software
reuse)

class Account {
  private final TxnRef<Date> lastUpdate = …;
  private final TxnInteger balance = …;

  public void incBalance(int amount, Date date) {
    atomic(new Runnable() {
      public void run() {
        balance.inc(amount);
        lastUpdate.set(date);
        if (balance.get() < 0) {
          throw new IllegalStateException("Not enough money");
        }
      }
    });
  }
}

class Teller {
  static void transfer(Account from, Account to, int amount) {
    atomic(new Runnable() {
      public void run() {
        Date date = new Date();
        from.incBalance(-amount, date);
        to.incBalance(amount, date);
      }
    });
  }
}
STM & composability
the Teller class is able to ‘compose’ over other
atomic operations without knowing their internal
details (i.e., what locks they use to synchronize)
so if to.incBalance() fails, the memory effects of
from.incBalance() are not committed and so will
never be visible to other threads!
this is a pretty big deal…
Simplicity
STM makes composing concurrent software
modules appear very trivial
in the absence of locks, it is easier to
conceptualize the code flow
the ability to code atomic operations this way
essentially nullifies the challenges typically
associated with concurrent programming
performance
as stated earlier, stm allows optimistic execution: ‘as
though’ there are no other threads running, so it
increases concurrency
STM synchronizes only when required and falls back to
slower (serialized) executions when necessary
STM performs better than explicit locks as the number
of cores increases beyond 4*
* http://en.wikipedia.org/wiki/Software_transactional_memory

* http://channel9.msdn.com/Shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Software-Transactional-Memory
more performance
apart from just software improvements, cpu
makers have started looking into hardware
support for TM
this is an emerging area, and more advances are
being made beyond TSX from Intel (introduced
with Haswell)*
* https://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell

* http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions
STM & Practicality
concurrent programming is getting more
practical
stm brings the simplicity of coarse-grained locking
together with the performance of fine-grained
locking, without using locks
Summary
lock-free concurrency control techniques like
STM not only make it easier to write correct
code…
but also allow platforms (like the JVM) to make
your correct code run faster
References
Being a long slideshow with dense content,
I’ve put references on each slide so you can
read through
Reach out to me on LinkedIn if you’d like more
info or just to discuss!

More Related Content

PPTX
The Java Memory Model
PDF
Java Course 10: Threads and Concurrency
ODP
Concept of thread
PPTX
Threads and multi threading
PDF
Multithreading
PPTX
Java concurrency in practice
PPT
Multi threading
PDF
Thread
The Java Memory Model
Java Course 10: Threads and Concurrency
Concept of thread
Threads and multi threading
Multithreading
Java concurrency in practice
Multi threading
Thread

What's hot (20)

PDF
Concurrency in Java
PDF
Xilkernel
PPT
Java8 - Under the hood
ODP
Multithreading 101
PDF
Java Multithreading Using Executors Framework
PPT
Security Applications For Emulation
PDF
PPT
Efficient Memory and Thread Management in Highly Parallel Java Applications
PDF
Coherence and consistency models in multiprocessor architecture
PPT
Free FreeRTOS Course-Task Management
PPTX
Concurrency in java
PDF
Cache coherence
PPTX
Threads (operating System)
PPTX
Java concurrency - Thread pools
PPT
Multithreading models
PPTX
Cache coherence
PPTX
Networking threads
PPT
Operating System Chapter 4 Multithreaded programming
Concurrency in Java
Xilkernel
Java8 - Under the hood
Multithreading 101
Java Multithreading Using Executors Framework
Security Applications For Emulation
Efficient Memory and Thread Management in Highly Parallel Java Applications
Coherence and consistency models in multiprocessor architecture
Free FreeRTOS Course-Task Management
Concurrency in java
Cache coherence
Threads (operating System)
Java concurrency - Thread pools
Multithreading models
Cache coherence
Networking threads
Operating System Chapter 4 Multithreaded programming
Ad

Viewers also liked (20)

PDF
Lock free algorithms
PDF
50 nouvelles choses que l'on peut faire en Java 8
PDF
Memory Management in the Java HotSpot Virtual Machine
PDF
Java SE 8 for Java EE developers
PDF
Streams and collectors in action
PDF
Déploiement d'une application Java EE dans Azure
PPTX
JFokus 50 new things with java 8
PDF
Java 8 Streams and Rx Java Comparison
PPTX
Java 8 concurrency abstractions
PDF
ArrayList et LinkedList sont dans un bateau
ODP
Java Concurrency, Memory Model, and Trends
PDF
Free your lambdas
PDF
Autumn collection JavaOne 2014
PDF
50 new things you can do with java 8
PDF
Building microservices with Scala, functional domain models and Spring Boot (...
PDF
50 new things we can do with Java 8
PDF
Profiler Guided Java Performance Tuning
PDF
Java Concurrency by Example
PDF
Linked to ArrayList: the full story
PDF
Developing and deploying applications with Spring Boot and Docker (@oakjug)
Lock free algorithms
50 nouvelles choses que l'on peut faire en Java 8
Memory Management in the Java HotSpot Virtual Machine
Java SE 8 for Java EE developers
Streams and collectors in action
Déploiement d'une application Java EE dans Azure
JFokus 50 new things with java 8
Java 8 Streams and Rx Java Comparison
Java 8 concurrency abstractions
ArrayList et LinkedList sont dans un bateau
Java Concurrency, Memory Model, and Trends
Free your lambdas
Autumn collection JavaOne 2014
50 new things you can do with java 8
Building microservices with Scala, functional domain models and Spring Boot (...
50 new things we can do with Java 8
Profiler Guided Java Performance Tuning
Java Concurrency by Example
Linked to ArrayList: the full story
Developing and deploying applications with Spring Boot and Docker (@oakjug)
Ad

Similar to jvm/java - towards lock-free concurrency (20)

DOC
Concurrency Learning From Jdk Source
PPT
Java Core | Modern Java Concurrency | Martijn Verburg & Ben Evans
PPTX
Memory model
DOC
Wiki 2
PDF
Thread Dump Analysis
PPT
Optimizing your java applications for multi core hardware
PPT
Java programing considering performance
PPT
Java Multithreading and Concurrency
PDF
Here comes the Loom - Ya!vaConf.pdf
PPTX
Cloud Module 3 .pptx
PPTX
Multithreading and concurrency in android
PPT
The Pillars Of Concurrency
PDF
S peculative multi
PPTX
PDF
Linux Device Driver parallelism using SMP and Kernel Pre-emption
PPT
Java Threading
PPTX
Introduction to OS LEVEL Virtualization & Containers
PPT
Intro To .Net Threads
PDF
Shared memory Parallelism (NOTES)
PDF
Dosass2
Concurrency Learning From Jdk Source
Java Core | Modern Java Concurrency | Martijn Verburg & Ben Evans
Memory model
Wiki 2
Thread Dump Analysis
Optimizing your java applications for multi core hardware
Java programing considering performance
Java Multithreading and Concurrency
Here comes the Loom - Ya!vaConf.pdf
Cloud Module 3 .pptx
Multithreading and concurrency in android
The Pillars Of Concurrency
S peculative multi
Linux Device Driver parallelism using SMP and Kernel Pre-emption
Java Threading
Introduction to OS LEVEL Virtualization & Containers
Intro To .Net Threads
Shared memory Parallelism (NOTES)
Dosass2

Recently uploaded (20)

PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
Sustainable Sites - Green Building Construction
PDF
Digital Logic Computer Design lecture notes
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
web development for engineering and engineering
PPTX
additive manufacturing of ss316l using mig welding
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPT
Mechanical Engineering MATERIALS Selection
PPTX
Lecture Notes Electrical Wiring System Components
PDF
PPT on Performance Review to get promotions
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PDF
Structs to JSON How Go Powers REST APIs.pdf
PPTX
Strings in CPP - Strings in C++ are sequences of characters used to store and...
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPT
Project quality management in manufacturing
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
CYBER-CRIMES AND SECURITY A guide to understanding
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Operating System & Kernel Study Guide-1 - converted.pdf
Sustainable Sites - Green Building Construction
Digital Logic Computer Design lecture notes
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
web development for engineering and engineering
additive manufacturing of ss316l using mig welding
UNIT 4 Total Quality Management .pptx
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Mechanical Engineering MATERIALS Selection
Lecture Notes Electrical Wiring System Components
PPT on Performance Review to get promotions
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
Structs to JSON How Go Powers REST APIs.pdf
Strings in CPP - Strings in C++ are sequences of characters used to store and...
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Project quality management in manufacturing

jvm/java - towards lock-free concurrency

  • 1. jvm/java: towards lock- free concurrency Arvind Kalyan Engineer at LinkedIn
  • 2. agenda intro to concurrency & memory model on jvm reordering -> barriers -> happens-before jdk 8 concurrency primitives volatile -> atomics, collections, explicit locks, fork/join trends in this area (to make all of this practical) lock-free, STM, TM
  • 3. background for control and performance, sometimes there are valid reasons to use locks (like a mutex) for concurrency control in most other situations, primitive synchronization constructs in some modules lead to unreliable & incorrect programs in most non-trivial systems that are composed over such modules the best practice, in the current state, is to write single threaded programs
  • 4. ‘automatic’ concurrency there are platforms that take your single threaded program and run it concurrently — most web servers do this, for example on the other hand, there are times when you really must use multiple threads
  • 5. practicality concurrency control techniques have been studied for a while, but since 2005 it is being studied intensely* to make it more practical for more widespread (and safer) use simpler software techniques, and also hardware level support for those techniques are being developed before we see how to write safe code using these new techniques, let’s look into some basics * https://scholar.google.com/scholar?as_ylo=2005&q=%22software+transactional+memory%22
  • 6. why concurrency control? when dealing with multiple threads, concurrency control/synchronization is necessary not only to guard critical sections from multiple threads using a mutex… but also to ensure that the memory updates (through mutable variables) are made visible to all threads ‘correctly’
  • 7. memory model as a platform, jvm guarantees that ‘correctly synchronized’ programs have a very well defined memory behavior let’s look into the jvm memory model which defines those guarantees
  • 8. memory model your code manipulates memory by using variables and objects the memory is separated by a few layers of caches from the cpu on a multi-core cpu when a write happens in one cpu’s cache, we need to make it visible to other cpus as well and then there is the topic of re-odering… * http://en.wikipedia.org/wiki/Memory_barrier
  • 9. memory model to improve performance, the hardware (cpu, caches, …) reorders memory access using its own memory model (set of rules)* dynamically the visibility of a value in a memory location is further complicated by the code reordering performed by the compiler statically http://en.wikipedia.org/wiki/Memory_ordering
  • 10. memory model the static and dynamic reordering strive to ensure an ‘as-if serial’ semantics i.e., the program appears to be executing sequentially as per the lines in your source code
  • 11. memory model memory reordering is transparent in single- threaded use-cases because of that as-if- serial guarantee but logic quickly falls apart and causes surprises in incorrectly synchronized multi- threaded programs
  • 12. memory model while jvm’s OOTA safety (out of thin air) guarantees that a thread always reads a value written by *some* thread, and not some value out of thin air… with all the reordering, it’s good to have a slightly stronger guarantee …
  • 13. the need for memory barriers in the following code, say reader is called after writer (from different threads)
 class Reordering {
 int x = 0, y = 0;
 public void writer() {
 x = 1;
 y = 2;
 }
 public void reader() {
 int r1 = y;
 int r2 = x;
 // use r1 and r2
 }
 } in reader, even if r1 == 2, r2 can be 0 or 1 synchronization is needed if we want to control the ordering (and ensure r2 == 1) using a memory barrier
  • 14. memory barrier the jvm memory model essentially defines the relationship between the variables in your code the semantics also define a partial ordering on the memory operations so certain actions are guaranteed to ‘happen before’ others
  • 15. happens-before happens-before is a visibility guarantee for memory provided through synchronization such as locking, volatiles, atomics, etc …and for completeness, through Thread start() & join()
  • 16. Concurrency control on jvm with JDK 8 with that background, let’s look at some specific tools & mechanisms available on the jvm & jdk 8..
  • 17. Concurrency control on jvm with JDK 8 volatiles atomics concurrent collections/data-structures synchronizers fork/join framework
  • 18. volatiles volatiles are typically used as a state variables across threads writing to & reading from a volatile is like releasing and acquiring a monitor (lock), respectively i.e., it guarantees a happens-before relationship not just with other volatile but also non-volatile memory
  • 19. volatiles typical use of volatiles with reader and writer called from different threads:
 class VolatileExample {
 int x = 0;
 volatile boolean v = false;
 public void writer() {
 x = 42;
 v = true;
 }
 public void reader() {
 if (v == true) {
 //uses x - guaranteed to see 42.
 }
 }
 } the happens-before guarantee in jvm memory model makes it simpler to reason about the value in x, even though x is non- volatile! code: https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
  • 20. volatiles guaranteeing happens-before relationship for non-volatile memory is a performance overhead, so like any other synchronization primitive, it must be used judiciously but, it greatly simplifies the program and by aligning the dynamic and static reordering with most programmers’ expectations
  • 21. atomics atomics* extend the notion of volatiles, and support conditional updates being an extension to volatiles, they guarantee happens-before relationship on memory operations the updates are performed through a CAS cpu instruction * http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/package-summary.html
  • 22. atomics atomics/cas allow designing non-blocking algorithms where the critical section is around a single variable if there is more than one variable, other forms of synchronization is needed
  • 23. CAS JDK 8 uses CAS for ‘lock-free’ operation at a high-level, it piggy backs on a cpu provided CAS* instruction —like lock:cmpxchg on x86 let’s see how jvm dynamically improves the performance of the hardware provided CAS *CAS: http://en.wikipedia.org/wiki/Compare-and-swap
  • 24. CAS/atomics CAS in recent cpu implementations don’t assert the lock# to gain exclusive bus access, but rather rely on efficient cache-coherence protocols* — unless the memory address is not cache-line aligned even if that helps CAS to scale on many-core systems, CAS still adds a lot to local latency, sometimes nearly halting the cpu to address that local latency, a biased-locking* approach is used — where uncontended usage of atomics are recompiled dynamically to not use CAS instructions! * more about MESI: https://courses.engr.illinois.edu/cs232/sp2009/lectures/x24.pdf
 * biased locking in jvm: https://blogs.oracle.com/dave/entry/biased_locking_in_hotspot
  • 25. biased-locking the biased-locking feature in jvm extends beyond atomics, and generalizes to different kinds of locking (monitor entry & exit) on the jvm
  • 26. atomics before we move on, JDK 7 also provides ‘weakCompareAndSet’ atomic api, which relaxes the happens-before ordering guarantee relaxing the ordering makes it very hard to reason about the program’s execution so its use is limited to debugging counters, etc there are better ways of doing this ‘fast’ — which brings us to…
  • 27. adders & accumulators under high contention, the biased locking would be spending too much time in lock revocation from a thread if we used atomics in these high contention situations, adders* help gather counts by actively reducing contention, and ‘gather’ the value only when sum() or longValue() is called * http://download.java.net/lambda/b78/docs/api/java/util/concurrent/atomic/LongAdder.html
  • 28. concurrent collections the JDK also comes with a handful of lock- free collections these help in correctly synchronizing larger data sets than single variables
  • 29. concurrent collections ConcurrentHashMap (CHM) uses some of the concepts listed so far and provides a lock-free read, and a mostly lock-free write in java 8 relies on a good hashCode to reduce collisions, after which it reverts to using a lock for that bin
  • 30. concurrent collections CHM — in general — allows concurrent use of a Map which can be pretty useful especially to represent a shared ‘mutating’ state, and such CHM, together with adders for example, enables concurrent, lock-free, histogram generation across threads more about CHM here, of course: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html
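the CHM-plus-adders histogram mentioned above can be sketched like this (the `Histogram` class is my own illustration): `computeIfAbsent` installs each bin atomically, and `LongAdder` absorbs the contended increments

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class Histogram {
    // Build a word-count histogram across threads: CHM handles per-bin
    // synchronization, LongAdder absorbs contended increments on hot bins.
    static ConcurrentHashMap<String, LongAdder> count(List<String> words) {
        ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
        words.parallelStream().forEach(w ->
                counts.computeIfAbsent(w, k -> new LongAdder()).increment());
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println(count(words).get("a").sum()); // 3
    }
}
```

no explicit lock appears anywhere, yet the result is correct under concurrent updates — the combination the slides describe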
  • 31. synchronizers let’s look at some synchronization primitives… (a.k.a. ‘source of bugs’)
  • 32. synchronizers 2 major categories… coarse-grained locks are usually less performant, but are easy to code and, fine-grained locking has potential for higher performance, but is more error prone
  • 33. synchronized the synchronized keyword is a coarse-grained locking scheme you acquire & release locks at method or block level, typically holding the lock longer than needed it translates directly to jvm (intrinsic) synchronization & the hardware monitor, so its use is currently discouraged (might change in java9)
  • 34. explicit locks Locks* enables fine-grained locking these extend intrinsic locks, and allow unconditional, polled, timed & interruptible lock acquisition allow ‘custom’ wait/notify queues (Condition*) on the same lock nice features, but … * http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html
 * http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html
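the timed-acquisition feature mentioned above can be sketched as follows (the `TimedLock` wrapper is my own illustration, not from the Lock api): instead of blocking forever, the caller bounds the wait and can back off

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLock {
    private final Lock lock = new ReentrantLock();

    // Timed acquisition: give up instead of blocking forever.
    // Returns false if the lock could not be acquired within the timeout.
    public boolean updateIfAvailable(Runnable update) throws InterruptedException {
        if (!lock.tryLock(100, TimeUnit.MILLISECONDS)) {
            return false; // caller can back off, retry, or report failure
        }
        try {
            update.run();
            return true;
        } finally {
            lock.unlock(); // always released, even if update throws
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TimedLock t = new TimedLock();
        System.out.println(t.updateIfAvailable(() -> {})); // true (uncontended)
    }
}
```

none of this is possible with the synchronized keyword, which is the main argument for explicit locks despite their added ceremony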
  • 35. explicit locks developer needs to remember to release locks, so following style is encouraged:
 Lock l = ...;
 l.lock();
 try {
 // access the resource protected by this lock
 } finally {
 l.unlock();
}

it gets *very* complicated when we have to deal with more than one lock …source of all kinds of bugs & surprises * http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html
  • 36. ReentrantLock an implementation of Lock described earlier supports a fairness policy to deal with lock starvation — ‘fair’, not ‘fast’ there is nothing special in this lock to make it ‘reentrant’; all intrinsic locks on the jvm are per-thread and reentrant, unlike default POSIX mutexes * http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html
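a small sketch showing both points at once — the fairness constructor flag and per-thread reentrancy (the demo class is mine, for illustration):

```java
import java.util.concurrent.locks.ReentrantLock;

public class ReentrancyDemo {
    static int doubleAcquireHoldCount() {
        // Pass true for the 'fair' policy: FIFO hand-off to waiting threads
        // ('fair', not 'fast' — throughput drops under fairness).
        ReentrantLock lock = new ReentrantLock(true);
        lock.lock();
        lock.lock(); // the same thread may re-acquire: hold count goes to 2
        int holds = lock.getHoldCount();
        lock.unlock();
        lock.unlock(); // one unlock per lock(), or the lock stays held
        return holds;
    }

    public static void main(String[] args) {
        System.out.println(doubleAcquireHoldCount()); // 2
    }
}
```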
  • 37. a note about reentrancy reentrancy helps encapsulate locking behavior & helps write cleaner (oop) concurrent code in simpler cases (using single ‘resource’ but multiple methods) this also helps avoid deadlocks:
 class A {
 synchronized void run(){
 //..
 }
 }
 class B extends A {
 synchronized void run() {
super.run();
 }
}

if intrinsic locks were not reentrant on the jvm, the call to super.run() would be deadlocked
  • 38. ReentrantLock ReentrantLock (not reentrancy in general) has some issues so it must be used with caution: the default non-fair mode can cause starvation, and the fair mode performs poorly
  • 39. StampedLock supports optimistic reads & lock upgrades is not reentrant — needs the stamp, so not usable across calls to unknown methods for internal use in thread safe components, where you fully understand the data, objects & methods involved
  • 40. StampedLock for very short read-only code, optimistic reads improve throughput by reducing contention useful when reading multiple fields of an object from memory without locking must call validate() later to ensure consistency
  • 41. StampedLock along with optimistic reads, the lock upgrade capability enables many useful idioms:
 StampedLock sl = new StampedLock();
 double x, y;
 ..
 double distanceFromOrigin() { // A read-only method
 long stamp = sl.tryOptimisticRead();
 double currentX = x, currentY = y; // read without locking
 if (!sl.validate(stamp)) {
stamp = sl.readLock(); // fall back to a read-lock if a write intervened
 try {
 currentX = x;
 currentY = y;
 } finally {
 sl.unlockRead(stamp);
 }
 }
 return Math.sqrt(currentX * currentX + currentY * currentY);
 }
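the lock-upgrade direction works too; a sketch of the conversion idiom, adapted from the pattern shown in the StampedLock javadoc (the `Point` class here is my own framing): take a cheap read lock, and convert to a write lock only if an update is actually needed

```java
import java.util.concurrent.locks.StampedLock;

public class Point {
    private final StampedLock sl = new StampedLock();
    private double x, y;

    // Upgrade idiom: start under a read lock, convert to a write lock
    // only when the guarded condition says an update is required.
    void moveIfAtOrigin(double newX, double newY) {
        long stamp = sl.readLock();
        try {
            while (x == 0.0 && y == 0.0) {
                long ws = sl.tryConvertToWriteLock(stamp);
                if (ws != 0L) {          // conversion succeeded
                    stamp = ws;
                    x = newX;
                    y = newY;
                    break;
                } else {                  // contention: release read, take write, re-check
                    sl.unlockRead(stamp);
                    stamp = sl.writeLock();
                }
            }
        } finally {
            sl.unlock(stamp); // unlock() releases whichever mode the stamp holds
        }
    }

    double getX() {
        long s = sl.readLock();
        try { return x; } finally { sl.unlockRead(s); }
    }
}
```

the re-check after re-acquiring as a writer is essential: another thread may have moved the point while the read lock was released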
  • 42. fork/join unlike regular java.lang.Thread (which are mostly based on POSIX threads), fork/join tasks never ‘block’ for simple tasks, the overhead of constructing and/or managing a thread is more expensive than the task itself programming on fork/join, in essence, allows frameworks to optimize such tasks ‘behind the scenes’
  • 43. fork/join going beyond performance, the framework does nothing to ensure concurrency control the framework is also only usable in a few scenarios where the task can be easily decomposed in a sense, this is not making it easier to create correct (and fast) programs
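a minimal sketch of the decompose-then-join style the framework expects (the `SumTask` class is my illustration): split a range until it is small, fork one half, compute the other inline, and join

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {               // small enough: sum directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;                // split the range in half
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                              // schedule the left half
        return right.compute() + left.join();     // compute right inline, then join
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] ones = new long[10_000];
        java.util.Arrays.fill(ones, 1L);
        System.out.println(parallelSum(ones)); // 10000
    }
}
```

note the worker thread never idles at join(): it helps run queued subtasks instead, which is what the ‘never block’ claim above refers to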
  • 44. lambdas & streams framework available on jdk 8 for data-processing workloads looks ‘functional’ — but due to type-erasure the generic types are gone at runtime lambdas ‘look’ like anonymous inner classes but are fundamentally different from the ground up (compiled via invokedynamic) — enabling jvm optimizations for concurrency & gc
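the style above can be sketched in a few lines (the demo class is mine): the pipeline declares *what* to compute, and `.parallel()` lets the runtime decide how to split the work across the common fork/join pool

```java
import java.util.stream.LongStream;

public class StreamDemo {
    // Declarative pipeline: no explicit threads, no locks; the runtime
    // partitions the range across the common ForkJoinPool.
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .map(x -> x * x)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(100)); // 338350
    }
}
```

because the lambdas are side-effect free, the same code is correct whether it runs sequentially or in parallel — the concurrency control problem never arises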
  • 45. lock-free we’ve looked at a few lock-free concepts at the single-variable level: CAS, atomics, and the jvm optimizations that make CAS faster…
  • 46. lock-free but how do we write ‘real-world’ concurrent applications using lock-free concepts? i.e., more than just CAS?
  • 47. lock-free that brings us to software transactional memory (STM)! STM is to concurrency control, what garbage- collection is to memory management
  • 48. STM brings DB transaction concept to regular memory access read & write ‘as-if’ there is no contention… during commit time the system ensures sanity under the hood … no locks in the code!
  • 49. STM in low contention use-cases (i.e., well- designed programs), the absence of synchronization makes execution very fast! even in poorly designed programs, the absence of locks makes it easier to focus on correctness
  • 50. STM implementation multiverse[1] is a popular jvm implementation of STM (groovy and Scala/Akka use it in their STM) in essence, multiverse implements multiversion concurrency control (MVCC[2]) Clojure has a language built-in STM feature [1] http://multiverse.codehaus.org/overview.html 
 [2] http://en.wikipedia.org/wiki/Multiversion_concurrency_control
  • 51. STM & composability the biggest benefit of STM is composability (software reuse)
 class Account {
 private final TxnRef<Date> lastUpdate = …;
 private final TxnInteger balance = …;
 public void incBalance(int amount, Date date){
 atomic(new Runnable() {
 public void run(){
 balance.inc(amount);
 lastUpdate.set(date);
 if(balance.get() < 0) {
 throw new IllegalStateException("Not enough money");
 }
 }
 });
 }
 }
 class Teller {
static void transfer(Account from, Account to, int amount) {
 atomic(new Runnable() {
 public void run() {
 Date date = new Date();
 from.incBalance(-amount, date);
 to.incBalance(amount, date);
 }
 });
 }
 }
  • 52. STM & composability the Teller class is able to ‘compose’ over other atomic operations without knowing their internal details (i.e., what locks they use to synchronize) so if to.incBalance() fails, the memory effects of from.incBalance() are not committed so will never be visible to other threads! this is a pretty big deal…
  • 53. Simplicity STM makes composing concurrent software modules trivial in the absence of locks, it is easier to reason about the code flow the ability to code atomic operations this way sidesteps most of the challenges typically associated with concurrent programming
  • 54. performance as stated earlier, stm allows optimistic execution: ‘as though’ there are no other threads running, so it increases concurrency STM synchronizes only when required and falls back to slower (serialized) executions when necessary STM performs better than explicit locks as the number of cores increase beyond 4* * http://en.wikipedia.org/wiki/Software_transactional_memory
 http://channel9.msdn.com/Shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Software-Transactional-Memory
  • 55. more performance apart from just software improvements, cpu makers have started looking into hardware support for TM this is an emerging area and more advances are being made, apart from Haswell, and TSX from Intel * https://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell
 * http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions
  • 56. STM & Practicality concurrent programming is getting more practical stm brings the benefits of fine-grained locking to coarse-grained locking without using locks
  • 57. Summary lock-free concurrency control techniques like STM not only make it easier to write correct code… but also allow platforms (like the JVM) to make your correct code run faster
  • 58. References Being a long slideshow with dense content, I’ve put references on each slide so you can read through Reach out to me on LinkedIn if you’d like more info or just to discuss!