JVM Memory Model - Yoav Abrahami, Wix

Anomalies
• How long does it take to count to 100?
• How long does it take to append to a list?
To sort a list?
• How long does it take to append to a
vector? To sort a vector?

Dynamic vs Static Compilation
• Static Compilation
– “ahead-of-time” (AOT) compilation
– Source code -> Native executable
– Compiles before executing
• Dynamic compiler (JIT)
– “just-in-time” (JIT) compilation
– Source -> bytecode -> interpreter -> JITed
– Most of compilation happens during executing

JIT Compilation
• Aggressive optimistic optimizations
– Through extensive usage of profiling info
– Limited budget (CPU, Memory)
– Startup speed may suffer
• The JIT
– Compiles bytecode when needed
– Maybe immediately before execution?
– Maybe never?

JVM JIT Compilation
• Eventually JITs bytecode
– Based on profiling
– After 10,000 cycles, again after 20,000 cycles
• Profiling allows focused code-gen
• Profiling allows better code-gen
– Inline what’s hot
– Loop unrolling, range-check elimination, etc.
– Branch prediction, spill-code-gen, scheduling

JVM JIT Compilation
• JVM applications operate in mixed mode
• Interpreted
– Bytecode-walking
– Artificial stack machine
• Compiled
– Direct native operations
– Native register machine

Inlining
int addAll(int max) {
int accum = 0;
for (int i=0; i < max; i++) {
accum = add(accum, i);
}
return accum;
}
int add(int a, int b) {
return a+b;
}
int addAll(int max) {
int accum = 0;
for (int i=0; i < max; i++) {
accum = accum + i;
}
return accum;
}

Loop unrolling
public void foo(int[] arr, int a) {
for (int i=0; i<arr.length; i++) {
arr[i] += a;
}
}
public void foo(int[] arr, int a) {
int limit = arr.length / 4;
for (int i=0; i<limit ; i++){
arr[4*i] += a; arr[4*i+1] += a;
arr[4*i+2] += a; arr[4*i+3] += a;
}
for (int i=limit*4; i<arr.length; i++) {
arr[i] += a;
}
}

Escape Analysis
public int m1() {
Pair p = new Pair(1,2);
return m2(p);
}
public int m2(Pair p) {
return p.first + m3(p);
}
public int m3(Pair p) {
return p.second;
}
// after deep inlining
public int m1() {
Pair p = new Pair(1,2);
return p.first + p.second;
}
// optimized version
public int m1() {
return 3;
}

Monitoring Jit
• Info about compiled methods
– -XX:+PrintCompilation
• Info about inlining
– -xx:+PrintInlining
– Requires also -XX:+UnlockDiagnosticVMOptions
• Print the assembly code
– -XX:+PrintAssembly
– Also requires also -
XX:+UnlockDiagnosticVMOptions
– On Mac OS requires adding hsdis-amd64.dylib
to the LD_LIBRARY_PATH environment variable.

Challenge
• Rerun the benchmarks, this time using
1. -XX:+PrintCompilation
2. -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

JVM Memory
The Java Memory Model

Java Memory Model
• The Java Memory Model (JMM) describes
how threads in the Java (Scala)
Programming language interact through
memory.
• Provides sequential consistency for data
race free programs.

Instruction Reordering
• Program Order
int a=1;
int b=2;
int c=3;
int d=4;
int e = a + b;
int f = c - d;
• Execution Order
int d=4;
int c=3;
int f = c - d;
int b=2;
int a=1;
int e = a + b;

Anomaly
• Two threads running
• What will be the result?
i=1, j=1
i=0, j=1
i=1, j=0
i=0, j=0
x=y=0
j=y
x=1
i=x
y=1
Thread 1 Thread 2

Let’s Check
• Let’s build the scenario
val t1 = new Thread(new Runnable {
def run() {
// sleep a little to add some uncertainty
Thread.sleep(1)
x=1
j=y
}
})
• Then run it a few times
• Do we see the anomaly?

Happens Before Ordering
• Defines constraints on instruction reordering
• A monitor release
• A matching monitor acquire
• Volatile field reads are after writes
– For non volatile field, this is not necessarily the
case!
• Assignment dependency within a single
thread
• Happens Before ordering is transitive

Anomaly
• Let’s see how far we can count in 100 milli-seconds
var running = true
• Let thread 1 count
var count = 0
while (running)
count = count + 1
println(count)
• Let thread 2 signal thread 1 to stop
Thread.sleep(100)
running = false
println("thread 2 set running to false”)

Volatile
• Compilers can reorder instructions
• Compilers can keep values in registers
• Processors can reorder instructions
• Values may be in different caching levels
and not synced to main memory
• JMM is designed for aggressive
optimizations

Volatile
• Modern processor caches
Core 1 Core 2 Core 3 Core 4
L1 L1 L1 L1
L2 L2 L2 L2
L3 L3
Main Memory ~65 ns (DRAM)
~15 ns (40-45 cycles)
~3 ns (10-12 cycles)
~1 ns (3-4 cycles)
< 1 ns

Volatile
• Volatile instructs the compiler and processor
to sync the value to main memory on every
access
– Does not utilize the L1, L2 or L3 cache
• Volatile reads / writes cannot be reordered
• Volatile long and doubles are atomic
– Long and double types are over 32bit – the
processor operates on 32bit atomicity by default.

Resolve the Anomaly
• Let’s see how far we can count in 100 milli-seconds
@volatile var running = true
• Let thread 1 count
var count = 0
while (running)
count = count + 1
println(count)
• Let thread 2 signal thread 1 to stop
Thread.sleep(100)
running = false
println("thread 2 set running to false”)

Anomaly
• Let’s count to 10,000
• But lets use 10 threads, each adding 1,000 to
our count
var count = 0
• Each of the 10 threads does
for (i <- 1 to 1000)
count = count + 1
• What did we get?

Synchronization
• Let’s have another look at the assignment
count = count + 1
count = count + 1
• Is this a single instruction?
• javap
– javap <class> - Print the class signature
– javap -c <class> - Print the class bytecode

Synchronization
• The bytecode for count = count + 1
14: getfield #38 // Field scala/runtime/IntRef.elem:I
17: iconst_1
18: iadd
19: putfield #38 // Field scala/runtime/IntRef.elem:I

Synchronization
• The bytecode for count = count + 1
// Read the current counter value from field 38
// and add it to the stack
14: getfield #38 // Field scala/runtime/IntRef.elem:I
// Add 1 to the stack
17: iconst_1
// Add the first two stack elements as integers,
// and put the result in the stack
18: iadd
// set field 38 to the current top element of the stack
// assuming it is an integer
19: putfield #38 // Field scala/runtime/IntRef.elem:I

Synchronization Tools
Actions by
thread 1
Thread 1
“release”
monitor
Thread 2
“acquire”
monitor
Actions by
thread 2
Happens-before

• Synchronization tools allow grouping
instructions as if “one atomic instruction”
– Only one thread can perform the code at a time
• Some tools
– Synchronized
– ReentrantLock
– CountDownLatch
– Semaphore
– ReentrantReadWriteLock

• Simplest tools – synchronized
// for each thread
for (i <- 1 to 1000)
synchronized {
count = count + 1
}
• Works relative to ‘this’

• Using ReentrantLock
// before the threads
val lock = new ReentrantLock()
// for each thread
for (i <- 1 to 1000) {
lock.lock()
try {
count = count + 1
}
finally {
lock.unlock()
}
}

Atomic Operations
• Containers for simple values or references
with atomic operations
• getAndIncrement
• getAndDecrement
• getAndAdd

Atomic Operations
• All are based on compareAndSwap
– From the unsafe class
– Used to implement spin-locks

Atomic Operations
• Spin Lock
public final int getAndIncrement() {
for (;;) {
int current = get();
int next = current + 1;
if (compareAndSet(current, next))
return current;
}
}
}
public final boolean compareAndSet(int expect, int update) {
return unsafe.compareAndSwapInt(this,
valueOffset, expect, update);
}

References
• The examples on Github
https://github.com/yoavaa/jvm-memory-model

JVM Memory Model - Yoav Abrahami, Wix

More Related Content

What's hot (20)

Viewers also liked (17)

Similar to JVM Memory Model - Yoav Abrahami, Wix (20)

More from Codemotion Tel Aviv (20)

Recently uploaded (20)

JVM Memory Model - Yoav Abrahami, Wix