JVM Memory Model
JIT
Anomalies
• How long does it take to count to 100?
• How long does it take to append to a list? To sort a list?
• How long does it take to append to a vector? To sort a vector?
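Below is a minimal Java sketch of how one might start answering the first question. The slides use Scala; `Warmup`, `countTo` and `timeRounds` are hypothetical names. It times repeated calls of a small counting loop with `System.nanoTime`. On HotSpot the early rounds typically run interpreted and the later ones run after JIT compilation. The actual numbers depend on the JVM and the machine, and a naive harness like this is noisy; a dedicated tool such as JMH is the usual choice for real measurements.

```java
class Warmup {
    static int sink; // results are stored here so they stay observable

    // The workload from the slide: count from 0 up to n.
    static int countTo(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    // Runs countTo(n) `rounds` times and records the elapsed nanoseconds of each round.
    static long[] timeRounds(int n, int rounds) {
        long[] elapsed = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += countTo(n);
            elapsed[r] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] t = timeRounds(100, 100_000);
        // Print rounds 1, 10, 100, ... to watch the timing change as the JIT kicks in.
        for (int r = 1; r <= t.length; r *= 10) {
            System.out.println("round " + r + ": " + t[r - 1] + " ns");
        }
    }
}
```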
Dynamic vs Static Compilation
• Static compilation
– “Ahead-of-time” (AOT) compilation
– Source code -> native executable
– Compiles before executing
• Dynamic compilation (JIT)
– “Just-in-time” (JIT) compilation
– Source code -> bytecode -> interpreter -> JIT-compiled native code
– Most of the compilation happens during execution
JIT Compilation
• Aggressive, optimistic optimizations
– Driven by extensive use of profiling information
– Limited compilation budget (CPU, memory)
– Startup speed may suffer
• The JIT
– Compiles bytecode when needed
– Maybe immediately before execution?
– Maybe never?
JVM JIT Compilation
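• Eventually JIT-compiles the bytecode
– Based on profiling
– After roughly 10,000 invocations or loop iterations, and again after 20,000 (actual thresholds depend on the JVM version and flags such as -XX:CompileThreshold)
• Profiling allows focused code generation (only hot code is compiled)
• Profiling allows better code generation
– Inline what’s hot
– Loop unrolling, range-check elimination, etc.
– Branch prediction, spill-code generation, instruction scheduling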
JVM JIT Compilation
• JVM applications run in mixed mode: some code is interpreted, hot code is compiled
• Interpreted
– Bytecode-walking
– Artificial stack machine
• Compiled
– Direct native operations
– Native register machine
JVM application utilization
Optimizations in the HotSpot JVM
Inlining
// Before: addAll() calls add() on every iteration
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = add(accum, i);
    }
    return accum;
}
int add(int a, int b) {
    return a + b;
}
// After inlining: the body of add() replaces the call
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = accum + i;
    }
    return accum;
}
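Here is a runnable Java check that the two versions above compute the same thing. The class name `InliningDemo` and the name `addAllInlined` are hypothetical. The JIT performs this transformation on compiled machine code; the second method only spells out by hand what the compiled code is equivalent to.

```java
class InliningDemo {
    // Original: the loop calls a tiny helper method.
    static int add(int a, int b) {
        return a + b;
    }

    static int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);
        }
        return accum;
    }

    // What the compiled code is effectively equivalent to once add() is inlined: no call in the loop.
    static int addAllInlined(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = accum + i;
        }
        return accum;
    }

    public static void main(String[] args) {
        System.out.println(addAll(100) + " == " + addAllInlined(100)); // 4950 == 4950
    }
}
```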
Loop unrolling
// Before
public void foo(int[] arr, int a) {
    for (int i = 0; i < arr.length; i++) {
        arr[i] += a;
    }
}
// After unrolling by 4 (the second loop handles the 0-3 leftover elements)
public void foo(int[] arr, int a) {
    int limit = arr.length / 4;
    for (int i = 0; i < limit; i++) {
        arr[4*i] += a;   arr[4*i+1] += a;
        arr[4*i+2] += a; arr[4*i+3] += a;
    }
    for (int i = limit*4; i < arr.length; i++) {
        arr[i] += a;
    }
}
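The following is a runnable Java sketch that checks the hand-unrolled loop against the simple one. The names `UnrollDemo` and `fooUnrolled` are hypothetical. It tries array lengths both divisible and not divisible by 4, which exercises the leftover loop. As with inlining, the JIT would apply this transformation to machine code; it is written out in Java here only for illustration.

```java
import java.util.Arrays;

class UnrollDemo {
    // Straightforward loop: one element per iteration.
    static void foo(int[] arr, int a) {
        for (int i = 0; i < arr.length; i++) {
            arr[i] += a;
        }
    }

    // Unrolled by 4, as on the slide, plus a loop for the leftover elements.
    static void fooUnrolled(int[] arr, int a) {
        int limit = arr.length / 4;
        for (int i = 0; i < limit; i++) {
            arr[4 * i] += a;
            arr[4 * i + 1] += a;
            arr[4 * i + 2] += a;
            arr[4 * i + 3] += a;
        }
        for (int i = limit * 4; i < arr.length; i++) {
            arr[i] += a;
        }
    }

    public static void main(String[] args) {
        // Both versions should agree for every length, including ones not divisible by 4.
        for (int n = 0; n < 10; n++) {
            int[] x = new int[n];
            int[] y = new int[n];
            for (int i = 0; i < n; i++) {
                x[i] = i;
                y[i] = i;
            }
            foo(x, 7);
            fooUnrolled(y, 7);
            System.out.println(n + ": " + Arrays.equals(x, y));
        }
    }
}
```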
Escape Analysis
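// Pair is assumed to be a small class with int fields first and second
public int m1() {
    Pair p = new Pair(1, 2);
    return m2(p);
}
public int m2(Pair p) {
    return p.first + m3(p);
}
public int m3(Pair p) {
    return p.second;
}
// after deep inlining: p never escapes m1
public int m1() {
    Pair p = new Pair(1, 2);
    return p.first + p.second;
}
// optimized version: the allocation is eliminated and the sum is constant-folded
public int m1() {
    return 3;
}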
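This is a runnable Java version of the example. The slide does not define `Pair`, so the class below, with final int fields `first` and `second`, is an assumption, as is the name `EscapeDemo`. HotSpot's C2 compiler performs escape analysis by default (`-XX:+DoEscapeAnalysis`). Once `m2` and `m3` are inlined into a hot `m1`, the `Pair` does not escape, and its allocation may be replaced by two local values. Whether that actually happens is up to the JIT. Running with `-XX:-DoEscapeAnalysis` is one way to compare allocation behaviour.

```java
class EscapeDemo {
    public int m1() {
        Pair p = new Pair(1, 2); // p does not escape m1 once m2 and m3 are inlined
        return m2(p);
    }

    public int m2(Pair p) {
        return p.first + m3(p);
    }

    public int m3(Pair p) {
        return p.second;
    }

    public static void main(String[] args) {
        EscapeDemo d = new EscapeDemo();
        long total = 0;
        // Call m1 often enough for it to be JIT-compiled; the allocation may then disappear.
        for (int i = 0; i < 1_000_000; i++) {
            total += d.m1();
        }
        System.out.println(total); // 3000000
    }
}

// Hypothetical Pair class: the slide uses Pair without defining it.
final class Pair {
    final int first;
    final int second;

    Pair(int first, int second) {
        this.first = first;
        this.second = second;
    }
}
```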
Monitoring the JIT
• Info about compiled methods
– -XX:+PrintCompilation
• Info about inlining
– -XX:+PrintInlining
– Also requires -XX:+UnlockDiagnosticVMOptions
• Print the assembly code
– -XX:+PrintAssembly
– Also requires -XX:+UnlockDiagnosticVMOptions
– Requires the hsdis disassembler library; on Mac OS, add the directory containing hsdis-amd64.dylib to the LD_LIBRARY_PATH environment variable
Challenge
• Rerun the benchmarks, this time using
1. -XX:+PrintCompilation
2. -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
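To try these flags on something small, here is a hypothetical Java program (`HotLoop`, `sumOfSquares` and `square` are illustrative names, not from the talk). It has a hot caller and a tiny callee. First compile it with `javac HotLoop.java`. Running `java -XX:+PrintCompilation HotLoop` typically lists `HotLoop::sumOfSquares` among the compiled methods. Running `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining HotLoop` should indicate whether `square` was inlined into it. The exact output format varies between JVM versions.

```java
class HotLoop {
    // Tiny callee: a typical inlining candidate once its caller is hot.
    static int square(int x) {
        return x * x;
    }

    // Hot caller: invoked many times, so its counters cross the JIT thresholds.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int round = 0; round < 20_000; round++) {
            total += sumOfSquares(1_000);
        }
        System.out.println(total);
    }
}
```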
JVM Memory
The Java Memory Model
Java Memory Model
• The Java Memory Model (JMM) describes how threads in Java (and other JVM languages such as Scala) interact through memory.
• It guarantees sequential consistency for data-race-free programs.
Instruction Reordering
• Program Order
int a=1;
int b=2;
int c=3;
int d=4;
int e = a + b;
int f = c - d;
• A possible execution order (same result within this thread)
int d=4;
int c=3;
int f = c - d;
int b=2;
int a=1;
int e = a + b;
Anomaly
• Two threads run concurrently, starting from x = y = 0
Thread 1      Thread 2
x = 1         y = 1
j = y         i = x
• What will the result be?
– i=1, j=1
– i=0, j=1
– i=1, j=0
– i=0, j=0
Let’s Check
• Let’s build the scenario
val t1 = new Thread(new Runnable {
  def run(): Unit = {
    // sleep a little to add some uncertainty
    Thread.sleep(1)
    x = 1
    j = y
  }
})
• Start it together with a matching thread t2 (y = 1, then i = x), and repeat the run many times
• Do we see the anomaly?
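Here is a Java sketch of the full two-thread scenario; the slide shows only thread 1 in Scala, and the class name is hypothetical. It runs the experiment many times and tallies each (i, j) outcome. Because starting a thread is slow compared to two assignments, the threads often do not overlap. The i=0, j=0 outcome may therefore be rare or absent on a given machine, and its absence proves nothing about what the JMM allows.

```java
import java.util.Map;
import java.util.TreeMap;

class ReorderingLitmus {
    // Plain (non-volatile) shared fields: the JMM allows these accesses to be reordered.
    static int x, y, i, j;

    // Runs the scenario `trials` times and counts how often each (i, j) outcome occurs.
    static Map<String, Integer> run(int trials) {
        Map<String, Integer> outcomes = new TreeMap<>();
        for (int t = 0; t < trials; t++) {
            x = 0; y = 0; i = 0; j = 0;
            Thread t1 = new Thread(() -> { x = 1; j = y; });
            Thread t2 = new Thread(() -> { y = 1; i = x; });
            t1.start();
            t2.start();
            try {
                t1.join(); // join() guarantees the final i and j are visible below
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            outcomes.merge("i=" + i + ", j=" + j, 1, Integer::sum);
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(run(10_000));
    }
}
```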
Happens Before Ordering
• Defines constraints on instruction reordering
• A monitor release happens-before every subsequent acquire of the same monitor
• A write to a volatile field happens-before every subsequent read of that field
– For a non-volatile field, this is not necessarily the case!
• Within a single thread, each action happens-before every later action in program order
• Happens-before ordering is transitive
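This Java sketch (hypothetical names, assuming Java 9+ for `Thread.onSpinWait`) combines three of the rules above: program order, the volatile write/read rule, and transitivity. The plain field `data` is published safely through the volatile flag `ready`. Once the reader sees `ready == true`, it is guaranteed to see `data == 42`.

```java
class SafePublication {
    static int data;               // plain field
    static volatile boolean ready; // volatile flag

    static int publishAndRead() {
        data = 0;
        ready = false;
        Thread writer = new Thread(() -> {
            data = 42;    // (1) plain write
            ready = true; // (2) volatile write; (1) happens-before (2) by program order
        });
        writer.start();
        while (!ready) {  // (3) volatile read; once it sees (2), (2) happens-before (3)
            Thread.onSpinWait();
        }
        return data;      // (4) by transitivity (1) happens-before (4), so this reads 42
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // 42
    }
}
```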
Anomaly
• Let’s see how far we can count in 100 milliseconds
var running = true
• Let thread 1 count
var count = 0
while (running)
  count = count + 1
println(count)
• Let thread 2 signal thread 1 to stop
Thread.sleep(100)
running = false
println("thread 2 set running to false")
Volatile
• Compilers can reorder instructions
• Compilers can keep values in registers
• Processors can reorder instructions
• Values may sit in different cache levels and not yet be synced to main memory
• The JMM is designed to allow these aggressive optimizations
Volatile
• Modern processor caches (4 cores, each with its own L1 and L2; L3 shared between cores), approximate access latency:
– Registers: < 1 ns
– L1: ~1 ns (3-4 cycles)
– L2: ~3 ns (10-12 cycles)
– L3: ~15 ns (40-45 cycles)
– Main memory (DRAM): ~65 ns
Volatile
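• Volatile tells the compiler and processor that every read must see the most recent write by any thread
– The value is re-read on every access instead of being cached in a register, and memory barriers stop surrounding accesses from being reordered across it
– CPU caches are kept coherent by the hardware, so volatile does not bypass L1, L2 or L3; its cost comes from the barriers
• Volatile reads and writes cannot be reordered with each other; a volatile write also publishes all earlier writes to any thread that later reads it
• Reads and writes of volatile long and double values are atomic
– long and double are 64-bit; for non-volatile fields the JLS (§17.7) allows a write to be split into two separate 32-bit writes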
Resolve the Anomaly
• Let’s see how far we can count in 100 milliseconds
@volatile var running = true
• Let thread 1 count
var count = 0
while (running)
  count = count + 1
println(count)
• Let thread 2 signal thread 1 to stop
Thread.sleep(100)
running = false
println("thread 2 set running to false")
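Here is a Java sketch of the fixed version (the names `StopFlag` and `countFor` are hypothetical). With `volatile`, the main thread's write of `false` is guaranteed to become visible to the counting thread, so the loop ends. If you drop the `volatile` modifier, the JIT may hoist the read of `running` out of the loop, and the counting thread might never stop.

```java
class StopFlag {
    static volatile boolean running = true; // without volatile the loop may never exit

    // Counts in a background thread for roughly `millis` milliseconds and returns the count.
    static long countFor(long millis) {
        running = true;
        final long[] result = new long[1];
        Thread counter = new Thread(() -> {
            long count = 0;
            while (running) {
                count++;
            }
            result[0] = count;
        });
        counter.start();
        try {
            Thread.sleep(millis);
            running = false; // volatile write: the counter thread is guaranteed to see it
            counter.join();  // join: the counter's write to result[0] is visible afterwards
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("counted to " + countFor(100));
    }
}
```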
Anomaly
• Let’s count to 10,000
• But let’s use 10 threads, each adding 1,000 to our count
var count = 0
• Each of the 10 threads does
for (i <- 1 to 1000)
  count = count + 1
• What did we get?
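A Java sketch of the same experiment follows, using a static field in place of the Scala closure variable (names are hypothetical). `count = count + 1` is a separate read, add and write, so concurrent increments can overwrite each other and the total can fall short of 10,000. With only 1,000 increments per thread, the threads may barely overlap. A larger per-thread count makes the lost updates easier to see.

```java
class LostUpdates {
    static int count = 0; // shared and unsynchronized: a data race

    static int run(int threads, int perThread) {
        count = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    count = count + 1; // read, add, write: another thread can interleave
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("expected 10000, got " + run(10, 1_000));
    }
}
```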
Synchronization
• Let’s have another look at the assignment
count = count + 1
• Is this a single instruction?
• javap
– javap <class>: print the class’s fields and method signatures
– javap -c <class>: also print the bytecode of each method
Synchronization
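• The bytecode for count = count + 1 (count is a local var captured by the thread’s closure, so Scala stores it in a scala.runtime.IntRef)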
14: getfield #38 // Field scala/runtime/IntRef.elem:I
17: iconst_1
18: iadd
19: putfield #38 // Field scala/runtime/IntRef.elem:I
Synchronization
• The bytecode for count = count + 1
// Pop the IntRef reference and push the current value of its elem field
// (#38 is the constant-pool entry for IntRef.elem)
14: getfield #38 // Field scala/runtime/IntRef.elem:I
// Push the int constant 1
17: iconst_1
// Pop the top two ints and push their sum
18: iadd
// Pop the sum and the IntRef reference (pushed earlier), and store the sum into elem
19: putfield #38 // Field scala/runtime/IntRef.elem:I
// Another thread can run between getfield and putfield, so an update can be lost
Synchronization Tools
Actions by thread 1
   ↓
Thread 1 “releases” the monitor
   ↓ happens-before
Thread 2 “acquires” the same monitor
   ↓
Actions by thread 2
Synchronization Tools
• Synchronization tools let a group of instructions behave as if it were “one atomic instruction”
– With a lock, only one thread can execute the guarded code at a time
• Some tools
– synchronized
– ReentrantLock
– CountDownLatch
– Semaphore
– ReentrantReadWriteLock
Synchronization Tools
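• Simplest tool: synchronized
// for each thread
for (i <- 1 to 1000)
  synchronized {
    count = count + 1
  }
• Locks the monitor of ‘this’, so every thread must synchronize on the same ‘this’ object (inside an anonymous Runnable, ‘this’ would be the Runnable itself)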
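Here is the same counter in Java, as a sketch with hypothetical names. A `synchronized` instance method locks the monitor of `this`, which is equivalent to wrapping the body in `synchronized (this) { ... }`. All threads share one `SynchronizedCounter` instance, so they contend for the same monitor and no increment is lost.

```java
class SynchronizedCounter {
    private int count = 0;

    // Locks the monitor of this SynchronizedCounter instance.
    synchronized void increment() {
        count = count + 1;
    }

    synchronized int get() {
        return count;
    }

    static int run(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter(); // one shared monitor
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10, 1_000)); // 10000
    }
}
```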
Synchronization Tools
• Using ReentrantLock
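// before the threads: one lock shared by all threads
import java.util.concurrent.locks.ReentrantLock
val lock = new ReentrantLock()
// for each thread
for (i <- 1 to 1000) {
  lock.lock()
  try {
    count = count + 1
  } finally {
    lock.unlock() // always release, even if the body throws
  }
}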
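A Java version of the `ReentrantLock` pattern follows; the name `LockCounter` is hypothetical. The lock is created once and shared by every thread. `unlock()` sits in `finally`, so the lock is released even if the guarded code throws. The final read also goes through the lock, so it never races with an increment.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockCounter {
    private final ReentrantLock lock = new ReentrantLock(); // shared by all threads
    private int count = 0;

    void increment() {
        lock.lock();
        try {
            count = count + 1;
        } finally {
            lock.unlock(); // always release, even on exceptions
        }
    }

    int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    static int run(int threads, int perThread) {
        LockCounter counter = new LockCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10, 1_000)); // 10000
    }
}
```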
Atomic Operations
• Containers for simple values or references with atomic operations (java.util.concurrent.atomic: AtomicInteger, AtomicLong, AtomicReference, ...)
– getAndIncrement
– getAndDecrement
– getAndAdd
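Below is a sketch using `java.util.concurrent.atomic.AtomicInteger` (the name `AtomicCounter` is hypothetical). Its `getAndIncrement`, `getAndDecrement` and `getAndAdd` methods return the previous value and update it atomically, with no explicit lock. With it, the 10-thread counter always reaches 10,000.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounter {
    static int run(int threads, int perThread) {
        AtomicInteger count = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    count.getAndIncrement(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10, 1_000)); // 10000

        AtomicInteger a = new AtomicInteger(5);
        System.out.println(a.getAndAdd(3) + " -> " + a.get()); // 5 -> 8
    }
}
```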
Atomic Operations
• Built on compare-and-swap (CAS)
– Historically exposed through the sun.misc.Unsafe class
– The same primitive is used to implement spin locks
Atomic Operations
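• Spin (retry) loop built on CAS, as in AtomicInteger before Java 8
public final int getAndIncrement() {
    for (;;) {
        int current = get();
        int next = current + 1;
        // succeeds only if no other thread changed the value since get()
        if (compareAndSet(current, next))
            return current;
        // otherwise another thread won the race: retry
    }
}
public final boolean compareAndSet(int expect, int update) {
    return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}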
References
• The examples on GitHub
https://github.com/yoavaa/jvm-memory-model
Questions?

JVM Memory Model - Yoav Abrahami, Wix
