Safe Harbor
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website at http://www.oracle.com/investor. All information in this presentation is current as of September 2019 and Oracle undertakes no duty to update any statement in light of new information or future events.
Copyright © 2019 Oracle and/or its affiliates.
CSI (Crash Scene Investigation) HotSpot: Common JVM Crash Causes and Solutions [DEV4421]
David Buck
Principal Member of Technical Staff, Java Platform Group
September 17, 2019
Who am I? David Buck (left)
JVM Sustaining Engineer
OpenJDK Update Project Maintainer
JavaOne Rock Star
Co-author of Oracle WebLogic Server 11g 構築・運用ガイド
@DavidBuckJP
https://blogs.oracle.com/buck/
Image: Insurance Institute for Highway Safety [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)]
Motivation
Identify root cause
Prevent future occurrences
Collect information to help others debug further
Background
JVM Crash
JVM process terminates abnormally
OS signals a fatal error (e.g. SIGSEGV, SIGFPE)
JVM detects internal unrecoverable error
Native-level manifestation
“Crash” is often used in other contexts for any process that ends in
failure, but here we mean the above only
Why not try to recover?
Often, “unrecoverable” means continuing would be too risky
Fast fail is preferred
Integrity of data
Quicker detection and resolution of problem
Redundancy of JVM instances (clustering) maintains availability
Responding to a crash
Collect any necessary data
Restart JVM process
Ideally the above two will be automated
Analyze offline
JVM Crashes
[Diagram labels: JVM Crashes, Result, JVM Bugs]
Data
Fatal Error Log
hs_err_pid<pid>.log
Written to the first of these that applies:
-XX:ErrorFile location, if specified
JVM current working directory
Temporary directory (e.g. /tmp) if the JVM can’t write to the CWD
Useful for identifying known issues
Useful for identifying environmental / application issues
Not very useful for trying to identify new JVM bugs
Should not contain sensitive data
Avoid putting credentials on the command line or in environment variables, both of which the log records
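The lookup order above can be sketched in Java. This is a minimal sketch, not HotSpot's own logic: the class and method names are hypothetical, it only models the -XX:ErrorFile flag (including the %p process-id placeholder) and the working-directory default, and it does not model the temporary-directory fallback.

```java
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: predicts where the fatal error log should appear.
class ErrorLogLocator {
    private static final String FLAG = "-XX:ErrorFile=";

    static Path expectedLocation(List<String> jvmArgs, Path workingDir, long pid) {
        String configured = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                configured = arg.substring(FLAG.length()); // a later occurrence overrides an earlier one
            }
        }
        if (configured == null) {
            // Default name in the JVM's current working directory.
            return workingDir.resolve("hs_err_pid" + pid + ".log");
        }
        // %p in the configured name stands for the process id.
        Path path = Paths.get(configured.replace("%p", Long.toString(pid)));
        return path.isAbsolute() ? path : workingDir.resolve(path);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Path cwd = Paths.get("").toAbsolutePath();
        long pid = ProcessHandle.current().pid(); // requires Java 9 or later
        System.out.println("Expect the fatal error log at: " + expectedLocation(jvmArgs, cwd, pid));
    }
}
```

Logging this path at startup gives whoever collects crash data a known place to look first.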
Fatal Error Log Audience
JVM Vendor
Identify known issues
Quicker core file analysis
Lots of JVM internal data
End Users
Anyone troubleshooting a crash
Lots of useful non-internal data
Core File
Memory dump of the JVM process
Large heap -> Large core file
May contain sensitive data (passwords, PII, etc.)
Truncation is very often an issue
May consume significant disk space
Automatic restart could result in disk space exhaustion
Make sure you have plenty of space on file system
Configure core file output to non-critical file system
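A rough pre-flight check for the disk space concern might look like the following. It is only a sketch under stated assumptions: the class name is hypothetical, and the maximum heap size is used as a crude stand-in for the core file size, even though a real core file also contains thread stacks and other native memory.

```java
import java.io.File;

// Hypothetical helper: estimates how many core files fit on the dump file system.
class CoreDumpSpaceCheck {
    static long coresThatFit(long usableBytes, long estimatedCoreBytes) {
        if (estimatedCoreBytes <= 0) {
            throw new IllegalArgumentException("estimatedCoreBytes must be positive");
        }
        return Math.max(0L, usableBytes) / estimatedCoreBytes;
    }

    public static void main(String[] args) {
        // Directory where core files are written; defaults to the current directory.
        File dumpDir = new File(args.length > 0 ? args[0] : ".");
        long usable = dumpDir.getUsableSpace();
        long estimate = Runtime.getRuntime().maxMemory(); // assumption: heap dominates the core size
        System.out.printf("usable=%d bytes, estimated core=%d bytes, cores that fit=%d%n",
                usable, estimate, coresThatFit(usable, estimate));
    }
}
```

If the result is small and the JVM restarts automatically, a crash loop could fill the file system quickly.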
Core File
[Diagram: the core file contains the heap, thread stacks, and other process memory (“Other Stuff”)]
Core File
Not enabled by default in many configurations
Linux: be sure to set “ulimit -c unlimited”
Disabled by default on non-Server Windows
-XX:+CreateCoredumpOnCrash (JDK >= 9)
-XX:+CreateMinidumpOnCrash (JDK <= 8)
Core File
[Diagram labels: JVM, Native, core]
Serviceability Agent
Platform-independent core file debugging
Built-in knowledge of JVM internals
Possibly able to recover JFR data from core file
Much easier to use with JDK >= 9
Other Important Data
Native libraries loaded by JVM
Copies of libraries (Linux / macOS / Solaris)
PDB files (Windows)
Any unexpected output in log files / stdout / stderr
OutOfMemoryError
StackOverflowError
Strange OS or native library output
Identifying native libraries
Linux: gdb “info shared”
Windows: windbg “lm”
Solaris: “pldd corefile” (yes... this works!)
macOS: lldb “image list”
Can be automated (e.g. pkgapp)
Crash Causes
Stack Overflow
Way more dangerous than many people think
HotSpot is able to recover most of the time
Can silently corrupt memory
JVM behavior is considered undefined until reboot
Very easy to handle while interpreting bytecode
Impossible to guarantee proper handling in native code
Stack Overflow
[Diagram: a thread stack (read/write, no execute) ends in a guard page (no read, write, or execute). When the stack pointer crosses into the guard page, the access faults with SIGSEGV.]
Stack Overflow
[Diagram: HotSpot places yellow pages and then red pages (both no read, write, or execute) between the usable stack (read/write, no execute) and the guard page. When the stack pointer reaches the yellow pages, the access faults with SIGSEGV; the yellow pages are then made readable and writable, a StackOverflowError is thrown, and the yellow pages are protected again afterwards.]
Unwinding a Stack Overflow
If a StackOverflowError (SOFE) is thrown in a critical section
Java-level data may be left in inconsistent state
Java-level lock may be left “held” by nobody (likely hang)
No JVM crash, but system unlikely to be able to continue running
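The inconsistent-state case can be illustrated with a small sketch. The class, the counters, and the invariant are hypothetical and only stand in for any two-step update that a StackOverflowError can interrupt halfway.

```java
// Hypothetical demo: a "recovered" StackOverflowError leaves a two-step update half done.
class InconsistentAfterOverflow {
    static int debits = 0;
    static int credits = 0;

    // Unbounded recursion; the addition after the call keeps it from being a tail call.
    static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    // Intended invariant: debits == credits whenever transfer() is not running.
    static void transfer() {
        debits++;     // first half of the update
        recurse(0);   // exhausts the stack and throws StackOverflowError
        credits++;    // second half, never reached
    }

    static boolean invariantHoldsAfterRecovery() {
        try {
            transfer();
        } catch (StackOverflowError e) {
            // The JVM keeps running, but the update above was only half applied.
        }
        return debits == credits;
    }

    public static void main(String[] args) {
        System.out.println("invariant holds: " + invariantHoldsAfterRecovery());
        System.out.println("debits=" + debits + " credits=" + credits);
    }
}
```

The process survives, yet the two counters no longer agree, which is the kind of damage that surfaces only later and far from the original overflow.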
Unwinding a Stack Overflow
No way to unwind arbitrary native code
Must be executing Java when we “discover” the overflow
Stack Banging
[Diagram: same stack layout (usable stack, yellow pages, red pages, guard page). HotSpot touches (“bangs”) stack pages ahead of the current stack pointer; when a bang lands in the yellow pages, the overflow is discovered while Java code is executing and a StackOverflowError is thrown.]
Stack Banging
Can be controlled by StackShadowPages
A value that is too low leaves you more vulnerable to unrecoverable stack overflow
A value that is too high can waste stack space
Stack Overflow
[Diagram: same stack layout (usable stack, yellow pages, red pages, guard page); in the final frames the outcome is shown only as “??????????????”, i.e. undefined.]
Red / Yellow Pages
StackYellowPages
StackRedPages
Values that are too low leave you more vulnerable to unrecoverable stack overflow
Values that are too high can waste stack space
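To see what a running JVM is actually using for these settings, something like the following may help. It is a sketch that assumes a HotSpot JVM exposing HotSpotDiagnosticMXBean; the class name is hypothetical, and on other JVMs, or if a flag is absent in a given release, it simply reports the value as unavailable.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the stack guard-zone flags of the current JVM.
class StackGuardFlags {
    static String describe(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            VMOption option = bean.getVMOption(flagName);
            return flagName + "=" + option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (RuntimeException e) {
            // Not a HotSpot JVM, or the flag does not exist in this release.
            return flagName + "=<unavailable>";
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"StackShadowPages", "StackYellowPages", "StackRedPages"}) {
            System.out.println(describe(flag));
        }
    }
}
```

The reported origin shows whether a value is the platform default or was set explicitly, which is worth checking before changing any of the three.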
Stack Overflow
Even when we recover, the JVM should be restarted
Locks held at the time of SOFE may be left locked
Data structures may have been left in an inconsistent state
Other stack overflows may have silently corrupted native data
VirtualMachineError
Thrown to indicate that the Java Virtual Machine is broken or has
run out of resources necessary for it to continue operating.
VirtualMachineError
Subclasses: InternalError, OutOfMemoryError, StackOverflowError, UnknownError
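The hierarchy can be checked directly with reflection. This is a small sketch with a hypothetical class name; it only confirms that the four error types are VirtualMachineError subclasses, which is why a handler for Exception never sees them.

```java
import java.util.List;

// Hypothetical check of the VirtualMachineError hierarchy.
class VmErrorHierarchy {
    static final List<Class<?>> ERRORS = List.of(
            InternalError.class,
            OutOfMemoryError.class,
            StackOverflowError.class,
            UnknownError.class);

    static boolean allAreVirtualMachineErrors() {
        for (Class<?> type : ERRORS) {
            if (!VirtualMachineError.class.isAssignableFrom(type)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Class<?> type : ERRORS) {
            System.out.println(type.getSimpleName() + " extends " + type.getSuperclass().getSimpleName());
        }
        // None of these are Exceptions, so catch (Exception e) does not intercept them.
        System.out.println("Exception assignable from StackOverflowError: "
                + Exception.class.isAssignableFrom(StackOverflowError.class));
    }
}
```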
If You See One Stack Overflow…
Public Domain, https://commons.wikimedia.org/w/index.php?curid=696464
Stack Overflow
OutOfMemory and StackOverflow Exception counts:
StackOverflowErrors=1
The error log (JDK >= 8) records the number of SOFEs that were successfully handled
Many stack overflow crashes show no obvious sign of stack overflow
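Checking a fatal error log for this counter is easy to automate. The sketch below uses a hypothetical class name and assumes the counter appears in the `StackOverflowErrors=<n>` form shown above; it returns an empty result when the line is absent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the handled-SOFE counter from fatal error log lines.
class SofeCounter {
    private static final Pattern COUNT = Pattern.compile("StackOverflowErrors=(\\d+)");

    static OptionalInt handledStackOverflows(List<String> logLines) {
        for (String line : logLines) {
            Matcher m = COUNT.matcher(line);
            if (m.find()) {
                return OptionalInt.of(Integer.parseInt(m.group(1)));
            }
        }
        return OptionalInt.empty(); // counter not present in this log
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a fatal error log, or run without arguments for the sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : List.of("OutOfMemory and StackOverflow Exception counts:", "StackOverflowErrors=1");
        System.out.println("Handled stack overflows: " + handledStackOverflows(lines));
    }
}
```

A non-zero count does not prove the crash was caused by a stack overflow, but it is a strong reason to investigate one.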
Stack Overflow
No StackOverflowError is benign
All SOFEs should be investigated for root cause and resolved
SOFE is very hard to eliminate as a possible root cause for many crashes
Use of Internal / Private APIs
sun.misc.Unsafe
Java code can directly access various JVM internal functionality
Used sparingly to implement parts of the Java SE Class Library
Never intended for use outside of Sun / Oracle
By Cbmeeks / processed by Pixel8 - Original uploader was Cbmeeks at en.wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3672924
BASIC Support for Direct Access
PEEK
Retrieve data from an arbitrary address
POKE
Write an arbitrary value to an arbitrary address
sun.misc.Unsafe
Allocate uninitialized memory on Heap
More flexible memory model
PEEK/POKE of JVM address space
Unsafe Usage
Reflection
Serialization
NIO
java.util.concurrent
Encryption / Decryption
BigDecimal / BigInteger
Java2D
CPU usage monitoring (JMX)
private static final Unsafe theUnsafe = new Unsafe();

public static Unsafe getUnsafe() {
    Class cc = sun.reflect.Reflection.getCallerClass(2);
    // Only callers loaded by the bootstrap class loader (null loader) may get the instance
    if (cc.getClassLoader() != null)
        throw new SecurityException("Unsafe");
    return theUnsafe;
}
// Reflection reads the private field directly, bypassing the class loader check
Field f = Unsafe.class.getDeclaredField("theUnsafe");
f.setAccessible(true);
Unsafe unsafe = (Unsafe) f.get(null);
Unsafe Demo!
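The live demo itself is not part of the transcript, so the following is only a sketch of the PEEK/POKE idea, not the code shown on stage. The class name is hypothetical. It uses sun.misc.Unsafe memory-access methods, which compile with a warning and which newer JDK releases deprecate, so it may warn or stop working depending on the JDK in use.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo: POKE a value into raw native memory, then PEEK it back.
class PeekPoke {
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    static long pokeThenPeek(long value) {
        Unsafe unsafe = loadUnsafe();
        long address = unsafe.allocateMemory(Long.BYTES); // uninitialized native memory
        try {
            unsafe.putLong(address, value);  // POKE: write to a raw address
            return unsafe.getLong(address);  // PEEK: read it back
        } finally {
            unsafe.freeMemory(address);
        }
    }

    public static void main(String[] args) {
        System.out.println(pokeThenPeek(42L));
        // Writing to an address the process does not own, e.g. unsafe.putLong(0L, 42L),
        // is not checked by the JVM and would typically end in a SIGSEGV crash.
    }
}
```

Nothing in this code is validated by the verifier or by bounds checks, so a single wrong address is enough to corrupt memory or bring the JVM down.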
Isn’t more flexibility a good thing?
No
Not Always
The problem with more flexibility
Without limits on what code is allowed to do, we lose the ability to
reason about it.
Tradeoffs are sometimes reasonable, but only if you know you’re
making them.
Most “users” of Unsafe are not aware that their systems depend on
an unsupported and dangerous API.
Jigsaw: closing the loophole
By Jared Tarbell - Flickr: sky puzzle, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=31953973
Native Code
Native code can do anything (Unsafe on steroids!)
JNI used heavily within the JDK
HotSpot: ~1.1 MLOC (C and C++)
Class library / tools: ~0.9 MLOC (C and C++)
The vast majority of crashes caused by native code come from third-party code
Native Code
Debugging / troubleshooting native code requires close familiarity
with platform and native tools.
Native code can cause memory corruption that only manifests as a
crash later in JVM code.
Native Code – strict JNI checking
-Xcheck:jni tells the JVM to sanity check arguments and other prerequisites during any JNI call.
The additional checking comes at a performance cost.
It can help identify mistakes in calling the JNI API, but nothing else.
Still, JNI usage mistakes are common and are often found with -Xcheck:jni.
Native Code – Signal Handling
Native code may install its own signal handlers
HotSpot makes heavy use of signals internally
Error log will list any handlers installed for signals we care about:
Signal Handlers:
…
SIGILL: [libjvm.so+0x8c1cb0],
sa_mask[0]=11111111011111111101111111111110,
sa_flags=SA_RESTART|SA_SIGINFO
SIGUSR1: SIG_DFL,
sa_mask[0]=00000000000000000000000000000000,
sa_flags=none
…
Native Code – Signal Handling
Native code may install its own signal handlers
HotSpot makes heavy use of signals internally
Error log will list any handlers installed for signals we care about:
Signal Handlers:
…
SIGILL: [libyourmom.so+0x8c1cb0],
sa_mask[0]=11111111011111111101111111111110,
sa_flags=SA_RESTART|SA_SIGINFO
SIGUSR1: SIG_DFL,
sa_mask[0]=00000000000000000000000000000000,
sa_flags=none
…
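Spotting a handler that does not belong to the JVM can be scripted. The sketch below uses a hypothetical class name and assumes the `SIGxxx: [library+0xoffset]` line format shown above and that the JVM library is named libjvm.so, as on Linux; `libexample.so` in the sample is a made-up third-party library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists signal handlers that live outside libjvm.so.
class ForeignSignalHandlers {
    // Captures the signal name and the library from lines like "SIGILL: [libjvm.so+0x8c1cb0],".
    private static final Pattern HANDLER =
            Pattern.compile("^\\s*(SIG\\w+): \\[([^+\\]]+)\\+0x[0-9a-fA-F]+\\]");

    static List<String> handlersOutsideJvm(List<String> logLines) {
        List<String> foreign = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = HANDLER.matcher(line);
            if (m.find() && !m.group(2).equals("libjvm.so")) {
                foreign.add(m.group(1) + " -> " + m.group(2));
            }
        }
        return foreign;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "SIGSEGV: [libjvm.so+0x8c1cb0],",
                "SIGILL: [libexample.so+0x8c1cb0],",
                "SIGUSR1: SIG_DFL,");
        System.out.println(handlersOutsideJvm(sample));
    }
}
```

Any entry in the result names a library that replaced a handler HotSpot relies on, which makes that library the first thing to examine.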
Native Code – Signal Chaining
Prevents native code from overriding HotSpot handlers
Keeps track of any custom handler native code tries to install
HotSpot signal handler is called by the OS first
Signal originated (PC) from HotSpot code -> HotSpot handles it
Signal originated elsewhere -> HotSpot calls custom handler
[Diagram: OS -> HotSpot handler -> custom handler]
Native Code – Signal Chaining
HotSpot signal chaining code needs to override OS-provided signal
functions (e.g. sigaction).
Easiest way to force signal chaining is to preload the HotSpot signal
chaining library:
export LD_PRELOAD=<libjvm.so dir>/libjsig.so
Memory Exhaustion
Out of backing store
Address space layout issues
Memory Exhaustion
OS is out of backing store (RAM or swap space)
Memory Exhaustion
Address space exhaustion
32-bit JVMs have less than 4GB of address space
32-bit Windows defaults to 2GB!
64-bit platforms
Address space layout issues can prevent allocation
Most often seen on Solaris
Memory Exhaustion
Get Rid of OutOfMemoryError Messages [DEV3420]
Poonam Parhar
Thursday, September 19, 12:15 PM - 01:00 PM | Moscone South - Room 304
Troubleshooting Native Memory Leaks in Java Applications
CodeOne 2018
Slides available on-line
Poonam’s blog has great related content
Corrupt Bytecode
demo
ClassA
public class ClassA {
    public int doSomething(int i1, int i2, int i3)
    {
        return i1+i1+i3;
    }
}
ClassB
public class ClassB {
    public Integer doSomethingElse(int i1, int i2, int i3)
    {
        return new Integer(i1+i1+i3);
    }
}
ClassC
public class ClassC extends ClassA {}
Demo
public class Demo {
    public static void main(String[] args) {
        ClassA obj = new ClassC();
        System.out.println(obj.doSomething(1,2,3));
    }
}
[Class hierarchy: ClassA and ClassB extend Object; ClassC extends ClassA]
Demo
It works…
$ java Demo
5
$
Let's do something bad: change ClassC's superclass and recompile only ClassC, leaving Demo.class as originally compiled…
public class ClassC extends ClassB {}
[Class hierarchy: ClassA and ClassB extend Object; ClassC now extends ClassB]
$ java Demo
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
Demo.main([Ljava/lang/String;)V @15: invokevirtual
Reason:
Type 'ClassC' (current frame, stack[1]) is not assignable to 'ClassA'
Current Frame:
bci: @15
flags: { }
locals: { '[Ljava/lang/String;', 'ClassC' }
stack: { 'java/io/PrintStream', 'ClassC', integer, integer, integer }
Bytecode:
0x0000000: bb00 0259 b700 034c b200 042b 0405 06b6
0x0000010: 0005 b600 06b1
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
As expected, the verifier protects us from ourselves.
What if we disable it…
We reap what we sow
[dbuck@dbuck02 demo1]$ java -Xverify:none Demo
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fa93be7991c, pid=22925, tid=140364857087744
#
# JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x46391c]
#
# Core dump written. Default location: /home/dbuck/BCV_TOI/demo/demo1/core or core.22925
#
# An error report file with more information is saved as:
# /home/dbuck/BCV_TOI/demo/demo1/hs_err_pid22925.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)
Demo Takeaways
No obvious evidence that bad bytecode was root cause of crash
A class is only valid in the context of previously loaded classes
No malicious intent / 3rd party tools used
OS Issue (example)
Intel keeps adding new SIMD
registers
Preexisting SIMD registers keep
growing
128 bit -> 256 bit -> 512 bit
Using these registers helps avoid
having to spill to local memory
By XMM_registers.png: Jonasmike; derivative work: Racecar56 - XMM_registers.png, Public Domain, https://commons.wikimedia.org/w/index.php?curid=8540155
XMM Corruption
Linux kernels have not always correctly saved / restored XMM
register content on context switch
Can lead to virtually random memory corruption
Only hint that OS is a factor: recent kernel update
Has happened at least 3 times in the past decade
Code compiled with newer toolchains depends on XMM much
more heavily than in the past
JVM Bug
Most “obvious” cause of JVM crashes
Very hard for end users to identify root cause
Often possible to work around many issues
Performance / Stability Tradeoff
It's easy to make it fast.
It's easy to make it correct.
It’s almost impossible to do both at the same time.
Garbage Collector Complexity
Single threaded is simpler than parallel
STW (Throughput) is simpler than Concurrent
Garbage Collector Complexity
Serial, Parallel, Concurrent Mark and Sweep, G1
Bytecode Execution Complexity
Interpreter, C1 JIT, C2 JIT
JIT Crashes
Can happen anywhere
During JIT compilation
During execution of JITed code
Anywhere else (e.g. during GC of data corrupted by JITed code)
JIT Crash During Compilation
--------------- T H R E A D ---------------
Current thread (0x000000000061e800): JavaThread "C2 CompilerThread1" daemon [_thread_in_vm, id=12, stack(0xfffffd7ef87fe000,0xfffffd7ef88fe000)]
Current CompileTask:
C2: 15252 9024 b 4 com.sun.crypto.provider.CipherCore::update (609 bytes)
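Telling this case apart from other crashes comes down to the name of the crashing thread. The sketch below uses a hypothetical class name and assumes the "Current thread" line format and the "CompilerThread" naming seen in the excerpt above.

```java
// Hypothetical helper: decides whether the crashing thread was a JIT compiler thread.
class CrashThreadKind {
    static boolean isJitCompilerThread(String currentThreadLine) {
        int open = currentThreadLine.indexOf('"');
        if (open < 0) {
            return false; // no quoted thread name on this line
        }
        int close = currentThreadLine.indexOf('"', open + 1);
        if (close < 0) {
            return false;
        }
        String threadName = currentThreadLine.substring(open + 1, close);
        return threadName.contains("CompilerThread");
    }

    public static void main(String[] args) {
        String line = "Current thread (0x000000000061e800): JavaThread \"C2 CompilerThread1\" daemon "
                + "[_thread_in_vm, id=12, stack(0xfffffd7ef87fe000,0xfffffd7ef88fe000)]";
        System.out.println("crashed during JIT compilation: " + isJitCompilerThread(line));
    }
}
```

When this returns true, the "Current CompileTask" entry that follows names the method being compiled, which is the candidate for exclusion.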
JIT Crash During Execution
Java Execution Thread (Not JIT compilation thread)
Stack: [0xfffffffcc8a00000,0xfffffffcc8b00000], sp=0xfffffffcc8afb780, free space=1005k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J oracle.j2ee.ws.wsdl.extensions.AbstractSerializer.startMarshall(Ljavax/wsdl/Definition;Loracle/j2ee/ws/wsdl/util/XMLWriter;Ljavax/wsdl/extensions/ExtensibilityElement;)V
j oracle.j2ee.ws.wsdl.extensions.addressing.EndpointReferenceSerializer.marshall(Ljava/lang/Class;Ljavax/xml/namespace/QName;Ljavax/wsdl/extensions/ExtensibilityElement;Ljava/io/PrintWriter;Ljavax/wsdl/Definition;Ljavax/wsdl/extensions/ExtensionRegistry;)V+41
Other JIT Crashes
Compilation events (10 events):
Event: 15.131 Thread 0x000000000061e800 nmethod 9019 0xfffffd7fee3ecdd0 code [0xfffffd7fee3ed020, 0xfffffd7fee3eddd0]
Event: 15.131 Thread 0x000000000061c000 9020 b 4 javax.crypto.Cipher$Transform::matches (48 bytes)
Event: 15.143 Thread 0x000000000061c000 nmethod 9020 0xfffffd7fee3f3210 code [0xfffffd7fee3f3420, 0xfffffd7fee3f4030]
Event: 15.143 Thread 0x000000000061e800 9021 b 4 javax.crypto.Cipher::checkOpmode (21 bytes)
Event: 15.143 Thread 0x000000000061e800 nmethod 9021 0xfffffd7fee3f2990 code [0xfffffd7fee3f2ae0, 0xfffffd7fee3f2b38]
Event: 15.144 Thread 0x000000000061c000 9022 b 4 com.sun.crypto.provider.CipherCore::init (552 bytes)
Event: 15.150 Thread 0x000000000061c000 nmethod 9022 0xfffffd7fee3ebc90 code [0xfffffd7fee3ebe80, 0xfffffd7fee3ec5e0]
Event: 15.151 Thread 0x0000000000621000 9023 b 2 com.sun.crypto.provider.CipherCore::init (552 bytes)
Event: 15.153 Thread 0x0000000000621000 nmethod 9023 0xfffffd7feea17010 code [0xfffffd7feea17460, 0xfffffd7feea18d48]
Event: 15.182 Thread 0x000000000061e800 9024 b 4 com.sun.crypto.provider.CipherCore::update (609 bytes)
JIT Bug Workarounds
Globally disable JIT
-Xint (Interpreter only)
-XX:TieredStopAtLevel=1 (Interpreter + C1 JIT only)
Disable JIT for a particular package / class / method
-XX:CompileCommand=exclude,oracle/j2ee/ws/wsdl/extensions/AbstractSerializer,startMarshall
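Turning a compiled frame from the error log into the matching exclude option is mechanical. The sketch below uses a hypothetical class name and assumes the frame format shown in the stack excerpts of this deck, where the fully qualified method name is immediately followed by its signature in parentheses.

```java
// Hypothetical helper: builds a CompileCommand exclude option from an error log frame.
class CompileCommandBuilder {
    static String excludeOptionFor(String frameLine) {
        int paren = frameLine.indexOf('(');
        String beforeSignature = (paren >= 0 ? frameLine.substring(0, paren) : frameLine).trim();
        // The last whitespace-separated token is the fully qualified method name.
        String qualified = beforeSignature.substring(beforeSignature.lastIndexOf(' ') + 1);
        int dot = qualified.lastIndexOf('.');
        if (dot <= 0 || dot == qualified.length() - 1) {
            throw new IllegalArgumentException("No class.method found in: " + frameLine);
        }
        String className = qualified.substring(0, dot).replace('.', '/');
        String methodName = qualified.substring(dot + 1);
        return "-XX:CompileCommand=exclude," + className + "," + methodName;
    }

    public static void main(String[] args) {
        String frame = "J oracle.j2ee.ws.wsdl.extensions.AbstractSerializer.startMarshall"
                + "(Ljavax/wsdl/Definition;Loracle/j2ee/ws/wsdl/util/XMLWriter;"
                + "Ljavax/wsdl/extensions/ExtensibilityElement;)V";
        System.out.println(excludeOptionFor(frame));
    }
}
```

Excluding a single method keeps the rest of the application compiled, so it costs far less performance than -Xint.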
Always Use Up-to-date Runtime
1000s of stability fixes during lifetime of a major release
Tremendous effort to avoid regression / incompatibilities in update
releases
Security vulnerabilities alone should justify staying up to date
Risk of known issues / vulnerabilities > Risk of updating
Conclusion
Most JVM crashes reported can be resolved or worked around by
end users
Lots of end-user actionable data in hs_err log
A quick sanity check of the “usual suspects” can resolve most crash
issues
Thank You!
Resources
Java SE Troubleshooting Guide
https://docs.oracle.com/en/java/javase/11/troubleshoot/index.html
Poonam’s CodeOne Native Leak slides
https://www.slideshare.net/PoonamBajaj5/troubleshooting-native-memory-leaks-in-java-applications
Poonam’s blog
https://blogs.oracle.com/poonam/

More Related Content

PDF
JDK Mission Control: Where We Are, Where We Are Going [Code One 2019]
PDF
Java Concurrency, A(nother) Peek Under the Hood [Code One 2019]
PDF
invokedynamic for Mere Mortals [Code One 2019]
PDF
Hangs, Slowdowns, Starvation—Oh My! A Deep Dive into the Life of a Java Threa...
PDF
Java Bytecode Crash Course [Code One 2019]
PDF
Hotspot & AOT
PDF
Compile ahead of time. It's fine?
PPTX
Future of Java EE with Java SE 8
JDK Mission Control: Where We Are, Where We Are Going [Code One 2019]
Java Concurrency, A(nother) Peek Under the Hood [Code One 2019]
invokedynamic for Mere Mortals [Code One 2019]
Hangs, Slowdowns, Starvation—Oh My! A Deep Dive into the Life of a Java Threa...
Java Bytecode Crash Course [Code One 2019]
Hotspot & AOT
Compile ahead of time. It's fine?
Future of Java EE with Java SE 8

What's hot (20)

PPTX
Java EE 7 for Real Enterprise Systems
PDF
JavaOne 2016: Life after Modularity
PPTX
Hacking Oracle From Web Apps 1 9
PDF
CompletableFuture уже здесь
PPTX
JavaOne 2015 CON5211 Digital Java EE 7 with JSF Conversations, Flows, and CDI...
PDF
JavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.Pilgrim
PDF
Java EE 7: Whats New in the Java EE Platform @ Devoxx 2013
PDF
Clone Clone Make: a better way to build
PDF
Java EE 7: Boosting Productivity and Embracing HTML5
ODP
Javaee6 Overview
PDF
Flavors of Concurrency in Java
PDF
Concierge - Bringing OSGi (back) to Embedded Devices
PDF
JDK9 Features (Summary, 31/Jul/2015) #JJUG
PPTX
FOSDEM 2017 - Open J9 The Next Free Java VM
PDF
Hierarchy Viewer Internals
PDF
Haj 4344-java se 9 and the application server-1
PDF
Building Java Desktop Apps with JavaFX 8 and Java EE 7
PDF
J9: Under the hood of the next open source JVM
RTF
Readme
PDF
Hacking oracle using metasploit
Java EE 7 for Real Enterprise Systems
JavaOne 2016: Life after Modularity
Hacking Oracle From Web Apps 1 9
CompletableFuture уже здесь
JavaOne 2015 CON5211 Digital Java EE 7 with JSF Conversations, Flows, and CDI...
JavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.Pilgrim
Java EE 7: Whats New in the Java EE Platform @ Devoxx 2013
Clone Clone Make: a better way to build
Java EE 7: Boosting Productivity and Embracing HTML5
Javaee6 Overview
Flavors of Concurrency in Java
Concierge - Bringing OSGi (back) to Embedded Devices
JDK9 Features (Summary, 31/Jul/2015) #JJUG
FOSDEM 2017 - Open J9 The Next Free Java VM
Hierarchy Viewer Internals
Haj 4344-java se 9 and the application server-1
Building Java Desktop Apps with JavaFX 8 and Java EE 7
J9: Under the hood of the next open source JVM
Readme
Hacking oracle using metasploit
Ad

Similar to CSI (Crash Scene Investigation) HotSpot: Common JVM Crash Causes and Solutions [Code One 2019] (20)

PDF
Beyond JVM - YOW Melbourne 2013
PPT
JavaSecure
PDF
Java is Container Ready - Vaibhav - Container Conference 2018
ODP
Debugging Native heap OOM - JavaOne 2013
PDF
Presentations Unusual Java Bugs And Detecting Them Using Foss Tools
PDF
Troubleshooting Java HotSpot VM
PDF
What the CRaC - Superfast JVM startup
PDF
Get Rid Of OutOfMemoryError messages
PDF
JVMs in Containers
PPTX
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
PDF
Java tuning on GNU/Linux for busy dev
PPTX
How to Troubleshoot 9 Types of OutOfMemoryError
PPTX
How to Troubleshoot 9 Types of OutOfMemoryError
PDF
Jdj Foss Java Tools
PDF
JDK 10 Java Module System
PPT
Let It Crash (@pavlobaron)
PDF
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
PDF
JavaOne 2014: Java Debugging
PPTX
GOTO Night with Charles Nutter Slides
ODP
Jvm tuning in a rush! - Lviv JUG
Beyond JVM - YOW Melbourne 2013
JavaSecure
Java is Container Ready - Vaibhav - Container Conference 2018
Debugging Native heap OOM - JavaOne 2013
Presentations Unusual Java Bugs And Detecting Them Using Foss Tools
Troubleshooting Java HotSpot VM
What the CRaC - Superfast JVM startup
Get Rid Of OutOfMemoryError messages
JVMs in Containers
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
Java tuning on GNU/Linux for busy dev
How to Troubleshoot 9 Types of OutOfMemoryError
How to Troubleshoot 9 Types of OutOfMemoryError
Jdj Foss Java Tools
JDK 10 Java Module System
Let It Crash (@pavlobaron)
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2014: Java Debugging
GOTO Night with Charles Nutter Slides
Jvm tuning in a rush! - Lviv JUG
Ad

More from David Buck (20)

PDF
JDK 13 New Features [MeetUp with Java Experts! @Gaienmae/Dojima 2019]
PDF
JDK Mission Control: Where We Are, Where We Are Going [Groundbreakers APAC 20...
PDF
Z Garbage Collector
PDF
Valhalla Update JJUG CCC Spring 2019
PDF
Var handles jjug_ccc_spring_2018
PDF
JDK 10 へようこそ
PDF
Java SE 8におけるHotSpotの進化 [Java Day Tokyo 2014 C-2]
PDF
HotSpot のロック: A Peek Under the Hood [JJUG ナイトセミナ JVM 特集 2015年8月]
PDF
Java Concurrency, A(nother) Peek Under the Hood [JavaOne 2016 CON1497]
PDF
Bytecode Verification, the Hero That Java Needs [JavaOne 2016 CON1500]
PDF
Java Debuggers: A Peek Under the Hood [JavaOne 2016 CON1503]
PDF
Lambda: A Peek Under The Hood [Java Day Tokyo 2015 6-3]
PDF
Java Concurrency, A(nother) Peek Under the Hood [Java Day Tokyo 2016 3-C]
PDF
Ahead-of-Time Compilation with JDK 9 [Java Day Tokyo 2017 D1-A1]
PDF
InvokeDynamic for Mere Mortals [JavaOne 2015 CON7682]
PDF
HotSpot Synchronization, A Peek Under the Hood [JavaOne 2015 CON7570]
PDF
Let’s Write Our Own Chip-8 Interpreter! [JavaOne 2017 CON3584]
PDF
Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...
PDF
Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]
PPTX
OpenJDK: How to Join In on All the Fun [JavaOne 2017 CON3667]
JDK 13 New Features [MeetUp with Java Experts! @Gaienmae/Dojima 2019]
JDK Mission Control: Where We Are, Where We Are Going [Groundbreakers APAC 20...
Z Garbage Collector
Valhalla Update JJUG CCC Spring 2019
Var handles jjug_ccc_spring_2018
JDK 10 へようこそ
Java SE 8におけるHotSpotの進化 [Java Day Tokyo 2014 C-2]
HotSpot のロック: A Peek Under the Hood [JJUG ナイトセミナ JVM 特集 2015年8月]
Java Concurrency, A(nother) Peek Under the Hood [JavaOne 2016 CON1497]
Bytecode Verification, the Hero That Java Needs [JavaOne 2016 CON1500]
Java Debuggers: A Peek Under the Hood [JavaOne 2016 CON1503]
Lambda: A Peek Under The Hood [Java Day Tokyo 2015 6-3]
Java Concurrency, A(nother) Peek Under the Hood [Java Day Tokyo 2016 3-C]
Ahead-of-Time Compilation with JDK 9 [Java Day Tokyo 2017 D1-A1]
InvokeDynamic for Mere Mortals [JavaOne 2015 CON7682]
HotSpot Synchronization, A Peek Under the Hood [JavaOne 2015 CON7570]
Let’s Write Our Own Chip-8 Interpreter! [JavaOne 2017 CON3584]
Everything You Wanted to Know About JIT Compilation but Were Afraid to Ask [J...
Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]
OpenJDK: How to Join In on All the Fun [JavaOne 2017 CON3667]

CSI (Crash Scene Investigation) HotSpot: Common JVM Crash Causes and Solutions [Code One 2019]

  • 1. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website at http://www.oracle.com/investor. All information in this presentation is current as of September 2019 and Oracle undertakes no duty to update any statement in light of new information or future events. Safe Harbor Copyright © 2019 Oracle and/or its affiliates.
  • 2. CSI (Crash Scene Investigation) HotSpot: Common JVM Crash Causes and Solutions [DEV4421] Principal Member of Technical Staff Java Platform Group September 17, 2019 David Buck Copyright © 2019 Oracle and/or its affiliates.
  • 3. JVM Sustaining Engineer OpenJDK Update Project Maintainer JavaOne Rock Star Co-author of Oracle WebLogic Server 11g 構築・運用ガイド @DavidBuckJP https://blogs.oracle.com/buck/ Who am I? David Buck (left)
  • 4. Insurance Institute for Highway Safety [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)]
  • 5. Motivation Identify root cause Prevent future occurrences Collect information to help others debug further
  • 7. JVM Crash JVM process terminates abnormally OS signals a fatal error (e.g. SIGSEGV, SIGFPE) JVM detects internal unrecoverable error Native-level manifestation “Crash” is often used in other contexts for any process that ends in failure, but here we mean the above only
  • 8. Why not try to recover? Often, “unrecoverable” means continuing would be too risky Fast fail is preferred Integrity of data Quicker detection and resolution of problem Redundancy of JVM instances (clustering) maintains availability
  • 9. Responding to a crash Collect any necessary data Restart JVM process Ideally the above two will be automated Analyze offline
  • 12. Data
  • 13. Fatal Error Log hs_err_pid<pid>.log Output to -XX:ErrorFile JVM current working directory Temporary directory (e.g. /tmp) if the JVM can’t write to CWD Useful for identifying known issues Useful for identifying environmental / application issues Not very useful for trying to identify new JVM bugs Should not contain sensitive data Avoid credentials on command line or in environment variables
  • 14. Fatal Error Log Audience JVM Vendor Identify known issues Quicker core file analysis Lots of JVM internal data End Users Anyone troubleshooting a crash Lots of useful non-internal data
  • 15. Fatal Error Log Audience JVM Vendor Identify known issues Quicker core file analysis Lots of JVM internal data End Users Anyone troubleshooting a crash Lots of useful non-internal data JVM
  • 16. Fatal Error Log Audience JVM Vendor Identify known issues Quicker core file analysis Lots of JVM internal data End Users Anyone troubleshooting a crash Lots of useful non-internal data JVM hs_err_4242.log
  • 17. Core File Memory dump of the JVM process Large heap -> Large core file May contain sensitive data (passwords, PII, etc.) Truncation is very often an issue May consume significant disk space Automatic restart could result in disk space exhaustion Make sure you have plenty of space on file system Configure core file output to non-critical file system
  • 18. Core File Heap “Other Stuff” Thread Stacks and more “Other Stuff” Heap “Other Stuff”
  • 20. Core File Not enabled by default in many configurations Linux: be sure to set “ulimit -c unlimited” Disabled by default on non-Server Windows -XX:+CreateCoredumpOnCrash (JDK >= 9) -XX:+CreateMinidumpOnCrash (JDK <= 8)
  • 23. Serviceability Agent Platform-independent core file debugging Built-in knowledge of JVM internals Possibly able to recover JFR data from core file Much easier to use in JDK >= 9
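The options named on the Fatal Error Log and Core File slides can be assembled programmatically by whatever launches the JVM. The following is a minimal sketch, not a definitive launcher: the class name `CrashDataOptions` and the directory `/var/crash/java` are hypothetical, and the version cutoff simply follows the slide (JDK >= 9 versus JDK <= 8). On Linux the `ulimit -c` setting still has to be made in the launching shell; Java code cannot do that part.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: assemble HotSpot options that make crash data easier to collect. */
final class CrashDataOptions {

    /** The core/minidump flag was renamed in JDK 9, as listed on the slide. */
    static String coreDumpFlag(int featureVersion) {
        return featureVersion >= 9
                ? "-XX:+CreateCoredumpOnCrash"
                : "-XX:+CreateMinidumpOnCrash";
    }

    /**
     * errorDir is a hypothetical directory, ideally on a non-critical
     * file system. The JVM expands %p in -XX:ErrorFile to the process id.
     */
    static List<String> options(int featureVersion, String errorDir) {
        List<String> opts = new ArrayList<>();
        opts.add(coreDumpFlag(featureVersion));
        opts.add("-XX:ErrorFile=" + errorDir + "/hs_err_pid%p.log");
        return opts;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        System.out.println(options(11, "/var/crash/java"));
    }
}
```

The returned list can be spliced into a `ProcessBuilder` command or a startup script; pointing `errorDir` at a non-critical file system follows the advice on the Core File slide.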
  • 24. Other Important Data Native libraries loaded by JVM Copies of libraries (Linux / macOS / Solaris) PDB files (Windows) Any unexpected output in log files / stdout / stderr OutOfMemoryError StackOverflowError Strange OS or native library output
  • 25. Identifying native libraries Linux: gdb “info shared” Windows: windbg “lm” Solaris: “pldd corefile” (yes... this works!) macOS: lldb “image list” can be automated (e.g. pkgapp)
  • 27. Stack Overflow Way more dangerous than many people think HotSpot is able to recover most of the time Can silently corrupt memory JVM behavior is considered undefined until reboot Very easy to handle while interpreting bytecode Impossible to guarantee proper handling in native code
  • 29. Stack Overflow Guard Page Stack Pointer ○ Read ○ Write × Execute × Read × Write × Execute
  • 31. Stack Overflow Guard Page Stack Pointer ○ Read ○ Write × Execute × Read × Write × Execute SIGSEGV
  • 32. Stack Overflow Red Pages Stack Pointer Yellow Pages [guard page] ○ Read ○ Write × Execute × Read × Write × Execute × Read × Write × Execute
  • 34. Stack Overflow Red Pages Stack Pointer Yellow Pages [guard page] ○ Read ○ Write × Execute × Read × Write × Execute × Read × Write × Execute SIGSEGV
  • 35. Stack Overflow Red Pages Stack Pointer Yellow Pages [guard page] ○ Read ○ Write × Execute ○ Read ○ Write × Execute × Read × Write × Execute StackOverflowError
  • 36. Stack Overflow Red Pages Stack Pointer Yellow Pages [guard page] ○ Read ○ Write × Execute × Read × Write × Execute × Read × Write × Execute
  • 37. Unwinding a Stack Overflow If SOFE thrown in a critical section Java-level data may be left in inconsistent state Java-level lock may be left “held” by nobody (likely hang) No JVM crash, but system unlikely to be able to continue running
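The "inconsistent state" point can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the class `SofeInconsistency` and its two counters are hypothetical, standing in for any pair of updates that application code expects to happen together. It deliberately triggers a StackOverflowError on a dedicated thread, so treat it as a demonstration and not as something to run inside a production JVM.

```java
/** Sketch: a StackOverflowError can leave paired updates half done. */
final class SofeInconsistency {
    // Intended invariant: entered == exited whenever no call is in progress.
    private int entered = 0;
    private int exited = 0;

    private void recurse() {
        entered++;
        recurse();   // no base case, so this eventually overflows the stack
        exited++;    // skipped in every frame once the error unwinds the stack
    }

    /** Runs the recursion on its own thread and returns {entered, exited}. */
    static int[] run() {
        SofeInconsistency state = new SofeInconsistency();
        Thread t = new Thread(null, () -> {
            try {
                state.recurse();
            } catch (StackOverflowError e) {
                // The thread "recovers" here, but the invariant is already broken.
            }
        }, "sofe-demo", 512 * 1024); // requested stack size is only a hint
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { state.entered, state.exited };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("entered=" + r[0] + " exited=" + r[1]);
    }
}
```

No crash occurs, yet the two counters disagree afterwards. Replace the counters with a lock acquire/release pair or a two-step data structure update and the same unwinding produces the hang or corruption the slide warns about.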
  • 38. Unwinding a Stack Overflow No way to unwind arbitrary native code Must be executing Java when we “discover” the overflow
  • 39. Stack Banging Red Pages Stack Pointer Yellow Pages [guard page] ○ Read ○ Write × Execute × Read × Write × Execute × Read × Write × Execute
  • 41. Stack Banging Red Pages Stack Pointer Yellow Pages [guard page] ○ Read ○ Write × Execute × Read × Write × Execute × Read × Write × Execute StackOverflowError
  • 42. Stack Banging Can be controlled by StackShadowPages Too low of a value makes you more vulnerable to unrecoverable stack overflow Too high of a value could waste stack space
  • 43. Stack Overflow Red Pages Stack Pointer Yellow Pages [guard page] ○ Read ○ Write × Execute × Read × Write × Execute × Read × Write × Execute
  • 46. Stack Overflow Red Pages Stack Pointer Yellow Pages [guard page] ○ Read ○ Write × Execute × Read × Write × Execute × Read × Write × Execute ??????????????
  • 49. Red / Yellow Pages StackYellowPages StackRedPages Too low of a value makes you more vulnerable to unrecoverable stack overflow Too high of a value could waste stack space
  • 50. Stack Overflow Even when we recover, the JVM should be restarted Locks held at the time of SOFE may be left locked Data structures may have been left in an inconsistent state Other stack overflows may have silently corrupted native data
  • 51. VirtualMachineError Thrown to indicate that the Java Virtual Machine is broken or has run out of resources necessary for it to continue operating. VirtualMachineError InternalError OutOfMemoryError StackOverflowError UnknownError
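Before tuning StackShadowPages, StackYellowPages or StackRedPages it helps to know what the running JVM actually uses, since the defaults are platform dependent. The following is a minimal sketch that reads the three flags through `com.sun.management.HotSpotDiagnosticMXBean`; it assumes a HotSpot-based JVM (the bean is HotSpot specific), and the class name `StackPageFlags` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: print the stack guard-zone flags of the running HotSpot JVM. */
final class StackPageFlags {

    /** Returns the current value of a HotSpot flag as a string. */
    static String flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException for unknown flag names.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        String[] names = { "StackShadowPages", "StackYellowPages", "StackRedPages" };
        for (String name : names) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```

The values are page counts, so the stack space they reserve depends on the platform page size.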
  • 52. If You See One Stack Overflow… Public Domain, https://commons.wikimedia.org/w/index.php?curid=696464
  • 53. Stack Overflow OutOfMemory and StackOverflow Exception counts: StackOverflowErrors=1 Error log (JDK >= 8) will record number of SOFE that were successfully handled Many stack overflow crashes show no obvious sign of stack overflow
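Checking a fatal error log for the counter shown above is easy to automate. This is a minimal sketch, assuming the log text contains an entry of the form `StackOverflowErrors=<n>` as on the slide; the class name `SofeCount` and the -1 "not recorded" convention are choices made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: read the handled-StackOverflowError counter from hs_err text. */
final class SofeCount {
    private static final Pattern COUNT =
            Pattern.compile("StackOverflowErrors=(\\d+)");

    /** Returns the recorded count, or -1 if the log has no such entry. */
    static int handledStackOverflows(String logText) {
        Matcher m = COUNT.matcher(logText);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String sample = "OutOfMemory and StackOverflow Exception counts:\n"
                + "StackOverflowErrors=1\n";
        System.out.println(handledStackOverflows(sample));
    }
}
```

A result of -1 only means nothing was recorded. As the slide notes, many stack overflow crashes leave no obvious sign, so a missing entry does not rule stack overflow out.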
  • 54. Stack Overflow No StackOverflowError is benign All SOFEs should be investigated for root cause and resolved SOFE is very hard to eliminate as a possible root cause for many crashes
  • 55. Use of Internal / Private APIs
  • 56. sun.misc.Unsafe Java code can directly access various JVM internal functionality Used sparingly to implement parts of the Java SE Class Library Never intended for use outside of Sun / Oracle
  • 57. By Cbmeeks / processed by Pixel8 - Original uploader was Cbmeeks at en.wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3672924
  • 59. BASIC Support for Direct Access PEEK Retrieve data from an arbitrary address POKE Write an arbitrary value to an arbitrary address
  • 60. sun.misc.Unsafe Allocate uninitialized memory on Heap More flexible memory model PEEK/POKE of JVM address space
  • 61. Unsafe Usage Reflection Serialization NIO java.util.concurrent Encryption / Decryption BigDecimal / BigInteger Java2D CPU usage monitoring (JMX)
  • 62. private static final Unsafe theUnsafe = new Unsafe(); public static Unsafe getUnsafe() { Class cc = sun.reflect.Reflection.getCallerClass(2); if (cc.getClassLoader() != null) throw new SecurityException("Unsafe"); return theUnsafe; }
  • 63. Field f = Unsafe.class.getDeclaredField("theUnsafe"); f.setAccessible(true); unsafe = (Unsafe) f.get(null);
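To make the PEEK/POKE analogy concrete, the reflection snippet above can be completed into a small round trip through raw memory. This is a sketch for illustration only, not a recommendation: the class name `PeekPoke` is hypothetical, the code relies on the unsupported `sun.misc.Unsafe` API that the talk warns against, and newer JDK releases deprecate these memory-access methods and may warn or refuse at run time.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Sketch: POKE a value into off-heap memory, then PEEK it back. */
final class PeekPoke {

    /** Obtains the Unsafe singleton via the reflection loophole from the slides. */
    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe not reachable on this runtime", e);
        }
    }

    static long pokeThenPeek(long value) {
        Unsafe u = unsafe();
        long address = u.allocateMemory(8);  // uninitialized native memory
        try {
            u.putLong(address, value);       // POKE: nothing validates the address
            return u.getLong(address);       // PEEK
        } finally {
            u.freeMemory(address);
        }
    }

    public static void main(String[] args) {
        System.out.println(pokeThenPeek(42L));
    }
}
```

Nothing in this API checks `address`. Passing a stale or miscalculated value to `putLong` either crashes the JVM on the spot or silently corrupts memory that some unrelated code trips over later.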
  • 66. Isn’t more flexibility a good thing?
  • 67. Isn’t more flexibility a good thing? No
  • 68. Isn’t more flexibility a good thing? No Not Always
  • 69. The problem with more flexibility Without limits on what code is allowed to do, we lose the ability to reason about it. Tradeoffs are sometimes reasonable, but only if you know you’re making them. Most “users” of Unsafe are not aware that their systems depend on an unsupported and dangerous API.
  • 70. Jigsaw: closing the loophole By Jared Tarbell - Flickr: sky puzzle, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=31953973
  • 71. Native Code Native code can do anything (Unsafe on steroids!) JNI used heavily within the JDK HotSpot: ~1.1 MLOC (C and C++) Class library / tools: ~0.9 MLOC (C and C++) Vast majority of native-caused crashes are caused by 3rd-party code
  • 72. Native Code Debugging / troubleshooting native code requires close familiarity with platform and native tools. Native code can cause memory corruption that only manifests as a crash later in JVM code.
  • 73. Native Code – strict JNI checking -Xcheck:jni tells the JVM to sanity check arguments and other prerequisites during any JNI call. The additional checking comes at a performance cost. Can help identify mistakes in calling the JNI API, but not anything else. Still, JNI usage mistakes are common and are often found with -Xcheck:jni.
  • 74. Native Code – Signal Handling Native code may install its own signal handlers HotSpot makes heavy use of signals internally Error log will list any handlers installed for signals we care about: Signal Handlers: … SIGILL: [libjvm.so+0x8c1cb0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO SIGUSR1: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none …
  • 76. Native Code – Signal Handling Native code may install its own signal handlers HotSpot makes heavy use of signals internally Error log will list any handlers installed for signals we care about: Signal Handlers: … SIGILL: [libyourmom.so+0x8c1cb0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO SIGUSR1: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none …
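Spotting a handler that no longer points into libjvm.so can be scripted. The following is a minimal sketch that assumes the Linux-style line layout shown on these slides (`SIGNAME: [library+0xoffset], ...`); the exact layout differs between JDK versions and platforms, and the names `SignalHandlerCheck` and `libthirdparty.so` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: list signal handlers that are not installed by libjvm.so. */
final class SignalHandlerCheck {
    // Matches e.g. "SIGILL: [libjvm.so+0x8c1cb0], sa_mask[0]=..."
    private static final Pattern HANDLER =
            Pattern.compile("^\\s*(SIG[A-Z0-9]+): \\[([^+\\]]+)\\+0x[0-9a-fA-F]+\\]");

    /** Returns "SIGNAL -> library" for every handler outside libjvm.so. */
    static List<String> foreignHandlers(List<String> signalHandlerLines) {
        List<String> found = new ArrayList<>();
        for (String line : signalHandlerLines) {
            Matcher m = HANDLER.matcher(line);
            // SIG_DFL / SIG_IGN entries have no [library+offset] part and are skipped.
            if (m.find() && !m.group(2).equals("libjvm.so")) {
                found.add(m.group(1) + " -> " + m.group(2));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "SIGILL: [libthirdparty.so+0x8c1cb0], sa_mask[0]=1111, sa_flags=SA_RESTART|SA_SIGINFO",
                "SIGUSR1: SIG_DFL, sa_mask[0]=0000, sa_flags=none");
        System.out.println(foreignHandlers(lines));
    }
}
```

Any entry this reports is a candidate for the signal chaining setup described on the following slides.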
  • 77. Native Code – Signal Chaining Prevents native code from overriding HotSpot handlers Keeps track of any custom handler native code tries to install HotSpot signal handler is called by the OS first Signal originated (PC) from HotSpot code -> HotSpot handles it Signal originated elsewhere -> HotSpot calls custom handler HotSpot Handler OS Custom Handler
  • 78. Native Code – Signal Chaining HotSpot signal chaining code needs to override OS-provided signal functions (e.g. sigaction). Easiest way to force signal chaining is to preload the HotSpot signal chaining library: export LD_PRELOAD=<libjvm.so dir>/libjsig.so
  • 79. Memory Exhaustion Out of backing store Address space layout issues
  • 80. Memory Exhaustion OS is out of backing store (RAM or swap space)
  • 81. Memory Exhaustion Address space exhaustion 32-bit JVMs have less than 4GB of address space 32-bit Windows defaults to 2GB! 64-bit platforms Address space layout issues can prevent allocation Most often seen on Solaris
  • 82. Memory Exhaustion Get Rid of OutOfMemoryError Messages [DEV3420] Poonam Parhar Thursday, September 19, 12:15 PM - 01:00 PM | Moscone South - Room 304 Troubleshooting Native Memory Leaks in Java Applications CodeOne 2018 Slides available on-line Poonam’s blog has great related content
  • 84. demo
  • 85. ClassA public class ClassA { public int doSomething(int i1, int i2, int i3) { return i1+i1+i3; } }
  • 86. ClassB public class ClassB { public Integer doSomethingElse(int i1, int i2, int i3) { return new Integer(i1+i1+i3); } }
  • 87. ClassC public class ClassC extends ClassA {}
  • 88. Demo public class Demo { public static void main(String[] args) { ClassA obj = new ClassC(); System.out.println(obj.doSomething(1,2,3)); } }
  • 90. It works… $ java Demo 5 $
  • 91. Let’s do something bad… public class ClassC extends ClassB {}
  • 95. Demo public class Demo { public static void main(String[] args) { ClassA obj = new ClassC(); System.out.println(obj.doSomething(1,2,3)); } }
  • 96. $ java Demo Error: A JNI error has occurred, please check your installation and try again Exception in thread "main" java.lang.VerifyError: Bad type on operand stack Exception Details: Location: Demo.main([Ljava/lang/String;)V @15: invokevirtual Reason: Type 'ClassC' (current frame, stack[1]) is not assignable to 'ClassA' Current Frame: bci: @15 flags: { } locals: { '[Ljava/lang/String;', 'ClassC' } stack: { 'java/io/PrintStream', 'ClassC', integer, integer, integer } Bytecode: 0x0000000: bb00 0259 b700 034c b200 042b 0405 06b6 0x0000010: 0005 b600 06b1 at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.privateGetMethodRecursive(Class.java:3048) at java.lang.Class.getMethod0(Class.java:3018) at java.lang.Class.getMethod(Class.java:1784)
  • 97. As expected, the verifier protects us from ourselves.
  • 98. As expected, the verifier protects us from ourselves. What if we disable it…
  • 99. We reap what we sow [dbuck@dbuck02 demo1]$ java -Xverify:none Demo # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007fa93be7991c, pid=22925, tid=140364857087744 # # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14) # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops) # Problematic frame: # V [libjvm.so+0x46391c] # # Core dump written. Default location: /home/dbuck/BCV_TOI/demo/demo1/core or core.22925 # # An error report file with more information is saved as: # /home/dbuck/BCV_TOI/demo/demo1/hs_err_pid22925.log # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # Aborted (core dumped)
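The header of the crash report above already answers the first triage questions: which signal, which process, and what kind of frame was executing. This is a minimal sketch of pulling those fields out, assuming the POSIX-style header shown on the slide (Windows reports name an exception instead of a signal and would not match); the class name `CrashHeader` and the summary wording are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: summarize the signal and problematic frame of an hs_err header. */
final class CrashHeader {
    private static final Pattern SIGNAL = Pattern.compile(
            "#\\s+(SIG[A-Z0-9]+) \\(0x[0-9a-f]+\\) at pc=(0x[0-9a-f]+), pid=(\\d+)");
    // The frame line follows "# Problematic frame:" and starts with a type letter
    // (J=compiled Java code, j=interpreted, V/v=VM code, C=native code).
    private static final Pattern FRAME = Pattern.compile(
            "# Problematic frame:\\s*#\\s+([A-Za-z])\\s+(.+)");

    static String summarize(String header) {
        Matcher s = SIGNAL.matcher(header);
        Matcher f = FRAME.matcher(header);
        if (!s.find() || !f.find()) {
            return "unrecognized header";
        }
        return s.group(1) + " in pid " + s.group(3)
                + ", frame type " + f.group(1) + ": " + f.group(2).trim();
    }

    public static void main(String[] args) {
        String header =
                "# SIGSEGV (0xb) at pc=0x00007fa93be7991c, pid=22925, tid=140364857087744\n"
                + "#\n"
                + "# Problematic frame:\n"
                + "# V  [libjvm.so+0x46391c]\n";
        System.out.println(summarize(header));
    }
}
```

A `V` frame in libjvm.so, as in this demo, says where the JVM fell over but not why; here the root cause was unverified bytecode, which the header gives no hint of.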
  • 100. Demo Takeaways No obvious evidence that bad bytecode was root cause of crash A class is only valid in the context of previously loaded classes No malicious intent / 3rd party tools used
  • 101. OS Issue (example) Intel keeps adding new SIMD registers Preexisting SIMD registers keep growing 128 bit -> 256 bit -> 512 bit Using these registers helps avoid having to spill to local memory By XMM_registers.png: Jonasmike; derivative work: Racecar56 - XMM_registers.png, Public Domain, https://commons.wikimedia.org/w/index.php?curid=8540155
  • 102. XMM Corruption Linux kernels have not always correctly saved / restored XMM register content on context switch Can lead to virtually random memory corruption Only hint that OS is a factor: recent kernel update Has happened at least 3 times in the past decade Code compiled with newer toolchains depends on XMM much more heavily than in the past
  • 103. JVM Bug Most “obvious” cause of JVM crashes Very hard for end users to identify root cause Often possible to work around many issues
  • 104. Performance / Stability Tradeoff It's easy to make it fast. It's easy to make it correct. It’s almost impossible to do both at the same time.
  • 105. Garbage Collector Complexity Single threaded is simpler than parallel STW (Throughput) is simpler than Concurrent
  • 106. Garbage Collector Complexity Serial Parallel Concurrent Mark and Sweep G1
  • 108. JIT Crashes Can happen anywhere During JIT compilation During execution of JITed code Anywhere else (e.g. during GC of data corrupted by JITed code)
  • 109. JIT Crash During Compilation --------------- T H R E A D --------------- Current thread (0x000000000061e800): JavaThread "C2 CompilerThread1" daemon [_thread_in_vm, id=12, stack(0xfffffd7ef87fe000,0xfffffd7ef88fe000)] Current CompileTask: C2: 15252 9024 b 4 com.sun.crypto.provider.CipherCore::update (609 bytes)
  • 110. JIT Crash During Execution Java Execution Thread (Not JIT compilation thread) Stack: [0xfffffffcc8a00000,0xfffffffcc8b00000], sp=0xfffffffcc8afb780, free space=1005k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) J oracle.j2ee.ws.wsdl.extensions.AbstractSerializer.startMarshall(Ljavax/wsdl/Definition;Loracle/j2ee/ws/wsdl/util/XMLWriter;Ljavax/wsdl/extensions/ExtensibilityElement;)V j oracle.j2ee.ws.wsdl.extensions.addressing.EndpointReferenceSerializer.marshall(Ljava/lang/Class;Ljavax/xml/namespace/QName;Ljavax/wsdl/extensions/ExtensibilityElement;Ljava/io/PrintWriter;Ljavax/wsdl/Definition;Ljavax/wsdl/extensions/ExtensionRegistry;)V+41
  • 112. Other JIT Crashes Compilation events (10 events): Event: 15.131 Thread 0x000000000061e800 nmethod 9019 0xfffffd7fee3ecdd0 code [0xfffffd7fee3ed020, 0xfffffd7fee3eddd0] Event: 15.131 Thread 0x000000000061c000 9020 b 4 javax.crypto.Cipher$Transform::matches (48 bytes) Event: 15.143 Thread 0x000000000061c000 nmethod 9020 0xfffffd7fee3f3210 code [0xfffffd7fee3f3420, 0xfffffd7fee3f4030] Event: 15.143 Thread 0x000000000061e800 9021 b 4 javax.crypto.Cipher::checkOpmode (21 bytes) Event: 15.143 Thread 0x000000000061e800 nmethod 9021 0xfffffd7fee3f2990 code [0xfffffd7fee3f2ae0, 0xfffffd7fee3f2b38] Event: 15.144 Thread 0x000000000061c000 9022 b 4 com.sun.crypto.provider.CipherCore::init (552 bytes) Event: 15.150 Thread 0x000000000061c000 nmethod 9022 0xfffffd7fee3ebc90 code [0xfffffd7fee3ebe80, 0xfffffd7fee3ec5e0] Event: 15.151 Thread 0x0000000000621000 9023 b 2 com.sun.crypto.provider.CipherCore::init (552 bytes) Event: 15.153 Thread 0x0000000000621000 nmethod 9023 0xfffffd7feea17010 code [0xfffffd7feea17460, 0xfffffd7feea18d48] Event: 15.182 Thread 0x000000000061e800 9024 b 4 com.sun.crypto.provider.CipherCore::update (609 bytes)
  • 113. JIT Bug Workarounds Globally disable JIT -Xint (Interpreter only) -XX:TieredStopAtLevel=1 (Interpreter + C1 JIT only) Disable JIT for a particular package / class / method -XX:CompileCommand=exclude,oracle/j2ee/ws/wsdl/extensions/AbstractSerializer,startMarshall
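Turning the method named in a compiled (`J`) frame into the exclude option above is mechanical. This is a minimal sketch that takes a dotted `package.Class.method` name and emits the option in the same class,method form the slide uses; the class name `JitExclude` is hypothetical, and the input is assumed to be already stripped of the frame-type letter and the signature.

```java
/** Sketch: build a -XX:CompileCommand=exclude option for one method. */
final class JitExclude {

    /** qualifiedMethod is e.g. "com.example.Foo.bar" (no signature). */
    static String excludeOption(String qualifiedMethod) {
        int lastDot = qualifiedMethod.lastIndexOf('.');
        if (lastDot <= 0 || lastDot == qualifiedMethod.length() - 1) {
            throw new IllegalArgumentException(
                    "expected package.Class.method, got: " + qualifiedMethod);
        }
        // Class names use '/' separators in this form of the option.
        String className = qualifiedMethod.substring(0, lastDot).replace('.', '/');
        String methodName = qualifiedMethod.substring(lastDot + 1);
        return "-XX:CompileCommand=exclude," + className + "," + methodName;
    }

    public static void main(String[] args) {
        System.out.println(excludeOption(
                "oracle.j2ee.ws.wsdl.extensions.AbstractSerializer.startMarshall"));
    }
}
```

Excluding a single method keeps the rest of the application compiled, which usually costs far less performance than -Xint while the underlying JIT bug is being investigated.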
  • 114. Always Use Up-to-date Runtime 1000s of stability fixes during lifetime of a major release Tremendous effort to avoid regression / incompatibilities in update releases Security vulnerabilities alone should justify staying up to date Risk of known issues / vulnerabilities > Risk of updating
  • 115. Conclusion Most JVM crashes reported can be resolved or worked around by end users Lots of end-user actionable data in hs_err log A quick sanity check of the “usual suspects” can resolve most crash issues
  • 117. Resources Java SE Troubleshooting Guide https://docs.oracle.com/en/java/javase/11/troubleshoot/index.html Poonam’s CodeOne Native Leak slides https://www.slideshare.net/PoonamBajaj5/troubleshooting-native-memory-leaks-in-java-applications Poonam’s blog https://blogs.oracle.com/poonam/