Secrets of Performance Tuning Java on Kubernetes - The Article (Part 1)
In the dynamic world of cloud-native development, Java remains a cornerstone. However, running Java efficiently on Kubernetes presents a unique set of challenges.
My talk “Secrets of Performance Tuning Java on Kubernetes” offers a comprehensive guide to overcoming these hurdles through JVM tuning, garbage collector selection, and Kubernetes resource optimization and redistribution, as well as a technique for testing performance tuning configurations in production.
If you prefer to watch it, I suggest one of these two versions:
This article covers the key learnings from that talk.
The Case for Smaller, Safer Containers
While image size matters, security is paramount. Reducing container size by using distroless base images, trimming dependencies, and building custom Java runtimes (e.g., with jlink or GraalVM Native Image) lowers the attack surface and simplifies patching. These measures improve both security and performance in production environments. Since storage is cheap and the bandwidth between your Kubernetes cluster and your container image registry is very fast, image size by itself is rarely the deciding factor; it all comes down to something else.
The real benefits of smaller container images are about security:
Fewer third-party dependencies, which reduces the potential vulnerabilities that can be exploited
Smaller attack surface, by eliminating unnecessary tools
Easier to meet compliance requirements and to audit
Lower risk of misconfiguration
Faster to patch and update
The Spring Boot Docker documentation shows how to trim down container images and how to use the layering system smartly to optimize building and publishing them. But to get to a smaller and safer image, other techniques are required.
The use of jlink or GraalVM can dramatically improve the security of your Java container image. After that, all you need is to ensure that the base OS image is also trimmed down.
I personally recommend using Ubuntu / Debian slim images for stability and security reasons. But if your goal is to reduce image size for storage purposes, Alpine is a great option, as are the "distroless" images out there.
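As an illustration, here is a minimal multi-stage Dockerfile sketch of that combination; the image tags, module list, and paths are assumptions to adapt to your own build (run jdeps against your jar to discover the modules you actually need):

    # Stage 1: build a trimmed Java runtime with jlink (module list is illustrative)
    FROM eclipse-temurin:21 AS jre-builder
    RUN $JAVA_HOME/bin/jlink \
        --add-modules java.base,java.logging,java.naming,java.sql \
        --strip-debug --no-man-pages --no-header-files \
        --output /opt/jre

    # Stage 2: copy only the custom runtime and the application onto a slim base image
    FROM debian:bookworm-slim
    COPY --from=jre-builder /opt/jre /opt/jre
    COPY target/app.jar /app/app.jar
    ENV PATH="/opt/jre/bin:${PATH}"
    WORKDIR /app
    # ENTRYPOINT intentionally omitted here; the startup flags are discussed later in this article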
Lesson: Focus on security, not on container image size. The first is a goal; the second is a consequence.
JVM startup and warmup time can be improved dramatically
GraalVM Native Image can dramatically improve startup and warmup time of Java applications. But there are other ways to improve startup/warmup time performance without introducing a new runtime.
(Image source: leyden-jvmls-2023-08-08.pdf)
Three options, two of which are available today, are:
Class Data Sharing and AppCDS
Class Data Sharing (CDS) in OpenJDK is a set of features that improves JVM startup time and memory footprint by allowing class metadata to be preprocessed and shared across JVM instances. CDS includes the Java SE API class data sharing (classes.jsa) and AppCDS for application-specific classes. CDS can reduce startup time by 30–50% in real-world apps (especially microservices).
To leverage the benefits of CDS, use OpenJDK 11+. For maximum benefits, use OpenJDK 17+.
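A minimal sketch of AppCDS on JDK 13 or newer (the archive and jar names are illustrative):

    # 1. Trial run: record the classes the application loads and dump them to an archive on exit
    java -XX:ArchiveClassesAtExit=app-cds.jsa -jar app.jar

    # 2. Subsequent runs: map the shared archive so class loading and parsing work is skipped at startup
    java -XX:SharedArchiveFile=app-cds.jsa -jar app.jar

Note that since JDK 12 a default CDS archive for the JDK classes ships with the runtime and is used automatically, so even without AppCDS you already get part of the benefit.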
Project Leyden
Project Leyden introduces static image (precomputed JVM and application state) generation to the Java Platform with build-time optimizations to the JVM to reduce startup delays, without compromising Java’s dynamic capabilities or compatibility.
Still in development.
Project CRaC
Project CRaC (Coordinated Restore at Checkpoint) is an OpenJDK project, led by Azul Systems, that enables fast JVM startup by allowing a running Java application to be checkpointed and restored later. CRaC snapshots a live Java process and brings it back to life instantly, perfect for speeding up Java in the cloud.
Project CRaC is still experimental.
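A minimal sketch, assuming a CRaC-enabled JDK build (for example, Azul Zulu with CRaC) and an application prepared for checkpointing; the paths are illustrative:

    # Start the application, telling the JVM where to store the checkpoint image
    java -XX:CRaCCheckpointTo=/opt/crac-data -jar app.jar

    # After warmup, trigger the checkpoint from another terminal (the process is snapshotted and exits)
    jcmd <pid> JDK.checkpoint

    # Later, in a fresh container or VM, restore the warmed-up process almost instantly
    java -XX:CRaCRestoreFrom=/opt/crac-data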
Lesson: You can benefit today from faster startup and warmup times for your JVM workloads without upgrading the JDK (if you are already on 11+) or moving to GraalVM Native Image.
JVM Ergonomics: Understanding the Defaults
Most developers rely on JVM defaults, often blindly. According to my research on Azure data, New Relic surveys, and other surveys published over the years, about 30% or more of production JVM workloads run with default settings (no heap setting, no GC setting, nothing). With default ergonomics in place, the JVM looks at the available CPU and RAM to determine which garbage collector (GC) to use and how much heap to allocate.
The default maximum heap size is 50% of available memory for containers with up to 256 MB of memory (governed by the -XX:MinRAMPercentage flag; not to be confused with -Xms; see this issue). What I have observed is that this is hardly the common case: most Java workloads in containers have 512 MB or more, and there the default drops to 25% of memory (governed by -XX:MaxRAMPercentage). And, to make things even more entertaining, anywhere between 256 MB and 512 MB the JVM ends up with roughly 127 MB of heap.
For instance, a container with 2 GB of RAM and 1 CPU (1,000 millicores in Kubernetes) will default to SerialGC and a heap of 512 MB, hindering performance and wasting resources.
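You can ask the JVM what its ergonomics would pick for a given container shape before deploying anything; a quick check (the image tag is illustrative):

    # Expect SerialGC selected and a max heap of ~512 MB for a 2 GB / 1 CPU container
    docker run --rm --cpus=1 --memory=2g eclipse-temurin:17 \
      java -XX:+PrintFlagsFinal -version | grep -E "MaxHeapSize|UseSerialGC|UseG1GC"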
Lesson: Make sure the heap size is adjusted accordingly, either manually or by a tool/script. Do NOT run a bare CMD ["java", "-jar", "app.jar"]
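Instead, make the sizing explicit. A hedged sketch of a Dockerfile start command (the percentage and GC choice are illustrative starting points, not universal recommendations):

    # Size the heap relative to the container memory limit and choose the GC deliberately
    ENTRYPOINT ["java", "-XX:MaxRAMPercentage=75.0", "-XX:+UseG1GC", "-jar", "app.jar"]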
JVM Native Memory Usage: It’s Not Just About the Heap
When tuning Java applications, most developers focus on heap size, using flags like -Xmx and -Xms. But this overlooks a critical area of memory: native (non-heap) memory. The JVM uses native memory for several essential operations, and if not accounted for, this can lead to out-of-memory (OOM) errors, even when the heap is properly sized.
The most notable native memory component is Metaspace, which holds class metadata such as method tables and class definitions. Unlike the old PermGen space, Metaspace resides outside the heap and grows as needed. The JVM can clean up Metaspace when classloaders are unloaded, but it may be wise to cap it using flags like:
-XX:MetaspaceSize (initial allocation)
-XX:MaxMetaspaceSize (maximum allowed size)
Beyond Metaspace, native memory is also consumed by:
The JVM process itself, including internal structures and threads
Direct buffers, commonly used by NIO, Netty, and high-performance frameworks to store data off-heap
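Native Memory Tracking (NMT) is the built-in way to see where this non-heap memory actually goes; a minimal sketch (the Metaspace cap shown is an illustrative value):

    # Start the JVM with Native Memory Tracking enabled, plus an explicit Metaspace cap
    java -XX:NativeMemoryTracking=summary -XX:MaxMetaspaceSize=256m -jar app.jar

    # While it runs, print a breakdown of heap, Metaspace, threads, code cache, and other native areas
    jcmd <pid> VM.native_memory summary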
Here's a critical insight: whether you use a large container or a small container (and adjust your heap size accordingly), your application is likely to need roughly the same amount of non-heap memory. That means shrinking the heap doesn't shrink your native memory requirements at the same ratio. If you oversize the heap in a small container, native memory won't have room to breathe, increasing the risk of crashes.
Keep this in mind when horizontally scaling your Java workload containers: every replica carries the same off-heap memory overhead. This means that 10 small containers (with a 1x memory limit each) will require more total memory than 5 larger containers (with a 2x memory limit each) to deliver the same usable heap. For example, if each JVM needs roughly 300 MB of non-heap memory, 10 replicas spend about 3 GB on native memory alone, while 5 larger replicas spend only about 1.5 GB, leaving the difference available for heap.
Lesson: Consider larger containers (in terms of memory limit) with fewer replicas, rather than smaller containers with more replicas. This reserves enough room for native memory consumers and reduces the risk of unexpected OOM errors, with less resource waste. Alternatively: remove the memory limit from the container and set the heap size manually.
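As a sketch of that alternative (the heap value and image name are illustrative): run the container without a memory limit and pin the heap explicitly, for example through JAVA_TOOL_OPTIONS, so native memory can use whatever it genuinely needs:

    # No --memory limit on the container; the heap is fixed, native memory grows as needed
    docker run -e JAVA_TOOL_OPTIONS="-Xmx1g" my-app:latest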
What About Kubernetes?
Kubernetes is a powerful orchestrator, but it introduces challenges for Java applications, particularly around CPU throttling, memory limits, and autoscaling. Understanding how the JVM behaves under Kubernetes constraints is key to achieving performance and reliability.
Part 2 will cover the details of Kubernetes.
Stay tuned, and comment on what you liked about this article!