What to Do if Your Kafka Streams App Gets OOMKilled?
Andrey Serebryanskiy
Streaming Platform Owner at Raiffeisen Bank
The problem
[Diagram: Kafka, a Kafka Streams app with RocksDB, Kubernetes]
What is the problem with this app?
Launch with resource limits
helm/templates/deployment.yaml
...
resources:
  limits:
    memory: 256Mi
  requests:
    memory: 128Mi
command:
  - java
args:
  - -jar
  - app.jar
...
How to check if it is OOMKilled?
kubectl describe pod your-pod-name -n your-namespace
Name: your-pod-name
...
Containers:
  app:
    ...
    Last State: Terminated
      Reason: OOMKilled
      Exit Code: 137
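Exit code 137 follows a general convention: a process killed by a signal exits with 128 plus the signal number, and 137 - 128 = 9 is SIGKILL, the signal the kernel OOM killer sends. A minimal Java sketch of that arithmetic follows; the class and method names are illustrative and not part of any Kubernetes API.

```java
// Illustrative helper: decode a container exit code into the signal that killed it.
final class ExitCodes {
    private static final int SIGNAL_BASE = 128; // exit code = 128 + signal number
    static final int SIGKILL = 9;

    /** Returns the signal number, or -1 if the code is an ordinary exit status. */
    static int signalOf(int exitCode) {
        return exitCode > SIGNAL_BASE ? exitCode - SIGNAL_BASE : -1;
    }

    /** True when the exit code is the one a SIGKILL produces. */
    static boolean killedBySigkill(int exitCode) {
        return signalOf(exitCode) == SIGKILL;
    }

    public static void main(String[] args) {
        System.out.println(signalOf(137));        // 9
        System.out.println(killedBySigkill(137)); // true
    }
}
```

Exit code 137 alone only says the process received SIGKILL; it is the Reason: OOMKilled field that ties the kill to the memory limit.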
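Example app
Simple Kafka Streams topology
Application.java
public static void main(String[] args) {
    var builder = new StreamsBuilder();
    var stream = builder.stream(INPUT_TOPIC);
    // Register a persistent (RocksDB-backed) key-value store.
    // String serdes are an assumption; the slide omits them.
    builder.addStateStore(Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore(STATE_STORE_NAME),
        Serdes.String(), Serdes.String()));
    // Shorthand for a ValueTransformerWithKey that obtains stateStore in init()
    // and writes every record to it.
    var persistedStream = stream.transformValues((readOnlyKey, val) -> {
        ...
        stateStore.put(readOnlyKey, val);
        ...
    }, STATE_STORE_NAME);
    persistedStream.foreach((key, val) -> logMessageCount());
    var topology = builder.build();
    // properties: the application's Kafka Streams configuration
    var kafkaStreams = new KafkaStreams(topology, properties);
    runApp(kafkaStreams);
}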
Please find full application code here:
https://github.com/a-serebryanskiy/kafka-streams-oom-killed
So your app got OOMKilled
Let’s add heap limits
helm/templates/deployment.yaml
...
resources:
  limits:
    memory: 256Mi
  requests:
    memory: 128Mi
command:
  - java
args:
  - -XshowSettings:VM
  - -XX:MinRAMPercentage=50.0
  - -jar
  - app.jar
...
More on the RAM percentage flags:
https://www.baeldung.com/java-jvm-parameters-rampercentage
JVM startup output:
VM settings:
    Max. Heap Size (Estimated): 121.81M
Property settings:
    java.version = 11.0.12
It turns out that 50.0 is already the default value of MinRAMPercentage, so the flag changes nothing.
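The effective heap ceiling can also be checked from inside the application rather than from the startup log. The sketch below uses only the standard Runtime API; the class name is illustrative, and the reported value may differ slightly from the -XshowSettings estimate.

```java
// Illustrative check of the heap ceiling the running JVM has settled on.
final class HeapInfo {
    /** Maximum heap the JVM will attempt to use, in MiB. */
    static double maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf("Max heap: %.2f MiB%n", maxHeapMiB());
    }
}
```

Logging this once at startup makes it easy to compare the heap ceiling with the container limit on the same dashboard.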
App memory performance
[Grafana chart: container memory limit, container memory usage, JVM memory; scale mark: 100 MB]
Kafka Streams memory usage
Kafka Streams app memory
JVM Heap + RocksDB
Confluent article about Kafka Streams memory tuning:
https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html
JVM Heap:
- Kafka Streams app
- CachingKeyValueStore
- ThreadCache context.cache()
Not controlled by JVM (State Store - RocksDB):
- indexes
- bloom filters
- block cache
- OS page cache
- memtable
! The size of these items depends on the number of unique keys
Call path: put(key, val) in the Kafka Streams app, then native put(key, val) in the RocksDB state store
RocksDB memory details: https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
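How to fix unbounded RocksDB memory usage?
Application.java
properties.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
    "your.package.BoundedMemoryRocksDBConfig");
your.package.BoundedMemoryRocksDBConfig.java
@Override
public void setConfig(..., Options options, ...) {
    BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
    // Cap the block cache at the computed RocksDB budget
    Cache cache = new LRUCache(computeTotalRocksDbMem(), -1, false);
    tableConfig.setBlockCache(cache);
    // Charge index and bloom filter blocks to the block cache
    tableConfig.setCacheIndexAndFilterBlocks(true);
    // Bound memtables too; writeBufferManager is assumed to be created elsewhere
    options.setWriteBufferManager(writeBufferManager);
    // Re-apply the table config so the changes take effect
    options.setTableFormatConfig(tableConfig);
    ...
}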
Pay attention to the number of stores and partitions:
https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html#rocksdb
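The reason the number of stores and partitions matters: Kafka Streams opens a separate RocksDB instance per state store per assigned partition, and the config setter runs for each of them. If every instance gets its own cache, the per-instance capacity has to be a share of the total budget. A rough sketch of that split, assuming an even division (the class and method names are hypothetical):

```java
// Illustrative arithmetic: share of a total RocksDB budget per store instance.
final class RocksDbBudget {
    /**
     * Splits a total off-heap budget evenly across RocksDB instances,
     * assuming one instance per state store per assigned partition.
     */
    static long perInstanceBytes(long totalBytes, int stores, int partitionsPerStore) {
        if (stores <= 0 || partitionsPerStore <= 0) {
            throw new IllegalArgumentException("stores and partitions must be positive");
        }
        return totalBytes / ((long) stores * partitionsPerStore);
    }

    public static void main(String[] args) {
        long total = 64L * 1024 * 1024; // hypothetical 64 MiB budget
        System.out.println(perInstanceBytes(total, 2, 4)); // 8388608
    }
}
```

The alternative described in the Confluent guide linked above is to share one static cache across all instances, which avoids the division altogether.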
How to compute my RocksDB memory?
Dynamic memory allocation
your.package.BoundedMemoryRocksDBConfig.java
private static long computeTotalRocksDbMem() {
    long totalContainerMemoryBytes = Long.parseLong(System.getenv("CONTAINER_MEMORY_LIMIT"));
    double osPercentage = Double.parseDouble(System.getenv("OS_MEMORY_PERCENTAGE"));
    long offHeapSizeMb = Long.parseLong(System.getenv("OFF_HEAP_SIZE_MB"));
    long maxHeapSizeMb = Long.parseLong(System.getenv("MAX_HEAP_SIZE_MB"));
    // Memory owned by the JVM itself: heap plus JVM off-heap areas
    long jvmMemoryBytes = (offHeapSizeMb + maxHeapSizeMb) * 1024 * 1024;
    // RocksDB gets what remains after the OS share and the JVM share
    return (long) (totalContainerMemoryBytes * (1 - osPercentage)) - jvmMemoryBytes;
}
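To make the formula concrete, here is a worked example with the values used in this deck: a 256Mi container limit, 10% reserved for the OS, 128 MB of JVM off-heap memory and a 64 MB heap. The sketch takes the environment as a map so it can run outside a container; the class name and the guard against a non-positive budget are additions for illustration.

```java
import java.util.Map;

// Worked example of the RocksDB budget formula with the deck's values.
final class RocksDbMemoryCalc {
    private static final long MIB = 1024L * 1024L;

    static long computeTotalRocksDbMem(Map<String, String> env) {
        long totalContainerMemoryBytes = Long.parseLong(env.get("CONTAINER_MEMORY_LIMIT"));
        double osPercentage = Double.parseDouble(env.get("OS_MEMORY_PERCENTAGE"));
        long offHeapSizeMb = Long.parseLong(env.get("OFF_HEAP_SIZE_MB"));
        long maxHeapSizeMb = Long.parseLong(env.get("MAX_HEAP_SIZE_MB"));

        long jvmMemoryBytes = (offHeapSizeMb + maxHeapSizeMb) * MIB;
        long budget = (long) (totalContainerMemoryBytes * (1 - osPercentage)) - jvmMemoryBytes;
        if (budget <= 0) {
            // Illustrative guard: heap and off-heap settings leave no room for RocksDB.
            throw new IllegalStateException("No memory left for RocksDB: " + budget);
        }
        return budget;
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of(
            "CONTAINER_MEMORY_LIMIT", String.valueOf(256 * MIB),
            "OS_MEMORY_PERCENTAGE", "0.1",
            "OFF_HEAP_SIZE_MB", "128",
            "MAX_HEAP_SIZE_MB", "64");
        System.out.println(computeTotalRocksDbMem(env)); // 40265318
    }
}
```

With these numbers RocksDB is left with roughly 38.4 MiB (256 × 0.9 − 192), which shows how tight a 256Mi limit is once the JVM has taken its share.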
About using container props as env vars:
https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
helm/templates/deployment.yaml
...
env:
  - name: CONTAINER_MEMORY_LIMIT
    valueFrom:
      resourceFieldRef:
        containerName: app
        resource: limits.memory
...
helm/templates/deployment.yaml
env:
  - name: OS_MEMORY_PERCENTAGE
    value: "0.1"
  # computed from the jcmd output
  - name: OFF_HEAP_SIZE_MB
    value: "128"
  - name: MAX_HEAP_SIZE_MB
    value: "{{ .Values.heapSizeMb }}"
If you would like to analyze non-heap JVM memory
1. Make sure you use a JDK (not JRE) Docker image, since jcmd ships with the JDK.
2. Add -XX:NativeMemoryTracking=summary to the JVM args:
helm/templates/deployment.yaml
command:
  - java
args:
  - -XX:NativeMemoryTracking=summary
  - -jar
  - app.jar
3. Execute the command in the container shell:
bash
kubectl exec pod/your-pod-name -n your-namespace -it -- /bin/bash -c "jcmd 1 VM.native_memory"
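When jcmd is not available, a coarser view can be obtained from inside the application through the standard MemoryMXBean. This is only a partial substitute: the non-heap figure it reports covers JVM-managed pools such as metaspace and the code cache, not everything Native Memory Tracking accounts for, and it never includes RocksDB. The class name below is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Illustrative in-process report of committed heap and non-heap memory.
final class NonHeapReport {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Bytes committed to JVM-managed non-heap pools (metaspace, code cache, ...). */
    static long nonHeapCommittedBytes() {
        return MEMORY.getNonHeapMemoryUsage().getCommitted();
    }

    /** Bytes currently committed to the Java heap. */
    static long heapCommittedBytes() {
        return MEMORY.getHeapMemoryUsage().getCommitted();
    }

    public static void main(String[] args) {
        System.out.println("heap committed bytes:     " + heapCommittedBytes());
        System.out.println("non-heap committed bytes: " + nonHeapCommittedBytes());
    }
}
```

The numbers are a lower bound for the off-heap setting; the jcmd output remains the more complete source.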
The same Helm value feeds both MAX_HEAP_SIZE_MB and -Xmx, so the two cannot drift apart:
helm/templates/deployment.yaml
env:
  - name: OFF_HEAP_SIZE_MB
    value: "128"
  - name: MAX_HEAP_SIZE_MB
    value: "{{ .Values.heapSizeMb }}"
args:
  - -Xmx{{ .Values.heapSizeMb }}m
  - -jar
  - app.jar
helm/values.yaml
heapSizeMb: 64
If you would like to profile your app
1) helm/templates/deployment.yaml
command:
  - java
args:
  - -Dcom.sun.management.jmxremote
  - -Dcom.sun.management.jmxremote.port=13089
  - -Dcom.sun.management.jmxremote.ssl=false
  - -Dcom.sun.management.jmxremote.local.only=false
  - -Dcom.sun.management.jmxremote.authenticate=false
  - -Dcom.sun.management.jmxremote.rmi.port=13089
  - -Djava.rmi.server.hostname=localhost
  - -jar
  - app.jar
ports:
  - containerPort: 13089
    name: jmx
    protocol: TCP
2) kubectl port-forward pod/your-pod-name -n your-namespace 13089:13089
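With the port forwarded, a profiler can attach to localhost:13089. The same endpoint can also be read programmatically; the sketch below is one possible client using the standard javax.management.remote API, assuming the unauthenticated, non-SSL settings shown above. The class and method names are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Illustrative JMX client that reads heap usage through the forwarded port.
final class JmxHeapProbe {
    /** Builds the standard RMI-based JMX service URL for host:port. */
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad JMX address " + host + ":" + port, e);
        }
    }

    /** Connects to the remote JVM and returns its used heap in bytes. */
    static long remoteHeapUsedBytes(String host, int port) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            return memory.getHeapMemoryUsage().getUsed();
        }
    }

    public static void main(String[] args) throws Exception {
        // Requires the kubectl port-forward from step 2 to be running.
        System.out.println(remoteHeapUsedBytes("localhost", 13089));
    }
}
```

Setting java.rmi.server.hostname=localhost in the pod is what lets the RMI stub resolve back through the forwarded port on the developer machine.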
App memory performance (after fix)
[Grafana chart; image not preserved]
Limitations are not the only way!
Links and materials
• How the JVM analyzes memory inside a Docker container:
https://merikan.com/2019/04/jvm-in-a-container/#java-10
• How to use jcmd to analyze non-heap memory:
https://www.baeldung.com/native-memory-tracking-in-jvm
• Why use container_memory_working_set_bytes instead of container_memory_usage_bytes:
https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-part-3-container-resource-metrics-361c5ee46e6
• How the RocksDB store works with Kafka Streams:
https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/
• Linux memory controller for cgroups:
https://lwn.net/Articles/432224/
Thank you!
https://t.me/aserebryanskiy
a.serebrianskiy@gmail.com
