1
How Much Kafka?
The Art and Science of Capacity Planning
2
About Me
● Gwen Shapira
● Principal Data Architect
● At Confluent
● Committer to Apache Kafka
● Tweets. A lot. @gwenshap
3
In Which We Answer Numeric
Questions About Kafka
4
Factors
● Performance Requirements
● Availability Requirements
● Stability concerns
● Organizational concerns
● Operations “comfort zone”
● Kafka limitations
● Costs
● Rate of upgrades
5
Choose 2:
Easy to
Operate
Cheap
Fast
6
Choose 2:
7
It is tradeoffs all the way down
● Retention - disk size
● Throughput - network, CPU
● Producer performance - disk IO
● Consumer performance - CPU, memory
Just performance
requirements!
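The "Retention - disk size" tradeoff can be turned into rough arithmetic. The sketch below is an illustration, not a formula from the talk: it assumes an evenly balanced cluster, a steady write rate, and a hypothetical 30% headroom factor. The function and parameter names are made up for this example.

```python
def disk_per_broker_gb(write_mb_per_sec, retention_hours, replication_factor,
                       brokers, headroom=0.3):
    """Rough disk needed per broker, in GB (1 GB = 1000 MB).

    Assumes data is spread evenly across brokers and that `headroom`
    (a hypothetical safety margin) covers bursts and rebalancing.
    """
    # Everything written during the retention window, times every replica.
    retained_mb = write_mb_per_sec * retention_hours * 3600 * replication_factor
    per_broker_mb = retained_mb / brokers
    return per_broker_mb * (1 + headroom) / 1000


# Example: 50 MB/s of writes, 24 h retention, replication factor 3, 6 brokers.
print(round(disk_per_broker_gb(50, 24, 3, 6)))  # roughly 2.8 TB per broker
```

Doubling retention or replication doubles the disk; doubling the broker count halves it per broker. That is the tradeoff in one line.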
8
How Many Clusters?
9
Separate clusters for...
● DR
● Geographical distribution
● Writing vs Reading
● Real-time vs Batch
● Dev / Test
● High throughput
● Highly reliable
● Security
10
You will have many clusters.
Plan for it.
11
How Many Brokers?
12
Kafka is built to scale horizontally
● Largest cluster: 200+ nodes
● Lots of work on improving the controller in 1.1 and 2.0
● Larger / more loaded brokers mean longer restarts and recovery.
● Larger brokers require tuning to take full advantage.
13
Disk setup depends on version
- Before 1.1: RAID 10 recommended
- 1.1 and up:
  - KIP-112 - Broker will survive loss of a disk
  - KIP-113 - Can assign replicas to specific disks
14
How Many Zookeepers?
15
Right now: 3 or 5
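Why 3 or 5 and not 4? ZooKeeper stays available only while a majority of the ensemble is up, so an even-sized ensemble adds a node without adding fault tolerance. A minimal sketch of that arithmetic (the function name is just for illustration):

```python
def tolerated_failures(ensemble_size):
    """Nodes that can fail while a strict majority is still alive."""
    return (ensemble_size - 1) // 2


for n in (3, 4, 5):
    print(f"{n} nodes -> survives {tolerated_failures(n)} failure(s)")
# 3 nodes -> survives 1 failure(s)
# 4 nodes -> survives 1 failure(s)
# 5 nodes -> survives 2 failure(s)
```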
16
Future: 0
17
How Many Topics?
18
From a capacity perspective,
topics don’t exist.
19
How Many Partitions?
20
The 25K question
1. Read Jun Rao's blog post on the topic
2. More partitions == more scale
3. More partitions == more throughput
4. More partitions != more speed
5. Controller improvements in 1.1 mean more
partitions per broker
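The blog post listed under Resources gives a rule of thumb for a starting point: measure the throughput a single partition can sustain on the producer side (p) and on the consumer side (c), then size for a target throughput t with at least max(t/p, t/c) partitions. A small sketch, with hypothetical names and example numbers that you would replace with your own measurements:

```python
import math


def min_partitions(target_mb_s, producer_mb_s_per_partition,
                   consumer_mb_s_per_partition):
    """Lower bound on partition count: max(t/p, t/c), rounded up."""
    return max(
        math.ceil(target_mb_s / producer_mb_s_per_partition),
        math.ceil(target_mb_s / consumer_mb_s_per_partition),
    )


# Example: target 200 MB/s; one partition measured at 20 MB/s on the
# producer side and 25 MB/s on the consumer side (made-up numbers).
print(min_partitions(200, 20, 25))  # 10
```

This is only a lower bound for throughput. It says nothing about latency, which is the point of "more partitions != more speed".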
21
Do not design for a partition
per user or device!
22
How Much Memory?
23
# of partitions
*
Max Fetch Size
+
Compaction Buffer
+
“Extra”
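The formula above, written out as arithmetic. Mapping "Max Fetch Size" to the broker setting replica.fetch.max.bytes and "Compaction Buffer" to log.cleaner.dedupe.buffer.size is an assumption made for this sketch, as are the example values. "Extra" is whatever you budget for everything else.

```python
MIB = 1024 * 1024


def broker_memory_bytes(partitions, max_fetch_bytes,
                        compaction_buffer_bytes, extra_bytes):
    """# of partitions * max fetch size + compaction buffer + extra."""
    return partitions * max_fetch_bytes + compaction_buffer_bytes + extra_bytes


# Example (made-up numbers): 2000 partitions on the broker, 1 MiB max fetch,
# 128 MiB compaction buffer, 1 GiB of "extra".
total = broker_memory_bytes(2000, 1 * MIB, 128 * MIB, 1024 * MIB)
print(total // MIB, "MiB")  # 3152 MiB
```

The partition term dominates quickly, which is one more reason to keep the partition count per broker in check.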
24
How Many Clients?
25
Lots. 50K has been done.
26
Consumer groups have been tested
at 50+ consumers per group
27
Not all clients are the same
1. Producers have very high throughput
2. Especially when tuned
3. EOS / ordering requires a single writer per entity
4. Many consumer groups is where Kafka shines
28
How Much Throughput?
29
Benchmarks
● Kafka ships with performance tools
● And your fav language has tools
● Your own workload (or similar)
● Your own configuration
● Your own failure scenarios
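One of the performance tools that ships with Kafka is kafka-producer-perf-test.sh. The sketch below only builds the command line so that the parameters can live in version control next to the rest of your configuration; it does not run anything. The helper name, broker address, and topic are hypothetical, and flag names can change between Kafka releases, so check the tool's --help for your version.

```python
def producer_perf_command(bootstrap, topic, num_records, record_size,
                          throughput=-1):
    """Build an argv list for kafka-producer-perf-test.sh.

    throughput=-1 means "do not throttle"; set a positive messages/sec
    value to test at a fixed rate instead.
    """
    return [
        "kafka-producer-perf-test.sh",
        "--topic", topic,
        "--num-records", str(num_records),
        "--record-size", str(record_size),
        "--throughput", str(throughput),
        "--producer-props", f"bootstrap.servers={bootstrap}", "acks=all",
    ]


# Example with placeholder values; record size should match your workload.
cmd = producer_perf_command("broker1:9092", "perf-test", 1_000_000, 1024)
print(" ".join(cmd))
```

Pass the list to subprocess.run on a machine that has the Kafka scripts on its PATH, and rerun it while you inject the failure scenarios you care about.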
30
Tuning
● Don’t fly blind
● Why is it slow?
● Where is the bottleneck?
● Version control for all configuration
● Automate the
“change->test->observe” loop
31
My broker is slow 101
● Are all brokers working?
● Did you saturate network capacity?
● Is CPU utilization high?
● Are you running an old version?
● Do you have HUGE messages?
● Is it really the broker?
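For the network question, a back-of-the-envelope estimate helps before reaching for a profiler. This sketch is an approximation under stated assumptions, not a formula from the talk: leaders and replicas are spread evenly, every consumer group reads all the data, and compression and protocol overhead are ignored. All names are hypothetical.

```python
def broker_network_mb_s(write_mb_s, replication_factor, consumer_groups, brokers):
    """Approximate per-broker (inbound, outbound) traffic in MB/s."""
    # Inbound: producer writes to leaders plus replication into followers.
    inbound = write_mb_s * replication_factor / brokers
    # Outbound: leaders feeding followers plus every consumer group's reads.
    outbound = write_mb_s * (replication_factor - 1 + consumer_groups) / brokers
    return inbound, outbound


# Example: 100 MB/s produced, replication factor 3, 4 consumer groups, 5 brokers.
inbound, outbound = broker_network_mb_s(100, 3, 4, 5)
print(inbound, outbound)  # 60.0 120.0
```

A 1 Gbit/s link carries about 125 MB/s, so in this example the outbound side is close to saturation even though producers only send 100 MB/s to the whole cluster.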
32
Capture as many metrics as possible
Alert on / dashboard a select few
33
Do you enjoy tuning, capacity
planning, monitoring and all that?
If yes - we are hiring.
If not - check out Confluent Cloud
34
Questions?
https://slackpass.io/confluentcommunity
https://www.confluent.io/blog
https://confluent.cloud
@gwenshap
@confluentinc
35
Resources
http://tutorials.jenkov.com/java-performance/jmh.html
http://notes.stephenholiday.com/Kafka.pdf
https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster