From Message to
Cluster
A Realworld Introduction to Kafka Capacity Planning.
Jason “Jase” Bell - @jasonbelldata
https://digitalis.io
MeetupCat is my spirit animal.
Flight Mode is ON! You may……
• Heckle.
• Ask Questions.
• Heckle More.
• Talk about steak.
• Heckle again.
What I’m Going To
Cover
• The Old Days.
• The Now Times.
• The Stuff We Don’t Talk About
• The Message
• What I Usually Ask For
• Retention
• Estimated Capacity
• Compression
• Stress Testing
• Network and Disk Throughput
• Topic Partitions
• Kafka Connect
• KSQL
• Replicator
• Parting Thoughts…..
• ———————————————————
• Rapturous Applause
• Encore (Probably Eye of the Tiger……)
The Old Days
The Now Times
The Stuff We Don’t
Talk About
We think we know what
we need from our Kafka
Cluster
The Message
{
"text": "RT @PostGradProblem: In preparation for the NFL lockout, I will be spending twice as much time analyzing my fantasy baseball
team during ...",
"truncated": true,
"in_reply_to_user_id": null,
"in_reply_to_status_id": null,
"favorited": false,
"source": "<a href="http://twitter.com/" rel="nofollow">Twitter for iPhone</a>",
"in_reply_to_screen_name": null,
"in_reply_to_status_id_str": null,
"id_str": "54691802283900928",
"entities": {
"user_mentions": [
{
"indices": [
3,
19
],
"screen_name": "PostGradProblem",
"id_str": "271572434",
"name": "PostGradProblems",
"id": 271572434
}
],
"urls": [ ],
"hashtags": [ ]
},
"contributors": null,
"retweeted": false,
"in_reply_to_user_id_str": null,
"place": null,
"retweet_count": 4,
"created_at": "Sun Apr 03 23:48:36 +0000 2011",
"retweeted_status": {
"text": "In preparation for the NFL lockout, I will be spending twice as much time analyzing my fantasy baseball team during
company time. #PGP",
"truncated": false,
"in_reply_to_user_id": null,
"in_reply_to_status_id": null,
"favorited": false,
"source": "<a href="http://www.hootsuite.com" rel="nofollow">HootSuite</a>",
"in_reply_to_screen_name": null,
"in_reply_to_status_id_str": null,
"id_str": "54640519019642881",
"entities": {
"user_mentions": [ ],
"urls": [ ],
"hashtags": [
Twitter JSON payload: ~6 KB
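One way to arrive at an average message size is to serialise a sample of real messages and measure the bytes. The following is a minimal sketch, assuming JSON messages encoded as UTF-8. The function name `average_message_size` and the sample records are made up for illustration and are not from the talk.

```python
import json


def average_message_size(messages):
    """Return the mean serialised size, in bytes, of a list of dict messages."""
    if not messages:
        raise ValueError("need at least one sample message")
    # Measure the UTF-8 encoded JSON, since bytes are what the broker stores.
    sizes = [len(json.dumps(m).encode("utf-8")) for m in messages]
    return sum(sizes) / len(sizes)


# Hypothetical samples; in practice, pull a few thousand real messages.
samples = [
    {"id_str": "1", "text": "hello", "truncated": False},
    {"id_str": "2", "text": "a slightly longer message body", "truncated": True},
]
avg_bytes = average_message_size(samples)
print(f"average size: {avg_bytes:.0f} bytes ({avg_bytes / 1024:.2f} KB)")
```

Averages hide outliers, so it may be worth recording the largest sample as well.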
What I Usually Ask
For
What I’ll Ask Team For…
•Average Message Size - (6 KB)
•Estimated Daily Quantity - (10,000,000/d)
•Any Peak Per Hour Quantity - (1,250,000)
•Desired Replication Factor - (4)
•Desired Partitions - (10)
•Minimum In-sync Replicas - (2)
Estimated Capacity
(Message size x 3) x Daily Qty
x 1.4 (add 40%)
= Volume per replicated broker.
Estimated Capacity
(6 KB x 3) x 10,000,000 = 180,000,000 KB
x 1.4 (add 40%)
= 252,000,000 KB
= 240.33 GB (at 1,024 KB per MB and 1,024 MB per GB)
Roughly translates to 2.85 MB/sec
Estimated Capacity
The x3 gives me a payload size with key,
header, timestamp and the value. It’s just a
rough calculation.
Adding 40% overhead will give you some
breathing space when someone does a
stress test and doesn’t tell you…..
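The rough calculation above can be put into a small script so the inputs are easy to change. This is a sketch only: the function name `daily_volume_kb` and the constant names are illustrative, and it assumes binary units (1,024 KB per MB, 1,024 MB per GB).

```python
KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_kb(avg_message_kb, daily_messages, payload_factor=3, headroom=1.4):
    """(Message size x 3) x Daily Qty x 1.4, in KB per day."""
    return avg_message_kb * payload_factor * daily_messages * headroom


# Inputs from the "What I'll Ask Team For" slide.
volume_kb = daily_volume_kb(avg_message_kb=6, daily_messages=10_000_000)
volume_gb = volume_kb / KB_PER_GB
# Average rate across the whole day; peaks will be higher than this.
mb_per_sec = (volume_kb / KB_PER_MB) / SECONDS_PER_DAY

print(f"daily volume: {volume_kb:,.0f} KB = {volume_gb:.2f} GB")
print(f"average rate: {mb_per_sec:.2f} MB/sec")
```

Keeping the x3 and the 40% as parameters makes it easy to see how sensitive the estimate is to either guess.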
Retention
(6 KB x 3) x 10,000,000 = 180,000,000 KB
x 1.4 (add 40%)
= 252,000,000 KB
= 240.33 GB
240.33 GB/day x 14 days retention
= 3.3 TB per broker.
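The retention step is a single multiplication, sketched here together with the equivalent value in milliseconds. `retention.ms` is Kafka's topic-level setting for time-based retention; whether the cluster in the talk was configured that way is an assumption. The function names are illustrative.

```python
def retention_storage_gb(daily_gb, retention_days):
    """Disk needed to hold `retention_days` worth of daily volume."""
    return daily_gb * retention_days


def retention_ms(retention_days):
    """Retention period in milliseconds, the unit used by retention.ms."""
    return retention_days * 24 * 60 * 60 * 1000


# Daily volume from the capacity formula, in GB (1 GB = 1,024 x 1,024 KB here).
daily_gb = 6 * 3 * 10_000_000 * 1.4 / (1024 * 1024)
total_gb = retention_storage_gb(daily_gb, 14)

print(f"{total_gb:,.0f} GB = {total_gb / 1024:.2f} TB per broker")
print(f"retention.ms={retention_ms(14)}")
```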
Estimated Capacity
df -h is your friend…..
du -H . is also your friend…..
Compression
Producer configuration compression.type defaults to “none”.
Options are gzip, snappy, lz4 and zstd.
Expect ~20%-40% message compression depending on the algorithm used.
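To get a feel for what compression might save on your own data, you can compress a batch of sample messages offline. This sketch uses Python's standard `gzip` module as a stand-in; it is not Kafka's code path, and the function name `compression_saving` and the sample batch are hypothetical. Kafka producers compress whole record batches, so a batch rather than a single message is the unit to measure.

```python
import gzip
import json


def compression_saving(batch):
    """Fraction of bytes saved by gzip-compressing a batch of dict messages."""
    raw = b"".join(json.dumps(m).encode("utf-8") for m in batch)
    compressed = gzip.compress(raw)
    return 1 - len(compressed) / len(raw)


# Hypothetical, highly repetitive batch; real traffic will compress less well.
batch = [
    {"id_str": str(i), "text": "In preparation for the NFL lockout",
     "truncated": False, "favorited": False}
    for i in range(100)
]
saving = compression_saving(batch)
print(f"gzip saved {saving:.0%} on this batch")
```

Synthetic data like this flatters the result, so run it against captured production messages before adjusting any capacity figure.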
Stress Testing
kafka-producer-perf-test --topic TOPIC --record-size SIZE_IN_BYTES
$ bin/kafka-producer-perf-test --topic testtopic --record-size 1000 --num-records 10000 --throughput 1000 --producer-props bootstrap.servers=localhost:9092
5003 records sent, 1000.4 records/sec (0.95 MB/sec), 1.6 ms avg latency, 182.0 ms max latency.
10000 records sent, 998.801438 records/sec (0.95 MB/sec), 1.12 ms avg latency, 182.00 ms max latency, 1 ms 50th, 2 ms 95th, 19 ms 99th, 23 ms 99.9th.
kafka-consumer-perf-test --broker-list host1:port1,host2:port2 --topic TOPIC
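The planning inputs can be turned into producer test parameters directly. This sketch builds a command line using only the flags shown above. The function name `producer_perf_command` and the 60 second test duration are assumptions for illustration.

```python
import shlex


def producer_perf_command(topic, record_size_bytes, peak_per_hour,
                          duration_seconds, bootstrap_servers):
    """Build a kafka-producer-perf-test command that replays the peak hourly rate."""
    throughput = round(peak_per_hour / 3600)     # records per second at peak
    num_records = throughput * duration_seconds  # enough records for the whole run
    args = [
        "kafka-producer-perf-test",
        "--topic", topic,
        "--record-size", str(record_size_bytes),
        "--num-records", str(num_records),
        "--throughput", str(throughput),
        "--producer-props", f"bootstrap.servers={bootstrap_servers}",
    ]
    return shlex.join(args)


# 6 KB messages and 1,250,000 messages in the peak hour, from the planning inputs.
cmd = producer_perf_command("testtopic", 6 * 1024, 1_250_000, 60, "localhost:9092")
print(cmd)
```

Run the generated command against a test topic, not a production one, and tell the team before you do.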
Network and Disk
Throughput
• D - Data to be written (MB/sec)
• R - Replication Factor
• C - Number of Consumer Groups (readers for each write)
The Volume of Writes: (D * R)
Reads happen internally by the replicas, this gives us:
The Volume of Reads within Replication: ((R - 1) * D)
Adding the consumers we end up with:
The Total Volume of Reads: ((R + C - 1) * D)
We have memory! We have Caching!
M/(D * R) = seconds of writes cached, where M is the memory available for caching (MB).
We have to assume that consumers might drop from the cache, that consumers are running
slower than expected, or even that replicas might restart due to failure, patching or
rolling restarts.
Lagging Readers L = R + C - 1
Disk Throughput: D * R + L * D
Network (reads) Throughput: ((R + C -1) * D)
Network (writes) Throughput: D * R
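The formulas above translate directly into code. This is a minimal sketch: the function names are illustrative, and the example values for D, C and M are assumptions chosen for round numbers (only the replication factor of 4 comes from the planning inputs).

```python
def throughput_estimates(d_mb_per_sec, replication_factor, consumer_groups):
    """Apply the disk and network formulas for D, R and C (all results in MB/sec)."""
    d, r, c = d_mb_per_sec, replication_factor, consumer_groups
    lagging_readers = r + c - 1              # L = R + C - 1
    return {
        "network_writes": d * r,             # D * R
        "network_reads": (r + c - 1) * d,    # (R + C - 1) * D
        "disk": d * r + lagging_readers * d, # D * R + L * D
    }


def seconds_of_writes_cached(memory_mb, d_mb_per_sec, replication_factor):
    """M / (D * R): how long writes stay in cache before they are pushed out."""
    return memory_mb / (d_mb_per_sec * replication_factor)


# Assumed example: D = 3 MB/sec, R = 4, C = 2, and M = 32,768 MB of cache.
estimates = throughput_estimates(3.0, 4, 2)
cached = seconds_of_writes_cached(32 * 1024, 3.0, 4)

for name, value in estimates.items():
    print(f"{name}: {value:.1f} MB/sec")
print(f"seconds of writes cached: {cached:.0f}")
```

The disk figure is the pessimistic case in which every reader has fallen out of cache, which is the case worth planning for.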
Topic Partitions
You can set partitions either when creating
the topic (--partitions n) or afterwards (the count can only be increased).
Having a large number of partitions will have effects on ZooKeeper znodes.
• More network requests
• If a leader or broker goes down it may affect startup
time as the broker returns to the cluster.
You cannot reduce the partition count of an existing topic. If you need fewer partitions, create a new topic with the lower partition count.
Kafka Connect
The latency trap…..
Think about the second- and third-order
consequences if a connector fails.
What is the impact?
KSQL
ksqlDB Baseline Server Requirements
•4 CPU Cores
•32GB RAM
•100GB SSD Disk
•1Gbit Network
ksqlDB Default Outbound Topic Assumptions
•Partition Count of 4
•Replication Factor of 1
(These settings can be modified within your CREATE query)
Some queries will require repartitioning
and intermediate topics for certain
operations, taking all available records.
Processing Small Message/Many Columns
= CPU Saturation
Processing Large Message/Few Columns
= Network Saturation
Replicator
Data Centre to Data Centre is going to lead to increased network latency.
On producers and consumers, use send.buffer.bytes and receive.buffer.bytes.
On brokers, use socket.send.buffer.bytes and socket.receive.buffer.bytes. 
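A common rule of thumb for sizing socket buffers on a high-latency link is the bandwidth-delay product: the number of bytes that can be in flight at once. The talk does not prescribe this method, so treat the sketch below as one possible starting point. The function name and the 1 Gbit/sec, 50 ms example link are assumptions.

```python
def bandwidth_delay_product_bytes(bandwidth_mbit_per_sec, round_trip_ms):
    """Bytes in flight on a link: bandwidth x round-trip time, converted to bytes."""
    # Integer arithmetic: Mbit/sec -> bit/sec, ms -> sec (/1000), bits -> bytes (/8).
    return bandwidth_mbit_per_sec * 1_000_000 * round_trip_ms // (8 * 1000)


# Assumed link between data centres: 1 Gbit/sec with a 50 ms round trip.
buffer_bytes = bandwidth_delay_product_bytes(1000, 50)

# The setting names are the ones listed above; the value is only a first guess.
for setting in ("send.buffer.bytes", "receive.buffer.bytes",
                "socket.send.buffer.bytes", "socket.receive.buffer.bytes"):
    print(f"{setting}={buffer_bytes}")
```

The operating system may cap socket buffer sizes below the requested value, so check the kernel limits and measure before and after.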
Parting Thoughts
1. Consumer Group Lag Reports are your guiding light.
(If you have Rundeck, set up a scheduled job to email
you the log output.)
kafka-consumer-groups --bootstrap-server BROKER_ADDRESS --describe --group CONSUMER_GROUP --new-consumer
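If the lag report is going to be emailed on a schedule, a short script can boil it down to a total per topic. This sketch assumes the describe output is a whitespace-separated table whose header row contains TOPIC and LAG columns. The exact column layout varies between Kafka versions, and the sample text below is made up for illustration.

```python
def total_lag_by_topic(describe_output):
    """Sum the LAG column per topic from kafka-consumer-groups --describe output."""
    rows = [line.split() for line in describe_output.splitlines() if line.strip()]
    # Find the header row instead of assuming it is the first line.
    header_index = next((i for i, cols in enumerate(rows) if "LAG" in cols), None)
    if header_index is None:
        raise ValueError("no header row containing LAG was found")
    header = rows[header_index]
    topic_col, lag_col = header.index("TOPIC"), header.index("LAG")

    totals = {}
    for cols in rows[header_index + 1:]:
        # Skip short rows and rows where lag is not a number (for example "-").
        if len(cols) <= max(topic_col, lag_col) or not cols[lag_col].isdigit():
            continue
        totals[cols[topic_col]] = totals.get(cols[topic_col], 0) + int(cols[lag_col])
    return totals


# Hypothetical sample output, not captured from a real cluster.
SAMPLE = """\
GROUP     TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST       CLIENT-ID
my-group  testtopic  0          1500            1520            20   consumer-1   /10.0.0.1  client-1
my-group  testtopic  1          1490            1500            10   consumer-2   /10.0.0.2  client-2
"""
lag = total_lag_by_topic(SAMPLE)
print(lag)
```

Looking the columns up by header name, not by position, keeps the script working when the tool adds or reorders columns.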
2. Kafka is about trade-offs, from the producer right the
way through to the consumer (and beyond).
There’s no right or wrong answer, just
experimentation, monitoring and learning.
3. While securing Kafka is important, there is also a
cost, as verifying certificates takes up CPU
resources.
Your throughput will be affected.
4. The Kafka Ecosystem has increased in features over
the last few years. This has led to increased topic
and disk space usage that needs to be factored into
capacity planning calculations.
"Can you create me a topic please?”
Thank you.
Many thanks to Shay and David for organising, everyone who attended and sent
kind wishes. Lastly, a huge thank you to MeetupCat.
Photo supplied by @jbfletch_
