@allenxwang
MultiCluster, MultiTenant and
Hierarchical Kafka Messaging Service
Allen Wang
Growing Pains for A Kafka Cluster
● A dozen brokers, a handful of topics, tens of partitions
○ Wonderful!
● Tens of brokers, tens of topics, hundreds of
partitions
○ Life is good!
● A hundred brokers, a hundred topics, thousands of
partitions
○ … OK
● Hundreds of brokers, hundreds of topics, one
hundred thousand partitions
○ ???
Why a Huge Kafka Cluster Does Not Work
● Significant time increase on operations
○ Rolling restart/binary update
■ Three minutes per broker, 500 brokers = 1 whole day
○ Rolling AMI (image) update with data copying
■ One hour per broker, 500 brokers = 20 days
● Increased latency due to number of partitions
○ https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
● Vulnerability to ZK/Controller failures
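The per-broker figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming brokers are updated strictly one at a time; the class and method names are illustrative and do not come from the talk.

```java
import java.time.Duration;

final class RollingUpdateEstimate {
    // Total wall-clock time when brokers are updated one after another.
    static Duration total(Duration perBroker, int brokers) {
        return perBroker.multipliedBy(brokers);
    }

    public static void main(String[] args) {
        Duration restart = total(Duration.ofMinutes(3), 500);
        Duration imageUpdate = total(Duration.ofHours(1), 500);
        // 1500 minutes is 25 hours, roughly one whole day
        System.out.println("Rolling restart: " + restart.toHours() + " hours");
        // 500 hours is a little under 21 days
        System.out.println("Rolling image update: " + imageUpdate.toDays() + " full days");
    }
}
```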
Scaling and Data Balancing Challenge
● The problem with partition reassignment
○ Time consuming
○ Replication traffic taking bandwidth
○ Complexity of bin packing for data balancing
The Consumer Fan-out Problem
BytesOut = (numberOfConsumers + replicationFactor - 1) ✕ BytesIn
● A single cluster may easily handle the bytes in, but not
necessarily the bytes out
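To make the fan-out formula concrete, here is a small sketch that evaluates it. The traffic numbers (100 MB/s in, replication factor 3, one versus eight consumers) are hypothetical and only illustrate how quickly bytes out grows relative to bytes in.

```java
final class FanOut {
    // BytesOut = (numberOfConsumers + replicationFactor - 1) x BytesIn
    // The "replicationFactor - 1" term is the traffic served to follower replicas.
    static long bytesOut(long bytesIn, int numberOfConsumers, int replicationFactor) {
        return (long) (numberOfConsumers + replicationFactor - 1) * bytesIn;
    }

    public static void main(String[] args) {
        long bytesInMb = 100; // hypothetical ingest rate in MB/s
        System.out.println("1 consumer:  " + bytesOut(bytesInMb, 1, 3) + " MB/s out");
        System.out.println("8 consumers: " + bytesOut(bytesInMb, 8, 3) + " MB/s out");
    }
}
```

With these assumed numbers, one consumer already triples the outbound traffic, and eight consumers bring it to ten times the inbound rate.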
Solve Consumer Fan-out with Hierarchies
Inevitability of MultiCluster
The Idea
● Create many small and mostly “immutable”
clusters
● Organize them in a topology with a routing service
connecting the clusters
Multi-Cluster Kafka Service At Netflix
[Architecture diagram. Components: Event Producer, HTTP Proxy, Fronting Kafka, Router (w/ simple ETL), Consumer Kafka, Consumers, Management]
MultiCluster Producers
● Support producing to multiple clusters at the same
time
● High level producer API implemented by multiple
embedded Kafka producers
public interface KsProducer<V> {
    // ...
    <T extends V> CompletableFuture<SendResult> send(T obj);
}
● Dynamic topic to cluster mapping
{
  "t1, t2" : {
    "where" : [{
      "sink" : "fronting-kafka-1"
    }]
  },
  "t3" : {
    "where" : [{
      "sink" : "fronting-kafka-2"
    }]
  },
  "__default__" : {
    "where" : [{
      "sink" : "fronting-kafka-2"
    }]
  }
}
@Stream("foo") // send to topic “foo”
public class Foo {
    // ...
}
@Stream("bar") // send to topic “bar”
public class Bar {
    // ...
}
KsProducer<Object> producer = // …
producer.send(new Foo()); // Send to the Kafka cluster that has the “foo” topic
producer.send(new Bar()); // Send to the Kafka cluster that has the “bar” topic
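One way the topic-to-cluster lookup behind such a producer could work is sketched below. This is not the KsProducer implementation: `TopicRouter` and `sinkFor` are hypothetical names, and the mapping is assumed to be already flattened to "topic list → sink name" (the nested "where" arrays from the JSON are left out).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class TopicRouter {
    private static final String DEFAULT_KEY = "__default__";
    private final Map<String, String> sinkByTopic = new HashMap<>();
    private final String defaultSink;

    // Keys may list several topics separated by commas, e.g. "t1, t2".
    TopicRouter(Map<String, String> mapping) {
        String fallback = null;
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            if (DEFAULT_KEY.equals(entry.getKey())) {
                fallback = entry.getValue();
                continue;
            }
            for (String topic : entry.getKey().split(",")) {
                sinkByTopic.put(topic.trim(), entry.getValue());
            }
        }
        this.defaultSink = fallback;
    }

    // Returns the cluster for a topic, or the default sink if none is mapped.
    String sinkFor(String topic) {
        return sinkByTopic.getOrDefault(topic, defaultSink);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("t1, t2", "fronting-kafka-1");
        mapping.put("t3", "fronting-kafka-2");
        mapping.put("__default__", "fronting-kafka-2");
        TopicRouter router = new TopicRouter(mapping);
        System.out.println("t2 -> " + router.sinkFor("t2"));
        System.out.println("unmapped -> " + router.sinkFor("unmapped"));
    }
}
```

A dynamic mapping would swap in a new `TopicRouter` instance when the configuration changes; that part is omitted here.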
Fronting Kafka
● For data collection and buffering
● Optimized for producers
○ The only consumers are the routers
Scaling of Fronting Kafka
● Creating / destroying Kafka clusters
○ E.g., create new topics on new clusters and update the
topic-to-cluster mapping
● No partition reassignment
Data Balancing
● Assign the same number of partitions of any topic
to every broker
○ E.g., for clusters of 12 brokers, create topics with 12, 24,
or 36 partitions
○ Guaranteed even distribution of data (aside from
occasional leader imbalance)
● Balance data among clusters by moving topics
○ Must dynamically update topic to cluster mapping
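The "multiple of the broker count" rule can be illustrated with a simplified model. The sketch assumes partitions are placed round-robin (partition p on broker p mod N) and counts partitions only, ignoring replicas and leader placement; the names are illustrative.

```java
final class PartitionSpread {
    // Simplified round-robin placement: partition p lands on broker p % brokers.
    static int[] partitionsPerBroker(int partitions, int brokers) {
        int[] counts = new int[brokers];
        for (int p = 0; p < partitions; p++) {
            counts[p % brokers]++;
        }
        return counts;
    }

    // True when every broker holds the same number of partitions.
    static boolean isEven(int[] counts) {
        for (int c : counts) {
            if (c != counts[0]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("24 partitions on 12 brokers even? "
                + isEven(partitionsPerBroker(24, 12)));
        System.out.println("30 partitions on 12 brokers even? "
                + isEven(partitionsPerBroker(30, 12)));
    }
}
```

With 30 partitions, six brokers carry three partitions and six carry two, which is the kind of skew the rule avoids.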
Topic Move
[Diagram. Components: Event Producer, Fronting Kafka, Router, Consumer Kafka, Consumer. Step shown: Create topic “foo”]
Consumer Kafka
● Optimized for consumers
○ The only producers are the routers
● Scaling
○ Add brokers and partitions for a small cluster
○ Create new cluster, update routing and move consumers
Future Plan
● Cross-cluster topic
○ Load sharing beyond a single cluster
○ Auto-scale
○ Consumer/producer support needed
Multi-Cluster Consumer (Ongoing work)
● Same Kafka consumer interface
● Consume from multiple clusters with dynamic
topic to cluster mapping
○ Keep subscription state
○ Receive mapping updates
○ Create and delegate to underlying Kafka consumer for each
associated cluster on the fly
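The three sub-points above (keep subscription state, receive mapping updates, create delegates on the fly) could be combined roughly as follows. This is a sketch under assumptions, not the Netflix implementation: `ClusterConsumer` is a hypothetical stand-in for a real per-cluster Kafka consumer, and the class is not thread-safe.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical stand-in for a per-cluster Kafka consumer.
interface ClusterConsumer {
    void close();
}

final class MultiClusterDelegates {
    private final Function<String, ClusterConsumer> factory;
    private final Map<String, ClusterConsumer> delegates = new HashMap<>();
    private final Set<String> subscription = new HashSet<>();
    private Map<String, List<String>> mapping = Collections.emptyMap();

    MultiClusterDelegates(Function<String, ClusterConsumer> factory) {
        this.factory = factory;
    }

    // Remember the subscription so it survives later mapping changes.
    void subscribe(Collection<String> topics) {
        subscription.clear();
        subscription.addAll(topics);
        reconcile();
    }

    // Called whenever a new topic-to-cluster mapping arrives.
    void onMappingUpdate(Map<String, List<String>> newMapping) {
        mapping = new HashMap<>(newMapping);
        reconcile();
    }

    Set<String> activeClusters() {
        return new TreeSet<>(delegates.keySet());
    }

    // Close delegates for clusters no longer needed, create the missing ones.
    private void reconcile() {
        Set<String> needed = new HashSet<>();
        for (String topic : subscription) {
            needed.addAll(mapping.getOrDefault(topic, Collections.emptyList()));
        }
        Iterator<Map.Entry<String, ClusterConsumer>> it = delegates.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, ClusterConsumer> entry = it.next();
            if (!needed.contains(entry.getKey())) {
                entry.getValue().close();
                it.remove();
            }
        }
        for (String cluster : needed) {
            delegates.computeIfAbsent(cluster, factory);
        }
    }
}
```

A real version would also fan `poll` out to every delegate and merge the returned records; that is left out to keep the sketch short.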
Multi-Cluster Consumer Topic to Cluster Mapping and
Code Example
{
  "foo": [
    {"vip": "consumerKafka1"},
    {"vip": "consumerKafka2"}
  ],
  "bar": [
    {"vip": "consumerKafka3"}
  ]
}
// Create a multi-cluster consumer
Consumer<String, String> multiClusterConsumer = ...
// Subscribe as usual; the consumer keeps the subscription state
multiClusterConsumer.subscribe(Collections.singletonList("foo"));
while (...) {
    // Fetch from both clusters for topic "foo" and
    // return the aggregated records
    ConsumerRecords<String, String> records =
        multiClusterConsumer.poll(2000);
    process(records);
}
Topic move for Multi-cluster Consumers
[Diagram. Components: Producer, Multi-cluster Consumer, cluster1, cluster2]
Producer mapping: “foo”: “cluster1”, then “foo”: “cluster2”
Multi-cluster Consumer mapping: “foo”: [“cluster1”], then “foo”: [“cluster1”, “cluster2”], then “foo”: [“cluster2”]
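The mapping states on this slide suggest a move in which consumers read from both clusters while the producer switches over. The sketch below encodes one plausible ordering of those states (an assumption; the slide does not spell out the interleaving) and checks the invariant that consumers always read from the cluster the producer is writing to. All names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

final class TopicMovePlan {
    static final class Step {
        final String producerCluster;
        final List<String> consumerClusters;

        Step(String producerCluster, String... consumerClusters) {
            this.producerCluster = producerCluster;
            this.consumerClusters = Arrays.asList(consumerClusters);
        }
    }

    // Assumed ordering: widen the consumer mapping first, switch the producer,
    // then narrow the consumer mapping once cluster1 is no longer needed.
    static List<Step> moveFoo() {
        return Arrays.asList(
                new Step("cluster1", "cluster1"),
                new Step("cluster1", "cluster1", "cluster2"),
                new Step("cluster2", "cluster1", "cluster2"),
                new Step("cluster2", "cluster2"));
    }

    // Invariant: at every step, consumers read from the producer's cluster.
    static boolean consumersAlwaysCoverProducer(List<Step> steps) {
        for (Step step : steps) {
            if (!step.consumerClusters.contains(step.producerCluster)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Safe plan? " + consumersAlwaysCoverProducer(moveFoo()));
    }
}
```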
Our Vision
[Diagram. Components: Producers, Fronting Kafka w/ Cross-cluster Topics, Router, Consumer Kafka, Multi-cluster Consumer, Advanced Consumer, Basic Consumer. Topic labels: “foo”, “bar”]
What About Keyed Messages
● Few topics require keyed messages
● Concerns for keyed messages
○ Inflexible/skewed load balancing
○ Difficult to scale
● Handling of keyed messages
○ Currently only produced by routers to consumer Kafka
○ Loose ordering guarantee
○ Strict key-consumer affinity guarantee
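A short sketch of why keyed messages are difficult to scale: when a key's partition is derived from a hash modulo the partition count, changing the partition count can move the key. The sketch uses `String.hashCode` as a simplified stand-in; Kafka's default partitioner hashes the serialized key bytes with murmur2 instead, so the concrete partition numbers differ, but the effect is the same.

```java
final class KeyPartitioning {
    // Simplified stand-in for a hash partitioner (assumption: hashCode, not murmur2).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String key = "m"; // "m".hashCode() is 109
        System.out.println("12 partitions: " + partitionFor(key, 12)); // 109 mod 12 = 1
        System.out.println("24 partitions: " + partitionFor(key, 24)); // 109 mod 24 = 13
    }
}
```

The same key lands on a different partition after the topic grows from 12 to 24 partitions, which changes which consumer sees it.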
Think Differently on Scaling Kafka
                The “broker” way                                   The “cluster” way
Scale up        Add brokers                                        Add clusters
Data balance    Move partitions to different brokers               Move/expand topics to different clusters
Producer        Produce to different brokers at the same time      Produce to different clusters at the same time
Consumer        Consume from different brokers at the same time    Consume from different clusters at the same time
Thank You
https://medium.com/netflix-techblog
https://jobs.netflix.com/

Kafka Summit SF 2017 - MultiCluster, MultiTenant and Hierarchical Kafka Messaging Service
