Balance Kafka Cluster with Zero Data Movement
Yaodong Yang (Apple), Haochen Li (Apple)
Yaodong Yang, Apple Inc.
Haochen Li, Apple Inc.
May 2023
NOT A CONTRIBUTION
Kafka Cluster Load Balancing
• Benefits
• High Performance
• Cost Efficiency
• Determining Factors
• Kafka Partition Placement
• Kafka Partition Access Pattern
• Challenges
• Kafka Partitions are Heterogeneous
• Storage Retention Requirement
• Produce & Consume Traffic Pattern
Current Solution
• Continuously rebalance Kafka cluster based on Load Metrics
• collect the load metrics from Kafka
• generate the cluster load model
• compute the optimization proposal
• execute the proposal
• Overhead
• data movement between different brokers
• negative impact on producers and consumers
• long time to finish (hours or even days)
• infra cost
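To make the "hours or even days" overhead concrete, here is a rough back-of-the-envelope sketch. The function name and the numbers (10 TB to move, a 50 MB/s replication throttle) are hypothetical and not from the talk; the estimate ignores everything except raw copy time.

```python
def reassignment_hours(bytes_to_move: float, throttle_bytes_per_sec: float) -> float:
    """Rough lower bound on how long a rebalance takes when data must be copied.

    Only raw copy time is modeled; real reassignments are usually slower.
    """
    if throttle_bytes_per_sec <= 0:
        raise ValueError("throttle must be positive")
    return bytes_to_move / throttle_bytes_per_sec / 3600


# Hypothetical example: 10 TB of replica data moved at a 50 MB/s throttle.
hours = reassignment_hours(10e12, 50e6)
print(f"{hours:.1f} hours")  # prints "55.6 hours"
```

Even under these optimistic assumptions the copy takes more than two days, which is the cost the placement strategy in this talk tries to avoid.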
Data Ingestion Use Case
• Workload Pattern
• Data events are randomly assigned to partitions of the Kafka topic
• All partitions of a topic are consumed evenly
• Kafka producers and consumers don’t have a strict requirement on the Kafka Partition Count
• Kafka Partitions from the same topic
• Same data volumes produced, consumed and retained
Kafka Partition Replica Placement
• Partition Replica Placement Strategy
• Partition Count
• scale_number: number of Leader Replicas per broker for a topic
• partition_count = scale_number * broker_count
• Partition Replica Placement
• For every Kafka Topic, the number of Replicas on each broker should be the same.
• For every Kafka Topic, the number of Leader Replicas on each broker should be the same.
• Same load on individual Kafka Brokers.
• Same hardware utilization on individual Kafka brokers
• CPU
• Storage Volume
• Network
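The placement rules above can be sketched with a simple round-robin layout. This is a minimal illustration, not the authors' implementation: the function name, the `replication_factor` parameter, and the convention that the first replica in each list is the preferred leader are assumptions made for the example.

```python
from collections import Counter


def generate_assignment(brokers, scale_number, replication_factor):
    """Map partition id -> ordered replica list (first entry = preferred leader).

    Round-robin sketch: partition i gets brokers i, i+1, ... (mod broker count).
    """
    n = len(brokers)
    if not 1 <= replication_factor <= n:
        raise ValueError("replication_factor must be between 1 and broker count")
    partition_count = scale_number * n  # formula from the slide
    return {
        i: [brokers[(i + j) % n] for j in range(replication_factor)]
        for i in range(partition_count)
    }


# Hypothetical cluster: brokers 1..3, scale_number 2, replication factor 2.
assignment = generate_assignment([1, 2, 3], scale_number=2, replication_factor=2)
leaders = Counter(replicas[0] for replicas in assignment.values())
replicas_per_broker = Counter(b for replicas in assignment.values() for b in replicas)
print(len(assignment))            # 6 partitions
print(dict(leaders))              # {1: 2, 2: 2, 3: 2}
print(dict(replicas_per_broker))  # {1: 4, 2: 4, 3: 4}
```

Because the partition count is a multiple of the broker count, every broker ends up with `scale_number` leaders and `scale_number * replication_factor` replicas for the topic.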
Scenarios
• New Topic Creation
• Generate the Replica Assignment for the new topic
• Create the topic in the Kafka cluster with the above Replica Assignment
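One way to apply a precomputed assignment at creation time is the `--replica-assignment` option of `kafka-topics.sh`, which takes comma-separated partitions, each a colon-separated list of broker ids. The sketch below only formats that string; the helper name, topic name, and bootstrap address are hypothetical, and the talk does not say which mechanism the authors' Topic Operator uses.

```python
def to_replica_assignment_arg(assignment):
    """Render {partition: [broker ids]} as 'b:b,b:b,...' ordered by partition id."""
    partitions = sorted(assignment)
    if partitions != list(range(len(partitions))):
        raise ValueError("partition ids must be contiguous and start at 0")
    return ",".join(":".join(str(b) for b in assignment[p]) for p in partitions)


# Hypothetical assignment for a 3-partition topic on brokers 1..3.
arg = to_replica_assignment_arg({0: [1, 2], 1: [2, 3], 2: [3, 1]})
print(
    "kafka-topics.sh --bootstrap-server localhost:9092 "
    f"--create --topic events --replica-assignment {arg}"
)
```

The first broker id listed for each partition becomes its preferred leader, so the ordering inside each list matters for leader balance.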
Scenarios
• Increase Partition Count: scale_number increase
• Generate the Replica Assignment for the new partitions
• Create partitions in the Kafka cluster with the above Replica Assignment
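A sketch of the scale_number increase, assuming the existing partitions already follow a round-robin layout: only the new partition ids receive an assignment, so existing replicas stay where they are. The function and parameter names are hypothetical.

```python
def new_partition_assignment(brokers, old_scale, new_scale, replication_factor):
    """Return assignments for the added partitions only; existing ones are untouched."""
    n = len(brokers)
    if new_scale <= old_scale:
        raise ValueError("new_scale must be greater than old_scale")
    if not 1 <= replication_factor <= n:
        raise ValueError("replication_factor must be between 1 and broker count")
    first_new = old_scale * n  # existing partition count
    last_new = new_scale * n   # partition count after the increase
    return {
        p: [brokers[(p + j) % n] for j in range(replication_factor)]
        for p in range(first_new, last_new)
    }


# Hypothetical example: 3 brokers, scale_number raised from 1 to 2.
added = new_partition_assignment([1, 2, 3], old_scale=1, new_scale=2, replication_factor=2)
print(added)  # {3: [1, 2], 4: [2, 3], 5: [3, 1]}
```

Each increment of scale_number adds one more leader and `replication_factor` more replicas per broker, so the per-broker counts stay equal.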
Scenarios
• Add more brokers
• Generate the Replica Assignment for partitions in new brokers
• Create partitions in the Kafka cluster with the above Replica Assignment
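A sketch of the broker-addition case under one reading of the slide: new partitions are created whose replicas live only on the new brokers, with `scale_number` leaders each, so nothing on the existing brokers moves. This interpretation, the names, and the requirement that at least `replication_factor` brokers are added together are assumptions of the example.

```python
def assignment_for_new_brokers(current_partition_count, new_brokers,
                               scale_number, replication_factor):
    """Assign scale_number * len(new_brokers) new partitions to the new brokers only."""
    m = len(new_brokers)
    if not 1 <= replication_factor <= m:
        raise ValueError("need at least replication_factor new brokers")
    return {
        current_partition_count + i: [
            new_brokers[(i + j) % m] for j in range(replication_factor)
        ]
        for i in range(scale_number * m)
    }


# Hypothetical example: a topic with 6 partitions on brokers 1..3 (scale_number 2),
# then brokers 4 and 5 are added.
added = assignment_for_new_brokers(6, [4, 5], scale_number=2, replication_factor=2)
print(added)  # {6: [4, 5], 7: [5, 4], 8: [4, 5], 9: [5, 4]}
```

After the change the topic has 10 partitions, which again equals scale_number * broker_count (2 * 5), and each new broker leads two of them.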
Scenarios
• Ingestion traffic volume and retention changes
• no impact on the load balance of Kafka Cluster
• Remove some brokers
• data movement is unavoidable
• avoid it with cluster migration if possible
• Cluster Migration & Merge
• rebalance the cluster:
• partition reassignment
• scale_number increase
Implementation
• Current
• Implemented as a Topic Operator
• Deployed in production
• Plan
• Open a KIP in Apache Kafka Project
• Contribute back to upstream
Take Away
• Partition Placement Strategy can greatly improve the Load Balance of Kafka Clusters
Thank you!

