Balance Kafka Cluster with Zero Data Movement with Haochen Li & Yaodong Yang

Yaodong Yang (Apple), Haochen Li (Apple)

Yaodong Yang, Apple Inc.
Haochen Li, Apple Inc.
May 2023
NOT A CONTRIBUTION

Kafka Cluster Load Balancing
• Benefits
• High Performance
• Cost Efficiency
• Determining Factors
• Kafka Partition Placement
• Kafka Partition Access Pattern
• Challenges
• Kafka Partitions are Heterogeneous
• Storage Retention Requirement
• Produce & Consume Traffic Pattern

Current Solution
• Continuously rebalance Kafka cluster based on Load Metrics
• collect the load metrics from Kafka
• generate the cluster load model
• compute the optimization proposal
• execute the proposal
• Overhead
• data movement between different brokers
• negative impact on producers and consumers
• long time to finish (hours or even days)
• infra cost
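
The data-movement overhead listed above can be made concrete with a small estimate. The following is a minimal sketch, not the method of any particular rebalancing tool: `bytes_moved`, the partition names, and the sizes are hypothetical, and it assumes that every broker newly added to a partition's replica list must copy that partition in full.

```python
def bytes_moved(current, proposed, partition_bytes):
    """Estimate the bytes copied between brokers when applying a proposal.

    current / proposed: {partition: [broker_id, ...]} replica lists.
    partition_bytes:    {partition: retained size in bytes} (illustrative values).
    Assumption: a broker that newly hosts a replica copies the whole partition.
    """
    total = 0
    for partition, new_replicas in proposed.items():
        old_replicas = set(current.get(partition, []))
        added = [b for b in new_replicas if b not in old_replicas]
        total += len(added) * partition_bytes[partition]
    return total


# Hypothetical cluster state: one replica of t-0 moves from broker 1 to broker 3.
current = {"t-0": [0, 1], "t-1": [1, 2]}
proposed = {"t-0": [0, 3], "t-1": [1, 2]}
sizes = {"t-0": 500, "t-1": 200}
moved = bytes_moved(current, proposed, sizes)
```

With real retention sizes, even a handful of moved replicas adds up quickly, which is consistent with the hours-to-days execution time noted on the slide.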

Data Ingestion Use Case
• Workload Pattern
• Data events are randomly assigned to partitions of the Kafka topic
• All partitions of a topic are consumed evenly
• Kafka producers and consumers don't have a strict requirement for the Kafka Partition Count
• Kafka Partitions from the same topic
• Same data volumes produced, consumed and retained

Kafka Partition Replica Placement
• Partition Replica Placement Strategy
• Partition Count
• scale_number: Number of Leader Replicas per broker for a topic
• partition_count = scale_number * broker_count
• Partition Replica Placement
• For every Kafka Topic, the number of Replicas in each broker should be the same.
• For every Kafka Topic, the number of Leader Replicas in each broker should be the same.
• Same load on individual Kafka Brokers.
• Same hardware utilization on individual Kafka brokers
• CPU
• Storage Volume
• Network
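
One way to satisfy both placement rules is a simple round-robin layout. The sketch below is an illustration under assumptions, not the speakers' implementation: `generate_assignment` and `replication_factor` are hypothetical names, and the first broker in each replica list is treated as the leader.

```python
from collections import Counter


def generate_assignment(broker_ids, scale_number, replication_factor):
    """Return {partition: [leader, follower, ...]} for one topic.

    partition_count = scale_number * broker_count, as on the slide.
    Partition p is led by broker p mod n; followers are the next brokers
    in the ring, so every broker ends up with the same number of leaders
    (scale_number) and replicas (scale_number * replication_factor).
    """
    n = len(broker_ids)
    if replication_factor > n:
        raise ValueError("replication_factor cannot exceed the broker count")
    partition_count = scale_number * n
    return {
        p: [broker_ids[(p + r) % n] for r in range(replication_factor)]
        for p in range(partition_count)
    }


# Hypothetical cluster: 3 brokers, scale_number = 2, replication factor 2.
assignment = generate_assignment([0, 1, 2], scale_number=2, replication_factor=2)
leaders = Counter(replicas[0] for replicas in assignment.values())
replicas_per_broker = Counter(b for replicas in assignment.values() for b in replicas)
```

Here `len(assignment)` is 6 (2 * 3), each broker leads 2 partitions, and each broker hosts 4 replicas. Because partitions of the same topic carry the same load in this use case, equal counts translate into equal load.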

Scenarios
• New Topic Creation
• Generate the Replica Assignment for the new topic
• Create the topic in the Kafka cluster with the above Replica Assignment
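
The two steps above can be sketched as follows. This is a hedged example: the talk's Topic Operator may create topics differently, and `new_topic_assignment` and `replica_assignment_arg` are hypothetical helpers. The output format targets the `--replica-assignment` option of `kafka-topics.sh`, which takes partitions separated by commas and each partition's broker ids separated by colons.

```python
def new_topic_assignment(broker_ids, scale_number, replication_factor):
    """Round-robin replica lists; the first broker in each list is the leader."""
    n = len(broker_ids)
    if replication_factor > n:
        raise ValueError("replication_factor cannot exceed the broker count")
    return [
        [broker_ids[(p + r) % n] for r in range(replication_factor)]
        for p in range(scale_number * n)
    ]


def replica_assignment_arg(assignment):
    """Format replica lists as 'b:b,b:b,...' in partition order."""
    return ",".join(":".join(str(b) for b in replicas) for replicas in assignment)


# Hypothetical cluster: brokers 0..2, one leader per broker, replication factor 2.
arg = replica_assignment_arg(new_topic_assignment([0, 1, 2], 1, 2))
```

The resulting string could be passed when creating the topic, so the topic is balanced from the start and no later reassignment is needed.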

Scenarios
• Increase Partition Count: scale_number increase
• Generate the Replica Assignment for the new partitions
• Create partitions in the Kafka cluster with the above Replica Assignment
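
A minimal sketch of this scenario, assuming the same round-robin rule is used for the added partitions; `assignment_for_scale_increase` is a hypothetical name. Only the new partitions receive an assignment, and the existing partitions are left where they are, so no data moves. An assignment in this shape could, for instance, be supplied to the Java AdminClient's `NewPartitions.increaseTo(totalCount, newAssignments)`, although the talk does not state which API is used.

```python
def assignment_for_scale_increase(broker_ids, old_scale, new_scale, replication_factor):
    """Return {partition: [leader, follower, ...]} for the added partitions only.

    Existing partitions 0 .. old_scale * n - 1 are not touched. Each extra
    unit of scale_number adds exactly one leader and replication_factor
    replicas to every broker, so the topic stays balanced.
    """
    n = len(broker_ids)
    if new_scale <= old_scale:
        raise ValueError("new_scale must be greater than old_scale")
    if replication_factor > n:
        raise ValueError("replication_factor cannot exceed the broker count")
    return {
        p: [broker_ids[(p + r) % n] for r in range(replication_factor)]
        for p in range(old_scale * n, new_scale * n)
    }


# Hypothetical cluster: 3 brokers, scale_number raised from 1 to 2.
added = assignment_for_scale_increase([0, 1, 2], old_scale=1, new_scale=2,
                                      replication_factor=2)
```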

Scenarios
• Add more brokers
• Generate the Replica Assignment for partitions in new brokers
• Create partitions in the Kafka cluster with the above Replica Assignment
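
A sketch of how this could work, with one inferred assumption stated plainly: to keep existing brokers untouched and still balanced, the new partitions place all of their replicas on the new brokers, which requires at least `replication_factor` new brokers. The slides do not spell out this detail, and `assignment_for_new_brokers` is a hypothetical name.

```python
def assignment_for_new_brokers(existing_partition_count, new_broker_ids,
                               scale_number, replication_factor):
    """Return {partition: [leader, follower, ...]} for partitions on new brokers.

    Adds scale_number * len(new_broker_ids) partitions, so the topic keeps
    partition_count = scale_number * broker_count after the expansion.
    Assumption: all replicas of the new partitions live on the new brokers.
    """
    n = len(new_broker_ids)
    if n < replication_factor:
        raise ValueError("need at least replication_factor new brokers")
    return {
        existing_partition_count + i: [
            new_broker_ids[(i + r) % n] for r in range(replication_factor)
        ]
        for i in range(scale_number * n)
    }


# Hypothetical expansion: 3 brokers with scale_number = 2 (6 partitions),
# then brokers 3 and 4 join the cluster.
added = assignment_for_new_brokers(6, [3, 4], scale_number=2, replication_factor=2)
```

After the expansion each of the five brokers leads 2 partitions and hosts 4 replicas of the topic, and no existing replica has been moved.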

Scenarios
• Ingestion traffic volume and retention changes
• no impact on the load balance of Kafka Cluster
• Remove some brokers
• data movement is unavoidable
• avoid it with cluster migration if possible
• Cluster Migration & Merge
• rebalance the cluster:
• partition reassignment
• scale_number increase

Implementation
• Current
• Implemented as a Topic Operator
• Deployed in production
• Plan
• Open a KIP in Apache Kafka Project
• Contribute back to upstream

Take Away
• Partition Placement Strategy can greatly improve the Load Balance of Kafka
Clusters
