Kafka internals
David Gruzman, www.nestlogic.com
What is Kafka?
Why is it so interesting? Is it just "yet another
queue" with better performance?
It is not a queue, although it can be used as one.
Let's look at it as a database / storage
technology.
Data model
● Ordered, partitioned collection of key-values
● Key is optional
● Values are opaque
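As a concrete illustration, here is a minimal sketch of this data model through the modern Java producer client. The broker address and topic name are made up; the point is that a record is a key-value pair, the key may be null, and the value is opaque bytes to Kafka.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DataModelExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // Keyed record: the key routes it to a partition; the value is opaque bytes.
            producer.send(new ProducerRecord<>("events", "user-42", "payload".getBytes()));
            // Key is optional: a null key lets the producer spread records across partitions.
            producer.send(new ProducerRecord<>("events", null, "payload".getBytes()));
        }
    }
}
```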
High level architecture
Role of the broker
The broker handles reads and writes.
It forwards messages for replication.
It performs compaction on its own log replica
(without affecting any other copies).
Role of the controller
Handles "cluster-wide events" signaled by Zookeeper (kept in
sync with the Zookeeper registries):
● Broker list changes (registration, failure)
● Leader election
● Topic changes (deleted, added, number of partitions
changed)
● Tracking partition replicas
Kafka controller
Zookeeper role
- Kafka controller registration
- List of topics and partitions
- Partition states
- Broker registration (id, host, port)
- Consumer registration & subscriptions
Partitions
- Each partition has its leader broker and N followers.
- Consumers and producers work with leaders only.
- Partitions are the main scaling mechanism within a
topic.
- The producer selects the target partition via a Partitioner
implementation (balancing across the available topic
partitions); see the sketch below.
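A hedged sketch of a custom Partitioner against the modern Java client interface (the interface differed in older releases). The policy here is made up for illustration: keyed records are hashed the same way the default partitioner does it, and un-keyed records all go to partition 0.

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Hypothetical partitioner: keyed records are hashed; un-keyed records go to partition 0.
public class ExamplePartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0; // illustrative policy for records without a key
        }
        // murmur2 hash of the key bytes, mapped onto the available partitions
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void close() {}
}
```

The producer picks it up via the partitioner.class property, e.g. props.put("partitioner.class", ExamplePartitioner.class.getName()).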
Access Pattern
● Writers write data in massive streams. The data
is already "ordered" (this ordering is re-used).
● Readers consume data sequentially, each one
from some position.
Write path
[Diagram: the producer writes to the elected leader broker;
follower brokers replicate the data to their own local storage.]
Read path
Reads always happen via the partition leader.
Kafka helps balance consumers within a
group: each topic partition is read by a
single consumer at a time, so the same
message is not read simultaneously by
several consumers in the same group.
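A minimal sketch of group consumption with the modern Java consumer (group id, topic, and broker address are made up). Consumers started with the same group.id split the topic's partitions between them; each partition is owned by one group member at a time.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("group.id", "analytics"); // consumers sharing this id split the partitions
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                // Each partition of "events" is assigned to exactly one consumer in the group.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                                      r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```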
Data transfer efficiency
1. Sequential disk access - optimal disk
utilization
2. Zero copy - saves CPU cycles
3. Compression - saves network bandwidth
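A minimal illustration of item 2. FileChannel.transferTo lets the kernel move file bytes to a socket without copying them through user-space buffers (sendfile); this mirrors the mechanism Kafka relies on, not Kafka's actual code. File name, host, and port are made up.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = new FileInputStream("segment.log").getChannel(); // hypothetical file
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0, remaining = file.size();
            while (remaining > 0) {
                // The kernel moves the bytes file -> socket; no user-space copy.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```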
Compression
Up to Kafka 0.7, compressed messages were a
special kind of message handled by the clients
(producer and consumer); the broker was not
involved.
Starting with 0.8.1, the Kafka broker repackages
messages in order to support logical offsets.
Indexing
- As data flows into the broker, it is
"indexed".
- Index files are stored alongside "segments".
- Segments are the files holding the data.
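A toy model of the idea, assuming a sparse index: only some offsets in a segment are mapped to byte positions, a lookup finds the nearest indexed offset at or below the target, and the broker scans forward from there. Kafka's real index is a memory-mapped file of fixed-size entries, but the lookup logic is the same.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SparseOffsetIndex {
    private final NavigableMap<Long, Long> offsetToPosition = new TreeMap<>();

    public void append(long offset, long filePosition) {
        offsetToPosition.put(offset, filePosition); // only every Nth message is indexed
    }

    /** Returns the file position to start scanning from for the given offset. */
    public long lookup(long targetOffset) {
        Map.Entry<Long, Long> floor = offsetToPosition.floorEntry(targetOffset);
        return floor == null ? 0L : floor.getValue();
    }
}
```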
Consumer API Levels
● Low-level API: work with partitions and
offsets
● High-level API: work with topics; automatic
offset management and load balancing
This can be rephrased as
● Low-level API: database
● High-level API: queue
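A sketch of the "database-style" low-level access with the modern Java consumer: no group membership and no automatic offsets; the application picks a partition and a position itself, like reading a table from a known row. Topic, partition, and offset are made up.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class LowLevelRead {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("events", 0);
            consumer.assign(Collections.singletonList(tp)); // explicit partition, no rebalancing
            consumer.seek(tp, 12345L);                      // explicit starting offset
            consumer.poll(Duration.ofSeconds(1))
                    .forEach(r -> System.out.println(r.offset()));
        }
    }
}
```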
Layers of functionality
Levels of abstraction
Offset management
Prior to 0.8.1, holding offset metadata was purely
Zookeeper's responsibility.
Starting from 0.8.1, there is a special offset
manager service. It runs inside the broker, uses a
special topic to store offsets, and also does in-
memory caching as an optimization.
We can choose which mechanism to use.
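A sketch of explicit offset management with the modern consumer (group id and topic are made up): auto-commit is turned off, and the position is committed only after a batch has been fully processed. Committed offsets land in the broker-managed offsets topic.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("group.id", "analytics");
        props.put("enable.auto.commit", "false");         // we commit ourselves
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                consumer.poll(Duration.ofSeconds(1)).forEach(r -> process(r.value()));
                consumer.commitSync(); // the offset is durable only after this returns
            }
        }
    }

    private static void process(String value) { /* application logic */ }
}
```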
Key Compaction
Kafka is capable of storing only the latest value per
key.
Then it is not a queue; it is a table.
This capability makes it possible to store the whole state
(the full historical data flow), not only the latest X days
(in contrast to the auto-deletion approach).
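A sketch of creating such a compacted topic with the admin client (which appeared after this deck was written; topic name, partition count, and replication factor are made up). With cleanup.policy=compact the broker keeps at least the latest value for every key instead of deleting data by age, giving the "table" behavior described above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 3, compacted instead of time-deleted.
            NewTopic topic = new NewTopic("user-profiles", 6, (short) 3)
                    .configs(Map.of("cleanup.policy", "compact"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```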
Performance
Why is it so fast?
1. The network and on-disk message formats are
the same; all that has to be done is an append.
2. Local storage is used.
3. No attempt to build its own cache or
optimizer (the OS page cache is relied on).
Something big happens
We face a new world of real-time data
processing needs.
In many cases this means streams.
For many years I thought it was just counters to
be calculated before saving data into HDFS for
the "real work". Now I see it quite differently.
Naive use of Kafka
Possible simple solution...
Kafka as NoSQL
- Synchronous replication as the resilience model
- Single master per partition
- Opaque data
- Compactions
- Optimized for reads in the same order as the
writes were done
- Optimized for massive writes
Compute
Samza's and Kafka Streams' relation to Kafka is like
MapReduce's and Spark's relation to HDFS.
Kafka became the medium on top of which we build
computational layers.
It has to be said - there is no data locality.
Samza and Kafka Streams solve common
problems.
State, recovery approach
Both Samza and Kafka Streams took the approach
that has served RDBMSs for a long time: snapshot +
redo log.
They force stateful stream-processing
applications to follow this paradigm.
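A sketch of a stateful Kafka Streams job (application id, broker address, and topic name are made up). The count lives in a local state store - the "snapshot" - and every update is also written to a compacted changelog topic in Kafka - the "redo log" - from which the store is rebuilt after a failure.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class WordCountState {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount");      // illustrative
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("words")  // hypothetical input topic keyed by word
               .groupByKey()
               .count();         // backed by a local store + changelog topic

        new KafkaStreams(builder.build(), props).start();
    }
}
```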
NestLogic case
What are we doing?
Why do we need Kafka / Spark?
How has Kafka helped us?
Statistical analysis of data segments
First shot - Spark
What was the problem
- All the data has to be processed. We might
not have enough resources to process a
particular - huge - segment.
- Spark shuffle when the data is bigger than RAM
is challenging.
- We are moving toward more "real-time" and
streaming.
Kafka as shuffle engine
What we learned - flexibility
● We can re-run the "reduce" stage several times.
● Kafka clients can wait for the connection to be
re-established with no timeouts, so we can repair a failed
Kafka resource leader, and the job will proceed.
● We can run the map and reduce clusters separately, with the
flexibility to select their sizes. This saves us some money.
● Now we can use different technologies for Map and
Reduce. We are about to replace the map stage
(transformations) with ImpalaToGo.
What we learned - cont.
More precise resource management.
We can look at the size of the shuffle data and the
number of groups (available from the Kafka cluster
metadata), and only then decide on the size of the
"reduce" cluster.
We can interleave the map and reduce stages,
because there is no sorting requirement.
Is it a universal solution?
● If you need dozens of different, concurrent
jobs:
Yarn + Spark is probably the best.
● If you need a single job to run smoothly and to be
flexible with it - our approach comes into
play.
So, what do we do?
We help to distinguish your data by its nature,
present it, and help to decide what should be
done with each part.
As data scientists...
We believe that checking the statistical
homogeneity of data is very important.
As business people...
- Do not count an attack as popularity
- Do not count fraud as profit
- Do not count a bug as lack of interest
And most important:
- Work hard to distinguish all of the above
As big data experts
It is not simple to achieve. It took a lot of effort
to get good results, orchestrate the operation, etc.
We believe you will get better utilization of your
big data, data science, and devops resources.
How it looks
NestLogic inc
We work hard to help you to
Know your data.
Thank you for your attention
Contact us
Contact us at info@nestlogic.com
or via our site, www.nestlogic.com
Helper slides
State is in RocksDB
RocksDB was selected. A few quick facts:
1. Developed by Facebook, based on LevelDB
2. Single node
3. C++ library
4. Borrows HBase's ideas of sorting, snapshots, and transaction logs
5. My speculation - the transaction log is what "glues" it to
Kafka Streams
Rebalancing - part 1
- One of the brokers is elected as the coordinator for a
subset of the consumer groups. It will be responsible for
triggering rebalancing attempts for certain consumer
groups on consumer group membership changes or
subscribed topic partition changes.
- It will also be responsible for communicating the
resulting partition-consumer ownership configuration to
all consumers of the group undergoing a rebalance
operation.
Rebalancing - part 2
- On startup or on co-ordinator failover, the consumer
sends a ClusterMetadataRequest to any of the brokers
in the "bootstrap.brokers" list. In the response, it receives
the location of the co-ordinator for its group.
- The consumer sends a RegisterConsumer request to
its co-ordinator broker. In the response, it receives the
list of topic partitions that it should own.
- At this point, group management is done and the
consumer starts fetching data and (optionally)
committing offsets.
Consumer balancing
This is the capability to balance load and fail over between
consumers in the same group.
The Kafka consumer communicates with the co-ordinator
broker for this. The co-ordinator broker's info is stored in ZK
and is available from any broker.
These mechanisms are reused in Kafka Streams.
Co-ordinator broker, part 1
1. Reads the list of groups it manages and their
membership information from ZK.
2. If the discovered membership is alive (per ZK), waits for the
consumers in each of the groups to re-register with it.
3. Does failure detection for all consumers in a group.
Consumers marked as dead by the co-ordinator's failure
detection protocol are removed from the group, and the
co-ordinator marks the rebalance for a group completed
by communicating the new partition ownership to the
remaining consumers in the group.
Co-ordinator broker, part 2
4. The co-ordinator tracks topic partition
changes for all topics that any consumer group has
registered interest in. If it detects a new partition for any
topic, it triggers a rebalance operation (by killing the consumers'
socket connections with itself). The creation of new topics
can also trigger a rebalance operation, as consumers can
register for topics before they are created.
Consumer and Co-ordinator
Log compaction
Any read from offset 0 to any offset Q, where Q > P, that
completes in less than a configurable SLA will see the final
state of all keys as of time Q. The log head is always a single
segment (1 GB by default).
Material used
About compression: http://www.confluent.io/blog/compression-in-apache-kafka-is-now-34-percent-faster
Change in message offsets: https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal