TUGA IT 2017
LISBON, PORTUGAL
CLOUD
PRO
PT
PARTICIPATING
COMMUNITIES
Stream Processing Metamorphosis
A Kafka’s tale
JLVV
“Paths are made by walking”
Franz Kafka
João Vazão Vasques
Data Engineer at Talkdesk
Scala, Clojure, Ruby, Python
@JoaoVasques
Agenda
• Kafka
  • Motivation
  • Main concepts
  • Architecture & Design
• Comparison with other message queue systems
  • Main concepts
  • Architecture
  • Community
• Metamorphosis
  • Surprise surprise
Motivation
Why do we need Kafka?
Motivation - Why Kafka
It all starts like this
[Diagram: one client talking to Backend 1]
Then it starts to look like this
[Diagram: four clients and Backends 1 to 4]
LinkedIn before Kafka
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
There must be a better way to handle this…
Kafka decouples data pipelines
[Diagram: source systems, Kafka, and downstream systems such as real-time monitoring and billing]
LinkedIn after Kafka
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
What is Kafka?
A log
[Diagram: a sequence of records at offsets 0 to 8; the first record is at offset 0 and the next record will be written at offset 9]
“Append-only sequence of records ordered by time”
Jay Kreps
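The log idea above can be sketched in a few lines of Python. This is only an in-memory illustration of "append-only, ordered, addressed by offset"; the `Log` class and its method names are hypothetical and do not mirror Kafka's storage layer or client API.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Toy append-only log (illustrative only, not Kafka's implementation)."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end of the log and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records starting at offset; reading removes nothing."""
        return self.records[offset:offset + max_records]


log = Log()
first = log.append("call started")  # first record, offset 0
second = log.append("call ended")   # next record, offset 1
```

Records are only ever added at the end, so an offset, once handed out, always refers to the same record.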
Key Concepts
Concepts
• Broker: A Kafka server
• Topics: Stream of messages of a particular type (e.g. calls, user clicks, likes, page views, etc.)
• Producers: Publish messages to a Kafka topic
• Consumers: Subscribe to a topic and process the published messages
Architecture
[Diagram: two producers, three brokers, two consumers, and Zookeeper]
Broker
• Node in a cluster running a Kafka server
• Stores one or more partitions per topic
Topics
Topics - partitions
[Diagram: partitions 1, 2 and 3 holding 8, 5 and 6 messages respectively, ordered from old to new]
• A Kafka topic, T, is split into P partitions
• Messages in each partition are identified by a sequential id (offset)
• A partition is the smallest unit of parallelism in Kafka (ordering is guaranteed only within a partition)
• Messages in a topic are kept according to a time-based retention policy
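A minimal sketch of the partition bullets, assuming a toy in-memory model: each partition hands out its own sequential offsets, and old records are dropped once they exceed a retention window. The `Partition` class, `enforce_retention`, and the 60-second window are hypothetical names and values chosen for illustration. Real Kafka deletes whole log segments rather than single records.

```python
from typing import List, Tuple


class Partition:
    """Toy partition: sequential offsets plus time-based retention."""

    def __init__(self) -> None:
        self.base_offset = 0  # offset of the oldest record still retained
        self.records: List[Tuple[float, str]] = []  # (timestamp, value)

    def append(self, value: str, now: float) -> int:
        """Append a record and return its offset within this partition."""
        self.records.append((now, value))
        return self.base_offset + len(self.records) - 1

    def enforce_retention(self, now: float, retention_seconds: float) -> None:
        """Drop records older than the retention window; offsets never shift."""
        while self.records and now - self.records[0][0] > retention_seconds:
            self.records.pop(0)
            self.base_offset += 1


# A topic T split into P = 3 partitions.
topic = [Partition() for _ in range(3)]
o0 = topic[0].append("a", now=0.0)
o1 = topic[0].append("b", now=50.0)
p1 = topic[1].append("c", now=50.0)  # offsets are independent per partition

topic[0].enforce_retention(now=100.0, retention_seconds=60.0)
o2 = topic[0].append("d", now=100.0)
```

Offsets restart at 0 in every partition, which is why ordering only makes sense inside a single partition.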
Topics - Replication
• Topics should be replicated
• Each partition has 1 leader and 0 or more replicas
• Replica is in-sync if:
  • Can communicate with Zookeeper
  • Not far behind the leader
• Replication factor is set when the topic is created; changing it later requires a partition reassignment
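The two in-sync conditions can be read as a simple predicate. The sketch below is an assumption-laden illustration: `ReplicaState`, `is_in_sync`, and the 10-second threshold are hypothetical, loosely inspired by the broker setting `replica.lag.time.max.ms`, and do not reproduce the broker's actual bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    has_zookeeper_session: bool     # replica can still communicate with Zookeeper
    seconds_since_caught_up: float  # time since it last matched the leader's log end


def is_in_sync(replica: ReplicaState, max_lag_seconds: float = 10.0) -> bool:
    """A replica is in-sync if it is alive and not far behind the leader."""
    return (replica.has_zookeeper_session
            and replica.seconds_since_caught_up <= max_lag_seconds)


healthy = is_in_sync(ReplicaState(True, 2.0))
disconnected = is_in_sync(ReplicaState(False, 0.0))
lagging = is_in_sync(ReplicaState(True, 30.0))
```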
Topics - Log compaction
• Kafka will always retain at least the last known value for each
message key within the log of data for a single topic partition
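As a rough illustration of that guarantee, the sketch below compacts a list of `(offset, key, value)` records so that only the latest record per key survives. The `compact` function and the sample data are hypothetical; real compaction runs in the background on log segments.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(records: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, preserving offset order."""
    latest: Dict[str, int] = {}
    for offset, key, _value in records:
        latest[key] = offset  # later offsets overwrite earlier ones
    return [r for r in records if latest[r[1]] == r[0]]


partition_log = [
    (0, "user-1", "Lisbon"),
    (1, "user-2", "Porto"),
    (2, "user-1", "Faro"),
]
compacted = compact(partition_log)
```

The surviving records keep their original offsets; only superseded values for a key are dropped.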
Producers
◦ Publish messages to a partition in a topic
◦ Load balancing (using partitions)
  ◦ Round robin (if no key)
  ◦ Hash function based on key and number of partitions (if key)
◦ Replication

Acks                 Durability        Latency
0                    some data loss    no latency
1 (wait for leader)  little data loss  1 network roundtrip
B (all brokers)      no data loss      B - 1 network roundtrips
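The load-balancing rule can be sketched as follows. This is an illustrative stand-in, not the client's code: the `Partitioner` class is hypothetical, and `zlib.crc32` replaces the murmur2 hash that the Java client's default partitioner uses, so partition numbers will not match a real cluster.

```python
import itertools
import zlib
from typing import Optional


class Partitioner:
    """Toy partitioner: round robin without a key, hash modulo P with a key."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages evenly over all partitions.
            return next(self._round_robin)
        # Key present: the same key always maps to the same partition.
        return zlib.crc32(key) % self.num_partitions


partitioner = Partitioner(num_partitions=3)
keyless = [partitioner.partition_for(None) for _ in range(4)]
keyed_a = partitioner.partition_for(b"call-42")
keyed_b = partitioner.partition_for(b"call-42")
```

Because the partition depends on the number of partitions, adding partitions to a topic changes where existing keys land.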
Consumers
◦ Multiple consumers can read from the same topic (but from a single partition)
◦ Messages stay in Kafka after they’re consumed
◦ Consumers can go away… and then come back
[Diagram: three consumers, each fetching from a partition holding offsets 23 to 28]
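A small sketch of why this works, assuming each consumer simply remembers its own offset: fetching never deletes anything, so consumers progress independently and a returning consumer resumes where it stopped. The `Consumer` class and message names are hypothetical and not the Kafka client API.

```python
from typing import List


class Consumer:
    """Toy consumer that tracks its own position in a shared partition."""

    def __init__(self, partition: List[str], offset: int = 0) -> None:
        self.partition = partition
        self.offset = offset

    def fetch(self, max_records: int = 2) -> List[str]:
        """Read the next batch and advance this consumer's offset only."""
        batch = self.partition[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


partition = ["m23", "m24", "m25", "m26", "m27", "m28"]
fast = Consumer(partition)
slow = Consumer(partition)

fast_first = fast.fetch()
fast_second = fast.fetch()
slow_first = slow.fetch()  # sees the same messages: nothing was removed

# "Go away and come back": a new instance resumes from the saved offset.
saved_offset = slow.offset
returned = Consumer(partition, offset=saved_offset)
resumed = returned.fetch()
```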
Consumer groups
◦ Consumers can be organised into consumer groups. Each group has a coordinator
Pattern I: All consumer instances in one group. Acts like a queue with load balancing.
Pattern II: All consumer instances in different groups. All messages are broadcast to all consumer instances.
Pattern III: Many consumer instances in a group. Added for scalability and fault tolerance. Each reads from one or more partitions of a topic.
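The patterns follow from one rule: within a group, each partition is owned by exactly one consumer. The sketch below spreads partitions over group members round-robin. The `assign` function and member names are hypothetical, and Kafka's real assignment strategies differ in detail.

```python
from typing import Dict, List


def assign(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Give every partition exactly one owner within a single consumer group."""
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


# Patterns I and III: one group, work is shared between its members.
shared = assign([0, 1, 2], ["c1", "c2"])

# Pattern II: each consumer in its own group, so each one gets every partition.
group_a = assign([0, 1, 2], ["a"])
group_b = assign([0, 1, 2], ["b"])

# More members than partitions: the extra member stays idle.
oversized = assign([0, 1, 2], ["c1", "c2", "c3", "c4"])
```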
Comparing with other systems
Let’s compare Kafka with another very popular messaging system
Comparing Kafka and RabbitMQ

System    Language      Rate       Routing  Tooling    Reprocessing  Slow consumers?
Kafka     Scala + Java  ~100K/sec  No       Decent     Yes           Ok
RabbitMQ  Erlang        ~20K/sec   Yes      Very good  No            Hell
Metamorphosis
Why Kafka is like arriving on Mars
Lambda Architecture
The rise of streaming architectures
Kappa Architecture
• Pure streaming system (no batch layer à la Lambda)
• Stream + retention (Kafka) -> Real Time -> Serving
• Streaming systems:
• Kafka Streams
• Flink
• Spark Streaming
• Storm
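One way to picture "stream + retention" replacing a batch layer: when the processing logic changes, the new version of the job replays the retained stream from the first offset and builds a fresh serving view. Everything in this sketch (`run_job`, the two update functions, the sample events) is hypothetical and uses no streaming framework.

```python
from typing import Callable, Dict, Iterable, List

View = Dict[str, int]


def run_job(stream: Iterable[dict], update: Callable[[View, dict], None]) -> View:
    """Fold every retained event, from the first offset, into a new serving view."""
    view: View = {}
    for event in stream:
        update(view, event)
    return view


def count_calls_v1(view: View, event: dict) -> None:
    view[event["agent"]] = view.get(event["agent"], 0) + 1


def count_calls_v2(view: View, event: dict) -> None:
    # Changed logic: ignore calls shorter than 5 seconds.
    if event["seconds"] >= 5:
        view[event["agent"]] = view.get(event["agent"], 0) + 1


retained: List[dict] = [
    {"agent": "ana", "seconds": 30},
    {"agent": "ana", "seconds": 2},
    {"agent": "rui", "seconds": 12},
]
view_v1 = run_job(retained, count_calls_v1)
view_v2 = run_job(retained, count_calls_v2)  # reprocessing is just a replay
```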
http://redmonk.com
Numbers of April 2017
10^12 events per day
20M events/sec
3 petabytes of data per day
At peak
Kafka’s Metamorphosis
Scale Efficiency
Replayability Ecosystem
Thanks!
ANY QUESTIONS?
You can find me at
@JoaoVasques
