Apache Kafka
Stream processing made easy
Streams 101
An introduction to stream processing
Stream Processing
Transformation of a stream of data fragments into a continuous flow of information
+
Get real-time insights
Lower processing latency
Easier to test
Easier to maintain
Easier to scale
-
Different way of thinking
At-least-once vs exactly-once
Time
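The at-least-once versus exactly-once trade-off can be made concrete with a small sketch. It assumes every message carries a unique id (a hypothetical field, not something the slides specify) and shows one common way to get an exactly-once effect on top of at-least-once delivery: skip ids that were already processed. This is an illustration only, not Kafka's own mechanism.

```python
from typing import Dict, List, Set, Tuple

# (message id, account, amount); the id field is an assumption made for this sketch.
Delivery = Tuple[int, str, int]


def apply_naively(deliveries: List[Delivery]) -> Dict[str, int]:
    """Apply every delivery, so a redelivered message is counted twice."""
    balances: Dict[str, int] = {}
    for _msg_id, account, amount in deliveries:
        balances[account] = balances.get(account, 0) + amount
    return balances


def apply_idempotently(deliveries: List[Delivery]) -> Dict[str, int]:
    """Skip message ids that were already applied, so redelivery is harmless."""
    balances: Dict[str, int] = {}
    seen: Set[int] = set()
    for msg_id, account, amount in deliveries:
        if msg_id in seen:
            continue  # duplicate caused by at-least-once redelivery
        seen.add(msg_id)
        balances[account] = balances.get(account, 0) + amount
    return balances


# Message 2 is delivered twice, as can happen under at-least-once delivery.
deliveries = [(1, "alice", 10), (2, "bob", 5), (2, "bob", 5), (3, "alice", -3)]
naive = apply_naively(deliveries)
deduplicated = apply_idempotently(deliveries)
```

Here `naive` credits bob twice, while `deduplicated` reflects each message once.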
Every company is already doing stream processing (more or less ...)
A Stream
Key 1 -> value 1
Key 2 -> value 2
Key 1 -> value 3
...
A Table
+-------+---------+
| Key 1 | value 3 |
| Key 2 | value 2 |
+-------+---------+
A Table through time
+-------+---------+
| Key 1 | value 1 |
+-------+---------+
Timestamp 1
+-------+---------+
| Key 1 | value 1 |
| Key 2 | value 2 |
+-------+---------+
Timestamp 2
+-------+---------+
| Key 1 | value 3 |
| Key 2 | value 2 |
+-------+---------+
Timestamp 3
Timestamp ...
Let’s remove the redundancy
A Table through time as SETs
SET(key1 -> value1)
Timestamp 1
SET(key2 -> value2)
Timestamp 2
SET(key1 -> value3)
Timestamp 3
Timestamp ...
SET(key1 -> value1)
SET(key2 -> value2)
SET(key1 -> value3)
Changelog
key1 -> value1
key2 -> value2
key1 -> value3
Stream
Tables are materialized views of streams
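The stream and table duality can be sketched in a few lines. This is a minimal illustration using plain Python dictionaries and the key/value example from the slides; the function names `materialize` and `snapshots` are made up for this sketch.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (key, value)


def materialize(stream: Iterable[Record]) -> Dict[str, str]:
    """Fold a stream of key/value records into a table of the latest values."""
    table: Dict[str, str] = {}
    for key, value in stream:
        table[key] = value  # a later record for the same key overwrites the earlier one
    return table


def snapshots(stream: Iterable[Record]) -> List[Dict[str, str]]:
    """Return the table as it looked after each record (the table through time)."""
    table: Dict[str, str] = {}
    history: List[Dict[str, str]] = []
    for key, value in stream:
        table[key] = value
        history.append(dict(table))  # copy, so later updates do not alter past snapshots
    return history


stream = [("key1", "value1"), ("key2", "value2"), ("key1", "value3")]
table = materialize(stream)
history = snapshots(stream)
```

Replaying the same stream always yields the same table, which is why the stream (the changelog) can be treated as the source of truth and the table as a view derived from it.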
Why is this important?
Events used to manipulate core data. Today, events are our core data.
Daan Gerits, 2012
Every stream processing app is a combination of state and streams
Streaming vs batch is like agile vs waterfall, but for data.
Kafka
A Stream Processing Platform
Kafka Platform
[Diagram: the platform components: Kafka, Kafka Proxy, Kafka Streams, Kafka Connect, Kafka Security, Schema Repo]
Streams and Connect apps are just (Java) apps
Streams and Connect are libraries
Can be deployed like any other (Java) app
Multiple instances of the same app can be launched
Use tools like Mesos, Kubernetes, Docker Swarm, ...
Build apps, not jobs
[Diagram: Flink, Kafka, Spark, and Storm / Heron positioned along a scale running from batch through microbatch to event]
Kafka Engine
A message broker with a twist
Kafka Engine
[Diagram: three producers send messages to a topic; three consumers receive messages from it]
Messages
Contain byte arrays
Have a
Timestamp
Key
Value
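A message as described above can be modeled as a small record whose key and value are raw bytes. The sketch below is illustrative: the `Message` class, the JSON encoding, and the order fields are assumptions made for this example, since the broker itself only sees byte arrays and leaves serialization to the application.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    timestamp: int         # milliseconds since the epoch
    key: Optional[bytes]   # may be absent
    value: bytes           # opaque payload; only the application knows the format


def encode_order(order_id: str, order: dict, timestamp_ms: int) -> Message:
    """Serialize a (hypothetical) order to bytes before handing it to a producer."""
    return Message(
        timestamp=timestamp_ms,
        key=order_id.encode("utf-8"),
        value=json.dumps(order, sort_keys=True).encode("utf-8"),
    )


def decode_order(message: Message) -> dict:
    """Deserialize the payload again on the consumer side."""
    return json.loads(message.value.decode("utf-8"))


msg = encode_order("order-42", {"product": "book", "quantity": 2}, 1_500_000_000_000)
order = decode_order(msg)
```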
Topics
Are more like datastores
Use disk instead of memory
Retain messages
Are partitioned and replicated
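The "datastore" behaviour of a topic can be sketched as an append-only log in which reading does not remove anything. This toy `PartitionLog` class is hypothetical and keeps records in memory for brevity, whereas Kafka persists them on disk; replication is not modeled at all.

```python
from typing import List, Tuple


class PartitionLog:
    """Toy append-only log for a single partition."""

    def __init__(self) -> None:
        self._records: List[bytes] = []
        self._base_offset = 0  # offset of the oldest record still retained

    def append(self, value: bytes) -> int:
        """Add a record at the end and return its offset."""
        self._records.append(value)
        return self._base_offset + len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return records starting at offset; reading never deletes them."""
        start = max(offset, self._base_offset) - self._base_offset
        chunk = self._records[start:start + max_records]
        return [(self._base_offset + start + i, v) for i, v in enumerate(chunk)]

    def truncate_before(self, offset: int) -> None:
        """Retention: drop records with an offset lower than the given one."""
        drop = max(0, min(offset - self._base_offset, len(self._records)))
        self._records = self._records[drop:]
        self._base_offset += drop


log = PartitionLog()
offsets = [log.append(v) for v in (b"a", b"b", b"c")]
first_read = log.read(0)
second_read = log.read(0)   # same result: the messages were retained
log.truncate_before(2)      # retention removes the two oldest records
after_retention = log.read(0)
next_offset = log.append(b"d")
```

Offsets keep increasing after retention kicks in, so a reader's position stays meaningful even when old records are gone.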
Wait?? … Disk??
Sequential disk access is fast*
* Don’t believe me? Read http://kafka.apache.org/documentation#persistence
Producer
Puts messages onto Kafka
Determines the partition to write to
Can be implemented in many, many languages
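How a producer might determine the partition can be sketched as follows. The `Partitioner` class is hypothetical, and CRC32 is a stand-in hash chosen because it ships with Python; Kafka's own clients use a different hash function, and their handling of keyless messages varies by version.

```python
import zlib
from typing import Optional


class Partitioner:
    """Toy partitioner: hash the key if there is one, otherwise rotate."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._next = 0  # next partition for keyless messages

    def partition_for(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages evenly over all partitions.
            partition = self._next
            self._next = (self._next + 1) % self.num_partitions
            return partition
        # Same key, same partition: this keeps per-key ordering intact.
        return zlib.crc32(key) % self.num_partitions


partitioner = Partitioner(num_partitions=3)
first = partitioner.partition_for(b"customer-1")
second = partitioner.partition_for(b"customer-1")
keyless = [partitioner.partition_for(None) for _ in range(4)]
```

Because the partition depends only on the key, all messages for one key land in the same partition and are read back in the order they were written.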
Consumer
Gets messages from Kafka
Can be grouped into Consumer Groups
Consumer Groups allow for round-robin message delivery
Consumer Groups enable scaling of consumers
Has a persisted offset per Consumer Group
Offsets are stored in ZooKeeper or in Kafka
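The per-group offset is what lets several applications read the same topic independently. The sketch below is a toy model with hypothetical names (`GroupOffsets`, `poll`, `assign`, and the group names); it keeps offsets in a dictionary instead of ZooKeeper or Kafka and does not model rebalancing.

```python
from typing import Dict, List, Tuple


class GroupOffsets:
    """Remembers, per (group, partition), the next offset to read."""

    def __init__(self) -> None:
        self._committed: Dict[Tuple[str, int], int] = {}

    def position(self, group: str, partition: int) -> int:
        return self._committed.get((group, partition), 0)

    def commit(self, group: str, partition: int, offset: int) -> None:
        self._committed[(group, partition)] = offset


def poll(log: List[str], offsets: GroupOffsets, group: str, partition: int) -> List[str]:
    """Return the records this group has not seen yet and advance its offset."""
    start = offsets.position(group, partition)
    records = log[start:]
    offsets.commit(group, partition, start + len(records))
    return records


def assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over the consumers of one group in round-robin fashion."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


log = ["m0", "m1", "m2"]
offsets = GroupOffsets()
emailer_first = poll(log, offsets, "emailer", 0)
emailer_second = poll(log, offsets, "emailer", 0)  # nothing new for this group
ranker_first = poll(log, offsets, "ranker", 0)     # a different group starts from 0
assignment = assign(4, ["c1", "c2"])
```

Adding a consumer to a group changes only the assignment, which is how a group scales out.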
Kafka Engine
[Diagram: a topic split into Partition A and Partition B, each with its own producer and consumer]
100 000 msg/sec
On a barely tweaked, 3-node cluster
2 000 000 msg/sec
On a heavily tweaked cluster
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
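To put the first figure in perspective, the message rate can be converted into a data rate. This back-of-the-envelope sketch assumes that the 1 KB to 2 KB message size from the editor's note for slide #35 applies to the 100 000 msg/sec figure (the notes do not say so explicitly) and takes 1 KB as 1000 bytes.

```python
MESSAGES_PER_SECOND = 100_000


def throughput_mb_per_s(messages_per_second: int, message_bytes: int) -> float:
    """Convert a message rate and a message size into megabytes per second."""
    return messages_per_second * message_bytes / 1_000_000


low = throughput_mb_per_s(MESSAGES_PER_SECOND, 1_000)   # 1 KB messages
high = throughput_mb_per_s(MESSAGES_PER_SECOND, 2_000)  # 2 KB messages
```

Under those assumptions the cluster would be taking in roughly 100 to 200 MB of message data per second.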
Kafka Connect
Getting data in and out
A simple and scalable way to get data in and out of topics
Kafka Connect
[Diagram: Datasource -> Kafka Connect -> Topic, or Topic -> Kafka Connect -> Datasink]
Kafka Connect
[Diagram: Datasource -> several Kafka Connect instances -> Topic]
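The source and sink roles can be sketched with plain Python lists and dictionaries. Everything here is hypothetical (the `source_poll` and `sink_put` functions, the `id` column, the sample rows) and it is not the Kafka Connect API; it only illustrates the idea of a source that remembers how far it has read and a sink that copies topic records into another store.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def source_poll(rows: List[Row], last_id: int) -> Tuple[List[Row], int]:
    """Source side: fetch rows with an id greater than the last one seen."""
    new_rows = [row for row in rows if row["id"] > last_id]
    new_last_id = max((row["id"] for row in new_rows), default=last_id)
    return new_rows, new_last_id


def sink_put(records: List[Row], datasink: Dict[object, Row]) -> None:
    """Sink side: write each record into the target store, keyed by id."""
    for record in records:
        datasink[record["id"]] = record


database: List[Row] = [{"id": 1, "product": "book"}, {"id": 2, "product": "pen"}]
topic: List[Row] = []
last_id = 0

batch, last_id = source_poll(database, last_id)
topic.extend(batch)                       # first poll copies both rows

database.append({"id": 3, "product": "lamp"})
batch, last_id = source_poll(database, last_id)
topic.extend(batch)                       # second poll copies only the new row

datasink: Dict[object, Row] = {}
sink_put(topic, datasink)
```

Tracking the last id seen is what keeps the source from copying the same rows again on every poll.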
Kafka Connect
MySQL ⬢ Salesforce ⬢ Redis ⬢ MQTT ⬢
InfluxDB ⬢ RethinkDB ⬢ HBase ⬢ Solr ⬢
Couchbase ⬢ Elasticsearch ⬢ Hazelcast ⬢
Google PubSub ⬢ HDFS ⬢ S3 ⬢ Splunk ⬢
Spooldir ⬢ JDBC ⬢ Syslog ⬢ Cassandra ⬢
Vertica ⬢ DB2 ⬢ Goldengate ⬢ Jenkins ⬢
PredictionIO ⬢ JMS ⬢ Twitter ⬢ Attunity ⬢
MSSQL ⬢ Postgres ⬢ DynamoDB ⬢ IRC ⬢
Kudu ⬢ Ignite ⬢ MongoDB ⬢ Bloomberg Ticker ⬢ FTP
Kafka Streams
Processing streaming data
Kafka Streams
[Diagram: Kafka Streams reading from and writing to topics]
Kafka Streams
KStream for a stream of data
KTable to keep the latest value for each key
KTable state is distributed across app instances
Transform from streams to tables and tables to streams
Choose which field to use as “timestamp”
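The stream-to-table and table-to-stream conversions, and the choice of timestamp field, can be illustrated without the library itself. Kafka Streams is a Java library; the Python below is only a conceptual sketch with made-up names (`aggregate_counts`, `window_counts`, `sold_at`, `received_at`) and does not mirror its API.

```python
from typing import Dict, List, Tuple


def aggregate_counts(stream: List[Tuple[str, str]]) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Stream -> table: count records per key. Table -> stream: emit every update."""
    table: Dict[str, int] = {}
    changelog: List[Tuple[str, int]] = []
    for key, _value in stream:
        table[key] = table.get(key, 0) + 1
        changelog.append((key, table[key]))  # each table update is itself a record
    return table, changelog


def window_counts(records: List[dict], timestamp_field: str, window_ms: int) -> Dict[Tuple[int, str], int]:
    """Count products per time window, using the chosen field as the timestamp."""
    counts: Dict[Tuple[int, str], int] = {}
    for record in records:
        window_start = record[timestamp_field] // window_ms * window_ms
        bucket = (window_start, record["product"])
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


sales = [("book", "order-1"), ("pen", "order-2"), ("book", "order-3")]
table, changelog = aggregate_counts(sales)

records = [
    {"product": "book", "sold_at": 1_000, "received_at": 61_000},
    {"product": "book", "sold_at": 59_000, "received_at": 61_500},
    {"product": "pen", "sold_at": 61_000, "received_at": 62_000},
]
by_sale_time = window_counts(records, "sold_at", 60_000)
by_arrival_time = window_counts(records, "received_at", 60_000)
```

The same records fall into different windows depending on which field is treated as the timestamp, which is why that choice matters.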
[Diagram: Kafka Connect apps and Kafka Streams apps reading from and writing to topics A, B, and C]
So how do you build solutions with this?
[Diagram: Kafka topics A, B, and C, with Kafka Connect and Kafka Streams apps attached]
Proposal
[Diagram: an example solution around topics A, B, and C, with components labeled Sales JDBC, Kafka Connect, Top Products Ranker, Emailer, Low Stock Notifier, Kafka Connect App, and Slack Poster]

Editor's Notes

  • #35: 1 KB to 2 KB per message, 3-node cluster with 32 GB RAM and dual quad-core CPUs, tested at a customer
  • #36: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines