Introduction to Apache Kafka
About me
- My name is Jonathan Winandy (@ahoy_jon).
- I am a data pipeline engineer:
- I worked on a “DataLake”!
- I use tools from the larger Java ecosystem, like Java, Scala, Clojure, Hadoop …
- And I am an “entrepreneur”.
> Introduction
I co-founded [company logo].
We do healthcare-oriented software engineering.
We provide:
- Coordination for healthcare professionals.
- “Big healthcare data” pipelines.
> Introduction
So let’s talk about Streams
What is a stream?
It’s an abstract data structure with the following operations:
• append(bytes) -> void?
• readAt(int) -> null | bytes
Rule 1: ∀p ∈ ℕ, for some definition of ‘==’:
  x := readAt(p)
  y := readAt(p)
  x != null => x == y
Rule 1 implies infinite cacheability once the data is available at a position.
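A minimal in-memory sketch of the stream abstraction above. The names `Stream` and `CachingReader` are illustrative, not Kafka's API, and having `append` return the new position is an assumption (the slide leaves the return type as `void?`). The reader shows why rule 1 matters: a non-null read can be cached forever, while a `null` must not be cached, because that position may be filled later.

```python
from typing import Dict, List, Optional


class Stream:
    """Append-only stream: a position, once written, never changes."""

    def __init__(self) -> None:
        self._entries: List[bytes] = []

    def append(self, data: bytes) -> int:
        # Returning the position is a choice of this sketch (slide: `void?`).
        self._entries.append(bytes(data))
        return len(self._entries) - 1

    def read_at(self, position: int) -> Optional[bytes]:
        # null until something has been written at this position.
        if 0 <= position < len(self._entries):
            return self._entries[position]
        return None


class CachingReader:
    """Rule 1 makes every non-null read safe to cache forever."""

    def __init__(self, stream: Stream) -> None:
        self._stream = stream
        self._cache: Dict[int, bytes] = {}
        self.backend_reads = 0  # how many reads actually hit the stream

    def read_at(self, position: int) -> Optional[bytes]:
        if position in self._cache:
            return self._cache[position]
        self.backend_reads += 1
        value = self._stream.read_at(position)
        if value is not None:  # never cache "not available yet"
            self._cache[position] = value
        return value


stream = Stream()
reader = CachingReader(stream)
print(reader.read_at(0))      # None: nothing written yet, nothing cached
stream.append(b"hello")
print(reader.read_at(0))      # b'hello'
print(reader.read_at(0))      # b'hello', served from the cache
print(reader.backend_reads)   # 2
```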
> Theory
Streams are the simplest way to manage data.
And they are naturally compatible with the way a single observer perceives information …
[Figure: a stream with positions 0 1 2 3 4 5 6]
> Theory
And we have known this since the 15th century …
[Figure: Memorial, Journal, Ledger]
So what happened?
> History
> Dist systems
@aphyr: Jepsen II: Linearizable Boogaloo
Distributed systems are HARD.
https://www.youtube.com/watch?v=ggCffvKEJmQ
Peter Alvaro - Outwards from the Middle of the Maze
> Dist systems
=> Idempotence
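One common way to get idempotence on the consuming side, sketched under assumptions: records come from a single partition in offset order, and the consumer remembers the last offset it applied. The record shape and all names are hypothetical. With at-least-once delivery, a retry can deliver the same record twice; the naive consumer double-counts it, the offset-tracking one does not. In a real system the offset would have to be stored atomically with the state it protects.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, int]  # (offset, account, amount): hypothetical shape


def apply_naively(balances: Dict[str, int], record: Record) -> None:
    # Not idempotent: a redelivered record is counted twice.
    _, account, amount = record
    balances[account] = balances.get(account, 0) + amount


class IdempotentApplier:
    """Skips any record whose offset has already been applied."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.last_applied = -1  # offsets start at 0

    def apply(self, record: Record) -> None:
        offset, account, amount = record
        if offset <= self.last_applied:
            return  # duplicate delivery (e.g. after a retry): no-op
        self.balances[account] = self.balances.get(account, 0) + amount
        self.last_applied = offset


# At-least-once delivery: record 1 arrives twice after a simulated retry.
deliveries: List[Record] = [(0, "alice", 10), (1, "alice", 5), (1, "alice", 5)]

naive: Dict[str, int] = {}
safe = IdempotentApplier()
for record in deliveries:
    apply_naively(naive, record)
    safe.apply(record)

print(naive["alice"], safe.balances["alice"])  # 20 15
```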
> Summary
The need for a unified log arises ‘quickly’ in apps that manage state (or multiple states) when they need to do:
- Business Intelligence,
- Notifications,
- Advanced search (secondary indexing),
- …
But there is a lot of legacy in projects and practices, and this technique has regularly been “forgotten*”.
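A toy illustration of the unified-log idea: one append-only log, and two independent readers (a secondary index for search and a notifier) that each track their own offset. The event shape, class names and the city example are hypothetical; the point is that neither reader needs the application's database, only the log.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, str]  # (user_id, field, value): hypothetical shape

log: List[Event] = []


def append(event: Event) -> int:
    log.append(event)
    return len(log) - 1


class SearchIndex:
    """Secondary index city -> user ids, built only from the log."""

    def __init__(self) -> None:
        self.offset = 0  # next log position to read
        self.city_of: Dict[str, str] = {}
        self.users_by_city: Dict[str, Set[str]] = defaultdict(set)

    def catch_up(self) -> None:
        while self.offset < len(log):
            user, field, value = log[self.offset]
            if field == "city":
                old = self.city_of.get(user)
                if old is not None:
                    self.users_by_city[old].discard(user)
                self.city_of[user] = value
                self.users_by_city[value].add(user)
            self.offset += 1


class Notifier:
    """Another reader of the same log, at its own pace."""

    def __init__(self) -> None:
        self.offset = 0
        self.sent: List[str] = []

    def catch_up(self) -> None:
        while self.offset < len(log):
            user, field, value = log[self.offset]
            self.sent.append(f"{user} changed {field} to {value}")
            self.offset += 1


append(("u1", "city", "Rennes"))
append(("u2", "city", "Paris"))
append(("u1", "city", "Paris"))

index, notifier = SearchIndex(), Notifier()
index.catch_up()
notifier.catch_up()
print(sorted(index.users_by_city["Paris"]))  # ['u1', 'u2']
print(len(notifier.sent))                     # 3
```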
> Basic anatomy of Kafka
Topic: _test
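A rough model of a topic as a set of partitions, each an append-only log with its own offsets. Records with the same key land in the same partition, which is what preserves per-key ordering. Kafka's Java client hashes keys with murmur2; this sketch uses `zlib.crc32` as a stand-in, so assignments will not match a real cluster, and the round-robin fallback for keyless records is a simplification. `Topic` and `send` are hypothetical names.

```python
import zlib
from typing import List, Optional, Tuple


class Topic:
    """A topic as a fixed set of partitions, each an append-only log."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Tuple[Optional[bytes], bytes]]] = [
            [] for _ in range(num_partitions)
        ]
        self._round_robin = 0

    def partition_for(self, key: Optional[bytes]) -> int:
        if key is None:
            # Keyless records: spread round-robin (a simplification).
            p = self._round_robin % len(self.partitions)
            self._round_robin += 1
            return p
        # Same key -> same partition, so per-key order is preserved.
        return zlib.crc32(key) % len(self.partitions)

    def send(self, key: Optional[bytes], value: bytes) -> Tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


topic = Topic("_test", num_partitions=3)
p1, o1 = topic.send(b"user-42", b"first")
p2, o2 = topic.send(b"user-42", b"second")
print(p1 == p2, o1, o2)  # True 0 1
```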
But to use that, we need a bit of coordination.
[Figure: Broker 1, Broker 2, Broker 3, ZK (ZooKeeper) and a Producer]
1. Hello ZK, do you know where I can find some brokers?
2. Ahoy?
3. Want some data?
> Producer
Demo
Message acking for the producer (“write concern”)
0: Here’s a messa’
1: If at least you, the leader of this partition, received it and saved it, I am OK.
-1: Hey, I just sent you a message. I know it’s maybe too much to ask, but are you really sure you saved it? OK, and did all the brokers in the “In-Sync Replicas” for this partition save it too? I know I am … but this information is really imp
[Figure: Speed ↔ Durability]
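A toy model of the three acknowledgement levels, assuming the worst case where the leader crashes right after answering and before any follower has copied the message. In the real producer this is the `acks` setting (`0`, `1`, or `all`, which is the same as `-1`); everything else below (`Replica`, `produce`, synchronous replication) is a simplification made up for the sketch, since real followers fetch from the leader asynchronously.

```python
from typing import List


class Replica:
    """One broker's copy of a single partition (toy model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.log: List[bytes] = []

    def write(self, message: bytes) -> bool:
        if self.alive:
            self.log.append(message)
        return self.alive


def replicate(leader: Replica, followers: List[Replica]) -> None:
    # Followers copy whatever they are missing from the leader's log.
    for f in followers:
        if f.alive and leader.alive:
            f.log.extend(leader.log[len(f.log):])


def produce(message: bytes, leader: Replica, followers: List[Replica], acks: int) -> bool:
    """Returns True when the producer treats the write as acknowledged."""
    if acks == 0:
        leader.write(message)  # fire and forget: the result is never checked
        return True
    if acks not in (1, -1):
        raise ValueError("acks must be 0, 1 or -1")
    if not leader.write(message):
        return False
    if acks == 1:
        return True  # the leader has it; followers will copy it "later"
    replicate(leader, followers)  # acks=-1: wait for the in-sync followers
    return all(len(f.log) == len(leader.log) for f in followers)


def survives_leader_crash(acks: int) -> bool:
    leader = Replica("broker-1")
    followers = [Replica("broker-2"), Replica("broker-3")]
    acknowledged = produce(b"important", leader, followers, acks)
    leader.alive = False       # worst case: crash before any follower fetched
    new_leader = followers[0]  # an in-sync follower takes over
    return acknowledged and b"important" in new_leader.log


for level in (0, 1, -1):
    print(level, survives_leader_crash(level))
# 0 False
# 1 False
# -1 True
```

The trade-off from the slide shows up directly: `0` never waits, `1` waits for one write, `-1` waits for every in-sync replica and is the only level whose acknowledged message survives this crash.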
> Producer
Consumers flatMap that log
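"flatMap" can be read literally here: a consumer walks the log from its committed offset and turns each record into zero or more outputs. A minimal sketch with a hypothetical `Consumer` class (not the Kafka client API). It commits only after a whole batch has been processed, so if `fn` raises mid-batch the offset does not move and the batch is reprocessed on the next poll (at-least-once).

```python
from typing import Callable, Generic, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class Consumer(Generic[A, B]):
    """Reads a log from its own committed offset and flatMaps each record."""

    def __init__(self, log: List[A], fn: Callable[[A], Iterable[B]]) -> None:
        self.log = log
        self.fn = fn
        self.committed = 0  # next offset to process

    def poll(self, max_records: int = 100) -> List[B]:
        batch = self.log[self.committed:self.committed + max_records]
        out: List[B] = [b for record in batch for b in self.fn(record)]
        self.committed += len(batch)  # commit only after the whole batch
        return out


log = ["hello world", "", "kafka streams"]
words = Consumer(log, lambda line: line.split())
print(words.poll(2))  # ['hello', 'world']  (the empty line yields nothing)
print(words.poll())   # ['kafka', 'streams']
print(words.poll())   # []
```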
Conclusion
State: a timeless way to fail
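One reading of this slide: a mutable state keeps only the latest value and forgets how and when it got there. Keeping the events and deriving the state as a fold puts time back, since the state at any past position can be recomputed. The basket events and function names are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (item, quantity delta): hypothetical basket events


def apply(state: Dict[str, int], event: Event) -> Dict[str, int]:
    item, delta = event
    new_state = dict(state)  # never mutate: earlier states stay valid
    new_state[item] = new_state.get(item, 0) + delta
    if new_state[item] <= 0:
        del new_state[item]
    return new_state


def state_at(events: List[Event], position: int) -> Dict[str, int]:
    """State after the first `position` events."""
    return reduce(apply, events[:position], {})


events: List[Event] = [("book", 1), ("pen", 2), ("book", -1)]
print(state_at(events, 1), state_at(events, 3))  # {'book': 1} {'pen': 2}
```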
Questions?
https://github.com/bulldog2011/luxun
Kafka has unique properties.
PLEASE: don’t try to use something else instead.
We should talk about CAP.
But CAP is about mutation.
And consistency is a complicated subject.
A quick note on causality
If you don’t ensure causality in web apps, some strange behaviors may arise:
Sometimes, as a user, I cannot see my own “edits”.
Sometimes, as a client, I cannot buy on the website after I check out my basket.
[Figure: APP, APP]
“Who is faster: the data bus or the client?”
You don’t want to bet, especially under load.
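One way to avoid betting on that race, sketched under assumptions: the write returns its log offset, the client keeps it as a session token, and the read side waits (with a timeout) until its view has applied at least that offset. `View`, `write` and the token idea are illustrative; routing a user's reads to the primary store is another common option.

```python
import threading
from typing import Dict, List, Optional, Tuple


class View:
    """A read model fed asynchronously from the log (the "data bus")."""

    def __init__(self) -> None:
        self.applied_offset = -1
        self.data: Dict[str, str] = {}
        self._cond = threading.Condition()

    def apply(self, offset: int, key: str, value: str) -> None:
        with self._cond:
            self.data[key] = value
            self.applied_offset = offset
            self._cond.notify_all()

    def read(self, key: str, min_offset: int, timeout: float) -> Optional[str]:
        """Read-your-writes: wait until the view has seen the client's write."""
        with self._cond:
            ok = self._cond.wait_for(lambda: self.applied_offset >= min_offset, timeout)
            if not ok:
                raise TimeoutError(f"view not caught up to offset {min_offset}")
            return self.data.get(key)


log: List[Tuple[str, str]] = []


def write(key: str, value: str) -> int:
    log.append((key, value))
    return len(log) - 1  # the client keeps this offset as a session token


view = View()
token = write("profile:42", "new bio")

# The data bus delivers the write a bit later, on another thread.
bus = threading.Timer(0.05, lambda: view.apply(token, *log[token]))
bus.start()
print(view.read("profile:42", min_offset=token, timeout=1.0))  # new bio
bus.join()
```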
Bonus: what is a CAS?
A Content-Addressable Storage is a specific kind of “key-value store” with the following operations:
- store(bytes) -> key
- get(key) -> null | bytes
Rule 1: key = h(data), h being a cryptographic hash function like MD5 or SHA-1.
Rule 2: ∀data, get(store(data)) = data
Rules 1 and 2 imply infinite cacheability and scalability.
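A minimal in-memory CAS following the two rules above. It uses SHA-256 instead of MD5 or SHA-1, since practical collisions are known for both; the class name is hypothetical. Because the key is derived from the content, storing the same bytes twice yields the same key and changes nothing.

```python
import hashlib
from typing import Dict, Optional


class ContentAddressableStore:
    """Key-value store where the key is derived from the value itself."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def store(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # rule 1: key = h(data)
        self._blobs[key] = bytes(data)  # re-storing: same key, same data
        return key

    def get(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)  # rule 2: get(store(data)) == data


cas = ContentAddressableStore()
key = cas.store(b"hello")
print(cas.get(key))                 # b'hello'
print(cas.store(b"hello") == key)   # True
```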
Example of architectures
[Figure: CLASSICAL: APPs, two DBs, REPLICATION (BIN/LOG). WITH STREAMS: APPs, append, broadcast.]
The broadcast mechanism is equivalent to a DB replication mechanism.
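A small sketch of that equivalence: every write is appended to a shared log (the binlog, or the stream), and each consumer replays it from its own position into a local copy. Consumers that replay the same log end up with the same data. The op format and names are hypothetical.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]  # ("set" | "delete", key, value): hypothetical format


def apply(db: Dict[str, str], op: Op) -> None:
    kind, key, value = op
    if kind == "set":
        db[key] = value
    elif kind == "delete":
        db.pop(key, None)
    else:
        raise ValueError(f"unknown op {kind!r}")


class Replica:
    def __init__(self) -> None:
        self.db: Dict[str, str] = {}
        self.position = 0  # how far into the shared log this replica has read

    def catch_up(self, log: List[Op]) -> None:
        for op in log[self.position:]:
            apply(self.db, op)
        self.position = len(log)


shared_log: List[Op] = []  # plays the role of the binlog / the stream


def append(op: Op) -> None:
    shared_log.append(op)


append(("set", "a", "1"))
append(("set", "b", "2"))
append(("delete", "a", ""))

search_app, bi_app = Replica(), Replica()
search_app.catch_up(shared_log)
bi_app.catch_up(shared_log)
print(search_app.db == bi_app.db, search_app.db)  # True {'b': '2'}
```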
