STREAMING WITH KAFKA
Publish/Subscribe Messaging with Kafka
What is streaming?
■ So far we’ve only talked about processing historical big data that already exists
– Sitting on HDFS
– Sitting in a database
■ But how does new data get into your cluster, especially when it arrives at “big data” volumes?
– New log entries from your web servers
– New sensor data from your IoT system
– New stock trades
■ Streaming lets you publish this data, in real time, to your cluster.
– And you can even process it in real time as it comes in!
Two problems
■ How to get data from many different sources flowing into your cluster
■ Processing it when it gets there
■ Let’s start with the first problem
Enter Kafka
■ Kafka is a general-purpose publish/subscribe messaging system
■ Kafka servers (brokers) store incoming messages from publishers (producers) in named streams of data called topics, and retain them for a configurable period of time
■ Kafka consumers subscribe to one or more topics, and receive data as it’s published
■ A stream / topic can have many different consumers, each with its own position (offset) in the stream maintained independently
■ It’s not just for Hadoop
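To make the idea of a topic with independent consumer positions concrete, here is a toy in-memory model in Python. It is not the Kafka client API: `Topic`, `Consumer`, `publish`, and `poll` are hypothetical names, and retention is simplified to a message count, whereas real Kafka retains data by time or size (for example the `retention.ms` and `retention.bytes` topic settings).

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy in-memory topic: an ordered, append-only log of messages."""
    name: str
    retention: int  # simplification: keep at most this many messages
    log: list = field(default_factory=list)
    base_offset: int = 0  # offset of the oldest message still retained

    def publish(self, message):
        self.log.append(message)
        # Once over the retention limit, the oldest messages age out.
        while len(self.log) > self.retention:
            self.log.pop(0)
            self.base_offset += 1


@dataclass
class Consumer:
    """Toy subscriber that remembers its own position in the topic."""
    topic: Topic
    offset: int = 0  # next offset this consumer wants to read

    def poll(self):
        # If our position has already aged out, resume at the oldest retained message.
        start = max(self.offset, self.topic.base_offset)
        batch = self.topic.log[start - self.topic.base_offset:]
        # Advance past everything just read.
        self.offset = self.topic.base_offset + len(self.topic.log)
        return batch


trades = Topic("stock-trades", retention=3)
fast, slow = Consumer(trades), Consumer(trades)

trades.publish("AAPL")
trades.publish("MSFT")
print(fast.poll())  # ['AAPL', 'MSFT']

trades.publish("GOOG")
trades.publish("AMZN")  # pushes 'AAPL' out of retention
print(fast.poll())  # ['GOOG', 'AMZN']
print(slow.poll())  # ['MSFT', 'GOOG', 'AMZN']
```

The slow consumer never holds up the fast one: each keeps its own offset, and anything a consumer has not read before retention expires is simply gone.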
Kafka architecture
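[Figure: a Kafka Cluster at the center, with Producers (apps) writing to it, Consumers (apps) reading from it, Stream Processors (apps) doing both, and Connectors linking it to external databases (DB)]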
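The producer, consumer, and stream-processor roles in this diagram can be sketched with plain functions over a shared in-memory store. This is an illustration only: `cluster`, `produce`, `consume`, and `process` are hypothetical stand-ins rather than Kafka APIs, and the topic names are made up. The point it shows is that a stream processor is simply a consumer and a producer at the same time.

```python
from collections import defaultdict

# Hypothetical in-memory "cluster": topic name -> ordered list of messages.
cluster = defaultdict(list)


def produce(topic, message):
    """Producer role: append a message to the end of a topic."""
    cluster[topic].append(message)


def consume(topic, offset):
    """Consumer role: read everything from `offset` on; return (messages, next offset)."""
    messages = cluster[topic][offset:]
    return messages, offset + len(messages)


def process(in_topic, out_topic, keep, offset=0):
    """Stream-processor role: consume from one topic, produce matches to another."""
    messages, next_offset = consume(in_topic, offset)
    for m in messages:
        if keep(m):
            produce(out_topic, m)
    return next_offset


# Producers (e.g. web servers) publish raw log lines.
produce("weblogs", "GET /index.html 200")
produce("weblogs", "GET /missing 404")

# A stream processor derives an "errors" topic from "weblogs".
process("weblogs", "errors", lambda line: line.endswith(" 404"))

# A downstream consumer reads the derived topic.
errors, _ = consume("errors", 0)
print(errors)  # ['GET /missing 404']
```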
How Kafka scales
Image: kafka.apache.org
■ Kafka itself may be distributed across many processes (brokers) on many servers
– The storage of stream data is distributed across them as well
■ Consumers may also be distributed
– Consumers in the same group split the messages between them, so each message goes to only one member of the group
– Consumers in different groups each get their own copy of every message
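A rough model of consumer groups, assuming a topic split into partitions (Kafka’s unit of parallelism). The round-robin partitioning and assignment below are simplifications chosen to keep the output deterministic; real Kafka partitions keyed messages by key, spreads unkeyed ones across partitions, and uses configurable assignment strategies with rebalancing. The function names are hypothetical.

```python
from collections import defaultdict


def assign_partitions(num_partitions, consumers):
    """Spread partitions over one group's consumers, round-robin."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def deliver(messages, num_partitions, groups):
    """Return {group: {consumer: [messages it receives]}}."""
    # Simplification: message i goes to partition i % num_partitions.
    partitions = defaultdict(list)
    for i, msg in enumerate(messages):
        partitions[i % num_partitions].append(msg)

    result = {}
    for group, consumers in groups.items():
        # Each group gets its own assignment, and therefore its own copy of the data.
        assignment = assign_partitions(num_partitions, consumers)
        result[group] = {
            c: [m for p in assignment[c] for m in partitions[p]]
            for c in consumers
        }
    return result


messages = ["m0", "m1", "m2", "m3"]
groups = {"analytics": ["a1", "a2"], "archiver": ["b1"]}
out = deliver(messages, num_partitions=2, groups=groups)
print(out["analytics"])  # {'a1': ['m0', 'm2'], 'a2': ['m1', 'm3']}
print(out["archiver"])   # {'b1': ['m0', 'm2', 'm1', 'm3']}
```

Note that `b1` receives the messages grouped by partition rather than in publish order: Kafka only guarantees ordering within a partition. A third consumer added to a group reading a two-partition topic would sit idle.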
Let’s play
■ Start Kafka on our sandbox
■ Set up a topic
– Publish some data to it, and watch it get consumed
■ Set up a file connector
– Monitor a log file and publish additions to it
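What a file connector does can be sketched as a tailer that remembers how far into the file it has read and publishes only newly appended, complete lines. Kafka Connect ships a simple file source connector for demos like this; the `FileTailer` class below is a hypothetical stand-in for the idea, not its implementation, and `publish` can be any callable (here a plain list plays the role of the topic).

```python
import os
import tempfile


class FileTailer:
    """Hypothetical file 'source connector': publishes lines appended to a file."""

    def __init__(self, path, publish):
        self.path = path
        self.publish = publish  # any callable, e.g. a stand-in for a producer
        self.position = 0       # byte offset of the first unread byte

    def poll(self):
        """Publish each complete new line; return how many were published."""
        with open(self.path, "rb") as f:
            f.seek(self.position)
            data = f.read()
        # Hold back a trailing partial line until its newline arrives.
        complete, newline, _partial = data.rpartition(b"\n")
        if not newline:
            return 0
        lines = complete.split(b"\n")
        for line in lines:
            self.publish(line.decode("utf-8"))
        self.position += len(complete) + 1
        return len(lines)


topic = []  # a plain list standing in for a Kafka topic
with tempfile.TemporaryDirectory() as d:
    log_path = os.path.join(d, "access.log")
    with open(log_path, "wb") as f:
        f.write(b"GET /index.html\n")
    tailer = FileTailer(log_path, topic.append)
    tailer.poll()

    with open(log_path, "ab") as f:
        f.write(b"GET /about.html\nGET /par")  # last line still being written
    tailer.poll()

print(topic)  # ['GET /index.html', 'GET /about.html']
```

A real connector would also persist its position so it can resume after a restart; Kafka Connect tracks source offsets for this reason.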
