Messaging Queue - Kafka
What is a Messaging Queue?
Which software is the best fit for our service?
- Amazon SQS, Amazon SNS, Apache Kafka, RabbitMQ, IBM MQ
Can we create our own messaging queue?
A queue contains a sequence of messages, sent between applications, awaiting their turn to be processed.
A message is the data sent from a producer to a consumer.
Why Messaging Queue?
Why can't we have REST APIs everywhere?
[Diagram: synchronous call, with a failed case]
[Diagram: asynchronous call through a messaging queue, with a failed case]
Kafka
● Distributed streaming platform.
● Real-time streaming of data.
● Can handle billions of messages a day.
● High throughput, reliability, and replication capabilities.
● Amazon MSK: Amazon Managed Streaming for Apache Kafka.
● Used by LinkedIn, Twitter, Netflix, etc.
Kafka - Terminologies
● Kafka cluster - A group of one or more servers (Kafka brokers) that keep the load balanced.
● Kafka broker - A Kafka server. Brokers share information with each other.
● Bootstrap server - The server used for the initial connection to the Kafka cluster, given as host:port.
● Producer - Produces messages and sends them to a topic (partition).
● Consumer - Polls messages from a topic (partition).
● Consumer group - Each message is read once by each consumer group (comparable to SNS handlers).
Kafka - Terminologies (continued)
● Topic - A named stream of data that messages are published to and stored in. A topic can have one or more partitions.
● Partition - Splits a topic so that messages can be processed in parallel (comparable to an SQS message group).
● Segment - A partition's data is stored as multiple segments.
● Offset - Uniquely identifies a message within a partition. Offsets start from 0 in each partition.
● ZooKeeper - Manages leader election for the brokers. Each partition has its own leader.
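The relationship between topics, partitions, offsets, and consumer groups can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the `Topic` and `GroupOffsets` classes and the topic and group names are hypothetical, and nothing here talks to a real broker.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic: a list of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message and return its offset within that partition."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offsets start at 0 in every partition


class GroupOffsets:
    """Tracks the next offset to read, per consumer group and partition."""

    def __init__(self):
        self.next_offset = defaultdict(int)

    def poll(self, topic, group, partition):
        """Return this group's unread messages and advance its position."""
        key = (group, partition)
        start = self.next_offset[key]
        batch = topic.partitions[partition][start:]
        self.next_offset[key] = start + len(batch)
        return batch


topic = Topic("orders", num_partitions=2)
first = topic.append(0, "a")    # offset 0 in partition 0
second = topic.append(0, "b")   # offset 1 in partition 0
other = topic.append(1, "c")    # offset 0 in partition 1

offsets = GroupOffsets()
billing_first = offsets.poll(topic, "billing", 0)  # group sees both messages
billing_again = offsets.poll(topic, "billing", 0)  # nothing new for this group
audit_first = offsets.poll(topic, "audit", 0)      # a second group sees them too
```

Each group keeps its own position, which is why a message is delivered once per group rather than once overall.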
Producer
● Sends data with topic only
  ○ The producer's partitioner decides the partition.
  ○ Records without a key are spread across partitions: round-robin in older clients, sticky (batch-by-batch) partitioning by default since Kafka 2.4. We can implement our own partitioner.
● Sends data with topic and partition ID
  ○ Directly selects the partition and sends the data.
● Sends data with topic and partition key
  ○ A hash of the partition key determines the partition ID.
  ○ This is similar to the SQS message group ID.
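The three ways of choosing a partition can be sketched as a single function. This is a simplified illustration, not Kafka's actual partitioner: the Java client hashes key bytes with murmur2, while this sketch uses `zlib.crc32` as a stand-in stable hash, and `make_partitioner` is a hypothetical helper name.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition for one record."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose(partition=None, key=None):
        if partition is not None:
            # Topic and partition ID: the caller selects the partition directly.
            return partition
        if key is not None:
            # Topic and partition key: a stable hash maps the key to a partition.
            # crc32 is an assumption here; Kafka's Java client uses murmur2.
            return zlib.crc32(key.encode("utf-8")) % num_partitions
        # Topic only: rotate through the partitions.
        return next(round_robin)

    return choose


choose = make_partitioner(3)
no_key = [choose() for _ in range(4)]
explicit = choose(partition=2)
same_key = {choose(key="customer-42") for _ in range(5)}
```

Because the hash is stable, every record with the same key lands in the same partition, which is what preserves per-key ordering.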
Kafka Broker Data Storage
Segments
Segments are named by their base offset. The base offset of a segment is greater than every offset in the previous segments and less than or equal to every offset in that segment.
● segment.index - Maps offsets to the position of their message in the segment log.
● segment.log - Stores the actual messages.
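Because segments are named by base offset, finding the segment that holds a given offset is a search for the largest base offset not exceeding it. The sketch below assumes the base offsets are available as a sorted list; `find_segment` and `segment_file_names` are hypothetical helpers, and the 20-digit zero-padded file names follow the naming Kafka uses on disk.

```python
import bisect


def find_segment(base_offsets, offset):
    """Return the base offset of the segment that contains `offset`.

    `base_offsets` must be sorted in ascending order.
    """
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise KeyError(f"offset {offset} is older than the first segment")
    return base_offsets[i]


def segment_file_names(base_offset):
    """Build the log and index file names for a segment."""
    stem = f"{base_offset:020d}"  # zero-padded to 20 digits
    return stem + ".log", stem + ".index"


base_offsets = [0, 1000, 2500]
in_first = find_segment(base_offsets, 999)
on_boundary = find_segment(base_offsets, 1000)
in_last = find_segment(base_offsets, 2600)
log_name, index_name = segment_file_names(on_boundary)
```

Within the chosen segment, the index file then narrows the lookup to a position in the log file.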
Consumer - Partition Allocation
● One consumer in the group: all partitions are assigned to that consumer.
● Fewer consumers than partitions: partitions are divided equally and assigned to the consumers.
● As many consumers as partitions: each partition maps to one consumer.
● More consumers than partitions: the extra consumers stay idle.
● Each partition is consumed by only a single consumer from the group.
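The allocation cases can be reproduced with a simple round-robin assignment. This is a sketch only: Kafka ships several assignment strategies whose exact output differs, and `assign_partitions` is a hypothetical helper rather than a client API.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers; each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


single = assign_partitions([0, 1, 2], ["c1"])
shared = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
one_each = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])
with_idle = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
idle_consumers = [c for c, owned in with_idle.items() if not owned]
```

Adding consumers beyond the partition count therefore adds no parallelism; raising the partition count is what allows more consumers to do work.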
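Consumer
Reads messages from a partition
● Offset: from-beginning
  ○ On restart, reads from the first available offset.
  ○ That is not necessarily the first message ever written, because Kafka deletes old data (default retention: 7 days).
● Offset: earliest
  ○ On restart, reads from the last committed offset; if the group has no committed offset, starts from the earliest available message.
  ○ Auto commit: offsets are committed during poll calls, every 5 seconds by default.
  ○ Manual commit: the consumer sends the ack to the broker itself, with the offset.
● Offset: latest
  ○ On restart, reads from the last committed offset; if the group has no committed offset, starts from the latest message.
  ○ Used for real-time cases.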
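In the Kafka consumer, the `auto.offset.reset` setting (`earliest` or `latest`) only takes effect when the group has no valid committed offset for the partition; otherwise the consumer resumes from the committed offset. The function below is a minimal sketch of that decision, with hypothetical names and made-up offsets, not client code.

```python
def start_position(log_start, log_end, committed, auto_offset_reset):
    """Pick the offset a consumer starts reading from after a restart.

    log_start: oldest offset still retained in the partition
    log_end:   offset the next produced message will receive
    committed: the group's last committed offset, or None if it has none
    """
    if committed is not None and log_start <= committed <= log_end:
        return committed  # a valid committed offset always wins
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError(f"no valid committed offset and policy is {auto_offset_reset!r}")


# Retention has already deleted offsets below 120; the log currently ends at 500.
resumed = start_position(120, 500, committed=300, auto_offset_reset="latest")
fresh_earliest = start_position(120, 500, committed=None, auto_offset_reset="earliest")
fresh_latest = start_position(120, 500, committed=None, auto_offset_reset="latest")
expired = start_position(120, 500, committed=50, auto_offset_reset="earliest")
```

The last case shows a committed offset that retention has already removed: the consumer falls back to the reset policy instead of failing.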
Types of Message Delivery
● At most once
  ○ If the producer does not retry when an ack times out, the message might never be written to the Kafka topic.
  ○ Producer waits for only one ack, from the partition leader (acks=1).
  ○ 20 times faster.
● At least once
  ○ If the broker fails after the message was written to the Kafka topic but before it sent the ack, the producer's retry writes the message twice. (Standard SQS behaves this way.)
  ○ Producer waits for acks from all in-sync replicas (acks=all).
  ○ 3 times faster.
● Exactly once
  ○ A unique identifier is required, so that whenever the producer sends a duplicate the broker does not store the message again (enable.idempotence=true; comparable to FIFO SQS).
  ○ Difficult to handle at the consumer end: manual offset commits need to be handled carefully.
  ○ The alternative is a transaction that spans from the producer's send until the ack is received from the consumer.
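The difference between at-least-once and idempotent delivery can be sketched with a toy broker that remembers the highest sequence number written per producer. This is a simplification under stated assumptions: real Kafka tracks sequence numbers per producer and partition, and the `Broker` class and `send_with_lost_ack` helper here are hypothetical.

```python
class Broker:
    """Toy broker that can optionally drop duplicate sends."""

    def __init__(self, idempotent):
        self.idempotent = idempotent
        self.log = []
        self.last_seq = {}  # producer id -> highest sequence number written

    def append(self, producer_id, seq, message):
        if self.idempotent and seq <= self.last_seq.get(producer_id, -1):
            return "duplicate"  # already stored; do not write it again
        self.log.append(message)
        self.last_seq[producer_id] = seq
        return "ack"


def send_with_lost_ack(broker, producer_id, seq, message):
    """Simulate a write whose ack is lost, followed by the producer's retry."""
    broker.append(producer_id, seq, message)         # written, ack never arrives
    return broker.append(producer_id, seq, message)  # retry of the same record


plain = Broker(idempotent=False)
plain_result = send_with_lost_ack(plain, "p1", 0, "order-1")

dedup = Broker(idempotent=True)
dedup_result = send_with_lost_ack(dedup, "p1", 0, "order-1")
```

Without deduplication the retry leaves two copies in the log; with it, the retry is recognized and the log keeps a single copy.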
ZooKeeper
● Elects a controller. The controller maintains the leader/follower relationship for all the partitions.
● When a node shuts down, the controller tells other replicas to become partition leaders.
● Manages service discovery for the Kafka brokers that form the cluster.
● Sends topology changes to Kafka, so each node in the cluster knows when a broker joined or died, or when a topic was added or removed.
SQS vs Kafka

| Parameter | AWS SQS | Apache Kafka |
|---|---|---|
| Order of messages | Standard queue: can be out of order. FIFO queue: in order within a message group. | In order within a partition. |
| Message delivery | Standard queue: at least once. FIFO queue: exactly once. | Provides all three types: at most once, at least once, and exactly once. |
| Retention | Default: 4 days; up to 14 days. | Default: 7 days; configurable, with no fixed upper limit. |
| Metrics | CloudWatch metrics. | OpenTSDB, to analyse the number of messages yet to be consumed on each partition. |
| Security | IAM, AWS KMS (Key Management Service). | Kerberos. |
| Consuming the same message | Connect SQS with SNS. | Use consumer groups. |
| Cost | Pay as you use; depends on requests/sec and data transfer/sec. | Open source; server cost and management cost. |
| Long polling | Can reduce cost; maximum value is 20 sec. | Feature not provided; the poll interval is configurable. |
| Exception handling | Dead-letter queues. | Handled manually: create a separate topic for this. |
| Message size | Default: 256 KB; to increase further, connect with S3 (supports up to 2 GB). | Default: 1 MB; to increase further, change the producer, broker, and consumer configs. |
| Serialization/deserialization | Default: String. | Default: String; Avro, Protobuf. |
| Throughput | Standard queue: unlimited. FIFO queue: 300/sec (3000/sec with batches of 10 messages). | Very high. |
Questions?
