Rahul Jain
Software Engineer
www.linkedin.com/in/rahuldausa
   Why Kafka
   Introduction
   Design
   Q&A
*Apache ActiveMQ, JBoss HornetQ, ZeroMQ and RabbitMQ are brands of, respectively, the Apache Software Foundation,
JBoss Inc, iMatix Corporation and VMware Inc.
Apache Kafka
   Transportation of logs
   Activity stream in real time
   Collection of performance metrics
    ◦ CPU/IO/memory usage
    ◦ Application-specific
        Time taken to load a web page
        Time taken by multiple services while building a web page
        Number of requests
        Number of hits on a particular page/URL
   Scalable: must be highly scalable. There is a lot of data; it can be
    billions of messages.
   Reliable: what if I lose a small number of messages?
    Is that acceptable?
   Distributed: multiple producers, multiple consumers.
   High-throughput: does not need to follow the JMS standard,
    which may be overkill for use cases such as log transportation.
    ◦ As per JMS, each message has to be acknowledged.
    ◦ An exactly-once delivery guarantee requires two-phase commit.
   An Apache project, initially developed by
    LinkedIn's SNA team.
   A high-throughput, distributed, publish-subscribe
    messaging system.
   A kind of data pipeline.
   Written in Scala.
   Does not follow the JMS standard and does not use
    the JMS APIs.
   Supports both queue and topic semantics.
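The slide does not say how both semantics are obtained. In Kafka this is usually done with consumer groups: each message goes to one member of every subscribed group. The sketch below only simulates that rule in memory. The function `deliver`, the consumer and group names, and the round-robin choice inside a group are all assumptions made for illustration, not Kafka APIs.

```python
from collections import defaultdict


def deliver(messages, groups):
    """Deliver each message once per group, to one member of that group.

    groups maps a group id to the list of consumer names in that group.
    Returns a dict: consumer name -> list of messages it received.
    """
    received = defaultdict(list)
    for index, message in enumerate(messages):
        for members in groups.values():
            # Round-robin inside the group; a stand-in for partition ownership.
            consumer = members[index % len(members)]
            received[consumer].append(message)
    return dict(received)


messages = ["m0", "m1", "m2", "m3"]

# Queue semantics: both consumers share one group, so each message
# is handled by exactly one of them.
queue = deliver(messages, {"g1": ["c1", "c2"]})

# Topic semantics: each consumer has its own group, so each consumer
# sees every message.
topic = deliver(messages, {"g1": ["c1"], "g2": ["c2"]})

print(queue)
print(topic)
```

With one shared group the work is split between `c1` and `c2`; with one group per consumer both receive the full stream.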
Credit: http://kafka.apache.org/design.html
[Figure: several Producers, several Kafka Brokers, and Consumers, with Zookeeper alongside. Labels: Handshake (between Producer and Zookeeper); Coordination (between Zookeeper, Kafka Brokers, and Consumers).]
[Figure: Producers, Zookeeper, two Kafka Brokers (Partition 1 and Partition 2), and three Consumers in the same group (Consumer 1, Consumer 2, and Consumer 3*, all groupId1). Labels: Handshake (Producer and Zookeeper); Coordination; Store Consumed Offset and Watch for Cluster event (Zookeeper and Consumers).]

* Consumer 3 would not receive any data, because the number of consumers
is greater than the number of partitions.
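The note about Consumer 3 follows from the rule that, within one consumer group, each partition is read by exactly one consumer. The sketch below reproduces the figure's case of two partitions and three consumers. The function `assign_partitions` and the round-robin strategy are illustrative assumptions; Kafka's actual assignment strategies differ in detail.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of the group.

    Partitions are dealt out round-robin over the sorted consumer names.
    Consumers left without a partition stay idle.
    """
    assignment = {consumer: [] for consumer in sorted(consumers)}
    ordered = sorted(assignment)
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions(
    partitions=[1, 2],
    consumers=["consumer-1", "consumer-2", "consumer-3"],
)

# Consumers that own no partition receive no data.
idle = [consumer for consumer, owned in assignment.items() if not owned]

print(assignment)
print(idle)
```

Adding a third partition would give `consumer-3` something to read, which is why the partition count caps the useful parallelism of a group.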
   Filesystem cache
   Zero-copy transfer of messages
   Batching of messages
   Batch compression
   Automatic producer load balancing
   The broker does not push messages to consumers;
    consumers poll messages from the broker.
   And some others:
        Brokers and consumers form a cluster using Zookeeper, so more consumers and brokers
         can be added on the fly. Zookeeper takes care of rebalancing the new cluster.
        Data is persisted in the broker and is not removed on consumption (until the retention period expires), so if a
         consumer fails while consuming, the same message can be consumed again later from the broker.
        Simplified storage mechanism: messages are stored once, not once per consumer.
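Three of the points above (consumers poll, data survives consumption until the retention period ends, and messages are stored once for all consumers) can be illustrated with a toy append-only log. This is an in-memory sketch under stated assumptions: the class `PartitionLog` and its methods are hypothetical names, and integer timestamps stand in for wall-clock time. It is not how the broker is implemented.

```python
class PartitionLog:
    """Toy append-only log for one partition (hypothetical, in memory)."""

    def __init__(self):
        self.base_offset = 0  # offset of the oldest retained message
        self.messages = []    # list of (timestamp, payload) pairs

    def append(self, payload, timestamp):
        """Append a message and return its offset."""
        self.messages.append((timestamp, payload))
        return self.base_offset + len(self.messages) - 1

    def poll(self, offset, max_messages=10):
        """Return up to max_messages payloads starting at offset.

        Reading never deletes anything; each consumer tracks its own offset.
        """
        start = max(offset, self.base_offset) - self.base_offset
        window = self.messages[start:start + max_messages]
        return [payload for _, payload in window]

    def expire(self, now, retention_seconds):
        """Drop messages older than the retention period."""
        while self.messages and now - self.messages[0][0] > retention_seconds:
            self.messages.pop(0)
            self.base_offset += 1


log = PartitionLog()
for second, payload in enumerate(["a", "b", "c"]):
    log.append(payload, timestamp=second)

first_read = log.poll(offset=0)
# A consumer that failed before recording its offset simply reads again.
replay = log.poll(offset=0)
# A second consumer reads the same stored copy from its own position.
other_consumer = log.poll(offset=2)

# Only the retention period removes data, regardless of who has read it.
log.expire(now=12, retention_seconds=10)
after_retention = log.poll(offset=0)

print(first_read, replay, other_consumer, after_retention)
```

Because the log keeps a single copy and each consumer only remembers an offset, adding consumers does not add storage.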
[Figure: two charts, Producer Performance and Consumer Performance.]
Credit: http://research.microsoft.com/en-us/UM/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Credit: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By