Stream Processing with Big Data 
Learn Apache Kafka 
Kishore Veleti 
Big Data Engineer 
©2014 Knowledgent Group Inc. All Rights Reserved
About Me
• Big Data Engineer at Knowledgent
• Background in enterprise application development using the Hadoop stack, Java, and PHP
• Worked on healthcare, banking, and social media applications
• Passionate about sharing knowledge
Tutorial 
We will discuss: 
• What is Apache Kafka?
• Apache Kafka Terminology 
• Apache Kafka – about Topic & Partition 
• Apache Kafka hands-on
What is Apache Kafka? 
• Apache Kafka is a publish-subscribe messaging system implemented as a distributed commit log
• It is written in Scala and Java
• It was originally built at LinkedIn to process activity stream data from their website
• Messages are delivered to subscribers in near real time
• A message can have many subscribers
• Kafka persists messages to disk
• Messages are retained for a configurable period of time
• Subscribers (clients) keep track of their own read position
• Messages are easy to replay
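The points above about retention, client-held read state, and replay can be sketched with a toy in-memory log. This is only an illustration under stated assumptions, not Kafka's storage engine or client API: the names `Log`, `Reader`, `expire`, and `rewind` are hypothetical, and timestamps are passed in by hand so the behaviour is easy to follow.

```python
class Log:
    """Toy append-only log (hypothetical; not Kafka's implementation)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []      # list of (offset, timestamp, payload)
        self._next_offset = 0

    def append(self, payload, now):
        """Append a payload and return the offset it was stored at."""
        offset = self._next_offset
        self._entries.append((offset, now, payload))
        self._next_offset += 1
        return offset

    def expire(self, now):
        """Drop entries older than the retention period, read or not."""
        cutoff = now - self.retention_seconds
        self._entries = [e for e in self._entries if e[1] >= cutoff]

    def read_from(self, offset):
        """Return (offset, payload) pairs at or after the given offset."""
        return [(o, p) for (o, _, p) in self._entries if o >= offset]


class Reader:
    """The reader, not the log, remembers how far it has read."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self):
        """Return unread payloads and advance this reader's position."""
        batch = self.log.read_from(self.position)
        if batch:
            self.position = batch[-1][0] + 1
        return [payload for _, payload in batch]

    def rewind(self, offset):
        """Replay: move the position back so old messages are read again."""
        self.position = offset
```

Because each `Reader` holds its own position, two readers can consume the same log independently, and calling `rewind` on one of them replays messages without affecting the other. Messages stay available until `expire` removes them, regardless of whether anyone has read them.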
Apache Kafka Terminology 
• Message: a unit of data to send
• Topic: a named category in which Kafka maintains messages
• Partition: a logical division of a topic
• Producer: a client that uses the producer API to publish messages to a Kafka topic
• Broker: a server in a Kafka cluster
• Cluster: one or more brokers working together
• Consumer: a client that uses the consumer API to read published messages and process them further
• Replication: Kafka replicates the log for each partition across servers
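To make the relationship between partitions, brokers, and replication concrete, here is a minimal sketch that spreads partition replicas over brokers in round-robin order. The round-robin rule is an assumption chosen for simplicity; the slides do not describe how Kafka actually places replicas, and the function names are hypothetical.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Map each partition number to the brokers holding a copy of its log.

    Simplified round-robin placement (an assumption for illustration).
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


def surviving_replicas(assignment, failed_broker):
    """Return the replicas still available for each partition after one broker fails."""
    return {
        partition: [b for b in replicas if b != failed_broker]
        for partition, replicas in assignment.items()
    }
```

With three brokers and two copies of each partition, losing any single broker still leaves one copy of every partition available, which is the point of replicating the log across servers.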
Apache Kafka Terminology & Big Picture
Message Topic Partition Producer Broker Consumer
At a high level, producers send messages over the network to the Kafka cluster.
The Kafka cluster in turn serves them up to consumers.
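The flow described above (producers send to the cluster, the cluster serves consumers) can be sketched with three small in-memory classes. This is a toy model, not the Kafka client API: `Cluster`, `Producer`, `Consumer`, and their methods are hypothetical names, and there is no network, partitioning, or persistence here.

```python
from collections import defaultdict


class Cluster:
    """Stands in for the Kafka cluster: keeps messages per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def fetch(self, topic, start):
        """Return all messages in the topic from index `start` onward."""
        return self._topics[topic][start:]


class Producer:
    """Publishes messages to a topic on the cluster."""

    def __init__(self, cluster):
        self.cluster = cluster

    def send(self, topic, message):
        self.cluster.publish(topic, message)


class Consumer:
    """Reads messages from one topic, tracking its own position."""

    def __init__(self, cluster, topic):
        self.cluster = cluster
        self.topic = topic
        self._next = 0

    def poll(self):
        messages = self.cluster.fetch(self.topic, self._next)
        self._next += len(messages)
        return messages
```

Two `Consumer` objects created on the same topic each receive every message, which is the publish-subscribe behaviour: the producer does not know or care how many consumers exist.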
Let's do a hands-on exercise with Kafka using what we have learned so far.
Apache Kafka: About Topic and Partition
For each topic, Kafka maintains a partitioned log.
Each partition is an ordered, immutable sequence of messages that is continually appended to.
Each message in a partition is assigned a sequential ID number called the offset.
[Diagram: writes are appended to Partition 1, Partition 2, and Partition 3]
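A short sketch of a topic with several partitions shows how offsets are assigned per partition. Choosing the partition by hashing a message key is an assumption made for this example (the slides do not say how a partition is selected), and the `Topic` class is hypothetical rather than part of any Kafka library.

```python
import zlib


class Topic:
    """Toy topic: a list of append-only partitions (hypothetical model)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append to the key's partition; return (partition index, offset)."""
        index = self.partition_for(key)
        log = self.partitions[index]
        offset = len(log)  # offsets are sequential within one partition
        log.append((key, value))
        return index, offset

    def read(self, partition, offset):
        """Look up a single message by partition and offset."""
        return self.partitions[partition][offset]
```

Messages with the same key land in the same partition and receive increasing offsets there. Ordering is therefore defined within a partition, not across the whole topic.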
In Kafka, a producer is an API for publishing messages to a topic.
In Kafka, a consumer is an API for consuming messages from topics.
Let's do a hands-on exercise with Kafka using what we have learned so far.
Apache Kafka Use Cases 
• Trading systems
- Identifying risk in real time
• Change data capture
- Capturing changed data into a data lake environment
• Online gaming
- Identifying the top scorers of a game
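As an illustration of the online gaming use case, the sketch below keeps a running leaderboard that is updated as each score event arrives. The event shape (a player name and a number of points) and the `Leaderboard` class are assumptions for this example; in a real system the events would come from a consumer reading a Kafka topic.

```python
import heapq


class Leaderboard:
    """Running totals per player, updated one score event at a time."""

    def __init__(self):
        self.totals = {}

    def on_event(self, player, points):
        """Handle one score event (hypothetical shape: player, points)."""
        self.totals[player] = self.totals.get(player, 0) + points

    def top(self, n):
        """Return the n highest-scoring (player, total) pairs, best first."""
        return heapq.nlargest(n, self.totals.items(), key=lambda item: item[1])
```

Calling `on_event` for every message as it is consumed keeps the totals current, so `top` can be queried at any moment without reprocessing the whole stream.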
Thank you! 
Questions? 

Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up
