Apache Kafka
Copyright @ 2019 Learntek. All Rights Reserved. 3
Apache Kafka
Data analytics is often described as one of the biggest challenges associated with
big data, but before that step can happen, data must be ingested and made
available to enterprise users. That’s where Apache Kafka comes in. Kafka’s growth
is exploding: more than one third of all Fortune 500 companies use Kafka. These
companies include the top ten travel companies, 7 of the top ten banks, 8 of the top
ten insurance companies, 9 of the top ten telecom companies, and many more. LinkedIn,
Microsoft, and Netflix process "four-comma" message volumes with Kafka, that is,
on the order of 1,000,000,000,000 messages a day.
Introduction:
Apache Kafka is a distributed streaming platform for collecting, storing, and
processing high volumes of data in real time. It is highly scalable, fast, and
fault-tolerant, and it is written in Java and Scala. Kafka can publish, subscribe
to, store, and process streams of records as they arrive. It is designed to handle
data streams from multiple sources and deliver them to multiple consumers. In
short, it moves massive amounts of data not just from point A to point B, but from
points A to Z and anywhere else you need, all at the same time.
Apache Kafka started out as an internal system at LinkedIn, where it has grown to
handle a reported 1.4 trillion messages per day. It is now an open source data
streaming solution with applications for a variety of enterprise needs.
Features:
•Apache Kafka is a distributed publish-subscribe messaging system designed to
be fast, scalable, and durable
•Apache Kafka is designed for distributed, high-throughput systems
•Apache Kafka tends to work very well as a replacement for a more traditional
message broker
•Compared with traditional message brokers, Apache Kafka offers better throughput,
built-in partitioning, replication, and inherent fault tolerance, which makes it a
good fit for large-scale message processing applications
•Apache Kafka maintains feeds of messages in topics
•Producers write data to topics and consumers read from topics
•Since Kafka is a distributed system, topics are partitioned and replicated across
multiple nodes
•Kafka is very fast and is designed for high availability; with replication and
acknowledgements configured appropriately, it can avoid downtime and loss of
acknowledged data when a node fails
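The relationship between topics, partitions, producers, and consumers described in the list above can be sketched with a small in-memory model. This is only an illustration, not Kafka's API: the `Topic` class and its `produce` and `consume` methods are hypothetical names, replication is left out, and `zlib.crc32` stands in for whatever hash a real client uses to map a key to a partition.

```python
import zlib
from typing import List, Tuple


class Topic:
    """Hypothetical in-memory stand-in for a partitioned topic (not Kafka's API)."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        # Each partition is an append-only list of (key, value) records.
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record; return the (partition, offset) it was written to."""
        # Assumption: crc32 is a placeholder hash so records with the same
        # key always land in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def consume(self, partition: int, offset: int) -> List[Tuple[str, str]]:
        """Return all records in a partition starting at the given offset."""
        return self.partitions[partition][offset:]
```

In this sketch a consumer tracks its own offset per partition, so calling `consume(partition, 1)` after two writes with the same key returns only the second record. Records with the same key keep their relative order because they share a partition.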
Who uses Apache Kafka?
Many large companies that handle large volumes of data use Kafka. LinkedIn, where it
originated, uses it to track activity data and operational metrics. Twitter uses it as
part of Storm to provide a stream processing infrastructure. Square uses Kafka as a
bus to move all system events (logs, custom events, metrics, and so on) to various
Square data centers, to feed Splunk and Graphite (dashboards), and to implement
Esper-like CEP alerting systems. Other users include Spotify, Uber, Tumblr,
Goldman Sachs, PayPal, Box, Cisco, Cloudflare, Netflix, and many more.
Why is Kafka so Fast?
Kafka relies heavily on the OS kernel to move data around quickly. It relies on the
principle of zero copy: data moves from the file system to the network without being
copied through application buffers. Kafka also lets you batch data records into
chunks. These batches can be seen end to end, from the producer to the file system
(the Kafka topic log) to the consumer. Batching allows for more efficient data
compression and reduces I/O latency. Kafka writes to an immutable commit log on disk
sequentially, and so avoids random disk access and slow disk seeks. Kafka provides
horizontal scale through sharding: it shards a topic log into hundreds, potentially
thousands, of partitions spread across up to thousands of servers. This sharding
allows Kafka to handle massive load.
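The claim that batching allows more efficient compression can be checked with a rough sketch. The record layout, the function names, and the use of `zlib` with newline-joined JSON are all assumptions made for illustration; Kafka's own batch format and codecs differ.

```python
import json
import zlib
from typing import Dict, List


def compressed_size_individually(records: List[Dict[str, str]]) -> int:
    """Total bytes when every record is compressed on its own."""
    return sum(
        len(zlib.compress(json.dumps(record).encode("utf-8")))
        for record in records
    )


def compressed_size_as_batch(records: List[Dict[str, str]]) -> int:
    """Total bytes when all records are joined and compressed once."""
    batch = "\n".join(json.dumps(record) for record in records)
    return len(zlib.compress(batch.encode("utf-8")))


# Hypothetical sample data: many small events that share most of their fields.
SAMPLE_RECORDS = [
    {"user": f"user-{i % 10}", "event": "page_view", "path": "/home"}
    for i in range(1000)
]
```

Comparing the two functions on `SAMPLE_RECORDS` shows a much smaller total for the batch, because the compressor can exploit repetition across records instead of paying per-record overhead. Batching also reduces the number of separate writes and network requests.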
Key Benefits:
Apache Kafka API:
Apache Kafka is a popular tool for developers because it is easy to pick up and
provides a powerful event streaming platform with four core APIs: Producer,
Consumer, Streams, and Connect.
•Producer API: lets an application publish a stream of records to
one or more topics.
•Consumer API: lets an application subscribe to one or
more topics and process the stream of records produced to them.
•Streams API: lets an application consume input from one or more topics and produce
output to one or more topics, transforming the input streams into output streams.
•Connector API: lets you build and run reusable
producers and consumers that connect topics to existing applications.
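How the producer, consumer, and streams roles fit together can be sketched with a toy in-memory broker. Nothing here is the real Kafka client API: `InMemoryBroker`, `publish`, `read`, and `run_stream` are hypothetical names chosen only to mirror the roles described above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Hypothetical stand-in for a broker: each topic is an append-only list."""

    def __init__(self) -> None:
        self.topics: DefaultDict[str, List[str]] = defaultdict(list)

    def publish(self, topic: str, record: str) -> None:
        """Producer role: append a record to a topic."""
        self.topics[topic].append(record)

    def read(self, topic: str, offset: int = 0) -> List[str]:
        """Consumer role: return a copy of the records from an offset onward."""
        return self.topics[topic][offset:]


def run_stream(
    broker: InMemoryBroker,
    source: str,
    sink: str,
    transform: Callable[[str], str],
) -> None:
    """Streams role: read the source topic, transform, write to the sink topic."""
    for record in broker.read(source):
        broker.publish(sink, transform(record))
```

For example, publishing `"a"` and `"b"` to a topic named `raw` and calling `run_stream(broker, "raw", "upper", str.upper)` leaves `"A"` and `"B"` in the `upper` topic, while the original records stay in `raw` for other consumers.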
Need for Apache Kafka:
•Kafka is a unified platform for handling all real-time data feeds
•Kafka supports low-latency message delivery and tolerates machine failures
•It can handle a large number of diverse consumers
•Kafka is very fast; benchmarks have reported around 2 million writes/sec
•Kafka persists all data to disk. In practice, writes go first to the
OS page cache (RAM), and the operating system flushes them to disk
•This makes it very efficient to transfer data from the page cache to a network socket
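The transfer from page cache to network socket mentioned in the last bullet can be illustrated with Python's `socket.sendfile`, which uses the operating system's `sendfile` call where the platform supports it and falls back to ordinary sends otherwise. This is a sketch of the general technique, not Kafka's implementation; the function names and the local socket pair are assumptions made for the demonstration.

```python
import os
import socket
import tempfile


def send_segment(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected stream socket.

    socket.sendfile() uses os.sendfile() where available, so the kernel
    moves the bytes without copying them through this process.
    """
    with open(path, "rb") as segment:
        return sock.sendfile(segment)


def roundtrip(payload: bytes) -> bytes:
    """Write a small payload to a temp file, send it over a socket pair, read it back.

    Keep the payload small: nothing reads from the receiver until the send
    has finished, so a large payload would fill the socket buffer.
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        send_segment(path, sender)
        sender.close()  # signal end of stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
```

The point of the design is that the application never loads the file contents into its own buffers on the sending side. It only hands the kernel a file and a socket.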
Apache Kafka – Use Cases:
Kafka can be used in many use cases. Some of them are listed below:
•Metrics− Kafka is often used for operational monitoring data. This involves
aggregating statistics from distributed applications to produce centralized feeds of
operational data.
•Twitter: Twitter is a service where registered users can read and post tweets, while
unregistered users can only read them. Twitter uses Storm-Kafka as part of its
stream processing infrastructure.
•Netflix: Netflix, an American multinational provider of on-demand Internet streaming
media, uses Kafka for real-time monitoring and event processing.
•Log Aggregation Solution− Kafka can be used across an organization to collect
logs from multiple services and make them available in a standard format to multiple
consumers.
•LinkedIn: Apache Kafka is used at LinkedIn for activity stream data and operational
metrics. Kafka supports LinkedIn products such as LinkedIn Newsfeed and LinkedIn
Today for online message consumption, in addition to offline
analytics systems like Hadoop.
•Stream Processing− Popular frameworks such as Storm and Spark Streaming read
data from a topic, process it, and write the processed data to a new topic, where it
becomes available to users and applications. Kafka’s strong durability is also very
useful in the context of stream processing.
•Website activity tracking – The web application sends events such as page
views and searches to Kafka, where they become available for real-time processing,
dashboards, and offline analytics in Hadoop.
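As a rough illustration of the website activity tracking case, the sketch below encodes page-view and search events in a standard format and aggregates them the way a dashboard consumer might. The field names (`user`, `type`, `detail`) and both function names are hypothetical, and a plain list stands in for the Kafka topic.

```python
import json
from collections import Counter
from typing import Iterable


def make_event(user_id: str, event_type: str, detail: str) -> bytes:
    """Serialize one activity event as UTF-8 JSON (assumed field names)."""
    event = {"user": user_id, "type": event_type, "detail": detail}
    return json.dumps(event, sort_keys=True).encode("utf-8")


def count_by_type(raw_events: Iterable[bytes]) -> Counter:
    """Consumer-side aggregation: count events per type."""
    return Counter(json.loads(raw)["type"] for raw in raw_events)
```

A web application would send each encoded event to a topic. Here, passing a list of three encoded events (two page views and one search) to `count_by_type` yields a count of 2 for `page_view` and 1 for `search`.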
For more Training Information , Contact Us
Email : info@learntek.org
USA : +1734 418 2465
INDIA : +40 4018 1306
+7799713624
