Kafka - LinkedIn's messaging backbone
Who are we?
▪ Kafka SRE at LinkedIn
▪ Site Reliability Engineering
– Administrators
– Architects
– Developers
▪ Keep the site running, always
Presenters
▪ Clark Haskins
– Manager for Data Infra Streaming SRE (Mountain View, CA)
▪ Ayyappadas Ravindran
– Staff Site Reliability Engineer, Data Infra Streaming (Bengaluru)
▪ Akash Vacher
– Site Reliability Engineer, Data Infra Streaming (Bengaluru)
Agenda
▪ What the heck is Kafka?
– Brief intro
– Motivation to build Kafka
▪ Okay, why should I bother?
– Kafka facts, scale & performance
▪ You have my attention, tell me more!
– Core concepts
– Operating Kafka
– Kafka @ LinkedIn
▪ Nice, where do you use Kafka?
– Tale of two applications
▪ Have questions?
What the heck is Kafka?
▪ A high-throughput distributed messaging system
▪ Developed at LinkedIn and open sourced in early 2011
▪ Implemented in Scala and Java
▪ LinkedIn's messaging backbone
▪ Kafka powers around 1000 companies, including LinkedIn, Yahoo!, Netflix, Uber, Twitter and many more
"If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at LinkedIn"
– Todd Palino (Staff SRE Engineer, LinkedIn)
Motivation to create Kafka?
▪ Needed a unified platform to handle all real-time data feeds and stream processing
▪ Wanted a high-throughput messaging system to support high-volume event feeds
▪ Needed data persistence for offline systems and for service recovery
▪ Low latency
▪ Fault tolerant
▪ Linearly scalable
Okay! What was the motivation to create Kafka?
[Figure: Before / After]
How is Kafka used at LinkedIn?
▪ Application and System Monitoring (inGraphs)
▪ User tracking on LinkedIn web sites
▪ Email, push & SMS notifications
▪ Live search updates
▪ Samza Jobs (standardization, call graph and more)
▪ Database Replication
Okay, why should I bother?
▪ Over 1,300,000,000,000 messages are transported via Kafka every day at LinkedIn
▪ 300 terabytes of inbound and 900 terabytes of outbound traffic
▪ 4.5 million messages per second on a single cluster
▪ Kafka runs on around 1300 servers at LinkedIn
Hmmm...! How good is Kafka?
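The headline figures can be cross-checked with a little arithmetic. The sketch below assumes that the inbound and outbound traffic figures are daily totals and that a terabyte means 10^12 bytes; the slide states neither, so the derived numbers are rough estimates only.

```python
# Figures quoted on the slide.
MESSAGES_PER_DAY = 1_300_000_000_000
INBOUND_TB = 300    # assumed to be a daily total
OUTBOUND_TB = 900   # assumed to be a daily total

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def average_messages_per_second(messages_per_day: int) -> float:
    """Average message rate across all clusters over one day."""
    return messages_per_day / SECONDS_PER_DAY


def fan_out(outbound_tb: float, inbound_tb: float) -> float:
    """How many times each inbound byte is read back out, on average."""
    return outbound_tb / inbound_tb


def average_message_bytes(inbound_tb: float, messages_per_day: int) -> float:
    """Average size of one message, derived from inbound volume."""
    return inbound_tb * BYTES_PER_TB / messages_per_day


if __name__ == "__main__":
    print(f"{average_messages_per_second(MESSAGES_PER_DAY):,.0f} messages/s on average")
    print(f"{fan_out(OUTBOUND_TB, INBOUND_TB):.1f}x read fan-out")
    print(f"{average_message_bytes(INBOUND_TB, MESSAGES_PER_DAY):.0f} bytes per message on average")
```

Under these assumptions the daily total works out to roughly 15 million messages per second averaged over all clusters, with each message read about three times.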
You have my attention, tell me more!
▪ Building blocks
– Message
– Producers
– Consumers
– Topics
– Partitions
– Segments
– Brokers
– Replicas
Awesome!! I am in. Tell me more!
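A toy in-memory model can make the relationship between messages, topics and partitions concrete. This is a minimal sketch and not Kafka's implementation: the `Partition` and `Topic` classes are hypothetical, `zlib.crc32` merely stands in for whatever hash a real partitioner uses, and segments, brokers and replicas are left out.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; a message's offset is its position in the log."""
    messages: list = field(default_factory=list)

    def append(self, message: bytes) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just written

    def read(self, offset: int, max_messages: int = 10) -> list:
        # Consumers pull from an offset they track themselves.
        return self.messages[offset:offset + max_messages]


@dataclass
class Topic:
    """A named feed split into a fixed number of partitions."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption: a simple hash-mod scheme, so equal keys share a partition.
        return zlib.crc32(key) % self.num_partitions

    def produce(self, key: bytes, value: bytes) -> tuple:
        index = self.partition_for(key)
        return index, self.partitions[index].append(value)


if __name__ == "__main__":
    topic = Topic("page-views", num_partitions=2)
    print(topic.produce(b"user-1", b"viewed /home"))
    print(topic.produce(b"user-1", b"viewed /jobs"))
```

Because messages with the same key land in the same partition, their relative order is preserved for any consumer reading that partition.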
Bird’s eye view
The data continues ..
What Is Kafka?
[Diagram labels: Producer, Broker (A P0, A P1, A P0), Consumer, Zookeeper]
Performance recipes
▪ OS page cache
▪ Linear I/O, never fear the file system!
▪ sendfile() system call
▪ Message batching
Dude, tell me the performance secret!!
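The sendfile() recipe can be illustrated with Python's `socket.sendfile`, which uses the operating system's sendfile call where it is available and falls back to ordinary sends elsewhere. This is only a sketch of the idea: the temporary file stands in for a log segment, a local socket pair stands in for a broker-to-consumer connection, and the helper names are hypothetical.

```python
import os
import socket
import tempfile


def send_log_segment(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as segment:
        # Where supported, the kernel moves the bytes from the page cache to
        # the socket without copying them through this process.
        return conn.sendfile(segment)


def demo() -> bytes:
    payload = b"message-1\nmessage-2\n"
    with tempfile.NamedTemporaryFile(delete=False) as segment:
        segment.write(payload)
        path = segment.name

    broker_side, consumer_side = socket.socketpair()
    try:
        sent = send_log_segment(path, broker_side)
        broker_side.close()  # lets the reader see end-of-stream

        received = b""
        while len(received) < sent:
            chunk = consumer_side.recv(4096)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        broker_side.close()
        consumer_side.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The file was written moments before it is sent, so the read is most likely served from the OS page cache, which is the first recipe on the slide.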
Operating Kafka
▪ Broker Hardware
– Cisco C240, Intel Xeon, 64 GB RAM, 14-disk RAID 10
▪ ZooKeeper Hardware
– 5 + 1 ensemble, 64 GB RAM, 500 GB SSD
▪ Monitoring
– Lag monitoring
– Under-replicated partitions
– Unclean leader election
– Burrow
▪ Cluster rebalance
– Size-wise rebalance
– Partition-wise rebalance
Tell me how you manage this beast!
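Two of the monitoring signals above reduce to simple comparisons. The sketch below uses hypothetical helper functions over plain dictionaries keyed by partition number; it is not how Burrow or the broker metrics are implemented, only the underlying arithmetic.

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag: messages written but not yet consumed.

    Both offsets are "next position" values, so their difference is the
    number of messages still waiting to be read.
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


def under_replicated(assigned_replicas: dict, in_sync_replicas: dict) -> list:
    """Partitions with fewer in-sync replicas than assigned replicas."""
    return sorted(
        partition
        for partition, replicas in assigned_replicas.items()
        if len(in_sync_replicas.get(partition, [])) < len(replicas)
    )


if __name__ == "__main__":
    # Hypothetical snapshot of a two-partition topic on brokers 1, 2 and 3.
    print(consumer_lag({0: 105, 1: 42}, {0: 100, 1: 42}))
    print(under_replicated({0: [1, 2, 3], 1: [2, 3, 1]}, {0: [1, 2, 3], 1: [2, 3]}))
```

A lag that keeps growing suggests a consumer that cannot keep up, while any under-replicated partition suggests a broker that is down or falling behind.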
Mirror Maker and Audit
Kafka Audit (event count)
Kafka Audit (data transport time)
Kafka @ LinkedIn
▪ Cluster Types
– Tracking
– Metrics
– Queuing
▪ Kafka REST
▪ Schema Registry
Kafka @ LinkedIn - Schema registry
Autometrics
▪ Building Blocks
– Sensors
– EventBus
– Kafka REST
– Kafka cluster
– Kafka consumer
– RRD
– Front end
▪ Facts & Figures
– 320,000,000 metrics collected per minute
– 530 TB of disk space
– Over 210,000 metrics collected per service
inGraphs
Kafka for database replication - Master-slave
Kafka for database replication - Multi-master
Have questions?