Apache Kafka
Martin Podval (@MartinPodval), hpsv.cz
What is Apache Kafka?
● Messaging system
● Distributed
● Persistent and replicated
● Very fast (low latency) and scalable
● Simple but highly configurable
● Created at LinkedIn, open-sourced under apache.org
Data Streaming
New kind of data ...
● Streams of user or application data (events)
● Monitoring: application and system
● Application logging
● High volume
Data Streaming Cont’d
… you want to process
● Using various components
● Into a target form
● Map, reduce, shuffle
● Real time or batch
HP Service Virtualization Use Cases
● Processing clients’ message streams
● Real-time performance modeling
● Log aggregation
How To Solve It?
Producers and Consumers
● Distributed
● Decoupled
● Configurable
● Dynamic
Kafka Cluster
Brokers
● Instances (nodes)
● Topics
● Partitions
● Replicas
ZooKeeper (ZK)
● Coordination
Kafka Topics
Commit Log
● Immutable
● Ordered
● Sequential Offset
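To make "immutable, ordered, sequential offset" concrete, here is a minimal in-memory sketch of one partition's commit log. The `CommitLog` class and its method names are hypothetical, not Kafka's API; a real broker keeps the log in segment files on disk.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy single-partition commit log: messages are only ever appended."""

    _records: list = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message and return its offset (0, 1, 2, ...)."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records messages starting at offset, in order.

        Reading removes nothing, so any offset can be read again later.
        """
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._records[offset:offset + max_records]  # slice is a copy

    @property
    def end_offset(self) -> int:
        """Offset the next appended message will receive."""
        return len(self._records)


log = CommitLog()
for msg in (b"a", b"b", b"c"):
    log.append(msg)
print(log.read(1))  # [b'b', b'c']
```

Because existing entries never change, the only write operation is an append at the end, which is what makes the sequential disk access described later possible.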
Kafka Topics Cont’d
Partitioned
Independently:
● Stored
● Produced
● Consumed
⇒ Scalable
Replicated
● Per partition
● On different brokers
⇒ Fault Tolerant
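A sketch of why per-partition replicas on different brokers give fault tolerance. The simple round-robin placement and the function names are illustrative assumptions; Kafka's real replica assignment follows the same idea (distinct brokers per partition) but differs in details.

```python
def assign_replicas(num_partitions: int, num_brokers: int,
                    replication_factor: int) -> dict[int, list[int]]:
    """Place each partition's replicas on distinct brokers (round-robin).

    The first broker in each list plays the partition leader.
    """
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [(p + i) % num_brokers for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def surviving_partitions(assignment: dict[int, list[int]],
                         failed_broker: int) -> list[int]:
    """Partitions that still have at least one replica after a broker fails."""
    return sorted(p for p, brokers in assignment.items()
                  if any(b != failed_broker for b in brokers))


layout = assign_replicas(num_partitions=4, num_brokers=3, replication_factor=2)
print(layout)  # {0: [0, 1], 1: [1, 2], 2: [2, 0], 3: [0, 1]}
print(surviving_partitions(layout, failed_broker=0))  # [0, 1, 2, 3]
```

With a replication factor of 1, losing a broker loses the partitions it held; with 2 or more, every partition keeps a copy elsewhere.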
What Can I Do?
producer.write(topic_id, message);
consumer.read(topic_id, offset);
I Want To Produce
● Java/Scala client
● address of one or more brokers
● choose the topic to produce to
● highly configurable and tunable:
○ partitioner
○ number of acks (0 = async, 1 = leader, all/-1 = all in-sync replicas)
○ batching, buffer size, timeouts, retries, ...
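Two of these knobs, the partitioner and batching, can be sketched as follows. Everything here is hypothetical: `BatchingProducer` is not a Kafka class, CRC32 stands in for the murmur2 hash used by Kafka's default partitioner, and `batch_size` only loosely mirrors the producer's batch setting.

```python
import zlib
from collections import defaultdict


def partition_for(key: bytes, num_partitions: int) -> int:
    """Equal keys always map to the same partition (keeps per-key order).

    Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    """
    return zlib.crc32(key) % num_partitions


class BatchingProducer:
    """Hypothetical producer that groups messages per partition into batches."""

    def __init__(self, num_partitions: int, batch_size: int):
        self.num_partitions = num_partitions
        self.batch_size = batch_size       # flush threshold in bytes
        self.buffers = defaultdict(list)   # partition -> pending messages
        self.sent_batches = []             # (partition, messages) "on the wire"

    def send(self, key: bytes, value: bytes) -> None:
        p = partition_for(key, self.num_partitions)
        self.buffers[p].append(value)
        if sum(len(v) for v in self.buffers[p]) >= self.batch_size:
            self._flush_partition(p)

    def _flush_partition(self, p: int) -> None:
        self.sent_batches.append((p, self.buffers.pop(p)))

    def flush(self) -> None:
        """Send whatever is still buffered (e.g. on timeout or close)."""
        for p in sorted(self.buffers):
            self._flush_partition(p)


producer = BatchingProducer(num_partitions=3, batch_size=10)
for _ in range(6):
    producer.send(b"user-42", b"event")  # 5 bytes each: 2 per batch
producer.flush()
print(len(producer.sent_batches))  # 3
```

Batching trades a little latency for far fewer network requests; the buffer size and timeouts on the slide control where that trade-off sits.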
I Want To Consume
High Level API
● Consumer groups abstraction
○ Deliver to all consumers, to exactly one, or to some
● Stream API
● Stores consumed positions (offsets) to support fault tolerance
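A sketch of the group semantics: within one group each partition goes to exactly one member ("to one"), while every group receives all partitions ("to all" when each consumer has its own group, "to some" with several multi-member groups). The round-robin assignment, group names and the `committed` dict are illustrative assumptions; Kafka persists committed offsets itself (in ZooKeeper for the old high-level consumer, in an internal topic in later versions).

```python
def assign_partitions(partitions: list[int],
                      consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin the partitions over the members of ONE consumer group."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


# Committed positions per (group, partition), so a restarted consumer resumes.
committed: dict[tuple[str, int], int] = {}


def commit(group: str, partition: int, next_offset: int) -> None:
    committed[(group, partition)] = next_offset


def resume_from(group: str, partition: int) -> int:
    return committed.get((group, partition), 0)  # 0 = start of this toy log


partitions = [0, 1, 2, 3]
# "To one": both consumers share ONE group, so each message reaches one of them.
print(assign_partitions(partitions, ["c1", "c2"]))  # {'c1': [0, 2], 'c2': [1, 3]}
# "To all": each consumer in its OWN group receives every partition.
print([assign_partitions(partitions, [c]) for c in ["c1", "c2"]])
```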
I Want To Consume Cont’d
Low Level
● Java/Scala client
● Find the leader for a topic partition
● Calculate an offset
● Fetch messages
○ Re-consume if needed
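The low-level steps above can be sketched against a fake in-memory cluster. `METADATA`, `BROKERS` and the three functions are hypothetical stand-ins for the metadata, offset and fetch requests a real client sends; they only show the order of the steps and that the client, not the broker, owns the offset.

```python
# Hypothetical cluster metadata: (topic, partition) -> id of the leader broker.
METADATA = {("events", 0): 1, ("events", 1): 2}

# Hypothetical broker storage: broker id -> {(topic, partition): messages}.
BROKERS = {
    1: {("events", 0): [b"m0", b"m1", b"m2", b"m3"]},
    2: {("events", 1): [b"n0", b"n1"]},
}


def find_leader(topic: str, partition: int) -> int:
    """Step 1: look up which broker leads this partition."""
    return METADATA[(topic, partition)]


def calculate_offset(topic: str, partition: int, where) -> int:
    """Step 2: turn 'earliest', 'latest' or an explicit number into an offset."""
    log = BROKERS[find_leader(topic, partition)][(topic, partition)]
    if where == "earliest":
        return 0  # toy log: a real log may start later after retention
    if where == "latest":
        return len(log)  # next offset to be written
    return int(where)


def fetch(topic: str, partition: int, offset: int, max_messages: int = 100):
    """Step 3: fetch from the leader; returns (messages, next offset)."""
    log = BROKERS[find_leader(topic, partition)][(topic, partition)]
    batch = log[offset:offset + max_messages]
    return batch, offset + len(batch)


start = calculate_offset("events", 0, "earliest")
batch, next_offset = fetch("events", 0, start, max_messages=2)
print(batch, next_offset)  # [b'm0', b'm1'] 2
# Re-consume: the client owns the offset, so it simply fetches from 0 again.
replay, _ = fetch("events", 0, 0, max_messages=2)
```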
I Want To Consume Cont’d
Delivery semantics:
● At most once
● At least once
● Exactly once
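Which semantic you get depends largely on when the consumer commits its position relative to processing. The simulation below (all names hypothetical) crashes once and restarts from the last commit: committing first can lose a message (at most once), processing first can repeat one (at least once). The third mode approximates exactly once by making processing idempotent, keyed by offset; this is one possible approach, not Kafka's mechanism (native transactions only arrived in Kafka 0.11).

```python
def run(messages, commit_before_processing, crash_at, idempotent_sink=False):
    """Consume `messages`, crash once at offset `crash_at`, resume from commit.

    Returns the processing results that took effect, in order.
    """
    results = {} if idempotent_sink else []

    def process(offset, msg):
        if idempotent_sink:
            results[offset] = msg  # keyed by offset: reprocessing overwrites
        else:
            results.append(msg)

    committed = 0  # where the consumer resumes after a crash
    crashed = False
    while committed < len(messages):
        offset = committed
        crash_now = offset == crash_at and not crashed
        if commit_before_processing:
            committed = offset + 1  # commit first ...
            if crash_now:
                crashed = True
                continue  # ... crash before processing: message is lost
            process(offset, messages[offset])
        else:
            process(offset, messages[offset])  # process first ...
            if crash_now:
                crashed = True
                continue  # ... crash before commit: message is redelivered
            committed = offset + 1
    return list(results.values()) if idempotent_sink else results


msgs = ["a", "b", "c"]
print(run(msgs, commit_before_processing=True, crash_at=1))   # ['a', 'c']
print(run(msgs, commit_before_processing=False, crash_at=1))  # ['a', 'b', 'b', 'c']
print(run(msgs, False, crash_at=1, idempotent_sink=True))     # ['a', 'b', 'c']
```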
Kafka Internals - Disks
Avoid:
● GC
● Random disk access
Kafka Internals - Disks Cont’d
Disks are fast ...
… when properly used
● sequential access: read-ahead, write-behind
● rely on the operating system (page cache)
○ avoid JVM heap, object materialization and GC
● serving data is more like a file copy over the network
It’s easy … with immutable topics
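The "file copy over the network" idea corresponds to the `sendfile` system call, which Kafka reaches through Java's `FileChannel.transferTo`. The Python sketch below uses `os.sendfile` (Unix only) to push a small file into a local socket pair; the file name and size are arbitrary, and this demonstrates the system call, not Kafka's code.

```python
import os
import socket
import tempfile


def send_segment(path: str, sock: socket.socket) -> int:
    """Stream a whole file into a socket with os.sendfile (Unix only).

    The kernel copies from the page cache straight to the socket, so the
    bytes never pass through this process's heap.
    """
    sent_total = 0
    with open(path, "rb") as segment:
        size = os.fstat(segment.fileno()).st_size
        while sent_total < size:
            sent = os.sendfile(sock.fileno(), segment.fileno(),
                               sent_total, size - sent_total)
            if sent == 0:  # end of file reached early
                break
            sent_total += sent
    return sent_total


# Demo: a small "segment" file sent through a local socket pair.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
sender, receiver = socket.socketpair()
n = send_segment(tmp.name, sender)
received = b""
while len(received) < n:
    received += receiver.recv(65536)
sender.close()
receiver.close()
os.unlink(tmp.name)
print(n, len(received))  # 4096 4096
```

Immutability is what makes this cheap: since a written segment never changes, the broker can hand the same bytes to any consumer without locking or re-encoding them.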
Kafka Internals - Replication
“In Sync” Replicas
● Replication factor on a per-partition basis
● One leader + 0..n follower replicas
● Replicas act as consumers of the leader
○ “In sync” if they are not “too far” behind the leader
○ Synced in batches
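A sketch of the two ideas on this slide: a follower is just a consumer fetching batches from the leader, and the in-sync set is every replica within some lag of the leader. The message-count threshold `max_lag` is an illustrative assumption (early Kafka had a message-count setting, later versions use a time-based one, `replica.lag.time.max.ms`); the function names are hypothetical.

```python
def in_sync_replicas(leader: str, log_end_offsets: dict[str, int],
                     max_lag: int) -> set[str]:
    """Replicas at most `max_lag` messages behind the leader (leader included)."""
    leader_end = log_end_offsets[leader]
    return {replica for replica, end in log_end_offsets.items()
            if leader_end - end <= max_lag}


def follower_fetch(leader_log: list, follower_log: list, max_batch: int) -> int:
    """A follower is just a consumer of the leader: copy the next batch."""
    start = len(follower_log)
    batch = leader_log[start:start + max_batch]
    follower_log.extend(batch)
    return len(batch)


ends = {"broker-1": 100, "broker-2": 98, "broker-3": 40}
print(sorted(in_sync_replicas("broker-1", ends, max_lag=10)))
# ['broker-1', 'broker-2']
```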
Kafka Internals - Replication Cont’d
Tunable Trade-Offs
● Producer’s write method:
○ Not blocked, async (acks=0)
○ Waits for the leader’s ACK (acks=1)
○ Waits for all in-sync replicas (acks=all)
● Consumers pull only committed messages
● Broker-side minimum number of in-sync replicas (min.insync.replicas)
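The trade-offs can be sketched as a small decision function. `produce` and `high_watermark` are hypothetical names; the behaviour mirrors Kafka's documented rules: `min.insync.replicas` is only enforced for `acks=all` (a real broker answers with a NotEnoughReplicas error), and a message counts as committed once every in-sync replica has it, which is the limit consumers can read up to.

```python
def high_watermark(isr_end_offsets: dict[str, int]) -> int:
    """Offsets below this are on every in-sync replica, i.e. committed."""
    return min(isr_end_offsets.values())


def produce(acks: str, isr_size: int, min_insync_replicas: int) -> str:
    """When is a producer's write acknowledged?

    acks="0"   -> producer does not wait at all
    acks="1"   -> leader has written the message
    acks="all" -> all in-sync replicas have it; rejected if the ISR has
                  shrunk below min.insync.replicas
    """
    if acks == "0":
        return "sent, no acknowledgement awaited"
    if acks == "1":
        return "acknowledged by leader"
    if acks == "all":
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return "acknowledged by all in-sync replicas"
    raise ValueError(f"unknown acks setting: {acks!r}")


print(high_watermark({"broker-1": 100, "broker-2": 98}))  # 98
print(produce("all", isr_size=1, min_insync_replicas=2))
# rejected: not enough in-sync replicas
```

With a high watermark of 98, consumers see offsets 0 to 97 only, even though the leader already holds 100 messages.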
Performance
“Incredible”
Scales with:
● number of clients, message size
● number of replicas, partitions or topics
Depends on network and disk throughput
Performance Cont’d
Our testing
● 3 nodes: leader + 2 replicas
● 500,000 msg/s (100-byte messages)
● 400 Mbit/s to 1.2 Gbit/s network throughput
● end-to-end latency 2-3 ms
@see http://bit.ly/1FsIR9a
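The two ends of the reported network range are consistent with simple arithmetic, sketched below. This reading is our interpretation, not stated on the slide: 400 Mbit/s is the raw produced payload, and 1.2 Gbit/s adds one full copy per follower. Protocol overhead, headers and consumer traffic are ignored.

```python
# Figures from the test above.
MESSAGES_PER_SEC = 500_000
MESSAGE_BYTES = 100
FOLLOWERS = 2  # leader + 2 replicas

produced_bits = MESSAGES_PER_SEC * MESSAGE_BYTES * 8
# Assumption: each follower fetches a full copy from the leader over the network.
total_bits = produced_bits * (1 + FOLLOWERS)

print(produced_bits / 1e6, "Mbit/s")  # 400.0 Mbit/s
print(total_bits / 1e9, "Gbit/s")     # 1.2 Gbit/s
```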
Ease of Use
● No installation: just run a Java/Scala program
● Streams stored as files & directories
● Transparent ZooKeeper
● Ecosystem
Cons
● Beta version
● Dependency on ZooKeeper
● The way it is written in Scala
● No easy way to remove messages
Questions?
