1
Introducing Exactly Once
Semantics in Apache Kafka
Jason Gustafson, Guozhang Wang, Sriram
Subramaniam, and Apurva Mehta
2
On deck..
• Kafka’s existing delivery semantics.
• Why did we improve them?
• What’s new?
• How do you use it?
• Summary.
3
Apache Kafka’s existing semantics
4
Existing Semantics
13
TL;DR – What we have today
• At-least-once, in-order delivery per partition.
• Producer retries can introduce duplicates.
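How a retry turns into a duplicate can be shown with a toy model. This is a minimal sketch, not Kafka code: `PartitionLog`, `acksToDrop` and `sendWithRetries` are hypothetical names, and the "network" is just a counter that swallows acknowledgements after the write has already landed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition whose acknowledgements can get lost. */
final class AtLeastOnceDemo {

    /** An append-only list standing in for a single partition. */
    static final class PartitionLog {
        final List<String> records = new ArrayList<>();
        private int acksToDrop; // hypothetical knob: how many acks to lose

        PartitionLog(int acksToDrop) {
            this.acksToDrop = acksToDrop;
        }

        /** Appends the record first, then reports false if the ack is lost. */
        boolean append(String record) {
            records.add(record);
            if (acksToDrop > 0) {
                acksToDrop--;
                return false;
            }
            return true;
        }
    }

    /** Resends until an ack arrives, as a producer with retries enabled would. */
    static void sendWithRetries(PartitionLog log, String record, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (log.append(record)) {
                return;
            }
        }
        throw new IllegalStateException("no ack after " + maxRetries + " retries");
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog(1); // the first ack is lost
        sendWithRetries(log, "m1", 3);
        sendWithRetries(log, "m2", 3);
        System.out.println(log.records); // prints [m1, m1, m2]
    }
}
```

Nothing is lost and order is preserved, but `m1` appears twice: the broker had no way to tell the retry from a new message.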
14
Why improve?
15
Why improve?
• Stream processing is becoming an ever bigger part of the
data landscape.
• Apache Kafka is the heart of the streams platform.
• Strengthening Kafka’s semantics expands the universe of
streaming applications.
16
A motivating example..
A peer-to-peer lending platform that processes micro-loans between users.
17
A Peer to Peer Lender
18
The Basic Flow
19
Offset commits
20
Reprocessed transfer, eek!
21
Lost money! Eek eek!
22
What’s new?
23
What’s new
• Exactly-once, in-order delivery per partition
• Atomic writes across multiple partitions
• Performance considerations
24
What’s new, Part 1
Exactly once, in order, delivery per partition
25
The idempotent producer
33
TL;DR
• Sequence numbers and producer IDs:
    • enable de-duplication on the broker;
    • are stored in the log itself.
• Hence de-duplication keeps working transparently across leader changes.
• Application-level resends (calling send() again for the same record) are not de-duplicated.
• Works transparently: no API changes.
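The de-duplication rule can be sketched as follows. This is a simplified model under stated assumptions, not the broker implementation: `PartitionLog`, `Result` and the single-record sequence check are hypothetical, and the per-producer state lives next to the records to mirror the point that it is recoverable from the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of broker-side de-duplication by (producer ID, sequence number). */
final class IdempotentLogDemo {

    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    static final class PartitionLog {
        final List<String> records = new ArrayList<>();
        // Last sequence number accepted from each producer ID.
        private final Map<Long, Integer> lastSeqByProducer = new HashMap<>();

        Result append(long producerId, int sequence, String record) {
            int last = lastSeqByProducer.getOrDefault(producerId, -1);
            if (sequence <= last) {
                return Result.DUPLICATE;    // a retry of something already written
            }
            if (sequence != last + 1) {
                return Result.OUT_OF_ORDER; // a gap: an earlier send is missing
            }
            records.add(record);
            lastSeqByProducer.put(producerId, sequence);
            return Result.APPENDED;
        }
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        System.out.println(log.append(7L, 0, "m1")); // APPENDED
        System.out.println(log.append(7L, 0, "m1")); // DUPLICATE (internal retry)
        System.out.println(log.append(7L, 1, "m2")); // APPENDED
        System.out.println(log.append(7L, 2, "m1")); // APPENDED (application resend)
        System.out.println(log.records);             // [m1, m2, m1]
    }
}
```

The last call is the application-level resend case: the payload is the same, but it carries a fresh sequence number, so it is treated as a new message.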
34
What’s new, part 2
Multi partition writes.
35
Introducing ‘transactions’
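producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    // Commit the consumed offsets as part of the same transaction.
    producer.sendOffsetsToTransaction(…);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Another producer with the same transactional.id has taken over; this one must stop.
    producer.close();
} catch (KafkaException e) {
    // Any other error: abort so that none of the writes become visible to read_committed consumers.
    producer.abortTransaction();
}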
37
Initializing ‘transactions’
38
Transactional sends – part 1
39
Transactional sends – part 2
40
Commit – phase 1
41
Commit – phase 2
43
Success!
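The sequence on the preceding slides (register partitions, write data, commit in two phases) can be sketched as a toy model. This is an illustration under assumptions, not the coordinator's code: `TxnCoordinatorDemo`, the state strings and `COMMIT_MARKER` are hypothetical stand-ins for the transaction log entries and control records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a transaction coordinator driving a two-phase commit. */
final class TxnCoordinatorDemo {
    /** Data partitions: name -> entries (records and markers). */
    final Map<String, List<String>> partitions = new LinkedHashMap<>();
    /** Stand-in for the coordinator's transaction log. */
    final List<String> txnLog = new ArrayList<>();
    /** Partitions written to by the current transaction. */
    private final Set<String> touched = new LinkedHashSet<>();
    private final String txnId;

    TxnCoordinatorDemo(String txnId) {
        this.txnId = txnId;
    }

    /** Registers the partition with the coordinator on first use, then writes. */
    void send(String partition, String record) {
        if (touched.add(partition)) {
            txnLog.add(txnId + " ONGOING " + partition);
        }
        partitions.computeIfAbsent(partition, p -> new ArrayList<>()).add(record);
    }

    void commit() {
        // Phase 1: record the decision durably before telling any partition.
        txnLog.add(txnId + " PREPARE_COMMIT");
        // Phase 2: write a marker to every partition in the transaction.
        for (String partition : touched) {
            partitions.get(partition).add("COMMIT_MARKER");
        }
        txnLog.add(txnId + " COMPLETE_COMMIT");
        touched.clear();
    }

    public static void main(String[] args) {
        TxnCoordinatorDemo txn = new TxnCoordinatorDemo("lender-1");
        txn.send("debits-0", "alice -10");
        txn.send("credits-3", "bob +10");
        txn.commit();
        txn.txnLog.forEach(System.out::println);
        System.out.println(txn.partitions);
    }
}
```

Because the decision is logged before any marker is written, a coordinator that fails between the two phases can finish writing the markers after recovery instead of leaving some partitions committed and others not.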
44
Let’s review the APIs
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.sendOffsetsToTransaction(…);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    producer.close();
} catch (KafkaException e) {
    producer.abortTransaction();
}
49
Consumer returns only committed messages
50
Some notes on consuming transactions
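• Two ‘isolation levels’: read_committed and read_uncommitted.
• Messages are read in offset order under both levels.
• read_committed consumers read only up to the offset of the earliest still-open transaction (the last stable offset), and skip messages from aborted transactions.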
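The difference between the two isolation levels can be sketched on a toy log. This is a simplified model, not the consumer's implementation: `Entry`, `Kind` and the helper names are hypothetical, every transaction is assumed to have a unique ID, and all records are assumed to be transactional.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of read_committed vs read_uncommitted on one partition. */
final class IsolationLevelDemo {

    enum Kind { DATA, COMMIT, ABORT }

    static final class Entry {
        final Kind kind;
        final String txn;
        final String value;

        Entry(Kind kind, String txn, String value) {
            this.kind = kind;
            this.txn = txn;
            this.value = value;
        }
    }

    /** Offset of the first record of the earliest open transaction, else log end. */
    static int lastStableOffset(List<Entry> log) {
        Map<String, Integer> firstOffsetOfOpenTxn = new LinkedHashMap<>();
        for (int offset = 0; offset < log.size(); offset++) {
            Entry e = log.get(offset);
            if (e.kind == Kind.DATA) {
                firstOffsetOfOpenTxn.putIfAbsent(e.txn, offset);
            } else {
                firstOffsetOfOpenTxn.remove(e.txn); // a marker closes the transaction
            }
        }
        int lso = log.size();
        for (int offset : firstOffsetOfOpenTxn.values()) {
            lso = Math.min(lso, offset);
        }
        return lso;
    }

    static List<String> readUncommitted(List<Entry> log) {
        List<String> out = new ArrayList<>();
        for (Entry e : log) {
            if (e.kind == Kind.DATA) out.add(e.value);
        }
        return out;
    }

    static List<String> readCommitted(List<Entry> log) {
        Set<String> committed = new HashSet<>();
        for (Entry e : log) {
            if (e.kind == Kind.COMMIT) committed.add(e.txn);
        }
        List<String> out = new ArrayList<>();
        int lso = lastStableOffset(log);
        for (int offset = 0; offset < lso; offset++) {
            Entry e = log.get(offset);
            if (e.kind == Kind.DATA && committed.contains(e.txn)) out.add(e.value);
        }
        return out;
    }

    static List<Entry> sampleLog() {
        List<Entry> log = new ArrayList<>();
        log.add(new Entry(Kind.DATA, "t1", "a"));
        log.add(new Entry(Kind.DATA, "t2", "b"));
        log.add(new Entry(Kind.DATA, "t1", "c"));
        log.add(new Entry(Kind.COMMIT, "t1", null));
        log.add(new Entry(Kind.DATA, "t3", "d")); // t3 never finishes
        log.add(new Entry(Kind.ABORT, "t2", null));
        log.add(new Entry(Kind.DATA, "t4", "e"));
        log.add(new Entry(Kind.COMMIT, "t4", null));
        return log;
    }

    public static void main(String[] args) {
        List<Entry> log = sampleLog();
        System.out.println(readUncommitted(log)); // [a, b, c, d, e]
        System.out.println(readCommitted(log));   // [a, c]
    }
}
```

Note that `e` is committed yet withheld from the read_committed reader: it sits after the first record of the still-open transaction `t3`, and returning it early would break offset order once `t3` resolves.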
51
TL;DR
• Transaction coordinator and transaction log maintain
transaction state.
• Use the new producer APIs for transactions.
• Consumers can read only committed messages.
52
Part 3
Performance!
53
What’s new, part 3: Performance boost!
• Up to +20% producer throughput
• Up to +50% consumer throughput
• Up to -20% disk utilization
• Savings start when you batch
• Details: https://bit.ly/kafka-eos-perf
54
Too good to be true?
Let’s understand how!
55
The old message format
56
The new format
57
The new format -> new fields
58
The new format -> new fields
59
The new format -> delta encoding
60
A visual comparison with 7 records, 10 bytes each
61
TL;DR
• With a batch size of 2, the new format starts saving
space.
• Savings are maximal for large batches of small
messages.
• Hence higher throughput when IO bound.
• Works as soon as you upgrade to the new format.
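Why batching drives the savings can be illustrated with a single field. The sketch below compares storing a full 8-byte timestamp per record against storing one base timestamp per batch plus a variable-length delta per record. It covers timestamps only and does not reproduce the full on-disk layout or the byte counts from the slides; `DeltaEncodingDemo` and its method names are hypothetical.

```java
/** Toy comparison: fixed-width timestamps vs one base value plus varint deltas. */
final class DeltaEncodingDemo {

    /** Bytes needed for a zigzag-encoded variable-length long. */
    static int varlongSize(long value) {
        long zigzag = (value << 1) ^ (value >> 63); // small magnitudes -> small codes
        int bytes = 1;
        while ((zigzag & ~0x7FL) != 0) {            // 7 payload bits per byte
            zigzag >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Every record carries its own 8-byte timestamp. */
    static int absoluteSize(long[] timestamps) {
        return 8 * timestamps.length;
    }

    /** One 8-byte base timestamp per batch, then a varint delta per record. */
    static int deltaSize(long[] timestamps) {
        int size = 8;
        for (long ts : timestamps) {
            size += varlongSize(ts - timestamps[0]);
        }
        return size;
    }

    public static void main(String[] args) {
        long base = 1_500_000_000_000L;
        long[] batch = new long[7];
        for (int i = 0; i < batch.length; i++) {
            batch[i] = base + i; // seven records, one millisecond apart
        }
        System.out.println(absoluteSize(batch)); // 56
        System.out.println(deltaSize(batch));    // 15
    }
}
```

For a single record the delta form costs one byte more (9 vs 8), and from two records onward it is smaller, which is the general shape of the trade-off: a fixed per-batch cost amortized over many small per-record fields.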
62
Cool!
But how do I use this?
63
Producer Configs
• For idempotence:
    • enable.idempotence = true
    • max.in.flight.requests.per.connection = 1
    • acks = “all”
    • retries > 0 (preferably Integer.MAX_VALUE)
• For transactions, additionally:
    • transactional.id = ‘some unique id’
    • enable.idempotence = true
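As a minimal sketch, the settings above could be assembled with plain `java.util.Properties`. The key strings are spelled as the Kafka producer expects them (note `max.in.flight.requests.per.connection`); the class name, helper names, broker address and transactional ID are hypothetical, and serializer settings are left out for brevity.

```java
import java.util.Properties;

/** Builds producer settings for idempotent and transactional use. */
final class EosProducerConfigDemo {

    static Properties idempotentProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("enable.idempotence", "true");
        props.setProperty("max.in.flight.requests.per.connection", "1");
        props.setProperty("acks", "all");
        props.setProperty("retries", Integer.toString(Integer.MAX_VALUE));
        return props;
    }

    /** Transactions build on idempotence and add a stable, unique ID. */
    static Properties transactionalProducerProps(String bootstrapServers,
                                                 String transactionalId) {
        Properties props = idempotentProducerProps(bootstrapServers);
        props.setProperty("transactional.id", transactionalId);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker address and ID, for illustration only.
        Properties props = transactionalProducerProps("localhost:9092", "lender-1");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The transactional ID should stay the same across restarts of the same logical producer, since that is what lets a new instance fence off an old one.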
64
Consumer configs
• isolation.level:
• “read_committed”, or
• “read_uncommitted”
65
Streams config
• processing.guarantee = “exactly_once”
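The consumer and Streams settings can be sketched the same way with plain `java.util.Properties`. The Streams key is `processing.guarantee`; later Kafka releases also accept `exactly_once_v2`, so the value below reflects the 0.11-era setting. The class name, helper names, broker address, group ID and application ID are hypothetical, and deserializer settings are omitted.

```java
import java.util.Properties;

/** Builds consumer and Streams settings for reading and processing exactly once. */
final class EosReaderConfigDemo {

    static Properties readCommittedConsumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Only return messages from committed transactions.
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    static Properties exactlyOnceStreamsProps(String bootstrapServers, String applicationId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("application.id", applicationId);
        props.setProperty("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical names, for illustration only.
        System.out.println(readCommittedConsumerProps("localhost:9092", "lender-readers"));
        System.out.println(exactlyOnceStreamsProps("localhost:9092", "lender-app"));
    }
}
```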
66
Putting it together
• We understood Kafka’s existing delivery semantics
• Understood why we want to improve them
• Learned how these have been strengthened
• Learned how the new semantics work
67
When is it available?
Available to try in Kafka 0.11, June 2017.
68
Thank You!

Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka