Exactly-Once Stream Processing with Arroyo and Kafka
Kafka Summit London / March 20, 2024
Micah Wylde
co-founder, Arroyo
@mwylde
What is Arroyo
● SQL stream processing engine built on Apache Arrow
● Stateful computations at millions of events/sec
● Designed for modern cloud environments with fast rescaling and recovery
● 10-20x faster than Flink
https://github.com/ArroyoSystems/arroyo
Fully open source (Apache 2.0)
Retry the write → at least once
Don’t retry → at most once
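The trade-off above can be reproduced with a toy log. This is a minimal sketch, not Kafka client code: `ToyLog`, `Fault`, and `send` are hypothetical names, and the only assumption is that the producer cannot tell a lost write from a lost acknowledgment.

```python
from enum import Enum


class Fault(Enum):
    NONE = "none"
    WRITE_LOST = "write_lost"  # the request never reached the log
    ACK_LOST = "ack_lost"      # the record was appended, but the ack was dropped


class ToyLog:
    """Append-only log that injects one fault per attempt, in order."""

    def __init__(self, faults):
        self.records = []
        self._faults = list(faults)

    def append(self, record):
        fault = self._faults.pop(0) if self._faults else Fault.NONE
        if fault is Fault.WRITE_LOST:
            return False
        self.records.append(record)
        # The producer sees the same "no ack" result for both kinds of fault.
        return fault is not Fault.ACK_LOST


def send(log, record, retries):
    """Send one record, retrying up to `retries` times when no ack arrives."""
    for _ in range(retries + 1):
        if log.append(record):
            return True
    return False


# Retrying after a lost ack writes the record twice: at least once.
retrying = ToyLog([Fault.ACK_LOST])
send(retrying, "event-1", retries=3)

# Not retrying after a lost write drops the record: at most once.
no_retry = ToyLog([Fault.WRITE_LOST])
send(no_retry, "event-1", retries=0)
```

Here `retrying.records` ends up holding `event-1` twice, while `no_retry.records` stays empty.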
Solution 1:
Idempotent writes
Producer config
enable.idempotence=true
acks=all
max.in.flight.requests.per.connection <= 5
* enabled by default since Kafka 3.0.0
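Idempotent writes make retries safe because the broker can recognize a resent record. The sketch below is a deliberately simplified model of that idea, assuming one sequence counter per producer ID on a single partition. `IdempotentPartition` is a hypothetical class, and the real broker-side bookkeeping (producer epochs, batches) is more involved.

```python
class IdempotentPartition:
    """Toy partition that ignores appends whose sequence number it has already stored."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, record):
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            # Duplicate of an earlier append: acknowledge it, write nothing.
            return True
        if seq != last + 1:
            raise ValueError(f"out-of-order sequence {seq}, expected {last + 1}")
        self.records.append(record)
        self._last_seq[producer_id] = seq
        return True


partition = IdempotentPartition()
partition.append("producer-a", 0, "event-1")
partition.append("producer-a", 0, "event-1")  # retry after a lost ack
partition.append("producer-a", 1, "event-2")
```

After the retry, `partition.records` holds each event exactly once.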
Solution 2:
Transactions
A digression into stateful dataflows
SELECT user_id, count(*) as count
FROM transactions
WHERE status = 'FAILED'
GROUP BY user_id, hop(interval '5 seconds', interval '24 hours');
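To see why this query carries a lot of state, it helps to count how many windows a single event lands in. The sketch below assumes the first argument to `hop` is the slide and the second is the window width, and that windows are aligned to the Unix epoch. `hop_windows` is a hypothetical helper, not part of Arroyo.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hop_windows(event_time, slide, width):
    """Return every [start, end) window of size `width`, spaced `slide` apart, that contains event_time."""
    # Latest window start at or before the event, aligned to the slide.
    start = event_time - ((event_time - EPOCH) % slide)
    windows = []
    while start > event_time - width:
        windows.append((start, start + width))
        start -= slide
    return windows


event_time = datetime(2024, 3, 20, 12, 0, 7, tzinfo=timezone.utc)

# Small case: 5-second slide, 15-second width.
small = hop_windows(event_time, timedelta(seconds=5), timedelta(seconds=15))

# The query's parameters: 5-second slide, 24-hour width.
large = hop_windows(event_time, timedelta(seconds=5), timedelta(hours=24))
```

With a 15-second width the event belongs to 3 windows. With the query's 24-hour width it belongs to 17,280, which is the state that has to survive a failure.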
kafka source → Filter (status = 'FAILED') → Sliding Window → kafka sink
Checkpointed to S3:
● kafka source: partition → last read offset
● Sliding Window: 24-hour window state
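Checkpointing the source offsets together with the window state is what lets the pipeline replay input after a crash without double counting. A minimal sketch of that idea follows, with a plain dictionary standing in for S3 and a per-user counter standing in for the window state. `ToyPipeline` is hypothetical and does not reflect Arroyo's actual state format.

```python
import copy


class ToyPipeline:
    def __init__(self):
        self.offsets = {}  # partition -> last read offset
        self.counts = {}   # user_id -> count of FAILED events (window state stand-in)

    def process(self, partition, offset, event):
        if event["status"] == "FAILED":
            user = event["user_id"]
            self.counts[user] = self.counts.get(user, 0) + 1
        self.offsets[partition] = offset

    def snapshot(self):
        # Offsets and state are captured together, so they always agree.
        return copy.deepcopy({"offsets": self.offsets, "counts": self.counts})

    def restore(self, snap):
        snap = copy.deepcopy(snap)
        self.offsets, self.counts = snap["offsets"], snap["counts"]


events = [
    {"user_id": "u1", "status": "FAILED"},
    {"user_id": "u1", "status": "OK"},
    {"user_id": "u1", "status": "FAILED"},
]

pipeline = ToyPipeline()
pipeline.process(0, 0, events[0])
checkpoint = pipeline.snapshot()   # would be written to S3
pipeline.process(0, 1, events[1])  # progress after this point is lost in the crash

# Recovery: restore the checkpoint, then replay from the offset after it.
recovered = ToyPipeline()
recovered.restore(checkpoint)
for offset in range(recovered.offsets[0] + 1, len(events)):
    recovered.process(0, offset, events[offset])
```

The recovered counts match a run with no crash, so internal state is exactly-once. Output already written to the sink before the crash is the part this does not cover.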
To make this work, we need to introduce a two-phase commit protocol (2PC)
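A two-phase commit ties the sink's transaction to the checkpoint: output only becomes visible once the matching state is durable. The sketch below shows the ordering of the two phases under simplifying assumptions. `ToyTransactionalSink` and `checkpoint` are hypothetical names, a dictionary stands in for S3, and none of this is the Kafka producer API.

```python
class ToyTransactionalSink:
    """Stand-in for a transactional sink with an explicit prepare step."""

    def __init__(self):
        self.committed = []    # records visible to downstream readers
        self._open = []        # records written in the current transaction
        self._prepared = None  # transaction held while awaiting the decision

    def write(self, record):
        self._open.append(record)

    def prepare(self):
        # Phase 1: close the current transaction and hold it for a decision.
        self._prepared, self._open = self._open, []

    def commit(self):
        # Phase 2: make the prepared records visible.
        if self._prepared is not None:
            self.committed.extend(self._prepared)
            self._prepared = None

    def abort(self):
        self._prepared = None
        self._open = []


def checkpoint(sink, state, durable_store, epoch):
    """Run one checkpoint as a two-phase commit across state and sink."""
    sink.prepare()                      # phase 1: sink prepares its transaction
    durable_store[epoch] = dict(state)  # phase 1: state snapshot becomes durable
    sink.commit()                       # phase 2: safe to expose the output


sink = ToyTransactionalSink()
store = {}
sink.write("a")
sink.write("b")
checkpoint(sink, {"partition-0": 42}, store, epoch=1)
```

A crash between the durable snapshot and `sink.commit()` leaves a prepared transaction behind, and recovery has to decide what to do with it.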
KIP-939: Support Participation in 2PC
● Producers can call InitProducerId without causing existing transactions to be aborted
● Then, we can choose to commit or abort the ongoing transaction during recovery
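One way to make the commit-or-abort choice during recovery is to treat the checkpoint store as the source of truth. This is an assumption for illustration, not something KIP-939 prescribes. `recover` is a hypothetical function and the epoch numbers are made up.

```python
def recover(prepared_epoch, completed_checkpoints):
    """Decide the fate of a transaction left prepared by a crashed sink.

    prepared_epoch: checkpoint epoch the pending transaction belongs to, or None.
    completed_checkpoints: epochs whose state snapshot was durably recorded.
    """
    if prepared_epoch is None:
        return "nothing-to-do"
    if prepared_epoch in completed_checkpoints:
        # State already reflects this output, so the output must become visible.
        return "commit"
    # State will be rolled back and the input replayed, so discard the output.
    return "abort"


completed = {5, 6, 7}
decision_after_durable_checkpoint = recover(7, completed)
decision_before_durable_checkpoint = recover(8, completed)
```

The first call returns `"commit"` and the second returns `"abort"`. Either way the sink's visible output stays consistent with the restored state.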
Questions?
micah@arroyo.dev
@mwylde
linkedin.com/u/wylde
https://github.com/ArroyoSystems/arroyo
