Real-Time Dynamic
Data Export Using the
Kafka Ecosystem
Today
• Product Overview
• What were we trying to build?
• Architecture
• How did we build it?
• What did we learn building it?
About Me
• Preston Thompson
• Senior Software Engineer
• 4 years at Braze
• Backend Application Developer
• Data Infrastructure
• Application Infrastructure
Scale
• More than 1 billion MAU on six continents
• 1.5B+ MAU
• 30B+ events per day
Product Overview
Currents
• Real-time data export
• Customers can create one or more exports to seven different partner destinations
• Data Warehouse - AWS S3, Azure Blob, Google Cloud Storage
• Behavioral Analytics - Amplitude, Mixpanel
• Customer Data Platform - mParticle, Segment
• 30 different event types
• Message Engagement Events
• Customer Behavior Events
• 200+ active integrations
• ~1B events exported per day
Architecture + Lessons Learned
Events
• Ruby applications producing to Kafka using ruby-kafka
• All events are actions related to a specific end-user
• Push Notification Send
• Push Notification Open
• Campaign Conversion
• Purchase
• Custom Event
• 30 different event types, one topic each
• Events for all Braze customers are mixed together within a partition
• Use the user ID as the message key to keep partitions balanced
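A minimal sketch of how keying by user ID maps events to partitions. It assumes a hash-modulo partitioner with CRC32 as a stand-in hash; the hash a particular Kafka client uses may differ, and `partition_for` and the user IDs are hypothetical names for illustration only.

```python
import zlib
from collections import Counter


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a message key to a partition by hash-modulo (CRC32 is a stand-in hash)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


# Hypothetical user IDs; every event for a given user lands in the same partition.
user_ids = [f"user-{i}" for i in range(10_000)]
counts = Counter(partition_for(uid, 8) for uid in user_ids)
```

With many distinct user IDs the keys tend to spread evenly, because no single key dominates the traffic. A hash makes balance likely rather than certain, so `counts` is worth inspecting on real data.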
Filter and Transform
• Requirements
• Event types are configurable per integration
• 7 different destinations
• REST API
• Object storage
• Solution
• Kafka Streams application
• Input topics = event topics
• Output topics = integration-specific topics
• Configuration file storing integration settings (e.g. which events to send, anonymous users)
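The filtering step above can be sketched in plain Python. This is not the Kafka Streams code from the talk: `IntegrationConfig`, the field names, the `integration-<id>` topic naming, and the reading of "anonymous users" as "events without a user ID" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationConfig:
    """Hypothetical per-integration settings loaded from the configuration file."""
    integration_id: str
    customer_id: str
    event_types: frozenset
    include_anonymous: bool = False


def output_topics(event: dict, configs: list) -> list:
    """Return the integration-specific topics an event should be written to."""
    topics = []
    for cfg in configs:
        if event["customer_id"] != cfg.customer_id:
            continue  # events for all customers share the input topics
        if event["type"] not in cfg.event_types:
            continue  # this integration did not ask for this event type
        if event.get("user_id") is None and not cfg.include_anonymous:
            continue  # assumed meaning of the "anonymous users" setting
        topics.append(f"integration-{cfg.integration_id}")
    return topics


configs = [
    IntegrationConfig("amplitude-1", "c1", frozenset({"purchase", "custom_event"})),
    IntegrationConfig("s3-7", "c1", frozenset({"purchase"}), include_anonymous=True),
]
```

One event can fan out to several output topics, which is what lets each integration be consumed and retried independently downstream.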
Denormalization
• Events have lots of IDs to reference other items
• Campaigns
• Message Variations
• Apps
• Human-readable names are more useful than IDs in some destinations
• Example: selecting a campaign in an Amplitude dashboard
• Transform
• New topic with database changes
• Global State Store
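A rough sketch of the denormalization idea: a lookup table kept current from a stream of database changes, then used to attach names to events. A plain dictionary stands in for the global state store here, and the record shapes (`entity`, `id`, `name`, `deleted`) and the `*_name` output fields are hypothetical.

```python
class NameLookup:
    """In-memory stand-in for a global state store keyed by (entity type, ID)."""

    def __init__(self):
        self._names = {}

    def apply_change(self, change: dict) -> None:
        """Apply one record from the (hypothetical) database-changes topic."""
        key = (change["entity"], change["id"])
        if change.get("deleted"):
            self._names.pop(key, None)
        else:
            self._names[key] = change["name"]

    def enrich(self, event: dict) -> dict:
        """Return a copy of the event with a *_name field for each known ID."""
        enriched = dict(event)
        for entity in ("campaign", "message_variation", "app"):
            name = self._names.get((entity, event.get(f"{entity}_id")))
            if name is not None:
                enriched[f"{entity}_name"] = name
        return enriched


lookup = NameLookup()
lookup.apply_change({"entity": "campaign", "id": "c-9", "name": "Spring Sale"})
enriched_event = lookup.enrich({"type": "purchase", "campaign_id": "c-9"})
```

Because every instance holds the full table, any event can be enriched locally without a database round trip, at the cost of replicating the table to each instance.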
Connect
• One topic per integration
• Independent processing
• Invalid credentials
• Partner downtime
• Rate limiting
• Difficult because we must limit number of partitions per connector
• Rebalance loops
• Increase number of hosts
• Future: possibly split integrations across multiple Connect clusters
• REST API
• Automatically restart failed tasks
• Manage active connectors - new and recently updated
• Scale number of active tasks when needed
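One way to picture the "automatically restart failed tasks" bullet: Kafka Connect's REST API reports task states at `GET /connectors/{name}/status` and restarts a task at `POST /connectors/{name}/tasks/{id}/restart`. The sketch below only derives the restart paths from a status payload; the function name and the sample payload are illustrative, and the HTTP calls themselves are left out.

```python
def failed_task_restart_paths(status: dict) -> list:
    """Build REST paths for restarting every task reported as FAILED."""
    name = status["name"]
    return [
        f"/connectors/{name}/tasks/{task['id']}/restart"
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


# Illustrative payload in the shape returned by GET /connectors/{name}/status.
sample_status = {
    "name": "integration-amplitude-1",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
        {"id": 1, "state": "FAILED", "worker_id": "10.0.0.2:8083", "trace": "..."},
    ],
}
restart_paths = failed_task_restart_paths(sample_status)
```

A supervisor would poll the status endpoint for each connector and POST to each returned path. Connector names containing special characters would need URL encoding first.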
Custom Connectors
• HTTP Connectors
• Try to batch the best we can
• Retries are difficult
• Retry immediately a few times
• If that fails, throw RetriableException
• Exponential backoff is not built in
• Object storage connectors
• S3, Azure Blob, GCS
• Built on top of the Confluent S3 Connector using pluggable classes
• Inner Avro format
• Different credentials per connector
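The HTTP retry bullets above can be sketched as follows. This is a Python illustration, not connector code: `RetriableError` stands in for Kafka Connect's `RetriableException`, `send` is any hypothetical callable that posts a batch, and `backoff_delay` shows the kind of exponential schedule a connector would have to implement itself.

```python
class RetriableError(Exception):
    """Stand-in for Kafka Connect's RetriableException in this Python sketch."""


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt, capped at cap_s seconds."""
    return min(cap_s, base_s * (2 ** attempt))


def send_with_retries(send, batch, immediate_attempts: int = 3):
    """Try the batch a few times back to back, then signal a retriable failure."""
    last_error = None
    for _ in range(immediate_attempts):
        try:
            return send(batch)
        except ConnectionError as err:
            last_error = err
    # Hand the batch back to the framework so it is redelivered later.
    raise RetriableError(f"gave up after {immediate_attempts} attempts") from last_error
```

Raising the retriable error leaves the timing of the next attempt to the framework, so a connector that wants growing delays has to track the attempt count and apply something like `backoff_delay` on its own.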
Volume Metrics
• Counts the number of events exported
• Kafka Streams application
• Consumes integration-specific topics
• Uses a simple aggregator
• Requires state, so a bit more complicated
• Interactive Queries
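A small sketch of the counting logic, with a `Counter` playing the role of the state store and `query` standing in for an interactive query against it. The class and topic names are hypothetical, and windowing, persistence, and multi-instance lookup are left out.

```python
from collections import Counter


class VolumeMetrics:
    """Counts exported events per integration; the Counter plays the state store."""

    def __init__(self):
        self._counts = Counter()

    def process(self, topic: str) -> None:
        """Aggregate one record consumed from an integration-specific topic."""
        self._counts[topic] += 1

    def query(self, topic: str) -> int:
        """Read the current count, analogous to an interactive query on local state."""
        return self._counts[topic]


metrics = VolumeMetrics()
for topic in ["integration-a", "integration-b", "integration-a"]:
    metrics.process(topic)
```

The state is what makes this more involved than the filter step: counts must survive restarts and be found on whichever instance owns the key, neither of which this in-memory sketch handles.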
Misato
• Keeps applications in the system's desired state
• Creates a configuration file for Streams
• Restarts Streams to pick up changes to that file
• Creates new topics for integrations
• Creates new connectors to read from those topics
• Updates configurations for connectors
• Manages topics read by Volume Metrics app
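A desired-state manager of this kind can be pictured as a diff between what should exist and what does. The sketch below is an assumption about the general pattern, not Misato's actual logic; the action names and the one-topic-per-connector naming are hypothetical.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to move `actual` connector configs to `desired`.

    Both arguments map an integration name to its connector configuration dict.
    """
    actions = []
    for name, config in sorted(desired.items()):
        if name not in actual:
            # New integration: create its topic first, then the connector reading it.
            actions.append(("create_topic", name))
            actions.append(("create_connector", name))
        elif actual[name] != config:
            actions.append(("update_connector", name))
    return actions


desired = {"int-a": {"events": ["purchase"]}, "int-b": {"events": ["custom_event"]}}
actual = {"int-a": {"events": []}}
plan = reconcile(desired, actual)
```

Running the diff repeatedly is safe: once the actions are applied, the next pass returns an empty plan. Removal of integrations is omitted because the slides do not describe it.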
Quick Recap
Thanks!
• Team at Braze
• We’re hiring!
• braze.com/careers
• Confluent
• Arcadia Data
