© 2018 Bloomberg Finance L.P. All rights reserved.
© 2022 Bloomberg Finance L.P. All rights reserved.
Verifying Apache Kafka-based Data Pipelines
Current 2022
October 5, 2022
Subhangi Agarwala
Senior Software Engineer
About Us
• Bloomberg
• Derivatives Data Engineering
  ○ Data ingestion, storage and aggregation
• We build real-time financial market data pipelines
Agenda
● Background and our use case
● Why we need to test streaming applications in finance
● Integration test framework
● Conclusion
Real-Time Data Pipelines in Finance – and at Bloomberg
Application Architecture
[Architecture diagram. Components: Config Publisher, Trading Data Publisher, Kafka Connect, Persistent data store, Kafka Streams, Client App, Analytical Engine, Real Time analysis. Flow labels: Config Updates, Config Changes, Trading Data.]
Unit Testing vs. Integration Testing
● Unit Testing
○ Easily test new functional logic in isolation
○ Insufficient for larger workflows
● Integration Testing
○ Handle exceptions and failures where different moving parts are involved
○ Reliability
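
To make the contrast concrete, here is a minimal sketch of the kind of check a unit test covers well: one piece of functional logic exercised in isolation, with no broker or connector involved. The function name `aggregate_quantity` and the record fields `instrument` and `quantity` are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from collections import defaultdict


def aggregate_quantity(trades):
    """Sum trade quantities per instrument (hypothetical aggregation step)."""
    totals = defaultdict(int)
    for trade in trades:
        totals[trade["instrument"]] += trade["quantity"]
    return dict(totals)


def test_aggregate_quantity():
    # Plain in-memory input: no Kafka, no Docker, no network.
    trades = [
        {"instrument": "OPT-A", "quantity": 10},
        {"instrument": "OPT-B", "quantity": 5},
        {"instrument": "OPT-A", "quantity": 7},
    ]
    assert aggregate_quantity(trades) == {"OPT-A": 17, "OPT-B": 5}


test_aggregate_quantity()
```

A test like this runs quickly and pins down the logic, but it says nothing about how the same function behaves once records arrive through a broker, a connector, and a downstream consumer. That gap is what the integration tests are meant to cover.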
Testing Framework
[Framework diagram. Components: Test Validator; TestContainers using Docker (Producer); TestContainers using Docker (Consumer); TestContainers using Docker (test Kafka Connect); Kafka; Kafka; KConnect.]
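
The slide names a Test Validator but does not show how it works. One common way to validate output from containers that start and process asynchronously is to poll for results until a deadline, rather than sleeping for a fixed time. The sketch below assumes that approach; `wait_for_records`, its parameters, and the `fetch` callback are hypothetical names, and `fake_fetch` stands in for a real consumer poll.

```python
import time


def wait_for_records(fetch, expected_count, timeout_s=5.0, poll_interval_s=0.05):
    """Call fetch() repeatedly until expected_count records have accumulated.

    fetch is any zero-argument callable returning a (possibly empty) list of
    newly available records. Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    records = []
    while time.monotonic() < deadline:
        records.extend(fetch())
        if len(records) >= expected_count:
            return records
        time.sleep(poll_interval_s)
    raise TimeoutError(f"expected {expected_count} records, got {len(records)}")


# Stand-in for a consumer whose results trickle in over several polls.
batches = iter([[], ["m0"], [], ["m1", "m2"]])


def fake_fetch():
    return next(batches, [])


received = wait_for_records(fake_fetch, expected_count=3, poll_interval_s=0.01)
assert received == ["m0", "m1", "m2"]
```

Polling returns as soon as the expected output is present, so the timeout only costs time when a test is actually failing.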
Challenges
● Asynchronous nature of Test Containers
○ Increase in test runtime
● Ensuring consistency between components
○ Add a marker at the end of the input record
[Diagram: a Producer, messages m0, m1 and m2, and nodes labeled c.]
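
The marker idea can be sketched without a broker. In the example below a `queue.Queue` stands in for a single Kafka topic partition, and the producer appends a sentinel record after the real input so the consumer knows when everything has arrived. The names `END_MARKER`, `produce`, and `consume_until_marker` are hypothetical, and the sentinel's shape is an assumption; the talk does not specify what the marker looks like.

```python
import queue
import threading

# Hypothetical sentinel record appended after the real input.
END_MARKER = {"type": "end-of-input"}


def produce(topic, records):
    """Publish all records, then the marker that signals end of input."""
    for record in records:
        topic.put(record)
    topic.put(END_MARKER)


def consume_until_marker(topic, timeout_s=5.0):
    """Collect records until the marker is seen.

    Raises queue.Empty if nothing arrives within timeout_s, which a test
    would treat as a failure.
    """
    received = []
    while True:
        record = topic.get(timeout=timeout_s)
        if record == END_MARKER:
            return received
        received.append(record)


topic = queue.Queue()  # in-memory stand-in for one topic partition
inputs = [{"id": "m0"}, {"id": "m1"}, {"id": "m2"}]

producer = threading.Thread(target=produce, args=(topic, inputs))
producer.start()
output = consume_until_marker(topic)
producer.join()

assert output == inputs
```

This relies on the marker arriving after the records it follows. Kafka preserves order only within a partition, so in a real test the marker would have to travel through the same partition as the data it terminates.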
Conclusion
● Testing interaction between a Kafka producer and a Kafka consumer.
● Producing and consuming raw records and JSON records.
● Combining REST API testing with Kafka testing.
● Spinning up Kafka components (Kafka, KConnect and KStreams) in a TestContainer.
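
As a small illustration of the raw versus JSON distinction: a Kafka record value is a byte array on the wire, so a JSON record is a raw record plus a serialization step on each side. The helper names `serialize_json` and `deserialize_json` below are hypothetical; a real test would pass equivalent functions to whichever client library it uses.

```python
import json


def serialize_json(record):
    """Encode a dict as UTF-8 JSON bytes, the form a record value takes on the wire."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize_json(payload):
    """Decode UTF-8 JSON bytes back into a dict."""
    return json.loads(payload.decode("utf-8"))


raw_record = b"m0"                        # raw record: bytes pass through untouched
json_record = {"id": "m0", "quantity": 3}

payload = serialize_json(json_record)
assert isinstance(payload, bytes)
assert deserialize_json(payload) == json_record
```

A round-trip check like this is a unit-level test. The integration test adds the part that matters for the pipeline: that the bytes written by the producer are the bytes the consumer decodes.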
Thank you!
https://www.bloomberg.com/careers
Learn more about our Derivatives team:
https://www.bloomberg.com/company/stories/meet-the-team-derivatives-engineering/
Subhangi Agarwala: sagarwala10@bloomberg.net

Verifying Apache Kafka-Based Data Pipelines With Subhangi Agarwala | Current 2022
