Cosmos DB – Kafka Connectors
Abinav Rameesh
Program Manager, Cosmos DB
01 Kafka Connect Overview
02 Kafka Integration Use Cases for Cosmos DB
03 Cosmos DB Source & Sink Architecture Overview
04 Demo
05 Taking It Further
What is a Connector?

Confluent Platform offers 120+ pre-built connectors to help you quickly and reliably integrate with Apache Kafka®. Connectors import data from, and export data to, some of the most commonly used data systems.

Connectors run either as a managed resource on Confluent Cloud or as a self-managed resource on a self-managed Kafka cluster.

Kafka Connect runs on the Java Virtual Machine (JVM) as a process known as a worker. Each worker can execute multiple connectors.
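To make the worker model concrete, here is a minimal sketch that spreads the tasks of two connectors across two workers in round-robin order. The connector names, worker names, and task counts (which play the role of the `tasks.max` setting) are hypothetical, and real Kafka Connect distributed mode assigns work through its own group rebalancing protocol rather than this simple loop.

```python
from collections import defaultdict
from typing import Dict, List


def assign_tasks(connectors: Dict[str, int], workers: List[str]) -> Dict[str, List[str]]:
    """Hypothetical round-robin placement of connector tasks onto workers.

    connectors: connector name -> number of tasks (similar in spirit to tasks.max)
    workers: names of worker processes (each one a JVM in real Kafka Connect)
    Returns worker name -> list of "<connector>-task<N>" labels it would run.
    """
    assignment = defaultdict(list)
    slot = 0
    for name in sorted(connectors):          # deterministic order for the example
        for task_id in range(connectors[name]):
            worker = workers[slot % len(workers)]
            assignment[worker].append(f"{name}-task{task_id}")
            slot += 1
    return dict(assignment)


if __name__ == "__main__":
    plan = assign_tasks({"cosmos-source": 2, "cosmos-sink": 3}, ["worker-1", "worker-2"])
    for worker, tasks in plan.items():
        print(worker, tasks)
```

Each worker ends up running tasks from both connectors, which is the point of the slide: one worker process can host several connectors at once.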
Connect Architecture
• Connectors handle the interaction between Kafka Connect and the external system being integrated.
• Converters handle the serialization and deserialization of data.
• Transformations optionally modify the data as it passes through the pipeline; one or more can be applied in sequence.
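The three roles above can be sketched as a source-side pipeline: records emitted by a connector pass through each transformation in order and are then serialized by a converter. This is a simplified model assuming dictionary records; `drop_field` and `insert_static_field` are hypothetical stand-ins for Single Message Transforms, and `json_convert` stands in for a converter such as JsonConverter. On the sink side the order is reversed: the converter deserializes first, then the transformations run, then the connector writes.

```python
import json
from typing import Callable, Dict, List

Record = Dict[str, object]
Transform = Callable[[Record], Record]


def drop_field(field: str) -> Transform:
    """Hypothetical transform: remove one field (e.g. a Cosmos DB system property)."""
    def apply(record: Record) -> Record:
        return {k: v for k, v in record.items() if k != field}
    return apply


def insert_static_field(field: str, value: object) -> Transform:
    """Hypothetical transform: add a constant field to every record."""
    def apply(record: Record) -> Record:
        return {**record, field: value}
    return apply


def json_convert(record: Record) -> bytes:
    """Stand-in converter: serialize a record to UTF-8 JSON bytes."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def run_pipeline(records: List[Record], transforms: List[Transform]) -> List[bytes]:
    """Source-side order: connector output -> transforms (in order) -> converter."""
    serialized = []
    for record in records:
        for transform in transforms:
            record = transform(record)
        serialized.append(json_convert(record))
    return serialized


if __name__ == "__main__":
    docs = [{"id": "1", "item": "shirt", "_etag": "abc"}]
    chain = [drop_field("_etag"), insert_static_field("source", "cosmos")]
    print(run_pipeline(docs, chain))
```

Keeping transformations as small composable functions mirrors why Kafka Connect separates them from connectors: the same chain can be reused regardless of which system produced the record.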
Kafka vs. Cosmos DB

SHARDING
• Kafka: topics are horizontally partitioned across brokers, which serve as leaders and followers; each partition holds its own logical range of data.
• Cosmos DB: data is horizontally partitioned, with each physical partition backed by a replica set.

SCALING
• Kafka: scales out by adding brokers to the cluster.
• Cosmos DB: scales elastically by increasing the provisioned throughput (RU/s) for the database or container.

CDC
• Kafka: the consumer library retrieves changes from each partition of a topic.
• Cosmos DB: the Change Feed Processor has built-in logic to retrieve changes from each physical partition of a container.

TUNABILITY
• Kafka: performance can be tuned through batch sizes, memory thresholds, polling frequencies, etc.
• Cosmos DB: can be tuned for query performance, RU consumption, read and write batch sizes, etc.

MANAGED
• Kafka: can be self-managed or fully managed through Confluent Cloud.
• Cosmos DB: is a fully managed service.
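Both columns rest on the same idea: records are routed to partitions by key, and a change reader (a Kafka consumer or the Change Feed Processor) keeps its own position in each partition. The toy model below illustrates only that idea. It uses CRC32 as a stand-in hash; Kafka's default partitioner and Cosmos DB's partitioning each use their own hash functions, and neither is reproduced here. `PartitionedLog` is a hypothetical class.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping (CRC32 stand-in, not Kafka's or Cosmos DB's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedLog:
    """Hypothetical toy: append-only partitions, each with its own read cursor."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)   # partition -> [(key, value), ...]
        self.cursors = defaultdict(int)       # partition -> next position to read

    def append(self, key: str, value: str) -> int:
        """Route by key, so every write for one key lands in the same partition."""
        p = partition_for(key, self.num_partitions)
        self.partitions[p].append((key, value))
        return p

    def read_changes(self, partition: int):
        """Return entries written since the last read, then advance that cursor."""
        start = self.cursors[partition]
        batch = self.partitions[partition][start:]
        self.cursors[partition] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = PartitionedLog(num_partitions=4)
    p = log.append("order-1", "created")
    log.append("order-1", "shipped")
    print(p, log.read_changes(p))   # both changes, in write order
    print(log.read_changes(p))      # nothing new since the last read
```

Because ordering is guaranteed only within a partition, per-partition cursors are what let either system resume reading changes without losing or reordering updates for a given key.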
01 SOURCE, SINK
Source and sink connectors for Cosmos DB enable integration without writing complex application code to move data to and from Kafka.

02 ZERO CODE
Only configuration is needed to point the connectors at the Cosmos DB account and the Kafka cluster, with additional customization options.

03 DATA FORMATS
JSON and Avro data formats are supported, with more format options to come based on user feedback.

04 FULLY MANAGED
Both fully managed (through Confluent Cloud) and self-managed (running the connector directly) deployments are available.
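For a self-managed deployment, "only configuration" typically means submitting a JSON body to the Kafka Connect REST API (`POST /connectors`). The sketch below builds such a body for the source connector. The connector class and the `connect.cosmos.*` property names are assumptions based on the open-source Azure Cosmos DB Kafka connector's documentation and may differ between connector versions; the endpoint and key are placeholders, and `source_connector_request` and `parse_topic_map` are hypothetical helpers. The fully managed connector on Confluent Cloud is configured through Confluent Cloud and is not covered here.

```python
import json


def parse_topic_map(topic_map: str) -> dict:
    """Parse 'topic#container,topic2#container2' into {topic: container}."""
    pairs = {}
    for entry in topic_map.split(","):
        # Raises ValueError if an entry has no '#' separator.
        topic, container = entry.strip().split("#", 1)
        pairs[topic] = container
    return pairs


def source_connector_request(name: str, endpoint: str, key: str,
                             database: str, topic_map: str) -> dict:
    """Hypothetical helper: build a POST /connectors body for the source connector."""
    parse_topic_map(topic_map)  # fail fast on a malformed topic-to-container mapping
    return {
        "name": name,
        "config": {
            # Assumed names from the open-source connector's docs; verify for your version.
            "connector.class": "com.azure.cosmos.kafka.connect.source.CosmosDBSourceConnector",
            "tasks.max": "1",
            "key.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "key.converter.schemas.enable": "false",
            "value.converter.schemas.enable": "false",
            "connect.cosmos.connection.endpoint": endpoint,
            "connect.cosmos.master.key": key,
            "connect.cosmos.databasename": database,
            "connect.cosmos.containers.topicmap": topic_map,
        },
    }


if __name__ == "__main__":
    body = source_connector_request(
        name="cosmosdb-source",
        endpoint="https://<account>.documents.azure.com:443/",
        key="<primary-key>",
        database="kafkaconnect",
        topic_map="apparels#kafka",
    )
    print(json.dumps(body, indent=2))
```

Swapping the two `JsonConverter` entries for an Avro converter is the kind of configuration-only change the "DATA FORMATS" point refers to; no connector code changes.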
Cosmos DB – Kafka Use Cases
[Diagram: Bookings, Forecasting, Analytics, Flight Recommendations, Marketing, Revenue Optimizer]
Cosmos DB – Kafka Connector Architecture (Source Connector)
[Diagram: A managed Kafka Connect cluster runs the source connector. The Change Feed Processor reads from Cosmos DB's physical partitions, and a Kafka producer writes to the Kafka topic's physical partitions.]
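The source flow can be modeled as a loop that reads each Cosmos DB partition's change feed from a saved continuation, turns each changed document into a keyed record for a topic partition, and advances the continuation. This is a simplified sketch, not the connector's implementation: `ChangeFeed`, `SourceTask`, the `range-N` partition names, and the CRC32 key hash are all hypothetical, and real checkpointing (for example, the Change Feed Processor's lease-based continuation tracking) is more involved.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ChangeFeed:
    """Hypothetical toy change feed: ordered document versions per physical partition."""
    partitions: Dict[str, List[dict]]

    def read(self, partition: str, continuation: int, max_items: int) -> Tuple[List[dict], int]:
        """Return up to max_items changes after `continuation`, plus the new continuation."""
        items = self.partitions[partition][continuation:continuation + max_items]
        return items, continuation + len(items)


@dataclass
class SourceTask:
    """Hypothetical source task: change feed -> (topic partition, key, value) records."""
    feed: ChangeFeed
    topic_partitions: int
    offsets: Dict[str, int] = field(default_factory=dict)  # continuation per feed partition

    def poll(self, max_items: int = 10) -> List[Tuple[int, str, dict]]:
        produced = []
        for feed_partition in sorted(self.feed.partitions):
            start = self.offsets.get(feed_partition, 0)
            docs, next_token = self.feed.read(feed_partition, start, max_items)
            for doc in docs:
                # Key by document id so every version of a document lands in the
                # same topic partition (CRC32 is a stand-in for the producer's hash).
                target = zlib.crc32(doc["id"].encode("utf-8")) % self.topic_partitions
                produced.append((target, doc["id"], doc))
            # In Kafka Connect this position would travel as the record's source offset
            # and be committed after the producer acknowledges; here it is stored directly.
            self.offsets[feed_partition] = next_token
        return produced


if __name__ == "__main__":
    feed = ChangeFeed({"range-0": [{"id": "a", "v": 1}, {"id": "a", "v": 2}],
                       "range-1": [{"id": "b", "v": 1}]})
    task = SourceTask(feed, topic_partitions=3)
    print(task.poll())   # three records; both versions of "a" share a topic partition
    print(task.poll())   # [] until new changes arrive
```

Keying by document id is the design choice that preserves per-document ordering end to end: Cosmos DB orders changes within a partition, and Kafka preserves order within the topic partition the key maps to.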
Cosmos DB – Kafka Connector Architecture (Sink Connector)
[Diagram: A managed Kafka Connect cluster runs the sink connector. A Kafka consumer pulls records from the topic's brokers, and the Cosmos DB Java client issues writes to the Cosmos container, landing on Cosmos DB's physical partitions.]
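On the sink side, the important ordering is write first, commit offsets second. The sketch below, assuming the sink writes with upsert-by-id semantics, shows why that order makes redelivery safe: replaying a batch after a crash simply overwrites the same documents. `FakeContainer` is a hypothetical in-memory stand-in for a Cosmos DB container, and `sink_batch` is a hypothetical helper, not part of the Cosmos DB Java client or Kafka's consumer API.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, dict]  # (Kafka offset, document value)


class FakeContainer:
    """Hypothetical in-memory container, keyed by (partition key value, id)."""

    def __init__(self, partition_key_field: str):
        self.partition_key_field = partition_key_field
        self.items: Dict[Tuple[object, str], dict] = {}

    def upsert(self, doc: dict) -> None:
        key = (doc[self.partition_key_field], doc["id"])
        self.items[key] = doc  # insert or replace, so a replayed write is harmless


def sink_batch(messages: List[Message], container: FakeContainer, committed: int) -> int:
    """Write a polled batch, then return the next offset to commit."""
    next_offset = committed
    for offset, doc in messages:
        container.upsert(doc)
        next_offset = offset + 1
    # Commit only after every write succeeded; a crash before the commit means the
    # batch is delivered again, which upsert-by-id tolerates (at-least-once delivery).
    return next_offset


if __name__ == "__main__":
    container = FakeContainer(partition_key_field="customerId")
    batch = [(0, {"id": "o1", "customerId": "c1", "status": "new"}),
             (1, {"id": "o1", "customerId": "c1", "status": "paid"})]
    print(sink_batch(batch, container, committed=0))   # next offset to commit
    print(sink_batch(batch, container, committed=0))   # replay: same result
    print(container.items)                             # one document, latest status
```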
Demo
Taking It Further

Azure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
