@lyaruu #Voxxed
Embracing Database Diversity
with Kafka and Debezium
Frank Lyaruu
CTO Dexels
Amsterdam
• Service provider for professional and amateur team sports in the Netherlands & Belgium
• 10+ years old
• Managing personal data, planning competitions, assigning officials, supplying data feeds
• 2M+ players
• 6K+ clubs
• 40K matches a week
• Spiky but predictable load
Technology stack
• Oracle database
• Cluster of Java based application servers
• Diverse set of clients
[Diagram: desktop, web, and mobile clients; application servers running the application code]
Challenge
• Move to a player-centric model instead of a club-centric one
• A few orders of magnitude more users and load
• Moving away from Oracle is not feasible in the short term
• Scaling Oracle is just too expensive, if at all possible
Plan
1. Capture data in real time
2. Dump into Kafka
3. Insert into MongoDB
Kafka
• Kafka Broker
• Kafka Connect API
• Kafka Streams API
Kafka
• Persistent pub/sub message bus
• High throughput
• Subscribers can consume at their own speed
• Subscribers can request a ‘rewind’ and re-consume a topic
• Has some tricks to keep the data volume down, such as log compaction and retention limits
• Having both fast and slow consumers is not a problem
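The properties above can be sketched with a toy in-memory log. This is not the Kafka API: `ToyTopic`, `poll`, `rewind`, and `compact` are hypothetical names, and treating "keep the latest value per key" as one of the volume tricks is an assumption about what the slide means.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """In-memory stand-in for a single-partition topic (hypothetical, not Kafka)."""
    records: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer name -> next offset

    def publish(self, key, value):
        # Records are appended; reading them never removes them.
        self.records.append((key, value))

    def poll(self, consumer, max_records=10):
        # Each consumer advances its own offset, so a slow consumer
        # does not hold back a fast one.
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, offset=0):
        # Re-consuming a topic is just moving the consumer's offset back.
        self.offsets[consumer] = offset


def compact(records):
    """Keep only the latest record per key, preserving the order of the survivors."""
    latest = {}
    for position, (key, _value) in enumerate(records):
        latest[key] = position
    return [rec for position, rec in enumerate(records) if latest[rec[0]] == position]


topic = ToyTopic()
topic.publish("person-1", {"name": "Alfredo"})
topic.publish("person-2", {"name": "Sally"})
topic.publish("person-1", {"name": "Alfredo B."})

fast = topic.poll("fast", max_records=10)  # reads all three records
slow = topic.poll("slow", max_records=1)   # reads only the first record
topic.rewind("fast")
replayed = topic.poll("fast", max_records=10)
compacted = compact(topic.records)
```

The point of the sketch is that the log, not the consumer, is the durable thing: offsets are per-consumer bookkeeping, which is why fast and slow readers can coexist.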
Change Data Capture
Martin Kleppmann: https://www.confluent.io/blog/apache-kafka-samza-and-the-unix-philosophy-of-distributed-data/
Write ahead log
• Archive Log (Oracle)
• Oplog (MongoDB)
• Write Ahead Log (Postgres)
• Binlog (MySQL)
Write ahead log
• Recovery
• Replication
Master / slave replication
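A minimal sketch of why one log serves both purposes, assuming a toy table held as a dictionary. `ToyDatabase`, `apply`, and `replay` are hypothetical names; no real database works at this level of simplicity.

```python
def apply(state, entry):
    """Apply one log entry to a table held as {primary_key: row}."""
    op, key, row = entry
    if op == "delete":
        state.pop(key, None)
    else:  # "insert" and "update" both leave the row at its new value
        state[key] = row
    return state


class ToyDatabase:
    def __init__(self):
        self.log = []    # append-only write ahead log
        self.table = {}

    def write(self, op, key, row=None):
        entry = (op, key, row)
        self.log.append(entry)    # log the change first...
        apply(self.table, entry)  # ...then change the table


def replay(log, from_position=0, state=None):
    """Rebuild (or catch up) a table by applying log entries in order."""
    state = {} if state is None else state
    for entry in log[from_position:]:
        apply(state, entry)
    return state


primary = ToyDatabase()
primary.write("insert", 1, {"name": "Alfredo"})
primary.write("insert", 2, {"name": "Sally"})
primary.write("update", 1, {"name": "Alfredo B."})
primary.write("delete", 2)

# Recovery: rebuild the table from the log alone.
recovered = replay(primary.log)

# Replication: a replica that saw the first two entries catches up from position 2.
replica = replay(primary.log[:2])
replica = replay(primary.log, from_position=2, state=replica)
```

Change data capture reuses the same idea: a third reader of the log that is neither the recovery process nor a database replica.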
Postgres (>=9.4)
46.2.1. Logical Decoding
Logical decoding is the process of extracting all persistent changes to a
database's tables into a coherent, easy to understand format which can be
interpreted without detailed knowledge of the database's internal state.
In PostgreSQL, logical decoding is implemented by decoding the contents of
the write-ahead log, which describe changes on a storage level, into an
application-specific form such as a stream of tuples or SQL statements.
Jeff Klukas / Postgres 9.4 docs
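To illustrate the "stream of tuples or SQL statements" idea, here is a toy that renders row-level changes as SQL. It starts from changes that are already decoded; the dictionary shape (`table`, `op`, `key`, `row`) is invented for this sketch and is not the format of any Postgres output plugin.

```python
def quote(value):
    """Render a Python value as a SQL literal (strings and numbers only)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_sql(change):
    """Turn one row-level change into an equivalent SQL statement."""
    table, op, key = change["table"], change["op"], change["key"]
    row = change.get("row", {})
    where = " AND ".join(f"{col} = {quote(val)}" for col, val in key.items())
    if op == "insert":
        cols = ", ".join(row)
        vals = ", ".join(quote(v) for v in row.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals});"
    if op == "update":
        sets = ", ".join(f"{col} = {quote(val)}" for col, val in row.items())
        return f"UPDATE {table} SET {sets} WHERE {where};"
    if op == "delete":
        return f"DELETE FROM {table} WHERE {where};"
    raise ValueError(f"unknown operation: {op}")


changes = [
    {"table": "person", "op": "insert", "key": {"id": 1}, "row": {"id": 1, "name": "Alfredo"}},
    {"table": "person", "op": "update", "key": {"id": 1}, "row": {"name": "O'Neil"}},
    {"table": "person", "op": "delete", "key": {"id": 1}},
]
statements = [to_sql(change) for change in changes]
```

A consumer of such a stream needs no knowledge of pages, tuples, or transaction internals, which is the property the quoted documentation describes.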
Debezium
• Red Hat
• Standardize change data capture
• Uses Kafka Connect API
• Pretty young: 0.7.4
• Based on the ‘Bottled Water’ research project
Debezium
• Postgres
• MySQL
• MongoDB
• Oracle*
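What "standardized" buys a consumer can be sketched as follows. Debezium change events carry an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) plus `before` and `after` row images; the events below are heavily simplified (real ones also carry source metadata and usually a schema, and details vary by connector and version). `apply_change` is a hypothetical helper, not part of Debezium.

```python
def apply_change(collection, event, key_field="id"):
    """Apply one Debezium-style change event to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):   # create, snapshot read, update
        row = event["after"]
        collection[row[key_field]] = row
    elif op == "d":             # delete: only the "before" image is populated
        collection.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unexpected op: {op!r}")
    return collection


# Simplified, hand-written events; the same handler works whichever
# source database produced them.
events = [
    {"op": "c", "before": None, "after": {"id": 123, "name": "Alfredo"}, "ts_ms": 1},
    {"op": "u", "before": {"id": 123, "name": "Alfredo"},
     "after": {"id": 123, "name": "Alfredo B."}, "ts_ms": 2},
    {"op": "c", "before": None, "after": {"id": 124, "name": "Sally"}, "ts_ms": 3},
    {"op": "d", "before": {"id": 124, "name": "Sally"}, "after": None, "ts_ms": 4},
]

people = {}
for event in events:
    apply_change(people, event)
```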
Core Service
Source database
Java Application Server
Kafka
topic: PERSON
{
  "id": 123,
  "name": "Alfredo",
  "dob": "1965-5-1"
}
User Backend Service
MongoDB Database
Java Application Server
topic: PERSON
{
  "id": 123,
  "name": "Alfredo",
  "dob": "1965-5-1"
}
SELECT * FROM Communication C WHERE PersonId = 1
SELECT * FROM Person WHERE PersonId=1
Example
MongoDB
{
   "_id":1,
   "Name":"Alfredo",
   "DOB":"1990-1-1",
   "Communication":{
      "Mobile":"12345",
      "Email":"alfredo@aol.com",
      "Twitter":"@alfredo"
   }
}
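The two SQL results and the MongoDB document relate roughly like this. Only `PersonId` appears in the queries; the other column names (`Name`, `DOB`, `Type`, `Value`) are assumptions made for the sketch, as is the helper `to_document`.

```python
def to_document(person_row, communication_rows):
    """Nest a person's communication rows inside a single document."""
    doc = {
        "_id": person_row["PersonId"],
        "Name": person_row["Name"],
        "DOB": person_row["DOB"],
        "Communication": {},
    }
    for row in communication_rows:
        # Keep only the rows that belong to this person.
        if row["PersonId"] == person_row["PersonId"]:
            doc["Communication"][row["Type"]] = row["Value"]
    return doc


person = {"PersonId": 1, "Name": "Alfredo", "DOB": "1990-1-1"}
communication = [
    {"PersonId": 1, "Type": "Mobile", "Value": "12345"},
    {"PersonId": 1, "Type": "Email", "Value": "alfredo@aol.com"},
    {"PersonId": 1, "Type": "Twitter", "Value": "@alfredo"},
    {"PersonId": 2, "Type": "Email", "Value": "someone@example.com"},
]
document = to_document(person, communication)
```

With the rows above, `document` matches the MongoDB example: one read fetches what previously took two queries.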
Stream Processing
SQL Record + SQL Record → Stream transformation → MongoDB
Stream Processing
SQL Record + SQL Record → Kafka Streams (RocksDB) → MongoDB
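Why the join needs local state can be sketched in a few lines. This is plain Python, not the Kafka Streams API: the dictionaries stand in for the state stores that Kafka Streams would keep in RocksDB, and `PersonJoiner`, the `COMMUNICATION` topic, and the column names are assumptions.

```python
class PersonJoiner:
    """Joins PERSON and COMMUNICATION change records into one document per person."""

    def __init__(self):
        # Plain dicts stand in for the local state stores.
        self.persons = {}        # person id -> person row
        self.communication = {}  # person id -> {type: value}

    def on_person(self, row):
        self.persons[row["PersonId"]] = row
        return self._emit(row["PersonId"])

    def on_communication(self, row):
        by_type = self.communication.setdefault(row["PersonId"], {})
        by_type[row["Type"]] = row["Value"]
        return self._emit(row["PersonId"])

    def _emit(self, person_id):
        person = self.persons.get(person_id)
        if person is None:
            return None  # nothing to write until the person row arrives
        return {
            "_id": person_id,
            "Name": person["Name"],
            "Communication": dict(self.communication.get(person_id, {})),
        }


joiner = PersonJoiner()
# Records can arrive in either order; state bridges the gap.
first = joiner.on_communication({"PersonId": 1, "Type": "Mobile", "Value": "12345"})
second = joiner.on_person({"PersonId": 1, "Name": "Alfredo"})
third = joiner.on_communication({"PersonId": 1, "Type": "Email", "Value": "alfredo@aol.com"})
```

Each non-empty result would be upserted into MongoDB. Every change on either side re-emits the whole document, so all rows seen so far must stay available locally.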
Kafka Streams at Scale
• ± 500M rows of SQL data
• ± 50 joins
• 500 topics
• 400 GB of Kafka data
• 300 GB of RocksDB data
• Building a complete replica from scratch takes many hours
• After that <100ms latency for changes
Development cycle
• Developing and testing is hard for stateful code
• Starting a new ‘generation’ is costly
• Contaminated data might show up
Conclusions
• Went into production last June
• Generally behaves well (aside from some glitches)
• Kafka Streams is in a lot better state than a year ago
Architecture
Microservice
API
Any private store
Code in whatever language
Different parts need the same data
… but in a different way
Application
UI
SQL Database
Code in some language
Analytics code
Analytics UI
Application Service
UI
SQL Database
Code in some language
Analytics UI
Analytics code
Analytics Service
Analytics Database
Analytics API
Event Driven Microservices
• Services push events instead of a request/response model
• Usually backed by a publish/subscribe bus
Application Service
SQL Database
Code in some language
Event Bus
topic: PERSON
{
  "id": 123,
  "name": "Alfredo",
  "dob": "1965-5-1"
}
Analytics Service
Analytics Database
Code in some language
topic: PERSON
{
  "id": 123,
  "name": "Alfredo",
  "dob": "1965-5-1"
}
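A toy version of the picture above, assuming an in-process bus. `EventBus`, `ApplicationService`, and `AnalyticsService` are hypothetical classes, and the birth-year question is an invented example of analytics wanting the same data in a different shape.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (a stand-in, not Kafka)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in self.subscribers[topic]:
            callback(event)


class ApplicationService:
    def __init__(self, bus):
        self.bus = bus
        self.sql_rows = {}  # stand-in for the SQL database

    def save_person(self, person):
        self.sql_rows[person["id"]] = person
        # Announce the change; no reply is expected from anyone.
        self.bus.publish("PERSON", dict(person))


class AnalyticsService:
    def __init__(self, bus):
        self.birth_year_by_id = {}  # private store, shaped for its own queries
        bus.subscribe("PERSON", self.on_person)

    def on_person(self, person):
        self.birth_year_by_id[person["id"]] = int(person["dob"].split("-")[0])

    def people_born_in(self, year):
        return sum(1 for y in self.birth_year_by_id.values() if y == year)


bus = EventBus()
analytics = AnalyticsService(bus)
app = ApplicationService(bus)
app.save_person({"id": 123, "name": "Alfredo", "dob": "1965-5-1"})
app.save_person({"id": 124, "name": "Sally", "dob": "1965-11-30"})
app.save_person({"id": 123, "name": "Alfredo B.", "dob": "1965-5-1"})  # an update
born_1965 = analytics.people_born_in(1965)
```

The application service never calls the analytics service and does not know it exists; adding another subscriber requires no change on the publishing side.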
Event Sourcing
Elasticsearch
• Add unstructured search to our application
• Reduces load on our source databases
• Users expect Google-like interfaces
Neo4j
• Graph Database
• Some analytics are much easier to express in terms of graphs
Firebase Realtime
• Real-time database
• ‘Backend As a Service’
• Essentially one big JSON document
• Very easy to use client libraries for web and mobile
• Safe to develop
Caches
• We can use our streaming engine to update / invalidate caches
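As a rough illustration of invalidation driven by change records, assuming a simple read-through cache. `ReadThroughCache` and the event shape (`{"key": ...}`) are hypothetical.

```python
class ReadThroughCache:
    def __init__(self, load):
        self.load = load   # function that fetches a value from the source database
        self.entries = {}
        self.loads = 0     # how often the source database was hit

    def get(self, key):
        if key not in self.entries:
            self.entries[key] = self.load(key)
            self.loads += 1
        return self.entries[key]

    def on_change(self, event):
        # Called for every change record; drop the entry so the next read reloads it.
        self.entries.pop(event["key"], None)


database = {1: "Alfredo"}
cache = ReadThroughCache(database.get)

before = cache.get(1)        # loaded from the database
cache.get(1)                 # served from the cache
database[1] = "Alfredo B."   # the row changes behind the cache's back
stale = cache.get(1)         # still the old value
cache.on_change({"key": 1})  # the change stream tells the cache
fresh = cache.get(1)         # reloaded
```

Without the change stream the cache would need a time-to-live and would serve stale data in between; with it, entries can live until the underlying row actually changes.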
Push to clients
• We can push data to clients in real time
MacBook
• Postgres
• Zookeeper
• Kafka
• Kafka Connect / Debezium
• Kafka Streams
• MongoDB
{
"_id" : "1001",
"Address" : [
{
"zip" : "76036",
"city" : "Euless",
"street" : "3183 Moore Avenue",
"id" : 10,
"state" : "Texas",
"customer_id" : 1001,
"type" : "SHIPPING"
}
],
"last_name" : "Thomas",
"id" : 1001,
"first_name" : "Sally",
"email" : "sally.thomas@acme.com"
}
[Diagram: Postgres, Debezium, Kafka, Kafka Streams, Kafka Connect, MongoDB]
Demo?
If all you have is a hammer…
If all you have is SQL…
you will always think relational
you will always think you need nothing else
Questions?
* Any question that is not: “Why don’t you use Postgres? Postgres can do anything”
