@nicolas_frankel
A CDC use-case: Designing an Evergreen Cache
Nicolas Fränkel
Me, myself and I
▪ Former developer, team lead, architect, blah-blah
▪ Developer Advocate
▪ Interested in CDC and data streaming
Hazelcast
Hazelcast IMDG is an operational, in-memory, distributed computing platform that manages data using in-memory storage and performs parallel execution for breakthrough application speed and scale.
Hazelcast Jet is the ultra-fast, application-embeddable, 3rd-generation stream processing engine for low-latency batch and stream processing.
GeekcampSG 2020 - A Change-Data-Capture use-case: designing an evergreen cache
Agenda
1. Why cache?
2. Alternatives to keeping the cache in sync
3. Change-Data-Capture (CDC)
4. Debezium, a CDC implementation
5. Hazelcast Jet + Debezium
6. Demo!
The caching trade-off
▪ Improved performance/availability
▪ Stale data
The initial state
1. The application
2. The RDBMS
3. The cache
Aye, there’s the rub!
▪ A new component writes to the database
▪ E.g.: a table holding references needs to be updated every now and then
How to keep the cache in sync with the DB?
Cache invalidation
“There are two hard things in computer
science:
1. Naming things
2. Cache invalidation
3. And off-by-one errors”
Cache eviction vs Time-To-Live
▪ Cache eviction: which entries to evict when the cache is full
• Least Recently Used
• Least Frequently Used
▪ TTL: how long an entry is kept in the cache
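A minimal sketch of Least Recently Used eviction, assuming plain Java and the standard `LinkedHashMap` in access order. The `LruCache` and `LruCacheDemo` names are hypothetical and only serve the illustration; a real cache product handles eviction for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library: a bounded cache that evicts
// the least recently used entry once maxEntries is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration goes from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by LinkedHashMap after each insertion
        return size() > maxEntries;
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");       // "a" becomes the most recently used entry
        cache.put("c", "3");  // the cache is full, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Eviction only answers "what goes when the cache is full"; it says nothing about whether the remaining entries still match the database.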
Choosing the “correct” TTL
▪ TTL longer than the interval between updates
• Updates are missed until the entry expires
▪ TTL shorter than the interval between updates
• Resources are wasted reloading unchanged data
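The TTL dilemma can be sketched with a small read-through cache, assuming plain Java. `TtlCache`, the injected clock, and the `loader` function (standing in for the database read) are all hypothetical names introduced for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical read-through cache with a fixed time-to-live per entry.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;     // injected so the demo controls time
    private final Function<K, V> loader;  // stands in for the database read

    TtlCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now >= entry.expiresAt) {
            // Missing or expired: reload from the backing store
            entry = new Entry<>(loader.apply(key), now + ttlMillis);
            entries.put(key, entry);
        }
        return entry.value;
    }
}

class TtlCacheDemo {
    public static void main(String[] args) {
        Map<String, String> database = new HashMap<>();
        long[] now = {0};
        database.put("currency", "EUR");
        TtlCache<String, String> cache =
                new TtlCache<>(1000, () -> now[0], database::get);

        System.out.println(cache.get("currency")); // EUR, loaded from the database
        database.put("currency", "CHF");           // another component writes
        now[0] = 500;
        System.out.println(cache.get("currency")); // EUR, stale until the TTL elapses
        now[0] = 1000;
        System.out.println(cache.get("currency")); // CHF, reloaded after expiry
    }
}
```

With a 1000 ms TTL the write stays invisible for up to a second; shrinking the TTL narrows that window at the price of more reloads of unchanged data.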
Polling process
Same issue as with the TTL: how to choose the polling frequency
Event-driven for the win!
1. If no writes happen, there's no need to update the cache
2. If a write happens, then the relevant cache item should be updated accordingly
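The two rules above can be sketched as a tiny listener setup, assuming plain Java. `ObservableStore` is a hypothetical stand-in for a database that announces its writes; it is not a real API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the database: every write is announced to listeners.
class ObservableStore {
    private final Map<String, String> rows = new HashMap<>();
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

    void onWrite(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    void write(String key, String value) {
        rows.put(key, value);
        // Push the change instead of waiting for a poll or an expiry
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(key, value);
        }
    }

    String read(String key) {
        return rows.get(key);
    }
}

class EventDrivenDemo {
    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        int[] cacheUpdates = {0};

        ObservableStore store = new ObservableStore();
        store.onWrite((key, value) -> {
            cache.put(key, value);  // only the written item is touched
            cacheUpdates[0]++;
        });

        // Rule 1: no writes so far, hence no cache work at all
        System.out.println("updates before any write: " + cacheUpdates[0]);

        // Rule 2: one write triggers exactly one update of the relevant item
        store.write("country:CH", "Switzerland");
        System.out.println(cache + " after " + cacheUpdates[0] + " update(s)");
    }
}
```

The open question, addressed in the following slides, is how to get a real database to emit such events.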
RDBMS triggers
▪ Not all RDBMS implement triggers
▪ How to call an external process from the trigger?
The example of MySQL: User-defined function
▪ Functions must be written in C++
▪ The OS must support dynamic loading
▪ Becomes part of the running server
• Bound by all constraints that apply to writing server code
▪ Etc.
-- https://dev.mysql.com/doc/refman/8.0/en/adding-udf.html
lib_mysqludf_sys
UDF library with functions to interact with the operating system
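-- DELIMITER lets the mysql client send the whole trigger body as one statement
DELIMITER $$
CREATE TRIGGER MyTrigger
AFTER INSERT ON MyTable
FOR EACH ROW
BEGIN
  DECLARE cmd CHAR(255);
  DECLARE result INT;
  -- Build the command line passed to the operating system
  SET cmd = CONCAT('update_row', '1');
  -- sys_exec() comes from lib_mysqludf_sys and returns the command's exit code
  SET result = sys_exec(cmd);
END$$
DELIMITER ;
-- https://github.com/mysqludf/lib_mysqludf_sys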
Cons
▪ Implementation-dependent
▪ Fragile
▪ Who maintains/debugs it?
▪ Resource-consuming if done frequently
Change-Data-Capture
“In databases, Change Data Capture is a set
of software design patterns used to determine
and track the data that has changed so that
action can be taken using the changed data.
CDC is an approach to data integration that is
based on the identification, capture and
delivery of the changes made to enterprise
data sources.”
-- https://en.wikipedia.org/wiki/Change_data_capture
CDC implementation options
1. Polling + Timestamps on rows
2. Polling + Version numbers on rows
3. Polling + Status indicators on rows
4. Triggers on tables
5. Log scanners
-- https://en.wikipedia.org/wiki/Change_data_capture
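Option 1, polling plus timestamps on rows, can be sketched as follows, assuming plain Java with an in-memory list standing in for the table. `TimestampedRow`, `TimestampPoller`, and the `updatedAt` column are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical row carrying a last-modified timestamp column.
class TimestampedRow {
    final int id;
    final String value;
    final long updatedAt;

    TimestampedRow(int id, String value, long updatedAt) {
        this.id = id;
        this.value = value;
        this.updatedAt = updatedAt;
    }
}

// Returns the rows changed since the previous poll.
class TimestampPoller {
    private long lastSeen = Long.MIN_VALUE;

    List<TimestampedRow> poll(List<TimestampedRow> table) {
        List<TimestampedRow> changed = new ArrayList<>();
        long newest = lastSeen;
        for (TimestampedRow row : table) {
            // Comparable to a query filtering on: updated_at > :lastSeen
            if (row.updatedAt > lastSeen) {
                changed.add(row);
                newest = Math.max(newest, row.updatedAt);
            }
        }
        lastSeen = newest;
        return changed;
    }
}

class TimestampPollerDemo {
    public static void main(String[] args) {
        List<TimestampedRow> table = new ArrayList<>(Arrays.asList(
                new TimestampedRow(1, "apple", 10),
                new TimestampedRow(2, "banana", 20)));
        TimestampPoller poller = new TimestampPoller();

        System.out.println(poller.poll(table).size()); // 2: first poll sees every row
        System.out.println(poller.poll(table).size()); // 0: nothing changed since
        table.set(1, new TimestampedRow(2, "pear", 30));
        System.out.println(poller.poll(table).size()); // 1: only the updated row
    }
}
```

This approach never sees deleted rows, since a removed row leaves no timestamp behind, and it brings back the polling-frequency question.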
“Turning the database inside out” - Martin Kleppmann
-- https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/
What is a transaction/binary/etc. log?
“The binary log contains ‘events’ that
describe database changes such as table
creation operations or changes to table
data.”
-- https://dev.mysql.com/doc/refman/8.0/en/binary-log.html
Reasons for the log
1. Data recovery
2. Replication
What if we “hacked” the log?
Sample MySQL binlog
### UPDATE `test`.`t`
### WHERE
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2='apple' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=NULL /* VARSTRING(20) meta=0 nullable=1 is_null=1 */
### SET
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2='pear' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3='2009:01:01' /* DATE meta=0 nullable=1 is_null=0 */
# at 569
#150112 21:40:14 server id 1 end_log_pos 617 CRC32 0xf134ad89
#Table_map: `test`.`t` mapped to number 251
# at 617
#150112 21:40:14 server id 1 end_log_pos 665 CRC32 0x87047106
#Delete_rows: table id 251 flags: STMT_END_F
Kind reminder…
▪ Implementation-dependent
▪ Fragile
▪ Who maintains/debugs it?
Debezium to the rescue
▪ Java-based abstraction layer for CDC
▪ Provided by Red Hat
▪ Apache v2 licensed
▪ Very skewed toward Kafka
Debezium connector plugins
▪ Production-ready
• MySQL
• PostgreSQL
• MongoDB
• SQL Server
▪ Incubating
• Oracle
• DB2 (!)
• Cassandra
Debezium
“Debezium records all row-level changes within each database table in a change event stream”
-- https://debezium.io/
Hazelcast Jet
▪ Stream Processing Engine (SPE)
▪ Distributed
▪ In-memory
▪ Embeds Hazelcast IMDG
▪ Apache v2 licensed
▪ (Hazelcast Jet Enterprise offering)
Jet overview
[Diagram: Data Source → Stream Processor → Data Sink]
Sources and sinks shown:
▪ Hazelcast IMDG: Map, Cache, List, Change Events
▪ Live Streams: Kafka, JMS, Sensors, Feeds
▪ Databases: JDBC, Relational, NoSQL, Change Events
▪ Files: HDFS, Flat Files, Logs, File watcher
▪ Applications: Sockets
Stream processor stages:
▪ Ingest: In-Memory Operational Storage
▪ Combine: Join, Enrich, Group, Aggregate
▪ Stream: Windowing, Event-Time Processing
▪ Compute: Distributed and Parallel Computations
▪ Transform: Filter, Clean, Convert
▪ Publish: In-Memory, Subscriber Notifications
Deployment modes
Embedded (each application hosts a cluster member through the Java API):
// Create new cluster member
JetInstance jet = Jet.newJetInstance();
Client/Server (applications reach a running cluster through the Client API):
// Connect to running cluster
JetInstance jet = Jet.newJetClient();
Pipeline
▪ Declarative code that defines and links sources, transforms, and sinks
▪ Platform-specific SDK
▪ Client submits pipeline to the SPE
Job
▪ Running instance of pipeline in SPE
▪ SPE executes the pipeline
• Code execution
• Data routing
• Flow control
Back to our use-case
A Jet job:
1. Watches change events in the database
2. Analyzes the change event
3. Updates the cache accordingly
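Steps 2 and 3 can be sketched without any streaming engine, assuming plain Java and a `Map` standing in for the cache. `ChangeEvent` and `CacheUpdater` are hypothetical, simplified types; the one-letter operation codes mirror the `op` values Debezium uses in its change events, but real events carry far more metadata, and a real Jet job would express this as a pipeline rather than a loop.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified change event.
class ChangeEvent {
    final String op;     // "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    final String key;    // primary key of the changed row
    final String after;  // row state after the change, null for deletes

    ChangeEvent(String op, String key, String after) {
        this.op = op;
        this.key = key;
        this.after = after;
    }
}

class CacheUpdater {
    // Analyzes one change event and updates the cache accordingly.
    static void apply(Map<String, String> cache, ChangeEvent event) {
        switch (event.op) {
            case "c":
            case "r":
            case "u":
                cache.put(event.key, event.after);
                break;
            case "d":
                cache.remove(event.key);
                break;
            default:
                throw new IllegalArgumentException("Unknown operation: " + event.op);
        }
    }
}

class CacheUpdaterDemo {
    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        List<ChangeEvent> events = Arrays.asList(
                new ChangeEvent("c", "1", "apple"),
                new ChangeEvent("u", "1", "pear"),
                new ChangeEvent("c", "2", "kiwi"),
                new ChangeEvent("d", "2", null));

        for (ChangeEvent event : events) {
            CacheUpdater.apply(cache, event);
        }
        System.out.println(cache); // prints {1=pear}
    }
}
```

Because every insert, update, and delete arrives as an event, the cache follows the database without a TTL or a polling interval to tune.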
Recap
▪ The caching trade-off
▪ Event-based architectures FTW
▪ Change-Data-Capture
• Integration through Hazelcast Jet
Thanks for your attention!
▪ https://blog.frankel.ch/
▪ @nicolas_frankel
▪ https://jet-start.sh/docs/tutorials/cdc
▪ https://bit.ly/evergreen-cache
