© 2022 Altinity, Inc.
Adventures with the ClickHouse
ReplacingMergeTree Engine
Mirroring data from OLTP databases to ClickHouse
Robert Hodges & Altinity Engineering
14 December 2022
Let’s make some introductions
ClickHouse support and services, including Altinity.Cloud. Authors of the Altinity Kubernetes Operator for ClickHouse and other open source projects.
Robert Hodges: Database geek with 30+ years on DBMS systems. Day job: Altinity CEO.
Altinity Engineering: Database geeks with centuries of experience in DBMS and applications.
How do different database types arise?
Use case: OLTP (MySQL) vs analytics (ClickHouse)
eCommerce: inventory management and purchasing vs funnel analysis and fraud detection
Digital Marketing: campaign management vs campaign evaluation
Software Defined Networking: network topology and micro-segment definition vs access patterns analysis
OLTP vs OLAP – Key difference is storage organization
PostgreSQL, MySQL: read all columns in row; rows minimally or not compressed
ClickHouse: read only selected columns; columns highly compressed
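To make the storage difference concrete, here is a minimal Python sketch. It is a toy illustration only, not how either database is implemented; the sample rows and the `to_columns` helper are made up for this example. Summing one field from the row layout visits every whole row, while the column layout reads a single list.

```python
# Illustrative only: toy row-oriented vs column-oriented layouts.
rows = [
    {"film_id": 1, "title": "A", "length": 117},
    {"film_id": 2, "title": "B", "length": 120},
    {"film_id": 3, "title": "C", "length": 95},
]

def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: every whole row is visited to read one field.
total_from_rows = sum(row["length"] for row in rows)
# Column layout: only the selected column is read.
total_from_columns = sum(columns["length"])
```

Both totals are the same; the difference is how much data has to be touched to compute them.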
Some data just needs a copy in a column store
WordPress dynamic site data
eCommerce transactions
Ad bidding transactions
Online auction transactions
Chat messages
Mobile provisioning requests
Credit card transactions
Financial market transactions
Mirroring copies data and keeps it up-to-date
Initial dump/load: MySQL (OLTP app) to ClickHouse (analytic app)
Real-time replication: MySQL binlog to ClickHouse
So… What’s the problem?
Mutating rows
Basics of ReplacingMergeTree
Let’s consider a source table in MySQL
CREATE TABLE `film` (
`film_id` smallint unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(128) NOT NULL,
. . .
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`film_id`),
KEY `idx_title` (`title`),
. . .
) ENGINE=InnoDB
Callout on the PRIMARY KEY line: primary key
Create a ClickHouse table to contain the mirrored data
CREATE TABLE sakila.film (
  `film_id` UInt16,
  `title` String,
  `description` Nullable(String),
  `release_year` Nullable(String),
  . . .
  `last_update` DateTime,
  `_version` UInt64 DEFAULT 0,
  `_sign` Int8 DEFAULT 1
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY (language_id, studio_id, film_id)
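How ReplacingMergeTree works
Each source operation becomes one or more inserted rows (language_id, studio_id, and the other data columns are elided):
┌─operation─┬─film_id─┬─(other data columns)─┬─_version─┬─_sign─┐
│ INSERT    │    1001 │ . . .                │        0 │    +1 │
│ UPDATE    │    1001 │ . . .                │        3 │    -1 │
│ UPDATE    │    1001 │ . . .                │        3 │    +1 │
│ DELETE    │    1001 │ . . .                │        5 │    -1 │
└───────────┴─────────┴──────────────────────┴──────────┴───────┘
Eventually consistent replacement of rows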
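The replacement rule can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ClickHouse code: rows are hypothetical `(sort_key, title, _version, _sign)` tuples, the `studio_id` value in the key is invented, and `replacing_merge` is a made-up helper. It keeps one row per sorting key, the one with the highest `_version`, and on a version tie the row inserted last.

```python
# Toy model of how ReplacingMergeTree(_version) collapses rows during a merge.
# Rows are (sort_key, title, _version, _sign) tuples; this layout is hypothetical.

def replacing_merge(rows):
    """Keep one row per sorting key: highest _version, last inserted on ties."""
    survivors = {}
    for row in rows:  # rows are visited in insertion order
        key, version = row[0], row[2]
        current = survivors.get(key)
        if current is None or version >= current[2]:
            survivors[key] = row
    return list(survivors.values())

# sort_key stands for (language_id, studio_id, film_id)
history = [
    ((1, 1, 1001), "Blade Runner", 0, 1),                   # INSERT
    ((1, 1, 1001), "Blade Runner", 3, -1),                  # UPDATE: old image
    ((1, 1, 1001), "Blade Runner - Director's Cut", 3, 1),  # UPDATE: new image
]

merged = replacing_merge(history)
```

The tie-break is why the update writes the `-1` row before the `+1` row at the same version: the later row is the one that survives.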
Adding a row to RMT table
INSERT INTO sakila.film VALUES
(1001,'Blade Runner','Best. Sci-fi. Film. Ever.',
'1982',1,NULL,6,'0.99',117,'20.99','PG',
'Deleted Scenes,Behind the Scenes',now()
,0,1)
SELECT title, release_year
FROM sakila.film WHERE film_id = 1001
┌─title────────┬─release_year─┐
│ Blade Runner │ 1982         │
└──────────────┴──────────────┘
Updating a row in the RMT table
INSERT INTO sakila.film VALUES
(1001,'Blade Runner','Best. Sci-fi. Film. Ever.',...,3,-1),
(1001,'Blade Runner - Director''s Cut','Best. Sci-fi. Film. Ever.',...,3,1)
SELECT title, release_year
FROM sakila.film WHERE film_id = 1001
┌─title─────────────────────────┬─release_year─┐
│ Blade Runner - Director's Cut │ 1982         │
└───────────────────────────────┴──────────────┘
┌─title────────┬─release_year─┐
│ Blade Runner │ 1982         │
└──────────────┴──────────────┘
Callout: Unmerged rows!
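Rows are replaced when merges occur
Rows held in the unmerged parts (_version, film_id, . . ., _sign):
0, 1001, 1, . . ., +1
3, 1001, 1, . . ., -1
3, 1001, 2, +1
Rows in the merged part (the _version 0 row is replaced, marked X):
3, 1001, 1, -1
3, 1001, 2, +1
Pro tip: never assume rows will merge fully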
FINAL keyword merges data dynamically
SELECT title, release_year
FROM sakila.film FINAL
WHERE film_id = 1001
┌─title─────────────────────────┬─release_year─┐
│ Blade Runner - Director's Cut │ 1982         │
└───────────────────────────────┴──────────────┘
Callout: Adds initial scan to merge rows
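The difference between a plain read and a read with FINAL can be modeled as follows. This is a sketch under assumptions, not ClickHouse internals: a table is a list of parts, rows are hypothetical `(film_id, title, _version, _sign)` tuples, and `collapse`, `select_plain`, and `select_final` are invented names. The point is that FINAL collapses rows while reading and leaves the stored parts as they are.

```python
# Toy model: a table is a list of parts, each part a list of rows.
# Rows are (film_id, title, _version, _sign); the layout is hypothetical.

def collapse(rows):
    """Keep the highest-_version row per film_id (last one wins on ties)."""
    best = {}
    for row in rows:
        if row[0] not in best or row[2] >= best[row[0]][2]:
            best[row[0]] = row
    return list(best.values())

def select_plain(parts):
    """A read without FINAL returns whatever rows the parts currently hold."""
    return [row for part in parts for row in part]

def select_final(parts):
    """A read with FINAL collapses rows at query time; parts are untouched."""
    return collapse(select_plain(parts))

parts = [
    [(1001, "Blade Runner", 0, 1)],                   # part from the INSERT
    [(1001, "Blade Runner - Director's Cut", 3, 1)],  # part from the UPDATE
]

plain_titles = [row[1] for row in select_plain(parts)]
final_titles = [row[1] for row in select_final(parts)]
```

Until a background merge combines the two parts, only the FINAL read hides the stale title, and it pays for that with extra work on every query.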
Deleting a row in RMT table
INSERT INTO sakila.film VALUES
(1001,'Blade Runner - Director''s Cut',
'Best. Sci-fi. Film. Ever.',...,5,-1)
SELECT title, release_year, _version, _sign
FROM sakila.film FINAL
WHERE film_id = 1001
┌─title─────────────────────────┬─release_year─┬─_version─┬─_sign─┐
│ Blade Runner - Director's Cut │ 1982         │        5 │    -1 │
└───────────────────────────────┴──────────────┴──────────┴───────┘
Callout: Deleted row!
Row policies prevent deleted rows from showing up
CREATE ROW POLICY
sakila_film_rp ON sakila.film
FOR SELECT USING _sign != -1 TO ALL
SELECT title, release_year, _version, _sign
FROM sakila.film FINAL
WHERE film_id = 1001
Ok.
0 rows in set. Elapsed: 0.005 sec.
Callout: Predicate automatically added to queries
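The order of the two steps matters, and a small Python model shows why. It assumes the `_sign` predicate is evaluated after FINAL has collapsed the rows, which is what the empty result above suggests; it models the outcome only and makes no claim about how ClickHouse schedules the filter internally. Rows are hypothetical `(film_id, title, _version, _sign)` tuples and all function names are invented.

```python
# Toy model of hiding deleted rows. Rows are (film_id, title, _version, _sign).
# Assumption: the _sign filter runs after rows are collapsed.

def collapse(rows):
    """Keep the highest-_version row per film_id (last one wins on ties)."""
    best = {}
    for row in rows:
        if row[0] not in best or row[2] >= best[row[0]][2]:
            best[row[0]] = row
    return list(best.values())

def visible_rows(rows):
    """Collapse to the latest version per key, then drop tombstones."""
    return [row for row in collapse(rows) if row[3] != -1]

def filter_then_collapse(rows):
    """The wrong order: dropping tombstones first resurrects the old row."""
    return collapse([row for row in rows if row[3] != -1])

rows = [
    (1001, "Blade Runner - Director's Cut", 3, 1),
    (1001, "Blade Runner - Director's Cut", 5, -1),  # DELETE tombstone
]

hidden = visible_rows(rows)
resurrected = filter_then_collapse(rows)
```

`hidden` is empty, matching the query result above, while `resurrected` still contains the version 3 row. A filter on `_sign` without FINAL behaves like the second function.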
JOINs are tricky with RMT
SELECT inventory_id, f.film_id, title
FROM sakila.inventory AS i
INNER JOIN sakila.film AS f ON i.film_id = f.film_id
WHERE f.film_id = 1001
┌─inventory_id─┬─film_id─┬─title─────────────────────────┐
│            1 │    1001 │ Blade Runner - Director's Cut │
│            1 │    1001 │ Blade Runner                  │
└──────────────┴─────────┴───────────────────────────────┘
Callout: Right side table does not have FINAL!
Use a subquery with FINAL for right hand side table
SELECT inventory_id, f.film_id, title
FROM sakila.inventory AS i
INNER JOIN (
  SELECT film_id, title FROM sakila.film FINAL
)
AS f ON i.film_id = f.film_id
WHERE f.film_id = 1001
┌─inventory_id─┬─film_id─┬─title─────────────────────────┐
│            1 │    1001 │ Blade Runner - Director's Cut │
└──────────────┴─────────┴───────────────────────────────┘
Callout: Not elegant, but we can work with it
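Why the join returns duplicates can be seen in a small Python model. It is a sketch with invented names and data layouts: inventory rows are `(inventory_id, film_id)`, film rows are `(film_id, title, _version, _sign)`, and `inner_join` is a plain nested-loop join. Joining against the uncollapsed film rows matches every stored version; collapsing the right side first matches only the latest one.

```python
# Toy model of joining against a table that still holds unmerged row versions.
# inventory rows are (inventory_id, film_id); film rows are
# (film_id, title, _version, _sign). Both layouts are hypothetical.

def collapse(films):
    """Keep the highest-_version row per film_id (last one wins on ties)."""
    best = {}
    for film in films:
        if film[0] not in best or film[2] >= best[film[0]][2]:
            best[film[0]] = film
    return list(best.values())

def inner_join(inventory, films):
    """Nested-loop inner join on film_id."""
    return [
        (inv_id, film_id, film[1])
        for inv_id, film_id in inventory
        for film in films
        if film[0] == film_id
    ]

inventory = [(1, 1001)]
films = [
    (1001, "Blade Runner", 0, 1),
    (1001, "Blade Runner - Director's Cut", 3, 1),
]

without_final = inner_join(inventory, films)         # joins both versions
with_final = inner_join(inventory, collapse(films))  # joins the latest only
```

This mirrors the subquery pattern above: the collapse has to happen on the right-hand table before the join sees it.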
Performance tips
ORDER BY is critical for MergeTree performance
CREATE TABLE sakila.film (
  `film_id` UInt16,
  `title` String,
  . . .
  `_version` UInt64 DEFAULT 0,
  `_sign` Int8 DEFAULT 1
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY (language_id, studio_id, film_id)
Callouts: Row key goes on right. Other cols go to left.
Pro tip: Use PRIMARY KEY to prefix a long ORDER BY
Use care on updates when ORDER BY has > 1 column
INSERT INTO sakila.film VALUES
(1001,'Blade Runner','Best. Sci-fi. Film. Ever.',
'1982',1,NULL,6,'0.99',117,'20.99','PG',
'Deleted Scenes,Behind the Scenes',now()
,3,-1),
(1001,'Blade Runner - Director''s Cut',
'Best. Sci-fi. Film. Ever.',
'1982',2,NULL,6,'0.99',120,'20.99','PG',
'Deleted Scenes,Behind the Scenes',now()
,3,1)
Callout: Must delete row if ORDER BY columns change!
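The reason for the explicit delete can be shown with a toy Python model. The assumptions are loud ones: rows are hypothetical `(language_id, film_id, title, _version, _sign)` tuples, the sorting key is shortened to `(language_id, film_id)` for brevity, and the helper names are invented. Rows only replace each other when their sorting keys are equal, so an update that changes `language_id` creates a second key and the old row needs its own `-1` row.

```python
# Toy model: rows are (language_id, film_id, title, _version, _sign) and the
# sorting key is (language_id, film_id). Layout and key are hypothetical.

SORT_KEY = slice(0, 2)

def collapse(rows):
    """Keep the highest-_version row per sorting key (last wins on ties)."""
    best = {}
    for row in rows:
        key = row[SORT_KEY]
        if key not in best or row[3] >= best[key][3]:
            best[key] = row
    return list(best.values())

def live_rows(rows):
    """Collapse, then hide rows marked deleted (_sign = -1)."""
    return [row for row in collapse(rows) if row[4] != -1]

original = (1, 1001, "Blade Runner", 0, 1)
tombstone = (1, 1001, "Blade Runner", 3, -1)                    # old key, deleted
replacement = (2, 1001, "Blade Runner - Director's Cut", 3, 1)  # new key

without_delete = live_rows([original, replacement])
with_delete = live_rows([original, tombstone, replacement])
```

Without the tombstone, both the old and the new row stay visible because they never meet under the same key.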
Partitioning is important for large tables
CREATE TABLE sakila.film (
  `film_id` UInt16,
  `title` String,
  . . .
  `_version` UInt64 DEFAULT 0,
  `_sign` Int8 DEFAULT 1
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(film_id, 10000000)
ORDER BY (language_id, studio_id, film_id)
Callout: Choose a partition key that keeps row changes local to single partitions
Restrict FINAL merge scope to single partitions
SELECT release_year, count()
FROM sakila.film FINAL
GROUP BY release_year
ORDER BY release_year
SETTINGS
do_not_merge_across_partitions_select_final = 1
Callout: Run faster; parallelize across partitions
Note: Run ClickHouse 22.10+ to use this feature*
* https://github.com/ClickHouse/ClickHouse/issues/43296
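Collapsing rows one partition at a time only gives the right answer when every version of a row lands in the same partition. The Python sketch below models that condition; it is an illustration with invented names, hypothetical `(film_id, release_year, _version)` rows, and made-up partition functions, not a description of ClickHouse internals.

```python
# Toy model: rows are (film_id, release_year, _version); layout is hypothetical.
from collections import defaultdict

def collapse(rows):
    """Keep the highest-_version row per film_id (last one wins on ties)."""
    best = {}
    for row in rows:
        if row[0] not in best or row[2] >= best[row[0]][2]:
            best[row[0]] = row
    return list(best.values())

def final_global(rows):
    """Collapse across the whole table."""
    return sorted(collapse(rows))

def final_per_partition(rows, partition_of):
    """Collapse each partition independently, then concatenate."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_of(row)].append(row)
    result = []
    for part_rows in partitions.values():
        result.extend(collapse(part_rows))
    return sorted(result)

rows = [
    (1001, 1982, 0),
    (1001, 1992, 3),  # update changes release_year
    (2002, 1979, 0),
]

# Partition derived from the row key: all versions of a row stay together.
by_key = final_per_partition(rows, lambda row: row[0] // 1000)
# Partition derived from a mutable column: versions can be split apart.
by_year = final_per_partition(rows, lambda row: row[1])
```

`by_key` equals `final_global(rows)`, while `by_year` keeps the stale 1982 row because the two versions never meet in one partition.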
Current work to improve ReplacingMergeTree
Just update from my side: the param min_age_to_force_merge_on_partition_only works
Alexandr Dubovikov: SETTINGS min_age_to_force_merge_seconds = 120, min_age_to_force_merge_on_partition_only = true;
Alexandr Dubovikov: it will merge all parts at once
Add implicit FINAL at query level (Altinity)
# https://github.com/ClickHouse/ClickHouse/pull/40945
SET force_select_final = 1
SELECT inventory_id, f.film_id, title
FROM sakila.inventory AS i
INNER JOIN sakila.film AS f
ON i.film_id = f.film_id
WHERE f.film_id = 1001
┌─inventory_id─┬─film_id─┬─title─────────────────────────┐
│            1 │    1001 │ Blade Runner - Director's Cut │
└──────────────┴─────────┴───────────────────────────────┘
Callout: Add FINAL automatically to table
Eliminate deleted rows automatically (ContentSquare)
# https://github.com/ClickHouse/ClickHouse/pull/41005
CREATE TABLE sakila.film (
  `film_id` UInt16,
  . . .
  `_version` UInt64 DEFAULT 0,
  `_sign` UInt8 DEFAULT 1
)
ENGINE = ReplacingMergeTree(_version, _sign)
ORDER BY (language_id, studio_id, film_id)
Callout: Delete column processed by RMT engine
Wrap up
Fully wired, continuous replication based on RMT
Initial dump/load: MySQL (OLTP app) to ClickHouse (analytic app)
Continuous replication: MySQL binlog -> Debezium -> Kafka* event stream -> Altinity Sink Connector -> ClickHouse
Table engine(s) in ClickHouse: ReplacingMergeTree
*Including Pulsar and RedPanda
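The row pattern used throughout the deck (one `+1` row per insert, a `-1`/`+1` pair per update, one `-1` row per delete) can be sketched as a translation step. This is not the Altinity Sink Connector's code, and the event shape is a simplified, hypothetical one with `op`, `before`, and `after` fields; `to_rmt_rows` and the caller-supplied `version` are assumptions made for illustration.

```python
# Hypothetical, simplified change events: {"op": ..., "before": ..., "after": ...}.
# Real connector payloads carry more fields; this only mirrors the row pattern
# shown earlier in the deck.

def to_rmt_rows(event, version):
    """Translate one change event into rows tagged with _version and _sign."""
    op = event["op"]
    if op == "insert":
        return [{**event["after"], "_version": version, "_sign": 1}]
    if op == "update":
        return [
            {**event["before"], "_version": version, "_sign": -1},  # old image
            {**event["after"], "_version": version, "_sign": 1},    # new image
        ]
    if op == "delete":
        return [{**event["before"], "_version": version, "_sign": -1}]
    raise ValueError(f"unknown op: {op!r}")

old = {"film_id": 1001, "title": "Blade Runner"}
new = {"film_id": 1001, "title": "Blade Runner - Director's Cut"}

inserted = to_rmt_rows({"op": "insert", "before": None, "after": old}, 0)
updated = to_rmt_rows({"op": "update", "before": old, "after": new}, 3)
deleted = to_rmt_rows({"op": "delete", "before": new, "after": None}, 5)
```

The version only needs to increase with each change to a row; the values 0, 3, and 5 follow the examples earlier in the deck.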
Where is the documentation?
ClickHouse official docs – https://clickhouse.com/docs/
Altinity Blog – https://altinity.com/blog/
Altinity Sink Connector for ClickHouse – https://github.com/Altinity/clickhouse-sink-connector
Altinity Knowledge Base – https://kb.altinity.com/
Thank you!
Questions?
rhodges at altinity dot com
https://altinity.com
