© 2022 Altinity, Inc.
Deep Dive on ClickHouse Sharding and Replication
Robert Hodges and Altinity Engineering
22 September 2022
Let’s make some introductions
Us: Database geeks with centuries of experience in DBMS and applications
● ClickHouse support and services including Altinity.Cloud
● Authors of Altinity Kubernetes Operator for ClickHouse and other open source projects
You: Application developers looking to learn about ClickHouse
What’s a ClickHouse?
ClickHouse is a SQL Data Warehouse
● Understands SQL
● Runs on bare metal to cloud
● Shared nothing architecture
● Stores data in columns
● Parallel and vectorized execution
● Scales to many petabytes
● Is open source (Apache 2.0)
It’s the core engine for real-time analytics
[Diagram: event streams, ELT, and object storage feed ClickHouse, which serves interactive graphics, dashboards, and APIs.]
Distributed data is deeper than it looks
[Photo: “The Bolton Strid”. Width: 2 meters. Depth: 60 meters.]
Introducing sharding and replication
ClickHouse nodes can scale vertically
[Diagram: a single host with CPU, RAM, and network-attached storage.]
Clusters introduce horizontal scaling
[Diagram: four hosts arranged into shards and replicas.]
● Replicas improve read IOPS and concurrency
● Shards add write IOPS
Different sharding and replication patterns
All Sharded: Shard 1, Shard 2, Shard 3, Shard 4
Data sharded 4 ways without replication
All Replicated: Replica 1, Replica 2, Replica 3, Replica 4
Data replicated 4 times without sharding
Sharded and Replicated: Shard 1 Replica 1, Shard 1 Replica 2, Shard 2 Replica 1, Shard 2 Replica 2
Data sharded 2 ways and replicated 2 times
MergeTree tables support replication
Engine / replicated equivalent
Source data:
● MergeTree / ReplicatedMergeTree
Aggregated data; single row per group:
● SummingMergeTree / ReplicatedSummingMergeTree
● AggregatingMergeTree / ReplicatedAggregatingMergeTree
Evolving data:
● CollapsingMergeTree / ReplicatedCollapsingMergeTree
● VersionedCollapsingMergeTree / ReplicatedVersionedCollapsingMergeTree
● ReplacingMergeTree / ReplicatedReplacingMergeTree
How replication works
[Diagram: an INSERT lands on ClickHouse Node 1 (table ontime, ReplicatedMergeTree, stored as parts) and is replicated to ClickHouse Node 2, which holds the same table. Each node shows ports :9009 and :9443. Both nodes coordinate through three ZooKeeper nodes (zookeeper-1, zookeeper-2, zookeeper-3), each holding ZNodes on port :2181.]
What is replicated?
Replicated statements (Replicated*MergeTree ONLY):
● INSERT
● ALTER TABLE (exceptions: FREEZE, MOVE TO DISK, FETCH)
● OPTIMIZE
● TRUNCATE
Non-replicated statements:
● CREATE table
● DROP table
● RENAME table
● DETACH table
● ATTACH table
Building distributed schema
Example of a distributed data set with shards and replicas
[Diagram: four hosts, clickhouse-0 through clickhouse-3, each holding the same three tables:
● ontime: distributed table (no data)
● ontime_local: sharded, replicated table (partial data)
● airports: fully replicated table (all data)]
Step 1: A sharded, replicated fact table
CREATE TABLE IF NOT EXISTS `ontime_local` (
    `Year` UInt16 CODEC(DoubleDelta, ZSTD(1)),
    `Quarter` UInt8,
    `Month` UInt8,
    `DayofMonth` UInt8,
    `DayOfWeek` UInt8, ...
) Engine=ReplicatedMergeTree(
    '/clickhouse/{cluster}/tables/{shard}/{database}/ontime_local',
    '{replica}')
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (FlightDate, `Year`, `Month`, DepDel15)
Replication is at the table level! Use a Replicated% Engine
Step 2: A distributed table to find data
CREATE TABLE IF NOT EXISTS ontime
AS ontime_local
ENGINE = Distributed(
    '{cluster}',        -- Cluster layout
    currentDatabase(),  -- Database
    ontime_local,       -- Table
    rand())             -- Sharding key (optional)
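How a sharding key value ends up on a particular shard can be sketched in Python. This is only an illustration, assuming the documented rule that the sharding expression is divided by the total shard weight and the remainder selects a shard by range. The helper name `pick_shard` and the weights are made up for the example and are not ClickHouse APIs.

```python
from bisect import bisect_right
from itertools import accumulate
import random

def pick_shard(key_value: int, weights: list[int]) -> int:
    """Return the 0-based index of the shard that receives a row (hypothetical helper)."""
    bounds = list(accumulate(weights))      # upper bound of each shard's range
    remainder = key_value % bounds[-1]      # remainder of the key by total weight
    return bisect_right(bounds, remainder)  # first range that contains the remainder

# Four equally weighted shards and a random key, similar to rand() on the slide.
rng = random.Random(42)
counts = [0, 0, 0, 0]
for _ in range(10_000):
    counts[pick_shard(rng.getrandbits(32), [1, 1, 1, 1])] += 1
print(counts)  # roughly 2,500 rows per shard
```

With weights `[9, 10]`, remainders 0 to 8 go to the first shard and 9 to 18 go to the second, so `pick_shard(9, [9, 10])` returns `1`.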
Step 3: A fully replicated dimension table
CREATE TABLE IF NOT EXISTS airports
AS default.dot_airports
Engine=ReplicatedMergeTree(
    -- {database} resolves to current database
    '/clickhouse/{cluster}/tables/all/{database}/airports',
    '{replica}')
-- Don’t bother with partitions for small tables
PARTITION BY tuple()
PRIMARY KEY AirportID
ORDER BY AirportID
Macros help CREATE TABLE ON CLUSTER
/etc/clickhouse-server/config.d/macros.xml:
<clickhouse>
    <macros>
        <all-sharded-shard>2</all-sharded-shard>
        <cluster>demo</cluster>
        <shard>0</shard>
        <replica>clickhouse-0-1</replica>
    </macros>
</clickhouse>
select * from system.macros
Replica names should be unique per host
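The effect of macros on the ReplicatedMergeTree path can be sketched in Python. This is a toy substitution, not the server's implementation: `expand_macros` is a hypothetical helper, the second replica name is invented, and `database` is assumed to be `default`.

```python
def expand_macros(template: str, macros: dict[str, str]) -> str:
    """Replace each {name} placeholder with its value (hypothetical helper)."""
    result = template
    for name, value in macros.items():
        result = result.replace("{" + name + "}", value)
    return result

ZK_PATH = "/clickhouse/{cluster}/tables/{shard}/{database}/ontime_local"

# Values from the macros.xml on the slide; the database name is an assumption.
host_a = {"cluster": "demo", "shard": "0", "replica": "clickhouse-0-1", "database": "default"}
# An assumed second replica of the same shard (the name is illustrative).
host_b = {"cluster": "demo", "shard": "0", "replica": "clickhouse-0-2", "database": "default"}

path_a = expand_macros(ZK_PATH, host_a)
path_b = expand_macros(ZK_PATH, host_b)
print(path_a)            # /clickhouse/demo/tables/0/default/ontime_local
print(path_a == path_b)  # True
```

Replicas of one shard resolve to the same path and differ only in the `{replica}` value, which is why the same CREATE TABLE text can be run on every host.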
What does ON CLUSTER do?
ON CLUSTER executes a command over a set of nodes
CREATE TABLE IF NOT EXISTS `ontime_local` ON CLUSTER `{cluster}` ...
DROP TABLE IF EXISTS `ontime_local` ON CLUSTER `{cluster}` ...
ALTER TABLE `ontime_local` ON CLUSTER `{cluster}` ...
How does ON CLUSTER know where to go?
/etc/clickhouse-server/config.d/remote_servers.xml:
<clickhouse>
    <remote_servers>
        <!-- Cluster name: “It’s a cluster because I said so!” -->
        <demo>
            <!-- Shared secret -->
            <!-- <secret>top secret</secret> -->
            <shard>
                <replica><host>10.0.0.71</host><port>9000</port></replica>
                <replica><host>10.0.0.72</host><port>9000</port></replica>
                <internal_replication>true</internal_replication>
            </shard>
            <shard>
                . . .
            </shard>
        </demo>
    </remote_servers>
</clickhouse>
List layouts using system.clusters
-- Find name and hosts in each layout
SELECT
cluster,
groupArray(concat(host_name,':',toString(port))) AS hosts
FROM system.clusters
GROUP BY cluster ORDER BY cluster
Loading and querying data
Data loading: Distributed vs. local INSERTs
[Diagram: two ways for a data pipeline to load four hosts that each hold ontime and ontime_local.
● Insert via distributed table: the pipeline writes to ontime on one host, which forwards rows to the ontime_local tables. May require more resources (queue).
● Insert directly to shards: the pipeline writes to each ontime_local itself. Applications may have to be more intelligent.]
INSERT into a distributed vs. local table
-- Insert into distributed table
INSERT INTO ontime VALUES
(2017,1,1,1,7,'2017-01-01','AA',19805,...),
(2017,1,1,1,7,'2017-01-01','AA',19805,...),
...
-- Insert into a local table
INSERT INTO ontime_local VALUES
(2017,1,1,1,7,'2017-01-01','AA',19805,...),
(2017,1,1,1,7,'2017-01-01','AA',19805,...),
...
How does a distributed INSERT work?
[Diagram: the data pipeline inserts via the distributed table ontime on one node. Rows wait in a queue, and a thread pool sends them to the ontime_local table on each shard; replication then copies them to the other replicas.]
insert_distributed_sync:
● 0 = async propagation
● 1 = sync propagation
select * from system.distribution_queue
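The difference between async and sync propagation can be sketched with a toy model in Python. The class below is hypothetical and keeps everything in memory; it only mimics the idea that async inserts return while rows are still queued, and that sync inserts wait for delivery to the local tables.

```python
from collections import deque

class DistributedTableSketch:
    """Toy model of a distributed table in front of per-shard local tables."""

    def __init__(self, n_shards: int):
        self.local_tables = [[] for _ in range(n_shards)]
        self.queue = deque()  # stands in for the pending-send queue

    def insert(self, rows, sync: bool = False):
        # Each row is (sharding_key, payload); the key picks the target shard.
        for key, payload in rows:
            self.queue.append((key % len(self.local_tables), payload))
        if sync:
            self.flush()  # like insert_distributed_sync = 1: wait for delivery

    def flush(self):
        # In the async case a background thread pool would do this later.
        while self.queue:
            shard, payload = self.queue.popleft()
            self.local_tables[shard].append(payload)

table = DistributedTableSketch(n_shards=2)
table.insert([(0, "a"), (1, "b"), (2, "c")])    # async: returns before delivery
pending_after_async = len(table.queue)          # rows still queued
table.flush()
print(pending_after_async, table.local_tables)  # 3 [['a', 'c'], ['b']]
```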
Options for processing INSERTs
● Local vs distributed data insertion
○ INSERT to local table – no need to sync, larger blocks, faster
○ INSERT to Distributed table – sharding by ClickHouse
○ CHProxy – distributes transactions across nodes, only works with HTTP connections
● Asynchronous (default) vs synchronous insertions
○ insert_distributed_sync - Wait until batches make it to local tables
○ insert_quorum, select_sequential_consistency – Wait until replicas sync
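When inserting to local tables, the application has to do the sharding itself. A minimal Python sketch of that routing step follows, assuming rows are sharded by carrier. `shard_for` and `batch_by_shard` are hypothetical helpers, and `zlib.crc32` is only a deterministic stand-in for a ClickHouse hash function.

```python
import zlib
from collections import defaultdict

def shard_for(carrier: str, n_shards: int) -> int:
    # crc32 is a deterministic stand-in for a ClickHouse hash such as cityHash64.
    return zlib.crc32(carrier.encode("utf-8")) % n_shards

def batch_by_shard(rows: list[dict], n_shards: int) -> dict[int, list[dict]]:
    """Group rows into one batch per shard so each INSERT targets one local table."""
    batches = defaultdict(list)
    for row in rows:
        batches[shard_for(row["Carrier"], n_shards)].append(row)
    return dict(batches)

rows = [
    {"Carrier": "AA", "DepDelay": 12},
    {"Carrier": "DL", "DepDelay": 3},
    {"Carrier": "AA", "DepDelay": 40},
]
batches = batch_by_shard(rows, n_shards=2)
for shard, batch in sorted(batches.items()):
    # A real pipeline would send each batch as one INSERT INTO ontime_local
    # on a replica of that shard.
    print(shard, [r["Carrier"] for r in batch])
```

Grouping rows before sending keeps blocks large, which is the main reason local inserts tend to be faster.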
How do distributed SELECTs work?
[Diagram: the application queries the distributed table ontime on the initiator node, which sends the query to the ontime_local table on each shard.
● Innermost subselect is distributed
● AggregateState computed locally
● Aggregates merged on initiator node]
Queries are pushed to all shards
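-- Query on the distributed table, received by the initiator
SELECT Carrier, avg(DepDelay) AS Delay
FROM ontime
GROUP BY Carrier ORDER BY Delay DESC
-- Query pushed to each shard, against the local table
SELECT Carrier, avg(DepDelay) AS Delay
FROM ontime_local
GROUP BY Carrier ORDER BY Delay DESC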
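Why shards return aggregate states rather than finished averages can be shown with a short Python sketch. It is a simplified model in which the partial state for avg is a (sum, count) pair per carrier; the function names and sample rows are invented for illustration.

```python
from collections import defaultdict

def local_state(rows):
    """Per-shard step: partial avg state [sum, count] for each carrier."""
    state = defaultdict(lambda: [0.0, 0])
    for carrier, dep_delay in rows:
        state[carrier][0] += dep_delay
        state[carrier][1] += 1
    return state

def merge_states(states):
    """Initiator step: merge partial states, finish avg, then ORDER BY Delay DESC."""
    merged = defaultdict(lambda: [0.0, 0])
    for state in states:
        for carrier, (total, count) in state.items():
            merged[carrier][0] += total
            merged[carrier][1] += count
    result = {c: total / count for c, (total, count) in merged.items()}
    return sorted(result.items(), key=lambda kv: kv[1], reverse=True)

shard_1 = [("AA", 10.0), ("AA", 20.0), ("DL", 5.0)]
shard_2 = [("AA", 60.0), ("DL", 15.0)]
result = merge_states([local_state(shard_1), local_state(shard_2)])
print(result)  # [('AA', 30.0), ('DL', 10.0)]
```

Averaging the two per-shard averages for AA would give 37.5 instead of 30.0, which is why the merge has to work on states.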
ClickHouse pushes down JOINs by default
-- Query on the distributed table
SELECT o.Dest d, a.Name n, count(*) c, avg(o.ArrDelayMinutes) ad
FROM default.ontime o
JOIN default.airports a ON (a.IATA = o.Dest)
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC
LIMIT 10
-- Query pushed to each shard, with the JOIN included
SELECT Dest AS d, Name AS n, count() AS c, avg(ArrDelayMinutes) AS ad
FROM default.ontime_local AS o
ALL INNER JOIN default.airports AS a ON a.IATA = o.Dest
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10
...Unless the left side “table” is a subquery
SELECT d, Name n, c AS flights, ad
FROM
(
    -- This subquery runs on the remote servers
    SELECT Dest d, count(*) c, avg(ArrDelayMinutes) ad
    FROM default.ontime
    GROUP BY d HAVING c > 100000
    ORDER BY ad DESC
) AS o
LEFT JOIN airports ON airports.IATA = o.d
LIMIT 10
It’s more complex when multiple tables are distributed
select foo from T1 where a in (select a from T2)
distributed_product_mode=?
local (Subquery runs on local table):
    select foo
    from T1_local
    where a in (
        select a
        from T2_local)
allow (Subquery runs on distributed table):
    select foo
    from T1_local
    where a in (
        select a
        from T2)
global (Subquery runs on initiator; broadcast to local temp table):
    create temporary table tmp Engine = Set
        AS select a from T2;
    select foo from T1_local where a in tmp;
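The difference between the local and global modes can be sketched in Python with toy data. This is an illustration under assumed data, where matching values of `a` deliberately sit on different shards; the function names are invented.

```python
# Assumed toy data: each shard holds its own slice of T1 and T2.
t1 = {1: [("foo1", 1), ("foo2", 2)], 2: [("foo3", 3), ("foo4", 4)]}  # rows of (foo, a)
t2 = {1: [3], 2: [1]}                                                # values of a

def run_local(t1, t2):
    """local mode: each shard checks T1_local against its own T2_local only."""
    out = []
    for shard in t1:
        local_set = set(t2[shard])
        out += [foo for foo, a in t1[shard] if a in local_set]
    return sorted(out)

def run_global(t1, t2):
    """global mode: the set is built once from all of T2 and sent to every shard."""
    global_set = {a for values in t2.values() for a in values}
    out = []
    for shard in t1:
        out += [foo for foo, a in t1[shard] if a in global_set]
    return sorted(out)

print(run_local(t1, t2))   # []
print(run_global(t1, t2))  # ['foo1', 'foo3']
```

The local form only gives the full answer when matching rows of T1 and T2 live on the same shard.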
What’s actually happening with queries? Let’s find out!
SELECT hostName() host, event_time, query_id,
is_initial_query AS initial,
if(is_initial_query, '', initial_query_id) as initial_q,
query
FROM cluster('{cluster}', system.query_log) AS st
WHERE type = 'QueryFinish' AND has(databases, 'test')
ORDER BY st.event_time DESC LIMIT 25
Thinking about distributed data and joins
[Diagram: two ways to place joined tables on Shard 1 and Shard 2.
● “Big Table Model”: the Large table is split by id across the shards (for example, ids 1 to 1000 and 1001 to 2000), and the Small table (ids 1 to 100) is present in full on each shard. All keys replicated.
● “Bucketing Model”: two Large tables are split by id the same way, so each shard holds the same id range of both. Matching keys in each bucket.]
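Both models let each shard join using only the rows it holds. A small Python sketch with invented tables shows that the per-shard joins add up to the full join in each case.

```python
def join(left, right):
    """Inner join two lists of (id, value) rows on id."""
    right_by_id = {}
    for rid, rval in right:
        right_by_id.setdefault(rid, []).append(rval)
    return [(lid, lval, rval) for lid, lval in left for rval in right_by_id.get(lid, [])]

def by_id(rows, n_shards):
    """Bucket rows by id so equal ids always share a shard."""
    shards = [[] for _ in range(n_shards)]
    for row in rows:
        shards[row[0] % n_shards].append(row)
    return shards

large_a = [(i, f"a{i}") for i in range(1, 9)]
large_b = [(i, f"b{i}") for i in range(1, 9, 2)]
small = [(i, f"s{i}") for i in range(1, 4)]
expected_large = sorted(join(large_a, large_b))
expected_small = sorted(join(large_a, small))

# Bucketing model: both large tables are sharded by the same key.
a_shards, b_shards = by_id(large_a, 2), by_id(large_b, 2)
bucketed = sorted(r for a, b in zip(a_shards, b_shards) for r in join(a, b))

# Big table model: the large table is split, the small one is copied whole.
halves = [large_a[:4], large_a[4:]]
big_table = sorted(r for part in halves for r in join(part, small))

print(bucketed == expected_large, big_table == expected_small)  # True True
```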
Tricks to query distributed tables
Use remote() to select from another node
SELECT count()
FROM remote('host-2', currentDatabase(), 'ontime_ref')
SELECT count()
FROM remoteSecure('host-2', currentDatabase(), 'ontime_ref')
┌───count()─┐
│ 196508419 │
└───────────┘
-- You can insert too, with FUNCTION keyword.
INSERT INTO FUNCTION remote(host, database, table, login, password)
VALUES . . .
More remote query tricks!
SELECT hostName() AS h, count() AS c FROM sdata GROUP BY h
┌─h─────────────────────────┬───c─┐
│ chi-test-rh-test-rh-1-0-0 │ 492 │
│ chi-test-rh-test-rh-0-0-0 │ 508 │
└───────────────────────────┴─────┘
SELECT hostName() AS h, count() AS c
FROM remote('chi-test-rh-test-rh-{0,1}-{0,1}', default, sdata)
GROUP BY h
┌─h─────────────────────────┬────c─┐
│ chi-test-rh-test-rh-1-0-0 │ 984 │
│ chi-test-rh-test-rh-1-1-0 │ 984 │
│ chi-test-rh-test-rh-0-1-0 │ 1016 │
│ chi-test-rh-test-rh-0-0-0 │ 1016 │
└───────────────────────────┴──────┘
First query: distributed table. Second query: remote query to all 4 hosts.
cluster() distributes queries dynamically
SELECT
hostName() AS host, count() AS tables
FROM cluster('{cluster}', system.tables)
WHERE database = 'default'
GROUP BY host
┌─host──────────────────────┬─tables─┐
│ chi-test-rh-test-rh-1-0-0 │ 2 │
│ chi-test-rh-test-rh-0-1-0 │ 2 │
└───────────────────────────┴────────┘
clusterAllReplicas() goes to every node
SELECT
hostName() AS host, count() AS tables
FROM clusterAllReplicas('{cluster}', system.tables)
WHERE database = 'default'
GROUP BY host
┌─host──────────────────────┬─tables─┐
│ chi-test-rh-test-rh-1-0-0 │ 2 │
│ chi-test-rh-test-rh-1-1-0 │ 2 │
│ chi-test-rh-test-rh-0-1-0 │ 2 │
│ chi-test-rh-test-rh-0-0-0 │ 2 │
└───────────────────────────┴────────┘
Scaling up
Load testing and capacity planning made simple…
1. Establish single node baseline
● Use production data
● Max out SELECT & INSERT capacity with load tests
● Adjust schema and queries, retest
2. Add replicas to increase SELECT capacity
3. Add shards to increase INSERT capacity
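The three steps can be turned into rough arithmetic. The Python sketch below assumes that SELECT capacity grows linearly with replicas and INSERT capacity grows linearly with shards, which is a simplification that real load tests should confirm. The function name and all numbers are hypothetical.

```python
import math

def plan_cluster(select_qps_needed, insert_rows_needed, node_select_qps, node_insert_rows):
    """Estimate shards and replicas, assuming capacity scales linearly (an assumption)."""
    shards = max(1, math.ceil(insert_rows_needed / node_insert_rows))
    replicas = max(1, math.ceil(select_qps_needed / node_select_qps))
    return {"shards": shards, "replicas": replicas, "nodes": shards * replicas}

# Hypothetical baseline from step 1: one node handles 200 queries/s and 1M rows/s.
plan = plan_cluster(
    select_qps_needed=450,
    insert_rows_needed=2_500_000,
    node_select_qps=200,
    node_insert_rows=1_000_000,
)
print(plan)  # {'shards': 3, 'replicas': 3, 'nodes': 9}
```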
Selecting the sharding key
Randomized key, e.g., cityHash64(Url)
● Nodes are balanced
● Must query all shards
● Easier to add nodes
Specific key, e.g., cityHash64(TenantId)
● Unbalanced nodes
● Queries can skip shards
● Hard to add nodes
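The trade-off can be seen in a small Python simulation. It is a sketch with invented tenants and URLs, and `zlib.crc32` stands in for cityHash64; it counts which shards hold one tenant's rows under each key.

```python
import zlib

N_SHARDS = 3

def h(value: str) -> int:
    # crc32 is a deterministic stand-in for cityHash64.
    return zlib.crc32(value.encode("utf-8"))

# Hypothetical rows of (tenant_id, url): 5 tenants with 200 URLs each.
rows = [(f"tenant-{t}", f"https://example.com/{t}/{p}") for t in range(5) for p in range(200)]

def shards_holding(tenant: str, key_of) -> set[int]:
    """Shards that contain at least one row of the tenant under the given key."""
    return {h(key_of(row)) % N_SHARDS for row in rows if row[0] == tenant}

by_url = shards_holding("tenant-1", key_of=lambda row: row[1])
by_tenant = shards_holding("tenant-1", key_of=lambda row: row[0])

# A tenant key keeps the tenant on one shard, so a query for it can skip the rest.
# A URL key usually spreads the tenant over every shard.
print(len(by_url), len(by_tenant))
```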
Options for shard rebalancing
● INSERT INTO new_cluster SELECT FROM old_cluster
○ clickhouse-copier automates this
● Use (undocumented) ALTER TABLE MOVE PART TO SHARD
○ Example: ALTER TABLE test_move MOVE PART 'all_0_0_0' TO SHARD '/clickhouse/shard_1/tables/test_move'
● Move parts manually
○ ALTER TABLE FREEZE PARTITION
○ rsync to new host
○ ALTER TABLE ATTACH PARTITION
○ Drop original partition
Bi-level sharding combines both approaches
[Diagram: the application chooses a tenant group by TenantId. Tenant-Group-1 has Shards 1 to 3; Tenant-Group-2 and Tenant-Group-3 each have Shards 1 and 2. Within each group, a distributed table shards data by cityHash64(Url).]
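The two routing levels can be sketched in Python. The shard counts per group follow the diagram, while the tenant names, the tenant-to-group mapping, and the `route` helper are invented for illustration; `zlib.crc32` stands in for cityHash64.

```python
import zlib

# Shard counts per group follow the diagram; the tenant names are made up.
SHARDS_PER_GROUP = {"Tenant-Group-1": 3, "Tenant-Group-2": 2, "Tenant-Group-3": 2}
TENANT_TO_GROUP = {
    "acme": "Tenant-Group-1",
    "globex": "Tenant-Group-2",
    "initech": "Tenant-Group-3",
}

def route(tenant_id: str, url: str) -> tuple[str, int]:
    """Return (group, 1-based shard number) for a row (hypothetical helper)."""
    group = TENANT_TO_GROUP[tenant_id]                       # level 1: application chooses group
    n_shards = SHARDS_PER_GROUP[group]
    shard = zlib.crc32(url.encode("utf-8")) % n_shards + 1   # level 2: hash within the group
    return group, shard

print(route("acme", "https://example.com/a"))
print(route("globex", "https://example.com/a"))
```

A tenant's queries only touch its own group, while data inside the group stays balanced by the URL hash.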
Wrap-up and more information
Where is the documentation?
ClickHouse official docs – https://clickhouse.com/docs/
Altinity Blog – https://altinity.com/blog/
Altinity YouTube Channel – https://www.youtube.com/channel/UCE3Y2lDKl_ZfjaCrh62onYA
Altinity Knowledge Base – https://kb.altinity.com/
ClickHouse Capacity Planning by Mik Kocikowski of Cloudflare
Meetups, other blogs, and external resources. Use your powers of Search!
Where can I get help?
Telegram - ClickHouse Channel
Slack
● ClickHouse Public Workspace - clickhousedb.slack.com
● Altinity Public Workspace - altinitydbworkspace.slack.com
Education - Altinity ClickHouse Training
Support - Altinity offers support for ClickHouse in all environments
Thank you and good luck!
Website: https://altinity.com
Email: info@altinity.com
Slack: altinitydbworkspace.slack.com
Altinity.Cloud
Altinity Support
Altinity Stable Builds
We’re hiring!

Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf

  • 1. © 2022 Altinity, Inc. Deep Dive on ClickHouse Sharding and Replication Robert Hodges and Altinity Engineering 22 September 2022 1 © 2202 Altinity, Inc.
  • 2. © 2022 Altinity, Inc. Let’s make some introductions ClickHouse support and services including Altinity.Cloud Authors of Altinity Kubernetes Operator for ClickHouse and other open source projects Us Database geeks with centuries of experience in DBMS and applications You Applications developers looking to learn about ClickHouse 2
  • 3. © 2022 Altinity, Inc. © 2022 Altinity, Inc. What’s a ClickHouse? 3
  • 4. © 2022 Altinity, Inc. Understands SQL Runs on bare metal to cloud Shared nothing architecture Stores data in columns Parallel and vectorized execution Scales to many petabytes Is Open source (Apache 2.0) ClickHouse is a SQL Data Warehouse It’s the core engine for real-time analytics ClickHouse Event Streams ELT Object Storage Interactive Graphics Dashboards APIs 4
  • 5. © 2022 Altinity, Inc. Distributed data is deeper than it looks 5 Width: 2 meters Depth: 60 meters “The Bolton Strid”
  • 6. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Introducing sharding and replication 6
  • 7. © 2022 Altinity, Inc. Clickhouse nodes can scale vertically Network- Attached Storage CPU RAM Host 7
  • 8. © 2022 Altinity, Inc. Clickhouse nodes can scale vertically CPU RAM Host Network- Attached Storage 8
  • 9. © 2022 Altinity, Inc. Clusters introduce horizontal scaling Shards Replicas Host Host Host Host Replicas improve read IOPs and concurrency Shards add write IOPS 9
  • 10. © 2022 Altinity, Inc. Different sharding and replication patterns Shard 1 Shard 3 Shard 2 Shard 4 All Sharded Data sharded 4 ways without replication Replica 1 Replica 3 Replica 2 Replica 4 All Replicated Data replicated 4 times without sharding Shard 1 Replica 1 Shard 1 Replica 2 Shard 2 Replica 1 Shard 2 Replica 2 Sharded and Replicated Data sharded 2 ways and replicated 2 times 10
  • 11. © 2022 Altinity, Inc. MergeTree tables support replication MergeTree SummingMergeTree AggregatingMergeTree CollapsingMergeTree VersionedCollapsing MergeTree ReplicatedMergeTree ReplicatedSummingMergeTree ReplicatedAggregatingMergeTree ReplicatedCollapsingMergeTree ReplicatedVersionedCollapsing MergeTree ReplacingMergeTree ReplicatedReplacingMergeTree Source data Aggregated data; single row per group Evolving data 11
  • 12. © 2022 Altinity, Inc. How replication works INSERT Replicate ClickHouse Node 1 Table: ontime (Parts) ReplicatedMergeTree :9009 :9443 ClickHouse Node 2 Table: ontime (Parts) ReplicatedMergeTree :9009 :9443 zookeeper-1 ZNodes :2181 zookeeper-2 ZNodes :2181 zookeeper-3 ZNodes :2181 12
  • 13. © 2022 Altinity, Inc. What is replicated? Replicated statements Non-replicated statements ● INSERT ● ALTER TABLE exceptions: FREEZE, MOVE TO DISK, FETCH ● OPTIMIZE ● TRUNCATE ● CREATE table ● DROP table ● RENAME table ● DETACH table ● ATTACH table Replicated*MergeTree ONLY 13
  • 14. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Building distributed schema 14
  • 15. © 2022 Altinity, Inc. Example of a distributed data set with shards and replicas clickhouse-0 ontime _local airports ontime clickhouse-1 ontime _local airports ontime clickhouse-2 ontime _local airports ontime clickhouse-3 ontime _local airports ontime Distributed table (No data) Sharded, replicated table (Partial data) Fully replicated table (All data) 15
  • 16. © 2022 Altinity, Inc. Step 1: A sharded, replicated fact table CREATE TABLE IF NOT EXISTS `ontime_local` ( `Year` UInt16 CODEC(DoubleDelta, ZSTD(1)), `Quarter` UInt8, `Month` UInt8, `DayofMonth` UInt8, `DayOfWeek` UInt8, ... ) Engine=ReplicatedMergeTree( '/clickhouse/{cluster}/tables/{shard}/{database}/ontime_local', '{replica}') PARTITION BY toYYYYMM(FlightDate) ORDER BY (FlightDate, `Year`, `Month`, DepDel15) Replication is at the table level! Use a Replicated% Engine 16
  • 17. © 2022 Altinity, Inc. Step 2: A distributed table to find data CREATE TABLE IF NOT EXISTS ontime AS ontime_local ENGINE = Distributed( '{cluster}', currentDatabase(), ontime_local, rand()) Cluster layout Database Table Sharding key (optional) 17
  • 18. © 2022 Altinity, Inc. Step 3: A fully replicated dimension table CREATE TABLE IF NOT EXISTS airports AS default.dot_airports Engine=ReplicatedMergeTree( '/clickhouse/{cluster}/tables/all/{database}/airports', '{replica}') PARTITION BY tuple() PRIMARY KEY AirportID ORDER BY AirportID Don’t bother with partitions for small tables Resolves to current database 18
  • 19. © 2022 Altinity, Inc. Macros help CREATE TABLE ON CLUSTER /etc/clickhouse-server/config.d/macros.xml: <clickhouse> <macros> <all-sharded-shard>2</all-sharded-shard> <cluster>demo</cluster> <shard>0</shard> <replica>clickhouse-0-1</replica> </macros> </clickhouse> select * from system.macros Replica names should be unique per host 19
  • 20. © 2022 Altinity, Inc. What does ON CLUSTER do? ON CLUSTER executes a command over a set of nodes CREATE TABLE IF NOT EXISTS `ontime_local` ON CLUSTER `{cluster}` ... DROP TABLE IF EXISTS `ontime_local` ON CLUSTER `{cluster}` ... ALTER TABLE `ontime_local` ON CLUSTER `{cluster}` ... 20
  • 21. © 2022 Altinity, Inc. How does ON CLUSTER know where to go? /etc/clickhouse-server/config.d/remote_servers.xml: <clickhouse> <remote_servers> <demo> <!-- <secret>top secret</secret> --> <shard> <replica><host>10.0.0.71</host><port>9000</port></replica> <replica><host>10.0.0.72</host><port>9000</port></replica> <internal_replication>true</internal_replication> </shard> <shard> . . . </shard> </demo> </remote_servers> </clickhouse> “It’s a cluster because I said so!” Cluster name 21 Shared secret
  • 22. © 2022 Altinity, Inc. List layouts using system.clusters -- Find name and hosts in each layout SELECT cluster, groupArray(concat(host_name,':',toString(port))) AS hosts FROM system.clusters GROUP BY cluster ORDER BY cluster 22
  • 23. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Loading and querying data 23
  • 24. © 2022 Altinity, Inc. Data loading: Distributed vs. local INSERTs ontime _local ontime Insert via distributed table Insert directly to shards ontime _local ontime ontime _local ontime ontime _local ontime Data Pipeline Data Pipeline Applications may have to be more intelligent May require more resources (Queue) 24
  • 25. © 2022 Altinity, Inc. INSERT into a distributed vs. local table -- Insert into distributed table INSERT INTO ontime VALUES (2017,1,1,1,7,'2017-01-01','AA',19805,...), (2017,1,1,1,7,'2017-01-01','AA',19805,...), ... -- Insert into a local table INSERT INTO ontime_local VALUES (2017,1,1,1,7,'2017-01-01','AA',19805,...), (2017,1,1,1,7,'2017-01-01','AA',19805,...), ... 25
  • 26. How does a distributed INSERT work?
      [Diagram: Data Pipeline → insert via distributed table (ontime) → (Queue) → Thread Pool → ontime_local tables on the shards → replication.]
      insert_distributed_sync:
        ● 0 = async propagation
        ● 1 = sync propagation
      select * from system.distribution_queue
  • 27. Options for processing INSERTs
      ● Local vs distributed data insertion
        ○ INSERT to local table – no need to sync, larger blocks, faster
        ○ INSERT to Distributed table – sharding by ClickHouse
        ○ CHProxy – distributes transactions across nodes, only works with HTTP connections
      ● Asynchronous (default) vs synchronous insertions
        ○ insert_distributed_sync – wait until batches make it to local tables
        ○ insert_quorum, select_sequential_consistency – wait until replicas sync
  • 28. How do distributed SELECTs work?
      [Diagram: applications query the distributed table ontime on one node; the query fans out to the ontime_local table on every shard.]
      ● Innermost subselect is distributed
      ● AggregateState computed locally
      ● Aggregates merged on initiator node
  • 29. Queries are pushed to all shards
      Query against the distributed table:
        SELECT Carrier, avg(DepDelay) AS Delay
        FROM ontime
        GROUP BY Carrier
        ORDER BY Delay DESC
      Query pushed to each shard:
        SELECT Carrier, avg(DepDelay) AS Delay
        FROM ontime_local
        GROUP BY Carrier
        ORDER BY Delay DESC
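The "AggregateState computed locally, merged on initiator" idea can be illustrated with a small simulation. This is only a conceptual sketch in Python, not ClickHouse code: the function names and sample rows are hypothetical, and it assumes the partial state for avg() can be modeled as a (sum, count) pair per group.

```python
from collections import defaultdict

def partial_avg_states(rows):
    """Runs on each shard: build a (sum, count) state per carrier."""
    states = defaultdict(lambda: [0.0, 0])
    for carrier, dep_delay in rows:
        states[carrier][0] += dep_delay
        states[carrier][1] += 1
    return {carrier: tuple(state) for carrier, state in states.items()}

def merge_on_initiator(shard_states):
    """Runs on the initiator: merge the states, finalize avg, then sort."""
    merged = defaultdict(lambda: [0.0, 0])
    for states in shard_states:
        for carrier, (total, count) in states.items():
            merged[carrier][0] += total
            merged[carrier][1] += count
    result = [(carrier, total / count) for carrier, (total, count) in merged.items()]
    return sorted(result, key=lambda row: row[1], reverse=True)

# Hypothetical (Carrier, DepDelay) rows held by two shards.
shard_1 = [("AA", 10.0), ("AA", 20.0), ("DL", 5.0)]
shard_2 = [("AA", 60.0), ("DL", 15.0)]

result = merge_on_initiator([partial_avg_states(shard_1), partial_avg_states(shard_2)])
print(result)
```

Merging states rather than finished averages matters: averaging the per-shard averages for AA would give 37.5 here, while the merged state gives the correct 30.0.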
  • 30. ClickHouse pushes down JOINs by default
      Query against the distributed table:
        SELECT o.Dest d, a.Name n, count(*) c, avg(o.ArrDelayMinutes) ad
        FROM default.ontime o
        JOIN default.airports a ON (a.IATA = o.Dest)
        GROUP BY d, n
        HAVING c > 100000
        ORDER BY d DESC
        LIMIT 10
      Query pushed to each shard:
        SELECT Dest AS d, Name AS n, count() AS c, avg(ArrDelayMinutes) AS ad
        FROM default.ontime_local AS o
        ALL INNER JOIN default.airports AS a ON a.IATA = o.Dest
        GROUP BY d, n
        HAVING c > 100000
        ORDER BY d DESC
        LIMIT 10
  • 31. ...Unless the left side “table” is a subquery
        SELECT d, Name n, c AS flights, ad
        FROM
        (
          SELECT Dest d, count(*) c, avg(ArrDelayMinutes) ad
          FROM default.ontime
          GROUP BY d
          HAVING c > 100000
          ORDER BY ad DESC
        ) AS o
        LEFT JOIN airports ON airports.IATA = o.d
        LIMIT 10
      Callout: Remote Servers
  • 32. It’s more complex when multiple tables are distributed
      Original query:
        select foo from T1 where a in (select a from T2)
      distributed_product_mode=?
      ● local (Subquery runs on local table)
          select foo from T1_local where a in (select a from T2_local)
      ● allow (Subquery runs on distributed table)
          select foo from T1_local where a in (select a from T2)
      ● global (Subquery runs on initiator; broadcast to local temp table)
          create temporary table tmp Engine = Set AS select a from T2;
          select foo from T1_local where a in tmp;
  • 33. What’s actually happening with queries? Let’s find out!
        SELECT hostName() host, event_time, query_id,
               is_initial_query AS initial,
               if(is_initial_query, '', initial_query_id) AS initial_q,
               query
        FROM cluster('{cluster}', system.query_log) AS st
        WHERE type = 'QueryFinish' AND has(databases, 'test')
        ORDER BY st.event_time DESC
        LIMIT 25
  • 34. Thinking about distributed data and joins
      [Diagram: a Large table (ids 1 … 1000, 1001 … 2000, and so on) and a Small table (ids 1 … 100) laid out across Shard 1 and Shard 2 under two models, the “Bucketing Model” and the “Big Table Model”. Captions: “All keys replicated” and “Matching keys in each bucket”.]
  • 35. Tricks to query distributed tables
  • 36. Use remote() to select from another node
        SELECT count()
        FROM remote('host-2', currentDatabase(), 'ontime_ref')

        SELECT count()
        FROM remoteSecure('host-2', currentDatabase(), 'ontime_ref')

        ┌───count()─┐
        │ 196508419 │
        └───────────┘

        -- You can insert too, with FUNCTION keyword.
        INSERT INTO FUNCTION
        remote(host, database, table, login, password) VALUES . . .
  • 37. More remote query tricks!
      Distributed table:
        SELECT hostName() AS h, count() AS c
        FROM sdata
        GROUP BY h

        ┌─h─────────────────────────┬───c─┐
        │ chi-test-rh-test-rh-1-0-0 │ 492 │
        │ chi-test-rh-test-rh-0-0-0 │ 508 │
        └───────────────────────────┴─────┘

      Remote query all 4 hosts:
        SELECT hostName() AS h, count() AS c
        FROM remote('chi-test-rh-test-rh-{0,1}-{0,1}', default, sdata)
        GROUP BY h

        ┌─h─────────────────────────┬────c─┐
        │ chi-test-rh-test-rh-1-0-0 │  984 │
        │ chi-test-rh-test-rh-1-1-0 │  984 │
        │ chi-test-rh-test-rh-0-1-0 │ 1016 │
        │ chi-test-rh-test-rh-0-0-0 │ 1016 │
        └───────────────────────────┴──────┘
  • 38. cluster() distributes queries dynamically
        SELECT hostName() AS host, count() AS tables
        FROM cluster('{cluster}', system.tables)
        WHERE database = 'default'
        GROUP BY host

        ┌─host──────────────────────┬─tables─┐
        │ chi-test-rh-test-rh-1-0-0 │      2 │
        │ chi-test-rh-test-rh-0-1-0 │      2 │
        └───────────────────────────┴────────┘
  • 39. clusterAllReplicas() goes to every node
        SELECT hostName() AS host, count() AS tables
        FROM clusterAllReplicas('{cluster}', system.tables)
        WHERE database = 'default'
        GROUP BY host

        ┌─host──────────────────────┬─tables─┐
        │ chi-test-rh-test-rh-1-0-0 │      2 │
        │ chi-test-rh-test-rh-1-1-0 │      2 │
        │ chi-test-rh-test-rh-0-1-0 │      2 │
        │ chi-test-rh-test-rh-0-0-0 │      2 │
        └───────────────────────────┴────────┘
  • 40. Scaling up
  • 41. Load testing and capacity planning made simple…
      1. Establish single node baseload
         ● Use production data
         ● Max out SELECT & INSERT capacity with load tests
         ● Adjust schema and queries, retest
      2. Add replicas to increase SELECT capacity
      3. Add shards to increase INSERT capacity
  • 42. Selecting the sharding key
      ● Randomized key, e.g., cityHash64(Url)
        ○ Must query all shards
        ○ Nodes are balanced
        ○ Easier to add nodes
      ● Specific key, e.g., cityHash64(TenantId)
        ○ Unbalanced nodes
        ○ Queries can skip shards
        ○ Hard to add nodes
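The balance trade-off between the two kinds of key can be seen in a small simulation. This is a rough sketch under stated assumptions, not ClickHouse behavior: Python's hashlib.sha256 stands in for cityHash64, the shard_for helper is hypothetical, and the workload (one tenant producing about 80% of rows) is invented for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index; sha256 is a stand-in for cityHash64."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical workload: tenant "t0" owns roughly 80% of 3000 rows,
# and every row has a distinct URL.
rows = [
    ("t0" if i % 5 else f"t{i}", f"https://example.com/page/{i}")
    for i in range(3000)
]

# Randomized key: rows spread evenly, but every query must visit all shards.
by_url = Counter(shard_for(url) for _, url in rows)

# Specific key: a tenant's rows land on one shard, so queries for that
# tenant can skip the others, but the big tenant unbalances the nodes.
by_tenant = Counter(shard_for(tenant) for tenant, _ in rows)

print("rows per shard, key = Url:     ", sorted(by_url.items()))
print("rows per shard, key = TenantId:", sorted(by_tenant.items()))
```

With the tenant key, the shard that holds "t0" receives at least 2400 of the 3000 rows, while the URL key keeps the three shards close to 1000 rows each.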
  • 43. Options for shard rebalancing
      ● INSERT INTO new_cluster SELECT FROM old_cluster
        ○ clickhouse-copier automates this
      ● Use (undocumented) ALTER TABLE MOVE PART TO SHARD
        ○ Example: ALTER TABLE test_move MOVE PART 'all_0_0_0' TO SHARD '/clickhouse/shard_1/tables/test_move'
      ● Move parts manually
        ○ ALTER TABLE FREEZE PARTITION
        ○ rsync to new host
        ○ ALTER TABLE ATTACH PARTITION
        ○ Drop original partition
  • 44. Bi-level sharding combines both approaches
      [Diagram: the application chooses a tenant group by TenantId (Tenant-Group-1, Tenant-Group-2, Tenant-Group-3); within each group, a distributed table spreads rows across that group's shards by cityHash64(Url).]
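The two routing levels can be sketched as follows. This is an illustrative model only: the tenant names, the tenant-to-group mapping, and the shard counts per group are assumptions, and hashlib.sha256 again stands in for cityHash64. In a real deployment the second level would be handled by the distributed table's sharding expression rather than by application code.

```python
import hashlib

# Assumed layout: number of shards in each tenant group.
SHARDS_PER_GROUP = {
    "Tenant-Group-1": 3,
    "Tenant-Group-2": 2,
    "Tenant-Group-3": 2,
}

# Assumed level-1 mapping, maintained by the application.
TENANT_TO_GROUP = {
    "acme": "Tenant-Group-1",
    "globex": "Tenant-Group-2",
    "initech": "Tenant-Group-3",
}

def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash; a stand-in for cityHash64."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def route(tenant_id: str, url: str) -> tuple[str, int]:
    """Level 1: pick the tenant group. Level 2: hash Url within the group."""
    group = TENANT_TO_GROUP[tenant_id]
    shard = stable_hash(url) % SHARDS_PER_GROUP[group] + 1  # 1-based shard number
    return group, shard

print(route("acme", "https://example.com/a"))
print(route("globex", "https://example.com/a"))
```

A query for a single tenant only touches that tenant's group, while rows inside the group stay balanced because the second level uses a randomized key.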
  • 45. Wrap-up and more information
  • 46. Where is the documentation?
      ● ClickHouse official docs – https://clickhouse.com/docs/
      ● Altinity Blog – https://altinity.com/blog/
      ● Altinity YouTube Channel – https://www.youtube.com/channel/UCE3Y2lDKl_ZfjaCrh62onYA
      ● Altinity Knowledge Base – https://kb.altinity.com/
      ● ClickHouse Capacity Planning by Mik Kocikowski of Cloudflare
      ● Meetups, other blogs, and external resources. Use your powers of Search!
  • 47. Where can I get help?
      Telegram - ClickHouse Channel
      Slack
        ● ClickHouse Public Workspace - clickhousedb.slack.com
        ● Altinity Public Workspace - altinitydbworkspace.slack.com
      Education - Altinity ClickHouse Training
      Support - Altinity offers support for ClickHouse in all environments
  • 48. Thank you and good luck!
      Website: https://altinity.com
      Email: info@altinity.com
      Slack: altinitydbworkspace.slack.com
      Altinity.Cloud
      Altinity Support
      Altinity Stable Builds
      We’re hiring!