CLICKHOUSE
QUERY PERFORMANCE
TIPS AND TRICKS
Robert Hodges -- October ClickHouse San Francisco Meetup
Brief Intros
www.altinity.com
Leading software and services
provider for ClickHouse
Major committer and community
sponsor in US and Western Europe
Robert Hodges - Altinity CEO
30+ years on DBMS plus
virtualization and security.
ClickHouse is DBMS #20
Goals of the talk
● Understand single node MergeTree structure
● Optimize queries without changing data
● Get bigger performance gains by changing data layout
● Introduce tools for performance monitoring
Non-Goals:
● Boost performance of sharded/replicated clusters
● Teach advanced ClickHouse performance management
ClickHouse &
MergeTree Intro
Introduction to ClickHouse
Understands SQL
Runs on bare metal to cloud
Shared nothing architecture
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is Open source (Apache 2.0)
(Diagram: the same table rows a, b, c, d stored column by column)
And it’s really fast!
Introducing the MergeTree table engine
CREATE TABLE ontime (
Year UInt16,
Quarter UInt8,
Month UInt8,
...
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (Carrier, FlightDate)
Table engine type
How to break data
into parts
How to index and
sort data in each part
Basic MergeTree data layout
Table
Part
Index Columns
Sparse Index
Columns
sorted on
ORDER BY
columns
Rows match
PARTITION BY
expression
Part
Index Columns
Part
MergeTree layout within a single part
/var/lib/clickhouse/data/airline/ontime_reordered/20170701_20170731_355_355_2/
(Diagram: primary.idx holds sparse index entries on the ORDER BY
columns (FlightDate, Carrier...), e.g. 2017-01-01 AA, 2017-01-01 EV,
2018-01-01 UA, 2018-01-02 AA, ... Each column, such as
ActualElapsedTime, Airline, AirlineID..., has a .mrk mark file and a
.bin compressed data file. Marks link granules of rows to their
compressed blocks.)
Basic Query
Tuning
ClickHouse performance tuning is different...
The bad news…
● No query optimizer
● No EXPLAIN PLAN
● May need to move [a lot
of] data for performance
The good news…
● No query optimizer!
● System log is great
● System tables are too
● Performance drivers are
simple: I/O and CPU
Your friend: the ClickHouse query log
clickhouse-client --send_logs_level=trace
select * from system.text_log
sudo less 
/var/log/clickhouse-server/clickhouse-server.log
Return messages to
clickhouse-client
View all log
messages on server
Must enable in
config.xml
(Log messages)
Limit
Expression
MergeSorting
PartialSorting
Expression
ParallelAggregating
Expression × 8
MergeTreeThread
Use system log to find out query details
SELECT toYear(FlightDate) year,
sum(Cancelled)/count(*) cancelled,
sum(DepDel15)/count(*) delayed_15
FROM airline.ontime
GROUP BY year ORDER BY year LIMIT 10
8 parallel threads
to read table
Query pipeline in log
Speed up query execution by adding threads
SELECT toYear(FlightDate) year,
sum(Cancelled)/count(*) cancelled,
sum(DepDel15)/count(*) delayed_15
FROM airline.ontime
GROUP BY year ORDER BY year LIMIT 10
SET max_threads = 2
SET max_threads = 4
. . .
max_threads defaults to half the
number of physical CPU cores
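The thread setting can be sketched end to end; this is a hedged example (the default value and the exact speedup vary by ClickHouse version and hardware):

```sql
-- Inspect the current session value, then raise it for this session only.
SELECT name, value FROM system.settings WHERE name = 'max_threads';

SET max_threads = 8;

SELECT toYear(FlightDate) AS year,
       sum(Cancelled)/count(*) AS cancelled,
       sum(DepDel15)/count(*) AS delayed_15
FROM airline.ontime
GROUP BY year ORDER BY year LIMIT 10;
```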
(Log messages)
Selected 355 parts by date,
355 parts by key,
21393 marks to read from 355
ranges
Speed up queries by reducing reads
SELECT toYear(FlightDate) year,
sum(Cancelled)/count(*) cancelled,
sum(DepDel15)/count(*) delayed_15
FROM airline.ontime
GROUP BY year ORDER BY year LIMIT 10
(Log messages)
Selected 12 parts by date,
12 parts by key,
692 marks to read from 12
ranges
SELECT toYear(FlightDate) year,
sum(Cancelled)/count(*) cancelled,
sum(DepDel15)/count(*) delayed_15
FROM airline.ontime
WHERE year =
toYear(toDate('2016-01-01'))
GROUP BY year ORDER BY year LIMIT 10
(Log messages)
Selected 2 parts by date,
2 parts by key,
73 marks to read from 2 ranges
Query execution tends to scale with I/O
SELECT
FlightDate,
count(*) AS total_flights,
sum(Cancelled) / count(*) AS cancelled,
sum(DepDel15) / count(*) AS delayed_15
FROM airline.ontime
WHERE (FlightDate >= toDate('2016-01-01'))
AND (FlightDate <= toDate('2016-02-10'))
GROUP BY FlightDate
(PREWHERE Log messages)
Elapsed: 0.591 sec.
Processed 173.82 million rows,
2.09 GB (294.34 million rows/s.,
3.53 GB/s.)
Use PREWHERE to help filter unindexed data
SELECT
Year, count(*) AS total_flights,
count(distinct Dest) as destinations,
count(distinct Carrier) as carriers,
sum(Cancelled) / count(*) AS cancelled,
sum(DepDel15) / count(*) AS delayed_15
FROM airline.ontime [PRE]WHERE Dest = 'BLI' GROUP BY Year
(WHERE Log messages)
Elapsed: 0.816 sec.
Processed 173.82 million rows,
5.74 GB (213.03 million rows/s.,
7.03 GB/s.)
But PREWHERE can kick in automatically
SET optimize_move_to_prewhere = 1
SELECT
Year, count(*) AS total_flights,
count(distinct Dest) as destinations,
count(distinct Carrier) as carriers,
sum(Cancelled) / count(*) AS cancelled,
sum(DepDel15) / count(*) AS delayed_15
FROM airline.ontime
WHERE Dest = 'BLI' GROUP BY Year (Log messages)
InterpreterSelectQuery:
MergeTreeWhereOptimizer: condition
"Dest = 'BLI'" moved to PREWHERE
This is the default value
Restructure joins to reduce data scanning
SELECT
Dest d, Name n, count(*) c, avg(ArrDelayMinutes)
FROM ontime
JOIN airports ON (airports.IATA = ontime.Dest)
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC
LIMIT 10
SELECT dest, Name n, c AS flights, ad FROM (
SELECT Dest dest, count(*) c, avg(ArrDelayMinutes) ad
FROM ontime
GROUP BY dest HAVING c > 100000
ORDER BY ad DESC
) LEFT JOIN airports ON airports.IATA = dest LIMIT 10
Faster
3.878
seconds
1.177
seconds
(Log messages)
ParallelAggregatingBlockInputStream
: Total aggregated. 173790727 rows
(from 10199.035 MiB) in 3.844 sec.
(45214666.568 rows/sec., 2653.455
MiB/sec.)
The log tells the story
(Log messages)
ParallelAggregatingBlockInputStream
: Total aggregated. 173815409 rows
(from 2652.213 MiB) in 1.142 sec.
(152149486.717 rows/sec., 2321.617
MiB/sec.)
Join during
MergeTree scan
Join after
MergeTree scan
More ways to find out about queries
SET log_queries = 1
Run a query
SELECT version()
SET log_queries = 0
SELECT * FROM system.query_log
WHERE query='SELECT version()'
SHOW PROCESSLIST
Start query logging
Stop query logging
Show currently
executing queries
Optimizing Data
Layout
Restructure data for big performance gains
● Ensure optimal number of parts
● Optimize primary key index and ordering to reduce data size and index
selectivity
● Use skip indexes to avoid unnecessary I/O
● Use encodings to reduce data size before compression
● Use materialized views to transform data outside of source table
● Plus many other tricks
CREATE TABLE ontime ...
ENGINE=MergeTree()
PARTITION BY
toYYYYMM(FlightDate)
CREATE TABLE ontime_many_parts
...
ENGINE=MergeTree()
PARTITION BY FlightDate
How do partition keys affect performance?
Is there a
practical
difference?
Keep parts in the hundreds, not thousands
Table                                  Rows    Parts
ontime                                 174M    355
ontime_many_parts (after OPTIMIZE)     174M    10,085
ontime_many_parts (before OPTIMIZE)    174M    14,635
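A quick way to check part counts like these is to query system.parts; this is a sketch that assumes both tables live in the current database:

```sql
SELECT table, count(*) AS parts
FROM system.parts
WHERE active AND database = currentDatabase()
  AND table IN ('ontime', 'ontime_many_parts')
GROUP BY table;

-- Merge small parts together; can be expensive on large tables.
OPTIMIZE TABLE ontime_many_parts FINAL;
```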
CREATE TABLE ontime ...
ENGINE=MergeTree()
PARTITION BY
toYYYYMM(FlightDate)
CREATE TABLE ontime_many_parts
...
ENGINE=MergeTree()
PARTITION BY FlightDate
Think about primary key index structure
CREATE TABLE ontime_reordered (
Year UInt16,
Quarter UInt8,
. . .)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (Carrier, Origin, FlightDate)
SETTINGS index_granularity=8192
Hashing large values
allows index to fit in
memory more easily
Large granularity
makes index smaller
Small granularity can make
skip indexes more selective
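The hashing tip can be sketched like this; the table and column names are hypothetical, not from the deck:

```sql
-- Key on an 8-byte hash of a long string instead of the string itself,
-- so each sparse index entry stays small and the index fits in memory.
CREATE TABLE page_views (
    EventDate Date,
    URL String,
    Views UInt64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, cityHash64(URL))
SETTINGS index_granularity = 8192;
```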
Table ORDER BY is key to performance
CREATE TABLE ontime_reordered (
Year UInt16,
Quarter UInt8,
. . .)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (Carrier, Origin, FlightDate)
SETTINGS index_granularity=8192
Choose order to make
dependent non-key
values less random
Benefits:
➔ Higher compression
➔ Better index selectivity
➔ Better PREWHERE
performance
SET allow_experimental_data_skipping_indices=1;
ALTER TABLE ontime ADD INDEX
dest_name Dest TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 1
ALTER TABLE ontime ADD INDEX
cname Carrier TYPE set(100) GRANULARITY 1
OPTIMIZE TABLE ontime FINAL
-- In future releases:
ALTER TABLE ontime
UPDATE Dest=Dest, Carrier=Carrier
WHERE 1=1
Skip indexes cut down on I/O
Multiplier on base table
granularity; bigger is
less selective
Indexes & PREWHERE remove granules
(Log messages)
InterpreterSelectQuery: MergeTreeWhereOptimizer:
condition "Dest = 'PPG'" moved to PREWHERE
. . .
(SelectExecutor): Index `dest_name` has dropped 55
granules.
(SelectExecutor): Index `dest_name` has dropped 52
granules.
Apply PREWHERE
on Dest predicate
Use index to remove
granules from scan
Effectiveness depends on data distribution
SELECT
Year, count(*) AS flights,
sum(Cancelled) / flights AS cancelled,
sum(DepDel15) / flights AS delayed_15
FROM airline.ontime WHERE [Column] = [Value] GROUP BY Year
Column   Value   Index        Count        Rows Processed   Query Response (sec)
Dest     PPG     ngrambf_v1   525          4.30M            0.053
Dest     ATL     ngrambf_v1   9,360,581    166.81M          0.622
Carrier  ML      set          70,622       3.39M            0.090
Carrier  WN      set          25,918,402   166.24M          0.566
Current index types
Name         What it tracks
minmax       High and low range of data; good for numbers with strong locality like timestamps
set          Unique values
ngrambf_v1   Presence of character ngrams; works with =, LIKE, and search predicates; good for long strings
tokenbf_v1   Like ngram but for whitespace-separated strings; good for searching on tags
bloomfilter  Presence of value in column
Encodings reduce data size
CREATE TABLE test_codecs ( a String,
a_lc LowCardinality(String) DEFAULT a,
b UInt32,
b_delta UInt32 DEFAULT b Codec(Delta),
b_delta_lz4 UInt32 DEFAULT b Codec(Delta, LZ4),
b_dd UInt32 DEFAULT b Codec(DoubleDelta),
b_dd_lz4 UInt32 DEFAULT b Codec(DoubleDelta, LZ4)
)
Engine = MergeTree
PARTITION BY tuple() ORDER BY tuple();
Delta: differences between values
DoubleDelta: differences between changes of value
LowCardinality: values with dictionary encoding
Relative sizes of column data
But wait, there’s more! Encodings are faster
SELECT a AS a, count(*) AS c FROM test_codecs
GROUP BY a ORDER BY c ASC LIMIT 10
. . .
10 rows in set. Elapsed: 0.681 sec. Processed 100.00 million
rows, 2.69 GB (146.81 million rows/s., 3.95 GB/s.)
SELECT a_lc AS a, count(*) AS c FROM test_codecs
GROUP BY a ORDER BY c ASC LIMIT 10
. . .
10 rows in set. Elapsed: 0.148 sec. Processed 100.00 million
rows, 241.16 MB (675.55 million rows/s., 1.63 GB/s.)
Faster
Overview of encodings
Name Best for
LowCardinality Strings with fewer than 10K values
Delta Time series
Double Delta Increasing counters
Gorilla Gauge data (bounces around mean)
T64 Integers other than random hashes
Compression may vary across ZSTD and LZ4
Use mat views to boost performance further
CREATE MATERIALIZED VIEW ontime_daily_cancelled_mv
ENGINE = SummingMergeTree
PARTITION BY tuple() ORDER BY (FlightDate, Carrier)
POPULATE
AS SELECT
FlightDate, Carrier, count(*) AS flights,
sum(Cancelled) / count(*) AS cancelled,
sum(DepDel15) / count(*) AS delayed_15
FROM ontime
GROUP BY FlightDate, Carrier
Returns cancelled/late
flights where Carrier =
‘WN’ in 0.007 seconds
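A hedged example of that fast lookup (timings depend on hardware; because SummingMergeTree merges rows in the background, re-aggregating with sum() and GROUP BY is the safe query pattern):

```sql
SELECT FlightDate, Carrier,
       sum(flights) AS flights,
       sum(cancelled) AS cancelled,
       sum(delayed_15) AS delayed_15
FROM ontime_daily_cancelled_mv
WHERE Carrier = 'WN'
GROUP BY FlightDate, Carrier
ORDER BY FlightDate;
```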
More things to think about
Use smaller datatypes wherever possible
Use ZSTD compression (slower but better ratio)
Use dictionaries instead of joins
Use sampling when approximate answers are acceptable
Shard/replicate data across a cluster for large datasets
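Sampling, for instance, can look like the sketch below; note it requires the table to declare a SAMPLE BY clause in its key, which the ontime table as defined earlier does not:

```sql
-- Read ~10% of the data and scale counts back up
-- for an approximate answer.
SELECT Carrier, count() * 10 AS approx_flights
FROM ontime SAMPLE 1 / 10
GROUP BY Carrier
ORDER BY approx_flights DESC;
```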
Metrics and
Monitoring
Use system.parts to track partition content
SELECT
database, table,
count(*) AS parts,
uniq(partition) AS partitions,
sum(marks) AS marks,
sum(rows) AS rows,
formatReadableSize(sum(data_compressed_bytes)) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes) * 100.0, 2)
AS percentage
FROM system.parts
WHERE active and database = currentDatabase()
GROUP BY database, table
ORDER BY database ASC, table ASC
Use system.columns to check data size
SELECT table,
formatReadableSize(sum(data_compressed_bytes)) tc,
formatReadableSize(sum(data_uncompressed_bytes)) tu,
sum(data_compressed_bytes) / sum(data_uncompressed_bytes) as ratio
FROM system.columns
WHERE database = currentDatabase()
GROUP BY table ORDER BY table
ClickHouse has great operational metrics
Some of our favorite tables in the system database…
query_log (Query history)
part_log (Part merges)
text_log (System log messages)
asynchronous_metrics (Background metrics)
events (Cumulative event counters)
metrics (Current counters)
It’s easy to dump values and graph them
Open files during
OPTIMIZE TABLE
command on
174M row table
(system.metrics
OpenFileForRead
counter)
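One hedged way to dump such values is to poll system.metrics (metric names may vary across versions):

```sql
SELECT metric, value
FROM system.metrics
WHERE metric = 'OpenFileForRead';
```

Running this in a loop from clickhouse-client and graphing the results reproduces charts like the one above.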
And don’t forget all the great OS utilities!
● top and htop -- CPU and memory
● dstat -- I/O and network consumption
● iostat -- I/O by device
● iotop -- I/O by process
● iftop -- Network consumption by host
● perf top -- CPU utilization by system function
For a full description see Performance Analysis of ClickHouse Queries by Alexey
Milovidov
Wrap-up
Takeaways on ClickHouse Performance
● ClickHouse performance drivers are CPU and I/O
● The system query log is key to understanding performance
● Query optimization can improve response substantially
● Restructure tables and add indexes/mat views for biggest
gains
● Use the system database tables and OS tools for deeper
analysis of performance
Further resources
● Altinity Blog
● ClickHouse and the Magic of Materialized Views (Webinar)
● Performance Analysis of ClickHouse Queries by Alexey Milovidov
● ClickHouse Telegram Channel
Thank you!
We’re hiring!
Presenter:
rhodges@altinity.com
ClickHouse:
https://github.com/ClickHouse/ClickHouse
Altinity:
https://www.altinity.com
Additional material
● Other things to try: use_uncompressed_cache

More Related Content

PDF
A Day in the Life of a ClickHouse Query Webinar Slides
PDF
Altinity Quickstart for ClickHouse
PDF
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
PDF
All about Zookeeper and ClickHouse Keeper.pdf
PDF
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
PDF
ClickHouse Deep Dive, by Aleksei Milovidov
PDF
Introducing Change Data Capture with Debezium
PDF
10 Good Reasons to Use ClickHouse
A Day in the Life of a ClickHouse Query Webinar Slides
Altinity Quickstart for ClickHouse
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
All about Zookeeper and ClickHouse Keeper.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
ClickHouse Deep Dive, by Aleksei Milovidov
Introducing Change Data Capture with Debezium
10 Good Reasons to Use ClickHouse

What's hot (20)

PDF
ClickHouse Monitoring 101: What to monitor and how
PDF
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
PDF
ClickHouse Materialized Views: The Magic Continues
PDF
Better than you think: Handling JSON data in ClickHouse
PDF
Your first ClickHouse data warehouse
PPTX
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
PDF
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
PDF
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
PPTX
High Performance, High Reliability Data Loading on ClickHouse
PDF
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
PDF
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
PDF
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
PDF
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
PDF
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
PDF
A day in the life of a click house query
PDF
Using ClickHouse for Experimentation
PDF
[Meetup] a successful migration from elastic search to clickhouse
PDF
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
PDF
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
PDF
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
ClickHouse Monitoring 101: What to monitor and how
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
ClickHouse Materialized Views: The Magic Continues
Better than you think: Handling JSON data in ClickHouse
Your first ClickHouse data warehouse
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
High Performance, High Reliability Data Loading on ClickHouse
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
A day in the life of a click house query
Using ClickHouse for Experimentation
[Meetup] a successful migration from elastic search to clickhouse
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Ad

Similar to ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO (20)

PDF
ClickHouse materialized views - a secret weapon for high performance analytic...
PPTX
Simplifying SQL with CTE's and windowing functions
PDF
nter-pod Revolutions: Connected Enterprise Solution in Oracle EPM Cloud
PDF
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
PDF
Ctes percona live_2017
PPTX
Flink Forward Berlin 2018: Dawid Wysakowicz - "Detecting Patterns in Event St...
PDF
MySQL Optimizer: What’s New in 8.0
PDF
Data warehouse or conventional database: Which is right for you?
PDF
Fun with click house window functions webinar slides 2021-08-19
PDF
Fun with ClickHouse Window Functions-2021-08-19.pdf
PPT
PHP tips by a MYSQL DBA
PPT
Oracle tips and tricks
PPT
Database Development Replication Security Maintenance Report
ODP
Scaling PostgreSQL With GridSQL
PDF
Performance Enhancements In Postgre Sql 8.4
PDF
Monitoring InfluxEnterprise
ODP
PostgreSQL 8.4 TriLUG 2009-11-12
PDF
Tactical data engineering
PPTX
CS 542 -- Query Execution
PPTX
U-SQL Query Execution and Performance Tuning
ClickHouse materialized views - a secret weapon for high performance analytic...
Simplifying SQL with CTE's and windowing functions
nter-pod Revolutions: Connected Enterprise Solution in Oracle EPM Cloud
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Ctes percona live_2017
Flink Forward Berlin 2018: Dawid Wysakowicz - "Detecting Patterns in Event St...
MySQL Optimizer: What’s New in 8.0
Data warehouse or conventional database: Which is right for you?
Fun with click house window functions webinar slides 2021-08-19
Fun with ClickHouse Window Functions-2021-08-19.pdf
PHP tips by a MYSQL DBA
Oracle tips and tricks
Database Development Replication Security Maintenance Report
Scaling PostgreSQL With GridSQL
Performance Enhancements In Postgre Sql 8.4
Monitoring InfluxEnterprise
PostgreSQL 8.4 TriLUG 2009-11-12
Tactical data engineering
CS 542 -- Query Execution
U-SQL Query Execution and Performance Tuning
Ad

More from Altinity Ltd (20)

PPTX
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
PDF
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
PPTX
Building an Analytic Extension to MySQL with ClickHouse and Open Source
PDF
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
PDF
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
PDF
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
PDF
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
PDF
ClickHouse ReplacingMergeTree in Telecom Apps
PDF
Adventures with the ClickHouse ReplacingMergeTree Engine
PDF
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
PDF
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
PDF
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
PDF
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
PDF
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
PDF
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
PDF
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
PDF
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
PDF
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
PDF
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
PDF
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
ClickHouse ReplacingMergeTree in Telecom Apps
Adventures with the ClickHouse ReplacingMergeTree Engine
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...

Recently uploaded (20)

PDF
Review of recent advances in non-invasive hemoglobin estimation
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Approach and Philosophy of On baking technology
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPT
Teaching material agriculture food technology
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
DOCX
The AUB Centre for AI in Media Proposal.docx
PPTX
A Presentation on Artificial Intelligence
PPTX
Cloud computing and distributed systems.
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
KodekX | Application Modernization Development
Review of recent advances in non-invasive hemoglobin estimation
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
MYSQL Presentation for SQL database connectivity
Approach and Philosophy of On baking technology
Per capita expenditure prediction using model stacking based on satellite ima...
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Advanced methodologies resolving dimensionality complications for autism neur...
Encapsulation_ Review paper, used for researhc scholars
Understanding_Digital_Forensics_Presentation.pptx
Teaching material agriculture food technology
Reach Out and Touch Someone: Haptics and Empathic Computing
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
The AUB Centre for AI in Media Proposal.docx
A Presentation on Artificial Intelligence
Cloud computing and distributed systems.
The Rise and Fall of 3GPP – Time for a Sabbatical?
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
KodekX | Application Modernization Development

ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO

  • 1. CLICKHOUSE QUERY PERFORMANCE TIPS AND TRICKS Robert Hodges -- October ClickHouse San Francisco Meetup
  • 2. Brief Intros www.altinity.com Leading software and services provider for ClickHouse Major committer and community sponsor in US and Western Europe Robert Hodges - Altinity CEO 30+ years on DBMS plus virtualization and security. ClickHouse is DBMS #20
  • 3. Goals of the talk ● Understand single node MergeTree structure ● Optimize queries without changing data ● Get bigger performance gains by changing data layout ● Introduce tools for performance monitoring Non-Goals: ● Boost performance of sharded/replicated clusters ● Teach advanced ClickHouse performance management
  • 5. Introduction to ClickHouse Understands SQL Runs on bare metal to cloud Shared nothing architecture Stores data in columns Parallel and vectorized execution Scales to many petabytes Is Open source (Apache 2.0) a b c d a b c d a b c d a b c d And it’s really fast!
  • 6. Introducing the MergeTree table engine CREATE TABLE ontime ( Year UInt16, Quarter UInt8, Month UInt8, ... ) ENGINE = MergeTree() PARTITION BY toYYYYMM(FlightDate) ORDER BY (Carrier, FlightDate) Table engine type How to break data into parts How to index and sort data in each part
  • 7. Basic MergeTree data layout Table Part Index Columns Sparse Index Columns sorted on ORDER BY columns Rows match PARTITION BY expression Part Index Columns Part
  • 8. MergeTree layout within a single part /var/lib/clickhouse/data/airline/ontime_reordered 2017-01-01 AA 2017-01-01 EV 2018-01-01 UA 2018-01-02 AA ... primary.idx |||| .mrk .bin 20170701_20170731_355_355_2/ (FlightDate, Carrier...) ActualElapsedTime Airline AirlineID... |||| .mrk .bin |||| .mrk .bin Granule Compressed Block Mark
  • 10. ClickHouse performance tuning is different... The bad news… ● No query optimizer ● No EXPLAIN PLAN ● May need to move [a lot of] data for performance The good news… ● No query optimizer! ● System log is great ● System tables are too ● Performance drivers are simple: I/O and CPU
  • 11. Your friend: the ClickHouse query log clickhouse-client --send_logs_level=trace select * from system.text_log sudo less /var/log/clickhouse-server/clickhouse-server.log Return messages to clickhouse-client View all log messages on server Must enable in config.xml
  • 12. (Log messages) Limit Expression MergeSorting PartialSorting Expression ParallelAggregating Expression × 8 MergeTreeThread Use system log to find out query details SELECT toYear(FlightDate) year, sum(Cancelled)/count(*) cancelled, sum(DepDel15)/count(*) delayed_15 FROM airline.ontime GROUP BY year ORDER BY year LIMIT 10 8 parallel threads to read table Query pipeline in log
  • 13. Speed up query executing by adding threads SELECT toYear(FlightDate) year, sum(Cancelled)/count(*) cancelled, sum(DepDel15)/count(*) delayed_15 FROM airline.ontime GROUP BY year ORDER BY year LIMIT 10 SET max_threads = 2 SET max_threads = 4 . . . max_threads defaults to half the number of physical CPU cores
  • 14. (Log messages) Selected 355 parts by date, 355 parts by key, 21393 marks to read from 355 ranges Speed up queries by reducing reads SELECT toYear(FlightDate) year, sum(Cancelled)/count(*) cancelled, sum(DepDel15)/count(*) delayed_15 FROM airline.ontime GROUP BY year ORDER BY year LIMIT 10 (Log messages) Selected 12 parts by date, 12 parts by key, 692 marks to read from 12 ranges SELECT toYear(FlightDate) year, sum(Cancelled)/count(*) cancelled, sum(DepDel15)/count(*) delayed_15 FROM airline.ontime WHERE year = toYear(toDate('2016-01-01')) GROUP BY year ORDER BY year LIMIT 10
  • 15. (Log messages) Selected 2 parts by date, 2 parts by key, 73 marks to read from 2 ranges Query execution tends to scale with I/O SELECT FlightDate, count(*) AS total_flights, sum(Cancelled) / count(*) AS cancelled, sum(DepDel15) / count(*) AS delayed_15 FROM airline.ontime WHERE (FlightDate >= toDate('2016-01-01')) AND (FlightDate <= toDate('2016-02-10')) GROUP BY FlightDate
  • 16. (PREWHERE Log messages) Elapsed: 0.591 sec. Processed 173.82 million rows, 2.09 GB (294.34 million rows/s., 3.53 GB/s.) Use PREWHERE to help filter unindexed data SELECT Year, count(*) AS total_flights, count(distinct Dest) as destinations, count(distinct Carrier) as carriers, sum(Cancelled) / count(*) AS cancelled, sum(DepDel15) / count(*) AS delayed_15 FROM airline.ontime [PRE]WHERE Dest = 'BLI' GROUP BY Year (WHERE Log messages) Elapsed: 0.816 sec. Processed 173.82 million rows, 5.74 GB (213.03 million rows/s., 7.03 GB/s.)
  • 17. But PREWHERE can kick in automatically SET optimize_move_to_prewhere = 1 SELECT Year, count(*) AS total_flights, count(distinct Dest) as destinations, count(distinct Carrier) as carriers, sum(Cancelled) / count(*) AS cancelled, sum(DepDel15) / count(*) AS delayed_15 FROM airline.ontime WHERE Dest = 'BLI' GROUP BY Year (Log messages) InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "Dest = 'BLI'" moved to PREWHERE This is the default value
  • 18. Restructure joins to reduce data scanning SELECT Dest d, Name n, count(*) c, avg(ArrDelayMinutes) FROM ontime JOIN airports ON (airports.IATA = ontime.Dest) GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10 SELECT dest, Name n, c AS flights, ad FROM ( SELECT Dest dest, count(*) c, avg(ArrDelayMinutes) ad FROM ontime GROUP BY dest HAVING c > 100000 ORDER BY ad DESC ) LEFT JOIN airports ON airports.IATA = dest LIMIT 10 Faster 3.878 seconds 1.177 seconds
  • 19. (Log messages) ParallelAggregatingBlockInputStream : Total aggregated. 173790727 rows (from 10199.035 MiB) in 3.844 sec. (45214666.568 rows/sec., 2653.455 MiB/sec.) The log tells the story (Log messages) ParallelAggregatingBlockInputStream : Total aggregated. 173815409 rows (from 2652.213 MiB) in 1.142 sec. (152149486.717 rows/sec., 2321.617 MiB/sec.) Join during MergeTree scan Join after MergeTree scan
  • 20. More ways to find out about queries SET log_queries = 1 Run a query SELECT version() SET log_queries = 0 SELECT * FROM system.query_log WHERE query='SELECT version()' SHOW PROCESSLIST Start query logging Stop query logging Show currently executing queries
  • 22. Restructure data for big performance gains ● Ensure optimal number of parts ● Optimize primary key index and ordering to reduce data size and index selectivity ● Use skip indexes to avoid unnecessary I/O ● Use encodings to reduce data size before compression ● Use materialized views to transform data outside of source table ● Plus many other tricks
  • 23. CREATE TABLE ontime ... ENGINE=MergeTree() PARTITION BY toYYYYMM(FlightDate) CREATE TABLE ontime_many_parts ... ENGINE=MergeTree() PARTITION BY FlightDate How do partition keys affect performance? Is there a practical difference?
  • 24. Keep parts in the hundreds, not thousands Table Rows Parts ontime 174M 355 ontime_many_parts (after OPTIMIZE) 174M 10,085 ontime_many_parts (before OPTIMIZE) 174M 14,635 CREATE TABLE ontime ... ENGINE=MergeTree() PARTITION BY toYYYYMM(FlightDate) CREATE TABLE ontime_many_parts ... ENGINE=MergeTree() PARTITION BY FlightDate
  • 25. Think about primary key index structure CREATE TABLE ontime_reordered ( Year UInt16, Quarter` UInt8, . . .) ENGINE = MergeTree() PARTITION BY toYYYYMM(FlightDate) ORDER BY (Carrier, Origin, FlightDate) SETTINGS index_granularity=8192 Hashing large values allows index to fit in memory more easily Large granularity makes index smaller Small granularity can make skip indexes more selective
  • 26. Table ORDER BY is key to performance
CREATE TABLE ontime_reordered (
  Year UInt16,
  Quarter UInt8,
  . . .
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (Carrier, Origin, FlightDate)
SETTINGS index_granularity = 8192
Choose order to make dependent non-key values less random. Benefits:
➔ Higher compression
➔ Better index selectivity
➔ Better PREWHERE performance
  • 27. Skip indexes cut down on I/O
SET allow_experimental_data_skipping_indices = 1;
ALTER TABLE ontime ADD INDEX dest_name Dest TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 1
ALTER TABLE ontime ADD INDEX cname Carrier TYPE set(100) GRANULARITY 1
OPTIMIZE TABLE ontime FINAL
-- In future releases:
ALTER TABLE ontime UPDATE Dest=Dest, Carrier=Carrier WHERE 1=1
GRANULARITY is a multiplier on base table granularity; bigger is less selective
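As a sketch of a query that benefits: a predicate on a rare Dest value lets the ngrambf_v1 index prune whole granules, which shows up in the server log as the "dropped N granules" messages on slide 28:

```
-- Rare destination value: the skip index prunes most granules
SELECT count(*) AS flights
FROM ontime
WHERE Dest = 'PPG'
```

The same query with a common value such as 'ATL' skips almost nothing, as the distribution table on slide 29 illustrates.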
  • 28. Indexes & PREWHERE remove granules
(Log messages)
InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "Dest = 'PPG'" moved to PREWHERE
. . .
(SelectExecutor): Index `dest_name` has dropped 55 granules.
(SelectExecutor): Index `dest_name` has dropped 52 granules.
ClickHouse applies PREWHERE on the Dest predicate and uses the index to remove granules from the scan.
  • 29. Effectiveness depends on data distribution
SELECT Year, count(*) AS flights,
  sum(Cancelled) / flights AS cancelled,
  sum(DepDel15) / flights AS delayed_15
FROM airline.ontime
WHERE [Column] = [Value]
GROUP BY Year

Column   Value  Index       Count       Rows Processed  Query Response (sec)
Dest     PPG    ngrambf_v1  525         4.30M           0.053
Dest     ATL    ngrambf_v1  9,360,581   166.81M         0.622
Carrier  ML     set         70,622      3.39M           0.090
Carrier  WN     set         25,918,402  166.24M         0.566
  • 30. Current index types
Name         What it tracks
minmax       High and low range of data; good for numbers with strong locality like timestamps
set          Unique values
ngrambf_v1   Presence of character ngrams; works with =, LIKE, search predicates; good for long strings
tokenbf_v1   Like ngram but for whitespace-separated strings; good for searching on tags
bloomfilter  Presence of value in column
  • 31. Encodings reduce data size
CREATE TABLE test_codecs (
  a String,
  a_lc LowCardinality(String) DEFAULT a,             -- values with dictionary encoding
  b UInt32,
  b_delta UInt32 DEFAULT b Codec(Delta),             -- differences between values
  b_delta_lz4 UInt32 DEFAULT b Codec(Delta, LZ4),
  b_dd UInt32 DEFAULT b Codec(DoubleDelta),          -- differences between change of value
  b_dd_lz4 UInt32 DEFAULT b Codec(DoubleDelta, LZ4)
) Engine = MergeTree
PARTITION BY tuple() ORDER BY tuple();
  • 32. Relative sizes of column data
  • 33. But wait, there’s more! Encodings are faster
SELECT a AS a, count(*) AS c
FROM test_codecs
GROUP BY a ORDER BY c ASC LIMIT 10
. . .
10 rows in set. Elapsed: 0.681 sec. Processed 100.00 million rows, 2.69 GB (146.81 million rows/s., 3.95 GB/s.)

SELECT a_lc AS a, count(*) AS c
FROM test_codecs
GROUP BY a ORDER BY c ASC LIMIT 10
. . .
10 rows in set. Elapsed: 0.148 sec. Processed 100.00 million rows, 241.16 MB (675.55 million rows/s., 1.63 GB/s.)  ← Faster
  • 34. Overview of encodings
Name            Best for
LowCardinality  Strings with fewer than 10K values
Delta           Time series
DoubleDelta     Increasing counters
Gorilla         Gauge data (bounces around mean)
T64             Integers other than random hashes
Compression may vary across ZSTD and LZ4
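As an illustration of matching codecs to data shape, here is a hypothetical time-series table (table and column names are invented for the example; the codec choices follow the table above):

```
CREATE TABLE sensor_readings (
  ts DateTime CODEC(DoubleDelta),     -- timestamps increase at a steady rate
  sensor LowCardinality(String),      -- few distinct sensor names
  reading Float64 CODEC(Gorilla)      -- gauge data bouncing around a mean
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (sensor, ts)
```

The slide 39 query against system.columns is a quick way to confirm whether a chosen codec actually shrinks the column.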
  • 35. Use mat views to boost performance further
CREATE MATERIALIZED VIEW ontime_daily_cancelled_mv
ENGINE = SummingMergeTree
PARTITION BY tuple()
ORDER BY (FlightDate, Carrier)
POPULATE
AS SELECT FlightDate, Carrier,
  count(*) AS flights,
  sum(Cancelled) / count(*) AS cancelled,
  sum(DepDel15) / count(*) AS delayed_15
FROM ontime
GROUP BY FlightDate, Carrier
Returns cancelled/late flights where Carrier = ‘WN’ in 0.007 seconds
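Reading the view is then an ordinary SELECT. A sketch, with one caveat: since SummingMergeTree merges rows in the background, it is safest to re-aggregate at query time rather than assume rows are fully merged:

```
-- Re-aggregate the pre-summed daily counts for one carrier
SELECT FlightDate, sum(flights) AS flights
FROM ontime_daily_cancelled_mv
WHERE Carrier = 'WN'
GROUP BY FlightDate
ORDER BY FlightDate
```

The view holds one row per (FlightDate, Carrier) instead of 174M raw rows, which is where the 0.007-second response comes from.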
  • 36. More things to think about
● Use smaller datatypes wherever possible
● Use ZSTD compression (slower but better ratio)
● Use dictionaries instead of joins
● Use sampling when approximate answers are acceptable
● Shard/replicate data across a cluster for large datasets
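For the "dictionaries instead of joins" point, a hedged sketch: assuming a dictionary named carrier_names with a string attribute name has been configured (both names are invented here), dictGetString replaces a join against a small lookup table:

```
-- Look up carrier names from an external dictionary instead of joining
-- (carrier_names dictionary is assumed to exist in the server config)
SELECT
  Carrier,
  dictGetString('carrier_names', 'name', tuple(Carrier)) AS carrier_name,
  count(*) AS flights
FROM ontime
GROUP BY Carrier, carrier_name
```

The dictionary lives in memory on each server, so the lookup is a hash probe rather than a join operator in the query plan.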
  • 38. Use system.parts to track partition content
SELECT database, table, count(*) AS parts,
  uniq(partition) AS partitions,
  sum(marks) AS marks, sum(rows) AS rows,
  formatReadableSize(sum(data_compressed_bytes)) AS compressed,
  formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
  round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes) * 100.0, 2) AS percentage
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY database, table
ORDER BY database ASC, table ASC
  • 39. Use system.columns to check data size
SELECT table,
  formatReadableSize(sum(data_compressed_bytes)) AS tc,
  formatReadableSize(sum(data_uncompressed_bytes)) AS tu,
  sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
FROM system.columns
WHERE database = currentDatabase()
GROUP BY table
ORDER BY table
  • 40. ClickHouse has great operational metrics
Some of our favorite tables in the system database…
query_log (query history)
part_log (part merges)
text_log (system log messages)
asynchronous_metrics (background metrics)
events (cumulative event counters)
metrics (current counters)
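The cumulative counters can be sampled directly; a sketch (event names vary somewhat across ClickHouse versions):

```
-- Cumulative SELECT-related counters since server start
SELECT event, value
FROM system.events
WHERE event LIKE '%Select%'
```

Sampling the same counters before and after a workload and taking the difference gives per-workload event counts.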
  • 41. It’s easy to dump values and graph them Open files during OPTIMIZE TABLE command on 174M row table (system.metrics OpenFileForRead counter)
  • 42. And don’t forget all the great OS utilities!
● top and htop -- CPU and memory
● dstat -- I/O and network consumption
● iostat -- I/O by device
● iotop -- I/O by process
● iftop -- Network consumption by host
● perf top -- CPU utilization by system function
For a full description see Performance Analysis of ClickHouse Queries by Alexey Milovidov
  • 44. Takeaways on ClickHouse Performance
● ClickHouse performance drivers are CPU and I/O
● The system query log is key to understanding performance
● Query optimization can improve response substantially
● Restructure tables and add indexes/mat views for biggest gains
● Use the system database tables and OS tools for deeper analysis of performance
  • 45. Further resources
● Altinity Blog
● ClickHouse and the Magic of Materialized Views (Webinar)
● Performance Analysis of ClickHouse Queries by Alexey Milovidov
● ClickHouse Telegram Channel
  • 47. Additional material ● Other things to try: use_uncompressed_cache