Your first ClickHouse data warehouse
Robert Hodges - 2 December 2020
SF Bay Area ClickHouse Meetup
Presenter and Company Bio
www.altinity.com
Enterprise provider for ClickHouse, a popular, open source data warehouse. Community sponsor and major committer to the ClickHouse project.
Robert Hodges - Altinity CEO
30+ years on DBMS plus virtualization and security. Using Kubernetes since 2018.
Introducing ClickHouse
ClickHouse is an open source data warehouse
● Single binary
● Understands SQL
● Runs on bare metal to cloud
● Stores data in columns
● Parallel and vectorized execution
● Scales to many petabytes
● Is open source (Apache 2.0)
And it’s really fast!
Installing ClickHouse goodness on Linux
# UBUNTU/DEBIAN INSTALL
sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 \
--recv E0C56BD4
echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee \
/etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
sudo systemctl start clickhouse-server
Available as: Debian packages, tarballs, RPMs
ClickHouse goodness delivered by Docker
# --ulimit makes ClickHouse happy, --volume persists data, -p makes ports visible
mkdir $HOME/clickhouse-data
docker run -d --name clickhouse-server \
--ulimit nofile=262144:262144 \
--volume=$HOME/clickhouse-data:/var/lib/clickhouse \
-p 8123:8123 -p 9000:9000 \
yandex/clickhouse-server
Is there ClickHouse cloud goodness?
YES!
● Yandex Managed Service for ClickHouse -- Runs in Yandex.Cloud
● Altinity.Cloud -- Runs in Amazon Public Cloud
Where is the documentation?
https://clickhouse.tech/
Getting started with app development
First step: The ClickHouse Tutorial
https://clickhouse.tech/docs/en/getting-started/tutorial/
Second step: Design table(s) and load data
CREATE TABLE meetup.readings (
sensor_id Int32,
time DateTime,
date Date,
temperature Decimal(5,2)
)
Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (sensor_id, time);
● Don’t stress about data types
● Use MergeTree table types
● Partition by month or day
● Sort by “keys” to find data
● LZ4 compression by default
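One rough way to picture what PARTITION BY and ORDER BY do to incoming rows is the plain-Python sketch below. It is not ClickHouse code: the helper names `to_yyyymm` and `build_parts` and the sample rows are made up for illustration, and real parts also involve merges, indexes, and compression.

```python
from collections import defaultdict
from datetime import datetime


def to_yyyymm(ts: datetime) -> int:
    """Mimic toYYYYMM(time): 2019-01-15 becomes 201901."""
    return ts.year * 100 + ts.month


def build_parts(rows):
    """Group rows by partition key, then sort each group by (sensor_id, time)."""
    parts = defaultdict(list)
    for row in rows:
        parts[to_yyyymm(row["time"])].append(row)
    for part_rows in parts.values():
        part_rows.sort(key=lambda r: (r["sensor_id"], r["time"]))
    return dict(parts)


# Hypothetical sample rows, deliberately out of order.
rows = [
    {"sensor_id": 2, "time": datetime(2019, 1, 1, 0, 1), "temperature": 50.10},
    {"sensor_id": 1, "time": datetime(2019, 2, 1, 0, 0), "temperature": 43.31},
    {"sensor_id": 1, "time": datetime(2019, 1, 1, 0, 0), "temperature": 43.35},
]
parts = build_parts(rows)
for partition, part_rows in sorted(parts.items()):
    print(partition, [(r["sensor_id"], r["time"].isoformat()) for r in part_rows])
```

Queries that filter on month can skip whole partitions, and queries that filter on sensor_id benefit from the sort order inside each partition.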
Your friend: the MergeTree table type
[Diagram: a Table is made of Parts. Rows in each Part match the PARTITION BY expression. Each Part holds a sparse Index plus Columns sorted on the ORDER BY columns, and column data is stored in compressed blocks.]
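The sparse index idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the ClickHouse implementation: the function names are invented, and the granule size of 4 is chosen only to keep the example small (ClickHouse defaults to 8192 rows per granule).

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # tiny on purpose; the ClickHouse default index_granularity is 8192


def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep only the first key of every granule of sorted rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def granules_to_scan(index, key):
    """Return the granule numbers that may contain `key`."""
    first = max(bisect_left(index, key) - 1, 0)
    last = bisect_right(index, key)
    return range(first, last)


# 4 sensors x 5 readings, already sorted by (sensor_id, minute).
keys = [(sensor, minute) for sensor in range(4) for minute in range(5)]
index = build_sparse_index(keys)
print(index)
print(list(granules_to_scan(index, (1, 0))))
```

The index holds one entry per granule instead of one per row, so it stays small enough to keep in memory while still narrowing a lookup to a handful of granules.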
Popular formats for loading data
CSVWithNames
"sensor_id","time","date","temperature"
0,"2019-01-01 00:00:00","2019-01-01",43.31
0,"2019-01-01 00:01:00","2019-01-01",43.35
JSONEachRow
{"sensor_id":0,"time":"2019-01-01 00:00:00","date":"2019-01-01",...}
{"sensor_id":0,"time":"2019-01-01 00:01:00","date":"2019-01-01",...}
{"sensor_id":0,"time":"2019-01-01 00:02:00","date":"2019-01-01",...}
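If the data starts out in application code, files in both formats can be produced with the Python standard library. The sketch below reuses the two sample rows from the slide; the helper names are made up, and the quoting style (strings quoted, numbers bare) is simply chosen to match the sample above.

```python
import csv
import io
import json

COLUMNS = ["sensor_id", "time", "date", "temperature"]
rows = [
    (0, "2019-01-01 00:00:00", "2019-01-01", 43.31),
    (0, "2019-01-01 00:01:00", "2019-01-01", 43.35),
]


def to_csv_with_names(rows):
    """Header line with column names, then one CSV line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


def to_json_each_row(rows):
    """One compact JSON object per line, no enclosing array."""
    return "".join(
        json.dumps(dict(zip(COLUMNS, row)), separators=(",", ":")) + "\n"
        for row in rows
    )


csv_text = to_csv_with_names(rows)
json_text = to_json_each_row(rows)
print(csv_text)
print(json_text)
```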
Loading through clickhouse-client
# Load CSV
cat readings.csv | \
clickhouse-client \
--query "INSERT INTO meetup.readings FORMAT CSVWithNames"
# Load JSON
cat readings.json | \
clickhouse-client --query "INSERT INTO meetup.readings FORMAT JSONEachRow"
Loading through table functions
# Load from a file using the file() table function.
sudo mkdir -p /var/lib/clickhouse/user_files
sudo chmod 777 /var/lib/clickhouse/user_files
sudo cp readings.json /var/lib/clickhouse/user_files
clickhouse-client
pika :) INSERT INTO meetup.readings
SELECT *
FROM file('readings.json', 'JSONEachRow',
'sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2)')
NEW: loading data from S3 (20.8+)
-- Insert from S3
INSERT INTO meetup.readings
SELECT * FROM
s3('https://s3.us-east-1.amazonaws.com/altinity-data-1/readings.csv',
'CSVWithNames',
'sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2)')
Third step: Go crazy with your own queries
https://clickhouse.tech/docs/en/sql-reference/statements/select/
But what about client libraries??
Language             Popular Drivers
C++                  https://github.com/ClickHouse/clickhouse-cpp
Golang               https://github.com/ClickHouse/clickhouse-go
Java                 https://github.com/ClickHouse/clickhouse-jdbc
ODBC                 https://github.com/ClickHouse/clickhouse-odbc
Python               https://github.com/mymarilyn/clickhouse-driver
PHP and Javascript   Use a library listed on ClickHouse.tech *or* roll your own using the ClickHouse HTTP interface
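For the "roll your own" route, a minimal sketch with only the Python standard library might look like the following. It assumes a server on localhost with the HTTP port 8123 published as in the Docker example; the helper name `build_insert_request` is invented, and the request is built but not sent so the snippet runs without a server.

```python
import urllib.parse
import urllib.request


def build_insert_request(table, fmt, payload, host="localhost", port=8123):
    """Build a POST request: the INSERT statement goes in the `query`
    URL parameter and the rows go in the request body."""
    query = f"INSERT INTO {table} FORMAT {fmt}"
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    url = f"http://{host}:{port}/?{params}"
    return urllib.request.Request(url, data=payload, method="POST")


payload = (
    b'{"sensor_id":0,"time":"2019-01-01 00:00:00",'
    b'"date":"2019-01-01","temperature":43.31}\n'
)
request = build_insert_request("meetup.readings", "JSONEachRow", payload)
print(request.get_method(), request.full_url)

# To actually send it (requires a running server):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```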
ClickHouse Database self-defense
Database Choices
Row Store vs. Column Store (“Data Warehouse”)
MySQL: Row Store Access
Read row data serially
[Diagram: six rows, each with columns a, b, c, d, ...]
Column Store Access
Read compressed columns in parallel
[Diagram: six rows, each with columns a, b, c, d, ...]
There is no penalty for wide tables
“Pay” only for the columns you read
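A toy comparison in Python shows why width does not hurt a column store. This is a sketch of the layout idea only: the table, the padding column `note`, and the function names are made up, and "values touched" is a stand-in for I/O.

```python
# Six hypothetical rows with a wide padding column.
rows = [
    {"sensor_id": i % 3, "temperature": 40.0 + i, "note": "x" * 100}
    for i in range(6)
]

# Row store: records are kept whole, one after another.
row_store = rows
# Column store: one list per column.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def max_temp_row_store(store):
    """Scan whole records; every column of every row is touched."""
    values_touched = 0
    best = None
    for record in store:
        values_touched += len(record)
        t = record["temperature"]
        best = t if best is None else max(best, t)
    return best, values_touched


def max_temp_column_store(store):
    """Read only the one column the query needs."""
    column = store["temperature"]
    return max(column), len(column)


print(max_temp_row_store(row_store))
print(max_temp_column_store(column_store))
```

Adding more columns to the table raises the cost of the row-store scan but leaves the column-store scan unchanged.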
Compression makes data even smaller
Data Type                Codec         Compression
LowCardinality(String)   (none)        LZ4
UInt32                   DoubleDelta   ZSTD(1)
Optimize compression to reduce I/O!
CREATE TABLE billy.readings (
sensor_id Int32 Codec(DoubleDelta, ZSTD(1)),
time DateTime Codec(DoubleDelta, ZSTD(1)),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, ZSTD(1))
)
Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (sensor_id, time);
Callouts: Codec (DoubleDelta, T64), Compression (ZSTD(1)), Computed value (date ALIAS toDate(time))
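To see why a codec in front of a general-purpose compressor helps, here is a small Python sketch of the delta-of-delta idea on evenly spaced timestamps. It is an approximation under assumptions: the real DoubleDelta codec uses bit packing, and `zlib` stands in for ZSTD here only because it ships with Python.

```python
import struct
import zlib


def double_delta(values):
    """Return the first value, the first delta, then deltas of deltas."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    deltas_of_deltas = [b - a for a, b in zip(deltas, deltas[1:])]
    return values[:1] + deltas[:1] + deltas_of_deltas


def packed_size(values):
    """Size in bytes after packing as signed 64-bit ints and compressing."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw))


# One reading per minute starting at 2019-01-01 00:00:00 UTC.
timestamps = [1546300800 + 60 * i for i in range(10_000)]
encoded = double_delta(timestamps)

print("raw + zlib:         ", packed_size(timestamps))
print("double delta + zlib:", packed_size(encoded))
```

With a fixed sampling interval almost every encoded value is zero, which is exactly the kind of input a compressor handles best.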
Query system.columns to see compression
[Screenshot: query output. Percentages shown: 3.22%, 0.13%, 3.34%, 0.14%, 43.8%, 29.3%]
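The query itself did not survive extraction, so the following is a guess at its general shape rather than the presenter's query. It relies on the `data_compressed_bytes` and `data_uncompressed_bytes` columns of `system.columns`; the helper `compressed_percent` and the byte counts passed to it are hypothetical.

```python
# SQL to run through clickhouse-client or any driver (assumed shape, not the
# presenter's original query).
COMPRESSION_QUERY = """
SELECT name,
       data_compressed_bytes,
       data_uncompressed_bytes
FROM system.columns
WHERE database = 'billy' AND table = 'readings'
"""


def compressed_percent(compressed_bytes, uncompressed_bytes):
    """Compressed size as a percentage of uncompressed size."""
    if uncompressed_bytes == 0:
        return 0.0
    return round(100.0 * compressed_bytes / uncompressed_bytes, 2)


# Made-up byte counts, only to show the arithmetic.
print(compressed_percent(50, 200))
```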
Materialized views restructure/reduce data
[Diagram: Ingest -> readings (Table, all sensor readings; Size: 544GB, Rows: 500B) -> readings_daily_mv (Materialized View, acts as a trigger) -> readings_daily (AggregatingMergeTree, daily max/min by sensor; Size: 1.7GB, Rows: 347M)]
CREATE MATERIALIZED VIEW billy.readings_daily_mv
TO billy.readings_daily AS
SELECT sensor_id, date,
minState(temperature) as temp_min,
maxState(temperature) as temp_max
FROM billy.readings
GROUP BY sensor_id, date;
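The trigger behaviour can be mimicked in plain Python to show how the target table stays small. This is a conceptual sketch only: the dictionary `readings_daily` and the function `on_insert` are invented for illustration, and in ClickHouse the partial aggregate states are combined by background merges and at query time rather than eagerly as here.

```python
from datetime import date

# Stand-in for the target table: (sensor_id, date) -> (temp_min, temp_max)
readings_daily = {}


def on_insert(block):
    """Runs once per inserted block, like the materialized view trigger."""
    # Aggregate the new block on its own first.
    partial = {}
    for sensor_id, day, temperature in block:
        lo, hi = partial.get((sensor_id, day), (temperature, temperature))
        partial[(sensor_id, day)] = (min(lo, temperature), max(hi, temperature))
    # Then combine the partial results with what is already stored.
    for key, (lo, hi) in partial.items():
        old_lo, old_hi = readings_daily.get(key, (lo, hi))
        readings_daily[key] = (min(old_lo, lo), max(old_hi, hi))


on_insert([(55, date(2019, 1, 1), 43.31), (55, date(2019, 1, 1), 43.35)])
on_insert([(55, date(2019, 1, 1), 41.00)])
print(readings_daily)
```

However many readings arrive, the target keeps one entry per sensor per day.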
Materialized views function like indexes!
SELECT max(temp_max)
FROM billy.readings_daily
WHERE sensor_id = 55
┌─max(temp_max)─┐
│ 75.91 │
└───────────────┘
1 rows in set. Elapsed: 0.011 sec. Processed 180.22 thousand rows, 1.44 MB (15.86 million rows/s., 126.84 MB/s.)
ClickHouse performance tuning is different...
The bad news…
● No query optimizer
● No EXPLAIN PLAN
● May need to move [a lot of] data for performance
The good news…
● No query optimizer!
● System log is great
● System tables are too
● Performance drivers are simple: I/O and CPU
● Constantly improving
Your friend: the ClickHouse query log
# Return messages to clickhouse-client
clickhouse-client --send_logs_level=trace
# View all log messages on server
sudo less \
/var/log/clickhouse-server/clickhouse-server.log
Strengths and weaknesses of ClickHouse
OLTP (“Online Transaction Processing”)
(-) Lots of “small” lookups
(-) Lots of updates
(-) High concurrency
(-) Consistency critical
OLAP (“Online Analytical Processing”)
(+) Very long tables
(+) Very wide tables
(+) Open ended questions
(+) Lots of aggregates
ClickHouse >> MySQL for analytic queries
More information and references
● Community docs on ClickHouse.tech
○ Everything ClickHouse
● ClickHouse YouTube Channel
○ Piles of community videos
● Altinity Blog
○ Lots of articles about ClickHouse usage
● Altinity Webinars
○ Webinars on all aspects of ClickHouse
● ClickHouse source code on GitHub
○ Check out tests for examples of detailed usage
Thank you!
We’re hiring
ClickHouse: https://github.com/ClickHouse/ClickHouse
Documentation: https://clickhouse.tech
Altinity Website: https://www.altinity.com
