ClickHouse 2018
How to stop waiting for your queries
to complete and start having fun
Alexander Zaitsev
Altinity
2
Who am I
M.Sc. in mathematics from Moscow State University
Software engineer since 1997
Developing distributed systems since 2002
Focused on high-performance analytics since 2007
Director of Engineering at LifeStreet
Co-founder of Altinity – ClickHouse Service Provider
3
... and I am not Peter’s brother :)
4
What Is ClickHouse?
5
© http://mattturck.com/
6
ClickHouse DBMS is
Column Store
MPP
Realtime
SQL
Open Source
7
http://clickhouse.yandex
• Developed by Yandex for Yandex.Metrica
  - Yandex (NASDAQ: YNDX) – “Russian Google” (50% market share in search, 50+ b2b and b2c products)
  - Yandex.Metrica – the world's 2nd largest web analytics platform
• Open Source since June 2016 (Apache 2.0 license)
• 200+ companies using it in production today
• Several hundred more experimenting, doing POCs, etc.
• Dozens of contributors to the source code
8
Why Yet Another DBMS?
9
SQL
Flexible
11
Open Source Analytical DBMS
Commercial Analytical DBMS
12
ClickHouse
Fast!
Flexible!
Free!
Fun!
13
How Fast?
14
:) select count(*) from dw.ad8_fact_event;
SELECT count(*)
FROM dw.ad8_fact_event
┌───────count()─┐
│ 1261705085657 │
└───────────────┘
1 rows in set. Elapsed: 3.552 sec. Processed 1.26 trillion rows, 1.26 TB (355.22 billion rows/s., 355.22 GB/s.)
Altinity Ltd. www.altinity.com
1+ trillion rows table
15
:) select sum(price_cpm) from dw.ad8_fact_event where access_day=today()-1 and event_key=-2;
SELECT sum(price_cpm)
FROM dw.ad8_fact_event
WHERE (access_day = (today() - 1)) AND (event_key = -2)
┌────sum(price_cpm)─┐
│ 87579.09035192338 │
└───────────────────┘
1 rows in set. Elapsed: 0.168 sec. Processed 161.89 million rows, 2.91 GB (961.83 million rows/s., 17.31 GB/s.)
1+ trillion rows table
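As a sanity check on the two query summaries above, the reported scan rates can be re-derived from the row counts and elapsed times. This is a minimal Python sketch that uses only figures printed in the client output; the helper name rows_per_second is made up for illustration, and small differences from the reported rates are expected because the displayed elapsed time is rounded.

```python
def rows_per_second(rows: float, elapsed_sec: float) -> float:
    """Average scan rate implied by a row count and an elapsed time."""
    return rows / elapsed_sec


# Figures copied from the two client outputs above.
TOTAL_ROWS = 1_261_705_085_657      # result of count(*)
FILTERED_ROWS = 161.89e6            # rows processed by the sum(price_cpm) query

full_count = rows_per_second(TOTAL_ROWS, 3.552)
filtered_sum = rows_per_second(FILTERED_ROWS, 0.168)

# Share of the table that the filtered query had to read.
scanned_fraction = FILTERED_ROWS / TOTAL_ROWS

print(f"{full_count / 1e9:.1f} billion rows/s")
print(f"{filtered_sum / 1e6:.1f} million rows/s")
print(f"{scanned_fraction:.4%} of rows scanned")
```

Both rates land within a fraction of a percent of the reported 355.22 billion and 961.83 million rows/s. The second query touches only about 0.013% of the table's rows, which is consistent with its sub-second elapsed time.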
16
WikiStat data, 28B rows.
https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/
17
Query 1  Query 2  Query 3  Query 4  Setup
0.034    0.061    0.178    0.498    MapD & 2-node p2.8xlarge cluster
0.051    0.146    0.047    0.794    kdb+/q & 4 Intel Xeon Phi 7210 CPUs
-        2.415    3.599    4.962    ClickHouse at Altinity demo server
0.762    2.472    4.131    6.041    BrytlytDB 1.0 & 2-node p2.16xlarge cluster
1.034    3.058    5.354    12.748   ClickHouse, Intel Core i5 4670K
1.56     1.25     2.25     2.97     Redshift, 6-node ds2.8xlarge cluster
2        2        1        3        BigQuery
6.41     6.19     6.09     6.63     Amazon Athena
8.1      18.18    n/a      n/a      Elasticsearch (heavily tuned)
14.389   32.148   33.448   67.312   Vertica, Intel Core i5 4670K
22       25       27       65       Spark 2.3.0 & single i3.8xlarge w/ HDFS
35       39       64       81       Presto, 5-node m3.xlarge cluster w/ HDFS
152      175      235      368      PostgreSQL 9.5 & cstore_fdw
“1.1 Billion Taxi Rides Benchmarks” (query times in seconds)
http://tech.marksblogg.com/benchmarks.html
18
2016 LifeStreet benchmark (unpublished)
• 19 queries, 1200M-row table, 3-node clusters
19
Time Series benchmarks
(first time today!)
https://github.com/timescale/tsbs
Benchmark suite to automate testing
Loads 103M rows, 10 metrics per row
Runs 15 queries, 1000 runs each in 8 parallel threads
Supports TimescaleDB, InfluxDB, Cassandra, MongoDB and ClickHouse
(Altinity PR is submitted)
20
[Chart: load time (s) for ClickHouse, TimescaleDB and InfluxDB]
21
[Chart: “Light” queries, time in ms, for ClickHouse, TimescaleDB and InfluxDB]
22
[Chart: “Heavy” queries, time in sec, for ClickHouse, TimescaleDB and InfluxDB]
23
How flexible?
24
ClickHouse runs on
Bare metal (any Linux)
Amazon
Azure
VMware, VirtualBox
Docker, K8s
25
ClickHouse solves business problems in:
Mobile App and Web analytics
AdTech bidding analytics
Operational Logs analytics
DNS queries analysis
Stock correlation analytics
Telecom
Security audit
Fintech SaaS
Manufacturing process control
BlockChain transactions analysis
26
Worldwide
* www.altinity.com visits in 2018
27
Size does not matter
Yandex: 500+ servers, 25B rec/day
LifeStreet: 60 servers, 75B rec/day
CloudFlare: 36 servers, 200B rec/day
Bloomberg: 102 servers, 1000B rec/day
Toutiao: 400 servers, moving to 1000 this month
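The "size does not matter" point becomes clearer when the figures above are turned into records per server per day. This is a rough Python sketch using only the numbers on the slide; Yandex's "500+" is taken as exactly 500 (an assumption that overstates its per-server rate), Toutiao is left out because no record rate is given, and per_server_per_day is an illustrative name.

```python
# (servers, records per day) as listed on the slide.
# Assumption: "500+" servers for Yandex is treated as 500.
clusters = {
    "Yandex": (500, 25e9),
    "LifeStreet": (60, 75e9),
    "CloudFlare": (36, 200e9),
    "Bloomberg": (102, 1000e9),
}


def per_server_per_day(servers: int, records_per_day: float) -> float:
    """Average number of records one server ingests per day."""
    return records_per_day / servers


rates = {name: per_server_per_day(*spec) for name, spec in clusters.items()}

for name, rate in rates.items():
    print(f"{name}: {rate / 1e9:.2f}B rec/day per server")
```

Under these assumptions the per-server load ranges from about 0.05B to about 9.8B records per day, so cluster size alone says little about how much data a deployment handles.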
28
How fun ☺
life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
29
with (select groupArray(C) from C) as Ca
select id,
  groupArray(S) Sa, groupArray(V) Va, groupArray(D) Da, groupArray(P) Pa,
  arrayMap(c -> arrayFirstIndex(s -> s > c, Sa) - 1, Ca) Ka,
  arrayMap((c, k) -> Va[k] + (Va[k+1] - Va[k]) / (Sa[k+1] - Sa[k]) * (c - Sa[k]), Ca, Ka) Ta,
  arrayMap(s -> arrayFirstIndex(c -> c > s, Ca) > 0 ? arrayFirstIndex(c -> c > s, Ca) - 1 : toInt32(length(Ca)), Sa) Ja,
  arrayMap(i -> Ta[i], Ja) Ra,
  arrayMap((v, r) -> v - r, Va, Ra) ARa,
  arraySum((x, y, z) -> x * y * z, ARa, Da, Pa) result
from T group by id
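One way to read the Ka and Ta expressions in the query above: for each point c in Ca, Ka locates the segment of Sa that contains c, and Ta linearly interpolates V at c within that segment. The Python sketch below covers only those two steps. It assumes Sa is sorted ascending and that every c falls inside the range covered by Sa, neither of which the slide states. The names first_index and interpolate are illustrative; first_index mimics the 1-based result of arrayFirstIndex, which returns 0 when nothing matches.

```python
from typing import Callable, List, Sequence, Tuple


def first_index(pred: Callable[[float], bool], xs: Sequence[float]) -> int:
    """1-based index of the first element satisfying pred, or 0 if none does."""
    for i, x in enumerate(xs, start=1):
        if pred(x):
            return i
    return 0


def interpolate(
    Sa: Sequence[float], Va: Sequence[float], Ca: Sequence[float]
) -> Tuple[List[int], List[float]]:
    """Return (Ka, Ta): the 1-based segment index and interpolated value per c."""
    # Ka[j]: 1-based index of the last s that is not greater than Ca[j].
    Ka = [first_index(lambda s, c=c: s > c, Sa) - 1 for c in Ca]
    Ta = []
    for c, k in zip(Ca, Ka):
        lo, hi = k - 1, k  # convert the 1-based k and k+1 to 0-based positions
        slope = (Va[hi] - Va[lo]) / (Sa[hi] - Sa[lo])
        Ta.append(Va[lo] + slope * (c - Sa[lo]))
    return Ka, Ta


Ka, Ta = interpolate(Sa=[0, 10, 20], Va=[0.0, 100.0, 300.0], Ca=[5, 15])
print(Ka)  # [1, 2]
print(Ta)  # [50.0, 200.0]
```

The remaining expressions (Ja, Ra, ARa, result) map the interpolated values back onto the S grid, take the differences against V, and sum them weighted by D and P.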
30
What’s new in 2018
• Table functions mysql/odbc/file/http
• clickhouse-copier
• Predicate pushdown for views/subselects
• LowCardinality datatype
• Decimal datatype
• JOIN enhancements
• ALTER TABLE UPDATE/DELETE
• WITH ROLLUP
… and tons of performance improvements and small features
31
More user friendly than ever!
• GDPR compliance – thanks to UPDATE/DELETE
• Easier BI integration – thanks to SQL compatibility changes and improvements in ODBC driver
• Easier cluster operation – thanks to clickhouse-copier, distributed DDL
• Easier integration with other systems, thanks to:
  • Table functions
  • Kafka storage engine
  • Logs integration with Logstash, ClickTail
  • clickhouse-mysql for migration from MySQL
32
Case Study: Ivinco jumps on to ClickHouse
Supports the mature boardreader system
A lot of data collected from different sources
A lot of operational data (performance monitoring)
200TB in MySQL!
33
Operational problems
Hard to scale
Hard to build an HA solution
Performance issues:
• ‘Manual’ partitioning and sharding
• Dozens of indexes per table etc.
34
Organizational problems
No development resources to rewrite
Minimal changes to current system are allowed
35
Binary log replication from MySQL to ClickHouse
[Diagram labels: Source Data, MySQL, clickhouse-mysql, Queries]
See details at:
https://www.altinity.com/blog/2018/6/30/realtime-mysql-clickhouse-replication-in-practice
36
Results
Seamless integration of ClickHouse into the current system
No developers or coding involved; the project was done by DevOps
Easy to test performance side by side
ClickHouse is 100 times faster
Now ready to re-write main system
37
More ways to integrate with MySQL
• mysql() table function
• MySQL table engine
• MySQL external dictionaries
• ProxySQL
38
mysql() table function
select * from mysql('host:port', 'database', 'table', 'user', 'password');
https://www.altinity.com/blog/2018/2/12/aggregate-mysql-data-at-high-speed-with-clickhouse
• Easiest and fastest way to get data from MySQL
• Load to CH table and run queries much faster
39
MySQL table engine
CREATE TABLE …
Engine = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
• SELECTs and INSERTs!
• No caching, data is queried from remote server
https://clickhouse.yandex/docs/en/operations/table_engines/mysql/
40
MySQL external dictionaries
• Make data from a MySQL database accessible in ClickHouse queries
• Stored in memory
• Updated when the source data changes
SELECT dictGetString('dim_geo', 'country_name', geo_key) country_name,
       sum(imps)
FROM T
GROUP BY country_name;
41
Accessing ClickHouse from MySQL
42
ClickTail
• Log ingestion based on honeycomb.io
• Understands Nginx Access Log, MySQL Slow Log, MySQL Audit Logs, MongoDB and Regex Custom Format
• Easily extensible with other formats
https://github.com/Altinity/clicktail
https://www.altinity.com/blog/2018/3/12/clicktail-introduction
https://www.percona.com/blog/2018/02/28/analyze-raw-mysql-query-logs-clickhouse/
https://www.percona.com/blog/2018/03/29/analyze-mysql-audit-logs-clickhouse-clicktail/
43
Kafka Engine
[Diagram: Kafka → table with Engine = Kafka → MV (materialized view) → table with Engine = MergeTree; the last three sit inside ClickHouse]
https://clickhouse.yandex/docs/en/operations/table_engines/kafka/
44
“Secret” Roadmap disclosed
ANSI SQL JOIN support:
• Multi-table joins – Q1/2019
• Merge joins – Q2/2019
Protobuf/Parquet formats – Q4/2018
Per column compression/encoding settings – Q4/2018
Dictionary DDLs – Q1/2019
Secondary indexes – Q2/2019
LDAP integration, security enhancements – Q2/2019
45
ClickHouse Today
Mature Analytic DBMS. Proven by many companies
2+ years in Open Source
Constantly improves – new cool features were added recently
Many community contributors
Emerging ecosystem
Support from Altinity
46
ClickHouse Today
47
Q&A
Contact me:
alexander.zaitsev@lifestreet.com
alz@altinity.com
skype: alex.zaitsev
telegram: @alexanderzaitsev
Altinity
