ClickHouse Paris Meetup
18:30 ClickHouse DBMS. Introduction and Case Studies. Alexander Zaitsev, Altinity
19:15 ClickHouse at ContentSquare. Christophe Kalenzaga & Vianney Foucault
20:00 Pizza Break
20:20 Storetail Data-warehouse Project. Matthieu Jacquet, Storetail (Criteo)
20:45 Pragma Analytics Software Suite w/ ClickHouse. Matthieu Texier, Pragma Innovation
21:15 What's new in ClickHouse. Alexey Milovidov, Yandex
Oct 2, 2018
ClickHouse Analytical DBMS
Introduction
Alexander Zaitsev, Altinity
ContentSquare, Paris, 2 Oct 2018
What Is ClickHouse?
ClickHouse DBMS is
• Realtime
• Column Store
• MPP
• SQL
• Open Source
ClickHouse Timeline
• Developed at Yandex in 2012-2015
• Open Sourced June 2016
• First non-Yandex deployments Q4 2016
• Hundreds of companies by Q4 2018
Why Yet Another DBMS?
Commercial Analytical DBMS
• Vertica
• MemSQL
• Actian
• SnowFlake
• RedShift
• …
OpenSource Analytical DBMS
• InfiniDB (MariaDB cs)
• InfoBright
• MonetDB
• GreenPlum
• Spark
• …
ClickHouse
• Fast!
• Flexible!
• Free!
How Fast?
:) select count(*) from T;
SELECT count(*)
FROM T
┌───────count()─┐
│ 1261705085657 │
└───────────────┘
1 rows in set. Elapsed: 3.552 sec. Processed 1.26 trillion rows, 1.26 TB (355.22 billion rows/s., 355.22 GB/s.)
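As a quick sanity check, the reported throughput follows from the row count and the elapsed time in the output above. A minimal sketch in Python; the variable names are illustrative, and the small gap from the printed 355.22 is presumably due to the rounded elapsed time.

```python
# Figures copied from the query output above.
rows = 1_261_705_085_657
elapsed_sec = 3.552

# Throughput = rows processed / wall-clock time.
rows_per_sec = rows / elapsed_sec
print(f"{rows_per_sec / 1e9:.1f} billion rows/s")
```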
“1.1 Billion Taxi Rides Benchmarks”
http://tech.marksblogg.com/benchmarks.html
Query times in seconds.

Query 1   Query 2   Query 3   Query 4   Setup
0.034     0.061     0.178     0.498     MapD & 2-node p2.8xlarge cluster
0.051     0.146     0.047     0.794     kdb+/q & 4 Intel Xeon Phi 7210 CPUs
0.762     2.472     4.131     6.041     BrytlytDB 1.0 & 2-node p2.16xlarge cluster
1.034     3.058     5.354     12.748    ClickHouse, Intel Core i5 4670K
1.56      1.25      2.25      2.97      Redshift, 6-node ds2.8xlarge cluster
2         2         1         3         BigQuery
6.41      6.19      6.09      6.63      Amazon Athena
8.1       18.18     n/a       n/a       Elasticsearch (heavily tuned)
14.389    32.148    33.448    67.312    Vertica, Intel Core i5 4670K
22        25        27        65        Spark 2.3.0 & single i3.8xlarge w/ HDFS
35        39        64        81        Presto, 5-node m3.xlarge cluster w/ HDFS
152       175       235       368       PostgreSQL 9.5 & cstore_fdw
[Chart: query time (s) and price per week by configuration: r4.xlarge x1, i3.2xlarge x1, i3.4xlarge x1, RS dc2 x2. Price labels: $45.00, $104.00, $208.00, $950.00]
• 8 queries, 1.3B taxi rides dataset
[Chart: ClickHouse vs. Vertica by query type. Labels shown: Single Table SELECT; GROUP BY; JOIN t USING t_key WHERE <t.smth>?; JOIN (SELECT * FROM t WHERE <smth>) USING t_key]
• 19 queries, 1200M rows table, 3-node clusters
ClickHouse runs on
• Bare metal (any Linux)
• Virtualization environments:
– Kubernetes, VMware etc.
• Clouds:
– Amazon, Azure, Alibaba
Real companies are using ClickHouse for:
• Mobile App and Web analytics
• AdTech bidding analytics
• Operational Logs analytics
• DNS queries analysis
• Stock correlation analytics
• Telecom
• Security audit
• Fintech SaaS
• Manufacturing process control
• BlockChain transactions analysis
Worldwide
* www.altinity.com visits in 2018
Size does not matter
• Yandex: 500+ servers, 25B rec/day
• LifeStreet: 60 servers, 75B rec/day
• CloudFlare: 36 servers, 200B rec/day
• Bloomberg: 102 servers, 1000B rec/day
Happy Migrations!
• From MySQL/InfoBright/PostgreSQL/Spark/Elastic
• From Vertica/RedShift
SPEED!
COST!
VENDOR UN-LOCKING!
A Few Case Studies
• AdTech (ad exchange, ad server, RTB, DMP etc.)
• Ad Optimization, programmatic bidding
• A lot of data:
– 10,000,000,000+ events/day
• A lot of queries: users and algorithms
Used Vertica, but needed to move
• Data sizes constantly grow
• Estimated PBs
• Vertica license would be too expensive
… migration was not easy
* More details at October 2017 Berlin Meetup
Major Design Decisions
• Dictionaries for star-schema design
• Extensive use of Arrays
• SummingMergeTree for realtime aggregation
• Smart query generation
• Multiple shards and replicas
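To illustrate the SummingMergeTree item above: the engine collapses rows that share the same sorting key and sums their numeric columns during background merges. The sketch below is a toy Python model of that idea only, not ClickHouse code; the function name, column names, and sample events are hypothetical.

```python
from collections import defaultdict


def summing_merge(rows, key_cols, sum_cols):
    """Collapse rows that share the same key, summing the numeric columns."""
    totals = defaultdict(lambda: [0] * len(sum_cols))
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        for i, col in enumerate(sum_cols):
            totals[key][i] += row[col]
    return [
        {**dict(zip(key_cols, key)), **dict(zip(sum_cols, sums))}
        for key, sums in sorted(totals.items())
    ]


# Hypothetical ad events: (date, campaign) is the key, counters are summed.
events = [
    {"date": "2018-10-02", "campaign": "a", "impressions": 10, "clicks": 1},
    {"date": "2018-10-02", "campaign": "a", "impressions": 5, "clicks": 0},
    {"date": "2018-10-02", "campaign": "b", "impressions": 7, "clicks": 2},
]
merged = summing_merge(events, ["date", "campaign"], ["impressions", "clicks"])
for row in merged:
    print(row)
```

In ClickHouse itself the merge happens in the background and may not be complete at query time, so queries typically still aggregate with sum() and GROUP BY.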
Results
• Successful migration, 1y+ in production
• Better performance and flexibility
– 75B rows/day
– 1M rows/sec in peak hours
– 1.3M SQL queries /day
• 30% hardware cost reduction (less expensive storage)
• No license cost and limits:
– 3PB of raw data
– 6,000 billion rows
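The daily and peak ingestion figures above are consistent with each other, as a small calculation shows. A minimal sketch in Python; it assumes load spread evenly over 24 hours, which real traffic will not follow.

```python
# Daily volume quoted in the results above.
rows_per_day = 75e9
seconds_per_day = 24 * 60 * 60

# Average rate if rows arrived evenly around the clock (an assumption).
avg_rows_per_sec = rows_per_day / seconds_per_day
print(f"average: {avg_rows_per_sec:,.0f} rows/sec")
```

The average comes out a little under the quoted 1M rows/sec peak, which is what one would expect when peak hours run above the daily mean.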
Case 2. Fintech Company
• Stock Symbols Correlation Analysis
• 5000 Symbols
• 10 years of data
100B data points
Challenge
• (time, symbol, price) – 100 billion
• log_return = runningDifference(log(price)) – 100 billion times
• corr(s1,s2) = corr(log_return(s1),log_return(s2))
For every pair (s1,s2) from 5000 s(i), 12.5M pairs overall
• Group by hours
Calculate 12,500,000 times
For every hour!
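The calculation described above can be sketched in plain Python. This is an illustration only: the helper names and prices are made up. Unlike ClickHouse's runningDifference, which returns 0 for the first row, this sketch simply drops the first element.

```python
import math


def log_returns(prices):
    """Differences of consecutive log prices."""
    logs = [math.log(p) for p in prices]
    return [b - a for a, b in zip(logs, logs[1:])]


def pearson_corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Unordered pairs among 5000 symbols: the "12.5M pairs" on the slide.
pairs = math.comb(5000, 2)
print(pairs)  # 12497500

# Made-up prices for two symbols.
s1 = [100.0, 101.0, 100.5, 102.0, 103.0]
s2 = [50.0, 50.6, 50.2, 51.1, 51.5]
pair_corr = pearson_corr(log_returns(s1), log_returns(s2))
print(pair_corr)
```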
Very slow
Tried…
• Hadoop
• Spark
• Greenplum
ClickHouse
[Diagram: data flow across boxes labeled S1, S2, S3 and R]
• (time, symbol, price)
• (time, symbol, logReturn(price))
• (time, groupArray(symbol), groupArray(logRet..))
• 2500 tasks
• (date+hour, corr(S(i),S(j)))
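The array-based step of this pipeline can be approximated in Python as follows. This is a rough sketch, not the queries used in the project. It assumes log returns are already computed, that every symbol has the same number of aligned points within an hour, and that no series is constant. All names and sample values are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def corr(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def hourly_correlations(rows):
    """rows: iterable of (hour, symbol, log_return) tuples.

    Returns {(hour, s1, s2): correlation} for every symbol pair in each hour.
    """
    # Step 1: one array of log returns per (hour, symbol),
    # loosely analogous to the groupArray step in the diagram.
    arrays = defaultdict(lambda: defaultdict(list))
    for hour, symbol, value in rows:
        arrays[hour][symbol].append(value)

    # Step 2: correlate every pair of arrays within the same hour.
    result = {}
    for hour, by_symbol in arrays.items():
        for s1, s2 in combinations(sorted(by_symbol), 2):
            result[(hour, s1, s2)] = corr(by_symbol[s1], by_symbol[s2])
    return result


hour = "2018-10-02 10"
rows = [
    (hour, "A", 0.01), (hour, "A", -0.02), (hour, "A", 0.03),
    (hour, "B", 0.02), (hour, "B", -0.04), (hour, "B", 0.06),
    (hour, "C", -0.01), (hour, "C", 0.02), (hour, "C", -0.03),
]
correlations = hourly_correlations(rows)
for key, value in sorted(correlations.items()):
    print(key, round(value, 6))
```

Because each hour is independent of the others, the work splits naturally into separate tasks, which fits the task-based layout in the diagram.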
POC Performance Results
• 3 servers setup
• 2 years, 5000 symbols:
– log_return calculations: ~1 h (distributed)
– Converting to arrays: ~ 1 h (almost distributed)
– Correlations: ~50 hours (also distributed)
• 12.5M/50h ≈ 70/sec
Distributed => it scales easily!
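The rate quoted above can be reproduced directly. A minimal sketch in Python; the per-server figure assumes the three servers share the work evenly, which the slide does not state.

```python
# Figures from the POC results above.
pair_count = 12_500_000
correlation_hours = 50
servers = 3

per_sec = pair_count / (correlation_hours * 3600)
per_server_per_sec = per_sec / servers  # assumes an even split across servers

print(f"{per_sec:.1f} correlations/sec overall")
print(f"{per_server_per_sec:.1f} correlations/sec per server")
```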
Case 3. Ivinco
• Mature boardreader application
• A lot of data collected from different sources
• A lot of operational data (performance monitoring)
200TB in MySQL!
Operational problems
• Hard to scale
• Hard to make HA solution
• Performance issues:
– ‘Manual’ partitioning and sharding
– Dozens of indexes per table etc.
Organizational problems
• No development resources to rewrite
• Minimal changes to current system are allowed
Binary log replication from MySQL to ClickHouse
[Diagram labels: MySQL, clickhouse-mysql, Queries, Source Data]
Results
• Seamless integration of ClickHouse into the current system
• No developers/coding involved, project is done with DevOps
• Easy to test performance side by side (ClickHouse is 100 times faster)
• Now ready to re-write main system
More details at:
https://www.altinity.com/blog/2018/6/30/realtime-mysql-clickhouse-replication-in-practice
ClickHouse Today
• Mature Analytic DBMS. Proven by many companies
• 2+ years in Open Source
• Constantly improves – new cool features were added recently
• Many community contributors
• Emerging ecosystem (tools, drivers, integrations – Tableau works!)
• Support from Altinity
Q&A
Contact me:
alexander.zaitsev@lifestreet.com
alz@altinity.com
skype: alex.zaitsev
telegram: @alexanderzaitsev
Altinity
