How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkachenko.
Data is represented as small one-dimensional arrays (vectors) that are easily accessible to CPUs.
The percentage of instructions spent in interpretation logic is reduced by a factor equal to the vector size.
The functions that perform work now typically process an array of values in a tight loop.
Tight loops can be optimized well by compilers, which can then generate SIMD instructions automatically.
Modern CPUs also do well on such loops: out-of-order execution often takes multiple loop iterations into execution concurrently, exploiting the deeply pipelined resources of modern CPUs.
It was shown that vectorized execution can improve data-intensive (OLAP) queries by a factor of 50.
* Image taken from [1]
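The amortization argument above can be made concrete with a toy interpreter. This is a minimal sketch, not ClickHouse code: `VECTOR_SIZE`, `run_tuple_at_a_time`, and `run_vectorized` are hypothetical names chosen for illustration, and counting "dispatches" is only a stand-in for real interpretation overhead.

```python
# Illustrative sketch only: a toy expression evaluator, not ClickHouse internals.
# It counts operator dispatches to show how per-vector dispatch spreads the
# interpretation overhead across VECTOR_SIZE values.

VECTOR_SIZE = 1024  # hypothetical batch size, chosen for illustration


def add_scalar(a, b):
    return a + b


def add_vector(xs, ys):
    # Tight loop over a small array; in a compiled engine this is the kind
    # of loop a compiler can turn into SIMD instructions.
    return [x + y for x, y in zip(xs, ys)]


def run_tuple_at_a_time(col_a, col_b):
    dispatches = 0
    out = []
    for a, b in zip(col_a, col_b):
        dispatches += 1  # one operator dispatch per row
        out.append(add_scalar(a, b))
    return out, dispatches


def run_vectorized(col_a, col_b, vector_size=VECTOR_SIZE):
    dispatches = 0
    out = []
    for start in range(0, len(col_a), vector_size):
        dispatches += 1  # one operator dispatch per vector
        out.extend(add_vector(col_a[start:start + vector_size],
                              col_b[start:start + vector_size]))
    return out, dispatches


col_a = list(range(8192))
col_b = list(range(8192, 16384))
rows_out, row_dispatches = run_tuple_at_a_time(col_a, col_b)
vec_out, vec_dispatches = run_vectorized(col_a, col_b)
print(row_dispatches, vec_dispatches, row_dispatches // vec_dispatches)
```

Both paths produce the same result; only the number of dispatches differs, and the ratio equals the vector size when the column length is a multiple of it. Pure Python gains little from this, since the speedups quoted on the slide come from compiled tight loops.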
SELECT foo FROM distributed_table
  • Server 1: SELECT foo FROM local_table GROUP BY col1
  • Server 2: SELECT foo FROM local_table GROUP BY col1
  • Server 3: SELECT foo FROM local_table GROUP BY col1
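The fan-out shown in the diagram can be sketched as a scatter-gather: each server aggregates its own local table, and the initiating node merges the partial results. This is a simplified model under stated assumptions, not the ClickHouse implementation: the `SHARDS` data, the function names, and the choice of a per-`col1` row count as the aggregate are all hypothetical, since the slide only says `foo`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for one server's local_table
# and holds only the col1 value of each row.
SHARDS = {
    "server1": ["a", "b", "a"],
    "server2": ["b", "c"],
    "server3": ["a", "c", "c"],
}


def local_group_by(rows):
    """Partial aggregate on one server: a row count per col1 value."""
    return Counter(rows)


def distributed_group_by(shards):
    """Initiator: send the query to every shard, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(local_group_by, shards.values()))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # counts are additive, so merging is a sum
    return dict(merged)


result = distributed_group_by(SHARDS)
print(sorted(result.items()))
```

Counts merge by simple addition. Aggregates such as averages or distinct counts need a mergeable intermediate state instead, which this sketch does not cover.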
N Servers    1        3        140
Time, sec    1.224    0.438    0.043
Speedup      -        x2.8     x28.5
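The speedup row is the single-server time divided by the time on N servers. The short check below recomputes it from the timings in the table. The per-server efficiency (speedup divided by server count) is an extra derived figure that does not appear on the slide.

```python
# Timings copied from the table: number of servers -> query time in seconds.
TIMINGS = {1: 1.224, 3: 0.438, 140: 0.043}

baseline = TIMINGS[1]
report = {}
for servers, seconds in TIMINGS.items():
    speedup = baseline / seconds      # how many times faster than one server
    efficiency = speedup / servers    # derived here, not shown on the slide
    report[servers] = (round(speedup, 1), round(efficiency, 2))
    print(f"{servers:>3} servers: x{speedup:.1f} speedup, "
          f"{efficiency:.0%} efficiency")
```

The efficiency drops sharply at 140 servers. That is expected for a query that already finishes in about a second on one machine, where fixed per-query costs come to dominate.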
[Figure: chart with axes labeled "seconds" and "query"]
ClickHouse on MemCloud
Kodiak Data and Altinity now Offer a Cloud Version of ClickHouse
1. FASTEST MPP Open Source DBMS
2. Cutting Edge Cloud for Big Data Apps and Processing
3. World-class ClickHouse Expertise
Try the ClickHouse on MemCloud demo here
http://clickhouse-demo.memcloud.works/

Editor's Notes

  • #3: Analytic database landscape:
      Commercial (fast and expensive):
        Vertica
        RedShift
        Teradata
        etc.
      Open source (somewhat slow, buggy, but free):
        InfiniDB (part of MariaDB now)
        InfoBright
        GreenPlum (started as commercial)
        Hadoop systems
      ClickHouse: fast and free!
    ClickHouse story:
      Yandex: the Russian Google
      Yandex Metrika: the Russian Google Analytics
      Interactive ad hoc reports at multiple petabytes
      That's why they developed ClickHouse
      ClickHouse is extremely fast and scalable.
    Why ClickHouse is so fast:
      Popular Yandex answer: because they had no choice
      Technical details:
        Vectorized processing (see VectorWise)
        True MPP
        True shared nothing
        True column store with late materialization (like C-Store and Vertica, but unlike many others):
          Data compression
          Column locality
          No random reads
      Some technical details (in Russian): https://clickhouse.yandex/presentations/meetup7/internals.pdf
    What is a column store (I think it is important to explain)
    Why it is good for queries like range scan + aggregation (look at the Yandex presentation above, there are examples)
    Conclusion: such an architecture allows very fast queries on a single table with filters and GROUP BYs
    Not enough speed: let's see how data distribution works
    Care about reliability: let's see how replication is set up (again, can use the Yandex slides above as the source)
    Benchmark 1
    Benchmark 2
    Benchmark 3
    A few words on limitations:
      Custom SQL dialect
      As a consequence, a limited ecosystem (cannot fit into the standard one)
      No deletes/updates:
        but there are mutable table types (engines)
        there is a way to connect to external updatable data (dictionaries)
      Somewhat hard to manage: no tools
    Final word:
      Potential to be the MySQL for analytics
      Invite to try
      Need more info: http://clickhouse.yandex
      Need consulting/support: http://www.altinity.com
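The notes' point about column stores suiting "range scan + aggregation" queries can be illustrated with a toy comparison. This is a sketch under simplifying assumptions, not a model of ClickHouse storage: the table, the column names (`day`, `url`, `bytes`), and the `values_read` counter are hypothetical, and a real column store reads compressed blocks rather than individual positions.

```python
# Toy data: the same three-column table held in two layouts.
ROWS = [
    {"day": 1, "url": "/a", "bytes": 100},
    {"day": 2, "url": "/b", "bytes": 200},
    {"day": 3, "url": "/a", "bytes": 300},
    {"day": 4, "url": "/c", "bytes": 400},
]
COLUMNS = {name: [row[name] for row in ROWS] for name in ("day", "url", "bytes")}


def sum_bytes_row_store(rows, day_from, day_to):
    """Row layout: every column of every row is read, even unused ones."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched
        if day_from <= row["day"] <= day_to:
            total += row["bytes"]
    return total, values_read


def sum_bytes_column_store(columns, day_from, day_to):
    """Column layout: scan the filter column, then fetch only what matches."""
    days = columns["day"]
    positions = [i for i, day in enumerate(days) if day_from <= day <= day_to]
    values_read = len(days)
    # Late materialization: "bytes" is touched only at matching positions,
    # and "url" is never read at all.
    total = sum(columns["bytes"][i] for i in positions)
    values_read += len(positions)
    return total, values_read


row_total, row_reads = sum_bytes_row_store(ROWS, 2, 3)
col_total, col_reads = sum_bytes_column_store(COLUMNS, 2, 3)
print(row_total, row_reads, col_total, col_reads)
```

Both layouts return the same sum, but the column layout reads fewer values, and the gap grows with the number of columns the query does not need.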