ClickHouse In Real Life
Case Studies and Best Practices
Alexander Zaitsev, LifeStreet/Altinity
Percona Live 2018
Who am I
• M.Sc. in mathematics from Moscow State University
• Software engineer since 1997
• Developing distributed systems since 2002
• Focused on high-performance analytics since 2007
• Director of Engineering at LifeStreet
• Co-founder of Altinity, a ClickHouse service provider
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
...and I am not Peter’s brother
ClickHouse is
•Fast
•Flexible
•Scalable
How does it work in real life?
What Is It For?
• Fast analytical queries
• Low-latency data ingestion/aggregation
• Distributed computations
• Fault-tolerant data warehousing
All scaling from one server to thousands
Who Is It For?
• Analysts/Developers/DevOps
• who need to analyze huge amounts of data
• Startups
• building high-performance analytics with low investment
• Companies
• having performance problems with current systems
• paying too much for licenses or infrastructure
Successful Production Deployments
• DNS queries analytics (CloudFlare)
• AdTech (multiple companies worldwide)
• Operational logs analytics (multiple companies worldwide)
• Stock correlation analytics, investor tools (Canadian company)
• Hotel booking analytics SaaS (Spanish company)
• Security audit (Great Britain, USA)
• Fintech SaaS (France)
• Mobile App and Web analytics (multiple companies worldwide)
Evaluating/implementing:
• Telecom companies
• Satellite data processing
• Search engine ranking analytics
• Blockchain platform analysis
• Manufacturing process control
Happy Transitions!
• From MySQL/InfoBright/PostgreSQL/Spark to ClickHouse
• From Vertica/Redshift to ClickHouse
SPEED!
COST!
VENDOR UN-LOCKING!
Case Studies
• Migration from Vertica to ClickHouse
• Distributed Computations and Analysis of Financial Data
• Blockchain Platform Analytics
• ClickHouse with MySQL
Case 1.
• Ad Tech (ad exchange, ad server, RTB, DMP, etc.)
• Creative optimization, programmatic bidding
• A lot of data:
• 10,000,000,000+ bid requests/day
• 2-3 KB per event record (300+ dimensions)
• 90-120 days of detailed data
10B * 3 KB * [90-120] = [2.7-3.6] PB
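The volume estimate above is plain arithmetic. The Python sketch below reproduces it, assuming that "3K" means roughly 3 KB of raw, uncompressed data per event and using decimal petabytes; it estimates raw volume only, not what ClickHouse actually stores on disk.

```python
# Back-of-the-envelope raw data volume for Case 1.
# Assumption: "3K" on the slide means ~3 KB per event record.
BID_REQUESTS_PER_DAY = 10_000_000_000   # 10B+ bid requests/day
BYTES_PER_EVENT = 3_000                 # ~3 KB per event (300+ dimensions)
PB = 10**15                             # decimal petabyte

def raw_volume_pb(days: int) -> float:
    """Raw data volume in petabytes for a retention window of `days`."""
    return BID_REQUESTS_PER_DAY * BYTES_PER_EVENT * days / PB

for days in (90, 120):
    print(f"{days} days: {raw_volume_pb(days):.1f} PB")
# 90 days: 2.7 PB
# 120 days: 3.6 PB
```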
Business Requirements
• Ad-hoc analytical reports on 3 months of detail data
• Low data and query latency
• High Availability
• Tried/used/evaluated:
• MySQL (TokuDB, ShardQuery)
• InfiniDB
• MonetDB
• InfoBright EE
• ParAccel (now Redshift)
• Oracle
• Greenplum
• Snowflake DB
• Vertica
ClickHouse
Main Migration Challenges
• Efficient star-schema for OLAP
• Reliable data ingestion
• Sharding and replication
• Client interfaces
Data Load Diagram
[Diagram: on each ClickHouse node, log files are INSERTed into local temp tables and realtime producers INSERT into local Buffer tables; both feed the sharded fact tables (via INSERT and buffer flush), materialized views (MV) populate sharded SummingMergeTree tables, and MySQL supplies dictionaries]
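The diagram routes realtime producers through local Buffer tables that are periodically flushed into the fact tables, so many small inserts reach MergeTree as a few large ones. Below is a minimal Python sketch of that idea. The `BufferTable` class and its `flush_rows` threshold are hypothetical; the real ClickHouse Buffer engine flushes on combined time, row and byte thresholds rather than a row count alone.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, int]  # hypothetical (event_type, value) row

class BufferTable:
    """Accumulates small inserts in memory and writes them out as one batch.

    Simplification: only a row-count threshold triggers a flush.
    """

    def __init__(self, flush_rows: int, sink: Callable[[List[Row]], None]):
        self.flush_rows = flush_rows
        self.sink = sink              # destination, e.g. a fact-table writer
        self.pending: List[Row] = []

    def insert(self, rows: List[Row]) -> None:
        self.pending.extend(rows)
        if len(self.pending) >= self.flush_rows:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.sink(self.pending)   # one large write instead of many small ones
            self.pending = []

batches: List[List[Row]] = []
buf = BufferTable(flush_rows=3, sink=batches.append)
for i in range(7):
    buf.insert([("event", i)])        # realtime producers send tiny inserts
buf.flush()                           # final flush, e.g. on shutdown
print([len(b) for b in batches])      # [3, 3, 1]
```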
Sharding and Replication
[Diagram: Table1 split into shards S1, S2, S3, S4 … Sn, with the full set of shards held on each of Replica1, Replica2 and Replica3]
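A minimal sketch of the routing logic behind this layout, assuming hash-based sharding: MD5 stands in here for whatever sharding expression a Distributed table is configured with, and the shard count and host names are hypothetical. A well-mixed hash spreads keys evenly, which is what keeps per-shard load uniform.

```python
import hashlib
from collections import Counter
from typing import List

NUM_SHARDS = 4      # hypothetical; the diagram shows S1..Sn
NUM_REPLICAS = 3    # Replica1..Replica3, as in the diagram

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard index via a stable hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def hosts_for(key: str) -> List[str]:
    """Every replica of the chosen shard keeps a full copy of that shard's rows."""
    s = shard_for(key)
    return [f"replica{r + 1}-shard{s + 1}" for r in range(NUM_REPLICAS)]

counts = Counter(shard_for(f"user-{i}") for i in range(100_000))
print(sorted(counts.items()))   # roughly 25,000 keys per shard
print(hosts_for("user-42"))
```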
Major Design Decisions
• Dictionaries for star-schema design
• Extensive use of Arrays
• SummingMergeTree for realtime aggregation
• Smart query generation
• Multiple shards and replicas
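SummingMergeTree pre-aggregates data as parts are merged: rows with the same sorting key collapse into one row whose numeric columns are summed. The Python sketch below imitates only that merge step; the (date, campaign) key and the (impressions, clicks) columns are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]      # hypothetical sorting key: (date, campaign)
Metrics = Tuple[int, int]  # hypothetical summed columns: (impressions, clicks)

def merge_parts(parts: Iterable[List[Tuple[Key, Metrics]]]) -> Dict[Key, Metrics]:
    """Collapse rows sharing a sorting key by summing their numeric columns,
    which is what SummingMergeTree does when it merges data parts."""
    merged: Dict[Key, Metrics] = {}
    for part in parts:
        for key, (imps, clicks) in part:
            old_imps, old_clicks = merged.get(key, (0, 0))
            merged[key] = (old_imps + imps, old_clicks + clicks)
    return dict(sorted(merged.items()))

part1 = [(("2018-04-24", "c1"), (100, 2)), (("2018-04-24", "c2"), (50, 1))]
part2 = [(("2018-04-24", "c1"), (30, 1))]
print(merge_parts([part1, part2]))
# {('2018-04-24', 'c1'): (130, 3), ('2018-04-24', 'c2'): (50, 1)}
```

Because ClickHouse merges parts in the background at unpredictable times, queries against such a table should still aggregate with sum() and GROUP BY; the merge reduces the number of rows to scan, not the need to aggregate.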
Project Results
• Successful migration and cost reduction
• Increased performance and flexibility
• 60 servers in 3 replicas
• 2-3PB of data
• 6,000B+ rows in fact and aggregate tables (50B+ daily load)
• 1M+ SQL-queries/day
Powered by:
Case 2. Fintech Company
• Stock Symbols Correlation Analysis
• 5000 Symbols
• 100ms granularity
• 10 years of data
100B data points
Main Challenge
• Symbols S(1)..S(5000)
• Time points T(1)..T(300M)
• log_return(n)(m) = runningDifference(log(price(n))), i.e. log(price(n)(m)) - log(price(n)(m-1))
• corr(n1,n2) = corr(log_return(n1),log_return(n2))
• Computed for every pair (n1,n2): 5000 * 4999 / 2 ≈ 12.5M pairs altogether
Calculate 12,500,000 times!
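The computation defined above can be sketched in plain Python: log returns as differences of consecutive log prices (the job runningDifference(log(price)) does in ClickHouse), the Pearson correlation (what ClickHouse's corr aggregate computes), and one correlation per unordered pair of symbols. This is an illustration only; the function names are hypothetical and nothing here is distributed.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

def log_returns(prices: List[float]) -> List[float]:
    """Differences of consecutive log prices. ClickHouse's runningDifference
    emits 0 for the first row; this sketch simply drops it."""
    logs = [math.log(p) for p in prices]
    return [b - a for a, b in zip(logs, logs[1:])]

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def all_correlations(prices: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """One correlation per unordered pair of symbols."""
    rets = {s: log_returns(p) for s, p in prices.items()}
    return {(a, b): pearson(rets[a], rets[b]) for a, b in combinations(sorted(rets), 2)}

print(math.comb(5000, 2))  # 12497500 pairs, i.e. ~12.5M
```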
Tried…
• Hadoop
• Spark
• Greenplum
ClickHouse
Distributed Computations
• Distribute data across N servers
• Calculate log_return for every symbol at every server using Arrays:
• (timestamp, Array(String), Array(Float32))
• Distribute correlation computations across all servers
• Batch planning
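Batch planning boils down to splitting the ~12.5M symbol pairs into chunks and spreading the chunks over the servers. A minimal sketch, assuming a fixed batch size, a round-robin assignment policy (both hypothetical; the slide does not say how batches were planned), and that every server can read the log-return arrays of any symbol it is given:

```python
from itertools import combinations, islice
from typing import Iterator, List, Tuple

Pair = Tuple[int, int]

def pair_batches(num_symbols: int, batch_size: int) -> Iterator[List[Pair]]:
    """Yield every symbol pair (i < j) exactly once, in fixed-size batches."""
    pairs = combinations(range(num_symbols), 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch

def plan(num_symbols: int, batch_size: int, num_servers: int) -> List[List[List[Pair]]]:
    """Assign batches to servers round-robin (a hypothetical policy)."""
    schedule: List[List[List[Pair]]] = [[] for _ in range(num_servers)]
    for i, batch in enumerate(pair_batches(num_symbols, batch_size)):
        schedule[i % num_servers].append(batch)
    return schedule

# Small example: 10 symbols -> 45 pairs, batches of 7, 3 servers.
schedule = plan(num_symbols=10, batch_size=7, num_servers=3)
print([sum(len(b) for b in server) for server in schedule])  # [17, 14, 14]
```

Because the pairs are generated lazily, the full 12.5M-pair space never has to sit in memory at once; only the batches themselves are materialized.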
POC Performance Results
• 3-server setup
• 2 years, 5000 symbols:
• log_return calculations: ~1 h
• Converting to arrays: ~1 h
• Correlations: ~50 hours
• 12.5M / 50 h ≈ 70 correlations/sec
And it scales easily!
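The throughput figure follows directly from the POC numbers. The sketch below checks it and adds a projection that assumes perfectly linear scaling with server count; that assumption is ours, not a measured result.

```python
PAIRS = 12_500_000   # correlations to compute (from the slide)
POC_HOURS = 50       # measured on the 3-server POC
POC_SERVERS = 3

rate = PAIRS / (POC_HOURS * 3600)
print(f"{rate:.1f} correlations/sec")   # 69.4, i.e. the ~70/sec on the slide

def hours_with(servers: int) -> float:
    """Projected runtime assuming perfectly linear scaling (an assumption)."""
    return POC_HOURS * POC_SERVERS / servers
```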
Case 3.
Bloxy.info - Ethereum network analysis
• 450M transactions
• Transaction level interactive reports
• Transaction graph navigation
• Aggregate reports
• Rich visualization
Tried
• MySQL
ClickHouse
Main Challenge:
ClickHouse is bad for point queries!
Main Design Decisions
• Encode transaction IDs to binary
• ClickHouse MergeTree with low index_granularity
• Materialized Views for different sort orders
• Apache Superset for visualization
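Ethereum transaction hashes are 32-byte values usually written as a 66-character string ("0x" plus 64 hex digits). Storing the raw bytes instead, for example in a FixedString(32) column (our assumption about the column type), roughly halves the key size that the primary index and comparisons must handle. A Python sketch of the encoding step; the function names and the sample hash are hypothetical:

```python
def encode_tx_id(tx_hash: str) -> bytes:
    """Turn a '0x'-prefixed, 64-hex-digit hash into 32 raw bytes."""
    h = tx_hash[2:] if tx_hash.startswith("0x") else tx_hash
    raw = bytes.fromhex(h)
    if len(raw) != 32:
        raise ValueError("expected a 32-byte transaction hash")
    return raw

def decode_tx_id(raw: bytes) -> str:
    """Inverse of encode_tx_id, for display."""
    return "0x" + raw.hex()

tx = "0x" + "ab" * 32            # hypothetical hash, not a real transaction
raw = encode_tx_id(tx)
print(len(tx), len(raw))         # 66 32
```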
[Dashboard screenshot: http://stat.bloxy.info/superset/dashboard/today/?standalone=true]
[Dashboard screenshot, Mystical Mixer: http://stat.bloxy.info/superset/dashboard/mixer/?standalone=true]
And more: http://bloxy.info
• Ethereum Mixer Analysis
• Token Dynamics
• Token Distribution
• ERC721 Token and Collectibles
• ICO Analysis and Trends
• Smart Contract Events and Methods
• Ethereum Mining
• DAO Efficiency Analytics
Powered by:
Case 4. ClickHouse with MySQL
• Accessing MySQL from ClickHouse
• Accessing ClickHouse from MySQL
• Streaming data from MySQL to ClickHouse
• Analyzing MySQL logs with ClickHouse
Accessing MySQL from ClickHouse
• External dictionaries from a MySQL table
• Map a MySQL table to an in-memory structure
• mysql() table function
select * from mysql('host:port', 'database', 'table', 'user', 'password');
https://www.altinity.com/blog/2018/2/12/aggregate-mysql-data-at-high-speed-with-clickhouse
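An external dictionary keeps a copy of a small dimension table in memory and refreshes it periodically, so lookups from large fact-table queries never go back to MySQL row by row. The Python sketch below imitates that behaviour. SQLite stands in for MySQL so the example runs without a server, and the `ExternalDictionary` class, the `campaigns` table and the 300-second lifetime are all hypothetical.

```python
import sqlite3
import time
from typing import Callable, Dict, Optional

class ExternalDictionary:
    """In-memory copy of a dimension table, reloaded after `lifetime` seconds."""

    def __init__(self, load: Callable[[], Dict[int, str]], lifetime: float):
        self._load = load
        self._lifetime = lifetime
        self._data: Dict[int, str] = {}
        self._loaded_at = float("-inf")   # forces a load on first lookup

    def get(self, key: int, default: Optional[str] = None) -> Optional[str]:
        if time.monotonic() - self._loaded_at > self._lifetime:
            self._data = self._load()     # re-read the whole source table
            self._loaded_at = time.monotonic()
        return self._data.get(key, default)

# SQLite stands in for the MySQL dimension table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaigns (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO campaigns VALUES (?, ?)", [(1, "spring"), (2, "summer")])

campaigns = ExternalDictionary(
    load=lambda: dict(conn.execute("SELECT id, name FROM campaigns")),
    lifetime=300.0,
)
print(campaigns.get(2))              # summer
print(campaigns.get(99, "unknown"))  # unknown
```

Changes in the source only become visible after the lifetime expires, which is the price paid for keeping lookups entirely in memory.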
Accessing ClickHouse from MySQL
Streaming Data from MySQL to ClickHouse
https://github.com/Altinity/clickhouse-mysql-data-reader
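A binlog reader turns MySQL row events into ClickHouse inserts. Since ClickHouse prefers large, infrequent INSERTs, events are usually buffered into batches first. The sketch below shows that batching step only: `RowEvent` and `stream_to_clickhouse` are hypothetical, `write_batch` stands in for a real ClickHouse client call, and updates and deletes are simply counted and skipped rather than replicated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

@dataclass
class RowEvent:
    """Hypothetical, simplified binlog row event."""
    table: str
    kind: str        # "insert", "update" or "delete"
    values: Tuple

def stream_to_clickhouse(events: Iterable[RowEvent],
                         write_batch: Callable[[str, List[Tuple]], None],
                         batch_size: int = 1000) -> int:
    """Forward INSERT events per table in batches; return how many
    non-insert events were skipped."""
    pending: Dict[str, List[Tuple]] = {}
    skipped = 0
    for ev in events:
        if ev.kind != "insert":
            skipped += 1     # append-only sketch: updates/deletes need separate handling
            continue
        rows = pending.setdefault(ev.table, [])
        rows.append(ev.values)
        if len(rows) >= batch_size:
            write_batch(ev.table, rows)
            pending[ev.table] = []
    for table, rows in pending.items():   # flush leftovers at end of stream
        if rows:
            write_batch(table, rows)
    return skipped

written: List[Tuple[str, int]] = []
events = [RowEvent("orders", "insert", (i,)) for i in range(5)]
events.append(RowEvent("orders", "delete", (0,)))
skipped = stream_to_clickhouse(events, lambda t, r: written.append((t, len(r))), batch_size=2)
print(written, skipped)   # [('orders', 2), ('orders', 2), ('orders', 1)] 1
```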
Combine together
[Diagram: Applications, ProxySQL, MySQL and a binlog reader working together with ClickHouse]
Analyzing MySQL logs with ClickHouse
• MySQL logs may grow large
• https://www.percona.com/blog/2018/02/28/analyze-raw-mysql-query-logs-clickhouse/
• https://www.percona.com/blog/2018/03/29/analyze-mysql-audit-logs-clickhouse-clicktail/
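Loading a MySQL slow query log into ClickHouse starts with flattening each entry into a row. A minimal parser sketch, assuming the common slow-log layout in which a `# Query_time: ... Lock_time: ... Rows_sent: ... Rows_examined: ...` header precedes the statement; the function name and the sample entry are hypothetical, and real logs contain more header variants than this handles.

```python
import re
from typing import Dict, List

STATS_RE = re.compile(r"(\w+): (\S+)")

def parse_slow_log(text: str) -> List[Dict[str, object]]:
    """Turn slow-log entries into flat rows suitable for a bulk INSERT."""
    rows: List[Dict[str, object]] = []
    current: Dict[str, object] = {}
    for line in text.splitlines():
        if line.startswith("# Query_time:"):
            # Numeric stats, e.g. Query_time, Lock_time, Rows_sent, Rows_examined.
            current = {k: float(v) for k, v in STATS_RE.findall(line[2:])}
            current["query"] = ""
            rows.append(current)
        elif current and not line.startswith(("#", "SET timestamp=")):
            # Statement text following the header.
            current["query"] = (str(current["query"]) + " " + line).strip()
    return rows

sample = """# Time: 2018-02-28T10:00:00.000000Z
# User@Host: app[app] @ localhost []  Id:     3
# Query_time: 1.500000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 200000
SET timestamp=1519812000;
SELECT * FROM orders WHERE status = 'new';
"""
rows = parse_slow_log(sample)
print(rows[0]["Query_time"], rows[0]["query"])  # 1.5 SELECT * FROM orders WHERE status = 'new';
```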
Main Lessons
• Schema is what matters most
• Proper data types
• Arrays
• Dictionaries
• Summing/AggregatingMergeTree for realtime aggregation
• Materialized Views if one sort key is not enough
• Reduce index granularity for point queries
• Distribute data and load as uniformly as possible
• Integrate smartly
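The advice to reduce index granularity for point queries follows from how the MergeTree primary index works: it is sparse, holding one mark per `index_granularity` rows (8192 by default), so a lookup must read at least one whole granule. The Python sketch below simulates that with a sorted list of unique keys; it models the idea and is not ClickHouse's implementation.

```python
import bisect
from typing import List, Tuple

def point_lookup(sorted_keys: List[int], key: int, granularity: int) -> Tuple[bool, int]:
    """Simulate a sparse primary index with one mark per `granularity` rows.

    Returns (found, rows_scanned). Binary search over the marks picks the
    granule that could hold `key`; that whole granule is then scanned.
    Assumes unique, sorted keys.
    """
    marks = sorted_keys[::granularity]                 # first key of each granule
    g = max(bisect.bisect_right(marks, key) - 1, 0)
    granule = sorted_keys[g * granularity:(g + 1) * granularity]
    return key in granule, len(granule)

keys = list(range(0, 1_000_000, 2))                    # 500,000 sorted keys
print(point_lookup(keys, 123_456, 8192))   # (True, 8192): default granularity
print(point_lookup(keys, 123_456, 64))     # (True, 64): low granularity
```

The trade-off is index size: a smaller granularity means more marks to keep in memory, which is why it pays off mainly for tables dominated by point lookups.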
ClickHouse is
•Fast
•Flexible
•Scalable
And it really works!
Q&A
Contact me:
alexander.zaitsev@lifestreet.com
alz@altinity.com
skype: alex.zaitsev
telegram: @alexanderzaitsev
