June 7th, 2022 - My first 90 days with ClickHouse - Alkin Tezuysal - EVP Global Services - ChistaDATA Inc.
Let’s get connected with Alkin first
● Alkin Tezuysal - EVP - Global Services @chistadata
○ LinkedIn: https://www.linkedin.com/in/askdba/
○ Twitter: https://twitter.com/ask_dba
● Open Source Database Evangelist
○ Previously PlanetScale, Percona and Pythian as Technical Manager, SRE, DBA (MySQL)
○ Previously Enterprise DBA (Informix, Oracle, DB2, SQL Server)
● Author, Speaker, Mentor, and Coach
@ChistaDATA Inc. 2022
@ask_dba
Also…
Someone who is Born to Sail
Forced to Work
Trivia Question?
The left and right sides of the boat are referred to as what?
About MySQL Cookbook 4th Edition
● O’Reilly book; the previous 3 editions were authored by Paul DuBois
● Solutions for Database Developers and Administrators
● More than 950 pages of recipes for specific database challenges
● It took two years of authoring, rewriting, reviewing, editing and learning.
● Co-authored with Sveta Smirnova (@svetsmirnova) - MySQL Expert / Author, Percona
About another book…
By Vijay Anand
● Database Fundamentals overview
● Comparison and examples from different data stores
● Techniques, tips and tricks for ClickHouse
● Great overview and summary for beginners
About ClickHouse
● Columnar Storage
● SQL Compatible
● Open Source (Apache 2.0)
● Shared Nothing Architecture
● Parallel Execution
● Rich in Aggregate Functions
● Super fast for analytics workloads
○ Compression and Encoding
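Columnar storage is also what makes compression effective: the values of one column sit next to each other and often repeat or change by small steps. The Python sketch below shows two such ideas, delta encoding and run-length encoding, on hypothetical data. ClickHouse ships its own per-column codecs (for example `Delta`, `DoubleDelta`, `LZ4` and `ZSTD`); the functions here are simplified stand-ins for the concept, not those implementations.

```python
from itertools import accumulate, groupby


def delta_encode(values):
    """Keep the first value, then store each value as the difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    return list(accumulate(deltas))


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical sorted id column: after delta encoding it is almost all 1s...
order_ids = list(range(1000, 2000))
encoded_ids = run_length_encode(delta_encode(order_ids))
print(encoded_ids)  # [(1000, 1), (1, 999)]

# ...and a hypothetical low-cardinality column collapses into a few runs.
categories = ["stationery"] * 3 + ["gifts"]
print(run_length_encode(categories))  # [('stationery', 3), ('gifts', 1)]
```

A thousand ids shrink to two pairs because the encodings exploit exactly the regularity that storing a column contiguously exposes.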
Other ClickHouse features
● Engine types for analytical workloads
● Materialized Views
● External data connectors
● Data types for compatibility with other sources
Columnar Storage
orders
order_id 1 2 3 4
order_code AB-01 AB-02 AB-02 AB-03
order_amount 2.99 1.99 1.50 2.25
order_category stationery stationery stationery gifts
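Here is a minimal Python sketch of the layout above, assuming the four-row `orders` example. It keeps the same records row by row, then pivots them into one list per column, so `SUM(order_amount)` only has to read a single list. It illustrates the idea of columnar storage, not ClickHouse's on-disk format; `to_columns` is a hypothetical helper.

```python
from decimal import Decimal

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "order_code": "AB-01", "order_amount": Decimal("2.99"), "order_category": "stationery"},
    {"order_id": 2, "order_code": "AB-02", "order_amount": Decimal("1.99"), "order_category": "stationery"},
    {"order_id": 3, "order_code": "AB-02", "order_amount": Decimal("1.50"), "order_category": "stationery"},
    {"order_id": 4, "order_code": "AB-03", "order_amount": Decimal("2.25"), "order_category": "gifts"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Columnar: the aggregate reads one contiguous list and ignores the other columns.
total_amount = sum(columns["order_amount"])

# Row-oriented: every whole record is visited even though one field is needed.
total_from_rows = sum(record["order_amount"] for record in rows)

print(total_amount)  # 8.73
```

With four columns the difference is small, but on wide analytical tables a query touching two or three columns skips most of the data.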
SQL Compatible
● Full SQL parser (INSERT)
○ INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')
● Data format parser
○ SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated
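The SELECT above asks for a row count per `EventDate`, an extra totals row (`WITH TOTALS`) and tab-separated output. As a rough Python model of the result's shape, assuming a hypothetical handful of `EventDate` values: in TabSeparated output ClickHouse places the totals row after the main rows, separated by an empty line. The blank key in the totals row is a simplification, labeled in the code.

```python
from collections import Counter
from datetime import date


def group_by_with_totals(event_dates):
    """Count rows per EventDate in date order, plus one grand-total count."""
    counts = Counter(event_dates)
    rows = sorted(counts.items())   # GROUP BY EventDate ... ORDER BY EventDate
    total = sum(counts.values())    # the extra row produced by WITH TOTALS
    return rows, total


def to_tab_separated(rows, total):
    """Render tab-separated lines, with the totals row after an empty line."""
    lines = [f"{event_date.isoformat()}\t{c}" for event_date, c in rows]
    lines.append("")
    # Assumption: the key column is left blank here; ClickHouse fills key
    # columns in the totals row with the column type's default value.
    lines.append(f"\t{total}")
    return "\n".join(lines)


# Hypothetical sample of EventDate values standing in for test.hits.
hits = [date(2022, 6, 7), date(2022, 6, 6), date(2022, 6, 7)]
print(to_tab_separated(*group_by_with_totals(hits)))
```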
Shared Nothing Architecture
Data distribution refers to splitting a very large dataset into multiple shards
that are stored on different servers. ClickHouse divides the dataset into
shards according to the sharding key. Each shard holds and processes a part
of the data; the query results from all shards are then combined to give the
final result.
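Here is a minimal Python sketch of this scatter/gather flow, assuming a hypothetical three-shard cluster and an integer sharding key. Rows are routed by the remainder of the key (similar in spirit to how a `Distributed` table uses its sharding expression), each shard counts only its own rows, and the node that received the query merges the partial counts. Names such as `shard_for` and `merge_partials` are illustrative, not a ClickHouse API.

```python
from collections import Counter

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key, num_shards=NUM_SHARDS):
    """Route a row by the remainder of its integer sharding key."""
    return key % num_shards


def distribute(rows, key_field, num_shards=NUM_SHARDS):
    """Split rows into one list per shard according to the sharding key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards


def local_count_by(shard_rows, field):
    """Each shard aggregates only the rows it stores."""
    return Counter(row[field] for row in shard_rows)


def merge_partials(partials):
    """The initiating node merges the per-shard partial results."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


# Hypothetical rows: user_id is the sharding key.
rows = [
    {"user_id": i, "category": "gifts" if i % 4 == 0 else "stationery"}
    for i in range(1, 101)
]
shards = distribute(rows, "user_id")
result = merge_partials(local_count_by(s, "category") for s in shards)
print([len(s) for s in shards], dict(result))
```

Counts merge cleanly because addition is associative; aggregates such as averages need a partial state (sum and count) instead of a finished value per shard.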
Sharding
(Diagram: ZooKeeper with Shard_01, Shard_02, Shard_03 … Shard_n)
Replication
Data replication refers to keeping copies of the data on other server nodes to
ensure availability if a node fails.
It can also improve performance by allowing multiple servers to process
queries in parallel.
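The following Python toy model illustrates the idea, assuming a single shard with two replicas and a shared, ordered log standing in for ZooKeeper. An insert lands on one replica and is published to the log, the other replica fetches it later, and reads fail over to whichever replica is still alive. This loosely mirrors how ReplicatedMergeTree tables coordinate through ZooKeeper, but the classes are hypothetical and skip merges, quorum writes and everything else a real system handles.

```python
class ReplicationLog:
    """Toy stand-in for the coordination service: an ordered list of inserted blocks."""

    def __init__(self):
        self.blocks = []


class Replica:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.rows = []
        self.applied = 0  # number of log blocks already present locally
        self.alive = True

    def sync(self):
        """Fetch blocks that other replicas published since the last sync."""
        for block in self.log.blocks[self.applied:]:
            self.rows.extend(block)
        self.applied = len(self.log.blocks)

    def insert(self, block):
        """Catch up first, then write locally and publish the block to the log."""
        self.sync()
        self.rows.extend(block)
        self.log.blocks.append(list(block))
        self.applied = len(self.log.blocks)


def read(replicas):
    """Serve a query from the first live replica of the shard."""
    for replica in replicas:
        if replica.alive:
            return list(replica.rows)
    raise RuntimeError("no live replica available for this shard")


log = ReplicationLog()
r1, r2 = Replica("Replica_01", log), Replica("Replica_02", log)
r1.insert([1, 2, 3])
r2.sync()          # background fetch on the second replica
r1.alive = False   # simulate a node failure
print(read([r1, r2]))  # [1, 2, 3]
```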
Replication
(Diagram: ZooKeeper with Shard_01, Shard_02, Shard_03 … Shard_n, each paired with a replica, Replica_01, Replica_02, Replica_03 … Replica_n)
Replication & Sharding
(Diagram: ClickHouse Node 1, Node 2, Node 3 … Node n, each hosting two shard/replica pairs drawn from Shard_01/Replica_01 through Shard_n/Replica_n, coordinated by ZooKeeper)
Parallel Execution
● Large queries are parallelized naturally, using all the necessary resources available on the current server.
● Distributed processing on multiple nodes.
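Here is a Python sketch of the single-server case, assuming the work is a simple `avg()` over a list. The data is split into chunks, each worker computes a partial state `(sum, count)`, and the states are merged at the end. Merging states rather than averaging per-chunk averages keeps the result exact. ClickHouse runs this on native threads (bounded by the `max_threads` setting); Python threads are limited by the GIL, so this shows the shape of the computation, not its speed. `partial_state` and `parallel_avg` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partial_state(chunk):
    """Partial aggregation state for avg(): (sum, count) for one chunk."""
    return sum(chunk), len(chunk)


def parallel_avg(values, workers=None):
    """Split values into chunks, aggregate them concurrently, then merge the states."""
    if not values:
        raise ValueError("avg of an empty sequence")
    workers = workers or os.cpu_count() or 1
    size = -(-len(values) // workers)  # ceiling division: at most `workers` chunks
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        states = list(pool.map(partial_state, chunks))
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count


print(parallel_avg(list(range(1, 101)), workers=4))  # 50.5
```

The same merge step works across nodes: each server returns its partial states and the initiator combines them, which connects the single-server and distributed bullets above.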
Rich in Aggregate Functions
● Generic aggregate functions (count, min, max, avg, etc.)
○ Many ClickHouse-specific aggregate functions
● Parametric aggregate functions (histogram, sequenceMatch, etc.)
● Combinators that change the behavior of an aggregate function (-If: sumIf, avgIf; -Array: sumArray, uniqArray)
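To show what these combinators do, here are plain Python equivalents, assuming each column arrives as a list of values (the function names mirror the ClickHouse functions but are hypothetical helpers, not the real implementations). `uniq_array` is exact here, whereas ClickHouse's `uniq` family is approximate; `avg_if` returns NaN when no row matches, following the behavior of `avg` on an empty set.

```python
import math


def sum_if(values, conditions):
    """Like sumIf(x, cond): sum only the values whose condition is true."""
    return sum(v for v, c in zip(values, conditions) if c)


def avg_if(values, conditions):
    """Like avgIf(x, cond): average over matching rows, NaN when nothing matches."""
    selected = [v for v, c in zip(values, conditions) if c]
    return sum(selected) / len(selected) if selected else math.nan


def sum_array(arrays):
    """Like sumArray(arr): sum every element of every array."""
    return sum(x for arr in arrays for x in arr)


def uniq_array(arrays):
    """Like uniqArray(arr): distinct elements across all arrays (exact in this sketch)."""
    return len({x for arr in arrays for x in arr})


# Hypothetical data modeled on the orders example.
amounts = [2.99, 1.99, 1.50, 2.25]
categories = ["stationery", "stationery", "stationery", "gifts"]
print(sum_if(amounts, [c == "gifts" for c in categories]))  # 2.25
print(avg_if(amounts, [c == "stationery" for c in categories]))
print(sum_array([[1, 2], [3]]), uniq_array([[1, 2], [2, 3]]))  # 6 3
```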
Super fast for analytics workloads
● Cost-efficient performance compared with other solutions
● Performance improves with every release
● Hence, use cases and adoption are increasing
Data on Kubernetes?
● Still a controversial subject?
● Community and use cases
● Is it production grade yet?
Operators
● Vitess operator by PlanetScale
● Percona XtraDB Cluster, MongoDB and PostgreSQL operators by Percona
● ClickHouse Kubernetes Operator by Altinity
● MySQL InnoDB Cluster operator by Oracle
● More to come? Maybe …
● In the meantime other operators
○ RedShift,
Altinity Operator for ClickHouse
● Creates ClickHouse clusters defined as custom resources
● Customized storage provisioning (VolumeClaim templates)
● Customized pod templates
● Customized service templates for endpoints
● ClickHouse configuration management
● ClickHouse users management
● ClickHouse cluster scaling including automatic schema propagation
● ClickHouse version upgrades
● Exporting ClickHouse metrics to Prometheus
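As a sketch of "clusters defined as custom resources", the Python function below builds a minimal `ClickHouseInstallation` manifest and prints it as JSON, which `kubectl apply -f` also accepts. The field names (`layout.shardsCount`, `layout.replicasCount`, `zookeeper.nodes`) follow the operator's published examples as best understood here; treat them as assumptions and check them against the operator documentation for your version. The name `demo` and the ZooKeeper host `zookeeper.zoo1ns` are hypothetical.

```python
import json


def clickhouse_installation(name, shards, replicas, zookeeper_host):
    """Build a minimal ClickHouseInstallation custom resource as a Python dict.

    Field names are assumptions based on the Altinity operator's examples.
    """
    return {
        "apiVersion": "clickhouse.altinity.com/v1",
        "kind": "ClickHouseInstallation",
        "metadata": {"name": name},
        "spec": {
            "configuration": {
                # Replicated tables need a coordination service.
                "zookeeper": {"nodes": [{"host": zookeeper_host}]},
                "clusters": [
                    {
                        "name": name,
                        "layout": {"shardsCount": shards, "replicasCount": replicas},
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    manifest = clickhouse_installation("demo", shards=2, replicas=2, zookeeper_host="zookeeper.zoo1ns")
    print(json.dumps(manifest, indent=2))
```

For example, `python make_chi.py > chi.json && kubectl apply -f chi.json`, where `make_chi.py` is a hypothetical file name; the operator then creates the pods, storage and services for a 2-shard, 2-replica cluster.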
Open Source ClickHouse Community
1. DoK (Data on Kubernetes)
2. ClickHouse
3. Altinity
4. ChistaDATA
About ChistaDATA Inc.
● Founded in 2021 by Shiv Iyer - CEO and Principal
● Has received USD 3M in seed investment
● Focusing on ClickHouse infrastructure operations
● Services: dedicated managed services, support and consulting
● We’re hiring DBAs, SREs and DevOps engineers globally
Presented at DoK Talks #133: My First 90 Days with ClickHouse
