DATA WAREHOUSE AND KUBERNETES: LESSONS FROM THE CLICKHOUSE OPERATOR
Robert Hodges -- August 2019
Brief Intros
www.altinity.com
Leading software and services provider for ClickHouse
Major committer and community sponsor in US and Western Europe
Robert Hodges - Altinity CEO
30+ years on DBMS plus virtualization and security. ClickHouse is DBMS #20
Why run data warehouse on Kubernetes?
1. Same environment as other cloud native services
2. Portability
3. Fast deployment cycles
4. Flexible mapping to resources
ClickHouse Data Warehouse
Introduction to ClickHouse
Understands SQL
Runs on bare metal to cloud
Shared nothing architecture
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is open source (Apache 2.0)
[Figure: data laid out in columns a, b, c, d]
And it’s really fast!
ClickHouse structure is optimized for speed
[Figure: a table is made up of parts; each part holds an index and columns, which are indexed, sorted, and compressed]
ClickHouse has built-in sharding & replication
[Figure: six ClickHouse servers, each with an event_loc table; one of them also exposes the event table. A query (SELECT ... FROM event GROUP BY ...) returns a result set. Three Zookeeper nodes hold ZNodes.]
Kubernetes
“Kubernetes is the new Linux”
Actually it’s an open-source platform to:
● manage container-based systems
● build distributed applications declaratively
● allocate machine resources efficiently
● automate application deployment
A typical distributed service
[Figure: traffic enters a load balancer in front of Service #1, Service #2, and Service #3, each with its own storage]
Defined using Kubernetes resources
[Figure: a Service "svc" in front of a StatefulSet of pods ("svc-1", "svc-2", ...); each pod has a PersistentVolumeClaim bound to a PersistentVolume; ConfigMaps and Secrets are attached]
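As a rough illustration of the resources named on this slide, a minimal sketch might look like the following. It is not taken from the talk: the names ("svc", "svc-config"), the image, the port, and the sizes are all hypothetical, and the Service is declared headless here so that each pod gets a stable DNS name. Note that Kubernetes numbers StatefulSet pods from 0 (svc-0, svc-1, svc-2), unlike the labels in the figure.

```yaml
# Sketch only: every name, image, port, and size below is hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: svc
spec:
  clusterIP: None              # headless: each pod gets a stable DNS name
  selector:
    app: svc
  ports:
    - name: http
      port: 8080
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: svc
spec:
  serviceName: svc             # ties pod DNS names to the Service above
  replicas: 3                  # creates pods svc-0, svc-1, svc-2
  selector:
    matchLabels:
      app: svc
  template:
    metadata:
      labels:
        app: svc
    spec:
      containers:
        - name: svc
          image: example.org/svc:1.0     # hypothetical image
          ports:
            - name: http
              containerPort: 8080
          envFrom:
            - configMapRef:
                name: svc-config         # hypothetical ConfigMap
          volumeMounts:
            - name: data
              mountPath: /var/lib/svc
  volumeClaimTemplates:        # one PersistentVolumeClaim per pod
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
```

Each claim created from volumeClaimTemplates stays bound to its pod's ordinal, which is what lets a restarted pod find its data again.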
Mapped to proxies, containers, and storage
[Figure: three Kubernetes nodes; NGINX serves "svc" in front of containers "svc-1", "svc-2", and "svc-3", each backed by an NVMe SSD]
ClickHouse Operator
ClickHouse on Kubernetes is complex!
[Figure: Zookeeper services in front of Zookeeper-0, Zookeeper-1, and Zookeeper-2; a load balancer service; a replica service for each of Shard 1 Replica 1, Shard 1 Replica 2, Shard 2 Replica 1, and Shard 2 Replica 2. Each replica has a StatefulSet, a pod, a PersistentVolumeClaim, a PersistentVolume, and a per-replica ConfigMap. A user ConfigMap and a common ConfigMap are shared.]
Operators encapsulate complex deployments
[Figure: ClickHouse Operator in the kube-system namespace; ClickHouse resource definition deployed into your-favorite namespace]
Apache 2.0 source, distributed as Docker image
Single specification
Best practice deployment
Basic data warehouse topology
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch01"                  # Name used to identify all resources
spec:
  configuration:
    clusters:                   # Definition of cluster
      - name: replicated
        layout:
          shardsCount: 2
          replicasCount: 2
    zookeeper:                  # Location of service we depend on
      nodes:
        - host: zookeeper.zk
Simplicity requires defaults
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.6
    serviceTemplate: minikube
templates:
  volumeClaimTemplates:
    - name: persistent          # Name of template
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
Storage misconfigurations lead to insidious errors
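One way to make the storage choice explicit rather than relying on the cluster default is to name a StorageClass in the claim template. The sketch below is an assumption, not something shown in the talk: the class name "clickhouse-st1" is hypothetical, and the provisioner is the in-tree AWS EBS one, chosen only because the benchmark slide mentions EBS (st1).

```yaml
# Sketch only: the class name "clickhouse-st1" is hypothetical.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: clickhouse-st1
provisioner: kubernetes.io/aws-ebs         # in-tree AWS EBS provisioner
parameters:
  type: st1
reclaimPolicy: Retain                      # keep the volume if the claim is deleted
volumeBindingMode: WaitForFirstConsumer    # provision in the zone where the pod lands
---
# Fragment of a ClickHouseInstallation spec that refers to the class above.
templates:
  volumeClaimTemplates:
    - name: persistent
      spec:
        storageClassName: clickhouse-st1
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi                 # st1 volumes have a minimum size far above 10Gi
```

Retain and WaitForFirstConsumer are standard StorageClass settings; whether they suit a given cluster depends on how volumes are cleaned up and how pods are spread across zones.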
Templates can be simple, too
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.6
    serviceTemplate: minikube
templates:
  podTemplates:
    - name: clickhouse:19.6     # Name of template
      spec:
        containers:
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.6.2.11
Most values take defaults
Or specify complex configuration
templates:
  podTemplates:
    - name: clickhouse-in-zone-us-east-1b
      spec:
        affinity:               # Set availability zone affinity
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: "failure-domain.beta.kubernetes.io/zone"
                      operator: In
                      values:
                        - "us-east-1b"
        containers:             # More container properties
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.3.7
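The zone label used on this slide dates from 2019. Kubernetes 1.17 deprecated failure-domain.beta.kubernetes.io/zone in favor of topology.kubernetes.io/zone, so on a newer cluster the same template would probably be written as in this sketch. Only the label key differs from the slide; check which label your nodes actually carry before relying on it.

```yaml
# Same pod template as on the slide, with the newer zone label key.
templates:
  podTemplates:
    - name: clickhouse-in-zone-us-east-1b
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: "topology.kubernetes.io/zone"   # replaces the beta label
                      operator: In
                      values:
                        - "us-east-1b"
        containers:
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.3.7
```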
Versatile mapping to different deployments
[Figure: the same ClickHouse resource definition mapped to two deployments, Minikube and Multi-AZ, each made up of pods and load balancers]
(Differences mostly in templates)
Changes are recognized automatically
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.11     # Make new version the default
    serviceTemplate: minikube
templates:
  podTemplates:
    - name: clickhouse:19.11          # Define template for new version
      spec:
        containers:
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.11.3.11
Upgrade runs while service is online
[Figure: update the ClickHouse resource definition and apply it; the ClickHouse Operator compares the resource to actual state and upgrades the pods (chi-0-0, chi-0-1, chi-1-0, chi-1-1) sequentially]
ClickHouse monitoring with Prometheus
[Figure: ClickHouse installations, ClickHouse Operator (ServiceMonitor), Prometheus, Grafana]
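For readers who have not used the Prometheus Operator, a ServiceMonitor is a custom resource that tells Prometheus which Services to scrape. The sketch below shows the general shape only; the resource name, the "monitoring" namespace, the "app: clickhouse-operator" label, and the "metrics" port name are assumptions and must match whatever the operator's metrics Service actually uses.

```yaml
# Sketch only: names, namespaces, labels, and the port name are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: clickhouse-operator-metrics
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - kube-system                # namespace where the operator runs, per the earlier slide
  selector:
    matchLabels:
      app: clickhouse-operator     # must match the labels on the metrics Service
  endpoints:
    - port: metrics                # named port on that Service
      interval: 30s
```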
Demo Time
Fast data warehouse deployment on Kubernetes
Lessons from Operator Development
Surprise! DNS is different in Kubernetes
[Figure: pods chi-0-0, chi-0-1, chi-1-0, and chi-1-1 resolve names through the CoreDNS server; pod chi-0-1 restarts]
Pod restart invalidates cluster DNS mappings
Name resolution deadlock at startup:
Must resolve host name to start up
Won’t resolve host until pod starts
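One standard Kubernetes mechanism for this kind of startup deadlock is a headless Service with publishNotReadyAddresses set, which publishes DNS records for a pod before it passes its readiness checks. The talk does not say how the operator resolved the problem, so treat this as a sketch of the general technique; the Service name and the selector label are hypothetical.

```yaml
# Sketch only: the Service name and selector label are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: chi-0-0
spec:
  clusterIP: None                  # headless: DNS resolves straight to the pod IP
  publishNotReadyAddresses: true   # publish DNS records before the pod is ready
  selector:
    app: chi-0-0
  ports:
    - name: http
      port: 8123                   # ClickHouse HTTP interface
    - name: tcp
      port: 9000                   # ClickHouse native protocol
    - name: interserver
      port: 9009                   # replication traffic between servers
```

With the record published early, a server can resolve its own host name and its peers' names while it is still starting up.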
Kubernetes overhead is minimal (whew!)
[Charts: "Cluster deploy and load" and "Query Comparison"]
Redshift dc2.large vs. Kubernetes EC2 r5.xlarge with EBS (st1)
No surprise: error handling is complicated
[Figure: ClickHouse Operator, ClickHouse resource definition, Kubernetes, and storage provider]
Complex specification
Asynchronous execution
Local semantics
Biggest challenge
Data warehouses are not cattle
Losing/compromising data is bad
Safety is paramount
Security, migration, availability require logic above level of the operator
Biggest gain
Kubernetes democratizes data warehouse access
Set up complex configurations in minutes
Map data warehouse flexibly to resources
Integrate easily with other services
Wrap-up
Conclusions
● Kubernetes operators set up DW from single specification
● ClickHouse experience validates Kubernetes value:
○ Every application can have a data warehouse!
○ Portable
○ Fast deployment
○ Flexible resource management
● Kubernetes operator alone is not enough for all use cases
Future Work
● Data warehouse as a service on Kubernetes
○ Multi-tenancy
○ Data availability
○ Security
○ Optimized resource utilization
● Extend ClickHouse to match cloud native execution model
○ Decouple storage and compute
○ Rebalance data on scale-up/down
Thank you!
We’re hiring!
Presenter: rhodges@altinity.com
ClickHouse Operator: https://github.com/Altinity/clickhouse-operator
ClickHouse: https://github.com/yandex/ClickHouse
Altinity: https://www.altinity.com
