ClickHouse Monitoring 101: What to monitor and how
Presenting the Presenters
Ned McClain
Data Warehouse Engineer
20+ years architecting Internet-scale
infrastructure with an emphasis on
performance and security.
Robert Hodges
CEO
30+ years on DBMS including 20
different DBMS types. Working
on Kubernetes since 2018.
...And Altinity
Leading software and services provider for ClickHouse
#2 committer on ClickHouse
Lead maintainer for Kubernetes, Grafana, ODBC, ...
Sponsor of US and EU ClickHouse events
What to Monitor on ClickHouse
Useful information from monitoring
Determining current state
Analyzing long-term trends
Alerts and reporting
Comparing over time or experiment groups
Conducting ad hoc retrospective analysis (i.e., debugging)
Security incident detection and response
Cost modeling and analysis
Case #1: When INSERTs go bad
Too many parts (655). Merges are processing significantly slower than inserts
How does ClickHouse add or change data?
[Diagram: ClickHouse server with table events holding parts 201801_1_1_0, 201801_2_2_0, and 201801_1_2_1; operations shown: INSERT, SELECT, Merge, ALTER TABLE]
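The part names on this slide encode how each part came to exist. A minimal Python sketch, assuming the common MergeTree naming scheme `<partition_id>_<min_block>_<max_block>_<level>` (names that carry an extra mutation-version suffix are not handled); `PartName` and `parse_part_name` are illustrative helpers, not a ClickHouse API:

```python
from typing import NamedTuple


class PartName(NamedTuple):
    partition_id: str
    min_block: int
    max_block: int
    level: int  # how many merge generations produced this part


def parse_part_name(name: str) -> PartName:
    """Split a four-field MergeTree part name into its components."""
    # Split from the right so a partition ID that contains "_" stays intact.
    partition_id, min_block, max_block, level = name.rsplit("_", 3)
    return PartName(partition_id, int(min_block), int(max_block), int(level))


for name in ("201801_1_1_0", "201801_2_2_0", "201801_1_2_1"):
    print(name, "->", parse_part_name(name))
```

Read this way, 201801_1_2_1 covers blocks 1 through 2 at level 1, which is consistent with it being the merge of the two level-0 parts.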
Which metrics can reveal root causes?
[Diagram: same server, table, and parts as on the previous slide, annotated with the metrics below]
events.InsertQuery
events.InsertedRows
events.RejectedInserts
events.DelayedInserts
asynchronous_metrics.MaxPartCountForPartition
events.MergedRows
metrics.Merge
metrics.BackgroundPoolTask
metrics.PartMutations
Total number of parts [per table]
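A part count per partition is the most direct early warning for the "Too many parts" error. The sketch below classifies partitions against two thresholds, assumed to correspond to the MergeTree settings parts_to_delay_insert and parts_to_throw_insert. The threshold values and the sample counts are hypothetical; in practice the counts would come from counting active rows in system.parts per table and partition.

```python
from typing import Dict, Tuple

PartitionKey = Tuple[str, str]  # (table, partition)


def classify_partitions(
    active_parts: Dict[PartitionKey, int],
    delay_threshold: int,
    throw_threshold: int,
) -> Dict[PartitionKey, str]:
    """Label each partition 'ok', 'delayed', or 'rejected' by its part count."""
    if delay_threshold > throw_threshold:
        raise ValueError("delay_threshold must not exceed throw_threshold")
    status = {}
    for key, count in active_parts.items():
        if count >= throw_threshold:
            status[key] = "rejected"  # inserts would fail with "Too many parts"
        elif count >= delay_threshold:
            status[key] = "delayed"  # inserts would be slowed down
        else:
            status[key] = "ok"
    return status


# Hypothetical sample data and thresholds, for illustration only.
sample = {("events", "201801"): 655, ("events", "201802"): 180, ("devices", "all"): 12}
for key, label in classify_partitions(sample, 150, 300).items():
    print(key, label)
```

Alerting on the "delayed" state leaves time to fix insert batching before inserts start to be rejected.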
ClickHouse has great operational metrics
Always-available system tables
query_log (Query history)
part_log (Part merges)
text_log (System log messages)
asynchronous_metrics (Background metrics)
events (Cumulative event counters)
metrics (Current counters)
parts (All table parts)
More system tables you can enable
replicas (Replicated tables)
metric_log (Metrics history)
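Because the events table holds cumulative counters, a dashboard usually needs the change between two samples rather than the raw value. A minimal sketch of that conversion, assuming two snapshots taken a known number of seconds apart. The function name and the sample values are made up, and a counter that goes backwards is treated as a server restart.

```python
from typing import Dict


def counter_rates(
    previous: Dict[str, int],
    current: Dict[str, int],
    interval_seconds: float,
) -> Dict[str, float]:
    """Convert two snapshots of cumulative counters into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, value in current.items():
        if name not in previous:
            continue  # no baseline yet for this counter
        delta = value - previous[name]
        if delta < 0:
            # Counter went backwards: assume a restart and count from zero.
            delta = value
        rates[name] = delta / interval_seconds
    return rates


# Hypothetical snapshots taken 60 seconds apart.
before = {"InsertQuery": 1000, "SelectQuery": 400}
after = {"InsertQuery": 1600, "SelectQuery": 700}
print(counter_rates(before, after, 60))
```

Monitoring systems such as Prometheus apply the same idea when they compute rates over counters.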
Case #2: Query memory questions
kernel: Out of memory: Kill process 16959 (AsyncBlockInput) score 947 or sacrifice child
How does ClickHouse process queries?
[Diagram: ClickHouse server with tables events and devices, each stored as parts containing primary.idx, .mrk, and .bin files; query shown: SELECT … FROM events JOIN devices ON ...]
Which metrics show query performance?
[Diagram: same tables and query as on the previous slide, annotated with the metrics below]
events.Query
events.SelectQuery
asynchronous_metrics.MarkCacheFiles
events.SelectedMarks / events.SelectedParts
events.MarkCacheHits / events.MarkCacheMisses
events.UncompressedCacheHits
events.UncompressedCacheMisses
events.CompressedReadBufferBytes
events.ReadCompressedBytes
metrics.MemoryTracking
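The hit and miss counters are easiest to read as a ratio. A small sketch, assuming the two values are deltas of a hits/misses counter pair (for example MarkCacheHits and MarkCacheMisses) over the same time window. The helper name and the numbers are illustrative.

```python
from typing import Optional


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return the fraction of lookups served from cache, or None if idle."""
    if hits < 0 or misses < 0:
        raise ValueError("counter deltas must be non-negative")
    lookups = hits + misses
    if lookups == 0:
        return None  # no lookups in the window, so the ratio is undefined
    return hits / lookups


# Hypothetical counter deltas for one scrape interval.
print("mark cache hit ratio:", hit_ratio(hits=9_000, misses=1_000))
print("idle window:", hit_ratio(hits=0, misses=0))
```

Returning None for an idle window avoids reporting a misleading 0% or 100% when no queries ran.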
Case #3: Replication regrets
<Error> . . . Table is in readonly mode
How does ClickHouse replicate data?
[Diagram: INSERT, two ClickHouse servers each holding table events, and three ZooKeeper nodes holding ZNodes]
Which metrics diagnose replication issues?
[Diagram: same replication layout as on the previous slide, annotated with the metrics below]
asynchronous_metrics.ReplicasMaxAbsoluteDelay
asynchronous_metrics.ReplicasMaxInsertsInQueue
asynchronous_metrics.ReplicasMaxMergesInQueue
metrics.LeaderReplica
metrics.ReadonlyReplica
metrics.ZookeeperTransactions
metrics.ZookeeperHardwareExceptions
zookeeper_NumAliveConnections
zookeeper_Max[Avg]RequestLatency
zookeeper_Max[Min]ClientResponseSize
zookeeper_OutstandingRequests
zookeeper_PacketsReceived[Sent]
zookeeper_ElectionTimeTaken
zookeeper_Log[Data]DirSize
zookeeper_InMemoryDataTree_NodeCount
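A few of these metrics translate directly into health checks. The sketch below assumes the metric values have already been collected into a dictionary and that ReplicasMaxAbsoluteDelay is reported in seconds. The function name, thresholds, and sample values are hypothetical.

```python
from typing import Dict, List


def replication_problems(
    metrics: Dict[str, float],
    max_delay_seconds: float,
    max_inserts_in_queue: float,
) -> List[str]:
    """Return human-readable findings for one ClickHouse server."""
    problems = []
    if metrics.get("ReadonlyReplica", 0) > 0:
        problems.append("a replicated table is in readonly mode")
    delay = metrics.get("ReplicasMaxAbsoluteDelay", 0)
    if delay > max_delay_seconds:
        problems.append(f"replica delay {delay:g}s exceeds {max_delay_seconds:g}s")
    queued = metrics.get("ReplicasMaxInsertsInQueue", 0)
    if queued > max_inserts_in_queue:
        problems.append(f"{queued:g} inserts waiting in the replication queue")
    return problems


# Hypothetical sample values, for illustration only.
sample = {"ReadonlyReplica": 1, "ReplicasMaxAbsoluteDelay": 420, "ReplicasMaxInsertsInQueue": 3}
for finding in replication_problems(sample, max_delay_seconds=300, max_inserts_in_queue=50):
    print(finding)
```

A readonly replica often points at a lost ZooKeeper session, so it is worth checking the ZooKeeper-side metrics whenever that finding appears.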
How to monitor ClickHouse
Standard ClickHouse monitoring platforms
Sematext / Datadog / Instana
Nagios/Icinga/Zabbix
InfluxDB / Graphite…
Prometheus + Grafana
Prometheus + ClickHouse Kubernetes Operator + Grafana
Three high-level approaches to monitoring
1. Proprietary Agents
2. Application-aware Polling
3. Metrics Exposition
Metrics collection with Prometheus
Prometheus scrapes exposed metrics.
Prometheus Alertmanager sends alerts to messaging, ticketing, and webhooks.
Dashboards and trend analysis.
Monitoring a ClickHouse node
[Diagram: Prometheus scrapes exposed metrics from ClickHouse]
Monitoring a ClickHouse node
$ curl clickhouse:9363/metrics
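The endpoint returns the Prometheus text exposition format, which is simple enough to read without a client library. A minimal sketch, assuming sample lines of the form `name value` without labels or timestamps. The metric-name prefixes and all values in `SAMPLE` are assumptions for illustration, and `fetch_metrics` is only defined here, not called.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabelled 'name value' samples, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and empty lines carry no samples
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


def fetch_metrics(url: str) -> Dict[str, float]:
    """Fetch and parse a metrics endpoint such as the one in the curl call."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Made-up response body for illustration.
SAMPLE = """\
# HELP ClickHouseMetrics_Merge Number of executing background merges
# TYPE ClickHouseMetrics_Merge gauge
ClickHouseMetrics_Merge 2
ClickHouseProfileEvents_InsertQuery 1600
ClickHouseAsyncMetrics_MaxPartCountForPartition 42
"""
print(parse_metrics(SAMPLE))
```

Against a live server, `fetch_metrics("http://clickhouse:9363/metrics")` would mirror the curl command on the slide.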
Monitoring a ClickHouse node
/etc/clickhouse-server/config.xml
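The slide names the configuration file, but its contents did not survive extraction. As a hedged reconstruction, the sketch below generates a `prometheus` section of the kind ClickHouse reads from that file. The element names follow the ClickHouse server settings as I understand them and should be checked against the documentation for your version; the port and endpoint match the curl example.

```python
import xml.etree.ElementTree as ET


def prometheus_section(port: int = 9363, endpoint: str = "/metrics") -> ET.Element:
    """Build a <prometheus> element to place inside the config root."""
    section = ET.Element("prometheus")
    ET.SubElement(section, "endpoint").text = endpoint
    ET.SubElement(section, "port").text = str(port)
    # Expose all three metric families from the system tables.
    for family in ("metrics", "events", "asynchronous_metrics"):
        ET.SubElement(section, family).text = "true"
    return section


section = prometheus_section()
ET.indent(section)  # pretty-printing helper, available from Python 3.9
print(ET.tostring(section, encoding="unicode"))
```

The printed fragment belongs inside the root element of the server configuration, and the server typically needs a restart before the new port opens.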
Monitoring a ClickHouse Cluster
[Diagram: one Prometheus server scraping two ClickHouse shards, each with two ClickHouse replicas]
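For a cluster, every replica is its own scrape target. One way to keep Prometheus in step with the topology is file-based service discovery, which reads JSON target groups. The sketch below builds such a list; the cluster name, host names, and label names are hypothetical.

```python
import json
from typing import Dict, List


def file_sd_targets(
    cluster: str,
    hosts_by_shard: Dict[str, List[str]],
    port: int = 9363,
) -> List[dict]:
    """Build Prometheus file_sd target groups, one group per shard."""
    groups = []
    for shard, hosts in sorted(hosts_by_shard.items()):
        groups.append(
            {
                "targets": [f"{host}:{port}" for host in hosts],
                "labels": {"cluster": cluster, "shard": shard},
            }
        )
    return groups


# Hypothetical two-shard, two-replica layout matching the diagram.
layout = {
    "1": ["ch-s1-r1", "ch-s1-r2"],
    "2": ["ch-s2-r1", "ch-s2-r2"],
}
print(json.dumps(file_sd_targets("demo", layout), indent=2))
```

Writing this output to a file referenced from a `file_sd_configs` entry lets Prometheus pick up added or removed replicas without a restart.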
Monitoring Linux and cloud dependencies
● node_exporter: operating system metrics
○ CPU, memory, load, I/O, filesystem, network, and more.
● cloudwatch_exporter: AWS metrics
○ EC2 instances, ELBs, hosted databases, and more.
● stackdriver_exporter: GCP metrics
● azure_metrics_exporter: Azure metrics
Monitoring everything else
Example ClickHouse Alerts
Google's Four Golden Signals
● Latency
● Traffic
● Errors
● Saturation
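One possible way to map the four signals onto ClickHouse data is sketched below. It assumes each query is summarized as a (duration in milliseconds, exception code) pair, modeled on the query_duration_ms and exception_code columns of query_log, and uses memory in use versus a configured limit as the saturation signal. The mapping, the function name, and all sample numbers are assumptions rather than a recommendation from the talk.

```python
import math
from typing import Dict, List, Tuple


def golden_signals(
    queries: List[Tuple[int, int]],
    window_seconds: float,
    memory_used_bytes: int,
    memory_limit_bytes: int,
) -> Dict[str, float]:
    """Summarize latency, traffic, errors, and saturation for one window."""
    if not queries:
        raise ValueError("need at least one query in the window")
    durations = sorted(duration for duration, _ in queries)
    rank = math.ceil(0.95 * len(durations))  # nearest-rank 95th percentile
    failed = sum(1 for _, code in queries if code != 0)
    return {
        "latency_p95_ms": durations[rank - 1],
        "traffic_qps": len(queries) / window_seconds,
        "error_rate": failed / len(queries),
        "saturation": memory_used_bytes / memory_limit_bytes,
    }


# Hypothetical five-second window with one failed, slow query.
window = [(12, 0), (15, 0), (20, 0), (18, 0), (950, 241),
          (22, 0), (30, 0), (14, 0), (16, 0), (25, 0)]
print(golden_signals(window, 5, 6 * 2**30, 8 * 2**30))
```

With only ten queries the 95th percentile lands on the slowest one, which is why percentile alerts are normally evaluated over larger windows.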
Monitoring Example 1: Linux OS
Demo Time!
ClickHouse Monitoring on Linux
Monitoring Example 2: Kubernetes
ClickHouse operator exports to Prometheus
[Diagram: ClickHouse Operator, ClickHouse Resource Definition in your-favorite namespace, K8s API, and monitoring data flowing to Prometheus and Grafana]
System dynamism complicates monitoring
[Diagram: Kubernetes resources for Shard 1 Replica 1 (Load Balancer Service, Replica Service, Stateful Set, Pod, Persistent Volume Claim, Persistent Volume, per-replica Config Map); the ClickHouse Operator collects monitoring data from a single replica]
Operator exports new pods automatically
[Diagram: the same Kubernetes resources after Shard 1 Replica 2 is added with its own Replica Service; the ClickHouse Operator collects monitoring data from two replicas]
Demo Time!
ClickHouse Monitoring on Kubernetes
with Grafana Dashboard
Conclusion and Further Reading
Summary
● ClickHouse has a wide range of accessible monitoring data
● Export relevant measurements from system database tables
○ metrics, events, and asynchronous_metrics
○ query_log
● Prometheus and Grafana offer a great open source solution
○ But there are many other tools as well
● ClickHouse Operator exports monitoring data automatically to Prometheus
● Build monitoring for the whole application, not just ClickHouse
Where to look next...
● Google SRE Book: Monitoring Distributed Systems
○ https://landing.google.com/sre/sre-book
● ClickHouse monitoring documentation
○ https://clickhouse.tech/docs/en/operations/monitoring/
● Grafana Labs Dashboards
○ https://grafana.com/grafana/dashboards
● Altinity Blog
○ https://www.altinity.com/blog
● ClickHouse Kubernetes Operator
○ https://github.com/Altinity/clickhouse-operator
Thank you!
Special Offer:
Contact us for a free
1-hour consultation on
monitoring
Contacts:
info@altinity.com
Visit us at:
https://www.altinity.com
Free Consultation:
https://blog.altinity.com/offer
