cLoki: Like Loki but for ClickHouse
cLoki
like Loki, but for Clickhouse
cLoki is not affiliated or endorsed by Grafana Labs.
All rights belong to their respective owners.
Hello There
Lorenzo Mangani
- Co-Founder and CEO @ QXIP BV
- Proud father of four awesome kids and dozens of github repositories
- Parsing PCAPs, VoIP/RTC packets, events, statistics and logs since ~2007
QXIP BV
- Based in Amsterdam (NL) and Valencia (ES) with Remote Devs/Ops teams (UK/US/DE/UA)
- Research & Development of OSS & Commercial VoIP/RTC Monitoring Technologies
- Running a Healthy, Self-Sustainable OSS Business Model w/ great Technical Partnerships
- Worldwide Customer base ranging from Tiny Startups up to massive Fortune Corporations
- Experienced making OSS patches/plugins/extensions for many popular projects and platforms
- Clickhouse Centric development since around 2017, working on cLoki since 2019
https://qxip.net
Art of Monitoring
Monitoring and Troubleshooting are our passions at qxip
Over the last decade we’ve successfully assisted thousands of customers and deployments using our technologies,
working with some of the largest Network Operators and Corporations around running State-of-the-Art RTC and
VoIP/Telephony Services at massive scale, learning the arts from real-life issues of watching mission critical setups.
Billions of calls each day are safely captured, dissected and indexed using our stack by our Customers worldwide.
Clickhouse powers the most critical part of our design, and provides us with the reliability our Customers expect.
While our software stack has perfected the art of VoIP, to this day about 20% of our activities are log based.
Good old logs, growing in complexity, each with its own format and requiring a gazillion parsers and tools.
Not to mention WebRTC and Browser logs with their unstructured and fast-paced changes…
Clickhouse itself is a huge log producer! We all know working with logs can be painful and expensive!
Logging Pains
Logs and Metrics are a key part of any troubleshooting. There are plenty of open-source and commercial solutions
offering suites with parsing support, correlation and even amazing visualizations. Unfortunately many are not ideal.
● Using ELK, Splunk or any full-text indexing or AWS solution for logs is waaaay too expensive at scale
● InfluxDB, Timescale & co are powerful but very much incompatible with high cardinality / entropy tags in logs/series
● Prometheus metrics are great but static timeseries do not magically answer all questions alone
● Custom solutions using custom approaches fall short or die too quickly (big names included)
● Log formats change. Metric reporters change. Debug logs change. Everything changes, all the time!
. . . How much wood would a woodchuck chuck?
In 2019 Loki was announced by Grafana Labs with a set of interesting design goals:
● Logs should be cheap to store and scale
● Log Search should be easy to Operate
● Metrics and Logs need to work together
Grafana Loki
key concepts
What is Loki
Compared to other log aggregation systems, Grafana Loki:
★ Does not do full-text indexing on logs. By only indexing metadata, Loki is simpler to operate and cheaper to run
★ Indexes and groups log streams using the same labels as Prometheus, making it easy to switch between metrics and logs
★ Uses LogQL query language with native support in Grafana
A Loki-based logging stack consists of 3 components:
★ Agents: responsible for gathering logs and sending them to Loki.
★ Loki: responsible for storing logs and processing queries.
★ Grafana: responsible for querying and displaying the logs.
https://grafana.com/oss/loki/
What is Loki
Grafana Loki and Cortex have a complex architecture design we leave up to their excellent online documentation.
https://grafana.com/oss/loki/
➔ Promtail from Grafana
➔ Pastash from qxip
➔ Logstash from Elastic
➔ Fluentd
➔ Loki4j
➔ & others
We immediately started using it and integrating!
What is Loki
Grafana Loki can of course collect Clickhouse logs…
… everything is working fine but …
… what if … hear me out …
… we just replaced its core with Clickhouse?
https://grafana.com/oss/loki/
cLoki
like Loki, but for Clickhouse
cLoki is a clean-room implementation with exactly
zero lines of code from the original Loki application
Loki vs cLoki ?
If Loki already exists and it works, why do we need cLoki?
★ Grafana Loki’s design concepts and native LogQL integration are brilliant - but we just love to use Clickhouse!
★ Clickhouse is missing a standardized logging/metrics solution and using any other database feels like blasphemy
★ Good Ideas are the fuel of innovation and this is one. A good way to really understand something is to replicate it
OK, but Grafana supports datasource plugins for Clickhouse, why do we need cLoki?
★ cLoki offers native Query and Push features with the same tools and APIs Loki uses for Datasources and Alerting
★ Grafana Loki does not sign open source plugins at all and there’s no easy way to get them installed on their cloud
★ Clickhouse SQL dialect is powerful but has a steep learning curve; LogQL & cLoki can easily help compensate
https://cloki.org
Loki vs cLoki
Project Challenges and Ambitions
★ Loki API is overall quite simple and easy to fully map and emulate - done!
★ LogQL has lots of functionality to emulate but step by step - Log
★ Logs are relatively easy to handle, Timeseries math and precision can be tedious
Project Strengths: Keep it simple!
★ Clickhouse is the center of our design and cLoki is only intended as a useful API on top of its great core features
★ Not a clone but rather a study in finding and adapting effective analogies from Clickhouse usage best-practices
★ Long-term strategy: rather than reinventing the wheel, we’re integrating with a stable platform with ~1M happy users
★ Portability to other languages while retaining compatibility is possible with a simple and consistent model (->golang)
https://cloki.org
Meet cLoki https://cloki.org
“Let’s rebuild Loki on top of Clickhouse!”
Just like Loki, cLoki does not parse or index incoming logs, but rather groups log streams using Prometheus-like Log Labels
Clickhouse does all the heavy lifting, leaving only the filtered LogQL interpolation stages (parsing, extractions, etc.) to the client
Promtail and any other Loki-compatible client work transparently.
We even bundle our own log collector and multi-IO processor, which can extract logs from files, queues, pipes, sockets, etc.
https://cloki.org
Meet cLoki
cLoki was designed to be thin, low profile and easy to extend:
★ Fast & Modular API design w/ Fastify NodeJS
★ JSON/Protobuf Push Bulk Inserts w/ LRU Buffering
★ Consistent Label Fingerprinting w/ LRU Caching
★ TTL Based Log Rotation w/ Customizable Intervals
★ LogQL Emulation (AST) w/ Clickhouse Query Transpiler
○ Fully transparent to Grafana and any LogQL tool
○ Native datasource support for Visualization & Alerting
○ Fully Extensible with external Plugins and npm Modules
Some bonus features are also available as collateral:
★ Clickhouse Query function gateway
○ Extract metrics from any clickhouse table using LogQL functions
★ Clickhouse Storage, Compression & Clustering options
○ Store using disks or object storage, cluster using clickhouse-keeper
★ Additional Input Emulation
○ Accept influx/telegraf/prometheus input for receiving logs, metrics
https://cloki.org
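The "Consistent Label Fingerprinting w/ LRU Caching" idea can be illustrated with a small sketch. This is not cLoki's code (cLoki is written in NodeJS); the class name `LabelCache`, its `is_new` method and the default capacity are hypothetical, chosen only to show how an LRU structure lets a writer skip re-inserting label sets it has already stored.

```python
from collections import OrderedDict


class LabelCache:
    """Tiny LRU set remembering which fingerprints were already stored.

    Hypothetical helper for illustration; names and capacity are assumptions.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_new(self, fingerprint):
        """Return True the first time a fingerprint is seen (or after eviction)."""
        if fingerprint in self._seen:
            # Known fingerprint: mark it as most recently used.
            self._seen.move_to_end(fingerprint)
            return False
        self._seen[fingerprint] = True
        if len(self._seen) > self.capacity:
            # Over capacity: drop the least recently used entry.
            self._seen.popitem(last=False)
        return True
```

A writer would only insert a label row when `is_new` returns True, so repeated pushes for the same stream cost one cache lookup instead of one extra insert.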
Push cLoki
Example handler /loki/api/v1/push
{ "labels": {"cloki":"UP", "db":"clickhouse"},
  "entries":[
    {"timestamp":"2020-12-25T00:00:06.944Z","line":"hello!"}
  ]
}
HASH({"cloki":"UP", "db":"clickhouse"}) = FINGERPRINT
TS, FINGERPRINT = "hello!"
[Diagram labels: LRU cache, LRU bulk]
Push cLoki: under the hood
fingerprint = hash(labels.sorted);
INSERT INTO cloki.time_series (date, fingerprint, labels, name) VALUES (?, ?, ?, ?)
INSERT INTO cloki.samples (fingerprint, log.timestamp_ms, log.line, string) VALUES (?, ?, ?, ?)
{"labels": {"cloki":"UP", "db":"clickhouse"}, "entries":[{"timestamp":"2020-12-25T00:00:06.944Z","line":"hello!"}]}
[Diagram labels: Labels, Logs]
https://cloki.org
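The push flow on this slide (hash the sorted labels, then store one label row and one row per log line) can be sketched roughly as follows. This is an illustration under assumptions, not cLoki's implementation: the SHA-1 based hash, the helper names `fingerprint`, `to_timestamp_ms` and `push_to_rows`, and the tuple layouts are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fingerprint(labels):
    """Hash the labels in sorted-key order so equal label sets always match.

    Assumption: SHA-1 truncated to 64 bits stands in for the real hash.
    """
    canonical = json.dumps(labels, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def to_timestamp_ms(iso):
    """Convert a UTC timestamp like 2020-12-25T00:00:06.944Z to epoch milliseconds."""
    parsed = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division on timedeltas avoids floating point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def push_to_rows(payload):
    """Split one push payload into a label row and a list of sample rows."""
    fp = fingerprint(payload["labels"])
    series_row = (fp, json.dumps(payload["labels"], sort_keys=True))
    sample_rows = [
        (fp, to_timestamp_ms(entry["timestamp"]), entry["line"])
        for entry in payload["entries"]
    ]
    return series_row, sample_rows
```

Because the labels are serialized with sorted keys before hashing, {"cloki":"UP","db":"clickhouse"} and {"db":"clickhouse","cloki":"UP"} map to the same fingerprint, which is what lets every log line reference its stream by a single integer.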
LogQL Label & Log Browser
Navigate all Labels and Values available in a selected range of time using the native Grafana Log Browser or just curl
https://cloki.org
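For the "just curl" route, Loki's HTTP API exposes /loki/api/v1/labels and /loki/api/v1/label/<name>/values, both accepting `start` and `end` as nanosecond Unix timestamps. The sketch below only builds those URLs, on the assumption that a Loki-compatible server such as cLoki answers on them; the base URL http://localhost:3100 and the helper names are placeholders.

```python
from urllib.parse import quote, urlencode

# Assumption: a Loki-compatible server listening locally on port 3100.
BASE_URL = "http://localhost:3100"


def labels_url(start_ns, end_ns, base=BASE_URL):
    """URL listing all label names seen in the given time range."""
    query = urlencode({"start": start_ns, "end": end_ns})
    return f"{base}/loki/api/v1/labels?{query}"


def label_values_url(name, start_ns, end_ns, base=BASE_URL):
    """URL listing all values of one label in the given time range."""
    query = urlencode({"start": start_ns, "end": end_ns})
    return f"{base}/loki/api/v1/label/{quote(name, safe='')}/values?{query}"
```

The resulting strings can be passed to curl or any HTTP client.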
Query cLoki
Example handler /loki/api/v1/query_range
https://cloki.org
{db="clickhouse"} |~ "click"
SELECT FINGERPRINT ... JSONExtractString(labels,'db') == 'clickhouse')
SELECT ... WHERE FINGERPRINT ... extractAllGroups(string,'(click)') != []
[Diagram labels: LRU cache]
Query cLoki: under the hood
WITH str_sel as (
    SELECT fingerprint FROM cloki.time_series
    WHERE JSONHas(labels, 'level') AND JSONExtractString(labels, 'level') != 'error'
),
sel_a as (
    SELECT DISTINCT time_series.labels as labels, samples.string as string,
        time_series.fingerprint as fingerprint, samples.timestamp_ms as timestamp_ms
    FROM cloki.samples LEFT JOIN cloki.time_series ON samples.fingerprint = time_series.fingerprint
    WHERE samples.fingerprint IN str_sel
        AND extractAllGroups(string, '(ParsingException)') != []
        AND timestamp_ms >= toUnixTimestamp(now()-60)*1000 AND timestamp_ms <= toUnixTimestamp(now())*1000
    ORDER BY timestamp_ms desc, labels desc LIMIT 1000
)
SELECT * FROM sel_a ORDER BY labels desc, timestamp_ms desc FORMAT JSONEachRow
{level!="error"} |~ "ParsingException"
[Query parts: Label Matchers, Line Filter]
https://cloki.org
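To make the translation step more concrete, here is a deliberately simplified sketch of how label matchers and a line filter could be turned into a Clickhouse query of the same shape as the one above. It is not cLoki's transpiler: the function names, the supported operators (only = and !=) and the flattened query layout are assumptions, while the table names and Clickhouse functions are taken from the slide.

```python
def quote(value):
    """Quote a string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def label_condition(name, op, value):
    """Translate one label matcher into a condition on the labels JSON."""
    extracted = f"JSONExtractString(labels, {quote(name)})"
    if op == "=":
        return f"{extracted} = {quote(value)}"
    if op == "!=":
        return f"JSONHas(labels, {quote(name)}) AND {extracted} != {quote(value)}"
    raise ValueError(f"unsupported operator in this sketch: {op}")


def build_log_query(matchers, line_regex, start_ms, end_ms, limit=1000):
    """Build a simplified log query: label matchers first, then the line filter."""
    if not matchers:
        raise ValueError("at least one label matcher is required")
    label_sql = " AND ".join(label_condition(*m) for m in matchers)
    pattern = quote("(" + line_regex + ")")
    return (
        "SELECT fingerprint, timestamp_ms, string FROM cloki.samples "
        f"WHERE fingerprint IN (SELECT fingerprint FROM cloki.time_series WHERE {label_sql}) "
        f"AND extractAllGroups(string, {pattern}) != [] "
        f"AND timestamp_ms >= {int(start_ms)} AND timestamp_ms <= {int(end_ms)} "
        f"ORDER BY timestamp_ms DESC LIMIT {int(limit)}"
    )
```

Calling build_log_query([("level", "!=", "error")], "ParsingException", start, end) yields the same two filtering steps as the slide: fingerprints are selected by label first, so the regex only runs on lines from matching streams.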
LogQL Query Language
Loki’s secret is LogQL, a PromQL-inspired query language acting like a distributed grep to aggregate and filter log sources.
LogQL uses labels and operators for filtering:
There are two types of LogQL queries:
➔ LOG queries return the contents of filtered log lines
➔ METRIC queries calculate values based on log extractions
https://grafana.com/docs/loki/latest/logql/
LogQL Label Filtering https://cloki.org
LogQL Regex Filtering https://cloki.org
LogQL Regex Filtering + Regex Parsing https://cloki.org
LogQL Regex Filtering + Regex Parsing + Unwrap https://cloki.org
LogQL JSON Parsing https://cloki.org
LogQL JSON Parsing
The JSON parser can operate without parameters to extract and flatten any available key:
When used with parameters, only the specified keys are parsed to their selected labels:
Use in combination with Label Filters and Aggregation Functions to project log metrics to grouped timeseries:
https://cloki.org
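The two JSON parser modes described above can be mimicked in a few lines. This is only a sketch of the behaviour, not the parser used by Loki or cLoki: `flatten` and `extract` are hypothetical names, nested keys are joined with an underscore as LogQL does, arrays are skipped, and values are stringified with Python's str().

```python
import json


def flatten(obj, prefix=""):
    """Parameterless mode: flatten every key, joining nested keys with '_'."""
    labels = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            labels.update(flatten(value, name))
        elif isinstance(value, list):
            continue  # arrays are skipped in this sketch
        else:
            labels[name] = str(value)
    return labels


def extract(obj, mapping):
    """Parameter mode: only pull the requested dotted paths into named labels."""
    labels = {}
    for label, path in mapping.items():
        value = obj
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                value = None
                break
            value = value[part]
        if value is not None:
            labels[label] = str(value)
    return labels


line = '{"type": 32, "event": {"media": "video", "packets_sent": 120}}'
print(flatten(json.loads(line)))
```

With the sample line above, the flattened labels are type, event_media and event_packets_sent, which is the naming that label filters and unwrap expressions then refer to.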
LogQL JSON Parsing + Label Filtering https://cloki.org
LogQL JSON Parsing + Unwrap Aggregations https://cloki.org
Demo
Is it real?
avg_over_time({emitter="janus"} | json | type="32" | event_media="video" | unwrap event_packets_sent[1s]) by (type)
[Annotated query parts: Label Matchers, Parser, Line & Label Filters, Unwrap Expression, Grouping Expression]
LogQL JSON Parsing: under the hood
{level!="error"} |~ "ParsingException"
[Query parts: Label Matchers, Line Filter]
{emitter="janus"} | json type="deep.nested.type" | type="32"
[Query parts: Label Matchers, Parser, Label Filter]
[Diagram: side labels CLICKHOUSE and CLOKI]
https://cloki.org
As query complexity increases, so does its cost - a fair counterweight to the cheaper inserts and selects
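The demo query, avg_over_time(... | unwrap event_packets_sent[1s]) by (type), boils down to averaging one extracted numeric label per time window and group. The sketch below shows that arithmetic under a simplifying assumption: it uses fixed, non-overlapping 1s buckets instead of LogQL's range evaluation at each step, and the function name and sample layout are hypothetical.

```python
from collections import defaultdict


def avg_over_time(samples, unwrap, by, window_ms=1000):
    """Average the unwrapped label per (time bucket, group value).

    samples: iterable of (timestamp_ms, labels) pairs, where labels is a
    dict of already extracted string labels.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for ts_ms, labels in samples:
        if unwrap not in labels:
            continue  # lines without the metric label are ignored
        bucket = ts_ms // window_ms * window_ms
        acc = sums[(bucket, labels.get(by, ""))]
        acc[0] += float(labels[unwrap])
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}
```

Each result key is a (bucket start in ms, group value) pair, which maps directly onto one point of one timeseries in a graph panel.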
cLoki vs Loki
Performance Benchmark
cLoki vs Loki Performance Benchmark https://cloki.org
[Chart: 3.6x faster]
[Chart: 2x-4x faster]
[Chart: Loki unwrap interval vs. cLoki unwrap interval]
cLoki
CH Queries
Clickhouse Query Wrappers
Not into logs? cLoki can also act as a gateway to run queries against existing Clickhouse data tables with the same query model:
sum by (ruri_user, from_user) (rate(duration[300])) from my_database.my_table where duration > 10
<aggr op> by (<labels>) (<function>(<metric>[<range>])) from <database>.<table> where <optional conditions>
https://cloki.org
cLoki provides a simplified query model for visualizing metrics and tags dynamically out of (almost) any Clickhouse table.
This approach requires no preparation and makes no assumptions about how data is accessed or inserted into Clickhouse.
clickhouse({
db="my_database",
table="my_table",
tag="ruri_user, from_user",
metric="sum(duration)",
where="duration > 10",
interval="300",
timefield="record_datetime"
})
https://cloki.org
SELECT ruri_user, from_user,
    groupArray((t, c)) AS groupArr
FROM (
    SELECT (intDiv(toUInt32(record_datetime), 300)*300)*1000 AS t,
        from_user, ruri_user,
        sum(duration) c
    FROM my_database.my_table
    PREWHERE record_datetime BETWEEN 1610533076 AND 1610536677
        AND duration > 10
    GROUP BY t, ruri_user, from_user
    ORDER BY t, ruri_user, from_user
) ...
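The mapping from the clickhouse({...}) parameters to the inner SELECT can be sketched as plain string templating. This is an illustration of the idea rather than cLoki's code: the function names are hypothetical, only the inner query is produced (the outer part is elided on the slide and left out here), and no input sanitizing is attempted.

```python
def bucket_ms(epoch_seconds, interval):
    """Same rounding as (intDiv(ts, interval) * interval) * 1000 in the query."""
    return (epoch_seconds // interval) * interval * 1000


def build_inner_query(db, table, tag, metric, where, interval, timefield, start, end):
    """Template the inner aggregation query from the wrapper parameters.

    Assumption: all parameters are trusted; a real gateway would validate them.
    """
    tags = ", ".join(part.strip() for part in tag.split(","))
    return (
        f"SELECT (intDiv(toUInt32({timefield}), {interval}) * {interval}) * 1000 AS t, "
        f"{tags}, {metric} c "
        f"FROM {db}.{table} "
        f"PREWHERE {timefield} BETWEEN {start} AND {end} AND {where} "
        f"GROUP BY t, {tags} ORDER BY t, {tags}"
    )
```

The bucket expression is what turns arbitrary rows into evenly spaced points: with a 300 second interval, a row stamped 1610533076 lands in the bucket starting at 1610532900, reported in milliseconds for Grafana.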
cLoki
Integrations
cLoki Integrations https://hepic.tel
The first native implementation was in our own HEP stack (HOMER, HEPIC, HEP.*) offering live logging features
cLoki Integrations https://sipfront.com
As we all know, there’s only one way to test technologies and that’s in production with lots of data and metrics!
sipfront.com kindly provided us with a variety of Logs, WebRTC and VoIP/SIP analytics from their automated testing.
cLoki Integrations https://cloki.org
In less than a week they went from zero to a beautiful live integration, tracking all service vitals in realtime!
cLoki Integrations https://cloki.org
The browser and complex logs of WebRTC are a great opportunity and use case for cLoki and Clickhouse
Roadmap
The project roadmap will be user and community driven with several items already being developed:
✓ Plugin subsystem to extend LogQL with parsers, functions and macros
䷢ Window Functions for smoothing/upsampling series (21.4)
䷢ Fingerprinting using native Clickhouse HASH functions
❏ Dictionaries as caches for Labels & Label Values
❏ Live Query Mode using WATCH or VIEW piping to WSS
❏ Alerting notifications using MV + URL engine
❏ Async INSERTS and Semistructured Data INSERTS
❏ Find more Sponsors or adopting orgs to continue long-term
https://cloki.org
info@qxip.net
Upcoming presentations about cLoki:
@Clickhouse Meetup October 27, 2021
@Cluecon October 28, 2021
@CommCon December 2021
https://cloki.org
Open-Source made by humans:
- Lorenzo Mangani
- Alexandr Dubovikov
- Volodymyr Akchurin
- Jachen Duschletta
https://metrico.in

cLoki: Like Loki but for ClickHouse

  • 2. cLoki like Loki, but for Clickhouse cLoki is not affiliated or endorsed by Grafana Labs. All rights belong to their respective owners.
  • 3. Hello There Lorenzo Mangani - Co-Founder and CEO @ QXIP BV - Proud father of four awesome kids and dozens of github repositories - Parsing PCAPs, VoIP/RTC packets, events, statistics and logs since ~2007 QXIP BV - Based in Amsterdam (NL) and Valencia (ES) with Remote Devs/Ops teams (UK/US/DE/UA) - Research & Development of OSS & Commercial VoIP/RTC Monitoring Technologies - Running a Healthy, Self-Sustainable OSS Business Model w/ great Technical Partnerships - Worldwide Customer base ranging from Tiny Startups up to massive Fortune Corporations - Experienced making OSS patches/plugins/extensions for many popular projects and platforms - Clickhouse Centric development since around 2017, working on cLoki since 2019 https://qxip.net
  • 4. Art of Monitoring Monitoring and Troubleshooting are our passions at qxip Over the last decade we’ve successfully assisted thousands of customers and deployments using our technologies, working with some of the largest Network Operators and Corporations around running State-of-the-Art RTC and VoIP/Telephony Services at massive scale, learning the arts from real-life issues of watching mission critical setups. Billions of calls each day are safely captured, dissected and indexed using our stack by our Customers worldwide. Clickhouse powers the most critical part of our design, and provides us with the reliability our Customers expect.
  • 5. Art of Monitoring Monitoring and Troubleshooting are our passions at qxip Over the last decade we’ve successfully assisted thousands of customers and deployments using our technologies, working with some of the largest Network Operators and Corporations around running State-of-the-Art RTC and VoIP/Telephony Services at massive scale, learning the arts from real-life issues of watching mission critical setups. Billions of calls each day are safely captured, dissected and indexed using our stack by our Customers worldwide. Clickhouse powers the most critical part of our design, and provides us with the reliability our Customers expect. While our software stack has perfected the art of VoIP, to this day, about 20% of our activities are log based. Good old logs, growing in complexity, each with its format and requiring a gazillion parsers and tools. Not to mention WebRTC and Browser logs with their unstructured and fast-pacing changes…. Clickhouse itself is huge log producer! We all know working with logs can be painful and expensive!
  • 6. Logging Pains Logs and Metrics are a key part of any troubleshooting. There are plenty of open-source and commercial solutions offering suites with parsing support, correlation and even amazing visualizations. Unfortunately many are not ideal. ● Using ELK, Splunk or and any full-text indexing or AWS solution for logs is waaaay too expensive at scale ● InfluxDB, Timescale & co are powerful but very much incompatible with high cadinality / entropy tags in logs/series ● Prometheus metrics are so great but static timeseries do not magically answer all questions alone ● Custom solutions using custom approaches fall short or dead too quickly (big names included) ● Log formats change. Metric reporters change. Debug logs change. Everything changes, all the time! . . . How much wood would a woodchuck chuck?
  • 7. Logging Pains Logs are a key part of any troubleshooting. There are plenty of open-source and commercial solutions offering complete suites of parsing support, correlation and even amazing visualizations. Unfortunately many are not ideal. ● Using ELK, Splunk or and any full-text indexing or AWS solution for logs is waaaay too expensive at scale ● InfluxDB, Timescale & co are powerful but very much incompatible with high entropy tags in logs/series ● Prometheus metrics is great but static timeseries do not magically answer all questions ● Custom solutions using custom approaches fall short or dead too quickly (big names included) ● Log formats change. Metric reporters change. Debug logs change. Everything changes, all the time. In 2019 Loki was announced by Grafana Labs with an interesting design goal: ● Logs should be cheap to store and scale ● Log Search should be easy to Operate ● Metrics and Logs need to work together
  • 9. What is Loki Compared to other log aggregation systems, Grafana Loki: ★ Does not do full-text indexing on logs. By only indexing metadata, Loki is simpler to operate and cheaper to run ★ Indexes and groups log streams using the same labels as Prometheus, making it easy to switch between metrics and logs ★ Uses the LogQL query language with native support in Grafana A Loki-based logging stack consists of 3 components: ★ Agents: responsible for gathering logs and sending them to Loki. ★ Loki: responsible for storing logs and processing queries. ★ Grafana: responsible for querying and displaying the logs. https://grafana.com/oss/loki/
  • 10. What is Loki Grafana Loki and Cortex have a complex architecture, which we leave to their excellent online documentation. https://grafana.com/oss/loki/ ➔ Promtail from Grafana ➔ Pastash from qxip ➔ Logstash from Elastic ➔ Fluentd ➔ Loki4j ➔ & others We immediately started using and integrating it!
  • 11. What is Loki Grafana Loki can of course collect Clickhouse logs… https://grafana.com/oss/loki/ … everything is working fine but ...
  • 12. What is Loki Grafana Loki can of course collect Clickhouse logs… but what if … https://grafana.com/oss/loki/ … hear me out …
  • 13. What is Loki Grafana Loki can of course collect Clickhouse logs… but what if … we just replaced its core with Clickhouse? https://grafana.com/oss/loki/
  • 14. cLoki like Loki, but for Clickhouse cLoki is not affiliated or endorsed by Grafana Labs. All rights belong to their respective owners. cLoki is a clean room implementation with exactly zero lines of code from the original Loki application
  • 15. Loki vs cLoki ? If Loki already exists and it works, why do we need cLoki? ★ Grafana Loki’s design concepts and native LogQL integration are brilliant - but we just love to use Clickhouse! ★ Clickhouse is missing a standardized logging/metrics solution and using any other database feels like blasphemy ★ Good Ideas are the fuel of innovation and this is one. A good way to really understand something is to replicate it OK, but Grafana supports datasource plugins for Clickhouse, why do we need cLoki? ★ cLoki offers native Query and Push features with the same tools and APIs Loki uses for Datasources and Alerting ★ Grafana Loki does not sign open source plugins at all and there’s no easy way to get them installed on their cloud ★ The Clickhouse SQL dialect is powerful but has a steep learning curve; LogQL & cLoki can easily help compensate https://cloki.org NOTE: cLoki is a clean room implementation and uses exactly zero lines of code from the original Loki application
  • 16. Loki vs cLoki Project Challenges and Ambitions ★ Loki API is overall quite simple and easy to fully map and emulate - done! ★ LogQL has lots of functionality to emulate but step by step - Log ★ Logs are relatively easy to handle, Timeseries math and precision can be tedious Project Strengths: Keep it simple! ★ Clickhouse is the center of our design and cLoki is only intended as a useful API on top of its great core features ★ Not a clone but rather a study in finding and adapting effective analogies from Clickhouse usage best-practices ★ Long-term strategy: rather than reinventing the wheel, we’re integrating with a stable platform with ~1M happy users ★ Portability to other languages while retaining compatibility is possible with a simple and consistent model (->golang) https://cloki.org NOTE: cLoki is a clean room implementation and uses exactly zero lines of code from the original Loki application
  • 17. Meet cLoki https://cloki.org “Let’s rebuild Loki on top of Clickhouse!”
  • 18. Meet cLoki “Let’s rebuild Loki on top of Clickhouse!” Just like Loki, cLoki does not parse or index incoming logs, but rather groups log streams using Prometheus-like Log Labels. Clickhouse does all the heavy lifting, leaving only the LogQL stages applied to the filtered results (parsing, extractions, etc) to the client. Promtail and any other Loki-compatible client work transparently. We even bundle our own log collector and multi-IO processor: it can extract logs from files, queues, pipes, sockets, etc https://cloki.org
  • 19. Meet cLoki cLoki was designed to be thin, low profile and easy to extend: ★ Fast & Modular API design w/ Fastify NodeJS ★ JSON/Protobuf Push Bulk Inserts w/ LRU Buffering ★ Consistent Label Fingerprinting w/ LRU Caching ★ TTL Based Log Rotation w/ Customizable Intervals ★ LogQL Emulation (AST) w/ Clickhouse Query Transpiler ○ Fully transparent to Grafana and any LogQL tool ○ Native datasource support for Visualization & Alerting ○ Fully Extensible with external Plugins and npm Modules Some bonus features are also available as collateral: ★ Clickhouse Query function gateway ○ Extract metrics from any clickhouse table using LogQL functions ★ Clickhouse Storage, Compression & Clustering options ○ Store using disks or object storage, cluster using clickhouse-keeper ★ Additional Input Emulation ○ Accept influx/telegraf/prometheus input for receiving logs, metrics https://cloki.org
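The "LRU Buffering" and "LRU Caching" items above can be illustrated with a rough Python sketch (cLoki itself is written in NodeJS). It shows a bounded cache of already-seen fingerprints, so a label row only needs to be written once, and a buffer that hands rows to the database in batches. The class names, capacities and the flush callback are illustrative assumptions, not cLoki's actual code.

```python
from collections import OrderedDict


class SeenFingerprints:
    """Bounded LRU set of fingerprints whose label row was already written."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def is_new(self, fingerprint):
        """Return True when the caller still needs to insert the label row."""
        if fingerprint in self._items:
            self._items.move_to_end(fingerprint)  # mark as recently used
            return False
        self._items[fingerprint] = True
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return True


class BulkBuffer:
    """Collects rows and passes them to `flush` in batches (hypothetical helper)."""

    def __init__(self, max_rows, flush):
        self.max_rows = max_rows
        self.flush = flush  # callback that would run one bulk INSERT
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush_now()

    def flush_now(self):
        if self.rows:
            batch, self.rows = self.rows, []
            self.flush(batch)
```

In a real service the buffer would presumably also be flushed on a timer, so that a quiet stream does not hold rows back indefinitely.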
  • 20. Push cLoki Example handler /loki/api/v1/push https://cloki.org
      { "labels": {"cloki":"UP", "db":"clickhouse"}, "entries": [ {"timestamp":"2020-12-25T00:00:06.944Z","line":"hello!"} ] }
      HASH({"cloki":"UP", "db":"clickhouse"}) = FINGERPRINT
      TS, FINGERPRINT = "hello!"
      Diagram labels: LRU cache, LRU bulk
  • 21. Push cLoki: under the hood https://cloki.org
      {"labels": {"cloki":"UP", "db":"clickhouse"}, "entries":[{"timestamp":"2020-12-25T00:00:06.944Z","line":"hello!"}]}
      fingerprint = hash(labels.sorted);
      Labels: INSERT INTO cloki.time_series (date, fingerprint, labels, name) VALUES (?, ?, ?, ?)
      Logs: INSERT INTO cloki.samples (fingerprint, log.timestamp_ms, log.line, string) VALUES (?, ?, ?, ?)
      Diagram label: LRU bulk
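The push path on this slide (hash the sorted labels, then write one label row and one row per log entry) could look roughly like the Python sketch below. The choice of SHA-1 truncated to 64 bits, the row tuple layout and the function names are assumptions made for illustration; the slides do not say which hash cLoki uses.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fingerprint(labels):
    """Hash labels in sorted-key order so that key order never changes the id."""
    canonical = json.dumps(labels, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # assumption: 64-bit fingerprint


def to_rows(stream):
    """Split one pushed stream into a label row and a list of sample rows."""
    fp = fingerprint(stream["labels"])
    series_row = (fp, json.dumps(stream["labels"], sort_keys=True))
    sample_rows = []
    for entry in stream["entries"]:
        ts = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
        ts_ms = (ts - EPOCH) // timedelta(milliseconds=1)  # exact integer ms
        sample_rows.append((fp, ts_ms, entry["line"]))
    return series_row, sample_rows
```

Sorting the keys before hashing is what makes the fingerprint "consistent": two clients sending the same labels in a different order land on the same stream.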
  • 22. LogQL Label & Log Browser Navigate all Labels and Values available in a selected range of time using the native Grafana Log Browser or just curl https://cloki.org
  • 23. Query cLoki Example handler /loki/api/v1/query_range https://cloki.org
      {db="clickhouse"} |~ "click"
      SELECT FINGERPRINT ... JSONExtractString(labels,'db') == 'clickhouse')
      SELECT ... WHERE FINGERPRINT ... extractAllGroups(string,'(click)') != []
      Diagram label: LRU cache
  • 24. Query cLoki: under the hood https://cloki.org
      { level!="error"} |~ "ParsingException" (Label Matchers, Line Filter)
      WITH str_sel as (
        SELECT fingerprint FROM cloki.time_series
        WHERE JSONHas(labels, 'level') AND JSONExtractString(labels, 'level') != 'error'
      ), sel_a as (
        SELECT DISTINCT time_series.labels as labels, samples.string as string,
               time_series.fingerprint as fingerprint, samples.timestamp_ms as timestamp_ms
        FROM cloki.samples
        LEFT JOIN cloki.time_series ON samples.fingerprint = time_series.fingerprint
        WHERE samples.fingerprint IN str_sel
          AND extractAllGroups(string, '(ParsingException)') != []
          AND timestamp_ms >= toUnixTimestamp(now()-60)*1000
          AND timestamp_ms <= toUnixTimestamp(now())*1000
        ORDER BY timestamp_ms desc, labels desc
        LIMIT 1000
      )
      SELECT * FROM sel_a ORDER BY labels desc, timestamp_ms desc FORMAT JSONEachRow
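The mapping on this slide, from label matchers and a line filter to Clickhouse conditions, can be sketched in a few lines of Python. This is a toy transpiler under stated assumptions: it only understands `=` and `!=` matchers plus an optional `|~` filter, the regular expressions and function names are made up for the example, and the generated SQL is simplified compared to the query shown above (no time range, no join).

```python
import re

# Toy grammar: {name="value", other!="value"} optionally followed by |~ "regex"
MATCHER = re.compile(r'(\w+)\s*(!=|=)\s*"([^"]*)"')
QUERY = re.compile(r'^\s*\{(.*?)\}\s*(?:\|~\s*"([^"]*)")?\s*$')


def sql_quote(value):
    """Quote a string literal using backslash escapes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def transpile(logql):
    match = QUERY.match(logql)
    if not match:
        raise ValueError(f"unsupported query: {logql!r}")
    selector, line_regex = match.groups()

    label_conds = []
    for name, op, value in MATCHER.findall(selector):
        cond = f"JSONExtractString(labels, {sql_quote(name)}) {op} {sql_quote(value)}"
        if op == "!=":
            # Mirrors the slide: a negative matcher also requires the label to exist.
            cond = f"JSONHas(labels, {sql_quote(name)}) AND {cond}"
        label_conds.append(cond)
    if not label_conds:
        raise ValueError("at least one label matcher is required")

    series_sql = ("SELECT fingerprint FROM cloki.time_series WHERE "
                  + " AND ".join(label_conds))
    sample_conds = [f"fingerprint IN ({series_sql})"]
    if line_regex is not None:
        pattern = sql_quote("(" + line_regex + ")")
        sample_conds.append(f"extractAllGroups(string, {pattern}) != []")
    return ("SELECT timestamp_ms, string FROM cloki.samples WHERE "
            + " AND ".join(sample_conds))
```

For example, `transpile('{level!="error"} |~ "ParsingException"')` yields a query whose inner SELECT resolves fingerprints from the labels and whose outer WHERE applies the line filter to the stored log strings.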
  • 25. LogQL Query Language Loki’s secret is LogQL, a PromQL-inspired query language acting like a distributed grep to aggregate and filter log sources. LogQL uses labels and operators for filtering. There are two types of LogQL queries: ➔ LOG queries return the contents of filtered log lines ➔ METRIC queries calculate values based on log extractions https://grafana.com/docs/loki/latest/logql/
  • 26. LogQL Label Filtering https://cloki.org
  • 27. LogQL Regex Filtering https://cloki.org
  • 28. LogQL Regex Filtering + Regex Parsing https://cloki.org
  • 29. LogQL Regex Filtering + Regex Parsing + Unwrap https://cloki.org
  • 30. LogQL JSON Parsing https://cloki.org
  • 31. LogQL JSON Parsing The JSON parser can operate without parameters to extract and flatten any available key. When used with parameters, only the specified keys are parsed to their selected labels. Use in combination with Label Filters and Aggregation Functions to project log metrics to grouped timeseries. https://cloki.org
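To make "extract and flatten any available key" concrete, here is a small Python sketch of what a parameterless `| json` stage does to one log line. It assumes the behaviour documented for Loki's JSON parser (nested keys joined with `_`, arrays skipped); the function name and the sample line are hypothetical, and label name sanitizing is left out.

```python
import json


def json_labels(line):
    """Flatten a JSON log line into label/value pairs; non-JSON lines yield none."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return {}
    if not isinstance(doc, dict):
        return {}

    labels = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}_{key}" if prefix else key, child)
        elif isinstance(value, list):
            return  # arrays are skipped in this sketch
        elif isinstance(value, str):
            labels[prefix] = value
        else:
            labels[prefix] = json.dumps(value)  # 32 -> "32", true -> "true"

    walk("", doc)
    return labels
```

A line such as `{"type": 32, "event": {"media": "video", "packets_sent": 120}}` would produce the labels `type`, `event_media` and `event_packets_sent`, which can then be used by label filters or `unwrap`.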
  • 32. LogQL JSON Parsing + Label Filtering https://cloki.org
  • 33. LogQL JSON Parsing + Unwrap Aggregations https://cloki.org
  • 35. LogQL JSON Parsing: under the hood https://cloki.org
      { level!="error"} |~ "ParsingException" (Label Matchers, Line Filter)
      {emitter="janus"} | json type="deep.nested.type" | type="32" (Label Matchers, Parser, Label Filter)
      avg_over_time({emitter="janus"} | json | type="32" | event_media="video" | unwrap event_packets_sent[1s]) by (type) (Label Matchers, Line & Label Filters, Parser, Unwrap Expression, Grouping Expression)
      Diagram labels: CLICKHOUSE, CLOKI
      As query complexity increases, so does its cost - a fair counterweight to the cheaper inserts and selects
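The `avg_over_time(... | unwrap event_packets_sent[1s]) by (type)` query above can be approximated on already-parsed entries with the Python sketch below. It is a simplification under stated assumptions: real LogQL evaluates sliding ranges at each query step, while this sketch uses fixed, non-overlapping buckets, and the function name, argument names and input shape are invented for the example.

```python
from collections import defaultdict


def avg_over_buckets(entries, filters, unwrap, by, bucket_ms):
    """Average the unwrapped label per (group value, bucket start in ms).

    entries: iterable of (timestamp_ms, labels) where labels is a dict of strings.
    filters: label equality filters, e.g. {"type": "32"}.
    """
    acc = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for ts_ms, labels in entries:
        if any(labels.get(name) != value for name, value in filters.items()):
            continue  # label filter stage
        try:
            value = float(labels[unwrap])  # unwrap stage
        except (KeyError, ValueError):
            continue  # entries without a numeric value are dropped
        key = (labels.get(by, ""), ts_ms - ts_ms % bucket_ms)
        acc[key][0] += value
        acc[key][1] += 1
    return {key: total / count for key, (total, count) in acc.items()}
```

The per-entry work here (filtering on extracted labels, converting a label to a number, grouping) is the kind of cost the slide refers to when it says more complex queries are more expensive.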
  • 37. cLoki vs Loki Performance Benchmark https://cloki.org 3.6x faster
  • 38. cLoki vs Loki Performance Benchmark https://cloki.org 2x-4x faster
  • 39. cLoki vs Loki Performance Benchmark https://cloki.org Loki unwrap interval cLoki unwrap interval
  • 41. Clickhouse Query Wrappers Not into logs? cLoki can also act as a gateway to run queries against existing Clickhouse data tables with the same query model: https://cloki.org
      sum by (ruri_user, from_user) (rate(duration[300])) from my_database.my_table where duration > 10
      <aggr op> by <labels> (<function>(<metric>[<range>])) from <database>.<table> where <optional conditions>
  • 42. Clickhouse Query Wrappers cLoki provides a simplified query model for visualizing metrics and tags dynamically out of (almost) any Clickhouse table. This approach requires no preparation and makes no assumptions about how data is accessed or inserted into Clickhouse. https://cloki.org
      sum by (ruri_user, from_user) (rate(duration[300])) from my_database.my_table where duration > 10
      <aggr op> by <labels> (<function>(<metric>[<range>])) from <database>.<table> where <optional conditions>
      clickhouse({ db="my_database", table="my_table", tag="ruri_user, from_user", metric="sum(duration)", where="duration > 10", interval="300", timefield="record_datetime" })
      SELECT ruri_user, from_user, groupArray((t, c)) AS groupArr
      FROM (
        SELECT (intDiv(toUInt32(record_datetime), 300)*300)*1000 AS t, from_user, ruri_user, sum(duration) c
        FROM my_database.my_table
        PREWHERE record_datetime BETWEEN 1610533076 AND 1610536677 AND duration > 10
        GROUP BY t, ruri_user, from_user
        ORDER BY t, ruri_user, from_user
      ) ...
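How the `clickhouse({...})` parameters could expand into the SQL shown on this slide is sketched below in Python. The function and helper names are hypothetical, and the trailing `GROUP BY` on the outer query is an assumption, since the slide elides the end of the statement with "...". Identifiers are validated, but `metric` and `where` are inserted verbatim and must come from a trusted source in this sketch.

```python
import re

IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def ident(name):
    """Accept only plain identifiers to keep table/column names safe to inline."""
    if not IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_query(params, start, end):
    """Expand wrapper parameters into a bucketed aggregation query."""
    db = ident(params["db"])
    table = ident(params["table"])
    timefield = ident(params["timefield"])
    tags = ", ".join(ident(tag.strip()) for tag in params["tag"].split(","))
    step = int(params["interval"])
    metric = params["metric"]   # raw expression, trusted input only
    where = params.get("where")  # raw expression, trusted input only

    conditions = f"{timefield} BETWEEN {int(start)} AND {int(end)}"
    if where:
        conditions += f" AND {where}"
    inner = (
        f"SELECT (intDiv(toUInt32({timefield}), {step})*{step})*1000 AS t, "
        f"{tags}, {metric} c "
        f"FROM {db}.{table} PREWHERE {conditions} "
        f"GROUP BY t, {tags} ORDER BY t, {tags}"
    )
    # Assumption: the elided outer query groups by the tag columns.
    return (f"SELECT {tags}, groupArray((t, c)) AS groupArr "
            f"FROM ({inner}) GROUP BY {tags}")
```

Rounding the time field down to a multiple of the interval and multiplying by 1000 gives one millisecond timestamp per bucket, which is the shape a time series panel expects.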
  • 44. cLoki Integrations https://hepic.tel The first native implementation was in our own HEP stack (HOMER, HEPIC, HEP.*) offering live logging features
  • 45. cLoki Integrations https://sipfront.com As we all know, there’s only one way to test technologies and that’s in production with lots of data and metrics! sipfront.com kindly provided us with a variety of Logs, WebRTC and VoIP/SIP analytics from their automated testing.
  • 46. cLoki Integrations https://cloki.org In less than a week they went from zero to a beautiful live integration, tracking all service vitals in realtime!
  • 47. cLoki Integrations https://cloki.org The browser and complex logs of WebRTC are a great opportunity and use case for cLoki and Clickhouse
  • 48. Roadmap The project roadmap will be user and community driven with several items already being developed: ✓ Plugin subsystem to extend LogQL with parsers, functions and macros ䷢ Window Functions for smoothing/upsampling series (21.4) ䷢ Fingerprinting using native Clickhouse HASH functions ❏ Dictionaries as caches for Labels & Label Values ❏ Live Query Mode using WATCH or VIEW piping to WSS ❏ Alerting notifications using MV + URL engine ❏ Async INSERTS and Semistructured Data INSERTS ❏ Find more Sponsors or adopting orgs to continue long-term https://cloki.org
  • 49. info@qxip.net Upcoming presentations about cLoki: @Clickhouse Meetup October 27, 2021 @Cluecon October 28, 2021 @CommCon December 2021 https://cloki.org Open-Source made by humans: - Lorenzo Mangani - Alexandr Dubovikov - Volodymyr Akchurin - Jachen Duschletta https://metrico.in