Netdata
the open-source observability
platform everyone needs
Costa Tsaousis
About Netdata
● 72k GitHub stars!
Netdata is one of the most popular observability projects!
● Leading the Observability category in CNCF!
By GitHub stars, Netdata is 1st, followed by Elasticsearch with 70k stars, Grafana with 65k, Prometheus with 56k, etc.
● 1.5 million downloads every day!
Docker Hub counts 650M pulls so far.
Cloudflare reports 51 TB of transferred data per month.
GitHub URL: https://github.com/netdata/netdata
What makes Netdata unique?
● Easy
Just install it on your servers and you are done!
Dashboards and alerts are automatically created for you.
● Real-Time
Per-second resolution of all data.
Just 1 second of latency from data collection to visualization!
● A.I. everywhere
Machine Learning learns the patterns of all your data and detects anomalies without any manual intervention or configuration!
● Cost Efficient
Designed to work out-of-the-box and to significantly lower operational costs.
Netdata Promises
● Feel the pulse of your infrastructure!
High-fidelity monitoring, revealing the micro-world of your services and applications.
● 50% to 90% observability cost reduction!
Most of our users experience significant observability cost reduction, compared to the solutions they were using before Netdata.
● Half the Mean Time to Resolution!
Machine learning for all metrics helps correlate and reveal the root cause of issues, faster and more easily.
Observability Generations
● 1st generation: Checks
Check it is there and works.
Nagios, Icinga, Zabbix, Sensu, CheckMk, PRTG, SolarWinds
● 2nd generation: Metrics
Sample it periodically.
Graphite, OpenTSDB, Prometheus, InfluxDB
● 3rd generation: Logs
Collect and index its logs.
ELK, Splunk
● 4th generation: Integrated
All in one.
Datadog, Dynatrace, New Relic, Instana, Grafana
● 5th generation: Distributed
All in one, real-time, high-fidelity, AI-powered, cost-efficient.
Netdata
Blog post for more details:
The Observability Pipeline
Typical Observability Design
Servers, VMs and applications ship observability data (metrics, logs, traces) into central observability databases, which feed dashboards 📊 📈 and alert notifications ⚠ 🔥.
The Effects of a Traditional Observability Pipeline
Low-fidelity and expensive! Observability requires serious resources and skills, and is never enough!
● Business logic is everywhere.
What to collect, what to visualize and how, what to alert on, what to index, what to retain and for how long.
● Scalability is challenging.
As the infrastructure grows, these central databases need to scale, be highly available, and perform at scale.
● A lot of moving parts.
Database servers, application servers, a lot of integrations and protocols, and a ton of different technologies.
● Expensive.
Skills, resources, a lot of time - or a lot of money.
The Results…
● Low-Fidelity Insights
The design of most tools forces users to lower granularity and cardinality, resulting in abstract views that are not useful in revealing the true dynamics of the infrastructure.
● Inefficient
They enforce a development lifecycle for setting up and maintaining observability, requiring advanced skills, a lot of discipline, and huge amounts of time and effort.
● Limited (or no) AI and Machine Learning
Most observability solutions are pretty dumb. And even when they claim to be smarter, they usually use tricks, not true machine learning.
● Expensive
Observability is frequently more expensive than the infrastructure being monitored.
The Current State of Observability
● Too little observability
Traditional check-based systems, like Nagios, Icinga, Zabbix, Sensu, PRTG, CheckMk, SolarWinds, etc., have evolved around the idea of checks or sensors. Each check has a status, possibly annotated with some text and a few metrics. Checks are executed every minute.
This is the equivalent of “traffic lights” monitoring. Reliable and robust, but not enough for today's needs, suffering from extensive “blind spots”.
● Too complex observability
DIY platforms, like the Grafana ecosystem, have evolved around a much better design (metrics, logs, traces), and although they are more powerful and highly customizable, they introduce way too many moving parts, require significant skills and expertise, have a steep learning curve, and quickly become overcomplicated to maintain and scale.
● Too expensive observability
Commercial vendors, like Datadog, Dynatrace, New Relic and Instana, each with their own strengths and weaknesses, are sometimes easier and better (to a different extent each), but they are unrealistically expensive.
The Design of Observability keeps Fidelity low
● What affects fidelity?
Granularity (the resolution of the data) and cardinality (the number of entities monitored) control the amount of data a monitoring system must ingest, process, store and retain.
Low granularity = blurry data, not that detailed
Low cardinality = blind spots, not everything is monitored
Low granularity + low cardinality = an abstract view of the infrastructure, lacking detail and coverage
● Why not have high fidelity?
Centralization is the key reason for keeping granularity and cardinality low.
Example: a system monitoring 3000 metrics every second (3k samples/s) has to process, store and query 450x more data than a system monitoring 100 metrics every 15 seconds (<7 samples/s).
Centralization makes fidelity and cost proportional to each other; increasing fidelity results in higher costs, and reducing costs leads to a decrease in fidelity.
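For clarity, the 450x figure is simply the ratio of the two ingest rates:

```latex
\frac{3000\ \text{samples/s}}{100\ \text{metrics} / 15\ \text{s}}
= \frac{3000\ \text{samples/s}}{6.67\ \text{samples/s}} \approx 450
```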
Netdata’s Design
Decentralized Design For High Fidelity
● Keep data at the edge
By keeping the data at the edge:
○ Compute and storage resources are already available and spare
○ No need for network resources
○ The work to be done is small and can be optimized, so that monitoring is a “polite citizen” to production applications
● Multiple independent centralization points
Mini centralization points may exist, as required for operational needs:
○ Ephemeral nodes that may vanish at any point in time
○ High availability of observability data
○ Offloading “sensitive” production systems from observability work
● Unify and integrate everything at query time
To provide unified, infrastructure-wide views, query the edge systems (or the mini centralization points), aggregate their responses, and provide high-resolution, real-time dashboards and alerts; a sketch of this fan-out follows below.
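A rough sketch of this query-time fan-out, not a definitive client for Netdata's API: it assumes each agent listens on the default port 19999 and answers an HTTP query with JSON; the hostnames, endpoint path and response shape are illustrative assumptions.

```python
# Fan a tiny query out to every edge agent in parallel, then merge the
# responses on the fly - no central database involved.
import concurrent.futures
import json
import urllib.request

NODES = ["node1.example.com", "node2.example.com"]  # hypothetical hosts

def query_node(host: str):
    # Each agent answers only for its own data, so the query is cheap.
    # Endpoint path and response shape are assumptions for this sketch.
    url = f"http://{host}:19999/api/v1/data?chart=system.cpu&after=-60"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return host, json.load(resp)

with concurrent.futures.ThreadPoolExecutor() as pool:
    results = dict(pool.map(query_node, NODES))

# Example aggregation, assuming each response carries a "data" list of
# [timestamp, value] rows: average the latest value across all nodes.
latest = [r["data"][-1][1] for r in results.values() if r.get("data")]
print("infrastructure-wide average:", sum(latest) / max(len(latest), 1))
```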
Netdata’s Decentralized Design: Smart Agent
Instead of centralizing the data, Netdata distributes the code!
All servers, VMs and applications provide dashboards and alerts; all data remain at the edge!
Netdata’s Decentralized Design: Netdata Parents
Multiple independent centralization points (e.g. Netdata Parent A in Data Center A, Parent B in Cloud Provider B, Parent C in Cloud Provider C) for high availability, each providing aggregated dashboards and alerts!
A Netdata Parent is the open-source Netdata Agent software, configured to accept data from other Netdata Agents.
Netdata’s Decentralized Design: Netdata Cloud
Netdata Cloud is a thin layer, a proxy, that distributes queries to Netdata Agents and merges their responses on the fly, to provide aggregated views.
Netdata Cloud does not store any observability data. It is available as SaaS or On-Prem.
Hybrid & Multi-Cloud Support
Netdata Cloud communicates only with the Netdata Parents (e.g. Parent A in Data Center A, Parent B in Cloud Provider B, Parent C in Cloud Provider C) to provide infrastructure-level views.
The pipeline inside every Netdata Agent
(diagram: the metrics and logs pipeline inside every Netdata Agent)
Common Concerns about Decentralized Designs
● The agent will be heavy
No! The Netdata agent, a complete monitoring pipeline in a box processing many thousands of metrics per second (vs. others that process just a few hundred every minute), is one of the lightest observability agents available.
● Queries will influence production systems
No! Each agent serves only its own data. Querying such a small dataset is lightweight and does not influence operations. For very sensitive or weak production systems, a mini centralization point next to these systems will isolate them from queries (and also offload them from ingestion, processing, storage and retention).
● Queries will be slower
No! They are actually faster! Distributing tiny queries in parallel to multiple systems provides an aggregate compute power many times higher than what any single system can provide.
● It will require more bandwidth
No! Querying is selective; most of the observability data are never queried unless required for exploration or troubleshooting. And even then, just a small portion of the data is examined.
So, the overall bandwidth used is a tiny fraction compared to centralized systems.
Edge Resources Required (per node)

| Resource                   | Dynatrace  | Datadog    | Instana   | Grafana   | Netdata    |
|----------------------------|------------|------------|-----------|-----------|------------|
| Granularity                | 1 minute   | 15 seconds | 1 second  | 1 minute  | 1 second   |
| Technology coverage        | Average    | High       | Low       | Average   | Excellent  |
| CPU usage (100% = 1 core)  | 12%        | 14%        | 6.7%      | 3.3%      | 3.6%       |
| Memory usage               | 1400 MB    | 972 MB     | 588 MB    | 414 MB    | 181 MB     |
| Disk space                 | 2 GB       | 1.2 GB     | 0.2 GB    | -         | 3 GB       |
| Disk read rate             | -          | 0.2 KB/s   | -         | -         | 0.3 KB/s   |
| Disk write rate            | 38.6 KB/s  | 8.3 KB/s   | -         | 1.6 KB/s  | 4.8 KB/s   |
| Egress internet bandwidth  | 11.4 GB/mo | 11.1 GB/mo | 5.4 GB/mo | 4.8 GB/mo | 0.01 GB/mo |

The Netdata agent is a full monitoring pipeline in a box, and is still one of the most lightweight agents among commercial observability offerings! Full comparison URL:
Netdata as a time-series DB vs Prometheus
We stress-tested a Netdata Parent and a Prometheus with 500 servers, 40k containers, at 2.7 million metrics/s:
● -35% CPU utilization
Netdata: 1.8 CPU cores per million metrics/s
Prometheus: 2.9 CPU cores per million metrics/s
● -49% peak memory consumption
Netdata: 49 GiB
Prometheus: 89 GiB
● -12% bandwidth
Netdata: 227 Mbps
Prometheus: 257 Mbps
● -98% disk I/O
Netdata: 3 MiB/s (no reads, 3 MiB/s writes)
Prometheus: 129 MiB/s (73.6 MiB/s reads, 55.2 MiB/s writes)
● -75% storage footprint
On 3 TiB of storage:
Netdata: 10 days per-second, 43 days per-minute, 467 days per-hour
Prometheus: 7 days per-second
Full comparison URL:
Netdata is the most lightweight platform!
In December 2023, the University of Amsterdam published a study on the impact of monitoring tools on Docker-based systems, aiming to answer two questions:
● What is the impact of monitoring tools on the energy efficiency of Docker-based systems?
● What is the impact of monitoring tools on the performance of Docker-based systems?
They found that:
- Netdata is the most efficient tool, requiring significantly fewer system resources than the others.
- Netdata is excellent in terms of performance impact, allowing containers and applications to run without any measurable impact due to observability.
Netdata outperforms other monitoring solutions in edge resource efficiency!
Full comparison URL:
Netdata
Challenge 1:
Automated data collection,
visualization and alerting!
We have a lot in common
Since we have so much in common, why does it take so long to set up a monitoring solution?
● Similar physical or virtual hardware
We all use a finite set of physical and virtual hardware. This hardware may differ in performance and capacity, but the technologies involved are standardized.
● Similar operating systems
We all use flavors of a small set of operating systems, exposing a finite set of metrics covering the monitoring of all system components and operating system layers.
● Packaged applications
Most of our infrastructure is based on packaged applications, like web servers, database servers, message brokers, caching servers, etc.
● Standard libraries
Even our custom applications usually rely on packaged libraries that expose telemetry in a finite and predictable way.
NIDL, the model for rapid deployment
NIDL stands for Nodes, Instances, Dimensions, Labels. The name comes from the slicing controls on all Netdata charts.
Netdata attaches NIDL metadata to time-series data, allowing the identification of the infrastructure components (instances) monitored (a sketch of such a sample follows after this list).
This framework enables:
● Automated data collection
Netdata auto-discovers and collects all data sources on the nodes it runs on.
● Automated visualization
Netdata dashboards are generated by an algorithm. They are not pre-configured or hard-coded. Each Netdata dashboard is unique and is driven by the available metrics.
● Automated alerts
Alerts are pre-configured templates that are automatically attached to their relevant components (disks, network interfaces, systems, databases, processes, web servers, etc.).
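A minimal sketch of a sample carrying NIDL metadata; the field names and types are illustrative, not Netdata's internal representation:

```python
# Every sample is fully identified: which node collected it (N), which
# component it belongs to (I), which metric within that component (D),
# plus arbitrary key/value metadata (L).
from dataclasses import dataclass, field

@dataclass
class Sample:
    node: str                                   # N: the collecting server
    instance: str                               # I: disk, container, app, ...
    dimension: str                              # D: the metric itself
    labels: dict = field(default_factory=dict)  # L: key/value metadata
    timestamp: int = 0
    value: float = 0.0

s = Sample(node="web-01", instance="disk:/dev/sda", dimension="reads",
           labels={"device_type": "ssd"}, timestamp=1700000000, value=12.5)
# With every sample identified this way, dashboards can be generated by an
# algorithm, and alert templates can attach themselves to matching instances.
```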
Mission accomplished!
Designed to be installed mid-crisis!
● Just install it!
One moving part: Netdata. Batteries included!
(i.e., data collection plugins and all needed modules ship with Netdata).
● Battle-tested out-of-the-box alerts!
Netdata's stock alerts detect common misconfigurations, errors, and issues.
● Troubleshoot in seconds!
Data collection does not require configuration, unless the monitored data are password protected (Netdata needs the password).
Data collection plugins provide metrics with the NIDL framework embedded into them.
Netdata
Challenge 2:
Get rid of the query language for slicing and dicing data
Slice and dice from the UI
Netdata collects a vast number of metrics you will probably see for the first time.
● Since users haven't configured the metrics themselves, can we provide a UI that explains what users see?
● How will users be able to slice and dice the data on any chart, in the way that makes sense to them?
A Netdata Chart
Netdata Cloud Live Demo URL:
Netdata Parent URL:
Info Button: Help about this chart
The info button includes links to relevant documentation and/or a helpful message about the metrics on each chart.
A Netdata Chart - controls
Each chart provides:
● An anomaly rate ribbon
● NIDL controls, to review data sources and slice/filter them (NIDL = Nodes, Instances, Dimensions, Labels)
● Aggregation across time
● Aggregation across metrics
● An info ribbon
● Controls to dice the data
A Netdata Chart - anomaly rate per node
Clicking on “Nodes” reveals, for each node contributing to the chart:
● The instances it contributes
● The unique time-series it contributes
● The visible volume it contributes
● The anomaly rate it contributes
● The minimum, average and maximum values across all metrics it contributes
● A filter to select the nodes contributing data to the chart
Similar analysis is available per instance (“application” in this chart), dimension, and label.
Dicing any chart, without queries
Result: the chart grouped by dimension, device_type.
Info Ribbon: Missing data collections
A missed data collection is a gap, because something is wrong!
Netdata does not smooth out the data.
Mission accomplished!
All this additional information is available on every query, every chart, every metric (an illustrative response shape follows below).
The Netdata query engine does all the calculations for all drop-down menus and ribbons in one go, and returns everything in a single query response.
All queries include all the information needed:
- Per Node
- Per Instance (disk, container, database, etc.)
- Per Dimension
- Per Label Key
- Per Label Value
Providing:
- Availability of the samples (gaps) over time
- Minimum, average and maximum values
- Anomaly rate for the visible timeframe
- Volume contributed to the chart
- Number of Nodes, Instances, Dimensions, Label Keys and Label Values matched
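As a rough illustration, such a single-response payload might look like the following; the structure and field names are hypothetical, not Netdata's actual API:

```python
# One query, one response: chart points plus every per-node/instance/
# dimension/label summary the UI needs for its menus and ribbons.
response = {
    "view": {"min": 0.1, "avg": 3.2, "max": 9.8, "anomaly_rate": 1.4},
    "nodes": [
        {"name": "web-01", "instances": 4, "time_series": 12,
         "volume_pct": 61.0, "anomaly_rate": 2.1, "gaps": 0},
        {"name": "web-02", "instances": 3, "time_series": 9,
         "volume_pct": 39.0, "anomaly_rate": 0.4, "gaps": 2},
    ],
    "dimensions": [{"name": "reads", "min": 0.1, "avg": 1.9, "max": 6.2}],
    "labels": {"device_type": ["ssd", "hdd"]},       # keys and values matched
    "data": [[1700000000, 3.1], [1700000001, 3.3]],  # the chart points
}
# The UI never issues follow-up queries to fill drop-down menus or ribbons.
```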
Netdata
Challenge 3:
Make machine learning and anomaly detection useful for observability
AI for observability is tricky
Google: All of Our ML Ideas Are Bad (and We Should Feel Bad)
Todd Underwood, Google - Wednesday, 2 October 2019
“The vast majority of proposed production engineering uses of Machine Learning (ML) will never work. They are structurally unsuited to their intended purposes. There are many key problem domains where SREs want to apply ML, but most of them do not have the right characteristics to be feasible in the way that we hope. After addressing the most common proposed uses of ML for production engineering and explaining why they won't work, several options will be considered, including approaches to evaluating proposed applications of ML for feasibility. ML cannot solve most of the problems most people want it to, but it can solve some problems. Probably.”
URL:
So what can ML do for us?
Using ML, we can have a simple and effective way to learn the behavior of our servers.
ML is probably the simplest way to model the behavior of individual metrics.
So, given enough past values of a metric, ML can tell us whether the value we just collected is an outlier or not.
We call this Anomaly Detection.
It is just a bit: true or false.
Over a period of time, we calculate the Anomaly Rate, i.e., the percentage of samples found to be anomalous.
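As a rough illustration of the idea (not Netdata's actual implementation), here is outlier detection for one metric using a k-means model over short lagged windows; it assumes scikit-learn is available, and the window size, cluster count and threshold are made-up values for this sketch:

```python
# Train on recent history, then flag new samples whose feature vector is
# unusually far from every learned cluster center.
import numpy as np
from sklearn.cluster import KMeans

def make_features(series, lags=6):
    # Each feature vector is the last `lags` values, so the model learns
    # short-term patterns, not just individual values.
    return np.array([series[i - lags:i] for i in range(lags, len(series))])

history = np.sin(np.linspace(0, 20, 600)) + np.random.normal(0, 0.05, 600)
X = make_features(history)
model = KMeans(n_clusters=2, n_init=10).fit(X)

# Threshold: the 99th percentile of training distances to the nearest center.
train_dist = model.transform(X).min(axis=1)
threshold = np.percentile(train_dist, 99)

def anomaly_bit(recent_window) -> bool:
    # True/False: is the newest sample an outlier given its recent context?
    d = model.transform(recent_window.reshape(1, -1)).min()
    return bool(d > threshold)

print(anomaly_bit(history[-6:]))          # normal tail -> likely False
print(anomaly_bit(np.array([5.0] * 6)))   # never-seen level -> likely True
```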
Objectives for ML in Netdata
Netdata offers Anomaly Detection for all metrics, all charts, all dashboards, and it just works, totally unsupervised.
● Unsupervised and reliable
It should work by itself to detect anomalies, reliably.
● Real-time
Anomalies should be detected in real-time, as metrics are collected.
● Lightweight
Training at the edge should not affect production applications.
● Stored in the database
The anomaly bit of each sample should be part of the sample for its lifetime, so that we can query for past anomaly rates.
● Integrated everywhere
Anomaly information should be an integral part of the platform.
Rolling Unsupervised Anomaly Detection
Machine learning needs to forget the past, otherwise anomalies will be “business as usual” forever.
● Netdata trains an ML model per metric, every 3 hours, using the last 6 hours of data of that metric. The models overlap.
● It maintains 18 ML models per metric.
● Every 3 hours, a new model is generated and the oldest is removed. So, Netdata remembers the last 57 hours (2 days, 9 hours).
● All available ML models for a metric need to agree that a collected sample is an outlier for Netdata to consider it an anomaly (see the sketch after this list).
This reduces the noise significantly.
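A minimal sketch of the rolling consensus, assuming each trained model exposes a hypothetical is_outlier() predicate:

```python
# Keep only the 18 most recent models (18 models x 3h spacing covers the
# last ~57 hours of behavior) and flag a sample only when every model agrees.
from collections import deque

models = deque(maxlen=18)   # appending the 19th drops the oldest

def on_training_cycle(new_model):
    # Called every 3 hours with a model trained on the last 6 hours of data.
    models.append(new_model)

def is_anomalous(sample_features) -> bool:
    # Consensus cuts noise: a single disagreeing model vetoes the anomaly.
    return bool(models) and all(m.is_outlier(sample_features) for m in models)
```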
Lightweight
Sliced training, and careful consideration of the metrics that benefit from ML, allow Netdata to be lightweight.
● Many metrics are usually zero.
Like hardware errors, rare exceptions, etc. These are usually covered by alerts that check for non-zero values. So, such metrics do not need to be trained.
● Other metrics are usually constant.
Like the available memory of a server, or the connection pool of a database server. These metrics do not need to be trained either.
● Sliced training for the rest.
For the rest of the metrics, we need to train a model on just 6 hours of their data, every 3 hours.
This usually amounts to less than the training of 1 metric per second.
Netdata needs ~2 CPU cores to train ML models and detect anomalies for 1 million unique metrics/s.
Stored in the database
Netdata stores anomalies together with the samples, so anomaly-based queries are possible.
● The anomaly bit is stored in the DB.
Anomaly information is stored in the database together with each sample collected.
We developed a custom floating point format that includes the anomaly bit (much like IEEE 754 stores the sign of floating point numbers), ensuring that there is no overhead at all (a bit-packing sketch follows below).
● Anomaly rate is calculated on the fly.
The Netdata query engine calculates the anomaly rates for all metrics, on the fly, in one go.
● Aggregated anomaly rate.
The Netdata query engine calculates aggregated anomaly rates when combining multiple metrics in the same query, providing a high-level anomaly rate for each chart.
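To illustrate the packing idea only (Netdata's actual on-disk format differs), here is an anomaly flag tucked into the least significant mantissa bit of a 32-bit float, costing at most one ulp of precision and zero extra bytes:

```python
# Pack an anomaly flag into the lowest mantissa bit of an IEEE 754
# single-precision float: same storage size, no extra bytes per sample.
import struct

def pack(value: float, anomalous: bool) -> int:
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits & ~1) | int(anomalous)   # lowest bit carries the flag

def unpack(bits: int) -> tuple[float, bool]:
    value = struct.unpack("<f", struct.pack("<I", bits & ~1))[0]
    return value, bool(bits & 1)

stored = pack(3.14, True)
print(unpack(stored))   # (~3.14, True) - value distorted by at most one ulp
```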
What can it detect? (1/5)
Point Anomalies or Strange Points: single points that represent very big or very small values, not seen before (in some statistical sense).
Examples:
● A sudden, extreme spike in the number of failed transactions for your database server.
● An unexpected moment of high CPU utilization or a sudden memory spike for your application server.
What can it detect? (2/5)
Contextual Anomalies or Strange Patterns: not strange points on their own, but unexpected sequences of points, given the history of the time-series.
Examples:
● A regular database job, or a backup, that did not run.
● A cap on the number of web requests received.
What can it detect? (3/5)
Collective Anomalies or Strange Multivariate Patterns: neither strange points nor strange patterns, but in a global sense something looks off.
Examples:
● A network issue that introduces a lot of retransmits and lowers the throughput of the web server or the workload on the database server.
What can it detect? (4/5)
Concept Drifts or Strange Trends: a slow and steady drift to a new state.
Examples:
● A memory leak in an application.
● An attack that gradually ramps up to its full load.
● A gradual increase in response latency.
What can it detect? (5/5)
Change Point Detection or Strange Steps: a shift occurred and gradually a new normal is established.
Examples:
● A faulty deployment that does not serve all the workload.
A Netdata dashboard
One fully automated dashboard, with infinite scrolling, presenting and grouping all available metrics.
Quick access to all sections using the index on the right.
Multi-dimensional data on every chart, using the chart controls to slice and dice any dataset.
AI assisting at every step.
A Netdata Dashboard - what is anomalous?
The dashboard provides a time-frame picker, an Anomaly Rate button, and the anomaly rate per section for the selected time-frame.
Anomaly Advisor
The Anomaly Advisor assists in finding the needle in the haystack.
● It uses the Host Anomaly Rate to identify durations of interest.
● The Host Anomaly Rate is the percentage of a host's metrics that were found to be anomalous concurrently.
● So, a 10% host anomaly rate means that 10% of all the metrics the host exposes were anomalous at the same time, showing the spread of an anomaly (a minimal computation follows below).
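A minimal sketch of that computation, given one anomaly bit per metric at a single moment:

```python
# Host anomaly rate: the share of a host's metrics whose anomaly bit is
# set at the same timestamp.
def host_anomaly_rate(anomaly_bits: list[bool]) -> float:
    return 100.0 * sum(anomaly_bits) / len(anomaly_bits)

# 2 of 20 metrics anomalous concurrently -> 10.0 (%)
print(host_anomaly_rate([True, True] + [False] * 18))
```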
Anomaly Advisor - starting point
The starting point is a chart plotting the Host Anomaly Rate (as a percentage) and the number of metrics that are concurrently anomalous.
Anomaly Advisor - triggering the analysis
Highlighting an area on the chart triggers the analysis.
Anomaly Advisor - the analysis
The Anomaly Advisor presents a list of all metrics, sorted by their anomaly rate during the highlighted time-frame (a minimal sketch follows below).
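A minimal sketch of that ranking, assuming per-metric anomaly bits over the highlighted window:

```python
# Rank metrics by their anomaly rate inside a highlighted window of samples.
def rank_metrics(bits_by_metric: dict[str, list[bool]], start: int, end: int):
    rates = {metric: 100.0 * sum(bits[start:end]) / (end - start)
             for metric, bits in bits_by_metric.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

bits = {"disk.io": [True] * 8 + [False] * 2, "cpu.user": [False] * 10}
print(rank_metrics(bits, 0, 10))   # disk.io first, at 80.0%
```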
Mission accomplished!
Unsupervised Anomaly Detection is an advisor!
Netdata turns AI into a consultant that helps you spot what is interesting, what is related, and what needs your attention.
● Unsupervised
There are plenty of settings, but it just works behind the scenes, learning how metrics behave and providing an anomaly score for them.
● It is just another attribute of each of your metrics
The anomaly rate is stored in the metrics database together with every sample collected, making it possible to query the past for anomalies.
● It can detect the spread of an anomaly across systems and applications.
● It can assist in finding the aha! moment while troubleshooting.
Netdata
Challenge 4:
Make logs exploration and analytics easy and affordable.
systemd-journald
systemd-journald is a hidden gem that already lives in our systems! It:
● Is available everywhere!
We use it already, even when we don't realize it.
● Is secure by design!
○ FSS (Forward Secure Sealing) to seal the logs
○ Survives disk failures (uses tmpfs)
○ Its file format is designed for minimal data loss on disk corruption
● Is unique!
○ Supports any number of fields, even per log entry (think huge cardinality)
○ Indexes all fields provided
○ Queries on any combination of fields
○ Is maintenance free - it just works!
● Has amazing ingestion performance!
● Can build log centralization points.
It provides all the tools and processes to centralize all the logs of an infrastructure in a central place.
Netdata systemd-journal Logs Explorer
systemd-journald: is it slow to query?
systemd-journald is not slow when used with Netdata.
● Yes and no.
The query performance issues are simple implementation glitches, easy to fix.
● We submitted patches to systemd.
We analyzed journalctl and found several issues that, once fixed, improve query performance 14x.
We submitted these patches to systemd.
● Netdata systemd-journal Explorer
We managed to bypass all the performance issues systemd-journal has, independently of the version of systemd installed on a system.
Netdata is fast when querying systemd-journal logs on all systems, even with a slow systemd-journal and journalctl.
systemd-journald: does it lack integrations?
The value of a logging system depends on its integrations.
● Yes, it did.
Generally, very few tools are available to push structured logs to systemd-journald (see the sketch below).
● Netdata log2journal
We released log2journal, a powerful command line tool to convert any kind of log into structured systemd-journal entries. Think of it as the equivalent of promtail.
For JSON and logfmt formatted logs, almost zero configuration is needed.
● Netdata systemd-cat-native
We released systemd-cat-native, a tool similar to the standard systemd-cat, which however allows sending a stream of entries formatted in the systemd-journal native format to a local or remote systemd-journald.
URL for log2journal:
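For context, pushing structured entries to the journal from code is already easy where bindings exist; a minimal sketch with the Python systemd bindings (the field names here are arbitrary examples):

```python
# Requires the Python systemd bindings (e.g. the `systemd-python` package).
# Every uppercase keyword becomes an indexed field on the journal entry,
# which is what makes huge per-entry cardinality possible.
from systemd import journal

journal.send("payment failed",
             PRIORITY="3",          # journald priority 3 = error
             SUBSYSTEM="billing",   # arbitrary custom fields...
             CUSTOMER_ID="12345",   # ...all indexed and queryable
             RETRIES="3")
```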
Mission accomplished!
Netdata provides the easiest and most efficient way to access your logs, by utilizing resources and tools you already use today.
Netdata provides all the tools and dashboards to explore and analyze your system and application logs, without requiring a dedicated logs database server.
Despite the storage requirements of systemd-journald, the tool is amazing, especially for developers, since it provides great flexibility and troubleshooting features.
Even if you don't want to push your traefik, haproxy or nginx access logs to it due to its storage requirements, we strongly recommend using it for application error logs and exceptions. Your troubleshooting efforts can become a lot simpler with this environment.
Netdata
Challenge 5:
Observability is more than metrics, logs
and traces. What is missing?
Challenge
To completely understand or effectively troubleshoot an issue, metrics, logs and traces may not be enough. We need more!
What if we need to examine:
● the slow queries on a database,
● the list of network connections an application has,
● the files in a filesystem,
● … and the plethora of non-metric, non-log, non-tracing information available?
Most monitoring systems give up. You have to use the console of your database server, ssh to the server, or (for others :) restart the problematic component or application and hope the issue goes away…
Can a monitoring system help?
Netdata Functions
(diagram: a user accesses a function exposed by a data collection plugin on node B5; the request is routed through Netdata Parents and a Netdata Grandparent spanning Data Center 1, Data Center 2 and Cloud Provider 1)
Example: Network Connections Explorer
Mission accomplished!
Functions are data collection plugin features to query non-metric data of any kind.
● Data collection plugins expose Functions.
Functions have a name and parameters, accept a payload, return a payload, and require certain permissions to be accessed. All of these can be custom for each and every function (a sketch follows below).
● Parents are aware of their children's Functions.
Parents are updated in real-time about changes to Functions, so that all nodes involved in a streaming and replication chain are always up to date about the available functions of the entire infrastructure behind them.
● Dashboards provide the list of Functions.
● The Netdata UI supports widgets for Functions.
We are standardizing a set of UI widgets capable of presenting different kinds of data, depending on the most appropriate way for each to be presented.
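A hypothetical sketch of what exposing and calling such a Function could look like; every name and field here is illustrative, not Netdata's plugin API:

```python
# A Function: a named, permissioned endpoint for non-metric data.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Function:
    name: str                          # e.g. "network-connections"
    params: dict                       # accepted parameters and defaults
    required_permissions: set          # who may call it
    handler: Callable[[dict], dict]    # accepts a payload, returns a payload

REGISTRY: dict = {}

def register(fn: Function):
    # In Netdata, parents learn about this in real time, so the whole
    # streaming chain knows which functions exist behind it.
    REGISTRY[fn.name] = fn

def call(name: str, payload: dict, user_permissions: set) -> dict:
    fn = REGISTRY[name]
    if not fn.required_permissions <= user_permissions:
        raise PermissionError(name)
    return fn.handler(payload)

register(Function(name="network-connections", params={"sort": "bytes"},
                  required_permissions={"view-functions"},
                  handler=lambda payload: {"connections": []}))  # stub
print(call("network-connections", {}, {"view-functions"}))
```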
Netdata
Monetization Strategy
Monetization through SaaS
Netdata Cloud (NC) complements Netdata:
● Horizontal scalability
NC provides unified dashboards and alerts, and dispatches alerts centrally, without the need to centralize all data on one server. Behind the scenes, it queries multiple Netdata agents and aggregates their responses on the fly.
● Role-Based Access Control (RBAC)
NC allows grouping infrastructure and users into “war rooms”, limiting and controlling users' access to the infrastructure.
NC also acts as a Single Sign-On provider for all your Netdata agents, limiting what users can see even when they access Netdata directly.
● Access from anywhere
NC allows accessing your Netdata servers from anywhere, without the need for a VPN.
● Mobile app for notifications
NC enables the use of the Netdata Mobile App (iOS, Android) for receiving alert notifications.
● Persisted customizations and dynamic configuration
NC enables dynamic configuration and stores user settings, custom dashboards, personalized views and related settings and options, per node, user, room, and space.
IMPORTANT
Netdata Cloud does not centralize your data.
Your data are always, and exclusively, on-prem, inside the Netdata agents you install.
Netdata Cloud queries your Netdata agents in real-time to present dashboards and alerts.
Thank You!
Costa Tsaousis
GitHub URL: https://github.com/netdata/netdata

More Related Content

PDF
stackconf 2024 | Netdata: Open Source, Distributed Observability Pipeline – J...
PDF
Network Observability – 5 Best Platforms for Observability
PDF
The Present and Future of Serverless Observability
PDF
The present and future of Serverless observability (Serverless Computing London)
PDF
初探 OpenTelemetry - 蒐集遙測數據的新標準
PPTX
DockerCon SF 2019 - Observability Workshop
PDF
How to build observability into a serverless application
PDF
How to build observability into a serverless application
stackconf 2024 | Netdata: Open Source, Distributed Observability Pipeline – J...
Network Observability – 5 Best Platforms for Observability
The Present and Future of Serverless Observability
The present and future of Serverless observability (Serverless Computing London)
初探 OpenTelemetry - 蒐集遙測數據的新標準
DockerCon SF 2019 - Observability Workshop
How to build observability into a serverless application
How to build observability into a serverless application

Similar to OSMC 2024 | Netdata: Open Source, Distributed Observability Pipeline – Journey and Challenge.pdf (20)

PDF
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
PDF
The present and future of serverless observability
PPTX
PPTX
What is Platform Observability? An Overview
PPTX
ADDO Open Source Observability Tools
PPTX
Prometheus - Open Source Forum Japan
PPTX
Evolution of Monitoring and Prometheus (Dublin 2018)
PDF
How to build observability into Serverless (BuildStuff 2018)
PDF
How to build observability into a serverless application
PDF
stackconf 2025 | Evolving Shift Left: Integrating Observability into Modern S...
PDF
Yan Cui - How to build observability into a serverless application - Codemoti...
DOCX
Observability A Critical Practice to Enable Digital Transformation
PDF
The present and future of serverless observability (QCon London)
PDF
The present and future of Serverless observability
PDF
The present and future of Serverless observability
PPTX
Observability for Application Developers (1)-1.pptx
PDF
SRE Certification and SRE Courses Online in India – Visualpath.pdf
PDF
Final observability starts_with_data
PDF
Cloud Observability in Action MEAP V06 Michael Mh9 Hausenblas
PPTX
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
The present and future of serverless observability
What is Platform Observability? An Overview
ADDO Open Source Observability Tools
Prometheus - Open Source Forum Japan
Evolution of Monitoring and Prometheus (Dublin 2018)
How to build observability into Serverless (BuildStuff 2018)
How to build observability into a serverless application
stackconf 2025 | Evolving Shift Left: Integrating Observability into Modern S...
Yan Cui - How to build observability into a serverless application - Codemoti...
Observability A Critical Practice to Enable Digital Transformation
The present and future of serverless observability (QCon London)
The present and future of Serverless observability
The present and future of Serverless observability
Observability for Application Developers (1)-1.pptx
SRE Certification and SRE Courses Online in India – Visualpath.pdf
Final observability starts_with_data
Cloud Observability in Action MEAP V06 Michael Mh9 Hausenblas
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Ad

Recently uploaded (20)

PPTX
Sustainable Forest Management ..SFM.pptx
PPTX
water for all cao bang - a charity project
PDF
Presentation1 [Autosaved].pdf diagnosiss
PPTX
_ISO_Presentation_ISO 9001 and 45001.pptx
PDF
Nykaa-Strategy-Case-Fixing-Retention-UX-and-D2C-Engagement (1).pdf
PPTX
Human Mind & its character Characteristics
PPTX
fundraisepro pitch deck elegant and modern
DOCX
"Project Management: Ultimate Guide to Tools, Techniques, and Strategies (2025)"
PDF
Swiggy’s Playbook: UX, Logistics & Monetization
PPTX
Tour Presentation Educational Activity.pptx
PPTX
lesson6-211001025531lesson plan ppt.pptx
PPTX
nose tajweed for the arabic alphabets for the responsive
PPTX
Relationship Management Presentation In Banking.pptx
PPTX
Project and change Managment: short video sequences for IBA
PDF
oil_refinery_presentation_v1 sllfmfls.pdf
PPTX
INTERNATIONAL LABOUR ORAGNISATION PPT ON SOCIAL SCIENCE
PPTX
2025-08-10 Joseph 02 (shared slides).pptx
PPT
First Aid Training Presentation Slides.ppt
PPTX
Tablets And Capsule Preformulation Of Paracetamol
PPTX
An Unlikely Response 08 10 2025.pptx
Sustainable Forest Management ..SFM.pptx
water for all cao bang - a charity project
Presentation1 [Autosaved].pdf diagnosiss
_ISO_Presentation_ISO 9001 and 45001.pptx
Nykaa-Strategy-Case-Fixing-Retention-UX-and-D2C-Engagement (1).pdf
Human Mind & its character Characteristics
fundraisepro pitch deck elegant and modern
"Project Management: Ultimate Guide to Tools, Techniques, and Strategies (2025)"
Swiggy’s Playbook: UX, Logistics & Monetization
Tour Presentation Educational Activity.pptx
lesson6-211001025531lesson plan ppt.pptx
nose tajweed for the arabic alphabets for the responsive
Relationship Management Presentation In Banking.pptx
Project and change Managment: short video sequences for IBA
oil_refinery_presentation_v1 sllfmfls.pdf
INTERNATIONAL LABOUR ORAGNISATION PPT ON SOCIAL SCIENCE
2025-08-10 Joseph 02 (shared slides).pptx
First Aid Training Presentation Slides.ppt
Tablets And Capsule Preformulation Of Paracetamol
An Unlikely Response 08 10 2025.pptx
Ad

OSMC 2024 | Netdata: Open Source, Distributed Observability Pipeline – Journey and Challenge.pdf

  • 1. Netdata the open-source observability platform everyone needs Costa Tsaousis
  • 2. Page: 2 About Netdata ● 72k Github Stars! Netdata is one of the most popular observability projects! ● Leading the Observability category in CNCF! In terms of Github stars: Netdata is 1st, then was Elasticsearch with 70k stars, Grafana has 65k, Prometheus has 56k, etc. ● 1.5 million downloads every day! Docker hub is counting 650M pulls so far. Cloudflare reports 51TB of transferred data per month. GitHub URL:
  • 3. Page: 3 What makes Netdata unique? ● Easy Just install it on your servers and you are done! Dashboards and alerts are automatically created for you. ● Real-Time Per second resolution of all data. Just 1-second data collection to visualization latency! ● A.I. everywhere Machine Learning learns the patterns of all your data and detects anomalies without any manual intervention or configuration! ● Cost Efficient Designed to be used out-of-the-box and significantly lower operational costs.
  • 4. Page: 4 Netdata Promises ● Feel the pulse of your infrastructure! High fidelity monitoring, revealing the micro-world of your services and applications. ● 50% to 90% observability cost reduction! Most of our users experience significant observability cost reduction, compared to the solutions they were using before Netdata. ● Half Mean Time to Resolution! Machine learning for all metrics helps correlating and revealing the root cause of issues, faster and easier.
  • 5. Page: 5 Observability Generations 1st generation Checks 2nd generation Metrics 3rd generation Logs 5th generation Distributed Check it is there and works 4rd generation Integrated Sample it periodically Collect and index its logs All in One All in One, Real-Time, High-Fidelity, AI-Powered, Cost-Efficient Nagios, Icinga, Zabbix, Sensu, CheckMk, PRTG, SolarWinds Graphite, OpenTSDB, Prometheus, InfluxDB ELK Splunk Datadog, Dynatrace, New Relic, Instana, Grafana Netdata Blog post for more details:
  • 7. Page: 7 Typical Observability Design Servers, VMs, Applications Central Observability Databases Observability Data Metrics, Logs, Traces Dashboards 📊 📈 Alert Notifications ⚠ 🔥
  • 8. Page: ● Business logic is everywhere. What to collect, what to visualize and how, what to alert, what to index, what to retain and for how long. ● Scalability is challenging As the infra grows, these central DBs need to scale, be highly available and perform at scale. ● A lot of moving parts Database servers, application servers, a lot of integrations and protocols, and a tone of different technologies. ● Expensive Skills, resources, a lot of time - or a lot of money. Low-fidelity and expensive! Observability requires serious resources and skills, and is never enough! 8 The Effects of a Traditional Observability Pipeline
  • 9. Page: 9 The Results… ● Low Fidelity Insights The design of most tools force users to lower granularity and cardinality, resulting in abstract views, that are not useful in revealing the true dynamics of the infrastructure. ● Inefficient They enforce a development lifecycle in setting up and maintaining observability, requiring advanced skills, a lot of discipline and huge amounts of time and effort. ● Limited (or no) AI and Machine Learning Most observability solutions are pretty dump. And even when they say they are smarter, they usually use tricks, not true machine learning. ● Expensive Observability is frequently more expensive than the cost of the infrastructure that is being monitored.
  • 10. Page: 10 The Current State of Observability ● Too little observability Traditional check-based systems, like Nagios, Icinga, Zabbix, Sensu, PRTG, CheckMk, SolarWinds, etc. have evolved around the idea of checks or sensors. Each check has a status, probably annotated with some text and a few metrics. Checks are executed every minute. Equivalent to “traffic lights” monitoring. Reliable and robust, but not enough for today's needs, suffering from extensive “blind spots”. ● Too complex observability DIY platforms, like the Grafana ecosystem, have evolved around a much better design (metrics, logs, traces) and although they are more powerful and highly customizable, they introduce way too many moving parts, require significant skills and expertise, have a steep learning curve, and they quickly become overcomplicated to maintain and scale. ● Too expensive observability Commercial vendors, like Datadog, Dynatrace, NewRelic, Instana, each with its own strengths and weaknesses, although sometimes they are easier and better (to a different extend each), they are unrealistically expensive.
  • 11. Page: 11 The Design of Observability keeps Fidelity low ● What affects fidelity? Granularity (the resolution of the data) and cardinality (the number of entities monitored), control the amount of data a monitoring system must ingest, process, store and retain. Low granularity = blurry data, not that detailed Low cardinality = blind spots, not everything is monitored Low granularity + low cardinality = abstract view of the infrastructure, lacking detail and coverage ● Why not having high fidelity? Centralization is the key reason for keeping granularity and cardinality low. Example: a system monitoring 3000 metrics every second (3k samples/s), has to process, store and query 450x more data compared to a system monitoring 100 metrics every 15 seconds (<7 samples/s). Centralization makes fidelity and cost proportional to each other; increasing fidelity results in higher costs, and reducing costs leads to a decrease in fidelity.
  • 13. Page: 13 Decentralized Design For High Fidelity ● Keep data at the edge By keeping the data at the edge: ○ Compute & storage resources are already available and spare ○ No need for network resources ○ The work to be done is small and it can be optimized, so that monitoring is a “polite citizen” to production applications ● Multiple independent centralization points Mini centralization points may exist, as required for operational needs: ○ Ephemeral nodes, that may vanish at any point in time ○ High availability of observability data ○ Offloading “sensitive” production systems from observability work ● Unify and integrate everything at query time To provide unified infrastructure-wide views, query edge systems (or the mini centralization points), aggregate their responses and provide high-resolution, real-time dashboards and alerts.
  • 14. Page: 14 Netdata’s Decentralized Design: Smart Agent Servers, VMs, Applications Instead of centralizing the data, Netdata distributes the code! All servers provide dashboards and alerts, all data remain at the edge!
  • 15. Page: 15 Netdata’s Decentralized Design: Netdata Parents Data Center A Netdata Parent A Cloud Provider B Cloud Provider C Multiple independent centralization points for high availability, each providing, aggregated dashboards and alerts! A Netdata Parent is the open-source Netdata Agent software, configured to accept data from other Netdata Agents. Netdata Parent B Netdata Parent C
  • 16. Page: 16 Netdata’s Decentralized Design: Netdata Cloud Servers, VMs, Applications Netdata Cloud A thin layer, a proxy, that distributes queries to Netdata agents and merges their responses on the fly, to provide aggregated views. Netdata Cloud does not store any observability data. It is available as SaaS or On-Prem. Netdata Cloud
  • 17. Page: 17 Hybrid & Multi-Cloud Support Data Center A Netdata Parent A Cloud Provider B Cloud Provider C Netdata Cloud communicates only with Netdata Parents to provide infrastructure level views. Netdata Parent B Netdata Parent C Netdata Cloud
  • 18. Page: 18 The pipeline inside every Netdata Agent Metrics Logs
  • 19. Page: 19 Common Concerns about Decentralized Designs ● The agent will be heavy No! The Netdata agent, that is a complete monitoring pipeline in a box, processing may thousands of metrics per second (vs. others that process just a few hundreds every minute), is one of the lightest observability agents available. ● Queries will influence production systems No! Each agent serves only its own data. Querying such a small dataset is lightweight and does not influence operations. For very sensitive or weak production systems, a mini-centralization point next to these systems will isolate them from queries (and also offload them from ingestion, processing, storage and retention). ● Queries will be slower No! They are actually faster! Distributing tiny queries in parallel to multiple systems, provides an aggregate compute power that is many times higher to what any single system can provide. ● Will require more bandwidth No! Querying is selective, most of the observability data are never queried unless required for exploration or troubleshooting. And even then, just a small portion of the data is examined. So, the overall bandwidth used is a tiny fraction compared to centralized systems.
  • 20. Page: 20 Edge Resources Required (per node) Resource Dynatrace Datadog Instana Grafana Netdata Granularity 1-minute 15-seconds 1-second 1-minute 1-second Technology Coverage Average High Low Average Excellent CPU Usage (100% = 1 core) 12% 14% 6.7% 3.3% 3.6% Memory Usage 1400 MB 972 MB 588 MB 414 MB 181 MB Disk Space 2 GB 1.2 GB 0.2 GB - 3 GB Disk Read Rate - 0.2 KB/s - - 0.3 KB/s Disk Write Rate 38.6 KB/s 8.3 KB/s - 1.6 KB/s 4.8 KB/s Egress Internet Bandwidth 11.4 GB/mo 11.1 GB/mo 5.4 GB/mo 4.8 GB/mo 0.01 GB/mo The Netdata agent is a full monitoring in a box, and still is one of the most lightweight agents among commercial observability offerings! full comparison URL:
  • 21. Page: ● -35% CPU Utilization Netdata: 1.8 CPU cores per million of metrics/s Prometheus: 2.9 CPU cores per million of metrics/s ● -49% Peak Memory Consumption Netdata: 49 GiB Prometheus: 89 GiB ● -12% Bandwidth Netdata: 227 Mbps Prometheus: 257 Mbps ● -98% Disk I/O Netdata: 3 MiB/s (no reads, 3 MiB/s writes) Prometheus: 129 MiB/s (73.6 MiB/s reads, 55.2 MiB/s writes) ● -75% Storage Footprint On 3 TiB of storage: Netdata: 10 days per-sec, 43 days per-min, 467 days per-hour Prometheus: 7 days per-sec Stress tested a Netdata parent and a Prometheus with 500 servers, 40k containers, at 2.7 million metrics/s 21 Netdata as a time-series db vs Prometheus :full comparison URL
  • 22. Page: In December 2023, University of Amsterdam published a study related to the impact of monitoring tools for Docker based systems, aiming to answer 2 questions: ● The impact of monitoring tools on the energy efficiency of Docker-based systems ● The impact of monitoring tools on the performance of Docker-based systems They found that: - Netdata is the most efficient tool, Requiring significantly less system resources than the others. - Netdata is excellent in terms of performance impact, Allowing containers and applications to run without any measurable impact due to observability. Outperforming other monitoring solutions in edge resources efficiency! 22 Netdata is the most lightweight platform! :full comparison URL
  • 23. Netdata Challenge 1: Automated data collection, visualization and alerting!
  • 24. Page: ● Similar physical or virtual hardware We all use a finite set of physical and virtual hardware. This hardware may be different in terms of performance and capacity, but the technologies involved are standardized. ● Similar operating systems We all use flavors of a small set of operating systems, exposing a finite set of metrics covering the monitoring of all system components and operating system layers. ● Packaged applications Most of our infrastructure is based on packaged applications, like web servers, database servers, message brokers, caching servers, etc. ● Standard libraries Even for our custom applications, we usually rely on packaged libraries that expose telemetry in a finite and predictable way. Since we have so much in common, why it takes so long to set up a monitoring solution? 24 We have a lot in common
  • 25. Page: Netdata attaches NIDL metadata to time-series data, allowing the identification of the infrastructure components (instances) monitored. This framework enables: ● Automated data collection Netdata auto-discovers and collects all data sources on the nodes it runs. ● Automated visualization Netdata dashboards are generated by an algorithm. They are not pre-configured or hard-coded. Each Netdata dashboard is unique and is driven by the available metrics. ● Automated alerts Alerts are pre-configured templates that are automatically attached to their relative components (disks, network interfaces, systems, databases, processes, web servers, etc). NIDL stands for: - Nodes - Instances - Dimensions - Labels The name comes from the slicing controls on all Netdata charts. 25 NIDL, the model for rapid deployment
  • 26. Page: ● Just install it! One moving part: Netdata. Batteries included! (i.e. data collection plugins and all needed modules are shipped with Netdata). ● Battle-tested out-of-the-box alerts! Netdata stock alerts detect common misconfigurations, errors, and issues. ● Troubleshoot in seconds! Data collection does not require configuration, unless the monitored data are password protected (Netdata needs the password). Data collection plugins provide metrics with the NIDL framework embedded into them. Designed to be installed mid-crisis! 26 Mission accomplished!
  • 27. Netdata Challenge 2: get rid of the query language for slicing and dicing data
  • 28. Page: ● Since users haven’t configured the metrics themselves, can we provide a UI that can explain what users see? ● How users will be able to slice and dice the data on any chart, the way it makes sense for them? Netdata collects a vast number of metrics you will probably see for the first time 28 Slice and dice from the UI
  • 29. Page: 29 A Netdata Chart Netdata Cloud Live Demo URL: :Netdata Parent URL
  • 30. Page: 30 Info Button: Help about this chart Info button includes links to relevant documentation and/or some helpful message about the metrics on each chart.
  • 31. Page: 31 A Netdata Chart - controls Anomaly rate ribbon NIDL Controls - review data sources and slice/filter them (NIDL = Nodes, Instances, Dimensions, Labels) Aggregation across time Aggregation across metrics Info ribbon Dice the data
  • 32. Page: 32 A Netdata Chart - anomaly rate per node Instances per Node contributing to this chart Unique time-series per Node contributing to this chart The visible volume each Node is contributing to this chart The anomaly rate each Node contributes to this chart Clicked on Nodes The minimum, average and maximum values across all metrics this Node contributes Similar analysis is available per Instance (“application” in this chart), dimensions, and labels. Filter Nodes contributing data to this chart
  • 33. Page: 33 Dicing any chart, without queries Result: dimension,device_type
  • 34. Page: 34 Info Ribbon: Missing data collections A missed data collection is a gap, because something is wrong! Netdata does not smooth out the data.
  • 35. Page: The Netdata query engine, does all the calculations, for all drop down menus and ribbons in one go and returns everything in a single query response. All queries, include all information needed: - Per Node - Per Instance (disk, container, database, etc) - Per Dimension - Per Label Key - Per Label Value Providing: - Availability of the samples (gaps), over time - Min, Average and Maximum values - Anomaly Rate for the visible timeframe - Volume contributing to the chart - Number of Nodes, Instances, Dimensions, Label Keys, Label Values matched All this additional information is available on every query, every chart, every metric! 35 Mission accomplished!
  • 36. Netdata Challenge 3: make machine learning and anomaly detection useful for observability
  • 37. Page: Wednesday, 2 October, 2019 Todd Underwood, Google The vast majority of proposed production engineering uses of Machine Learning (ML) will never work. They are structurally unsuited to their intended purposes. There are many key problem domains where SREs want to apply ML but most of them do not have the right characteristics to be feasible in the way that we hope. After addressing the most common proposed uses of ML for production engineering and explaining why they won't work, several options will be considered, including approaches to evaluating proposed applications of ML for feasibility. ML cannot solve most of the problems most people want it to, but it can solve some problems. Probably. Google: All of Our ML Ideas Are Bad (and We Should Feel Bad) 37 AI for observability is tricky :URL
  • 38. Page: ML is probably the simplest way to model the behavior of individual metrics. So, given enough past values of a metric, ML can tell us if the value we just collected is an outlier or not. We call this Anomaly Detection. It is just a bit. True or False. Over a period of time, we calculate the Anomaly Rate, ie the % of samples that found to be anomalous. Using ML we can have a simple and effective way to learn the behavior of our servers. 38 So what can ML do for us?
  • 39. Page: ● Unsupervised and Reliable It should work by itself to detect anomalies, reliably. ● Real-time Anomalies should be detected in real-time, as metrics are collected. ● Lightweight Training at the edge should not affect production applications. ● Stored in the database The anomaly bit of each sample, should be part of the sample for its lifetime, so that we can query for past anomaly rates. ● Integrated everywhere Anomaly information should be an integral part of the platform. Netdata offers Anomaly Detection for all metrics, all charts, all dashboards, and it just works, totally unsupervised. 39 Objectives for ML in Netdata
  • 40. Page: ● Netdata trains a ML model per metric, every 3 hours, using the last 6 hours of data of each metric. The models are overlapping. ● It maintains 18 ML models per metric. ● Every 3 hours, a new model is generated and the oldest is removed, So, Netdata remembers the last 57 hours (2days, 9 hours). ● All available ML models for a metric need to agree that a collected sample is an outlier, for Netdata to consider it an Anomaly. This reduces the noise significantly. Machine Learning needs to forget the past, otherwise anomalies will be “business as usual” forever. 40 Rolling Unsupervised Anomaly Detection
  • 41. Page: ● Many metrics are usually zero. Like hardware errors, rare exceptions, etc. These are usually covered by alerts that check for non-zero values. So, all such metrics do not need to be trained. ● Other metrics are usually constant. Like the available memory of a server, or the pool of connections of a database server. These metrics do not need to be trained either. ● Sliced training for the rest. For the rest of the metrics, we need to train a model for just 6 hours of their data, trained every 3 hours. This is usually less than training 1 metric per second. Netdata needs ~2 CPU cores, for training ML models and detecting anomalies, for 1 million unique metrics/s. Sliced training and careful consideration of the metrics that benefit from ML, allows Netdata to be lightweight. 41 Lightweight
  • 42. Page: ● The anomaly bit is stored in the db. Anomaly information is stored in the database together with each sample collected. We developed a custom floating point number, which includes the anomaly bit (much like IEEE 745 stores the sign of floating point numbers), ensuring that there is no overhead at all. ● Anomaly rate is calculated on the fly. The Netdata query engine calculates the anomaly rates for all metrics, on the fly, in one go. ● Aggregated anomaly rate. The Netdata query engine calculates aggregated anomaly rates when combining multiple metrics in the same query, providing a high level anomaly rate for each chart. Netdata stores anomalies together with the samples, so anomaly based queries are possible. 42 Stored in the database
  • 43. Page: Point Anomalies or Strange Points: Single points that represent very big or very small values, not seen before (in some statistical sense). 43 What it can detect? (1/5) Examples: ● A sudden, extreme spike in the number of failed transactions for your database server. ● An unexpected, moment of high CPU utilization or sudden memory spike for your application server.
  • 44. Page: Contextual Anomalies or Strange Patterns: Not strange points in their own, but unexpected sequences of points, given the history of the time-series. 44 What it can detect? (2/5) Examples: ● A regular database job, or a backup that did not run. ● A cap on the number of web requests received.
  • 45. Page: Collective Anomalies or Strange Multivariate Patterns: Neither strange points nor strange patterns, but in global sense something looks off. 45 What it can detect? (3/5) Examples: ● A network issue that introduces a lot of retransmits, lowers the throughput of the web server or the workload on the database server.
  • 46. Page: Concept Drifts or Strange Trends: A slow and steady drift to a new state. 46 What it can detect? (4/5) Examples: ● A memory leak in an application. ● An attack that is gradually increased to its full load. ● A gradual increase in response time latency.
  • 47. Page: Change Point Detection or Strange Step: A shift occurred and gradually a new normal is established. 47 What it can detect? (5/5) Examples: ● A faulty deployment that does not serve all the workload.
  • 48. Page: 48 A Netdata dashboard One fully automated dashboard, with infinite scrolling, presenting and grouping all metrics available. Quick access to all sections using the index on the right. Multi-dimensional data on every chart, using chart controls to slice and dice any dataset. AI assisting on every step.
  • 49. Page: 49 A Netdata Dashboard - what is anomalous? Time-frame picker Anomaly rate per section for the time-frame Anomaly Rate button
  • 50. Page: ● Uses Host Anomaly Rate to identify durations of interest. ● Host Anomaly Rate is the percentage of the metrics of a host, that were found to be anomalous concurrently. ● So, 10% host anomaly rate, means that 10% of all the metrics the host exposes, were anomalous at the same time, showing the spread of an anomaly. Anomaly advisor assists in finding the needle in the haystack. 50 Anomaly Advisor
• 51. Page: Anomaly Advisor - starting point
(Screenshot callouts: the host anomaly rate percentage, and the number of metrics concurrently anomalous.)
• 52. Page: Anomaly Advisor - triggering the analysis
Highlighting an area on the chart triggers the analysis.
• 53. Page: Anomaly Advisor - the analysis
The Anomaly Advisor presents a list of all metrics, sorted by their anomaly rate during the highlighted time-frame.
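A minimal sketch of that ordering, with illustrative types: each metric's anomaly rate over the window is the share of its samples whose anomaly bit is set, and metrics are sorted descending:

```c
// Anomaly Advisor ranking sketch: score every metric over the
// highlighted window, then sort by anomaly rate, highest first.
#include <stdlib.h>
#include <stddef.h>
#include <stdbool.h>

typedef struct {
    const char *name;
    const bool *bits;    // anomaly bit per sample in the window
    size_t n_samples;
    double anomaly_rate; // filled in by score(), 0..100
} metric_window_t;

static void score(metric_window_t *m) {
    size_t hits = 0;
    for (size_t i = 0; i < m->n_samples; i++) hits += m->bits[i];
    m->anomaly_rate = m->n_samples ? 100.0 * hits / m->n_samples : 0.0;
}

static int by_rate_desc(const void *a, const void *b) {
    double ra = ((const metric_window_t *)a)->anomaly_rate;
    double rb = ((const metric_window_t *)b)->anomaly_rate;
    return (ra < rb) - (ra > rb);   // descending order
}

static void rank_metrics(metric_window_t *metrics, size_t n) {
    for (size_t i = 0; i < n; i++) score(&metrics[i]);
    qsort(metrics, n, sizeof(*metrics), by_rate_desc);
}
```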
• 54. Page: Mission accomplished!
Unsupervised Anomaly Detection is an advisor!
Netdata turns AI into a consultant that helps you spot what is interesting, what is related, and what needs your attention.
● Unsupervised
There are plenty of settings, but it just works behind the scenes, learning how metrics behave and providing an anomaly score for them.
● It is just another attribute of each of your metrics.
The anomaly rate is stored in the metrics database together with every sample collected, making it possible to query the past for anomalies.
● Can detect the spread of an anomaly across systems and applications.
● Can assist in finding the aha! moment while troubleshooting.
• 55. Netdata Challenge 4: Make logs exploration and analytics easy and affordable.
• 56. Page: systemd-journald
systemd-journald is a hidden gem that already lives in our systems!
● It is available everywhere!
We use it already, even when we don't realize it.
● It is secure by design!
○ FSS (Forward Secure Sealing) to seal the logs.
○ Survives disk failures (can use tmpfs).
○ Its file format is designed for minimal data loss on disk corruption.
● It is unique!
○ Supports any number of fields, even per log entry (think huge cardinality).
○ Indexes all fields provided.
○ Queries on any combination of fields (see the query sketch below).
○ Maintenance free - it just works!
● Amazing ingestion performance!
● Can build logs centralization points.
It provides all the tools and processes to centralize all the logs of an infrastructure to a central place.
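To show how queryable journald already is, a minimal sketch using the sd-journal C API (part of libsystemd; build with -lsystemd), filtering for error-priority entries:

```c
// Query the local journal for error-level entries and print their
// MESSAGE field. sd_journal_get_data returns "FIELD=value" pairs.
#include <systemd/sd-journal.h>
#include <stdio.h>

int main(void) {
    sd_journal *j;
    if (sd_journal_open(&j, SD_JOURNAL_LOCAL_ONLY) < 0)
        return 1;

    // Match on an indexed field: PRIORITY=3 is syslog "err" severity.
    sd_journal_add_match(j, "PRIORITY=3", 0);

    SD_JOURNAL_FOREACH(j) {                  // iterate matching entries
        const void *data;
        size_t length;
        if (sd_journal_get_data(j, "MESSAGE", &data, &length) >= 0)
            printf("%.*s\n", (int)length, (const char *)data);
    }

    sd_journal_close(j);
    return 0;
}
```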
• 58. Page: systemd-journald: is it slow to query?
systemd-journal is not slow when used with Netdata.
● Yes and no.
The query performance issues are simple implementation glitches, easy to fix.
● We submitted patches to systemd.
We analyzed journalctl and found several issues that, once fixed, improve query performance 14x. We submitted these patches to systemd.
● Netdata systemd-journal Explorer
We managed to bypass all the performance issues systemd-journal has, independently of the version of systemd installed on a system. Netdata is fast when querying systemd-journal logs on all systems, even with a slow systemd-journal and journalctl.
• 59. Page: systemd-journald: it lacks integrations
The value of a logging system depends on its integrations.
● Yes, it did.
Generally, very few tools are available to push structured logs to systemd-journald.
● Netdata log2journal
We released log2journal, a powerful command line tool that converts any kind of log into structured systemd-journal entries. Think of it as the equivalent of promtail. For json and logfmt formatted logs, almost zero configuration is needed.
● Netdata systemd-cat-native
We released systemd-cat-native, a tool similar to the standard systemd-cat, which however allows sending a stream of entries, formatted in the systemd-journal native format, to a local or remote systemd-journald.
URL for log2journal:
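As an illustration, the journal export style these tools deal in is a stream of FIELD=VALUE lines, with a blank line terminating each entry. A minimal sketch that emits such a stream (values containing newlines need a binary encoding, omitted here; MESSAGE, PRIORITY and SYSLOG_IDENTIFIER are standard journal fields, the sample values are made up):

```c
// Emit journal export-format entries on stdout, the kind of stream a
// tool like systemd-cat-native can forward to a local or remote journald.
#include <stdio.h>

static void emit_entry(const char *message, const char *ident, int priority) {
    printf("MESSAGE=%s\n", message);
    printf("PRIORITY=%d\n", priority);      // 0..7, syslog-style severity
    printf("SYSLOG_IDENTIFIER=%s\n", ident);
    printf("\n");                           // blank line = end of entry
}

int main(void) {
    emit_entry("payment failed: card declined", "shop-backend", 3);
    emit_entry("retrying payment", "shop-backend", 5);
    return 0;
}
```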
• 60. Page: Mission accomplished!
Netdata provides the easiest and most efficient way to access your logs, by utilizing resources and tools you already use today.
Netdata provides all the tools and dashboards to explore and analyse your system and application logs, without requiring a dedicated logs database server.
Despite the storage requirements of systemd-journald, the tool is amazing, especially for developers, since it provides great flexibility and troubleshooting features. Even if you don't want to push your traefik, haproxy or nginx access logs to it due to its storage requirements, we strongly recommend using it for application error logs and exceptions. Your troubleshooting efforts can become a lot simpler in this environment.
  • 61. Netdata Challenge 5: Observability is more than metrics, logs and traces. What is missing?
• 62. Page: Challenge
To completely understand, or effectively troubleshoot an issue, metrics, logs and traces may not be enough. What if we need to examine:
● the slow queries on a database,
● the list of network connections an application has,
● the files in a filesystem,
● … and the plethora of non-metric, non-log, non-tracing information available?
Most monitoring systems give up. You have to use the console of your database server, ssh to the server, or simply restart the problematic component or application and hope the issue goes away…
Can a monitoring system help?
To completely understand, or effectively troubleshoot an issue, we need more!
• 63. Page: Netdata Functions
(Diagram: nodes A1-A5, B1-B5 and C1-C5, spread across Data Center 1, Data Center 2 and Cloud Provider 1, stream to Netdata Parents PA, PB and PC, and onwards to an alerting Netdata Grandparent GP. A user accesses a function exposed by a data collection plugin on node B5; the request is routed through the chain of parents to that plugin.)
• 65. Page: Mission accomplished!
Functions are data collection plugin features that query non-metric data of any kind.
● Data collection plugins expose Functions.
Functions have a name, some parameters, accept a payload, return a payload, and require certain permissions to access them. All of these can be custom for each and every function. (A sketch of the concept follows.)
● Parents are aware of their children's Functions.
Parents are updated in real-time about changes to Functions, so that all nodes involved in a streaming and replication chain are always up to date about the available Functions of the entire infrastructure behind them.
● Dashboards provide the list of Functions.
● The Netdata UI supports widgets for Functions.
We are standardizing a set of UI widgets capable of presenting different kinds of data, depending on which is the most appropriate way for them to be presented.
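A hypothetical sketch of the concept; the types, names and JSON shape below are illustrative only, not Netdata's actual plugin protocol:

```c
// A plugin registers a named, permissioned callable that returns a
// non-metric payload on demand, for the UI widgets to render.
#include <stdio.h>

typedef struct {
    const char *name;                 // e.g. "mysql-slow-queries"
    const char *help;                 // shown where Functions are listed
    const char *required_permission;  // who may invoke it
    // Executes the function; returns a JSON payload for the UI.
    const char *(*execute)(const char *params);
} netdata_function_t;

static const char *slow_queries(const char *params) {
    (void)params;                     // e.g. time-frame, limit, sort order
    return "{\"queries\":[{\"sql\":\"SELECT ...\",\"duration_ms\":5400}]}";
}

// The plugin announces its functions; parents learn about them in
// real-time, so any node up the streaming chain can route a user's
// request back to the one plugin that can answer it.
static netdata_function_t functions[] = {
    { "mysql-slow-queries", "List the current slow queries",
      "view-database", slow_queries },
};

int main(void) {
    printf("%s: %s\n", functions[0].name, functions[0].execute(""));
    return 0;
}
```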
• 67. Page: Netdata Cloud (NC) complements Netdata
Monetization through SaaS
● Horizontal Scalability
NC provides unified dashboards and alerts, and dispatches alerts centrally, without the need to centralize all data on one server. Behind the scenes, it queries multiple Netdata Agents and aggregates their responses on the fly (see the sketch below).
● Role Based Access Control (RBAC)
NC allows grouping infrastructure and users into "war rooms", limiting and controlling users' access to the infrastructure. NC also acts as a Single Sign-On provider for all your Netdata Agents, limiting what users can see even when they access Netdata directly.
● Access from anywhere
NC allows accessing your Netdata servers from anywhere, without the need for a VPN.
● Mobile App for Notifications
NC enables the use of the Netdata Mobile App (iOS, Android) for receiving alert notifications.
● Persisted Customizations and Dynamic Configuration
NC enables dynamic configuration and stores user settings, custom dashboards, personalized views and related options, per node, user, room, and space.
IMPORTANT: Netdata Cloud does not centralize your data. Your data is always, and exclusively, on-prem, inside the Netdata Agents you install. Netdata Cloud queries your Netdata Agents in real-time, to present dashboards and alerts.
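A minimal sketch of the fan-out idea behind that horizontal scalability, with illustrative types: each agent is queried for the same time window, and the aligned points are merged on the fly instead of centralizing raw samples:

```c
// Scatter-gather aggregation sketch: merge per-agent responses for the
// same window by combining aligned time slots (sum here; could be
// avg/min/max depending on the query).
#include <stddef.h>

// One agent's response: a value per time slot of the queried window.
typedef struct {
    const double *points;
    size_t n_points;
} agent_response_t;

static void merge_sum(const agent_response_t *responses, size_t n_agents,
                      double *out, size_t n_points) {
    for (size_t t = 0; t < n_points; t++) {
        out[t] = 0.0;
        for (size_t a = 0; a < n_agents; a++)
            if (t < responses[a].n_points)   // tolerate short responses
                out[t] += responses[a].points[t];
    }
}
```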
• 68. Thank You! Costa Tsaousis
GitHub: https://github.com/netdata/netdata