A Deep Dive into 
Nagios Analytics 
Alexis Lê-Quôc (@alq) 
http://datadoghq.com
@alq 
Dev & Ops 
Nagios user since 2008 
Datadog co-founder
A little survey
Top 3 failed checks
Top 3 failed checks:
• That woke me up
• That I responded to last week
• That I responded to 5 weeks ago
• That most of my team responded to at least once
• That impact our business the most?
Using memory to prioritize remediation...
At best, finding local optima
At worst, Brownian motion
Analytics
Performance Metrics Nagios Traffic Other Sources 
In the “Cloud” 
Real-time graphs + analytics
Aggregation
Real-time Analytics 
(Nagios et al.)
Real-time Analytics
Nagios Traffic 
In the “Cloud” 
Real-time graphs + analytics
Nagios: a “chatty” source among the 40+ that Datadog supports
One example
Almost 13,000 Nagios “events” over the past week
Constant stream
86 notifications!
Pattern
More data? 
More questions.
A dialog with data
Not a scientific study
[Figure: Population. Nagios samples by host count, colored by quartile (1-4).]
Population quartiles (host count): 25% = 20, 50% = 93, 75% = 322, 100% = 904
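As a rough sketch of how such a population split could be reproduced, the snippet below assigns a Nagios installation to a quartile from its host count. The cut points 20, 93, 322 and 904 are the quartile values from the slide; treating them as inclusive upper bounds is an assumption, and the function name `quartile_for` and the sample host counts are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of quartiles 1-4 (host count), taken from the slide.
# Assumption: each bound is inclusive.
QUARTILE_UPPER_BOUNDS = [20, 93, 322, 904]

def quartile_for(host_count):
    """Return 1-4 for the quartile whose upper bound covers host_count."""
    if host_count < 0 or host_count > QUARTILE_UPPER_BOUNDS[-1]:
        raise ValueError("host count outside the sampled population")
    return bisect_left(QUARTILE_UPPER_BOUNDS, host_count) + 1

if __name__ == "__main__":
    for hosts in (5, 20, 21, 500):  # hypothetical installations
        print(hosts, quartile_for(hosts))
```

Splitting by quartile first makes it possible to ask whether small and large installations behave differently before comparing anything else.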
Does size matter?
[Figure: Weekly count per host split by quartile. Nagios alerts per host, count per week, one panel per quartile (1-4).]
Outliers: sick hosts, silenced checks
Notifications
1-3% of alerts notify 
Little difference per quartile
Does time of day 
matter?
[Figure: Alerts per hour by hour of day (UTC), one panel per quartile (1-4).]
Mean about the same across quartiles
Time-based deviation?
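A minimal sketch of the time-of-day question, assuming alert times are available as Unix epoch seconds: it buckets alerts into the 24 UTC hours so that hourly counts can be compared. The function name `alerts_per_hour_of_day` and the sample timestamps are hypothetical, not taken from the talk.

```python
from collections import Counter
from datetime import datetime, timezone

def alerts_per_hour_of_day(epoch_seconds):
    """Count alerts in each UTC hour-of-day bucket (0-23)."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in epoch_seconds
    )
    # Return a dense list so that empty hours show up as zero.
    return [counts.get(hour, 0) for hour in range(24)]

# Hypothetical alert times: two in hour 0, one in hour 1, one in hour 2.
timestamps = [0, 1800, 3600, 86400 + 7200]
histogram = alerts_per_hour_of_day(timestamps)
print(histogram[:3])
```

Keeping everything in UTC, as the slide's axis does, avoids mixing time zones when several installations are compared.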
Does the day of week 
matter?
[Figure: Notifying alerts per day. Alerts per hour by day of week (Sun-Sat), one panel per quartile (1-4).]
Not really
Squeaky wheels? 
(checks)
[Figure: Noisiest checks (overall). Alerts per hour for checks ranked by noise, one panel per quartile (1-4).]
Outlier
[Figure: Noisiest checks (outlier). Alerts per hour for checks ranked by noise.]
Outlier in more detail
[Figure: Noisiest checks (without outlier). Alerts per hour for checks ranked by noise, one panel per quartile (1-4).]
Long tail
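One way to look for squeaky wheels is to rank checks by how many alerts they produce and see how much of the total the noisiest one accounts for. The sketch below is illustrative only: the check names, the counts, and the helper names `rank_by_noise` and `share_of_top` are hypothetical.

```python
from collections import Counter

def rank_by_noise(alert_keys):
    """Return (key, count) pairs, noisiest first."""
    return Counter(alert_keys).most_common()

def share_of_top(ranked, n):
    """Fraction of all alerts produced by the n noisiest keys."""
    total = sum(count for _, count in ranked)
    return sum(count for _, count in ranked[:n]) / total if total else 0.0

# Hypothetical week of alerts: one very noisy check and a long tail.
checks = ["load"] * 30 + ["disk"] * 4 + ["http"] * 3 + ["ntp"] * 2 + ["swap"]
ranked = rank_by_noise(checks)
print(ranked[0], round(share_of_top(ranked, 1), 2))
```

The same two helpers work for hosts: pass host names instead of check names.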
Squeaky wheel? 
(hosts)
[Figure: Noisiest hosts (overall). Alerts per hour for hosts ranked by noise, one panel per quartile (1-4).]
Same outlier
[Figure: Noisiest hosts (outlier). Alerts per hour for hosts ranked by noise.]
Similar pattern as checks
[Figure: Noisiest checks (without outlier). Alerts per hour for checks ranked by noise, one panel per quartile (1-4).]
Long tail
Recurring alerts
[Figure: Alert age & frequency of occurrence. Number of days occurring vs. age between earliest and latest occurrence, colored by quartile (1-4). Quadrant labels: happens often / seldom happens, young / old.]
Occur often, for a long time
Tolerated
Happen once in a while
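The two quantities behind the recurring-alerts view can be sketched as follows: the age of an alert is the number of days between its earliest and latest occurrence, and its frequency is the number of distinct days on which it occurred. The thresholds `old_after` and `often_after` are hypothetical cut-offs chosen for illustration; the talk does not state any.

```python
from datetime import date

def age_and_frequency(occurrence_dates):
    """Return (age_in_days, distinct_days) for one alert's occurrence dates."""
    days = set(occurrence_dates)
    age = (max(days) - min(days)).days
    return age, len(days)

def classify(age, distinct_days, old_after=100, often_after=50):
    """Place an alert in one of the four quadrants (thresholds are assumptions)."""
    age_label = "old" if age >= old_after else "young"
    freq_label = "often" if distinct_days >= often_after else "seldom"
    return age_label, freq_label

# Hypothetical alert seen twice on 1 January and once on 1 June 2012.
occurrences = [date(2012, 1, 1), date(2012, 1, 1), date(2012, 6, 1)]
age, distinct_days = age_and_frequency(occurrences)
print(age, distinct_days, classify(age, distinct_days))
```

An alert that lands in the old and often quadrant is a candidate for the "tolerated" group; old and seldom corresponds to "happen once in a while".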
More data? 
More questions.
HOWTO?
Awk, R, ggplot2, Postgres, d3
Find out tomorrow!
Presentation matters
Take-away?
Take-aways 
• Don’t rely on your memory to prioritize 
• Your Nagios logs are a treasure trove 
• Have a dialog with your data 
• Presentation matters
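To make the "treasure trove" point concrete, here is a minimal sketch that pulls service alerts out of Nagios log lines and counts them per host. It assumes the classic `nagios.log` layout, `[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output`; the sample lines, host names and the helper name `parse_alerts` are hypothetical, and this is not presented as the pipeline used in the talk.

```python
import re
from collections import Counter

# Assumed layout: "[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output"
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)

def parse_alerts(lines):
    """Yield one dict per SERVICE ALERT line, skipping everything else."""
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match:
            record = match.groupdict()
            record["ts"] = int(record["ts"])
            yield record

# Hypothetical log excerpt: three service alerts and one unrelated line.
SAMPLE_LOG = """\
[1348000000] SERVICE ALERT: web-1;HTTP;CRITICAL;HARD;3;Connection refused
[1348000060] SERVICE ALERT: web-1;HTTP;OK;HARD;1;HTTP OK
[1348000120] SERVICE ALERT: db-1;Disk;WARNING;SOFT;1;DISK WARNING - 12% free
[1348000180] CURRENT HOST STATE: db-1;UP;HARD;1;PING OK
""".splitlines()

alerts = list(parse_alerts(SAMPLE_LOG))
alerts_per_host = Counter(a["host"] for a in alerts)
print(alerts_per_host.most_common())
```

Once alerts are structured records, counting by host, check, hour or day is a one-line aggregation, which is what lets the data rather than memory drive prioritization.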
Curious about Datadog? 
Like cute logos? 
http://dtdg.co/nagios2012
