collectd
Thresholds and alerting
About me
● Florian "octo" Forster
● Open-source work since 2001
● Started collectd in 2005
Agenda
● collectd
● Thresholds
● Alerting
collectd
● Daemon
● collect metrics
● mangle / transport metrics
● store metrics (no retrieval)
collectd
● Open-source project
○ MIT and GPL licensed
● Platform independent
○ Linux, BSD, Solaris, AIX, HP-UX, …
○ Windows via SSC Serv (non-free)
collectd
● Agent based design
○ Runs on each host
● Extensible via plugins
○ Language bindings (Perl, Python, Java)
○ "exec" plugin, e.g. shell scripts
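The "exec" plugin runs an external program and reads values from its standard output, one PUTVAL command per line. Below is a minimal Python sketch of such a program. It assumes a Linux thermal zone at /sys/class/thermal/thermal_zone0/temp as the data source. The plugin name "thermal", the type instance "zone0" and all function names are hypothetical. The exec plugin exports COLLECTD_HOSTNAME and COLLECTD_INTERVAL to its children, so the loop only starts when COLLECTD_HOSTNAME is set, i.e. when collectd launched the script.

```python
#!/usr/bin/env python3
"""Hypothetical collectd exec plugin script reporting one temperature."""
import os
import time


def format_putval(hostname: str, plugin: str, type_: str, value: float,
                  interval: float, plugin_instance: str = "",
                  type_instance: str = "") -> str:
    """Build one line of collectd's plain-text PUTVAL command."""
    plugin_part = f"{plugin}-{plugin_instance}" if plugin_instance else plugin
    type_part = f"{type_}-{type_instance}" if type_instance else type_
    identifier = f"{hostname}/{plugin_part}/{type_part}"
    # "N" stands for "now": collectd fills in the current time.
    return f'PUTVAL "{identifier}" interval={interval:g} N:{value:g}'


def read_temperature() -> float:
    # Assumption: a Linux thermal zone reporting millidegrees Celsius.
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        return int(f.read().strip()) / 1000.0


def main() -> None:
    # Both variables are exported by the exec plugin.
    hostname = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    while True:
        line = format_putval(hostname, "thermal", "temperature",
                             read_temperature(), interval,
                             type_instance="zone0")
        # Flush so collectd sees each value as soon as it is printed.
        print(line, flush=True)
        time.sleep(interval)


# Only loop when started by collectd, not when imported or run by hand.
if __name__ == "__main__" and "COLLECTD_HOSTNAME" in os.environ:
    main()
```

It would be hooked up with an Exec line inside <Plugin exec>, e.g. Exec "user" "/usr/lib/collectd/temperature.py" (path hypothetical); the exec plugin refuses to run programs as root.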
collectd
● 95+ "read" (input) plugins
○ System metrics (e.g. CPU, memory)
○ Application metrics (e.g. MySQL)
○ Other (Xeon Phi, SNMP, OneWire)
collectd
● 15+ "write" (output) plugins
○ Graphite
○ RRDtool
○ RRDCacheD
○ Riemann
○ MongoDB
○ HTTP (generic)
collectd
# Input
LoadPlugin cpu
LoadPlugin memory
LoadPlugin df
<Plugin df>
  MountPoint "/"
  ValuesPercentage true
</Plugin>
# Output
LoadPlugin network
<Plugin network>
  Server "influxdb.example.com"
</Plugin>
Example configuration
collectd
● collectd's network plugin
○ Uses a binary protocol
○ UDP transport
○ Also understood by 3rd-party apps, e.g. InfluxDB, Prometheus
→ Grafana meets Monitoring
(tomorrow 9:30, this room, German)
→ Prometheus
(tomorrow 12:00, this room, English)
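To make "binary protocol" concrete, here is a minimal Python sketch that encodes and decodes one gauge value in collectd's network packet format. Each part starts with a 16-bit type and a 16-bit length (big-endian, length includes the 4-byte header), strings are NUL-terminated, and gauge values are the one exception to network byte order: they are little-endian doubles. For simplicity it uses the older seconds-resolution TIME (0x0001) and INTERVAL (0x0007) parts; current collectd versions send high-resolution variants. Signing and encryption are not covered, and all function names are hypothetical.

```python
import socket
import struct
from typing import Dict, List

# Part type codes of collectd's binary network protocol (subset).
TYPE_HOST = 0x0000
TYPE_TIME = 0x0001            # seconds since the epoch (legacy resolution)
TYPE_PLUGIN = 0x0002
TYPE_PLUGIN_INSTANCE = 0x0003
TYPE_TYPE = 0x0004
TYPE_TYPE_INSTANCE = 0x0005
TYPE_VALUES = 0x0006
TYPE_INTERVAL = 0x0007        # seconds (legacy resolution)

DS_COUNTER, DS_GAUGE, DS_DERIVE, DS_ABSOLUTE = 0, 1, 2, 3

STRING_PARTS = {
    TYPE_HOST: "host",
    TYPE_PLUGIN: "plugin",
    TYPE_PLUGIN_INSTANCE: "plugin_instance",
    TYPE_TYPE: "type",
    TYPE_TYPE_INSTANCE: "type_instance",
}


def string_part(part_type: int, text: str) -> bytes:
    data = text.encode("ascii") + b"\x00"  # strings are NUL-terminated
    # Header: 16-bit type, 16-bit total length (header included), big-endian.
    return struct.pack("!HH", part_type, 4 + len(data)) + data


def number_part(part_type: int, number: int) -> bytes:
    return struct.pack("!HHQ", part_type, 12, number)


def gauge_values_part(values: List[float]) -> bytes:
    body = struct.pack("!H", len(values)) + bytes([DS_GAUGE] * len(values))
    # Gauges are doubles in little-endian (x86) byte order.
    body += b"".join(struct.pack("<d", v) for v in values)
    return struct.pack("!HH", TYPE_VALUES, 4 + len(body)) + body


def encode_gauge(host: str, plugin: str, type_: str, value: float,
                 when: int, interval: int = 10) -> bytes:
    return (string_part(TYPE_HOST, host)
            + number_part(TYPE_TIME, when)
            + number_part(TYPE_INTERVAL, interval)
            + string_part(TYPE_PLUGIN, plugin)
            + string_part(TYPE_TYPE, type_)
            + gauge_values_part([value]))


def decode(packet: bytes) -> Dict[str, object]:
    fields: Dict[str, object] = {}
    offset = 0
    while offset + 4 <= len(packet):
        part_type, length = struct.unpack_from("!HH", packet, offset)
        if length < 4 or offset + length > len(packet):
            raise ValueError(f"malformed part at offset {offset}")
        body = packet[offset + 4:offset + length]
        if part_type in STRING_PARTS:
            fields[STRING_PARTS[part_type]] = body.rstrip(b"\x00").decode("ascii")
        elif part_type == TYPE_TIME:
            fields["time"] = struct.unpack("!Q", body)[0]
        elif part_type == TYPE_INTERVAL:
            fields["interval"] = struct.unpack("!Q", body)[0]
        elif part_type == TYPE_VALUES:
            (count,) = struct.unpack_from("!H", body)
            values = []
            for i, kind in enumerate(body[2:2 + count]):
                start = 2 + count + 8 * i
                # DERIVE is signed, COUNTER and ABSOLUTE are unsigned.
                fmt = "<d" if kind == DS_GAUGE else ("!q" if kind == DS_DERIVE else "!Q")
                values.append(struct.unpack(fmt, body[start:start + 8])[0])
            fields["values"] = values
        # Other part types (signatures, notifications, ...) are skipped.
        offset += length
    return fields


def send(packet: bytes, server: str, port: int = 25826) -> None:
    # The network plugin listens on UDP port 25826 by default.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (server, port))
```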
Agenda
● collectd
● Thresholds
● Alerting
Thresholds
● Implemented as a plugin
● Generates "notifications" (a.k.a. alerts)
● Severities: FAILURE, WARNING, OKAY
Thresholds
[Figure: metrics from the CPU, Disk, Memory, … plugins flow through collectd to the Threshold and Network plugins]
Thresholds
● Load the Threshold plugin
● Configure thresholds
● Ensure "notifications" are handled
Thresholds
LoadPlugin threshold
<Plugin threshold>
  <Type "temperature">
    WarningMax 40
  </Type>
</Plugin>
collectd.conf
Thresholds
● Metrics must be selected by type
● They can additionally be selected by host, plugin, plugin instance and type instance
Thresholds
<Plugin threshold>
  <Host "db.example.com">
    <Plugin "cpu">
      <Type "percent">
        Instance "wait"
        WarningMax 40
      </Type>
    </Plugin>
  </Host>
</Plugin>
collectd.conf
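Because thresholds can be nested inside Host and Plugin blocks, each value has to be matched against the most specific threshold that applies to it. Below is a minimal Python sketch of such a lookup over a hypothetical table keyed by (host, plugin, plugin instance, type, type instance), where an empty string means "not restricted". The fallback order (host before plugin before instances) is an assumption loosely modelled on collectd's behaviour, not a copy of its code.

```python
from typing import Dict, Iterator, Optional, Tuple

# (host, plugin, plugin_instance, type, type_instance); "" = not restricted.
Key = Tuple[str, str, str, str, str]


def candidate_keys(host: str, plugin: str, plugin_instance: str,
                   type_: str, type_instance: str) -> Iterator[Key]:
    """Yield lookup keys from most to least specific."""
    for h in (host, ""):
        for p, pi in ((plugin, plugin_instance), (plugin, ""), ("", "")):
            for ti in (type_instance, ""):
                # The type is never dropped: thresholds are selected by type.
                yield (h, p, pi, type_, ti)


def find_threshold(table: Dict[Key, dict], host: str, plugin: str,
                   plugin_instance: str, type_: str,
                   type_instance: str) -> Optional[dict]:
    """Return the settings of the first (most specific) matching threshold."""
    for key in candidate_keys(host, plugin, plugin_instance, type_, type_instance):
        if key in table:
            return table[key]
    return None
```

With the configuration above, the key ("db.example.com", "cpu", "", "percent", "wait") would hold {"WarningMax": 40}, and a global <Type "temperature"> block would correspond to ("", "", "", "temperature", "").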
Thresholds
● WarningMin, FailureMin: Lower bound of acceptable values.
● WarningMax, FailureMax: Upper bound of acceptable values.
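The four bounds can be checked in a fixed order: failure bounds first, then warning bounds. Below is a minimal Python sketch, assuming unset bounds are represented as NaN and that a value exactly on a bound is still acceptable. The class and function names are hypothetical, and options such as Hysteresis and Invert are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Threshold:
    # NaN means "bound not configured".
    warning_min: float = math.nan
    warning_max: float = math.nan
    failure_min: float = math.nan
    failure_max: float = math.nan


def _below(bound: float, value: float) -> bool:
    return not math.isnan(bound) and value < bound


def _above(bound: float, value: float) -> bool:
    return not math.isnan(bound) and value > bound


def classify(th: Threshold, value: float) -> str:
    """Map a value to FAILURE, WARNING or OKAY; failure bounds win."""
    if _below(th.failure_min, value) or _above(th.failure_max, value):
        return "FAILURE"
    if _below(th.warning_min, value) or _above(th.warning_max, value):
        return "WARNING"
    return "OKAY"
```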
Thresholds
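[Figure: value range bounded by FailureMin, WarningMin, WarningMax and FailureMax; values outside the warning bounds but inside the failure bounds fall into the "Warning" range]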
Thresholds
[Figure: a metric rising above the upper bound triggers "Warning!"]
Thresholds
[Figure: a metric passing through the Okay, Warning and Failure ranges]
Thresholds
LoadPlugin threshold
<Plugin threshold>
  <Type "temperature">
    WarningMax 40
    Persist true
  </Type>
</Plugin>
collectd.conf
Thresholds
[Figure: with Persist true, every value above the threshold triggers another "Warning!" notification]
Thresholds
LoadPlugin threshold
<Plugin threshold>
  <Type "temperature">
    WarningMax 40
    Persist true
    PersistOK true
  </Type>
</Plugin>
collectd.conf
Thresholds
[Figure: with Persist and PersistOK true, every value above the threshold triggers "Warning!" and every value back in range triggers "Ok"]
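Persist and PersistOK only change how often notifications are sent, not the state itself. Below is a minimal Python sketch of that decision, based on the documented behaviour: by default a notification is sent only when the state changes; Persist repeats WARNING/FAILURE for every out-of-range value, and PersistOK repeats OKAY for every in-range value. The function names are hypothetical, and collectd's own code handles further cases, such as values that were previously missing.

```python
OKAY, WARNING, FAILURE = "OKAY", "WARNING", "FAILURE"


def should_notify(previous: str, current: str,
                  persist: bool = False, persist_ok: bool = False) -> bool:
    """Decide whether the current state is worth a notification."""
    if current == OKAY:
        # Default: only announce the transition back to OKAY.
        return previous != OKAY or persist_ok
    # Default: only announce entering (or changing) a problem state.
    return previous != current or persist


def notifications(states: list, persist: bool = False,
                  persist_ok: bool = False) -> list:
    """Replay a sequence of states and return those that would be sent."""
    previous, sent = OKAY, []
    for state in states:
        if should_notify(previous, state, persist, persist_ok):
            sent.append(state)
        previous = state
    return sent
```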
Thresholds
[Figure: a noisy metric flapping around the threshold produces a stream of alerts: "Warning!", "Never mind", "Help!", "It's happening again!", "Phew!", "Why does this keep happening?!", "Meh."]
Thresholds
● Real metrics are noisy
● Murphy is not on our side
● First approach: Hysteresis
Thresholds
LoadPlugin threshold
<Plugin threshold>
  <Type "temperature">
    WarningMax 40
    Hysteresis 5
  </Type>
</Plugin>
collectd.conf
Thresholds
[Figure: with Hysteresis 5, "Warning!" is raised above 40 and only cleared ("Never mind") once the value drops to 35; in the band in between ("Maybe?") the previous state is kept]
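With "WarningMax 40" and "Hysteresis 5", a value above 40 raises a WARNING, but once in that state the value has to fall to 35 or below before the state returns to OKAY. Below is a minimal Python sketch of that rule for the warning bounds only (failure bounds behave the same way). It follows the behaviour described in the collectd documentation; the function names are hypothetical.

```python
import math


def check_warning(value: float, previous: str, warning_min: float = math.nan,
                  warning_max: float = math.nan, hysteresis: float = 0.0) -> str:
    """Return "WARNING" or "OKAY"; hysteresis only applies while in WARNING."""
    h = hysteresis if previous == "WARNING" else 0.0
    # While warning, both bounds move inwards by h: the value has to reach
    # warning_max - h (or warning_min + h) before the state clears.
    too_high = not math.isnan(warning_max) and value > warning_max - h
    too_low = not math.isnan(warning_min) and value < warning_min + h
    return "WARNING" if too_high or too_low else "OKAY"


def replay(values, **bounds) -> list:
    """Feed a series of values through check_warning, starting from OKAY."""
    state, states = "OKAY", []
    for value in values:
        state = check_warning(value, state, **bounds)
        states.append(state)
    return states
```

Without hysteresis, a metric oscillating around 40 flips the state on every sample; with Hysteresis 5 it stays in WARNING until it clearly recovers.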
Thresholds
● Many metrics are bursty
● Especially latency
● Second approach: multiple failures
Thresholds
LoadPlugin threshold
<Plugin threshold>
  <Type "temperature">
    WarningMax 40
    Hysteresis 5
    Hits 6
  </Type>
</Plugin>
collectd.conf
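Thresholds
[Figure: with Hits 6, consecutive out-of-range values are counted ("One potato", "Two potato", "Three potato", …) and a value back in range resets the count ("Phew")]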
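Hits delays the notification until the threshold has been violated several times in a row. Below is a simplified Python model, assuming that a single in-range value resets the counter. collectd's exact counting (including its interaction with PersistOK) may differ in edge cases, and the function name is hypothetical.

```python
def apply_hits(raw_states: list, hits: int) -> list:
    """Report a problem state only once it has been seen `hits` times in a row."""
    count, effective = 0, []
    for state in raw_states:
        if state == "OKAY":
            count = 0  # a single good value resets the streak
            effective.append("OKAY")
        else:
            count += 1
            # Short bursts stay below the limit and are reported as OKAY.
            effective.append(state if count >= hits else "OKAY")
    return effective
```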
Thresholds
● "Interesting" metrics
● Create a FAILURE notification when a metric disappears
Thresholds
LoadPlugin threshold
<Plugin threshold>
  <Type "temperature">
    WarningMax 40
    Interesting true
  </Type>
</Plugin>
collectd.conf
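A value list counts as missing once it has not been updated for "Timeout" intervals (a global collectd.conf option, 2 by default). Below is a minimal Python sketch of that check over a hypothetical in-memory cache. The CacheEntry type, the function name and the message text are illustrative, not collectd's internals.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    last_update: float        # time of the last received value (epoch seconds)
    interval: float           # collection interval of the value list (seconds)
    interesting: bool = True  # mirrors the "Interesting" threshold option


def missing_notifications(cache: dict, now: float, timeout: int = 2) -> list:
    """Return (severity, message) pairs for interesting value lists that stopped updating."""
    notes = []
    for ident in sorted(cache):
        entry = cache[ident]
        age = now - entry.last_update
        if entry.interesting and age > timeout * entry.interval:
            notes.append(("FAILURE", f"{ident} has not been updated for {age:.0f} seconds."))
    return notes
```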
Agenda
● collectd
● Thresholds
● Alerting
Alerting
● "notifications"
● Severities: FAILURE, WARNING, OKAY
Alerting
● Supported by some plugins:
○ e.g. exec, network, logfile, syslog, write_riemann, write_sensu
Alerting
LoadPlugin exec
<Plugin exec>
  NotificationExec "user" "/usr/lib/collectd/notify.sh"
</Plugin>
Example: exec
Alerting
/usr/lib/collectd/notify.sh <<EOF
Severity: WARNING
Time: 1447747649.961
Host: db.example.com
Plugin: cpu
Type: percent

Data source "value" is currently 42. That is above the warning threshold of 40.
EOF
Example: exec notification format
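The notification arrives on standard input as "Key: Value" header lines, followed by an empty line and the message text, so a handler does not have to be a shell script. Below is a minimal Python sketch of such a handler. The summary format, the function names and the stderr hand-off are placeholders, and the script assumes the exec plugin exports COLLECTD_HOSTNAME to notification handlers as it does to data-collecting programs.

```python
#!/usr/bin/env python3
"""Hypothetical NotificationExec handler for collectd notifications."""
import os
import sys


def parse_notification(text: str) -> dict:
    """Split "Key: Value" header lines from the message after the blank line."""
    header, _, message = text.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore anything that is not a "Key: Value" line
            fields[key.strip()] = value.strip()
    fields["Message"] = message.strip()
    return fields


def summarize(n: dict) -> str:
    """One-line summary, e.g. for a chat message or pager (format is made up)."""
    source = "/".join(part for part in (n.get("Plugin"), n.get("Type")) if part)
    return f"[{n.get('Severity', 'UNKNOWN')}] {n.get('Host', '?')} {source}: {n['Message']}"


def main() -> None:
    notification = parse_notification(sys.stdin.read())
    # Placeholder hand-off: replace with a real pager, chat or ticket integration.
    print(summarize(notification), file=sys.stderr)


# Assumption: COLLECTD_HOSTNAME is set for notification handlers, too; the
# check keeps the script from waiting on stdin when it is run by hand.
if __name__ == "__main__" and "COLLECTD_HOSTNAME" in os.environ:
    main()
```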
Alerting
● Special notification plugins
● notify_desktop, notify_email
Alerting
LoadPlugin "notify_email"
<Plugin "notify_email">
  From "collectd@example.com"
  Recipient "monitoring+collectd@example.com"
  SMTPServer "mail.example.com"
  SMTPUser "collectd"
  SMTPPassword "/!0sMcek3U"
</Plugin>
Example: notify_email
Alerting
● What's next?
● notify_nagios: https://collectd.org/bugs/1337
● Writes service check results
Alerting
LoadPlugin "notify_nagios"
<Plugin "notify_nagios">
  CommandFile "/usr/local/nagios/var/rw/nagios.cmd"
</Plugin>
Example: notify_nagios
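notify_nagios hands results to Nagios or Icinga through the external command file. Below is a minimal Python sketch of the standard PROCESS_SERVICE_CHECK_RESULT command line, mapping collectd severities to plugin return codes (OKAY 0, WARNING 1, FAILURE 2, anything else 3 = UNKNOWN). The service name used in the example is hypothetical, since how notify_nagios names services is not shown on the slide.

```python
import time
from typing import Optional

# Nagios/Icinga plugin return codes for collectd severities.
RETURN_CODES = {"OKAY": 0, "WARNING": 1, "FAILURE": 2}
UNKNOWN = 3


def service_check_result(host: str, service: str, severity: str,
                         message: str, timestamp: Optional[float] = None) -> str:
    """Format one PROCESS_SERVICE_CHECK_RESULT external command."""
    code = RETURN_CODES.get(severity, UNKNOWN)
    ts = int(time.time() if timestamp is None else timestamp)
    # External commands are line based, so keep the output on one line.
    output = " ".join(message.splitlines())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def submit(command_file: str, command: str) -> None:
    """Write the command to the command file (usually a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(command)
```

For example, submit("/usr/local/nagios/var/rw/nagios.cmd", line) would target the command file from the configuration above.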
Thank you!
Questions?
It's time for questions

OSMC 2015: Collectd Thresholds Plugin and Icinga by Florian Forster