Monitoring NGINX (plus): key metrics and how-to

Monitoring nginx
Alexis Lê-Quôc, Datadog
@alq

Agenda
• Dramatis personae
• Observations
• Monitoring 1 nginx (plus) with logs
• Monitoring 1 nginx (plus) with metrics
• Monitoring N nginx effectively

Datadog == monitoring
• Monitoring as a service
• Works really well with large, dynamic environments (e.g. clouds)
• Aggregate performance metrics
• Correlate nginx performance with the rest of your infrastructure

Some stats
• Across all monitored servers
  • nginx ~10%
  • Apache ~5%
• CPU (and CPU/$) is the dominant resource

% of instances per core count
[Bar chart: x-axis is core count (1, 2, 4, 8, 12, 16, 24, 32); y-axis is % of instances (0% to 40%)]

% of instances per type (AWS only)
[Bar chart: x-axis is EC2 type (c3.l, c3.2xl, c1.xl, c3.8xl, m3.l, c3.xl, m3.m, cc2.8xl, t2.m, c3.4xl, rest); y-axis is % of instances (0% to 30%)]

Monitoring nginx
1. Monitoring with logs
2. Monitoring with status
3. Monitoring with statsd

Monitoring with logs
nginx → log forwarder → indexer → UI
• Canonical example of log indexers
• Your choice of:
  • Logstash
  • Splunk
  • Logentries, Sumo Logic, Loggly, etc.

Monitoring with logs
nginx → log forwarder → indexer → UI
Strengths                 Weaknesses
forensics & anomalies     low signal-to-noise ratio
content-driven analysis   “black box”
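
As a rough illustration of content-driven analysis, the sketch below parses access-log lines in nginx's default "combined" format and counts responses per status class. The sample lines and the function name are made up for illustration; a real deployment would hand this job to the forwarder and indexer named above.

```python
import re
from collections import Counter

# Regex for nginx's default "combined" access-log format.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def count_status_classes(lines):
    """Count responses per status class (2xx, 4xx, ...) in combined-format lines."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that do not fit the format
        counts[match.group("status")[0] + "xx"] += 1
    return counts


# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2014:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0"',
    '203.0.113.9 - - [10/Oct/2014:13:55:37 +0000] "GET /missing HTTP/1.1" 404 169 "-" "curl/7.35.0"',
    '203.0.113.7 - - [10/Oct/2014:13:55:38 +0000] "POST /api HTTP/1.1" 502 172 "-" "curl/7.35.0"',
]

if __name__ == "__main__":
    print(dict(count_status_classes(SAMPLE_LINES)))
```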

Monitoring with metrics
nginx status → collector → aggregator → UI/alerts
• open-source: ngx_http_stub_status_module
  • bare-bones metrics
  • human-readable text presentation
• plus: ngx_http_status_module
  • a lot more metrics for each function
  • JSON format
• Your choice of…
  • Datadog, Nagios, Zabbix, etc. for open-source
  • Datadog for nginx plus

Monitoring with metrics
nginx status → collector → aggregator → UI/alerts
Strengths                 Weaknesses
lightweight & real-time   no insight into content
“white box”

Simple metrics taxonomy
1. What it measures
  • Work or resource
  • Focus on work because work == value
  • Resource analysis is useful to understand performance
  • Use Brendan Gregg’s USE method
    • Utilization (% over time)
    • Saturation (queue length)
    • Errors (count over time)
2. Type
  • Gauge: sample
  • Counter: accumulated sample; must be differentiated (turned into a rate) to be meaningful
http://www.brendangregg.com/usemethod.html
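
To make the gauge/counter distinction concrete, here is a minimal sketch of turning counter samples into a rate. The sample values and the function name are hypothetical; skipping decreasing values is one common way to handle a counter that resets when the server restarts.

```python
def counter_to_rate(samples):
    """Turn (timestamp_seconds, counter_value) samples into per-second rates."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            continue  # skip bad timestamps and counter resets (e.g. a restart)
        rates.append((t1, (v1 - v0) / elapsed))
    return rates


# Hypothetical samples of a request counter, taken every 10 seconds.
# The last sample is lower than the previous one, as after a restart.
SAMPLES = [(0, 1000), (10, 1500), (20, 2500), (30, 100)]

if __name__ == "__main__":
    for timestamp, rate in counter_to_rate(SAMPLES):
        print(f"t={timestamp}s: {rate:.1f} requests/s")
```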

Open-source metrics
Class                 Type     Resource/Work  Notes
Current connections   Gauge    Resource       reading, writing, idle
Accepted connections  Counter  Resource
Handled connections   Counter  Resource       <= accepted if resource limit
Requests              Counter  Work           True purpose of the server
• Latency must be measured using logs or statsd.
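
The metrics in this table can be read from the plain-text page served by ngx_http_stub_status_module. The sketch below parses one such page and derives dropped connections as accepted minus handled. The sample text follows the layout shown in the module's documentation; the function name and dictionary keys are illustrative choices, not part of nginx.

```python
import re

# Sample stub_status output, in the layout shown in the module documentation.
SAMPLE_STATUS = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Parse stub_status text into a dict of integers."""
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    # The three counters sit on a line of their own, below their labels.
    accepts, handled, requests = map(
        int,
        re.search(
            r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
        ).groups(),
    )
    reading, writing, waiting = map(
        int,
        re.search(
            r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
        ).groups(),
    )
    return {
        "active": active,        # gauge
        "reading": reading,      # gauge
        "writing": writing,      # gauge
        "waiting": waiting,      # gauge (idle keep-alive connections)
        "accepts": accepts,      # counter
        "handled": handled,      # counter
        "requests": requests,    # counter
        "dropped": accepts - handled,  # derived; > 0 hints at a resource limit
    }


if __name__ == "__main__":
    print(parse_stub_status(SAMPLE_STATUS))
```

Because accepts, handled, and requests are counters, a collector would normally sample this page periodically and report the differences between samples.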

Key “plus” metrics
Class                     Type     Resource/Work  Notes
5xx Errors                Counter  Work           without log analysis
5xx/sum(Nxx)              Gauge    Work           error rate %
idle/dropped connections  Gauge    Resource       saturation
active/total connections  Gauge    Resource       upstream capacity
Requests                  Counter  Work           true purpose of the server
• Latency must be measured using logs or statsd.
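
The 5xx/sum(Nxx) error rate can be sketched as follows. Since the per-class response counts are counters, the rate is computed over the interval between two snapshots. The dictionaries are hypothetical snapshots keyed by status class; how they are fetched from the status endpoint is left out and would depend on your setup.

```python
STATUS_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def error_rate_percent(previous, current):
    """Return 5xx / sum(Nxx), in percent, between two counter snapshots."""
    deltas = {cls: current[cls] - previous[cls] for cls in STATUS_CLASSES}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no traffic in the interval (or the counters were reset)
    return 100.0 * deltas["5xx"] / total


# Hypothetical snapshots of per-class response counters, one interval apart.
PREVIOUS = {"1xx": 0, "2xx": 9000, "3xx": 500, "4xx": 400, "5xx": 100}
CURRENT = {"1xx": 0, "2xx": 9900, "3xx": 550, "4xx": 430, "5xx": 120}

if __name__ == "__main__":
    print(f"error rate: {error_rate_percent(PREVIOUS, CURRENT):.1f}%")
```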

Monitoring with statsd
nginx → statsd → UI/alerts
Strengths                          Weaknesses
lightweight, real-time, standard   not comprehensive
custom metrics, content-aware
https://github.com/zebrafishlabs/nginx-statsd
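
For readers new to statsd, the sketch below shows the plain-text wire format (name:value|type) that a statsd client sends over UDP. It is a generic Python illustration, not the nginx module linked above, which is configured inside nginx itself. The metric names are made up, and host and port are assumptions (8125 is the conventional statsd port).

```python
import socket


def format_statsd(name, value, metric_type):
    """Build one statsd line: 'c' = counter, 'g' = gauge, 'ms' = timer."""
    return f"{name}:{value}|{metric_type}"


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget one metric over UDP; nothing is read back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_statsd(format_statsd("nginx.requests", 1, "c"))
    send_statsd(format_statsd("nginx.request_time", 42, "ms"))
```

UDP keeps the sender lightweight: a missing or slow statsd daemon does not block request handling, at the cost of possibly losing samples.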

Monitoring nginx
1. Logs for content analysis (forensics, anomalies, marketing)
2. Status for (white box) performance monitoring
3. statsd for custom metrics
No single method gives you everything you need.

Monitoring a lot of nginx
1. Requires aggregation
2. It’s all about metadata (“pet-to-cattle” mindset)
3. Correlation

Aggregation
• By default for log-based monitoring
• Not by default for metric-based monitoring

Metadata
• Analyze by properties that are not the host identity
• Find anomalies that are not obvious
• Pet-to-cattle evolution: hosts don’t matter, services do
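
A small sketch of aggregating by metadata rather than by host: per-host samples carry tags, and the same metric is summed by any tag. The hosts, tag names, and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-host samples; tags describe properties other than identity.
SAMPLES = [
    {"host": "web-1", "tags": {"role": "edge", "az": "us-east-1a"}, "requests_per_s": 120.0},
    {"host": "web-2", "tags": {"role": "edge", "az": "us-east-1b"}, "requests_per_s": 80.0},
    {"host": "web-3", "tags": {"role": "api", "az": "us-east-1a"}, "requests_per_s": 40.0},
]


def aggregate_by_tag(samples, tag, metric):
    """Sum a metric across hosts, grouped by the value of one tag."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample["tags"].get(tag, "untagged")] += sample[metric]
    return dict(totals)


if __name__ == "__main__":
    print(aggregate_by_tag(SAMPLES, "role", "requests_per_s"))
    print(aggregate_by_tag(SAMPLES, "az", "requests_per_s"))
```

Grouping by role or availability zone answers questions about the service, and the result stays meaningful when individual hosts come and go.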

Correlation
• nginx is only one piece of the infrastructure

Thank you!
Questions/Comments? @alq
