Java application monitoring with
Dropwizard Metrics
and Graphite
Roberto Franchini
@robfrankie
Bologna, April 10th, 2015
whoami(1)
15 years of experience, proud to be a programmer
Writes software for information extraction, NLP, opinion
mining (@scale), and a lot of other buzzwords
Implements scalable architectures
Plays with servers (don't say that to my sysadmin)
Member of the JUG-Torino coordination team
feedback http://lanyrd.com/sdkghq
Company
Agenda
Intro
Scenario
System monitoring
Application monitoring (dark side)
Application monitoring (light side)
Dropwizard Metrics
Dashboards
Quotes
Business value
Our code generates business value
when it runs, not when we write it.
We need to know what our code does when it runs.
We can’t do this unless we measure it.
(Codahale)
SLA driven
Have an SLA for your service
Measure and report performance against the SLA
(Ben Treynor, Google Inc.)
Scenario
Infrastructure
45 bare metal servers
Nginx
Jetty (mainly embedded)
PostgreSQL
GlusterFS (28 TB and growing)
Kestrel
Kafka on the horizon
Redis
Jenkins as scheduler (cron on steroids)
Software
Java shop
Homemade distributed search engine
Homemade little PaaS
Docker on the go
More than 120 webapps
More than 100 batch jobs
NRT stream processing jobs running 24x7
Java
Java is not dead
And is almost everywhere
The language is evolving
The JVM is the most advanced managed environment
in which to run your code
Choose your style: Scala, Clojure, Groovy
Who uses it (cool side)
Twitter
Spotify
Google
Netflix
LinkedIn
Who uses it (real world)
Your bank
Systems monitoring
Collectd
In use since 2012
systems: load, df, traffic
java (via jmx): heap
queues: items, size
dbms: connections, size
Collectd charts
Traffic
Collectd to Graphite
collectd writes to graphite
write_graphite
better charts
dashboards are easy
dashboards are meaningful
Graphite dashboard
Servers load dashboard
Grafana
Grafana
A beautiful frontend for graphite
Dashboards are meaningful
and
BEAUTIFUL
(you can send screenshots to managers now)
Grafana dashboard
Application monitoring
Requirements
Measure behaviors
Send to graphite
Integrate with system measures
Correlate with system measures
Repeat with me
Correlate application and
system metrics
Correlate
graphite
collectd
applications
grafana
To do what?
Discover bottlenecks
post-mortem analysis
SLA monitoring
IO impact
Network traffic
Memory
User Story
Given the application running
when the manager comes
then I want to show a big green number
The answer
42
In detail
“Application monitoring? WHAT?”
“Ok, let me explain
What is the app doing right now?
How is the app performing right now?
And then graph it!”
“Ok, I got it!”
“Let me see”
5 minutes later
public class PoorManJavaMetrics {
    int called;
    long totalTime;

    public void doThings() {
        final long start = System.currentTimeMillis();
        // heavy business logic
        called++;
        final long end = System.currentTimeMillis();
        final long duration = end - start;
        totalTime += duration;
    }

    public void logStats() {
        System.out.println("---stats---");
        // I can’t write that
    }
}
DIY Java Monitoring
Maybe better with a centralized utility class
(maybe…)
thread safety?
send measures to different backends?
log to different logging systems?
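To make the thread-safety question concrete, here is a minimal sketch of what a safer hand-rolled version might look like using only the JDK. The class name `SaferPoorManMetrics` and its methods are hypothetical; the sketch addresses only concurrent updates and says nothing about backends or logging.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper, not part of any library: a thread-safe variant of PoorManJavaMetrics.
final class SaferPoorManMetrics {
    private final LongAdder calls = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    // Runs the task, recording one call and its duration even if the task throws.
    <T> T measure(Supplier<T> task) {
        final long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            totalNanos.add(System.nanoTime() - start);
            calls.increment();
        }
    }

    long calls() {
        return calls.sum();
    }

    long totalNanos() {
        return totalNanos.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        final SaferPoorManMetrics metrics = new SaferPoorManMetrics();
        final Runnable work = () -> {
            for (int i = 0; i < 1_000; i++) {
                metrics.measure(() -> "heavy business logic");
            }
        };
        // Two threads update the same counters concurrently.
        final Thread first = new Thread(work);
        final Thread second = new Thread(work);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println("calls=" + metrics.calls() + " totalNanos=" + metrics.totalNanos());
    }
}
```

Even this small version needs care with concurrency and exceptions, which is part of the case for using a library instead.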
Java Monitoring
Measure in the code
Thread safety
Counters, gauges, meters etc.
Log metrics
Graph metrics
Export metrics
JMX
NOT only JMX
We want more
Integrate JMX metrics from third-party libs
Dropwizard Metrics
https://dropwizard.github.io/metrics/3.1.0/
Overview
Code instrumentation
meters, gauges, counters, histograms
Reporters
console, csv, slf4j, jmx
Web app instrumentation
Web app health check
Advanced reporters
graphite, ganglia
Overview
Third party libs
aspectj
influxdb
statsd
cassandra
Main parts
MetricRegistry
a collection of all the metrics for your application
usually one instance per JVM
use more than one in multi-WAR deployments
Names
each metric has a unique name
the registry has helper methods for creating names
MetricRegistry.name(Queue.class, "items", "total")
// com.example.Queue.items.total
MetricRegistry.name(Queue.class, "size", "byte")
// com.example.Queue.size.byte
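The naming helper can be pictured as a dot-join of a class name and a few extra parts. The sketch below re-implements that idea with the JDK only. `MetricNames` is a hypothetical class, and skipping null or empty parts is an assumption about how such a helper would reasonably behave. In real code, call `MetricRegistry.name(...)` directly.

```java
import java.util.StringJoiner;

// Illustrative re-implementation of dotted metric names; not the library's own code.
final class MetricNames {
    private MetricNames() {
    }

    // Starts the name with the class's fully qualified name.
    static String name(Class<?> owner, String... parts) {
        return name(owner.getName(), parts);
    }

    // Joins the first element and all further parts with dots.
    static String name(String first, String... parts) {
        final StringJoiner joiner = new StringJoiner(".");
        append(joiner, first);
        for (String part : parts) {
            append(joiner, part);
        }
        return joiner.toString();
    }

    // Assumption: null and empty parts are skipped rather than producing "..".
    private static void append(StringJoiner joiner, String part) {
        if (part != null && !part.isEmpty()) {
            joiner.add(part);
        }
    }

    public static void main(String[] args) {
        // java.util.Queue stands in for an application class here.
        System.out.println(name(java.util.Queue.class, "items", "total"));
        System.out.println(name("gauge", "keys"));
    }
}
```

Because the class name keeps its package and capitalisation, class-based names are unique but long.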
Metrics
Gauges
the simplest metric type: it just returns a value
Counters
an incrementing and decrementing 64-bit integer
final Map<String, String> keys = new HashMap<>();
// Gauge: reports the current number of keys each time it is read
registry.register(MetricRegistry.name("gauge", "keys"), new Gauge<Integer>() {
    @Override
    public Integer getValue() {
        return keys.size();
    }
});
// Counter: incremented explicitly by the application
final Counter counter = registry.counter(MetricRegistry.name("counter", "inserted"));
counter.inc();
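The difference between the two types is who owns the value. A gauge reads something that already exists, while a counter holds its own number. The following JDK-only sketch illustrates that contrast. `SimpleCounter` and the `Supplier`-based gauge are hypothetical stand-ins, not the library's classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Conceptual sketch only; the names here are hypothetical, not the library's.
final class GaugeAndCounterSketch {

    // A counter owns its value and is updated explicitly by the application.
    static final class SimpleCounter {
        private final LongAdder value = new LongAdder();

        void inc() {
            value.increment();
        }

        void dec() {
            value.decrement();
        }

        long getCount() {
            return value.sum();
        }
    }

    public static void main(String[] args) {
        final Map<String, String> keys = new HashMap<>();
        // A gauge owns nothing: it reads the current value on demand.
        final Supplier<Integer> keysGauge = keys::size;
        final SimpleCounter inserted = new SimpleCounter();

        keys.put("a", "1");
        inserted.inc();
        keys.put("b", "2");
        inserted.inc();
        keys.remove("a");

        // The gauge follows the map (one key left); the counter remembers both inserts.
        System.out.println("gauge.keys=" + keysGauge.get());
        System.out.println("counter.inserted=" + inserted.getCount());
    }
}
```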
Metrics
Histograms
measures the distribution of values in a stream of data
Meters
measures the rate at which a set of events occurs
final Histogram resultCounts = registry.histogram(MetricRegistry.name(ProductDAO.class, "result-counts"));
resultCounts.update(results.size());
final Meter meter = registry.meter(MetricRegistry.name("meter", "inserted"));
meter.mark();
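As a rough illustration of what these two types compute, the sketch below keeps every value for the histogram and reports only a mean rate for the meter. All names are hypothetical. It deliberately simplifies: a production library would bound memory by sampling values and would also report moving-average rates, such as the m1/m5/m15 figures in the reporter output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

// Conceptual sketch with hypothetical names; memory use is unbounded on purpose.
final class HistogramAndMeterSketch {

    // Keeps every recorded value and answers distribution questions over them.
    static final class SimpleHistogram {
        private final List<Long> values = new ArrayList<>();

        synchronized void update(long value) {
            values.add(value);
        }

        synchronized double mean() {
            return values.stream().mapToLong(Long::longValue).average().orElse(0.0);
        }

        // Nearest-rank quantile for 0 < q <= 1; returns 0 when nothing was recorded.
        synchronized long quantile(double q) {
            if (values.isEmpty()) {
                return 0L;
            }
            final List<Long> sorted = new ArrayList<>(values);
            Collections.sort(sorted);
            final int rank = (int) Math.ceil(q * sorted.size());
            final int clamped = Math.min(Math.max(rank, 1), sorted.size());
            return sorted.get(clamped - 1);
        }
    }

    // Counts events and reports the mean rate since creation.
    static final class SimpleMeter {
        private final LongSupplier nanoClock;
        private final long startNanos;
        private long count;

        SimpleMeter(LongSupplier nanoClock) {
            this.nanoClock = nanoClock;
            this.startNanos = nanoClock.getAsLong();
        }

        synchronized void mark() {
            count++;
        }

        synchronized double meanRatePerSecond() {
            final long elapsed = nanoClock.getAsLong() - startNanos;
            return elapsed <= 0 ? 0.0 : count / (elapsed / 1_000_000_000.0);
        }
    }

    public static void main(String[] args) {
        final SimpleHistogram resultCounts = new SimpleHistogram();
        for (long size : new long[] {3, 1, 4, 1, 5}) {
            resultCounts.update(size);
        }
        System.out.println("median=" + resultCounts.quantile(0.5) + " mean=" + resultCounts.mean());

        final SimpleMeter inserted = new SimpleMeter(System::nanoTime);
        inserted.mark();
        System.out.println("mean rate=" + inserted.meanRatePerSecond() + " events/second");
    }
}
```

The clock is injected so the meter can be exercised with a fake time source in tests.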
Metrics
Timers
a histogram of the duration of a type of event and a
meter of the rate of its occurrence
final Timer timer = registry.timer(MetricRegistry.name("timer", "inserted"));
final Timer.Context context = timer.time();
try {
    // timed ops
} finally {
    // stop in finally so that failed operations are timed too
    context.stop();
}
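The start/stop pattern above maps naturally onto try-with-resources. The sketch below shows the idea with JDK classes only. `TimerSketch` and its nested `Context` are hypothetical names, and the sketch tracks only a count and a total duration rather than a full histogram and meter.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Conceptual sketch with hypothetical names; it records only count and total duration.
final class TimerSketch {
    private final LongSupplier nanoClock;
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    TimerSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Starts timing; closing the returned context records the elapsed time.
    Context time() {
        return new Context();
    }

    long getCount() {
        return count.sum();
    }

    double meanNanos() {
        final long n = count.sum();
        return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }

    final class Context implements AutoCloseable {
        private final long startNanos = nanoClock.getAsLong();

        @Override
        public void close() {
            totalNanos.add(nanoClock.getAsLong() - startNanos);
            count.increment();
        }
    }

    public static void main(String[] args) {
        final TimerSketch timer = new TimerSketch(System::nanoTime);
        try (Context ignored = timer.time()) {
            Math.sqrt(42.0); // timed ops
        }
        System.out.println("count=" + timer.getCount() + " meanNanos=" + timer.meanNanos());
    }
}
```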
Reporters
JMX
exposes metrics as JMX MBeans
Console
periodically reports metrics to the console
CSV
appends to a set of .csv files in a given directory
SLF4J
logs metrics to a logger
Graphite
streams metrics to Graphite
Console reporter
final ConsoleReporter console = ConsoleReporter.forRegistry(registry)
    .outputTo(System.out)
    .convertRatesTo(TimeUnit.MINUTES)
    .build();
console.start(10, TimeUnit.SECONDS);
4/9/15 11:45:57 PM =============================================================
-- Gauges ----------------------------------------------------------------------
gauge.keys
             value = 9901
-- Counters --------------------------------------------------------------------
counter.inserted
             count = 9901
-- Meters ----------------------------------------------------------------------
meter.inserted
             count = 9901
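Conceptually, a scheduled reporter is a task that snapshots the registered metrics at a fixed period and writes them somewhere. A minimal JDK-only sketch of that mechanism follows. `PeriodicReporterSketch` and its methods are hypothetical, and it handles only gauges to stay short.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Conceptual sketch with hypothetical names: a registry of named gauges plus a periodic printer.
final class PeriodicReporterSketch {
    private final Map<String, Supplier<?>> gauges = new ConcurrentHashMap<>();

    void register(String name, Supplier<?> gauge) {
        gauges.put(name, gauge);
    }

    // Renders one report: a "name = value" line per gauge, sorted by name.
    String render() {
        final Map<String, Supplier<?>> sorted = new TreeMap<>(gauges);
        final StringBuilder out = new StringBuilder();
        sorted.forEach((name, gauge) ->
                out.append(name).append(" = ").append(gauge.get()).append('\n'));
        return out.toString();
    }

    // Prints a report at a fixed period until the returned scheduler is shut down.
    ScheduledExecutorService start(long period, TimeUnit unit) {
        final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> System.out.print(render()), period, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        final PeriodicReporterSketch reporter = new PeriodicReporterSketch();
        reporter.register("gauge.keys", () -> 9901);
        final ScheduledExecutorService scheduler = reporter.start(100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        scheduler.shutdownNow();
    }
}
```

Keeping `render()` separate from the scheduling makes the output format testable without waiting on a timer.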
slf4j reporter
final Slf4jReporter logging = Slf4jReporter.forRegistry(registry)
    .convertDurationsTo(TimeUnit.MINUTES)
    .outputTo(LoggerFactory.getILoggerFactory().getLogger("metrics"))
    .build();
logging.start(20, TimeUnit.SECONDS);
0 [metrics-logger-reporter-2-thread-1] INFO metrics - type=GAUGE, name=gauge.keys, value=901
2 [metrics-logger-reporter-2-thread-1] INFO metrics - type=COUNTER, name=counter.inserted, count=901
6 [metrics-logger-reporter-2-thread-1] INFO metrics - type=METER, name=meter.inserted, count=901, mean_rate=90.03794743129822, m1=81.7831205903394, m5=80.52726521433198, m15=80.30969500950305, rate_unit=events/second
14 [metrics-logger-reporter-2-thread-1] INFO metrics - type=TIMER, name=timer.inserted, count=900, min=1.9083333333333335E-8, max=0.016671673633333335, mean=1.667999479718904E-4, stddev=0.0016585493668388946, median=7.196666666666667E-8, p75=1.3421666666666667E-7, p95=2.7838333333333335E-7, p98=7.131833333333334E-7, p99=0.01666843721666667, p999=0.016671673633333335, mean_rate=89.8720293570475, m1=81.59911170741354, m5=80.33057092356765, m15=80.11080303990207, rate_unit=events/second, duration_unit=minutes
Graphite reporter
final Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
final GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
    .prefixedWith("web1.example.com")
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .filter(MetricFilter.ALL)
    .build(graphite);
reporter.start(1, TimeUnit.MINUTES);
Metrics can be prefixed
Useful for separating metrics by environment: prod, test
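On the wire, Graphite's plaintext protocol is one line per data point: the metric path, the value and a Unix timestamp in seconds, separated by spaces and sent to the Carbon listener (port 2003 in the snippet above). The sketch below shows how a prefix ends up in that path. `GraphiteLineSketch` is a hypothetical helper, and the `prod.web1` prefix is just an example value.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch of Graphite's plaintext protocol: one "<path> <value> <timestamp>\n" line per data point.
final class GraphiteLineSketch {
    private GraphiteLineSketch() {
    }

    // Builds one line; the timestamp is in seconds since the Unix epoch.
    static String line(String prefix, String name, double value, long epochSeconds) {
        final String path = prefix.isEmpty() ? name : prefix + "." + name;
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Sends one data point over TCP to the plaintext listener of a Carbon daemon.
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer writer = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            writer.write(line);
            writer.flush();
        }
    }

    public static void main(String[] args) {
        final long now = System.currentTimeMillis() / 1000L;
        // Printed rather than sent, so the sketch runs without a Graphite server.
        System.out.print(line("prod.web1", "counter.inserted", 9901, now));
    }
}
```

Every dot in the path becomes a level in Graphite's metric tree, so the prefix decides where an environment's metrics are grouped.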
Metrics naming
Dot notation by getClass()
easy to create
very long names on the dashboard
Maybe better to use
<namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>
Such as
accounts.authentication.password.failed
Use a prefix
prod, test, dev, local
differentiate data retention on Graphite by prefix
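One way to keep such a convention consistent is to build names through a single helper instead of concatenating strings at every call site. The sketch below is one possible shape for it. `ConventionNames`, the `Environment` enum and the allowed-character rule are all assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical helper illustrating the <prefix>.<namespace>.<section>.<target>.<action> layout.
final class ConventionNames {
    enum Environment { PROD, TEST, DEV, LOCAL }

    private ConventionNames() {
    }

    // Builds a name such as prod.accounts.authentication.password.failed.
    static String of(Environment env, String namespace, String section, String target, String action) {
        return String.join(".",
                env.name().toLowerCase(Locale.ROOT),
                segment(namespace), segment(section), segment(target), segment(action));
    }

    // Rejects blanks, dots and whitespace, since a dot inside a segment would add a tree level.
    private static String segment(String value) {
        if (value == null || !value.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("invalid metric name segment: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(of(Environment.PROD, "accounts", "authentication", "password", "failed"));
    }
}
```

With the environment always in first position, retention rules on the Graphite side can match on the leading segment alone.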
Grafana application overview
Demo
References
https://dropwizard.github.io/metrics/3.1.0/
https://dl.dropboxusercontent.com/u/2744222/2011-04-09-Metrics-Metrics-Everywhere.pdf
http://graphite.wikidot.com/
http://grafana.org/
http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
https://www.usenix.org/sites/default/files/conference/protected-files/srecon15_slides_limoncelli.pdf
Thank You
http://lanyrd.com/sdkghq
@robfrankie
franchini@celi.it
