Firenze, November 17th 2015
Roberto “FRANK” Franchini
@robfrankie
Increase business value, measure it!
What the hell is your
software doing at runtime?
whoami(1)
More than 15 years of experience, proud to be a programmer
Member of the OrientDB team, tech lead for the full-text, spatial, JDBC and Docker images
Wrote software for NLP and opinion mining (@scale)
Played with servers, then bought a sysadmin
JUG-Torino co-lead
Agenda
Quotes
System monitoring
Coding
Application monitoring
All together
Feedback
Sample Scenario
Quotes
Business value
Our code generates business value
when it runs, not when we write it.
We need to know what our code does when it
runs.
We can’t do this unless we measure it.
(Codahale)
SLA driven
Have an SLA for your service
Measure and report performance against the
SLA
(Ben Treynor, Google Inc.)
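One possible way to report performance against an SLA is to compute the share of requests that finished within a target latency. The sketch below only illustrates that idea: the class name `SlaReport`, the 200 ms target and the sample latencies are all made up and do not come from the talk.

```java
import java.util.List;

// Hypothetical helper: share of requests that met a latency target.
final class SlaReport {

    // Returns the fraction (0.0 to 1.0) of latencies at or below the target.
    static double compliance(List<Long> latenciesMillis, long targetMillis) {
        if (latenciesMillis.isEmpty()) {
            return 1.0; // no requests, so nothing violated the SLA
        }
        long withinTarget = latenciesMillis.stream()
                .filter(latency -> latency <= targetMillis)
                .count();
        return (double) withinTarget / latenciesMillis.size();
    }

    public static void main(String[] args) {
        // Made-up sample: four requests, one slower than the 200 ms target.
        List<Long> latencies = List.of(120L, 80L, 450L, 199L);
        System.out.printf("SLA compliance: %.0f%%%n", compliance(latencies, 200) * 100);
    }
}
```

In practice the latencies would come from the application metrics themselves rather than from a hard-coded list.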
System monitoring
Infrastructure monitoring
Sysadmins have monitored infrastructure
since the beginning of IT
With the right tools a single BOFH
can handle hundreds of servers
Tools
On premises
collectd zabbix zenoss
nagios cacti graphite/grafana
Cloud based
datadog newrelic
Measures
CPU load
Network traffic
Disk I/O
Memory
More and more
Charts
Dashboard
Cool, black dashboard
Code and deploy
Write
TDD
SOLID principles
Design Patterns
Code metrics
Build
unit tests
integration tests
performance tests
test coverage
code quality reports
Deploy
Deployment pipeline
Microservices
Container
Cloud
Rest
All done, take your rest
Umh
I don’t think so anymore
Application monitoring
The day after deployment
How do we monitor our service status?
How do we measure it?
How does it behave?
How does it interact with other parts of the system?
Multiply that by every µ-service
Monitorability
Design software to be monitorable
Expose metrics (JMX)
Expose status (REST API)
Send metrics to monitoring tools
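Exposing a metric over JMX can be sketched with the JDK's own `javax.management` classes. The example below assumes a hypothetical `RequestStats` counter and a made-up object name. It registers the bean on the platform MBean server and reads the value back the way a JMX client such as JConsole would.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxMetricsSketch {

    // Standard MBean convention: the public interface is named <ClassName>MBean.
    public interface RequestStatsMBean {
        long getRequestCount();
    }

    // Hypothetical application counter exposed as the "RequestCount" attribute.
    public static final class RequestStats implements RequestStatsMBean {
        private final AtomicLong requests = new AtomicLong();

        void recordRequest() {
            requests.incrementAndGet();
        }

        @Override
        public long getRequestCount() {
            return requests.get();
        }
    }

    // Registers the bean, records some requests and reads the attribute back.
    static long registerAndRead(String objectName, int requestsToRecord) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        RequestStats stats = new RequestStats();
        try {
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(stats, name);
            for (int i = 0; i < requestsToRecord; i++) {
                stats.recordRequest();
            }
            long count = (Long) server.getAttribute(name, "RequestCount");
            server.unregisterMBean(name); // keep the sketch repeatable
            return count;
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration or lookup failed", e);
        }
    }

    public static void main(String[] args) {
        // The object name is an assumption chosen for this example.
        System.out.println(registerAndRead("com.example.demo:type=RequestStats", 3));
    }
}
```

A real service would register the bean once at startup and leave it registered, so that monitoring tools can poll it.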
We need application monitoring
“Application monitoring? WHAT?”
“Ok, let me explain
What is the app doing right now?
How is the app performing right now?
And then graph it!”
“Ok, I got it!”
“Let me see”
5 minutes later
public class PoorManJavaMetrics {
    // Hand-rolled counters: plain fields, not thread-safe
    int called;
    long totalTime;

    public void doThings() {
        final long start = System.currentTimeMillis();
        // heavy business logic
        called++;
        final long end = System.currentTimeMillis();
        final long duration = end - start;
        totalTime += duration;
    }

    public void logStats() {
        System.out.println("---stats---");
        // Here be DRAGONS
    }
}
Sketch by Luca Franchini
Use the right tool
Use a library (e.g.: dropwizard metrics)
Count events, measure duration
Log metric values
Send application metrics
to the same backend as system metrics
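To make "count events, measure duration" concrete, here is a JDK-only sketch of a thread-safe timer. It is a stand-in for what a metrics library provides, not the Dropwizard Metrics API. The class name `SimpleTimer` and its methods are hypothetical, and a real library adds rates, histograms and reporters on top.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical timer: counts calls and accumulates their duration.
final class SimpleTimer {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    // Runs the task and records one event plus its duration, even if the task throws.
    <T> T time(Supplier<T> task) {
        long start = System.nanoTime(); // monotonic clock, suited to measuring elapsed time
        try {
            return task.get();
        } finally {
            totalNanos.add(System.nanoTime() - start);
            count.increment();
        }
    }

    long count() {
        return count.sum();
    }

    // Mean duration per call in milliseconds, or 0.0 when nothing was recorded.
    double meanMillis() {
        long calls = count.sum();
        if (calls == 0) {
            return 0.0;
        }
        return (totalNanos.sum() / (double) calls) / TimeUnit.MILLISECONDS.toNanos(1);
    }

    public static void main(String[] args) {
        SimpleTimer timer = new SimpleTimer();
        for (int i = 0; i < 5; i++) {
            timer.time(() -> "heavy business logic");
        }
        System.out.println("calls=" + timer.count() + " meanMillis=" + timer.meanMillis());
    }
}
```

Compared with the hand-rolled version, the counters are safe to update from several threads and the measurement wraps the business logic instead of being mixed into it.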
Don’t forget naming!
A naming pattern
<namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>
Such as
accounts.authentication.password.failed
Use an environment prefix
prod, test, dev, local
prod.accounts.authentication.password.failed
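The naming pattern can be enforced with a small helper, so every metric name is built the same way. This is a sketch only: `MetricNames` and its method are hypothetical names, and the check against dots inside a part is an assumption about what a team might want to validate.

```java
// Hypothetical helper that builds names following the pattern above.
final class MetricNames {

    // Builds <environment>.<namespace>.<section>.<target>.<action>.
    static String of(String environment, String namespace, String section,
                     String target, String action) {
        String[] parts = {environment, namespace, section, target, action};
        for (String part : parts) {
            // A dot inside a part would silently add an extra level to the hierarchy.
            if (part == null || part.isEmpty() || part.contains(".")) {
                throw new IllegalArgumentException("invalid metric name part: " + part);
            }
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        // prints "prod.accounts.authentication.password.failed"
        System.out.println(of("prod", "accounts", "authentication", "password", "failed"));
    }
}
```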
Which metrics?
Rate of documents processed
Latency
Transactions per second (€€€€)
Total number of errors
Mean time per user interaction
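Two of the metrics listed above, rate and latency, can be sketched in a few lines. The example uses the nearest-rank method for percentiles, which is one common choice and not necessarily what any given metrics library does. The class name `LatencyStats` and the sample numbers are made up.

```java
import java.util.Arrays;

// Hypothetical helper for two of the metrics above: latency percentiles and rates.
final class LatencyStats {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    // Events per second over an observation window given in milliseconds.
    static double ratePerSecond(long events, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return events * 1000.0 / windowMillis;
    }

    public static void main(String[] args) {
        long[] latenciesMillis = {40, 10, 100, 30, 20, 90, 60, 50, 80, 70};
        System.out.println("p50=" + percentile(latenciesMillis, 50));
        System.out.println("p95=" + percentile(latenciesMillis, 95));
        System.out.println("docs/s=" + ratePerSecond(120, 60_000));
    }
}
```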
All together now
Code on systems
Don’t cross the streams
Enabling code metrics means
sysadmins and devs in the same room
talking to each other
to improve business value
Send application metrics to the same backend as system metrics
Correlate application and system metrics
Repeat with me
Correlate application and system metrics
(Cross the streams!)
Single metrics backend
graphite
collectd
applications
grafana
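Graphite accepts metrics over a simple plaintext protocol, one line per data point: metric path, value and Unix timestamp in seconds. The sketch below formats such a line and shows how it could be written to a socket. The class name `GraphiteLine`, the host and the sample values are assumptions. Port 2003 is the usual default for the plaintext listener, so check your own setup.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sender for Graphite's plaintext protocol.
final class GraphiteLine {

    // One data point: "<metric path> <value> <unix timestamp in seconds>\n".
    static String format(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a TCP connection, writes one line and closes the connection.
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(line);
            out.flush();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        String line = format("prod.accounts.authentication.password.failed", 3, now);
        System.out.print(line);
        // To actually send it, assuming a Graphite host named "graphite.example.com":
        // send("graphite.example.com", 2003, line);
    }
}
```

A metrics library with a Graphite reporter does this periodically for every registered metric, which is how application numbers end up next to the collectd ones.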
To do what?
Discover bottlenecks
post-mortem analysis
SLA monitoring
IO impact
Network traffic
Memory utilization
To do what?
Why does it perform better on a dev laptop?
Why does it take 24h on the customer infrastructure
(our old test server takes 1h)?
Mechanical sympathy at large: the new service
is fucking up the I/O
Implement THE User Story
Given the application running
when the manager comes
then I want to show a big green number
The answer
42
Application metrics dashboard
Get feedback
It’s all about feedback
Our code is talking to us
Listen to it
And take decisions
Decisions
Set new SLAs
Refactor bottleneck
Buy new hw
Expand the cloud
Drop a product
write code
deploy it
measure it
get feedback
Iterative
10 define some metrics
20 deploy
30 add other metrics
40 goto 10
Are you able to deploy every day?
Sample scenario
Infrastructure
45 bare metal servers
Nginx, Jetty, PostgreSQL
GlusterFS, Queues,
Redis, Jenkins (cron on steroids)
Software
Java shop
deploy with Docker
More than 120 webapps
More than 100 batch jobs
NRT stream processing jobs running 24x7
Monitoring
collectd, graphite, grafana for system monitoring
Dropwizard Metrics inside code for application monitoring
Application metrics reported to graphite too
Feedback and decisions
WTF happened last night?
How is it going this morning?
Do you think we can survive the message flood?
Hey boss, it’s time to buy a new server, we are running out of resources.
Wrap up
Shopping list
Define your SLAs/target
Code and deploy with good practices
Code with monitorability in mind
Monitor your app/service
Correlate system and application metrics
Get feedback
Take decisions
References
https://dropwizard.github.io/metrics/3.1.0/
https://dl.dropboxusercontent.com/u/2744222/2011-04-09-Metrics-Metrics-Everywhere.pdf
http://graphite.wikidot.com/
http://grafana.org/
http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
https://www.usenix.org/sites/default/files/conference/protected-files/srecon15_slides_limoncelli.pdf
Credits
Sketches by my sons
Andrea (Andrew) and Luca (Luke) Franchini
Cool dashboards are made with Grafana
Thank you
Roberto Franchini
ro.franchini@gmail.com
r.franchini@orientdb.com
@robfrankie
