Monitoring 
Claude Falguière 
Valtech Paris 
#DV14 #Monitoring @cfalguiere

Content
• DevOps is more than tooling
  Individuals and interactions over processes and tools
• Make you love data
• Motivations for providing and collecting data
• Monitoring user stories and practices
• Getting started and open source tooling

Claude Falguière
• Devoxx4Kids
• Paris JUG, Devoxx France, Duchess
• DevOps Coach
• Java, Performance
http://cfalguiere.wordpress.com

Monitoring
What would you do if you knew
that the database is broken
that the number of hits doubles every 2 months
that users struggle to find the order form
why the app is slow
what users want to buy

Model
[Diagram: Data, Facts, Model, Hypothesis, Questions]
Examples shown on the diagram:
• 138 ms, 742 orders, 42 users
• Sales increased by 14%
• Estimated orders next month: 934
• Average number of requests is 5 times the number of users

Galaxy Rotation Problem
Spiral galaxies spin too fast
Expected mass should be ten times the observed mass (calculated from the visible objects) to prevent galaxies from flying apart

Discovery of Dark Matter
• 1932 - 1933, Jan Oort and Fritz Zwicky: hypothesis of a missing mass; readings are assumed to be wrong
• 1960 - 1970, Vera Rubin: if readings are true, is the model wrong? Mass calculated from gravitational effects and evidence of dark matter
• 2010 - 2013, Planck satellite: dark matter estimated at 84.5% of the total matter in the universe

Measure everything
Lean Startup, DevOps, Big Data
Make decisions based on facts

Motivations and user stories
• SLA observance
• Alerting
• Diagnosis / Post-Mortem
• Capacity Planning
• Improvement
[Diagram: Alerting; Storage, Visualization]

Architecture
[Diagram: App, Log, Network, System; Probe, Log Parser, Collector; Storage / Aggregation; Alerting; Visualization; Dev, Support, DBA]

Collector
[Diagram: System, App, Log; Collector (filters, rules); MQ; Storage; Alerting]

Topology
[Diagram: App Platform and Monitoring Platform; components: App, Log, Log Parser, Collector, Storage / Aggregation, Alerting, Visualization]

Resilience
[Diagram: App Platform and Monitoring Platform; components: App, Log, Log Parser, Collector, MQ, Collector, Storage / Aggregation, Alerting, Visualization]

What would you do if you knew that the database is broken?

Error detection and alerting
• Log filtering
• Event firing
• Context
  • is it critical?
  • which feature does it impact?
  • how deep is the impact?

Is this a log?
Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Access denied for user 'shopapp'@'shprdb1' to database 'shop'
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:928)
    at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1750)
    at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1290)
    at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2493)
    at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2526)
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2311)
    at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:834)
    at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:416)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:347)
    at java.sql.DriverManager.getConnection(Unknown Source)

Log example
2013-12-17 05:53:16,208 ERROR [Order Creation Service](456713) [shpras2](web1234) Could not create order id=456713 - Cause: Can't connect to database 'shop' - MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'

2013-12-17 05:53:16,208
ERROR
[Order Creation Service]
(456713)
[shpras2]
(web1234)
Could not create order id=456713
Cause: Can't connect to database 'shop'
MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'

Timestamp: 2013-12-17 05:53:16,208
Severity: ERROR
Context (technical and business): [Order Creation Service] (456713) [shpras2] (web1234)
Meaningful information:
Could not create order id=456713
Cause: Can't connect to database 'shop'
MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'
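The field breakdown of the log line can be sketched in Python as a small parser. This is only an illustration: the regular expression is inferred from the single example line on the slide, and the field names (`module`, `module_id`, `instance`, `thread`) are assumptions rather than names the slide defines.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the single example line on the slide:
# timestamp severity [module](module id) [instance](thread) message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<severity>[A-Z]+) "
    r"\[(?P<module>[^\]]+)\]\((?P<module_id>[^)]+)\) "
    r"\[(?P<instance>[^\]]+)\]\((?P<thread>[^)]+)\) "
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Turn the textual timestamp into a datetime so it can be sorted and bucketed.
    event["ts"] = datetime.strptime(event["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return event


example = (
    "2013-12-17 05:53:16,208 ERROR [Order Creation Service](456713) "
    "[shpras2](web1234) Could not create order id=456713 - Cause: "
    "Can't connect to database 'shop' - MySqlMessage: Access denied "
    "for user 'shopapp'@'shprdb1' to database 'shop'"
)
event = parse_log_line(example)
print(event["severity"], event["module"], event["instance"])
```

A raw stack trace line does not match this pattern and yields `None`, which is one way to tell a structured log line from the unstructured output shown earlier.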

Log Collectors
• Collectd
• Logstash
• Flume
• Splunk (Commercial)
[Diagram: System, Log; Collector; storage; Alerting]

Logstash
input {
  file {
    path => "/app/logs/apache/*.log"
    type => "apachelog"
  }
}

filter {
  if [type] == "apachelog" {
    grok {
      # parse each line with the stock combined Apache log pattern
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}

output {
  # Logstash 1.x option; later releases use hosts => ["localhost"]
  elasticsearch { host => localhost }
  stdout { }
}

Logstash
input {
  file {
    path => "/app/logs/appserver/monitor*.log"
    type => "applog"
  }
}

filter {
  if [type] == "applog" {
    grok {
      # name each captured field: timestamp as ts, log level as severity
      match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{WORD:severity} …" }
    }
  }
}

output {
  # Logstash 1.x option; later releases use hosts => ["localhost"]
  elasticsearch { host => localhost }
  stdout { }
}

Rate check
• Frequency of an error increases
• Activity falls (e.g. frequency of orders)
• Alerting based on threshold
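A threshold-based rate check can be sketched as a counter over a sliding time window. This is a minimal illustration, not how any particular alerting tool works: the class name `RateCheck`, the window size and the thresholds are all assumptions, and events are assumed to be recorded in time order.

```python
from collections import deque


class RateCheck:
    """Count events in a sliding time window and compare the count to thresholds."""

    def __init__(self, window_seconds, max_count=None, min_count=None):
        self.window_seconds = window_seconds
        self.max_count = max_count  # alert above this count (e.g. errors)
        self.min_count = min_count  # alert below this count (e.g. orders)
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event, given as seconds since any fixed origin."""
        self.timestamps.append(timestamp)

    def check(self, now):
        """Return 'too_many', 'too_few' or None for the window ending at `now`."""
        # Drop events that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        count = len(self.timestamps)
        if self.max_count is not None and count > self.max_count:
            return "too_many"
        if self.min_count is not None and count < self.min_count:
            return "too_few"
        return None


errors = RateCheck(window_seconds=60, max_count=3)
for t in [0, 10, 20, 30, 40]:
    errors.record(t)
print(errors.check(now=45))  # five errors within the last minute
print(errors.check(now=95))  # only the error at t=40 is still in the window
```

The same class covers both bullets: `max_count` for an error whose frequency increases, `min_count` for an activity (such as orders) that falls.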

Baselining
[Two charts over 10:00 to 11:10: series A and B (scale 0 to 120), series C and D (scale 0 to 200)]
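One possible reading of baselining is to compare a current value with what was observed over a reference period. The sketch below assumes a simple rule (more than three standard deviations from the baseline mean); the slide does not state which rule is intended, and the sample figures are hypothetical.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, value, tolerance=3.0):
    """True when `value` is more than `tolerance` standard deviations
    away from the mean of `baseline` (which needs at least two samples)."""
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return value != baseline[0]
    return abs(value - mean(baseline)) > tolerance * spread


# Hypothetical requests per minute measured during a normal period.
normal_period = [98, 102, 100, 101, 99]
print(deviates_from_baseline(normal_period, 103))
print(deviates_from_baseline(normal_period, 60))
```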

What would you do if you knew that the number of hits doubles every 2 months?
[Chart: monthly values from Jan to Aug, scale 0 to 120]
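If hits really double every 2 months, the arithmetic behind a capacity plan is short. The sketch below assumes steady exponential growth; the figures (100 hits per second today, a platform sized for 800) are hypothetical.

```python
import math


def projected_hits(current_hits, months_ahead, doubling_months=2.0):
    """Hits expected after `months_ahead` months of steady exponential growth."""
    return current_hits * 2 ** (months_ahead / doubling_months)


def months_until(current_hits, capacity, doubling_months=2.0):
    """Months left before hits reach `capacity` at the same growth rate."""
    return doubling_months * math.log2(capacity / current_hits)


# Hypothetical figures: 100 hits per second today, platform sized for 800.
print(projected_hits(100, 6))   # three doublings in six months
print(months_until(100, 800))   # 800 / 100 = 8 = 2 ** 3
```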

Graphers
• Foresight [chart: Jan to Dec]
• Cycles [chart: by day of the week]
• Correlation [chart: by day of the week]
• Distribution [chart: April to July]

Storage / Visualization
Collectors (Collectd / StatsD / Logstash / Flume)
Plain REST
Graphite
docker: lopter/collectd-graphite

Collect and Share
Collect once and share
• Support
• Ops, Dev
• Business
Up to date
Flexible

Storage / Visualization
Collectors (Collectd / StatsD / Logstash / Flume)
Plain REST
• Graphite
• InfluxDB, Grafana
• Logstash, ElasticSearch, Kibana (docker: gsogol/docker-elk)

JMX
[Diagram, source: wikipedia]
• MBeans
• Registration
• Servo
• RMI and firewalls
  • -Dcom.sun.management.jmxremote.rmi.port=p
  • -Djava.rmi.server.hostname=n.n.n.n
• Jolokia
• jmxtrans

JMX Collectors
[Diagram: JMX Enabled App, JMX beans; Collector (logstash, collectd); storage; VisualVM, JConsole; Performance Monitoring tools]

JSON Event over REST
curl -X POST "…" \
  -d '{"ts": "2013-12-17 05:53:16,208",
       "type": "metric",
       "module": "Order Creation Service",
       "module-id": "456713",
       "instance": "shpras2",
       "thread": "web1234",
       "name": "order-creation",
       "duration": "12",
       "unit": "ms"}'
Timestamp: ts
Context (technical and business): module, module-id, instance, thread
Metric: name, duration, unit
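The same event can be built from application code. The sketch below reuses the keys of the curl example; the function names are assumptions, and the collector URL is deliberately left to the caller because the slide does not give one.

```python
import json
import urllib.request


def build_event(module, module_id, instance, thread, name, duration_ms, ts):
    """Build the event body with the same keys as the curl example."""
    return {
        "ts": ts,
        "type": "metric",
        "module": module,
        "module-id": module_id,
        "instance": instance,
        "thread": thread,
        "name": name,
        "duration": str(duration_ms),
        "unit": "ms",
    }


def build_request(url, event):
    """Prepare (but do not send) a JSON POST; `url` is supplied by the caller."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


event = build_event("Order Creation Service", "456713", "shpras2", "web1234",
                    "order-creation", 12, "2013-12-17 05:53:16,208")
print(json.dumps(event, indent=2))
# To send it: urllib.request.urlopen(build_request(collector_url, event)),
# where collector_url is whatever endpoint your collector exposes.
```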

What would you do if you knew why the app is slow?

Tuning
• Collectd/StatsD plugins
• Metrics
• Commercial: Plumbr, AppDynamics, New Relic
Where does it spend time? Why?
Cross-check metrics from various sub-systems
[Diagram: Front-End, Back-End, DB, each running on its own System]

What would you do if you knew that users struggle to find the order form?

Web Analytics / User tracking
• Web analytics
• Page counters
• Tagging
• Log parser
• Google Analytics
• Piwik (docker: cfalguiere/docker-piwik)
• Reporting APIs

What would you do if you knew what users want to buy?

Model vs Big Data
Model:
• Expected information
• Explicit model
• List of metrics
Big Data:
• Classification
• Machine learning
• Pattern detection
Highlights valuable metrics and relationships

Getting started
[Cycle diagram: list user stories and metrics; set up monitoring; get facts; get hypothesis; add metrics; get facts; validate hypothesis]

What should I monitor?
Alerting & Post-Mortem:
Presence check
Activity (how many users, requests, orders …)
Resources that are limited in size
  Physical: CPU, memory, free disk space, network bandwidth ...
  Logical: pools, queues, caches, …
Errors
Others
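Checks on resources that are limited in size can be sketched with the standard library. This is only an illustration: the resource names, the counts and the 80% warning ratio are hypothetical, and a real setup would more likely rely on a collector such as Collectd.

```python
import shutil


def disk_free_ratio(path="/"):
    """Fraction of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_limited_resource(name, used, limit, warn_ratio=0.8):
    """Return an alert message when `used` reaches `warn_ratio` of `limit`, else None."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio >= warn_ratio:
        return f"{name}: {used}/{limit} used ({ratio:.0%})"
    return None


# Logical resources: a pool close to its limit, a queue far from it.
print(check_limited_resource("db-connection-pool", used=45, limit=50))
print(check_limited_resource("order-queue", used=10, limit=1000))
# Physical resource: free disk space on the root filesystem.
print(f"free disk: {disk_free_ratio():.0%}")
```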

What should I monitor?
Plan & Improve:
Any information which is useful to understand the process
  time spent for each major step
  things that are done often or require large datasets
  user navigation
  context
Listen to users and ops
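Measuring the time spent in each major step can be sketched with a small context manager. The step names and the in-memory `step_durations` store are assumptions made for the example; in practice the durations would be sent to a collector rather than kept in a dict.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

step_durations = defaultdict(list)  # step name -> list of durations in ms


@contextmanager
def timed_step(name):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        step_durations[name].append(elapsed_ms)


# Hypothetical order workflow; the sleeps stand in for real work.
with timed_step("validate-order"):
    time.sleep(0.01)
with timed_step("save-order"):
    time.sleep(0.02)

for name, durations in step_durations.items():
    average = sum(durations) / len(durations)
    print(f"{name}: {average:.1f} ms over {len(durations)} call(s)")
```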

Learn from data
Continuous Improvement
Design for Failure

Thank You 
