Monitoring Challenges in a World of Automation
Monitoring is hard enough on its own.
Automation makes it harder.
Anthony Goddard
VP Operations
Sensu, Inc.
@anthonygoddard // @sensu
● Open core monitoring framework, released in 2011
● Enterprise offering launched in 2015
● Sensu, Inc. formed in January 2017
● 20 employees & growing!
About Sensu
What is Sensu?
● An open source, cloud native monitoring framework
● The monitoring router
● Infrastructure, service, and application monitoring
● Designed for automation
● Cross-platform (Linux, Windows, BSD, AIX, Solaris, macOS, etc.)
● Learn more: https://sensuapp.org
Mission Statement
Obviate the need to (re)build custom monitoring solutions.
This isn't a talk about Sensu.
Purpose of this talk
● Discuss challenges of monitoring ephemeral systems
● Review basic cloud native monitoring requirements
○ Automated discovery
○ Automated monitoring
○ Automated decommissioning
● Talk about cloud native monitoring anti-patterns
● Live demo! (what could possibly go wrong?)
Let's do this
Cloud computing has changed the world.
Which came first? Cloud computing or DevOps?
Problem Statement
● Cloud platforms and automation systems cause changes in
infrastructure that increase the complexity of monitoring
● New systems/endpoints must be discovered and monitored
automatically
● Monitoring must now distinguish the subtle differences between
"down" and "decommissioned"
Expectations
Our infrastructure is becoming increasingly automated and ephemeral.
Shouldn't we expect similar capabilities from our monitoring?
Cloud Native Monitoring Requirements
Overview
1. Automated discovery
2. Automated monitoring
3. Automated decommissioning
1. Automated Discovery
New systems should be automatically discovered.
Cloud Native Monitoring Requirements
Cloud concepts
● Provisioning events create and replace instances
● Cloud providers automate replication of instances (e.g.
auto-scaling groups)
● APIs allow external systems to invoke provisioning events
Automated Discovery
Automated Discovery
Cloud monitoring anti-patterns
● Polling-based discovery (regardless of protocol)
● Discovery that precludes complex network topologies
● Punching holes in firewalls (ingress traffic)
Polling is not a reliable discovery solution.
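One way to see why: a short-lived instance can start and terminate entirely between two polls. The sketch below is a toy model, not anything from the talk; the function name, the 300-second interval, and the instance lifetimes are all made-up assumptions.

```python
def seen_by_polling(start, lifetime, interval):
    """Return True if a poll at t = 0, interval, 2*interval, ... falls
    inside the window [start, start + lifetime) in which the instance exists."""
    # Time of the first poll at or after the instance starts (ceiling division).
    first_poll = -(-start // interval) * interval
    return first_poll < start + lifetime


# Hypothetical instances as (start_time, lifetime) in seconds.
instances = [(10, 90), (250, 120), (400, 30), (590, 15)]
interval = 300  # assumed polling interval in seconds

missed = [inst for inst in instances if not seen_by_polling(*inst, interval)]
print(missed)
```

In this model, any instance whose whole life fits between two polls is never discovered, no matter which protocol the poller speaks.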
Automated Discovery
Cloud-native monitoring requirements
● New systems must be discovered in real time
● Provide push-based or event-based discovery + discovery APIs
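A minimal sketch of push-based discovery, assuming each new system announces itself with a keepalive message: the first keepalive creates the registry entry, later ones refresh it. The Registry class, its method, and the field names are hypothetical and do not describe any particular product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy in-memory client registry (hypothetical, for illustration only)."""
    clients: dict = field(default_factory=dict)

    def handle_keepalive(self, name, roles, timestamp):
        """Register the system on its first keepalive, refresh it afterwards.

        Returns True if the system was newly discovered by this message.
        """
        is_new = name not in self.clients
        self.clients[name] = {"roles": list(roles), "last_seen": timestamp}
        return is_new


registry = Registry()
first = registry.handle_keepalive("web-01", ["web"], 1000)   # new system
second = registry.handle_keepalive("web-01", ["web"], 1020)  # refresh only
print(first, second)
```

Because the system initiates the connection, nothing needs to scan the network and no ingress firewall rule is required for discovery.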
2. Automated Monitoring
New systems should be monitored automatically.
Cloud Native Monitoring Requirements
Automated Monitoring
Cloud concepts
● Almost all infrastructures are distributed systems
● Disparate systems fulfill unique roles (e.g. db, web service)
● Simple architectures = one or more roles per system
● Complex architectures = one role per system
Automated Monitoring
Cloud monitoring anti-patterns
● Monitoring configuration mapped to individual systems
● Monitoring via remote access (e.g. SSH, WinRM, NRPE)
Nope.
Automated Monitoring
Cloud-native monitoring requirements
● Monitoring configuration should be mapped to roles
● Monitoring should begin the moment systems come online
Automated monitoring should "just work"
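Mapping configuration to roles rather than to hosts might look like the sketch below. The role names, the check names, and the implicit "base" role are illustrative assumptions; the point is only that a new system needs nothing beyond its list of roles to be monitored.

```python
# Hypothetical role-to-check mapping; none of these names come from the talk.
CHECKS_BY_ROLE = {
    "base": ["check_disk", "check_cpu"],
    "web": ["check_http"],
    "db": ["check_db_connections"],
}


def checks_for(roles):
    """Resolve the checks for a system from its roles.

    Every system implicitly gets the 'base' role; unknown roles add nothing.
    """
    resolved = []
    for role in ["base", *roles]:
        for check in CHECKS_BY_ROLE.get(role, []):
            if check not in resolved:
                resolved.append(check)
    return resolved


print(checks_for(["web"]))
print(checks_for(["web", "db"]))
```

Adding a hundred web servers then requires no per-host configuration change: each one declares the "web" role and picks up the same checks.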
3. Automated Decommissioning
Terminated systems should be automatically removed from monitoring.
Cloud Native Monitoring Requirements
Automated Decommissioning
Cloud Concepts
● Utility computing incentivizes cost savings
● Decommission systems when not in use, or during reduced load
● Intentional actions look very similar to failure scenarios
Automated Decommissioning
Cloud monitoring anti-patterns
● Making assumptions about the lack of monitoring data
● Making assumptions about the loss of network connectivity
● Using a monitoring system as a source of absolute truth
Cloud-native monitoring requirements
● Should be invoked by the terminated system (e.g. on a stop signal)
● May be triggered by the provisioning system (e.g. via APIs)
● Optionally verified via external source(s) of truth (as needed)
● Must be the most reliable function of the monitoring system
Automated Decommissioning
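The requirements above can be sketched as a small decision function that separates "down" from "decommissioned". This is one possible reading under stated assumptions: the parameter names, the timeout, and the three-valued answer from an external source of truth (True, False, or None for unknown) are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    UP = "up"
    DOWN = "down"
    DECOMMISSIONED = "decommissioned"


def classify(last_seen, now, timeout, deregistered, provider_has_instance=None):
    """Classify a system without treating silence alone as proof of anything.

    deregistered: the system (or the provisioning system) explicitly asked
        to be removed, for example on a stop signal or via an API call.
    provider_has_instance: answer from an external source of truth;
        True, False, or None when no answer is available.
    """
    if deregistered:
        return Status.DECOMMISSIONED
    if now - last_seen <= timeout:
        return Status.UP
    # The system has gone silent: only an external source of truth can say
    # whether that was intentional.
    if provider_has_instance is False:
        return Status.DECOMMISSIONED
    return Status.DOWN  # still exists, or unknown: treat as a failure


print(classify(last_seen=100, now=400, timeout=120, deregistered=False,
               provider_has_instance=False))
```

When the external answer is unknown, this sketch falls back to "down", on the assumption that a spurious alert costs less than a silently missed failure.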
When you can no longer trust your monitoring alerts.
Demo!
But first, some questions…
Public/Private Cloud (IaaS)
Who knows what "the cloud" is?
Who understands basic cloud computing concepts like ASGs and ELBs?
Who is currently using an IaaS provider like AWS, GCP, Azure, or OpenStack?
Kubernetes
Who knows what Kubernetes is?
Who has Kubernetes on their roadmap?
Who is currently using Kubernetes?
Audience participation time!
(DEMO)
OSMC 2017 | Monitoring Challenges in a World of Automation by Anthony Goddard
QUESTIONS?
Conclusion
● Cloud computing introduces challenges that demand
cloud-native monitoring solutions.
● Monitoring solutions must automatically discover new systems.
● Monitoring configuration should be applied automatically.
● Monitoring should comprehend "down" vs "decommissioned".
Thank You
