Monitoring in 2017
Challenges in monitoring containers and dynamic infrastructure.
TIAD

Oct 6, 2017
Charly Fontaine
Software Engineer - Containers team

Datadog
CharlyF
[charly@datadoghq.com]
Name: Charly Fontaine
Role: Software Engineer


Interests:
* Containerized Infrastructures
* Monitoring all the things
* Motorbikes
Datadog Overview
• SaaS based infrastructure and app monitoring
• Open Source Agent
• Time series data (metrics and events)
• Processing nearly a trillion data points per day
• Intelligent Alerting
• We’re hiring! (www.datadoghq.com/careers/)
Monitor Everything
Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches, Queues and more...
$ cat ~/.plan
1. Intro: The Importance of Monitoring
2. The Challenge: Monitoring Dynamic Infrastructure
3. Finding the Signal: How do we know what to monitor?
4. Implementation: Applying it to Containerized Workloads
5. Demo: Monitoring of a containerized web app deployment
Monitoring in 2017 - TIAD Camp Docker
Collecting data is cheap; not having it when you need it can be expensive.
Instrument all the things!
Sharing
Using and sharing the same metrics and measurements across teams is key to avoiding misunderstandings.
Why do we focus on Docker and
Containers?
Source: http://bit.ly/1RQRsXW
Hacker News Driven Development
When the choice of technology is determined by what is popular on Hacker News that week.
Docker Adoption Growth
We’ve seen a 5x increase in Docker adoption over the last year.
https://www.datadoghq.com/docker-adoption/
Source: Datadog
Source: http://bit.ly/1qFylWK
Open Questions
• Where is my container running?
• What is the capacity of my cluster?
• What’s the total throughput of my app?
• What’s its response time per tag? (app, version, region)
• What’s the distribution of 5xx errors per container?
More Details at: http://www.datadoghq.com/blog/monitoring-101-alerting/
Monitoring VS Observing
Examples: NGINX - Metrics
Resource Metrics:
• Disk I/O
• Memory
• CPU
• Queue Length
Work Metrics:
• Requests Per Second
• Request Time
• Error Rates (4xx or 5xx)
• Success (2xx)
Examples: NGINX - Events
• Configuration Change
• Code Deployment
• Service Started / Stopped
Examples: Events
What to demand from our
monitoring tooling?
Cryptic Alerts
WHAT?
EVERY ALERT MUST BE ACTIONABLE
Query Based Monitoring
“What’s the average throughput of application:nginx per version?”
“Alert me when one of my pods from replication controller:foo is not behaving like the others.”
“Show me the rate of HTTP 500 responses from nginx”
“… across all data centers”
“… running my app version 2”
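The first query above can be sketched as a filter-then-group operation over tagged samples. This is only an illustration of the idea, not how any particular backend implements it: the sample data, tag names, and the `average_by` helper are all invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (throughput value, tags). All names and numbers are
# invented for illustration; a real backend would return time series.
SAMPLES = [
    (120.0, {"application": "nginx", "version": "1", "datacenter": "eu"}),
    (100.0, {"application": "nginx", "version": "1", "datacenter": "us"}),
    (80.0, {"application": "nginx", "version": "2", "datacenter": "eu"}),
    (300.0, {"application": "redis", "version": "2", "datacenter": "eu"}),
]


def average_by(samples, scope, group_key):
    """Average the values whose tags match `scope`, grouped by `group_key`."""
    groups = defaultdict(list)
    for value, tags in samples:
        # Keep a sample only if every tag in the scope matches.
        if all(tags.get(key) == wanted for key, wanted in scope.items()):
            groups[tags.get(group_key)].append(value)
    return {group: mean(values) for group, values in groups.items()}


# "What's the average throughput of application:nginx per version?"
per_version = average_by(SAMPLES, {"application": "nginx"}, "version")
print(per_version)  # {'1': 110.0, '2': 80.0}
```

Narrowing the scope (for example adding `"datacenter": "eu"`) corresponds to the "… across all data centers" and "… running my app version 2" refinements: the query changes, the instrumentation does not.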
Getting at the metrics…
Resource Metrics
Utilization:
• CPU (user + system)
• memory
• i/o
• network traffic
Saturation:
• throttling
• swap
Errors:
• Network Errors (receive vs transmit)
Container Events
• Starting / Stopping Containers
• Scaling Events for Underlying Instances
• Deploying a new container build
Pseudo-files
• Provide visibility into container metrics via the file system.
• Generally under:
/cgroup/<resource>/docker/$CONTAINER_ID/
or
/sys/fs/cgroup/<resource>/docker/$CONTAINER_ID/
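As a minimal sketch, the lookup across the two locations above could be written like this. The `container_cgroup_dir` helper and the example ID are hypothetical. The sketch assumes the directory is named after the full container ID, and it returns `None` on hosts that lay out cgroups differently.

```python
from pathlib import Path

# The two mount points named above, most common first.
CGROUP_ROOTS = ("/sys/fs/cgroup", "/cgroup")


def container_cgroup_dir(resource, container_id, roots=CGROUP_ROOTS):
    """Return the first existing <root>/<resource>/docker/<container_id> directory, or None."""
    for root in roots:
        candidate = Path(root) / resource / "docker" / container_id
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical ID; prints None unless such a container exists on this host.
    print(container_cgroup_dir("cpuacct", "0123456789ab"))
```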

Pseudo-files: CPU Metrics
$ cat /sys/fs/cgroup/cpuacct/docker/$CONTAINER_ID/cpuacct.stat
> user 2451 # CPU time spent in user mode by the container's processes, in USER_HZ ticks (typically 1/100 s)
> system 966 # CPU time spent in kernel mode (system calls), in the same units
$ cat /sys/fs/cgroup/cpu/docker/$CONTAINER_ID/cpu.stat
> nr_periods 565 # Number of enforcement intervals that have elapsed
> nr_throttled 559 # Number of times the group has been throttled
> throttled_time 12119585961 # Total time, in nanoseconds, that members of the group were throttled (12.12 seconds)
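The cpu.stat counters above become easier to alert on once they are turned into a ratio. The sketch below parses the file contents and reports how often the container hit its CPU limit. The helper names are made up for the example, and the sample text reuses the numbers from the output above.

```python
def parse_stat(text):
    """Parse 'key value' lines from a cgroup stat pseudo-file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttling_summary(cpu_stat):
    """Return (fraction of enforcement periods throttled, total throttled seconds)."""
    periods = cpu_stat.get("nr_periods", 0)
    ratio = cpu_stat.get("nr_throttled", 0) / periods if periods else 0.0
    seconds = cpu_stat.get("throttled_time", 0) / 1e9  # throttled_time is in nanoseconds
    return ratio, seconds


# Same values as the cpu.stat output shown above.
SAMPLE = """nr_periods 565
nr_throttled 559
throttled_time 12119585961
"""

ratio, seconds = throttling_summary(parse_stat(SAMPLE))
print(f"{ratio:.1%} of periods throttled, {seconds:.2f} s in total")
# 98.9% of periods throttled, 12.12 s in total
```

A container throttled in nearly every period, as here, is saturated: its CPU limit is the bottleneck, not the host.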
Pseudo-files: CPU Throttling
Docker API
• Detailed streaming metrics as JSON, served over HTTP on a Unix socket

$ curl -v --unix-socket /var/run/docker.sock http://localhost/containers/28d7a95f468e/stats
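Each JSON object returned by the stats endpoint carries the current and previous CPU counters, so a CPU percentage can be derived from a single sample. The sketch below follows the delta formula described in the Docker Engine API documentation. The payload is a trimmed, invented sample that keeps only the fields the calculation needs.

```python
import json


def cpu_percent(stats):
    """CPU usage of a container, as a percentage of one CPU, from one stats sample."""
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    # The first sample of a stream may lack the previous system counter.
    system_delta = cpu["system_cpu_usage"] - pre.get("system_cpu_usage", 0)
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * cpu.get("online_cpus", 1) * 100.0


# Invented, trimmed payload; a real response contains many more fields.
SAMPLE = json.loads("""
{
  "cpu_stats": {
    "cpu_usage": {"total_usage": 350000000},
    "system_cpu_usage": 10000000000,
    "online_cpus": 2
  },
  "precpu_stats": {
    "cpu_usage": {"total_usage": 100000000},
    "system_cpu_usage": 9000000000
  }
}
""")

print(cpu_percent(SAMPLE))  # 50.0
```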

Side Car Containers
Service Discovery
[Diagram: Monitoring Agent container; Docker API (containers list & metadata); Kubernetes (additional metadata: tags, etc.); config backends (integration configurations); host-level metrics]
Custom Metrics
• Instrument custom applications
• You know your key transactions best.
• Use async protocols like Etsy’s StatsD or DogStatsD
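A minimal sketch of what "async" means here: a StatsD metric is a short text datagram sent over UDP, so the application never waits for the monitoring backend. The metric name and tags below are invented, and the sketch assumes an agent listening on the conventional port 8125. The `|#key:value` tag suffix is the DogStatsD extension to the plain StatsD format.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a StatsD datagram: <name>:<value>|<type>, plus optional DogStatsD tags."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP: no connection, no reply, no blocking on the backend."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical counter for a key business transaction.
    send_metric(format_metric("checkout.completed", 1, "c", {"app": "shop", "version": "2"}))
```

In practice a client library would usually handle this. The point of the sketch is that instrumenting a key transaction costs one UDP send.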
My friend Martin
The demo
Resources
Monitoring 101: Alerting
https://www.datadoghq.com/blog/monitoring-101-alerting/
Monitoring 101: Collecting the Right Data
https://www.datadoghq.com/blog/monitoring-101-collecting-data/
Monitoring 101: Investigating performance issues
https://www.datadoghq.com/blog/monitoring-101-investigation/

The Power of Tagged Metrics
https://www.datadoghq.com/blog/the-docker-monitoring-problem/
How to Collect Docker Metrics
https://www.datadoghq.com/blog/how-to-collect-docker-metrics/
8 surprising facts about Docker Adoption
https://www.datadoghq.com/docker-adoption/

