LinuxCon 2016
An introduction to datacenter telemetry using open source tools
Matt Brender (@mjbrender)
Briefly, About Me
Am:
@mjbrender (everywhere)
Developer Advocate, Orchestration Engineering
Pretty good at Open Source practices
Was:
Storage array performance
VMware
NoSQL
Loose Agenda
1. Wishful thinking of the lab config
2. What is telemetry
3. One opinion on the state of open source tooling
Let’s Test the Network
linuxcon.snap-telemetry.io
then
git clone
I encourage you to keep downloading stuff until you’re ready to go.
Lab Hopes
High Level View
“Admin” node: Grafana + InfluxDB, Snap
“Production” node: Snap
Less High Level View
Your Laptop
Ubuntu 16.04
Vagrant
Ansible
VM 1 (Ubuntu 16.04): Snap, Docker (Compose: InfluxDB, Grafana)
VM 2 (Ubuntu 16.04): Snap
Why???
Telemetry
Snap
collectd
StatsD
telegraf
beats
Logstash
diamond
InfluxDB
OpenTSDB
KairosDB
Graphite
Prometheus
ElasticSearch
Bosun
Grafana
Sensu
Ganglia
RRDtool
Nagios
Facette
Vector (Netflix)
what my friends think telemetry is
what my parents think telemetry is
what society thinks telemetry is
what my boss thinks telemetry is
what I think telemetry is
what telemetry actually is
What Is Telemetry?
Telemetry is the stuff you can measure and the process of capturing it: from the heat
generated on a CPU core to the throughput of Nginx running in a Docker container on a
Kubernetes cluster. It’s all measurable and it’s all summarized in that one word.
• Telemetry - the process of using equipment to take measurements of something and
send them to another place
• Metrics - measurements of facts throughout the data center
• Analytics - the method of logical analysis that determines the consequences of
information
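The three definitions above can be told apart in a few lines of code. This is only a sketch under stated assumptions: the `Metric` class, the function names, and the `example/load1` name are hypothetical, and a plain list stands in for the "other place" the measurements are sent to.

```python
import os
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A measurement of one fact: what was measured, its value, and when."""
    name: str
    value: float
    timestamp: float


def measure_load() -> Metric:
    """Metric: read the 1-minute load average (os.getloadavg is Unix-only)."""
    load1, _load5, _load15 = os.getloadavg()
    return Metric("example/load1", load1, time.time())


def send(metric: Metric, destination: list) -> None:
    """Telemetry: move the measurement to another place (a list stands in for a remote store)."""
    destination.append(metric)


def is_overloaded(metrics: list, threshold: float) -> bool:
    """Analytics: decide what the stored measurements mean."""
    return any(m.value > threshold for m in metrics)


store: list = []
send(measure_load(), store)
print(store[0].name, store[0].value, is_overloaded(store, threshold=4.0))
```

The threshold of 4.0 is arbitrary; the point is only that measuring, sending, and interpreting are three separate steps.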
What Is Telemetry?
What                          How                                      Why
Application Availability      ping                                     SLA compliance
Operating System Performance  psutil                                   System performance
Hardware Utilization          Intel Performance Counter Monitor (PCM)  Scaling capacity
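As one worked example of turning a "what" into a "why", the sketch below computes availability from probe results and compares it with an SLA target. The function names and the sample data (one failed probe in a thousand) are made up for illustration; how the probes are gathered, for example with ping, is left out.

```python
def availability(probe_results: list[bool]) -> float:
    """Percentage of probes that succeeded."""
    if not probe_results:
        raise ValueError("no probe results to evaluate")
    return 100.0 * sum(probe_results) / len(probe_results)


def meets_sla(probe_results: list[bool], target_percent: float) -> bool:
    """True when measured availability reaches the agreed target."""
    return availability(probe_results) >= target_percent


# Hypothetical sample: 999 successful probes and 1 failure.
samples = [True] * 999 + [False]
print(availability(samples), meets_sla(samples, target_percent=99.9))
```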
What snap is and what it isn’t
Telemetry vs. Analytics
snap is a framework for metrics.
snap is NOT an analytics alternative.
Automation
Scheduling
IRO
collect, process, publish
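The collect, process, publish sequence, run on a schedule, can be sketched roughly as follows. This is not Snap's scheduler or plugin API: `run_task`, the two example "plugins", and the dict layout of a metric are all assumptions made for illustration.

```python
import time
from typing import Callable

Metrics = list  # a list of {"namespace": str, "value": float} dicts


def run_task(
    collect: Callable[[], Metrics],
    process: Callable[[Metrics], Metrics],
    publish: Callable[[Metrics], None],
    count: int,
    interval_seconds: float,
) -> None:
    """Run collect -> process -> publish `count` times, pausing between runs."""
    for run in range(count):
        publish(process(collect()))
        if run < count - 1:
            time.sleep(interval_seconds)


# Hypothetical plugins, for illustration only.
def collect_fixed() -> Metrics:
    return [{"namespace": "/example/load/load1", "value": 0.5}]


def tag_with_host(metrics: Metrics) -> Metrics:
    return [{**m, "host": "node-1"} for m in metrics]


published: Metrics = []
run_task(collect_fixed, tag_with_host, published.extend, count=3, interval_seconds=0.0)
print(len(published), published[0])
```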
The Watcher Workflow
Collectors in snap
Collect telemetry data once via plugins for:
§ Bare metal, including Intel-specific platform metrics (CPU, NIC, BMC, SMART)
§ Operating environments and existing telemetry (Docker, libvirt, psutil)
§ Application services and adjacencies (Ceph, HAProxy, etcd, Facter, MySQL, Apache)
Populate a dynamically generated single-namespace telemetry catalog
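One way to picture a single-namespace catalog that collectors populate is a mapping from each metric namespace to the plugin that provides it. The `MetricCatalog` class below is a hypothetical sketch, not Snap's implementation; the rule that two plugins cannot own the same namespace is an assumption.

```python
class MetricCatalog:
    """Maps each metric namespace to the collector plugin that provides it."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}

    def register(self, plugin: str, namespaces: list[str]) -> None:
        """Add a plugin's metrics; refuse a namespace another plugin already owns."""
        for ns in namespaces:
            owner = self._owner.get(ns)
            if owner is not None and owner != plugin:
                raise ValueError(f"{ns} is already provided by {owner}")
        for ns in namespaces:
            self._owner[ns] = plugin

    def namespaces(self) -> list[str]:
        """Every metric currently available, in one flat, sorted list."""
        return sorted(self._owner)


catalog = MetricCatalog()
catalog.register("psutil", ["/intel/psutil/load/load1", "/intel/psutil/load/load5"])
catalog.register("pcm", ["/intel/pcm/EXEC"])
print("\n".join(catalog.namespaces()))
```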
Processors in snap
Filter, alter or append metadata via plugins for:
§ Filtering (Moving Averages)
§ Normalization
§ Encryption for all or part of the data set
§ Injection of metadata
  § Tokens
  § Tenant IDs
Forking to one or more endpoints
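The moving-average filtering mentioned for processors can be sketched as a small stateful class. This is a generic simple moving average over the last `window` values, not the code of any Snap processor plugin; the class and parameter names are illustrative.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the most recent `window` values."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) silently drops the oldest value once it is full.
        self._values: deque = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add one raw value and return the smoothed value."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


smoother = MovingAverage(window=3)
smoothed = [smoother.update(v) for v in [1.0, 2.0, 3.0, 4.0]]
print(smoothed)
```

Until the window fills, the average is taken over however many values have arrived so far.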
Publishers in snap
Publish data via plugins for:
§ Dashboard Tools (Graphite, Grafana, Riemann)
§ Queues and Logs (RabbitMQ, Kafka, File)
§ Databases (PostgreSQL, InfluxDB, OpenTSDB, SAP HANA)
To one or more endpoints
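A publisher ultimately has to render each metric in a form its endpoint accepts. As a rough sketch, the function below renders a point in the basic shape of InfluxDB line protocol (`measurement,tag=value field=value timestamp`) and hands it to more than one endpoint. It is simplified: it assumes float fields and names without spaces, commas, or equals signs, it does no escaping, and it is not how Snap's InfluxDB publisher is implemented. The lists stand in for real endpoints.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Render one point as: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={float(value)}" for key, value in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


def publish(line: str, endpoints: list) -> None:
    """Deliver the same rendered point to every endpoint."""
    for endpoint in endpoints:
        endpoint.append(line)


# Hypothetical endpoints: plain lists instead of a database and a log file.
influx_stand_in: list = []
log_stand_in: list = []
line = to_line_protocol("load", {"host": "node-1"}, {"load1": 0.42}, 1472000000000000000)
publish(line, [influx_stand_in, log_stand_in])
print(line)
```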
Visibility at all layers
[Diagram, built up over several slides: a host's App, OS, and HW layers on one side, an Analytics Pipeline and Dashboards on the other, with question marks in between. Snap then replaces the question marks, first for a single host, then for a host with a Virtualization layer, then for many hosts under Kubernetes.]
REST & CLI
Flexible Scheduling
Caching
Security
Plugin Lifecycle Management
Worker Queues
Metric Catalog
Tribe
Warning: Thought Leadership Ahead
Monitoring is
Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications
Within Telemetry: Collect, Process, Publish, Schedule, Automate
[The "Monitoring is" diagram is then repeated twice, once labeled Snap and once labeled Grafana.]
Better Thought Leadership
by @obscurify by @caskey
https://github.com/mjbrender/what-we-talk-about-when-we-talk-about-telemetry
Q&A
FAQ
Do I need telemetry?
I don’t need telemetry, I have ____________. (Graphite)
[The "Monitoring is" diagram is repeated, labeled Graphite.]
Do I need monitoring?
We run ________ for monitoring. (Nagios)
[The "Monitoring is" diagram is repeated, labeled Nagios.]
What Is Telemetry? (revisited)
How Expanded
What                          Query   Collect  Process  Publish  Visualize
Application Availability      ping    ?        ?        ?        ?
Operating System Performance  psutil  ?        ?        ?        ?
Hardware Utilization          PCM     ?        ?        ?        ?
Final build: Snap, Grafana
Next Up
Start using Snap!
• snap-telemetry.io
• github.com/intelsdi-x
Find me:
• on The Geek Whisperers
• and @mjbrender
additional information
Everything is Challenging At Scale
Add new task
Scaling with Tribe
define as a tribe
Add new task
snap | What’s next?
One Physical/Virtual Host: Scheduler, Collection, Processing, Publishing
Six Physical/VM Hosts: Collection (x3), Scheduler, Processing, Publishing
snap | Plugin Lifecycle
§ Plugin load
  § Dynamic, does not require a restart
  § The plugin automatically informs snap of its features, metrics, and configuration details
  § Dynamically extends the metric catalog when loaded
§ Plugin unload
  § Removes metrics from the catalog automatically
§ Loading a new plugin automatically upgrades running workflows in tasks
  § Optionally, the collection can be pinned to a version (ex: get /intel/server/cpu/load/v1)
§ Each scheduled workflow automatically uses the most mature plugin for that step
  § Coupled with dynamic plugin loading, this results in instantaneous updates to existing workflows
  § Helpful for bug fixes, security patching, improving accuracy
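The plugin lifecycle rules (use the newest loaded version unless a workflow pins one, and drop a version when it is unloaded) can be sketched as a small registry. The `PluginRegistry` class is hypothetical, and treating "most mature" as "highest loaded version number" is an assumption made for this sketch, not something the slides confirm.

```python
from typing import Optional


class PluginRegistry:
    """Tracks which versions of each metric's plugin are currently loaded."""

    def __init__(self) -> None:
        self._versions: dict[str, set] = {}

    def load(self, namespace: str, version: int) -> None:
        self._versions.setdefault(namespace, set()).add(version)

    def unload(self, namespace: str, version: int) -> None:
        versions = self._versions.get(namespace, set())
        versions.discard(version)
        if not versions:
            # Last version gone: the metric disappears from the registry.
            self._versions.pop(namespace, None)

    def resolve(self, namespace: str, pinned: Optional[int] = None) -> int:
        """Return the pinned version if given, else the highest loaded one."""
        versions = self._versions.get(namespace)
        if not versions:
            raise KeyError(f"no plugin loaded for {namespace}")
        if pinned is None:
            return max(versions)
        if pinned not in versions:
            raise KeyError(f"version {pinned} of {namespace} is not loaded")
        return pinned


registry = PluginRegistry()
registry.load("/intel/server/cpu/load", 1)
before = registry.resolve("/intel/server/cpu/load")
registry.load("/intel/server/cpu/load", 2)  # no restart: the next resolve sees v2
after = registry.resolve("/intel/server/cpu/load")
pinned = registry.resolve("/intel/server/cpu/load", pinned=1)
print(before, after, pinned)
```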
snap | Overview – Example Workflows
Customizable definition of task and related workflow:
[Diagrams: example workflows built from Collect, Process, and Publish steps, including workflows that fork to more than one Process or Publish step.]
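A workflow that forks, such as one Collect feeding both a Publish and a Process followed by another Publish, can be modeled as a tree of steps. This is a sketch and not Snap's task manifest format; the `Step` class, the helper names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One node in a workflow tree; its output feeds every child step."""
    name: str
    action: Callable[[list], list]
    children: list = field(default_factory=list)


def run(step: Step, metrics: list) -> None:
    """Run this step, then pass its output to each child in turn."""
    output = step.action(metrics)
    for child in step.children:
        run(child, output)


raw_sink: list = []
scaled_sink: list = []


def publish_to(sink: list) -> Callable[[list], list]:
    """Build a publish action that appends to `sink` (a stand-in endpoint)."""
    def action(metrics: list) -> list:
        sink.extend(metrics)
        return metrics
    return action


workflow = Step("collect", lambda _: [1.0, 2.0], [
    Step("publish raw", publish_to(raw_sink)),
    Step("process: scale by 100", lambda ms: [m * 100 for m in ms], [
        Step("publish scaled", publish_to(scaled_sink)),
    ]),
])

run(workflow, [])
print(raw_sink, scaled_sink)
```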
The Catalog
snapctl metric list
Plugins shown: Intel PCM, psutil, HAProxy, Docker, Intel Health
/intel/psutil/load/load1
/intel/psutil/load/load5
/intel/psutil/vm/available
/intel/pcm/EXEC
/intel/pcm/FREQ
/intel/linux/docker/cpu_stats/throttling_data/periods
/intel/server/health/score
/intel/haproxy/info/MaxConnRate
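Because every metric lives under one slash-separated namespace, selecting a group of metrics reduces to matching path segments. The sketch below queries the namespaces shown on this slide with a `*` wildcard that matches exactly one segment. The `matches` and `query` helpers are hypothetical, and the one-segment wildcard rule is an assumption for this sketch rather than a statement about how `snapctl` behaves.

```python
CATALOG = [
    "/intel/psutil/load/load1",
    "/intel/psutil/load/load5",
    "/intel/psutil/vm/available",
    "/intel/pcm/EXEC",
    "/intel/pcm/FREQ",
    "/intel/linux/docker/cpu_stats/throttling_data/periods",
    "/intel/server/health/score",
    "/intel/haproxy/info/MaxConnRate",
]


def matches(pattern: str, namespace: str) -> bool:
    """Compare segment by segment; '*' in the pattern matches any one segment."""
    wanted = pattern.strip("/").split("/")
    actual = namespace.strip("/").split("/")
    return len(wanted) == len(actual) and all(
        w == "*" or w == a for w, a in zip(wanted, actual)
    )


def query(pattern: str, catalog: list = CATALOG) -> list:
    """Return every namespace in the catalog that matches the pattern."""
    return [ns for ns in catalog if matches(pattern, ns)]


print(query("/intel/psutil/load/*"))
print(query("/intel/pcm/*"))
```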

Intro to open source telemetry linux con 2016