1
2
The IBM dashboard for operational metrics
Daniel Krook
Senior Certified IT Specialist, IBM
3
We run Cloud Foundry on dozens of OpenStack VMs
▪ Two intranet clusters (not counting Dev deployments):
  • Classic: 38 huge VMs deployed with Chef (1,302 users, 1,710 apps)
  • NG: 41 medium VMs deployed with BOSH (123 users, 247 apps)
▪ All on 50+ Nova Compute nodes
In the past year, we’ve learned how to:
▪ Keep Cloud Foundry running smoothly
▪ Discover and prevent impending problems
▪ Resolve unexpected issues quickly
4
Goals of this lightning talk
1. Show the key data points we track
2. Show how our metrics dashboard helps us monitor that data
3. Share ideas on how to find better data in NG and beyond
4. Spark discussion on improved visibility for CF admins and customers
We want to get better at this, and to help the community get better as well.
5
1. The key data
6
What are the important metrics?
▪ Data that can be tracked over time to see trends and behaviors
▪ Data that can help us predict problems before they happen
At the PaaS layer, that means:
▪ DEA and app health
  • Memory reserved as a proportion of the memory available
▪ General health of all components
  • Health of the virtual machines
  • Status of the processes running on them
▪ Database nodes and services
  • Number of provisioned services against the capacity available
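To make the DEA and service metrics concrete, here is a minimal Python sketch, assuming you already have each DEA’s reserved and total memory in MB and each service node’s provisioned and capacity counts. The `DeaMemory` type and all function names are hypothetical illustrations, not part of any Cloud Foundry API.

```python
from dataclasses import dataclass


@dataclass
class DeaMemory:
    """Hypothetical per-DEA memory snapshot (values in MB)."""
    name: str
    reserved_mb: int  # memory reserved by app instances placed on this DEA
    total_mb: int     # memory the DEA offers for app instances


def reservation_ratio(dea: DeaMemory) -> float:
    """Fraction of the DEA's available memory that is already reserved."""
    if dea.total_mb <= 0:
        raise ValueError(f"{dea.name}: total_mb must be positive")
    return dea.reserved_mb / dea.total_mb


def rank_by_pressure(deas: list[DeaMemory]) -> list[DeaMemory]:
    """Return DEAs ordered from most to least heavily reserved."""
    return sorted(deas, key=reservation_ratio, reverse=True)


def service_headroom(provisioned: int, capacity: int) -> int:
    """How many more service instances a node can take (never negative)."""
    return max(capacity - provisioned, 0)


deas = [DeaMemory("dea_0", 4096, 16384), DeaMemory("dea_1", 14336, 16384)]
for dea in rank_by_pressure(deas):
    print(f"{dea.name}: {reservation_ratio(dea):.1%} reserved")
```

Ranking by ratio rather than by absolute free memory keeps DEAs of different sizes comparable, which is the point of tracking memory reserved "as a proportion of the memory available".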
7
Why do we need metrics?
▪ To deliver continuous availability in the cloud
▪ To proactively solve problems rather than react to them
▪ To understand the behavior of the system so we can automate it
8
Where can we find them?
▪ NATS message bus
  • Discover the components to interrogate through their varz endpoints
  • Best for dynamically changing data
▪ Cloud Controller database (CCDB)
  • Longer-lived data that isn’t in the varz endpoints
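The NATS-then-varz pattern can be sketched in Python as follows. This assumes the reply a component sends to a `vcap.component.discover` request is a JSON object whose `host` is `ip:port` and whose `credentials` is a `[user, password]` pair, as in classic Cloud Foundry components; check that shape against your own release. The sketch only builds the authenticated HTTP request; subscribing to NATS and actually sending the request are left out, and `varz_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def varz_request(discover_reply: str) -> urllib.request.Request:
    """Build an authenticated GET for a component's /varz endpoint.

    Assumes a reply shaped like
    {"type": "DEA", "index": 0, "host": "ip:port", "credentials": ["user", "pass"]}.
    """
    info = json.loads(discover_reply)
    user, password = info["credentials"]
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"http://{info['host']}/varz")
    request.add_header("Authorization", f"Basic {token}")  # HTTP basic auth
    return request


reply = '{"type": "DEA", "index": 0, "host": "10.0.0.5:34501", "credentials": ["admin", "secret"]}'
req = varz_request(reply)
print(req.full_url)  # http://10.0.0.5:34501/varz
# urllib.request.urlopen(req) would then fetch the component's varz document.
```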
9
2. Monitoring that data
10
Our metrics dashboard provides…
1. Views of component health
2. Resource usage details
3. Ongoing growth trends
4. Access to logs and raw varz
5. Email notifications
11
It helps us find (and fix) problems
▪ Components nearing capacity or failure
▪ Already failed components
▪ Out-of-control apps and noisy users
It helps us see patterns
▪ Active/inactive users and apps
▪ Growth trends and runtime/service adoption
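The talk doesn’t show how the dashboard decides what to email about. As a minimal sketch, assuming each component has already been reduced to a health flag and a utilization fraction, the split between already failed and nearing-capacity components could drive an alert built with Python’s standard `email.message.EmailMessage`. The `ComponentStatus` type, the 0.8 warning threshold, and the subject format are all hypothetical.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Optional


@dataclass
class ComponentStatus:
    """Hypothetical summary of one component, e.g. derived from its varz."""
    name: str
    healthy: bool       # False if the component is down or failing health checks
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def classify(status: ComponentStatus, warn_at: float = 0.8) -> str:
    """Map a component to FAILED, NEAR CAPACITY, or OK."""
    if not status.healthy:
        return "FAILED"
    if status.utilization >= warn_at:
        return "NEAR CAPACITY"
    return "OK"


def build_alert_email(
    statuses: list[ComponentStatus], sender: str, recipient: str, warn_at: float = 0.8
) -> Optional[EmailMessage]:
    """Return an alert email listing problem components, or None if all are OK."""
    problems = [(s.name, classify(s, warn_at)) for s in statuses]
    problems = [(name, state) for name, state in problems if state != "OK"]
    if not problems:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[CF metrics] {len(problems)} component(s) need attention"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{state}: {name}" for name, state in problems))
    return msg


statuses = [
    ComponentStatus("dea_0", True, 0.5),
    ComponentStatus("dea_1", True, 0.9),
    ComponentStatus("mysql_node_0", False, 0.1),
]
msg = build_alert_email(statuses, "cf-metrics@example.com", "cf-admins@example.com")
print(msg["Subject"])  # [CF metrics] 2 component(s) need attention
```

Sending is a separate step, for example `smtplib.SMTP(host).send_message(msg)` against your own mail relay. Returning `None` when everything is OK keeps quiet periods from generating empty emails.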
12
User and app trends
There is also one unauthenticated page for high-level stats.
13
DEA list
14
DEA details
15
Service node list
16
Service node details
17
User list
18
User details
19
App list
20
App details
21
Log list
22
Log details
23
Email notifications
24
3. Finding and acting on better data
25
Let’s resolve gaps in data captured from NG
▪ NG provides granular user/org/space views…
  • This improves our BSS (business support systems) potential for QoS and departmental billing
▪ …But we lost user and app data linkages from the health manager
  • Can’t see which DEA my app resides on (not currently enabled in our NG version)
  • Can’t see how many apps a user has (replaced by orgs and spaces, but still valuable to trace)
  • See https://github.com/cloudfoundry/cloud_controller_ng/issues/81
▪ We’d like to restore that data by surfacing it either
  • in varz endpoints (dynamic data, preferred), or
  • in the CCDB (static data, could be a security concern)
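One way to approximate the lost "apps per user" count from NG’s org/space model is to attribute every app in a space to each developer of that space. A minimal Python sketch, assuming the developer IDs and app names for each space have already been fetched from the Cloud Controller; the dict shape and the `apps_per_user` name are hypothetical.

```python
from collections import Counter


def apps_per_user(spaces: list[dict]) -> Counter:
    """Count, for each developer, the apps in the spaces they belong to.

    Each space is a hypothetical dict: {"developers": [user ids], "apps": [app names]}.
    """
    counts: Counter = Counter()
    for space in spaces:
        for user in set(space["developers"]):  # de-duplicate within one space
            counts[user] += len(space["apps"])
    return counts


spaces = [
    {"developers": ["alice", "bob"], "apps": ["billing-ui", "billing-api"]},
    {"developers": ["alice"], "apps": ["reports"]},
]
print(apps_per_user(spaces).most_common())  # [('alice', 3), ('bob', 2)]
```

This over-counts shared spaces by design: an app in a space with three developers counts once for each of them, so it measures "apps a user can touch" rather than "apps a user created".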
26
Let’s begin to link metrics to automation
▪ Detect errors in applications that are traceable to users/orgs
  • Preemptively reach out to them to see if they need help
  • Think customer service and proactive support!
  • Can we hook into BOSH or Jenkins for automation?
▪ Automate (and expand links to the IaaS and SaaS stacks)
  • Self-healing systems (when a node runs out of disk, move its apps)
  • Self-scaling systems (detect when nearing thresholds)
  • Evolving topologies (replace unused service nodes with popular ones)
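Detecting that a metric is nearing a threshold can start as simply as fitting a straight line to recent samples and projecting when it will cross the limit. A minimal Python sketch using an ordinary least-squares fit; the sample format, the `time_until_threshold` name, and the example data are hypothetical, and time is in whatever units the samples use.

```python
from typing import Optional, Sequence, Tuple


def time_until_threshold(
    samples: Sequence[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Project when a growing metric will reach `threshold`.

    Fits value = intercept + slope * time by ordinary least squares and returns
    the time remaining after the latest sample, or None if the trend is flat,
    falling, or there is too little data to fit a line.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples at the same time: no slope to fit
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # not growing, so no projected crossing
    intercept = mean_v - slope * mean_t
    crossing_t = (threshold - intercept) / slope
    latest_t = max(t for t, _ in samples)
    return max(crossing_t - latest_t, 0.0)  # 0.0 means already at or past the threshold


# Percent of DEA memory reserved, sampled daily (hypothetical data).
history = [(0, 50.0), (1, 60.0), (2, 70.0)]
print(time_until_threshold(history, 90.0))  # 2.0
```

A linear fit is the simplest defensible choice; bursty or seasonal usage would call for a more robust model before the result drives BOSH or Jenkins automation.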
27
Let’s expand the broadcast of metrics to more users
▪ Admins are the primary beneficiaries right now
  • But the data is almost completely read-only
  • Should we provide UAA-based tiers of access to admins?
▪ Others can and should benefit
  • Customers
  • End users
  • Developers
  • Management
  • Executives, line of business owners
  • Finance
28
Thanks!
29
The metrics dashboard innovators
Chris Peters, Russell Boykin, Doug Davis, Wei Feng
30
We’re hiring!
Search Jobs at IBM by:
SmartCloud Application Services
31
