© 2010 VMware Inc. All rights reserved
vCenter Operations Enterprise - Standalone
Real-time Performance Management for the Entire Enterprise
Technical Presentation
12
Management Challenges
▪ Performance problems often occur with no real warning
– End users are often the first to notice problems
– Root cause determination is difficult and time-consuming
– Solving problems requires all-hands-on-deck bridge calls
▪ Real-time understanding of performance is lacking
– Without a reliable view of IT infrastructure health, IT is too reactive
– Siloed monitoring tools do not allow a common “truth”
– No correlation across IT silos
▪ Optimizing IT infrastructure is difficult, if not impossible
– Current tools cannot identify the abnormal metric behaviors that lead to degradation of Key Performance Indicators (KPIs)
– Current tools cannot identify the abnormal behaviors that define your worst-performing devices
– Heavy reliance on the “tribal knowledge” of a few application experts
13
What If You Could…
▪ Automate
• Eliminate time-consuming problem resolution processes
▪ Correlate and Accelerate
• “One click” to the root cause of emerging performance problems to reduce MTTI/MTTR
▪ Get Proactive
• Avert the end-user and business impact of developing performance problems
▪ Collaborate
• Aggregate and correlate data from across the monitoring landscape to create a single “truth”
▪ Optimize
• Tune components to deliver optimal performance for application transactions
14
Technical Details
15
1st Generation - Event-Centric, Hard-Threshold Based
[Screenshot: an event console listing hard-threshold events from 3/4/08 between 13:55 and 16:45, including “Processor 0 is at 87.0%” and “Processor 1 is at 84.0%” CPU-bottleneck events on Hosts 1 and 2, a hardware-interrupt warning on Host 2, Oracle SFPRD “CPU Hog” and high-I/O SQL alerts, top-CPU alerts for siebsh.exe processes on Host 3, and repeated Processing Time and Response Time service-level breaches]
How “1st Generation” Tools Attempt to Solve These Problems
[Diagram: four separate DATA FEEDS]
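To make the 1st-generation approach concrete, here is a minimal sketch of a hard-threshold rule. The 80% CPU limit, the sample values, and the names `Event` and `hard_threshold_events` are hypothetical, chosen only to echo the “Processor 0 is at 87.0%” style of event: every sample above one fixed limit raises its own event, regardless of the time of day or what is normal for that host.

```python
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: str
    host: str
    message: str

# Hypothetical static limit; a 1st-generation tool applies one fixed value everywhere.
CPU_LIMIT_PCT = 80.0

def hard_threshold_events(samples, limit=CPU_LIMIT_PCT):
    """Emit one event per sample that exceeds a fixed limit.

    `samples` is an iterable of (timestamp, host, processor, cpu_pct) tuples.
    There is no notion of "normal" for this host or hour, so a busy but
    healthy processor produces the same event as a failing one.
    """
    events = []
    for ts, host, cpu, pct in samples:
        if pct > limit:
            events.append(Event(ts, host, f"Processor {cpu} is at {pct:.1f}%"))
    return events

samples = [
    ("16:30", "Host 2", 1, 84.0),
    ("16:40", "Host 1", 0, 79.5),
    ("16:45", "Host 1", 0, 87.0),
]
for e in hard_threshold_events(samples):
    print(e.timestamp, e.host, e.message)
```

Each polling cycle re-evaluates the same rule, so a host that simply runs hot all afternoon keeps generating events.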
16
2nd Generation - Rudimentary Baselining, Rules/Templates, Charting
[Screenshot: the same 3/4/08 event console as in the 1st Generation example]
How “2nd Generation” Tools Attempt to Solve These Problems
[Diagram: four separate DATA FEEDS]
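A 2nd-generation baseline is often little more than a global mean plus or minus a few standard deviations. The hypothetical sketch below (the names and the choice k = 2 are assumptions, not taken from any specific product) shows why that is rudimentary: it treats the metric as bell-shaped and ignores time of day, so a nightly batch spike inflates the band until a daytime problem slips through.

```python
import statistics

def static_baseline(history, k=2.0):
    """Band = mean -/+ k population standard deviations over the whole history."""
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return mean - k * sd, mean + k * sd

def breaches(values, band):
    """Values falling outside the (low, high) band."""
    low, high = band
    return [v for v in values if v < low or v > high]

# Hypothetical CPU % history: quiet hours around 20% plus two nightly batch spikes.
history = [20, 22, 21, 19, 23, 20, 95, 97, 21, 22]
band = static_baseline(history)
print(band)                          # lower bound about -24, upper bound about 96
print(breaches([35, 60, 96], band))  # []: a daytime jump to 60% goes unflagged
```

The negative lower bound is itself a symptom: CPU usage cannot fall below 0%, so the Gaussian assumption does not fit this metric.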
17
VMware’s Approach to Real-Time Performance Management
3rd Generation – Holistic, Real-Time Analytics
▪ Flexible INTEGRATION to many data sources
▪ Enterprise SCALABILITY
▪ Patented performance ANALYTICS
▪ Powerful information DASHBOARDS
“I can put all my monitoring tools to good use and get better performance analytics.”
18
vCenter Operations 3rd Generation Approach – An Analogy
My brain understands the health of my body. Should I do anything?
Your Brain Understands Context:
▪ If my heart rate and temperature are increasing, I should go to the hospital
▪ If I’m tired, rest more
▪ If I tire easily, start exercising!
[Diagram: the body's layers (Heart Rate, Respiration, Temperature; Muscular, Skeletal, Cardiovascular) linked by the Nervous system, alongside the enterprise's monitoring layers (Monitoring User Experience (UserEx) Metrics; Monitoring Business Metrics; Monitoring App Layer Metrics – JVM, DB Connections, etc.; Monitoring Server O/S Metrics – CPU, RAM, Disk, I/O, etc.) linked by vCenter Operations]
vCenter Operations understands the health of my enterprise by analyzing millions of measurements. Should I do anything?
vCenter Operations Understands Context:
▪ Act based on the urgency of emerging problems
▪ Act based on real-time performance dashboards
▪ Act based on long-term correlations and trends
19
Data Agnostic Approach to Data Collection
▪ Accepts any time-series data, for example:
• Server OS
• Server App layer (e.g., IIS, Oracle, WebSphere)
• Network
• Storage
• User Experience
• Transactional
• Business Data
• Change Events
▪ Minimum Required Fields (4)
• Object Name, Metric Name, Value, Timestamp
▪ Data Extraction: *not* an analytic question
• No rules or templates to write and maintain
• vCenter Operations analytics do all of the “work”
[Diagram: data sources feeding vCenter Operations]
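Because only four fields are required, any time-series feed can be normalized into one record shape. The sketch below assumes a hypothetical comma-separated line format `object,metric,value,timestamp` with ISO-8601 timestamps; the slide specifies the fields, not a wire format, and `MetricPoint` and `parse_points` are illustrative names.

```python
import csv
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class MetricPoint:
    object_name: str   # a host, database, application, ...
    metric_name: str
    value: float
    timestamp: datetime

def parse_points(lines):
    """Parse `object,metric,value,timestamp` rows; skip rows that lack a usable field."""
    points = []
    for row in csv.reader(lines):
        if len(row) != 4 or not all(field.strip() for field in row):
            continue  # all four fields are mandatory
        obj, metric, value, ts = (field.strip() for field in row)
        try:
            points.append(MetricPoint(obj, metric, float(value), datetime.fromisoformat(ts)))
        except ValueError:
            continue  # non-numeric value or unparseable timestamp
    return points

feed = [
    "Host 1,CPU Usage %,87.0,2008-03-04T16:45:00",
    "SFPRD,Physical Reads/sec,1520,2008-03-04T16:08:00",
    "Host 3,Response Time ms,,2008-03-04T14:40:00",  # missing value: skipped
]
points = parse_points(feed)
print(len(points))
```

Nothing in the record depends on where the data came from, which is what lets OS, network, storage, and business feeds share the same analytics.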
20
Learn Normal Behavior and Identify Abnormalities
▪ Doesn’t assume IT data has a normal (bell-shaped) distribution
▪ Sophisticated analytics – 8 different algorithms
▪ Learns your dynamic ranges of “normal” without templates
▪ Learns patterns of behavior and identifies abnormalities
Chart legend:
• BLUE LINE – the metric’s measured value
• GRAY BAR – learned upper and lower bands of the dynamic threshold (“normal”)
• RED zone – breached dynamic threshold (“abnormal”)
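One simple way to learn a “normal” band without assuming a bell curve is to take empirical percentiles of the history separately for each hour of the day. This is a stand-in for illustration only, not one of the 8 vCenter Operations algorithms, which the slide does not describe; the 5th/95th percentile choice and the function names are assumptions.

```python
from collections import defaultdict
from statistics import quantiles

def learn_hourly_bands(history, low_pct=5, high_pct=95):
    """Learn a (low, high) band per hour of day from (hour, value) pairs.

    Percentiles are distribution-free: a skewed or multi-modal metric gets a
    band that follows its observed shape instead of mean +/- k * sigma.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        if len(values) < 2:
            continue  # too little data to learn a band for this hour
        cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
        bands[hour] = (cuts[low_pct - 1], cuts[high_pct - 1])
    return bands

def classify(hour, value, bands):
    """Return "normal", "abnormal" (breached the learned band), or "unknown"."""
    if hour not in bands:
        return "unknown"
    low, high = bands[hour]
    return "normal" if low <= value <= high else "abnormal"

# Hypothetical history: quiet afternoons (20-40) and a busy 2 a.m. batch window (70-90).
history = [(14, v) for v in range(20, 41)] + [(2, v) for v in range(70, 91)]
bands = learn_hourly_bands(history)
print(classify(14, 80, bands), classify(2, 80, bands))  # → abnormal normal
```

The same value, 80, is abnormal in the afternoon but normal during the batch window, which is the behavior the gray band in the chart is meant to capture.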
21
Proactive Alerting – Smart Alerts
[Diagram: data from multiple silos feeds Smart Alert generation for a business application]
• User Experience (e.g., RUM)
• Database Silo (e.g., Quest)
• App Data (e.g., Wily)
• Network Data (e.g., Ionix IPPM)
• Business Data (e.g., Finance)
Smart Alert Generation (“When”) → ! SMART ALERT on the Business Application
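Before anomalies from separate silos can raise one Smart Alert, they have to be brought onto a single timeline per business application. A minimal sketch of that merge step follows; the `Anomaly` fields, the object-to-application mapping, and the application name are hypothetical, since the slide does not say how the product models them.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(order=True)
class Anomaly:
    timestamp: int                       # epoch seconds; the only sort key
    source: str = field(compare=False)   # silo, e.g. "Database", "Network"
    obj: str = field(compare=False)      # monitored object
    metric: str = field(compare=False)

def merge_by_application(streams, app_of):
    """Merge time-sorted per-silo streams into per-application timelines.

    `streams` is a list of lists of Anomaly, each already sorted by timestamp;
    `app_of` maps an object name to the business application it supports.
    heapq.merge keeps the combined output sorted without re-sorting everything.
    Anomalies on objects not mapped to any application are dropped.
    """
    timelines = defaultdict(list)
    for anomaly in heapq.merge(*streams):
        app = app_of.get(anomaly.obj)
        if app is not None:
            timelines[app].append(anomaly)
    return dict(timelines)

db = [Anomaly(100, "Database", "SFPRD", "Physical Reads/sec"),
      Anomaly(400, "Database", "SFPRD", "CPU Time")]
net = [Anomaly(250, "Network", "Router 7", "Latency")]
ux = [Anomaly(300, "User Experience", "Web Tier", "Page Load Time")]
app_of = {"SFPRD": "Order Entry", "Web Tier": "Order Entry"}
timelines = merge_by_application([db, net, ux], app_of)
print([(a.timestamp, a.source) for a in timelines["Order Entry"]])
```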
22
Drill down to the Root Cause
Smart Alert Summary (“What”)
23
Drill down to the Root Cause
Smart Alert Summary (“What”)
• Early Warning SMART ALERT
• Noise Line Crossed
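The “Noise Line Crossed” callout suggests an early warning raised when the number of simultaneously abnormal metrics rises above its usual background level. The sketch below models that idea under stated assumptions: the noise line is taken as the 95th percentile of historical per-interval anomaly counts, and the warning fires on the first interval that exceeds it. The percentile, the interval length, and the names are hypothetical, not the product's documented rule.

```python
from statistics import quantiles

def noise_line(historical_counts, pct=95):
    """Background level of concurrent anomalies: the pct-th percentile of past counts."""
    return quantiles(historical_counts, n=100, method="inclusive")[pct - 1]

def first_crossing(counts, line):
    """Index of the first interval whose anomaly count exceeds the line, else None."""
    for i, count in enumerate(counts):
        if count > line:
            return i
    return None

# Hypothetical counts of simultaneously abnormal metrics per 5-minute interval.
history = [2, 3, 1, 4, 2, 3, 5, 2, 3, 4, 2, 1, 3, 2, 4, 3, 2, 3, 4, 2, 3]
line = noise_line(history)              # 4.0 for this history
today = [2, 3, 4, 6, 9, 12]             # anomalies piling up
print(first_crossing(today, line))      # 3: the warning fires before the pile-up peaks
```

Isolated anomalies happen all the time; counting how many occur together is what separates background noise from a building problem.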
24
Drill down to the Root Cause
Smart Alert Summary (“What”)
• Impact to application health
• Impact to the health of each technology tier
• No major impact to application Key Performance Indicators (KPIs)…yet.
25
Drill down to the Root Cause
Smart Alert Summary (“What”)
• Root cause technology tier is the DB
• Metric-level root cause symptoms – START HERE
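The drill-down names the tier with the most abnormal behavior (here, the DB) and points to its abnormal metrics as the place to start. A heuristic sketch of that ranking follows, assuming each metric already carries its tier, an abnormal flag, and the time it first went abnormal; these inputs, the tier and metric names, and the ranking rule are all hypothetical, not the product's algorithm.

```python
from collections import defaultdict

def rank_root_cause(metrics):
    """Pick the tier with the highest share of abnormal metrics.

    `metrics` is a list of dicts with keys tier, name, abnormal (bool), and
    first_abnormal (epoch seconds, or None when not abnormal). Returns the
    top tier and its abnormal metric names, earliest first ("START HERE").
    """
    totals, abnormal = defaultdict(int), defaultdict(list)
    for m in metrics:
        totals[m["tier"]] += 1
        if m["abnormal"]:
            abnormal[m["tier"]].append(m)
    if not abnormal:
        return None, []
    share = {tier: len(abnormal[tier]) / totals[tier] for tier in totals}
    top_tier = max(share, key=share.get)
    start_here = sorted(abnormal[top_tier], key=lambda m: m["first_abnormal"])
    return top_tier, [m["name"] for m in start_here]

rows = [
    ("Web", "Requests/sec", False, None),
    ("Web", "Response Time", True, 300),
    ("App", "JVM Heap", False, None),
    ("App", "DB Connections", True, 250),
    ("App", "Threads", False, None),
    ("DB", "Physical Reads/sec", True, 120),
    ("DB", "CPU Time", True, 180),
    ("DB", "Buffer Cache Hit %", False, None),
]
metrics = [dict(zip(("tier", "name", "abnormal", "first_abnormal"), r)) for r in rows]
print(rank_root_cause(metrics))
```

Ordering the top tier's symptoms by onset time reflects the intuition that the earliest abnormal metric is the best place to begin investigating.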
26
Drill down to the Root Cause
See the effect of changes and other external events on application health with this “mash-up” view
Smart Alert Summary (“What”)
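The mash-up view overlays change events on the health timeline so that a drop in health can be read against what changed just before it. A minimal sketch follows, assuming health samples and change events are both keyed by epoch seconds; the 20-point drop threshold, the 30-minute look-back window, the event descriptions, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    timestamp: int      # epoch seconds
    description: str

def suspect_changes(health, changes, drop=20, window=1800):
    """Pair each significant health drop with the change events shortly before it.

    `health` is a time-sorted list of (timestamp, score) pairs on a 0-100 scale.
    A drop is a fall of at least `drop` points between consecutive samples;
    changes within `window` seconds before the drop are reported as suspects.
    """
    findings = []
    for (_, prev_score), (t, score) in zip(health, health[1:]):
        if prev_score - score >= drop:
            nearby = [c.description for c in changes if t - window <= c.timestamp <= t]
            findings.append((t, score, nearby))
    return findings

health = [(10_000, 95), (10_600, 94), (11_200, 70), (11_800, 68), (12_400, 90)]
changes = [ChangeEvent(5_000, "DNS record update"),
           ChangeEvent(10_900, "Database patch applied")]
print(suspect_changes(health, changes))
```

The same before/after comparison around a change is what the OPEX slide later credits with reducing change-related incidents.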
27
Dynamic Performance Dashboards – Health Scores
“How is our world doing?”
▪ One source of truth across the enterprise
▪ Health: an objective measure of performance based on the underlying level of abnormal behavior
▪ Analytics provide a health score for any resource or grouping
• A single server, device, or resource
• An entire tier or silo
• An entire application or service
• An entire datacenter
• Any arbitrary group of resources
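The slide defines health as an objective measure based on the underlying level of abnormal behavior, available for any resource or grouping, but does not give the formula. The sketch below assumes a simple one, 100 times the share of a group's metrics that are currently normal, so that a server, a tier, an application, or any arbitrary group is scored the same way by pooling its members' metrics. The formula, the resource names, and the function names are hypothetical.

```python
def health_score(abnormal_flags):
    """0-100 score from per-metric abnormal flags (True = abnormal)."""
    if not abnormal_flags:
        return 100.0  # assumption: no data counts as healthy; a real tool might flag it instead
    normal = sum(1 for flag in abnormal_flags if not flag)
    return round(100.0 * normal / len(abnormal_flags), 1)

def group_health(resources, members):
    """Score any group by pooling its members' metric flags.

    `resources` maps resource name -> list of abnormal flags;
    `members` is any collection of resource names (a tier, an app, a datacenter).
    """
    pooled = [flag for name in members for flag in resources.get(name, [])]
    return health_score(pooled)

resources = {
    "web01": [False, False, False, True],
    "app01": [False] * 5,
    "db01": [True, True, False, False],
}
print(health_score(resources["db01"]))                          # single resource
print(group_health(resources, ["web01", "app01", "db01"]))      # whole application
```

Pooling metrics, rather than averaging member scores, weights larger resources more heavily; averaging would be an equally plausible design choice.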
32
vCenter Operations - OPEX Savings
Incident Management Lifecycle Savings
▪ Managing and resolving incidents
▪ Proactive alerts reduce costs 30-40%
Change Lifecycle Savings
▪ Managing changes to apps and infrastructure
▪ “Before/after” analysis reduces change-related incidents 30-40%
Incident Management Savings
▪ Managing Service Desk issues (incidents)
▪ Eliminating manual thresholds reduces erroneous tickets by 50-60%
Problem Management Savings
▪ Closing problems after systems are restored, including root cause analysis
▪ Root cause analysis reduces problem closure time by 30%
33
Customer Success: IT Operations
Before
▪ 400 critical alerts/hour
▪ End-user complaints alerted IT to the problem
▪ End users impacted (avg. 2 hours/outage)
▪ 12 Level-2 engineers on a bridge call to address the problem
After
▪ 20 alerts/MONTH
▪ 3 hours’ advance warning of slowdowns, with root cause
▪ NO end-user impact
▪ 1 Level-2 engineer and 1 DBA to address problems
Learn Normal → Smart Alerting → Root Cause
Solve performance issues before end users are affected, and reduce total alerts
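The before and after alert counts use different time units. Normalizing both to a month, and assuming a 30-day month with the 400-per-hour rate sustained around the clock (the slide states neither), gives the scale of the reduction:

```latex
\begin{aligned}
400\ \tfrac{\text{alerts}}{\text{hour}} \times 24\ \tfrac{\text{hours}}{\text{day}} \times 30\ \tfrac{\text{days}}{\text{month}} &= 288{,}000\ \tfrac{\text{alerts}}{\text{month}} \\
\frac{288{,}000\ \text{alerts/month}}{20\ \text{alerts/month}} &= 14{,}400
\end{aligned}
```

Under those assumptions this is roughly a 14,400-fold reduction; if the 400-per-hour rate applied only during incidents, the true factor would be smaller.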
34
vCenter Operations Architecture, Process and Sizing
35
vCenter Operations Enterprise - Standalone Architecture
▪ Four main services: Collector, Analytics, Web, ActiveMQ
▪ Architecture includes an MS SQL or Oracle DB, plus a file-based DB (FSDB) for raw metric storage
▪ Collectors can be distributed for scalability, or to span DCs and firewalls
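The architecture separates collection from analysis: distributed collectors push data through a message broker (ActiveMQ) to the analytics side, which keeps raw metrics in a file-based store. The sketch below mimics that flow in-process, with Python's `queue.Queue` standing in for ActiveMQ and one append-only CSV file per object/metric pair standing in for the FSDB. The file layout and all function names are hypothetical; the real FSDB format is not described here.

```python
import csv
import os
import queue
import re
import tempfile

def collector(readings, broker):
    """Publish (object, metric, value, timestamp) messages to the broker."""
    for reading in readings:
        broker.put(reading)

def fsdb_path(root, obj, metric):
    """One append-only file per object/metric pair (hypothetical layout)."""
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", f"{obj}__{metric}")  # filesystem-safe name
    return os.path.join(root, name + ".csv")

def drain_to_fsdb(broker, root):
    """Consume every pending message and append it to its metric's file."""
    written = 0
    while True:
        try:
            obj, metric, value, ts = broker.get_nowait()
        except queue.Empty:
            return written
        with open(fsdb_path(root, obj, metric), "a", newline="") as f:
            csv.writer(f).writerow([ts, value])
        written += 1

broker = queue.Queue()        # stands in for ActiveMQ
root = tempfile.mkdtemp()     # stands in for the FSDB directory
collector([("Host 1", "CPU Usage %", 87.0, 1000)], broker)            # collector in one DC
collector([("SFPRD", "Physical Reads/sec", 1520.0, 700),
           ("Host 1", "CPU Usage %", 84.0, 1300)], broker)           # collector in another DC
n = drain_to_fsdb(broker, root)
print(n)
```

Keeping each metric's history in its own sequential file makes the full-history scans done by the analytics cheap to read.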
36
vCenter Operations Enterprise - Standalone Processing
1a: Collectors and adapters collect metrics, topology, and change events (ongoing)
1b: Data is stored in the FSDB
2a: Analytics runs daily to determine hour-by-hour dynamic thresholds (DTs) for the next 24 hours
2b: The full FSDB is scanned by the 8 analytic algorithms to determine, per metric, the best match for the next 24-hour period
2c: DT data is stored in the SQL DB
3: Incoming data points are tested against the DT bands
4a: Metric-level anomalies are tracked for alerting and dashboarding
4b: Anomalies are correlated, Smart Alerts are generated, and the root cause (RC) is determined
5: Data is provided to “northbound” integration with products like Ionix SMARTS SAM
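Step 2b scans the full history with several algorithms and keeps, per metric, the one that best matches the next 24 hours. The sketch below shows that selection pattern with three simple, hypothetical candidate band models and a hypothetical scoring rule: fit on all but the most recent day, then prefer the narrowest band whose breach rate on that held-out day stays at or below 5%. It illustrates per-metric model selection only; the product's eight algorithms and its scoring are not described in this deck.

```python
from statistics import fmean, pstdev, quantiles

def global_sigma(train):
    """Band = mean +/- 3 std devs over all hours (assumes a bell shape)."""
    values = [v for _, v in train]
    m, s = fmean(values), pstdev(values)
    return lambda hour: (m - 3 * s, m + 3 * s)

def global_percentile(train):
    """Band = 1st..99th percentile of all values, ignoring time of day."""
    cuts = quantiles([v for _, v in train], n=100, method="inclusive")
    return lambda hour: (cuts[0], cuts[98])

def hourly_range(train):
    """Band = min..max seen at each hour of day (global range as fallback)."""
    by_hour = {}
    for hour, v in train:
        lo, hi = by_hour.get(hour, (v, v))
        by_hour[hour] = (min(lo, v), max(hi, v))
    values = [v for _, v in train]
    fallback = (min(values), max(values))
    return lambda hour: by_hour.get(hour, fallback)

CANDIDATES = {"global_sigma": global_sigma,
              "global_percentile": global_percentile,
              "hourly_range": hourly_range}

def best_model(history, points_per_day=24, max_breach_rate=0.05):
    """Return the name of the narrowest candidate that passes the held-out day, or None."""
    train, holdout = history[:-points_per_day], history[-points_per_day:]
    scored = []
    for name, fit in CANDIDATES.items():
        band = fit(train)
        breaches = sum(1 for h, v in holdout if not band(h)[0] <= v <= band(h)[1])
        width = fmean(band(h)[1] - band(h)[0] for h, _ in holdout)
        if breaches / len(holdout) <= max_breach_rate:
            scored.append((width, name))
    return min(scored)[1] if scored else None

# Hypothetical hourly metric over 3 days: low in the morning, high in the afternoon.
history = [(h, (10 if h < 12 else 90) + (1 if d == 1 else 0))
           for d in range(3) for h in range(24)]
print(best_model(history))
```

For this strongly time-of-day-dependent metric the per-hour model wins; a flat, noisy metric could just as well select a global band, which is the point of choosing per metric.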
37
Deployment Prerequisites and Sizing
▪ OS Support
• Windows Server 2003 R2 (x64)
• Red Hat Enterprise Linux (RHEL) 5 (x64)
▪ DB Support*
• SQL Server 2005
• Oracle 10g R2
* Customer supplied

Sizing (metrics collected every 5 min on avg.; processors >2.8 GHz; memory is the minimum; disk is initial disk space):

| Size | Metrics | Analytics (Main) Server: processors / memory / disk | DB Server: processors / memory / disk |
|---|---|---|---|
| Small | <250,000 | 4 cores / 12GB / 500GB | 2 cores / 4GB / 10GB |
| Medium | <1,000,000 | 8 cores / 24GB / 500GB | 4 cores / 8GB / 25GB |
| Large | <5,000,000 | 16 cores / 64GB / 5TB | 8 cores / 16GB / 100GB |
| Very Large | <10,000,000 | 24 cores / 128GB / 10TB | 8 cores / 16GB / 100GB |
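The sizing table can be read as a lookup on the number of metrics collected; at an average 5-minute interval, each metric contributes 288 data points per day. The sketch below encodes the table's rows, assuming the first three hardware columns describe the Analytics (Main) server and the last three the DB server, and reading the Small limit as 250,000. `SIZING` and `size_for` are hypothetical names.

```python
# Rows from the sizing table: (size, exclusive upper bound on metrics, analytics server, DB server).
# Each server spec is (cores, minimum memory GB, initial disk GB); 5TB/10TB are written as 5000/10000 GB.
SIZING = [
    ("Small", 250_000, (4, 12, 500), (2, 4, 10)),
    ("Medium", 1_000_000, (8, 24, 500), (4, 8, 25)),
    ("Large", 5_000_000, (16, 64, 5_000), (8, 16, 100)),
    ("Very Large", 10_000_000, (24, 128, 10_000), (8, 16, 100)),
]

POINTS_PER_METRIC_PER_DAY = 24 * 60 // 5  # one sample every 5 minutes on average = 288

def size_for(metric_count):
    """Return (size, analytics_spec, db_spec) for the first row whose '<' limit fits."""
    for size, limit, analytics, db in SIZING:
        if metric_count < limit:
            return size, analytics, db
    raise ValueError("10,000,000 metrics or more is outside the published sizing table")

print(size_for(800_000))
print(800_000 * POINTS_PER_METRIC_PER_DAY)  # data points per day at that size
```

Because the limits are exclusive, a deployment of exactly 1,000,000 metrics falls into the Large row.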
38
vCenter Operations Editions
39
VMware vCenter Operations Editions
[Diagram: editions that build on one another]
▪ vCenter Operations Standard
▪ vCenter Operations Advanced: + Capacity Planning
▪ vCenter Operations Enterprise: + Full Configuration & Compliance Management; + Other VMware & 3rd Party Integrations (View, management, servers, storage)
Scope axis: vSphere; VMware Cloud / vCenter; Non-VMware (incl. physical) environments
Function axis: Performance, Real-time, Capacity, Configuration, Change
40
Understanding the vCenter Operations Editions
| | vCenter Operations Standard Edition | vCenter Operations Enterprise - Standalone |
|---|---|---|
| Scope: Data Sources | vCenter x 1 | Any 3rd-party monitoring tools’ time-series data; change events; multiple vCenter Servers |
| Scope: Objects | vCenter objects (e.g., Data Centers, Clusters, ESX Hosts, Datastores, VMs x 1500) | Unlimited scope (e.g., Applications, Network Infrastructure, Storage, Hosts (ESX, Win, Linux, etc.), VMs) |
| Scope: Users | Infrastructure (e.g., VI Admins) | Operations, Infrastructure, Application Teams, Business Owners, CxOs |
| Function: Dynamic Thresholds | Yes | Yes |
| Function: Performance Root Cause | Yes | Yes |
| Function: Proactive Alerting | No | Yes |
| Function: Customizable Dashboards | No | Yes |
| Function: Notifications | No | Yes |
41
Demo
42
Questions
