Proactive performance monitoring with adaptive thresholds
Proactive Performance Monitoring with Adaptive Thresholds
John Beresniewicz 
Consulting Member of Technical Staff 
Oracle USA
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda 
• Performance Monitoring 
• Understanding Metrics 
• Baselines and Adaptive Thresholds 
• Enterprise Manager Use Cases
Performance Monitoring
A brief history 
• Availability monitoring 
• Simple Boolean (up/down) using ping 
• Notification frameworks constructed 
• Performance monitoring 
• Fixed thresholds over system-level counters (V$SYSSTAT) 
• Use existing frameworks 
• Vendor metric madness 
• More metrics must be better 
• User complaints are still the primary alerting mechanism
Performance alerting is difficult 
• Performance is subjective and variable 
• Better or worse, not best or worst 
• Applications vary in performance characteristics 
• Workloads vary predictably within system 
• Many metrics, few good signals 
• DB Time metrics far superior to counter-based ones 
• Metrics lack semantic framework 
• Do alerts point at symptoms, causes, both? 
• Setting thresholds manually is labor intensive 
• The M x N problem (M targets and N metrics)
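• e.g. 50 targets × 30 metrics = 1,500 individual thresholds to set and maintain by hand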
Understanding Metrics
Classifying metrics 
• Identify a set of basic metrics 
• PERFORMANCE: Time-based metrics 
• KING KONG: Average Active Sessions 
• Response time per Txn, Response time per call 
• WORKLOAD TYPE 
• What kind of work is system doing? 
• Typically the “per txn” metrics 
• WORKLOAD VOLUME 
• How much demand is being placed on system? 
• Typically the “per sec” metrics 
• Triage performance effects by correlating with causes
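As an illustrative sketch of triage-by-correlation (in Python, not Oracle's implementation; the metric values are invented), one can correlate the performance metric against candidate workload-volume metrics to see which demand source tracks it:

# Sketch only: correlate a performance metric (average active sessions)
# against candidate workload-volume metrics to see which demand source
# best explains a performance change. All values here are invented.
import numpy as np

aas           = np.array([2.1, 2.3, 5.9, 6.2, 2.2, 2.0, 6.5, 2.4])  # avg active sessions
calls_per_sec = np.array([310, 320, 890, 920, 315, 300, 950, 330])
txns_per_sec  = np.array([ 40,  42,  41,  44,  39,  41,  43,  40])

for name, series in [("calls/sec", calls_per_sec), ("txn/sec", txns_per_sec)]:
    r = np.corrcoef(aas, series)[0, 1]
    print(f"corr(AAS, {name}) = {r:+.2f}")
# High correlation with calls/sec but not txn/sec suggests the slowdown
# tracks call volume rather than transaction mix.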
Demand varies predictably 
Autocorrelation of calls per second for an email system
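The chart itself is not reproduced here; as a sketch of the idea (synthetic data, not the email system's), the autocorrelation of an hourly demand series peaks at lags of one day and one week when demand is periodic:

# Sketch: autocorrelation of an hourly demand series with a daily cycle.
# Synthetic data; peaks near lag 24 (hours) and 168 (one week) reveal
# the periodicity that fixed thresholds cannot accommodate.
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24 * 14)                      # two weeks of hourly samples
demand = 100 + 40 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)

def autocorr(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for lag in (1, 6, 12, 24, 48, 168):
    print(f"lag {lag:3d}h: r = {autocorr(demand, lag):+.2f}")
# r approaches +1 at lags 24 and 168 and is strongly negative at 12h.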
Executions per second over a week 
• Weekdays show clear hour-of-day pattern
• Weekends different 
• What threshold to set?
Average active sessions 
Scotty, I think we have a problem
Outliers or events?
In a stable system, metrics should be statistically stable, and rare observations may signal events.
Are these significant?
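By definition, values above the 99.9th percentile of several weeks of history should occur less than 0.1% of the time, so a cluster of such extremes is unlikely to be noise and probably signals a real event.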
Baselines and Adaptive Thresholds
Operational requirements 
• Set alert thresholds automatically 
• Determine thresholds relative to baseline behavior 
• Adjust thresholds for expected workload changes 
• Adapt thresholds to system evolution
AWR Baselines 
• Captured AWR snapshots representing expected performance under common workload
• Capture can be pre-configured using templates 
• SYSTEM_MOVING_WINDOW 
• Trailing N days of data 
• Compare performance against recent history 
• N is settable in days; 3 or 5 whole weeks (21 or 35 days) are good settings (see the example below)
• Out-of-box baseline in RDBMS 11g
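• For example, a 21-day window with Day-of-Week time grouping gives each day-of-week group exactly three days of history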
Time-grouping 
• Captures workload periodicity by grouping data into common diurnal time buckets
• Daily periodicity 
• All hours, Day-Night, Hour-of-Day 
• Weekly periodicity 
• All days, Weekday-Weekend, Day-of-Week 
• Time-grouping combines daily and weekly periodicities
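A minimal sketch of the bucketing idea (the grouping names follow the slide; the function itself is illustrative, not Oracle's code):

# Sketch: map a timestamp to a (weekly, daily) time-group bucket using
# a Weekday-Weekend x Hour-of-Day grouping, per the slide.
from datetime import datetime

def time_group(ts: datetime) -> tuple:
    weekly = "WEEKEND" if ts.weekday() >= 5 else "WEEKDAY"  # Sat/Sun vs Mon-Fri
    daily = ts.hour                                         # Hour-of-Day bucket
    return (weekly, daily)

print(time_group(datetime(2008, 3, 3, 14)))  # ('WEEKDAY', 14) - Monday 2pm
print(time_group(datetime(2008, 3, 8, 2)))   # ('WEEKEND', 2)  - Saturday 2am

Metric observations falling into the same bucket are then pooled when computing baseline statistics, so a Monday 2pm value is judged against weekday-afternoon history rather than against weekend lows.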
Metric statistics 
• Basic metrics only 
• Computed over SYSTEM_MOVING_WINDOW 
• Standard stats: MIN, MAX, AVG, STDDEV 
• Percentiles: 
• Measured: 25, 50 (median), 75, 90, 95, 99 
• Estimated: 99.9, 99.99 
• Computed over time-groups 
• Automatically determined in 11g 
• Computed weekly 
• Scheduler job runs Saturday at midnight
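As a sketch, the per-group statistics might be computed as below; the measured percentiles follow the slide, while the normal-tail extrapolation used for the 99.9/99.99 estimates is purely an assumption (the slide does not specify the estimator):

# Sketch: compute the slide's statistics for one metric within one time group.
# The "estimated" 99.9/99.99 percentiles use a simple normal-tail
# extrapolation as a stand-in for whatever estimator is actually used.
import numpy as np

values = np.random.default_rng(1).lognormal(mean=1.0, sigma=0.4, size=500)

stats = {"MIN": values.min(), "MAX": values.max(),
         "AVG": values.mean(), "STDDEV": values.std(ddof=1)}
for p in (25, 50, 75, 90, 95, 99):
    stats[f"P{p}"] = np.percentile(values, p)
for p, z in ((99.9, 3.09), (99.99, 3.72)):   # one-sided normal quantiles
    stats[f"P{p} est"] = stats["AVG"] + z * stats["STDDEV"]

for name, v in stats.items():
    print(f"{name:10s} {v:8.2f}")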
Time-grouped statistics
Adaptive alert thresholds 
• Percent of maximum thresholds 
• User-input multiplier over the time-group maximum
• Good for detecting load peaks 
• Significance level thresholds 
• Signal on unusual metric values 
• HIGH (95 pctile) 
• VERY HIGH (99 pctile) 
• SEVERE (99.9 pctile) 
• EXTREME (99.99 pctile) 
• Computed and set automatically 
• Thresholds can reset every hour (MMON task)
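A sketch of how the two threshold types could be evaluated against a new observation (the mechanics follow the slide; the statistics and numbers are illustrative):

# Sketch: evaluate one new metric value against adaptive thresholds for its
# time group. Percentile values would come from the baseline statistics.
group_stats = {"MAX": 12.0, "P95": 6.0, "P99": 8.5, "P99.9": 10.0, "P99.99": 11.5}

def check(value, pct_of_max=1.2):
    alerts = []
    if value > pct_of_max * group_stats["MAX"]:      # percent-of-maximum test
        alerts.append(f"load peak: > {pct_of_max:.0%} of group max")
    for level, key in [("EXTREME", "P99.99"), ("SEVERE", "P99.9"),
                       ("VERY HIGH", "P99"), ("HIGH", "P95")]:
        if value > group_stats[key]:                 # significance-level test
            alerts.append(f"{level}: > {key}")
            break                                    # report most severe only
    return alerts or ["OK"]

print(check(7.0))   # ['HIGH: > P95']
print(check(15.0))  # load peak plus EXTREME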
Enterprise Manager User Interface
Early 10g visualization: seismograph
Enterprise Manager entry points 
• DB home page: Related Links 
• 10g: Metric Baselines 
• Need to enable metric persistence 
• Static and moving window baselines 
• Time grouping selected by user 
• 11g: Baseline Metric Thresholds 
• Out-of-box metric persistence and statistics computation 
• Improved use-case-based interface
• Automatic time grouping selection 
• Statistics computed over SYSTEM_MOVING_WINDOW
RDBMS 11g use case goals 
• Quickly configure Adaptive Thresholds 
• Adjust thresholds in context 
• Identify signals for known problem 
• Advanced metric analysis
Baseline Metric Thresholds page
Quickly configure Adaptive Thresholds
Quick configure: OLTP
Quick configure: Data Warehouse
Adjust thresholds in context
Identify signals for known problem
Advanced metric analysis