Predicting Production Outages: Unleashing the Power of Micro-Metrics
Ram Lakshmanan
Architect, yCrash
Predicting Memory Problems
Healthy Application
Acute Memory Leak
Memory Leak
Micro-Metric: GC Throughput
Source: Garbage Collection Log
How does 96% GC Throughput sound?
1 day = 1,440 minutes (i.e., 24 hours x 60 minutes)
96% GC Throughput means the app is paused for 57.6 minutes/day (4% of 1,440 minutes)
What is GC Throughput?
The percentage of time the application spends processing customer transactions
vs
the time it spends on garbage collection activity
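The definition and the 96% arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class name `GcThroughput`, its method names, and the sample pause durations are assumptions made for this example, not part of any yCrash tool or of the JVM's GC log format.

```java
import java.util.List;

/** Sketch: GC throughput derived from pause durations (hypothetical helper). */
final class GcThroughput {

    /** Percentage of elapsed time that was NOT spent in GC pauses. */
    static double throughputPercent(List<Double> pauseSeconds, double elapsedSeconds) {
        double paused = pauseSeconds.stream().mapToDouble(Double::doubleValue).sum();
        return 100.0 * (elapsedSeconds - paused) / elapsedSeconds;
    }

    /** Minutes per day the application is paused at the given throughput. */
    static double pausedMinutesPerDay(double throughputPercent) {
        double minutesPerDay = 24 * 60; // 1,440 minutes
        return minutesPerDay * (100.0 - throughputPercent) / 100.0;
    }

    public static void main(String[] args) {
        // Assumed sample: three GC pauses observed within a 60-second window.
        double throughput = throughputPercent(List.of(0.8, 1.0, 0.6), 60.0);
        System.out.printf("GC throughput: %.1f%%%n", throughput);
        System.out.printf("Paused minutes/day at 96%%: %.1f%n", pausedMinutesPerDay(96.0));
    }
}
```

In practice the pause durations would be parsed from the GC log rather than hard-coded.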
360° Data
1. GC Log
2. Thread Dump
3. Heap Dump
4. Heap Substitute
5. top
6. ps
7. top -H
8. Disk Usage
9. dmesg
10. netstat
11. ping
12. vmstat
13. iostat
14. Kernel Params
15. App Logs
16. metadata
Open-source script: https://github.com/ycrash/yc-data-script
Predicting Backend Slowdown
Application Architecture
HTTP(S) request -> Application Server (Server Thread Pool) -> backends: JDBC, SOAP, MainFrame, REST
Micro-Metric: Threads with identical stack trace
Source: Thread Dump
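One way to compute this micro-metric is to group threads by their full stack trace and count each group. The sketch below does that for the current JVM using `Thread.getAllStackTraces()`. The class name `IdenticalStackTraces` and the threshold of 2 are assumptions for illustration, and a production setup would more likely parse a captured thread dump than inspect the JVM from inside.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: count threads that share exactly the same stack trace (hypothetical helper). */
final class IdenticalStackTraces {

    /** Returns stack trace text -> number of threads showing that exact trace. */
    static Map<String, Long> countByStack(Collection<StackTraceElement[]> stacks) {
        return stacks.stream()
                .filter(stack -> stack.length > 0) // skip threads that report no frames
                .map(stack -> Arrays.stream(stack)
                        .map(StackTraceElement::toString)
                        .collect(Collectors.joining("\n")))
                .collect(Collectors.groupingBy(trace -> trace, Collectors.counting()));
    }

    public static void main(String[] args) {
        // In-process snapshot of all live threads in this JVM.
        Map<String, Long> counts = countByStack(Thread.getAllStackTraces().values());
        long threshold = 2; // assumed threshold, for illustration only
        counts.forEach((trace, count) -> {
            if (count >= threshold) {
                System.out.println(count + " threads share this stack:\n" + trace + "\n");
            }
        });
    }
}
```

A growing count for a trace that ends in a JDBC, SOAP, or REST call would point at the corresponding backend.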
Case Study
Backend Slowdown in a Major Financial Institution in N. America
Predicting CPU Spike
We all might have used 'top'
Secret option:
top -H -p <PROCESS_ID>
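`top -H` is an operating-system tool, but a rough in-JVM counterpart exists in the standard `java.lang.management` API. The sketch below ranks live threads by CPU time using `ThreadMXBean`. Note the differences: it reports cumulative CPU time since each thread started (not the interval percentage that `top` shows), and the IDs are Java thread IDs rather than the native thread IDs printed by `top -H`. The class name `ThreadCpu` is an assumption made for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: rank JVM threads by cumulative CPU time (hypothetical helper). */
final class ThreadCpu {

    /** Returns "name (id=N)" -> cumulative CPU nanoseconds, busiest thread first. */
    static List<Map.Entry<String, Long>> busiestThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Map.Entry<String, Long>> usage = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return usage; // CPU time measurement is unavailable on this JVM
        }
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread is no longer alive
            if (cpuNanos >= 0 && info != null) {
                usage.add(Map.entry(info.getThreadName() + " (id=" + id + ")", cpuNanos));
            }
        }
        usage.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return usage;
    }

    public static void main(String[] args) {
        // Print the five threads that have consumed the most CPU so far.
        busiestThreads().stream().limit(5).forEach(e ->
                System.out.printf("%-40s %d ms%n", e.getKey(), e.getValue() / 1_000_000));
    }
}
```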
Case Study
Major Trading app in N. America
https://blog.fastthread.io/2020/04/23/troubleshooting-cpu-spike-in-a-major-trading-application/
Predicting Concurrency issues
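// Only one thread at a time can run a synchronized method on the same object;
// other callers wait in the BLOCKED state until the lock is released.
public synchronized void getData() {
    doSomething();
}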
[Diagram: Concurrency Problem. Labels: Thread 1, Thread 2, BLOCKED THREADS]
Micro-Metric: BLOCKED state threads
Source: Thread Dump
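The BLOCKED-thread count can be illustrated from inside a JVM with `Thread.getState()`. In the sketch below, one thread holds a lock while several others try to enter the same `synchronized` block, and the contenders are then counted. The class name `BlockedThreads` and the `demo` scenario are assumptions for illustration. A monitoring setup would normally read thread states from a captured thread dump instead.

```java
import java.util.concurrent.CountDownLatch;

/** Sketch: count threads in the BLOCKED state (hypothetical helper). */
final class BlockedThreads {

    /** Counts live threads in this JVM that are currently BLOCKED. */
    static long countBlocked() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getState() == Thread.State.BLOCKED)
                .count();
    }

    /** Demo: one thread holds a lock while `waiters` other threads try to acquire it. */
    static long demo(int waiters) {
        Object lock = new Object();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                lockHeld.countDown();
                try {
                    release.await(); // keep the lock until the count has been taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        try {
            holder.start();
            lockHeld.await();
            Thread[] contenders = new Thread[waiters];
            for (int i = 0; i < waiters; i++) {
                contenders[i] = new Thread(() -> {
                    synchronized (lock) {
                        // entered only after the holder releases the lock
                    }
                });
                contenders[i].start();
            }
            for (Thread t : contenders) {
                while (t.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1); // wait until the contender is waiting for the lock
                }
            }
            return countBlocked();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            release.countDown(); // let the holder and the contenders finish
        }
    }

    public static void main(String[] args) {
        System.out.println("BLOCKED threads: " + demo(3));
    }
}
```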
Case Study
Major Leisure Travel Service Provider
https://blog.fastthread.io/2020/04/23/troubleshooting-cpu-spike-in-a-major-trading-application/
Micro-Metric: Specific Errors/Exceptions
Source: Application Logs
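Counting specific errors and exceptions in application logs can be approximated with a regular expression over the log lines. The sketch below tallies Java-style exception and error class names. The class name `LogErrorCounter`, the regular expression, and the sample log lines are all assumptions made up for this example, and real log formats will likely need a different pattern.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: tally exception/error class names found in log lines (hypothetical helper). */
final class LogErrorCounter {

    // Matches names such as java.sql.SQLException or java.lang.OutOfMemoryError.
    private static final Pattern ERROR_NAME =
            Pattern.compile("\\b(?:[a-z_]\\w*\\.)*[A-Z]\\w*(?:Exception|Error)\\b");

    /** Returns exception/error name -> number of occurrences across all lines. */
    static Map<String, Integer> count(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = ERROR_NAME.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> lines = List.of(
                "2023-06-11 22:56:32 ERROR java.sql.SQLException: Connection timed out",
                "2023-06-11 22:56:40 WARN  retrying request",
                "2023-06-11 22:57:01 ERROR java.lang.OutOfMemoryError: Java heap space",
                "2023-06-11 22:57:05 ERROR java.sql.SQLException: Connection timed out");
        count(lines).forEach((name, n) -> System.out.println(name + " x " + n));
    }
}
```

Tracking these counts per interval, rather than as a running total, makes a sudden rise in one error type easier to spot.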
Micro-Metrics Monitoring Architecture
Components: My App and yCrash agent (Container/Machine); yCrash Server (Cloud/On-premise)
1. Every 3 minutes, Micro-Metrics* are captured
2. Metrics are transmitted
3. ML, Patterns applied on the Micro-Metrics
4. If a problem is forecasted, 360° data capture is triggered
* Micro-Metrics:
1. Garbage Collection Log
2. Thread Dump + top -H
3. Application Log
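The four numbered steps of the monitoring architecture (capture every 3 minutes, transmit, apply ML/patterns, trigger 360° data capture when a problem is forecasted) can be sketched as a simple scheduled loop. Everything in this example is hypothetical: the class `MicroMetricsLoop`, its methods, and the "GC throughput below 96%" rule are stand-ins chosen for illustration and do not describe how the yCrash agent or server is implemented. Transmission (step 2) is omitted.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Sketch of a capture/evaluate/trigger cycle (hypothetical, not the yCrash implementation). */
final class MicroMetricsLoop {

    /** Runs one cycle and returns true if a problem was forecast. */
    static <M> boolean runCycle(Supplier<M> captureMetrics,
                                Predicate<M> forecastsProblem,
                                Runnable capture360Data) {
        M metrics = captureMetrics.get();                 // step 1: capture micro-metrics
        boolean problem = forecastsProblem.test(metrics); // step 3: stand-in for ML/patterns
        if (problem) {
            capture360Data.run();                         // step 4: trigger full data capture
        }
        return problem;
    }

    /** Schedules the given cycle every 3 minutes; the caller shuts the scheduler down. */
    static ScheduledExecutorService startEvery3Minutes(Runnable cycle) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(cycle, 0, 3, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        // Assumed rule, for illustration only: flag GC throughput below 96%.
        boolean problem = MicroMetricsLoop.<Double>runCycle(
                () -> 94.0,
                throughput -> throughput < 96.0,
                () -> System.out.println("capturing 360-degree data"));
        System.out.println("problem forecast: " + problem);
    }
}
```

Passing the capture, evaluation, and trigger steps in as functions keeps the loop independent of any particular metric source.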
Ram Lakshmanan
ram@tier1app.com
@tier1app
https://www.linkedin.com/company/ycrash
This deck will be published in: https://blog.ycrash.io
Learn to troubleshoot like a pro with my online training program
THANK YOU FRIENDS
