Shift-Left SRE - Self-healing on
OpenShift with Ansible
#RedHatOSD
Jürgen Etzlstorfer, Technology Strategist
@jetzlstorfer
A relentless pursuit of software perfection
1600 Employees 5000 Enterprise Customers 79 of the Global 100
confidential
If you write applications,
they will break eventually
~ Murphy’s law
On average, a single transaction uses 82 different types of technology
Browser
Multi-geo
Mobile Network
Code
Hosts
Logs
IoT
3rd parties
Services
Cloud SDN
Containers
Applications are getting more complex!
What if you had
something similar to
a self-healing robot?
But how?
Building auto-remediation into your pipeline
commit ✓ build ✓ test ✓ stage ✓ prod ✓
commit ✓ build ✓ test ✓ stage ✓ prod ✗
commit ✓ build ✓ test ✗: Break the pipeline early
Include remediation-as-code
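The "break the pipeline early" idea can be sketched as a runner that stops at the first failing stage, with a gate that rejects a release shipped without remediation code. This is only an illustration: the stage names come from the slide, while `run_pipeline`, `has_remediation`, and the `remediation_playbook` key are hypothetical names, not part of any tool shown in the talk.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stage names come from the slide; everything else here is hypothetical.
STAGES = ["commit", "build", "test", "stage", "prod"]


def run_pipeline(
    checks: Dict[str, Callable[[], bool]]
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failing check.

    Returns the stages that passed and the stage that broke the pipeline
    (None if every stage passed).
    """
    passed: List[str] = []
    for stage in STAGES:
        check = checks.get(stage, lambda: True)  # a stage without a check passes
        if not check():
            return passed, stage  # break early, later stages never run
        passed.append(stage)
    return passed, None


def has_remediation(artifact: Dict[str, object]) -> bool:
    """Gate (assumed): a release must ship with remediation-as-code."""
    return bool(artifact.get("remediation_playbook"))
```

Wiring `has_remediation` into the `test` stage, for example `run_pipeline({"test": lambda: has_remediation(artifact)})`, stops a release without a playbook before it reaches stage or prod.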
OpenShift Container Platform
Dynatrace Software Intelligence
Ansible Automation
Self-healing applications
Self-healing building blocks
Automation via APIs
▪ Automation/Execution: perform mitigation/remediation actions
▪ Access to all systems
▪ Monitoring: know what’s going on in your applications
▪ End-to-end
▪ Full-stack – fully integrated in production
Full stack, powered by Dynatrace OneAgent
Self-healing with Ansible Tower and Dynatrace
● APIs are key to enabling automation
● Ansible Tower provides a rich API for managing Ansible jobs
● Playbooks can be orchestrated in workflows and job templates
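As a rough sketch of what "managing Ansible jobs via API" can look like, the snippet below builds (but does not send) a request that launches a job template through Tower's REST API. The host name, template ID, token, and the `problem_id` variable are placeholders chosen for illustration; the launch path follows Tower's v2 API, and bearer-token authentication is assumed to be enabled on the Tower instance.

```python
import json
import urllib.request


def build_launch_request(
    tower_url: str, template_id: int, token: str, extra_vars: dict
) -> urllib.request.Request:
    """Build a POST request that launches a Tower job template.

    Nothing is sent here; pass the result to urllib.request.urlopen to run it.
    """
    url = f"{tower_url.rstrip('/')}/api/v2/job_templates/{template_id}/launch/"
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: an OAuth2 token has been created for this caller.
            "Authorization": f"Bearer {token}",
        },
    )
```

For example, `build_launch_request("https://tower.example.com", 7, token, {"problem_id": "P-1"})` yields a request that a monitoring tool or script could send when a problem opens. The job template must allow extra variables at launch for `extra_vars` to take effect.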
How to enable auto-remediation
Full-stack environment is monitored
Anomalies are detected automatically
Root cause analysis is performed
Problem notification is sent
Event is received
Job is triggered
Playbook is executed
Problem is remediated
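The receiving half of this flow (event received, job triggered) might look roughly like the sketch below: parse the notification, ignore anything that is not an open problem, and pick a job to run. The payload field names (`State`, `ProblemTitle`, `PID`, `ImpactedEntities`) are assumptions about how the notification was configured, and the keyword-to-template mapping is entirely hypothetical.

```python
import json
from typing import Optional

# Hypothetical mapping from problem-title keywords to job template IDs.
JOB_TEMPLATES = {"failure rate": 42, "cpu": 43}


def select_job(notification_body: str) -> Optional[dict]:
    """Decide which remediation job (if any) a problem notification triggers."""
    event = json.loads(notification_body)
    if event.get("State") != "OPEN":
        return None  # resolved problems need no remediation job
    title = event.get("ProblemTitle", "").lower()
    for keyword, template_id in JOB_TEMPLATES.items():
        if keyword in title:
            return {
                "template_id": template_id,
                # Passed to the playbook so it knows what to remediate.
                "extra_vars": {
                    "problem_id": event.get("PID"),
                    "impacted": event.get("ImpactedEntities", []),
                },
            }
    return None  # unknown problem type: leave it to a human
```

Returning `None` for unmatched problems keeps the automation conservative: only problem types with a known remediation trigger a playbook.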
Ansible Tower integration in Dynatrace
What we will see in the demo
▪ TicketMonster application running on OpenShift
▪ Full-stack, end-to-end monitoring by Dynatrace
▪ Feature release via Ansible Tower
▪ Auto-remediation as code (Ansible playbooks)
DEMO TIME :)
What we have seen in the demo – short recap
Release of a new feature
Dynatrace detects increase of failure rate
Dynatrace fires a problem notification to Ansible Tower
Ansible Tower kicks off a playbook
Check for latest deployment with remediation scripts
Remediation script is executed
Problem is remediated
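The "check for latest deployment with remediation scripts" step can be illustrated with a small lookup over recent events on the affected service. The event shape used here (`type`, `timestamp`, `remediation`) is a simplified, hypothetical one chosen for the example, not the format used in the demo.

```python
from typing import List, Optional


def latest_remediation(events: List[dict]) -> Optional[str]:
    """Return the remediation action of the newest deployment that has one."""
    deployments = [
        e for e in events
        if e.get("type") == "deployment" and e.get("remediation")
    ]
    if not deployments:
        return None  # nothing to run; a human has to take over
    newest = max(deployments, key=lambda e: e["timestamp"])
    return newest["remediation"]
```

The playbook would then execute whatever the returned reference points to, for instance a URL of another job template that disables the newly released feature.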
Self-healing in an enterprise environment
Escalate at 2AM? Auto Mitigate!
1 CPU Exhausted? Add a new service instance!
2 High Garbage Collection? Adjust/Revert Memory Settings!
3 Issue with BLUE only? Switch back to GREEN!
4 Hung threads? Restart Service!
5 Still ongoing? Initiate Rollback!
Impact Mitigated?
Update Dev Tickets
Mark Bad Commits
Still ongoing? Escalate?
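One way to read this slide is as an ordered ladder: try each remediation whose symptom matches, check whether the impact is mitigated, and escalate only when nothing helped. The sketch below follows that reading; `Step`, `auto_mitigate`, and the callback signatures are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    applies: Callable[[dict], bool]  # does the symptom match this problem?
    action: Callable[[dict], None]   # hypothetical remediation hook


def auto_mitigate(
    problem: dict,
    steps: List[Step],
    mitigated: Callable[[dict], bool],
) -> Tuple[str, List[str]]:
    """Walk the ladder in order; stop as soon as the impact is mitigated."""
    attempted: List[str] = []
    for step in steps:
        if not step.applies(problem):
            continue  # symptom does not match, try the next rung
        step.action(problem)
        attempted.append(step.name)
        if mitigated(problem):
            return "mitigated", attempted
    # Every applicable step ran and the problem is still ongoing.
    return "escalate", attempted
```

The list of attempted steps is returned so that follow-up actions such as updating dev tickets or marking bad commits can record what was tried.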
Embed auto-remediation in your CI/CD pipeline
Shift-Left: Break Pipeline Earlier
Path to NoOps: Self-Healing, …
Shift-Right: Tags, Deploys, Events
Actionable Feedback Loops
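For the shift-right part (tags, deploys, events), a pipeline can tell the monitoring system about each deployment and attach a pointer to its remediation. The payload below is a sketch modeled on the shape of a Dynatrace custom deployment event; treat the field names as assumptions to verify against the current API documentation, and `remediationAction` as a made-up custom property.

```python
def deployment_event(service_tag: str, version: str, remediation_url: str) -> dict:
    """Build a deployment event payload to push after a release."""
    return {
        "eventType": "CUSTOM_DEPLOYMENT",
        "source": "Ansible Tower",
        "deploymentName": f"Release {version}",
        "deploymentVersion": version,
        # Attach the event to every service carrying the given tag.
        "attachRules": {
            "tagRule": [
                {
                    "meTypes": ["SERVICE"],
                    "tags": [{"context": "CONTEXTLESS", "key": service_tag}],
                }
            ]
        },
        # Hypothetical custom property: where to find the remediation job.
        "customProperties": {"remediationAction": remediation_url},
    }
```

Sending such an event at release time is what lets a later remediation run find the latest deployment and the action that belongs to it.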
https://www.ansible.com/blog/enable-self-healing-applications-with-ansible-and-dynatrace
https://www.dynatrace.com/news/blog/set-up-ansible-tower-with-dynatrace-to-enable-your-self-healing-applications/
OpenShift ♥ Dynatrace
“Beyond years of industry knowledge in the APM space, Dynatrace offers one of the best solutions I’ve seen
for monitoring applications running on OpenShift. What really distinguishes them from others is the use of
artificial intelligence based root-cause analysis. OpenShift is a platform to allow you to run decoupled services
and applications, which can be a monitoring nightmare, but Dynatrace’s insights makes it less scary.”
Chris Morgan, Technical Director – Red Hat OpenShift Ecosystem
THANK YOU FOR YOUR ATTENTION
Jürgen Etzlstorfer, Technology Strategist

