How to build your own auto-remediation workflow - Ansible Meetup Munich

Jürgen Etzlstorfer
@jetzlstorfer
Technology Strategist
How to build your own auto-remediation workflow for your applications using Ansible
Ansible Meetup Munich, 10th July 2018

confidential
The journey – Part 1

The journey – Part 2
• What is (auto-)remediation and why you need it
• How to build your auto-remediation workflow?
• Demo Time!
• Outlook: embed auto-remediation in your CI/CD pipeline

Applications are getting more complex!
On average, a single transaction uses 82 different types of technology:
Browser, Multi-geo, Mobile Network, Code, Hosts, Logs, IoT, 3rd parties, Services, Cloud SDN, Containers

If you write applications, they will break eventually
~ Murphy’s law

What if you had something similar to a self-healing robot?

What is needed for self-healing applications?
• Monitoring: know what’s going on in your applications
  • End-to-end
  • Full-stack – fully integrated in production (or even in staging)
• Automation/Execution: perform mitigation/remediation actions
  • Access to all systems
  • Automation system should be isolated from production system
APIs

Know what’s going on in your applications
• Monitor your applications
• Identify the root cause of the problem!

Auto-remediation with Ansible (Tower)
• APIs are key to enabling automation
• Ansible Tower makes extensive use of APIs internally and also exposes them externally
• Ansible playbooks are scripts that are executed from a central host on different machines
• Multiple operating systems are supported
• Tasks are designed to be idempotent: running them again leaves an already-correct system unchanged
• Playbooks can be orchestrated in workflows and job templates
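
As a rough illustration of the externally exposed API, a job template can be launched with a single REST call. The sketch below assumes Tower's v2 REST API and token-based authentication (the mechanism may differ depending on the Tower version). The host `tower.example.com`, the job template ID 42, the token, and the `reason` variable are hypothetical placeholders and do not come from the talk. It also assumes the job template is configured to accept extra variables at launch.

```yaml
---
- name: launch a Tower job template via the REST API (illustrative sketch)
  hosts: localhost
  gather_facts: no
  vars:
    tower_url: "https://tower.example.com"   # hypothetical host
    job_template_id: 42                       # hypothetical job template ID
    tower_token: "CHANGE_ME"                  # hypothetical API token
  tasks:
    - name: trigger the job template
      uri:
        url: "{{ tower_url }}/api/v2/job_templates/{{ job_template_id }}/launch/"
        method: POST
        headers:
          Authorization: "Bearer {{ tower_token }}"
        body_format: json
        body:
          extra_vars:
            reason: "manual test of the remediation job"   # hypothetical variable
        status_code: 201   # Tower answers a successful launch with 201 Created
      register: launch_result

    - name: show the id of the started job
      debug:
        msg: "Started job {{ launch_result.json.job }}"
```

A monitoring tool that can send an HTTP POST as its problem notification can call the same endpoint, which is what makes the job trigger in the workflow possible without any custom glue code.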

How to enable auto-remediation
1. Full-stack environment is monitored
2. Anomalies are detected automatically
3. Root cause analysis is performed
4. Problem notification is sent
5. Event is received
6. Job is triggered
7. Playbook is executed
8. Problem is remediated

Staging → Approve Staging → Production → Approve Production → Up and running
Scenario: How to mitigate a bad deployment?
Staging → Approve Staging → Production → Approve Production → Remediation → Rollback

---
- name: rollback to previous version
  hosts: localhost
  vars:
    ...
  tasks:
    # Leave a comment on the problem so the automation steps stay traceable
    - name: push comment to dynatrace
      uri:
        url: "{{ dtcommentapiurl }}"
        method: POST
        body_format: json
        body:
          comment: "Remediation playbook started."
          user: "{{ commentuser }}"
          context: "Ansible Tower"

    # Runs once per impacted entity and collects all responses
    - name: fetch custom deployment events
      uri:
        url: "{{ dtdeploymentapiurl }}"
        return_content: yes
      with_items: "{{ impactedEntities }}"
      register: customproperties
      ignore_errors: no

    - name: parse deployment events
      set_fact:
        deployment_events: "{{ item.json.events }}"
      with_items: "{{ customproperties.results }}"
      register: app_result

    # myItem and payload are not defined in the tasks shown on the slides
    - name: call remediation action
      uri:
        url: "{{ myItem.remediationAction }}"
        method: POST
        body_format: json
        body: "{{ payload | to_json }}"
        return_content: yes
      ignore_errors: yes
      register: result

    - name: push success comment to dynatrace
      uri:
        url: "{{ dtcommentapiurl }}"
        method: POST
        body_format: json
        body:
          comment: "Invoked remediation action successfully executed: {{ result.content }}"
          user: "{{ commentuser }}"
          context: "Ansible Tower"
      when: result.status == 200

    - name: push error comment to dynatrace
      ...
        body:
          comment: "Invoked remediation action failed: {{ result.content }}"
          user: "{{ commentuser }}"
          context: "Ansible Tower"
      when: result.status != 200

Steps to mitigate the bad deployment
1. Fetch information about the event
2. Process the data
3. Select the corresponding remediation action
4. Execute the remediation action
Keep track of all automation steps
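
The "select" step could be sketched as follows. This is an assumption about how the selection might work, not the playbook from the talk: it takes the most recent deployment event that carries a `remediationAction` property. The sample events, the `startTime` field used for ordering, and the URLs are hypothetical stand-ins for data fetched from the monitoring API.

```yaml
---
- name: select a remediation action (illustrative sketch)
  hosts: localhost
  gather_facts: no
  vars:
    # Hypothetical sample data standing in for the fetched deployment events
    deployment_events:
      - startTime: 1531100000000
        remediationAction: "https://tower.example.com/api/v2/job_templates/1/launch/"
      - startTime: 1531200000000
        remediationAction: "https://tower.example.com/api/v2/job_templates/2/launch/"
      - startTime: 1531300000000   # newest event, but without a remediation action
  tasks:
    - name: pick the most recent event that defines a remediation action
      set_fact:
        myItem: >-
          {{ deployment_events
             | selectattr('remediationAction', 'defined')
             | sort(attribute='startTime')
             | last }}

    - name: show the selected action
      debug:
        msg: "Would call {{ myItem.remediationAction }}"
```

Filtering before sorting means events without a remediation action never win, even if they are the newest. If no event defines one, the selection is empty and the debug task fails, so a real playbook would need a guard for that case.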

Auto-remediation as a safety net
It does not fix your problem

https://blogs.msdn.microsoft.com/visualstudioalmrangers/2017/04/17/set-up-a-cicd-pipeline-for-your-team-services-extension/

Embed auto-remediation in your CI/CD pipeline
Shift-Left: Break Pipeline Earlier
Path to NoOps: Self-Healing, …
Shift-Right: Tags, Deploys, Events
Actionable Feedback Loops

Shift-left: engage Dev with earlier & automated feedback
Injecting speed & quality: automatic gate at test & performance
• Continuous Performance Validation for daily builds
• Root Cause details automatically pushed to JIRA
• Decisions made to compare, break, or good-to-go

Shift-right: empower Ops with more context to react faster

https://github.com/Dynatrace/AWSDevOpsTutorial
pushDynatraceDeploymentEvent: Pushes Deployment Info to Dynatrace Entities
validateBuildDynatraceWorker: Compares Builds and Approves/Rejects Pipeline
pushDynatraceDeploymentEvent: Pushes Deployment Info to Dynatrace Entities
validateBuildDynatraceWorker: Validates Production and Approves/Rejects Pipeline
handleDynatraceProblemNotification: Executes Auto-Remediating Actions, e.g. Rollback
Diagram labels: Build 6, Build 7, Production, Production; each validation ends in Auto-Approve! or Auto-Reject!
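
Pushing deployment information is the step that later lets a remediation playbook find out what was deployed and how to undo it. A minimal sketch of such a push from Ansible, assuming the Dynatrace events API (v1) and an API token. The tenant URL, token, tag, deployment name and version, and the `remediationAction` value are hypothetical placeholders, not values from the talk or the tutorial.

```yaml
---
- name: push a deployment event to Dynatrace (illustrative sketch)
  hosts: localhost
  gather_facts: no
  vars:
    dt_tenant_url: "https://abc12345.live.dynatrace.com"   # hypothetical tenant
    dt_api_token: "CHANGE_ME"                               # hypothetical API token
  tasks:
    - name: send a CUSTOM_DEPLOYMENT event
      uri:
        url: "{{ dt_tenant_url }}/api/v1/events"
        method: POST
        headers:
          Authorization: "Api-Token {{ dt_api_token }}"
        body_format: json
        body:
          eventType: CUSTOM_DEPLOYMENT
          source: Ansible
          deploymentName: "Build 7"          # hypothetical
          deploymentVersion: "7"             # hypothetical
          # Attach the event to all services carrying this (hypothetical) tag
          attachRules:
            tagRule:
              - meTypes: [SERVICE]
                tags:
                  - context: CONTEXTLESS
                    key: my-service
          # Free-form properties; a remediation playbook can read them back later
          customProperties:
            remediationAction: "https://tower.example.com/api/v2/job_templates/42/launch/"
        return_content: yes
      register: event_result
```

Storing the rollback endpoint alongside the deployment keeps the remediation logic generic: the playbook only has to call whatever action the event names.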

How to start building your own remediation workflow?
1. Monitor your environment
2. Define your runbooks
3. Start small, with the low-hanging fruit
   • What are frequent issues?
   • Of these, which ones are easy to deal with?
4. Build more and more automation along the way
Cultural Change!

www.dynatrace.com
Jürgen Etzlstorfer
Technology Strategist
juergen.etzlstorfer@dynatrace.com
@jetzlstorfer
Thank you!

dynatrace.com/trial
