It’s Not Your Fault
(Blameless) post-mortems
Jason Hand
DevOps “Handyman”
jason@VictorOps.com
@jasonhand
A little about me…
Dir. of Platform Support - AppDirect
Dir. of Technical Support - Standing Cloud
Dir. of Operational Systems - American Fasteners, Inc.
Hiker, climber, brewer, runner, biker, boarder, surfer,
painter, singer, reader, writer, picker, coder, racer,
camper, volunteer …. all the usual “Colorado 1-upper” crap.
Alternative names
Also known as:
(Note: Public & Internal)
Project Retrospectives
Post-mortem analysis
Post-project review
Project Analysis Review
Quality Improvement Review
Autopsy Review
Santayana Review
After Action Review
Touchdown Meeting
Post-mortem Defined
What?
A process intended to inform improvements by determining aspects that were successful or unsuccessful.
When?
As soon as feasible after the incident is resolved.
Who?
Everybody
Why?
To communicate with your team
To understand what happened for learning and improving
How?
Talk about the incident timeline
Escalation steps
What was done to resolve the problem
Create a remediation plan
Make it available
The Three R’s (simple format)
Regret: Acknowledgement and apology
Reason: Initial incident detection to resolution, including the so-called “root causes.”
Remedy: Actionable remediation items
– Dave Zwieback, VP Engineering - Next Big Sound
Use SMART recommendations (Remedy)
Specific
Measurable
Agreed Upon/Agreeable
Realistic
Timebound
Moving from Reaction to Action
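One way to make the SMART checklist concrete is to record each remediation item with a field per criterion and flag whichever ones are still empty. The sketch below is only an illustration: the class name, field names, and the mapping of each field to a criterion are assumptions, not something the talk prescribes.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RemediationItem:
    """One remediation item from a post-mortem (all field names are hypothetical)."""
    action: str                    # Specific: what exactly will be done
    success_metric: Optional[str]  # Measurable: how we will know it worked
    owner: Optional[str]           # Agreed Upon: who has accepted the work
    realistic: bool                # Realistic: the team judged it achievable
    due: Optional[date]            # Timebound: when it should be finished


def missing_smart_criteria(item: RemediationItem) -> List[str]:
    """Return the SMART criteria this item does not yet satisfy."""
    missing = []
    if not item.action.strip():
        missing.append("Specific")
    if not item.success_metric:
        missing.append("Measurable")
    if not item.owner:
        missing.append("Agreed Upon")
    if not item.realistic:
        missing.append("Realistic")
    if item.due is None:
        missing.append("Timebound")
    return missing


# A vague item: it names an action but has no metric, owner, or date.
vague = RemediationItem("Improve monitoring", None, None, True, None)
print(missing_smart_criteria(vague))  # ['Measurable', 'Agreed Upon', 'Timebound']
```

A check like this cannot judge whether an action is truly specific or realistic. It only surfaces items that were written down without a metric, an owner, or a date.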
Blameless
(image from “Across the Universe”)
Cool story, bro
2011 - Hired to Standing Cloud
Cloud marketplace & automated deployment of apps
Build Support team
Provide Managed services
“Reprimanding bad apples may seem like a quick and rewarding fix, but it’s like peeing in your pants. You feel relieved and perhaps even nice and warm for a little while, but then it gets cold and uncomfortable. And you look like a fool.”
– Sidney Dekker
Quote first seen in J. Paul Reed’s “A Look at Looking in the Mirror”
What is a blameless post-mortem?
Team members are accountable but not responsible
Complete Transparency
Deeper look at circumstances
What happened and how to improve it (specific details)
Real conditions of failure in complex systems
“Your organization must continually affirm that individuals are NEVER the ‘root cause’ of outages.”
– Dave Zwieback
Paraphrased from “Fallible Humans” by Ian Malpass - DevOpsDays - Minneapolis
source: http://www.indecorous.com/fallible_humans/
ETTO (Efficiency Thoroughness Trade Off)
The trade off between being efficient vs being thorough
“We can be thorough and really dig into the task at hand and understand it well but this takes time: it is inefficient.”
- Ian Malpass
Cause & Effect
There are many factors that played a part in the problem
source: http://xkcd.com
“may be”
Stress & Cognitive Bias
Yerkes-Dodson Model
source: The Human Side of Postmortems
Reduce Stress?
… build muscle memory
Simulate many types of problems and outages as “practice” …
Evaluative Threat
Being negatively judged plays a big role in stress
What is stress surface?
Variables of a situation:
Novel or unusual
Unpredictable
Uncontrollable situation
Negative judgement (evaluative threats)
ALSO:
Lack of sleep
Problems at home
Health
Relationships
Etc…
Capturing the Human-side
Ask questions
Stress Questionnaire
During the outage, how often have you felt or thought that:
The situation was novel or unusual?
The situation was unpredictable?
You were unable to control the situation?
Others could judge your actions negatively?
0 = Never 1 = Almost Never 2 = Sometimes 3 = Fairly Often 4 = Very Often
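The questionnaire can be captured as a small script so responses are recorded the same way after every incident. The slide gives only the four questions and the 0-4 scale; summing the answers and reporting the highest-rated question are assumptions made here for illustration, and the key names are hypothetical.

```python
from typing import Dict

# Short keys for the four questions on the slide (key names are hypothetical).
QUESTIONS = (
    "novel_or_unusual",
    "unpredictable",
    "unable_to_control",
    "negative_judgement",
)

# The answer scale exactly as given on the slide.
SCALE = {0: "Never", 1: "Almost Never", 2: "Sometimes", 3: "Fairly Often", 4: "Very Often"}


def score_responses(responses: Dict[str, int]) -> Dict[str, object]:
    """Validate one person's answers and summarize them.

    The total and the 'highest' question are illustrative aggregates;
    the talk does not define how the answers should be combined.
    """
    missing = [q for q in QUESTIONS if q not in responses]
    if missing:
        raise ValueError(f"missing answers: {missing}")
    for q in QUESTIONS:
        if responses[q] not in SCALE:
            raise ValueError(f"answer for {q} must be an integer from 0 to 4")
    total = sum(responses[q] for q in QUESTIONS)
    highest = max(QUESTIONS, key=lambda q: responses[q])
    return {
        "total": total,
        "max_possible": 4 * len(QUESTIONS),
        "highest": highest,
        "highest_label": SCALE[responses[highest]],
    }


answers = {
    "novel_or_unusual": 3,
    "unpredictable": 2,
    "unable_to_control": 4,
    "negative_judgement": 1,
}
print(score_responses(answers))
```

In a blameless setting the per-question answers are probably more useful than the total: a high score on negative judgement, for example, points at evaluative threat rather than at the incident itself.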
Why we don’t punish
De-incentivized to give the details
Practically guarantees a repeat of the problem
Understand why actions made sense (at the time)
Create safety AND accountability
Move away from idea of “individuals are problems”
Create new “experts”
Promoting from within
Where do we start?
The basics:
• Document your timeline or log data
• Document conversations
• Leave room for notes
• Mean time to resolution / Time calculations
• Level of severity
• Archive it for historical retrieval
• Remediation. Make it actionable
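The timeline and time-calculation items above can be sketched in a few lines of Python. This is a minimal illustration, assuming a timeline is a list of (timestamp, note) pairs whose earliest entry is detection and whose latest entry is resolution. The function names, dates, and notes are made up for the example.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple

# A timeline is a list of (timestamp, note) entries for one incident.
Timeline = List[Tuple[datetime, str]]


def time_to_resolution(timeline: Timeline) -> timedelta:
    """Elapsed time between the earliest and latest timeline entries.

    Assumes the earliest entry marks detection and the latest marks resolution.
    """
    if len(timeline) < 2:
        raise ValueError("a timeline needs at least a detection and a resolution entry")
    times = sorted(t for t, _ in timeline)
    return times[-1] - times[0]


def mean_time_to_resolution(timelines: List[Timeline]) -> timedelta:
    """Average time to resolution across several incidents."""
    seconds = mean([time_to_resolution(t).total_seconds() for t in timelines])
    return timedelta(seconds=seconds)


# Hypothetical incidents, invented for illustration only.
incident_a = [
    (datetime(2015, 3, 1, 10, 0), "Alert fired"),
    (datetime(2015, 3, 1, 10, 5), "Escalated to secondary on-call"),
    (datetime(2015, 3, 1, 10, 45), "Resolved"),
]
incident_b = [
    (datetime(2015, 3, 9, 22, 10), "Detected"),
    (datetime(2015, 3, 9, 22, 25), "Resolved"),
]

print(time_to_resolution(incident_a))                         # 0:45:00
print(mean_time_to_resolution([incident_a, incident_b]))      # 0:30:00
```

Keeping the note next to each timestamp means the same record serves both the time calculations and the narrative of what was done to resolve the problem.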
Tools
Etsy’s Morgue
VictorOps
Post-mortem Report
Internal Wiki
Seek the truth
Don’t blame others …
Don’t blame yourself
Thank You
Questions ?
Resources
“The Human Side of Postmortems” - Dave Zwieback
“The Field Guide to Understanding Human Error” - Sidney Dekker
“A Look at Looking in the Mirror” - J. Paul Reed
“Fallible Humans” - Ian Malpass (http://www.indecorous.com/fallible_humans/)
“4 Questions to ask for an effective Technical Post Mortem” - Jeffrey O’Brien (http://www.maintenanceassistant.com/blog/4-questions-effective-technical-post-mortem/)
“Nine steps to IT post-mortem excellence” - Michael Krigsman (http://www.zdnet.com/blog/projectfailures/nine-steps-to-it-post-mortem-excellence/1069)
“Postmortem reviews: purpose and approaches in software engineering” - Torgeir Dingsøyr (http://www.uio.no/studier/emner/matnat/ifi/INF5180/v10/undervisningsmateriale/reading-materials/p08/post-mortems.pdf)
“Blameless PostMortems and a Just Culture” - John Allspaw (http://codeascraft.com/2012/05/22/blameless-postmortems/)
“What blameless really means” - Jessica Harllee (http://www.jessicaharllee.com/notes/what-blameless-really-means/)
“Each necessary, but only jointly sufficient” - John Allspaw (http://www.kitchensoap.com/2012/02/10/each-necessary-but-only-jointly-sufficient/)

It's Not Your Fault - Blameless Post-mortems
