Practical Operability
Techniques for Teams
Matthew Skelton
Skelton Thatcher Consulting
skeltonthatcher.com / @SkeltonThatcher
Agile in the City, Bristol 2017, London – 02 Nov 2017
Today
What is operability?
Modern logging
Run Book dialogue sheets
Endpoint healthchecks
Correlation IDs
User Personas for dashboards
You
Software Developer
Tester / QA
DevOps Engineer
Team Leader
Head of Department
Operability:
use modern logging, Run Book
dialogue sheets, endpoint
healthchecks, correlation IDs,
and user personas as
team collaboration techniques
Practical operability techniques for teams - Matthew Skelton - Agile in the City Bristol 2017
About me
Matthew Skelton
@matthewpskelton
Co-founder at
Skelton Thatcher Consulting
skeltonthatcher.com
Team Guide to
Software Operability
Matthew Skelton & Rob Thatcher
skeltonthatcher.com/publications
Download a free sample chapter
Team Guide series
#Operability #BusinessMetrics #Testability #Releasability
skeltonthatcher.com/publications
What is operability?
Operability
making software work well
in Production
Logging with Event IDs
logging with Event IDs
reduce time to detect problems
increase team engagement
increase configurability
enhance collaboration
#operability
search by event:
Event ID {Delivered, InTransit, Arrived}
transaction trace:
Correlation ID 612999958…
How many distinct event
types (state transitions)
in your application?
represent distinct states
enum
Human-readable sets:
unique values, sparse,
immutable
C#, Java, Python, node
(Ruby, PHP, …)
Technical
Domain
public enum EventID
{
    // Badly-initialised logging data
    NotSet = 0,

    // An unrecognised event has occurred
    UnexpectedError = 10000,

    ApplicationStarted = 20000,
    ApplicationShutdownNoticeReceived = 20001,

    MessageQueued = 40000,
    MessagePeeked = 40001,

    BasketItemAdded = 60001,
    BasketItemRemoved = 60002,

    CreditCardDetailsSubmitted = 70001,
    // ...
}
BasketItemAdded = 60001
BasketItemRemoved = 60002
log using Event IDs
with a modern
‘structured logging’ library
example:
https://github.com/EqualExperts/opslogger
Sean Reilly
@seanjreilly
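As a minimal sketch of the idea in Python (one of the languages the slides name), the enum below mirrors a few values from the C# example and writes each log line as JSON. The names `format_event`, `log_event` and the `basket` logger are assumptions made for illustration; this is not the opslogger API.

```python
import json
import logging
from enum import IntEnum, unique


@unique  # reject accidental duplicate values
class EventID(IntEnum):
    """Sparse, human-readable event identifiers (values mirror the slide)."""
    NOT_SET = 0
    UNEXPECTED_ERROR = 10000
    APPLICATION_STARTED = 20000
    BASKET_ITEM_ADDED = 60001
    BASKET_ITEM_REMOVED = 60002


def format_event(event: EventID, **fields) -> str:
    """Render one log line as JSON so tools can search by event ID or name."""
    record = {"event_id": int(event), "event": event.name, **fields}
    return json.dumps(record, sort_keys=True)


logger = logging.getLogger("basket")  # hypothetical logger name


def log_event(event: EventID, **fields) -> None:
    """Emit the structured line through the standard logging module."""
    logger.info(format_event(event, **fields))
```

Because every line carries both the numeric ID and the name, a log search for `60001` or `BASKET_ITEM_ADDED` finds the same events.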
Example: video processing
On-demand processing of TV and
mobile streaming adverts
Ad-agency → TV broadcaster
High throughput
Glitch-free video & audio
Storage I/O
Worker Job
Queue
Upload
Example: video processing
Discover processing bottlenecks
Trigger alerts via LogEntries /
HostedGraphite
Report on KPIs
Target areas for improvement
http://honeycomb.tv/
Run Book dialogue sheets
Run Book dialogue sheets
Checklists for typical operational
considerations
Team-friendly exploration
runbooktemplate.info
Run Book dialogue sheets
System characteristics
Hours of operation
During what hours does the service or system actually need to operate? Can portions or features of the
system be unavailable at times if needed?
Hours of operation - core features
(e.g. 03:00-01:00 GMT+0)
Hours of operation - secondary features
(e.g. 07:00-23:00 GMT+0)
Data and processing flows
How and where does data flow through the system? What controls or triggers data flows?
(e.g. mobile requests / scheduled batch jobs / inbound IoT sensor data )
…
http://runbooktemplate.info/
Endpoint healthchecks
endpoint healthchecks
Every runnable app/service/daemon
exposes /status/health
An HTTP GET to the endpoint returns:
200 – "I am healthy"
500 – "I am sick"
endpoint healthchecks
Each component is responsible for
determining its own health and
viability – this is very contextual
endpoint healthchecks
Use JSON as a response type –
parsable by both
machines and humans!
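A minimal sketch of such an endpoint using only the Python standard library. The check names in `check_dependencies`, the JSON field names, and the port are assumptions for illustration; each component would substitute its own definition of healthy.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies() -> dict:
    """Hypothetical checks; each service decides what 'healthy' means for it."""
    return {"queue_reachable": True, "disk_space_ok": True}


def health_response(checks: dict) -> tuple:
    """Map check results to an HTTP status code and a JSON body."""
    healthy = all(checks.values())
    status = 200 if healthy else 500
    body = {"status": "healthy" if healthy else "sick", "checks": checks}
    return status, json.dumps(body, sort_keys=True).encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/status/health":
            self.send_error(404)
            return
        status, body = health_response(check_dependencies())
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the healthcheck server until interrupted (port is an assumption)."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` and then issuing an HTTP GET to `/status/health` returns the status code plus a JSON body that a monitoring tool can parse and a person can read.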
endpoint healthchecks
For databases and other non-HTTP
components, run a lightweight HTTP
service in front of the component
200 / 500 responses
Helper service
https://github.com/Lugribossk/simple-dashboard
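One way a helper service might decide between 200 and 500 is to run a cheap probe against the component and translate any failure into "sick". The sketch below assumes a SQLite database purely for illustration; `probe_sqlite` and `component_health` are hypothetical names, and a real helper would use the client library for its own database or queue.

```python
import sqlite3
from typing import Callable


def probe_sqlite(path: str) -> None:
    """Hypothetical probe: raise if a trivial query cannot be run."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("SELECT 1").fetchone()
    finally:
        conn.close()


def component_health(probe: Callable[[], None]) -> tuple:
    """Translate a probe's success or failure into a 200 / 500 result."""
    try:
        probe()
    except Exception as exc:  # any failure at all means "sick"
        return 500, {"status": "sick", "error": type(exc).__name__}
    return 200, {"status": "healthy"}
```

The helper's HTTP handler would then return the status code and the dictionary, serialised as JSON, from its `/status/health` endpoint.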
Correlation IDs
‘Unique-ish’ identifier for each request
Passed through downstream layers
Unique-ish ID
Synchronous HTTP:
X-HEADER e.g. X-trace-id
X-trace-id: 348e1cf8
If header is present, pass it on
(Yes, RFC 6648 deprecates the X- prefix, but this is internal only)
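The "pass it on if present" rule can be sketched in a few lines of Python. The header name comes from the slide; the function names and the choice of eight hex characters for a new ID are assumptions for illustration.

```python
import uuid
from typing import Optional

TRACE_HEADER = "X-trace-id"  # header name taken from the slide


def ensure_trace_id(incoming_headers: dict) -> str:
    """Reuse the caller's ID if present, otherwise mint a 'unique-ish' one."""
    for name, value in incoming_headers.items():
        # HTTP header names are case-insensitive
        if name.lower() == TRACE_HEADER.lower() and value:
            return value
    return uuid.uuid4().hex[:8]  # assumed length, matching the slide's example


def downstream_headers(trace_id: str, extra: Optional[dict] = None) -> dict:
    """Headers for an outgoing call, with the trace ID attached."""
    headers = dict(extra or {})
    headers[TRACE_HEADER] = trace_id
    return headers
```

A service would call `ensure_trace_id` once per incoming request, include the result in every log line, and pass `downstream_headers(trace_id)` to each outgoing HTTP call.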
Asynchronous (queues, etc.):
Message Attributes, name:value pair
e.g. "trace-id":"348e1cf8"
AWS SQS: SendMessage() / ReceiveMessage()
Log the Correlation ID if present
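For the asynchronous case, the sketch below builds the message attribute in the shape that boto3's SQS `send_message` accepts and reads it back from a received message. It does not call AWS; the function names, the `MessageReceived` event name and the `worker` logger are assumptions for illustration.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("worker")  # hypothetical logger name


def build_send_kwargs(queue_url: str, body: str, trace_id: str) -> dict:
    """Keyword arguments in the shape boto3's SQS send_message accepts."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageAttributes": {
            "trace-id": {"DataType": "String", "StringValue": trace_id}
        },
    }


def trace_id_from_message(message: dict) -> Optional[str]:
    """Extract the correlation ID from a received SQS message, if present."""
    attrs = message.get("MessageAttributes") or {}
    return attrs.get("trace-id", {}).get("StringValue")


def handle(message: dict) -> None:
    """Log the correlation ID if the message carries one."""
    trace_id = trace_id_from_message(message)
    if trace_id is not None:
        logger.info(json.dumps({"event": "MessageReceived", "trace_id": trace_id}))
```

With boto3 the producer would call `sqs.send_message(**build_send_kwargs(...))`. The consumer needs to request the attribute explicitly, for example `MessageAttributeNames=["trace-id"]` on `receive_message`, or the attribute is not returned.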
Example: electronic trading
High speed, low latency
Trading options & derivatives
Connected to stock exchanges
Sub-millisecond timings
> £1 million per day traded
Correlation IDs for trading
Evidence for timely operation
Help identify bottlenecks
Target areas for perf tuning
Identify race conditions
Increase operability
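To illustrate how correlation IDs help locate bottlenecks, the sketch below groups log records by ID and reports the elapsed time between consecutive events. The record fields (`trace_id`, `event`, `ts_us`) are assumptions for illustration and do not come from the trading system described.

```python
from collections import defaultdict


def hop_latencies(records: list) -> dict:
    """Group records by correlation ID; return elapsed microseconds per hop."""
    by_id = defaultdict(list)
    for rec in records:
        by_id[rec["trace_id"]].append(rec)

    result = {}
    for trace_id, recs in by_id.items():
        ordered = sorted(recs, key=lambda r: r["ts_us"])
        # Pair each event with the next one in time for the same request
        result[trace_id] = [
            (a["event"], b["event"], b["ts_us"] - a["ts_us"])
            for a, b in zip(ordered, ordered[1:])
        ]
    return result
```

The hop with the largest elapsed time across many requests is a natural first target for performance tuning, and events that appear in an unexpected order for one ID can point to a race condition.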
Example: OpenTracing / PCF
3 tracing elements:
TraceID, SpanID, ParentSpan
"X-B3-TraceId" "X-B3-SpanId"
"X-B3-ParentSpanId"
Example: OpenTracing / PCF
Always log the TraceID as-is
Log calling SpanID as ParentSpan
Log new SpanID
Trace
Span
ParentSpan
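The three rules above can be sketched as a function that derives the headers for an outgoing call from the incoming ones. It uses `X-B3-ParentSpanId`, the header name in the B3 propagation format. The function name and the 16-hex-character ID length are assumptions for illustration, and header lookup is case-sensitive here for brevity.

```python
import secrets


def child_span_headers(incoming: dict) -> dict:
    """Build headers for an outgoing call from the incoming B3 headers."""
    # Keep the TraceId as-is; start a new trace only if none was supplied
    trace_id = incoming.get("X-B3-TraceId") or secrets.token_hex(8)
    outgoing = {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": secrets.token_hex(8),  # new span for this hop
    }
    caller_span = incoming.get("X-B3-SpanId")
    if caller_span:
        # The caller's span becomes the parent of the new span
        outgoing["X-B3-ParentSpanId"] = caller_span
    return outgoing
```

Logging all three values on every hop lets a tracing tool rebuild the call tree for a request.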
Lightweight user personas
Lightweight user personas:
Ops Engineer
Test Engineer
Build & Deployment Engineer
Service Owner
Lightweight user personas:
Consider the User Experience (UX) of
engineers and team members using
and working with the software
http://www.keepitusable.com/blog/?tag=alan-cooper
https://www.geckoboard.com/blog/visualisation-upgrades-progressing-towards-a-more-useful-and-beautiful-dashboard/
Lightweight user personas:
What data does the User Persona
need visible on a dashboard in order
to make decisions rapidly & safely?
Summary
Operability
making software work well
in Production
logging with Event IDs
use enum-based Event IDs to
explore runtime behaviour
and fault conditions
Run Book dialogue sheets
explore and establish
operational requirements as
a team, around a physical
table, together
endpoint healthchecks
HTTP 200 / 500 responses to
/status/health call with
JSON details – good for
tools and humans
Correlation IDs
trace execution using
correlation IDs:
synchronous (HTTP X-trace-id)
async (SQS MessageAttribute)
lightweight user personas
explore the UX and needs of
different roles for rapid
decisions via dashboards
Operability
use modern logging, Run Book
dialogue sheets, endpoint
healthchecks, correlation IDs,
and user personas as
team collaboration techniques
Resources
• Team Guide to Software Operability by Matthew Skelton and Rob Thatcher (Skelton Thatcher Publications, 2016)
http://operabilitybook.com/
• Run Book template & Run Book dialogue sheets
http://runbooktemplate.info/
Questions?
Twitter: @SkeltonThatcher | #operability
email: questions@skeltonthatcher.com
thank you
@SkeltonThatcher
skeltonthatcher.com
