SRE Demystified
Docs that matter - 1
ganesh@ganeshniyer.com
ganesh.vigneswara@gmail.com
http://ganeshniyer.com
Dr Ganesh Neelakanta Iyer
SRE
https://image.slidesharecdn.com/devopssreatgooglescale-190121123035/95/devops-sre-at-google-scale-30-638.jpg?cb=1548074257
Ref: https://queue.acm.org/detail.cfm?id=3283589
Why Is Documentation Important?
• In the early stages of an SRE team's existence, the organization depends
heavily on the performance of highly skilled individuals on the team
• The team preserves important operational concepts and principles as
nuggets of "tribal knowledge" that are passed on verbally to new team
members
• If these concepts and principles are not codified and documented, they
will often need to be relearned painfully through trial and error
• Sometimes team members perform operational procedures as a strict
sequence of steps defined by their predecessors in the distant past,
without understanding the reasons these steps were initially prescribed
• If this is allowed to continue, processes eventually become fragmented
and tend to degenerate as the team scales up to handle new challenges
Documents for New Service Onboarding
https://cornerofficedotblog.files.wordpress.com/2018/04/dilbert-onboarding.png?w=700
Example PRR (Production Readiness Review) Template Areas
Architecture and dependencies
• What is your request flow from user to front end to back end?
• Are there different types of requests with different latency requirements?
Capacity planning
• How much traffic and rate of growth do you expect during and after the launch?
• Have you obtained all the compute resources needed to support your traffic?
Failure modes
• Do you have any single points of failure in your design?
• How do you mitigate unavailability of your dependencies?
Processes and automation
• Are any manual processes required to keep the service running?
External dependencies
• What third-party code, data, services, or events does the service or the launch depend upon?
• Do any partners depend on your service? If so, do they need to be notified of your launch?
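One way to keep a PRR template actionable is to store the areas and questions as data and track which questions are still open. The sketch below is only an illustration: the names `PrrArea` and `open_questions` and the sample answer are hypothetical, not part of any published PRR tooling.

```python
from dataclasses import dataclass, field


@dataclass
class PrrArea:
    """One area of a PRR template and the questions asked under it."""
    name: str
    questions: list[str]
    # Maps a question to the service owner's answer (hypothetical structure).
    answers: dict[str, str] = field(default_factory=dict)

    def unanswered(self) -> list[str]:
        """Return the questions that have no non-empty answer yet."""
        return [q for q in self.questions if not self.answers.get(q, "").strip()]


def open_questions(areas: list[PrrArea]) -> dict[str, list[str]]:
    """Map each area name to its unanswered questions, skipping complete areas."""
    return {a.name: a.unanswered() for a in areas if a.unanswered()}


# Questions taken from the "Failure modes" area above; the answer is made up.
failure_modes = PrrArea(
    name="Failure modes",
    questions=[
        "Do you have any single points of failure in your design?",
        "How do you mitigate unavailability of your dependencies?",
    ],
)
failure_modes.answers[failure_modes.questions[0]] = "No, every tier is replicated."

print(open_questions([failure_modes]))
```

Keeping the questions in a structure like this makes it easy to see at a glance which areas still block the review.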
Engagement Model Doc
• Service takeover criteria and the PRR process.
• SLO negotiation process and error budgets.
• New launch criteria and launch freeze policy (if applicable).
• Content and frequency of service status reports from the
SRE team.
• SRE staffing requirements.
• Feature roadmap planning process and priority of reliability
features (requested by SREs) versus new product
functionality.
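The error budget and launch freeze items can be made concrete with a little arithmetic. The sketch below assumes a request-based SLO, where the error budget is the fraction of requests allowed to fail (1 minus the SLO target), and assumes a policy that freezes launches once the budget is spent. Both the function names and the freeze rule are illustrative assumptions, not something the slides prescribe.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the agreed success ratio, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # No traffic yet, so nothing has been spent.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def launches_frozen(slo_target: float, total_requests: int,
                    failed_requests: int) -> bool:
    """Assumed policy: freeze launches once the error budget is exhausted."""
    return error_budget_remaining(slo_target, total_requests, failed_requests) <= 0.0


# With a 99.9% SLO and 1,000,000 requests, about 1,000 failures are allowed.
print(error_budget_remaining(0.999, 1_000_000, 500))
print(launches_frozen(0.999, 1_000_000, 1_500))
```

Writing the rule down in this form gives SRE and product teams a shared, unambiguous trigger to reference in the engagement model document.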
Documents for Running a Service
https://s3.amazonaws.com/lowres.cartoonstock.com/law-order-running_amok-cars-engine_trouble-engines-auto-mban867_low.jpg
Post-mortem Template
• Timeline.
• Description of user impact.
• Root cause.
• Action items / lessons learned.
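The four sections above can be captured as a small record so that an incomplete post-mortem is easy to spot before it is published. This is a minimal sketch; the `Postmortem` class and its field names are hypothetical and simply mirror the template headings.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    """A post-mortem draft with the four template sections."""
    title: str
    timeline: list[tuple[str, str]] = field(default_factory=list)  # (time, event)
    user_impact: str = ""
    root_cause: str = ""
    action_items: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the template headings that are still empty, in template order."""
        sections = {
            "Timeline": self.timeline,
            "Description of user impact": self.user_impact,
            "Root cause": self.root_cause,
            "Action items / lessons learned": self.action_items,
        }
        return [name for name, value in sections.items() if not value]


# A made-up draft with only the user impact filled in.
draft = Postmortem(
    title="Example outage",
    user_impact="Some users saw errors when loading the home page.",
)
print(draft.missing_sections())
```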
Policies and SLA
• Policies
  • Policy documents mandate specific technical and nontechnical policies for
    production
  • Technical policies can apply to areas such as production-change logging, log
    retention, internal service naming, and use of and access to emergency
    credentials
• SLA
  • SRE teams document each service's SLA for availability and latency, and
    monitor service performance relative to that SLA
  • Documenting and publishing an SLA, and rigorously measuring the end-user
    experience and comparing it with the SLA, allows SRE teams to innovate more
    quickly while preserving a good user experience
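Comparing measured performance with a documented SLA can be sketched in a few lines. The example below assumes the SLA states a minimum availability plus a minimum fraction of requests that must succeed within a latency threshold. The `Sla` fields, the `sla_report` function, and the sample numbers are illustrative assumptions, not a standard SLA format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    """Documented targets (hypothetical shape)."""
    availability: float   # minimum fraction of requests that must succeed
    latency_ms: float     # latency threshold in milliseconds
    fast_fraction: float  # minimum fraction of requests that succeed within the threshold


def sla_report(sla: Sla, samples: list[tuple[bool, float]]) -> dict[str, object]:
    """Compare measured requests with the SLA.

    Each sample is (succeeded, latency_ms) for one request.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    total = len(samples)
    succeeded = sum(1 for ok, _ in samples if ok)
    fast = sum(1 for ok, latency in samples if ok and latency <= sla.latency_ms)
    availability = succeeded / total
    fast_fraction = fast / total
    return {
        "availability": availability,
        "availability_met": availability >= sla.availability,
        "fast_fraction": fast_fraction,
        "latency_met": fast_fraction >= sla.fast_fraction,
    }


# Made-up measurements: 97 fast successes, 2 slow successes, 1 failure.
example_sla = Sla(availability=0.98, latency_ms=300.0, fast_fraction=0.98)
example_samples = [(True, 100.0)] * 97 + [(True, 500.0)] * 2 + [(False, 0.0)]
print(sla_report(example_sla, example_samples))
```

In this made-up run the availability target is met but the latency target is not, which is the kind of gap that regular measurement against the SLA is meant to surface.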
Documents for Production Products
• SRE teams aim to spend 50 percent of
their time on project work, developing
software that automates away manual
work or improves the reliability of a
managed service
• Documents for these products are important
because they enable users to find out whether a
product is right for them to adopt, how to
get started, and how to get support
• They also provide a consistent user
experience and facilitate product adoption