SRE in startup
Zonky 17.1.2017
Ladislav Prskavec, Apiary
ladislav@apiary.io
@abtris
1
What is SRE?
2
"What happens when a software engineer is tasked with what used to
be called operations."
» Ben Treynor Sloss, Vice President, Google Engineering,
founder of Google SRE
3
"Our work is like being part of the world's most intense pit crew. We
change the tires of a race car as it's going 100 mph."
» Andrew Widdowson, Site Reliability Engineer, Mountain View
4
In general, an SRE team is responsible for:
» availability
» latency
» performance
» efficiency
» change management
» monitoring
» emergency response
» capacity planning
5
6
If the team agrees on a 99.9% SLA,
that gives them an error budget of
0.1%.
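As a minimal sketch of the arithmetic behind this slide, the snippet below turns an availability target into allowed downtime. The function name `error_budget` and the 30-day window are illustrative assumptions, not something the talk specifies.

```python
from datetime import timedelta


def error_budget(sla_percent: float, window: timedelta) -> timedelta:
    """Return the downtime allowed by an availability target over a window.

    Hypothetical helper: the 99.9% figure comes from the slide, the window
    length is an assumption chosen by the caller.
    """
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    # The budget is whatever share of the window the SLA does not cover.
    budget_fraction = (100 - sla_percent) / 100
    return window * budget_fraction


if __name__ == "__main__":
    # A 99.9% SLA over an assumed 30-day window leaves roughly 43 minutes.
    print(error_budget(99.9, timedelta(days=30)))
```

With a 99.9% target, the 0.1% budget over 30 days works out to about 43.2 minutes of downtime.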
7
8
Rule
If the service is within SLA, launch away
- clearly the DEV team is doing a good job
If the service is not within SLA, launch freeze
- until you earn back enough error budget
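The rule could be automated as a simple gate, sketched here under assumptions: availability is measured as the share of successful requests, and the names `remaining_error_budget` and `launch_allowed` are hypothetical, not from the talk.

```python
def remaining_error_budget(total_requests: int,
                           failed_requests: int,
                           sla_percent: float) -> float:
    """Return how many more failed requests the SLA still tolerates.

    Assumption: the SLA is measured per request; a negative result means
    the budget is already overspent.
    """
    allowed_failures = total_requests * (100 - sla_percent) / 100
    return allowed_failures - failed_requests


def launch_allowed(total_requests: int,
                   failed_requests: int,
                   sla_percent: float) -> bool:
    """Launch away while budget remains, freeze once it is spent."""
    return remaining_error_budget(
        total_requests, failed_requests, sla_percent) >= 0


if __name__ == "__main__":
    # 1,000,000 requests at 99.9% tolerate about 1,000 failures.
    print(launch_allowed(1_000_000, 500, 99.9))    # within budget
    print(launch_allowed(1_000_000, 1_500, 99.9))  # budget overspent
```

The freeze lifts on its own once the measurement window rolls forward and the failure count drops back under the allowance.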
9
Error budget
» removes SRE - DEV conflict
» DEV teams self-police
10
Common staffing pool
» one more SRE = one less Dev
11
SRE hires only coders
» they get bored easily
» speak same language as Dev
12
50% cap on ops work
» if you succeed, work scales with traffic
» coding reduces the work / traffic ratio
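One way to watch the cap is to compare logged ops hours against all logged hours. This is only a sketch: the category names and the functions `ops_share` and `over_ops_cap` are assumptions made for illustration.

```python
def ops_share(hours_by_category: dict[str, float],
              ops_categories: set[str]) -> float:
    """Return the fraction of logged hours spent on operations work."""
    total = sum(hours_by_category.values())
    if total == 0:
        return 0.0
    ops = sum(hours for category, hours in hours_by_category.items()
              if category in ops_categories)
    return ops / total


def over_ops_cap(hours_by_category: dict[str, float],
                 ops_categories: set[str],
                 cap: float = 0.5) -> bool:
    """Return True when ops work exceeds the cap (50% by default)."""
    return ops_share(hours_by_category, ops_categories) > cap


if __name__ == "__main__":
    # Hypothetical week: 25 of 40 hours went to tickets and on-call.
    week = {"tickets": 10, "oncall": 15, "coding": 15}
    print(over_ops_cap(week, {"tickets", "oncall"}))
```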
13
Keep Dev in rotation
» 5% of ops work handled by devs
14
Speaking of Dev and Ops work
» excess operations load (tickets, oncall, etc.)
15
SRE portability
» no requirement to stick with project or SRE
16
Outages
» minimize impact
» prevent recurrence
17
Minimize damage
» no NOC
» good diagnostic information
» practice, practice, practice
18
Prevent recurrence
1. Handle event
2. Write post-mortems
3. Reset
19
Post-mortem philosophy
» blameless, focus on process and technology
» create timeline
» get all facts
» create bugs for all followup work
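A post-mortem record following these points might be modelled as below. This is a sketch under assumptions: the class and field names are hypothetical, and it holds only what the slide lists (timeline, facts, follow-up bugs), with no field for naming individuals.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineEntry:
    at: datetime
    what: str


@dataclass
class PostMortem:
    """Blameless record: what happened and what follows, not who did it."""
    title: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    facts: list[str] = field(default_factory=list)
    followup_bugs: list[str] = field(default_factory=list)

    def record(self, at: datetime, what: str) -> None:
        """Add an event and keep the timeline in chronological order."""
        self.timeline.append(TimelineEntry(at, what))
        self.timeline.sort(key=lambda entry: entry.at)


if __name__ == "__main__":
    pm = PostMortem("Example outage")  # hypothetical incident
    pm.record(datetime(2017, 1, 17, 10, 30), "Alert fired")
    pm.record(datetime(2017, 1, 17, 10, 5), "Deploy started")
    pm.followup_bugs.append("Add canary step to deploy")
    print([entry.what for entry in pm.timeline])
```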
20
What is specific to SRE
in a startup?
21
1:10
22
Horizontal team
23
SaaS oriented
24
Oncall culture
25
It's cool work
26
SRE book
27
"May the Queries Flow,
And the Pagers Remain Silent"
SRE Benediction
28
