Site Reliability Engineering
DevOps Culture and System Reliability
Ahmed Badran - CTO @Magalix
ahmed.badran@magalix.com
Agenda
- My Background
- Developers vs Operators
- The DevOps Culture
- Site Reliability Engineering
- Service Level Objectives
○ Availability
○ Scalability
- Resources
- Q&A
My Background
Developers vs Operators
With the rise of online systems and microservices architectures, agile development became the norm.
Agility vs Stability
Applications vs Infrastructure
The DevOps Culture
❏ Reduce Organizational Silos
❏ Implement Gradual Change
❏ Accept Failures as Normal
❏ Leverage Tooling and Automation
❏ Measure Everything
Site Reliability Engineering
✓ Reduce Organizational Silos
➔ Shared ownership of the production environment
✓ Accept Failures as Normal
➔ SLOs and blameless postmortems
✓ Implement Gradual Change
➔ Agile development and small releases
➔ Canary testing in production
✓ Leverage Tooling and Automation
➔ Continuous integration and deployment
✓ Measure Everything
➔ Define and measure system reliability
➔ Measure and reduce toil
Service Level Objectives
● SLI: Service Level Indicator
A metric, measured over time, that tells us about the health of a service
● SLO: Service Level Objective
Agreed-upon bounds on how often the SLI must be met
● SLA: Service Level Agreement
A business-level agreement that defines penalties if we fail to deliver the agreed-upon availability
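To make the SLI/SLO relationship concrete, here is a minimal Python sketch. It assumes a request-success SLI computed per fixed time window, and an SLO expressed as the fraction of windows in which the SLI must reach a threshold. The names `Window`, `sli`, `slo_compliance` and `meets_slo`, and the 99% threshold, are hypothetical illustrations, not part of any SRE tool.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts for one measurement window (e.g. one minute)."""
    good: int   # requests that met the SLI criterion, e.g. returned successfully
    total: int  # all requests received in the window


def sli(window: Window) -> float:
    """SLI for one window: the fraction of requests that were good."""
    if window.total == 0:
        return 1.0  # assumption: a window with no traffic counts as healthy
    return window.good / window.total


def slo_compliance(windows: list[Window], sli_threshold: float) -> float:
    """Fraction of windows in which the SLI reached the threshold."""
    if not windows:
        raise ValueError("need at least one window")
    met = sum(1 for w in windows if sli(w) >= sli_threshold)
    return met / len(windows)


def meets_slo(windows: list[Window], sli_threshold: float, slo_target: float) -> bool:
    """True if the SLI was met in at least slo_target of the windows."""
    return slo_compliance(windows, sli_threshold) >= slo_target


# Four one-minute windows; the last one had a partial outage.
windows = [Window(999, 1000), Window(995, 1000), Window(1000, 1000), Window(500, 1000)]
print(slo_compliance(windows, sli_threshold=0.99))  # 0.75
print(meets_slo(windows, 0.99, slo_target=0.9))     # False
```

An SLA would then sit on top of an SLO like this one, attaching business penalties to missing it.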
Availability
The ratio of good service to total demanded service
❏ Availability = Uptime / (Uptime + Downtime)
❏ Availability = Successful Requests / Total Requests
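Both formulas above can be computed directly. This is a minimal Python sketch; the function names and the 30-day period used to translate a target into a downtime allowance are illustrative assumptions.

```python
def time_based_availability(uptime_s: float, downtime_s: float) -> float:
    """Availability = Uptime / (Uptime + Downtime)."""
    return uptime_s / (uptime_s + downtime_s)


def request_based_availability(successful: int, total: int) -> float:
    """Availability = Successful Requests / Total Requests."""
    if total == 0:
        raise ValueError("no requests observed")
    return successful / total


def allowed_downtime_minutes(target: float, period_days: float = 30) -> float:
    """Downtime permitted by an availability target over a period (default: 30 days)."""
    return (1 - target) * period_days * 24 * 60


# A 99.9% target over a 30-day month leaves roughly 43.2 minutes of downtime.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%}: {allowed_downtime_minutes(target):.1f} min/month")
```

The two definitions can disagree: the time-based ratio counts a 3 a.m. outage the same as one at peak hours, while the request-based ratio weighs it by the traffic that was actually affected.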
Scalability
● Measuring Load
○ Write load
○ Read load
● Measuring Performance
○ Throughput
○ Response time
○ Queuing time
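The load and performance measures listed above can be derived from per-request timestamps. This minimal Python sketch assumes each request record carries an arrival time, the time a worker started processing it, and the time the response was sent. It treats response time as what the client sees (queueing plus processing) and queueing time as the wait before processing began. `Request`, `load`, `percentile` and `performance` are hypothetical names, and percentiles use the simple nearest-rank method.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Request:
    kind: str        # "read" or "write"
    arrived: float   # seconds: request reached the service and was queued
    started: float   # seconds: a worker began processing it
    finished: float  # seconds: the response was sent


def load(requests: list[Request], window_s: float) -> dict[str, float]:
    """Read and write load, in requests per second, over the observation window."""
    reads = sum(1 for r in requests if r.kind == "read")
    writes = sum(1 for r in requests if r.kind == "write")
    return {"read_rps": reads / window_s, "write_rps": writes / window_s}


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, for 0 < p <= 100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def performance(requests: list[Request], window_s: float) -> dict[str, float]:
    """Throughput, response-time percentiles and mean queueing time."""
    response = [r.finished - r.arrived for r in requests]  # what the client sees
    queuing = [r.started - r.arrived for r in requests]    # time spent waiting
    return {
        "throughput_rps": len(requests) / window_s,  # assumes all completed in the window
        "p50_response_s": percentile(response, 50),
        "p99_response_s": percentile(response, 99),
        "mean_queuing_s": statistics.fmean(queuing),
    }


requests = [
    Request("read", 0.0, 0.0, 0.1),
    Request("read", 0.5, 0.6, 0.8),
    Request("write", 1.0, 1.5, 2.0),
    Request("read", 2.0, 2.0, 2.25),
]
print(load(requests, window_s=4.0))
print(performance(requests, window_s=4.0))
```

Response time is reported as percentiles rather than a mean because an average hides the slow tail of requests that users actually notice.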
Resources
❏ Site Reliability Engineering - 2 books
❏ Building Microservices (Designing Fine-Grained Systems) - Book
❏ Designing Data-Intensive Applications - Book
❏ On Designing and Deploying Internet-Scale Services - James Hamilton, 2007
❏ Meaningful Availability - Google paper
❏ 8 Fallacies of Distributed Computing - Wikipedia
Q&A
Ahmed Badran - CTO @Magalix
ahmed.badran@magalix.com
