@EdMcBane
7 lessons learned building HP/HA systems
Never gonna
give you up
Never gonna
let you down
Francesco Degrassi
Enthusiastic yet pragmatic Lean Software Developer.
Uppish and cynical nihilist from time to time.
Lean Software Development and team coaching
Continuous Delivery, High availability, performance
Security sensitive & high uncertainty domains
The challenge
● Major European client
● Innovative service for the consumer market
● Large user base (200K+ users)
● Very high request rate
● Low latency requirement (<< RTT)
What we built
What did we learn?
Make your
assumptions explicit
and keep testing them
Don’t eat
the yellow snow
#1
Make your
assumptions explicit
and keep challenging them
#2
Performance &
High Availability are
not extra features
#3
Do not reinvent
the wheel
...but keep things simple
In our case...
● Everything worked well in the single-core scenario
SO_REUSEPORT
For TCP, so_reuseport allows multiple
listener sockets to be bound to the same
port.
Received packets are distributed to
multiple sockets bound to the same port
using a 4-tuple hash.
With so_reuseport the distribution is
uniform.
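A minimal sketch of what this looks like from user space, assuming Linux 3.9 or later and Python's standard `socket` module. The helper name `make_listener` and the loopback address are illustrative and not taken from the system described in the talk.

```python
import socket


def make_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Create a TCP listener that shares its port via SO_REUSEPORT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every socket before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


# Port 0 lets the kernel pick a free port; the second listener then
# binds to that same port, which would fail without SO_REUSEPORT.
first = make_listener(0)
shared_port = first.getsockname()[1]
second = make_listener(shared_port)
```

In a multi-core setup each listener would typically live in its own worker process or thread, so that the kernel spreads incoming connections across the workers.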
Everything should be
made as simple as
possible, but not
simpler
— Albert Einstein
LESS(1) General Commands Manual LESS(1)
NAME
less - opposite of more
SYNOPSIS
less -?
less --help
less -V
less --version
less [-[+]aABcCdeEfFgGiIJKLmMnNqQrRsSuUVwWX~]
[-b space] [-h lines] [-j line] [-k keyfile]
[-{oO} logfile] [-p pattern] [-P prompt] [-t tag]
[-T tagsfile] [-x tab,...] [-y lines] [-[z] lines]
[-# shift] [+[+]cmd] [--] [filename]...
(See the OPTIONS section for alternate option syntax with long option
names.)
DESCRIPTION
LESS IS similar to MORE (1), but has many more features.
Less does not have to read the entire input file before starting, so
with large input files it starts up faster than text editors like vi
(1). Less uses termcap (or terminfo on some systems), so it can run on
Manual page less(1) line 1 (press h for help or q to quit) .
#4
Be wary of
cargo-cult optimization
TCP_TW_RECYCLE
Enable fast recycling TIME-WAIT sockets.
Default value is 0. It should not be changed
without advice/request of technical experts.
Linux will drop any segment from the remote
host whose timestamp is not strictly bigger
than the latest recorded timestamp
TCP_TW_RECYCLE + NAT = MADNESS
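Why the combination with NAT goes wrong can be shown with a toy model. This is a deliberately simplified sketch of the per-host timestamp check as described above, not the kernel's actual logic; the class name, the address and the timestamp values are made up for illustration.

```python
class PerHostTimestampCheck:
    """Toy model: one 'latest timestamp' remembered per source IP."""

    def __init__(self) -> None:
        self.latest = {}  # source IP -> latest timestamp seen

    def accept(self, source_ip: str, tcp_timestamp: int) -> bool:
        last = self.latest.get(source_ip)
        if last is not None and tcp_timestamp <= last:
            return False  # not strictly bigger: segment is dropped
        self.latest[source_ip] = tcp_timestamp
        return True


server = PerHostTimestampCheck()
nat_ip = "203.0.113.7"  # documentation address standing in for a NAT gateway

# Two clients behind the same NAT, each with its own timestamp clock.
results = [
    server.accept(nat_ip, 5_000_000),  # client A
    server.accept(nat_ip, 1_000),      # client B, whose clock is "behind"
    server.accept(nat_ip, 5_000_100),  # client A again
]
print(results)  # [True, False, True]
```

From the server's point of view both clients are the same remote host, so client B is silently rejected simply because its timestamp clock started later. The option was removed entirely in Linux 4.12.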
#5
High Availability is
much more than
just redundancy
Failure impact
● Redundant hardware
● Redundant software components
But there’s more!
● Graceful degradation
● Incremental rollouts
Failure frequency
But then also:
● proven technology
● high quality hardware
● automation (to avoid errors)
Time to recover
● Effective monitoring
○ realtime
○ reliable
○ understandable
○ thorough
○ meaningful
○ actionable
● Rollback / rollforward
● Automation (for speed)
Our response plan goes
something like this...
AaaaaAAaaaah
...but be prepared to improvise
● In-house experience
● Developers on call
● Drills (chaos monkeys)
Processes designed for ordinary times
are not resilient in a crisis and need to be changed.
#6
Embrace diversity
#7
Monitoring is essential
… and we can do way better
No one size fits all
● “Monitor everything”, like “100% test coverage”
is a nice slogan.
● Each environment requires a slightly different
solution
● Balance between data availability, cost and
ability to keep it actionable
We are doing logging wrong
● Unstructured
● Inconsistent
● Poor defaults
● Complex, obscure components
● A huge waste of computing power
We need a complete overview
● Logs
● Metrics
● Alerts
● Together, coherent, cross-referenced
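One way to move from unstructured lines towards logs that can be cross-referenced with metrics and alerts is to emit one JSON object per event with a shared correlation key. The sketch below uses only Python's standard `logging` and `json` modules; the field names, the `frontend` logger name and the `request_id` key are assumptions chosen for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical correlation field, passed via `extra=`.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("frontend")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("upstream timeout after %d ms", 250, extra={"request_id": "req-42"})
```

With a consistent key such as `request_id` on every event, the same identifier can be attached to a metric sample or an alert, which is what makes the three views line up.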
“Human beings, who are almost unique in
having the ability to learn from the
experience of others, are also remarkable
for their apparent disinclination to do so.”
Douglas Adams
Thanks!
fdegrassi@gmail.com
francesco.degrassi@optionfactory.net
http://www.optionfactory.net/blog
