WAYS TO MINIMISE PERFORMANCE RISKS
IN CONTINUOUS DELIVERY
Adriaan Thomas
4 June 2013
INTRODUCTION
OBJECTIVE
Put working software into production as quickly as possible, whilst minimising risk of
load-related problems:
• Bad response times
• Lack of capacity
• Availability too low
• Excessive system resource use
Within the context of websites.
TRADITIONAL APPROACH
Load testing through simulation
http://www.flickr.com/photos/danramarch/4423023837
DECIDE WHAT TO TEST
•Focus on busiest instant
•Model most-hit functionality
•Extrapolate to expected load
•Look at production traffic
•Or attempt educated guess
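A minimal sketch of how the busiest instant and the most-hit functionality could be derived from production traffic, then extrapolated. The function names, the one-second bucket size and the growth factor are assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter
from datetime import datetime


def peak_requests_per_second(timestamps):
    """Return (second, count) for the busiest one-second bucket."""
    per_second = Counter(ts.replace(microsecond=0) for ts in timestamps)
    return max(per_second.items(), key=lambda item: item[1])


def most_hit_paths(paths, top=3):
    """Return the `top` most requested paths with their hit counts."""
    return Counter(paths).most_common(top)


def target_load(peak_rps, growth_factor):
    """Extrapolate the observed peak to the load we expect to handle.

    growth_factor is an assumption (e.g. 1.5 for 50% headroom).
    """
    return peak_rps * growth_factor
```

The result is only as good as the traffic sample it is fed, which is why the slide lists production traffic before the educated guess.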
DECIDE ON SCOPE
Component test
Chain test
Full environment test
•Test coverage
•Level of certainty
•Number of systems
•Amount of work
SET UP TEST DATA
• Usually starts as a copy from production
• Or educated guess what people will enter
• Render anonymous
• Make tests deterministic
• Synchronise between all systems
http://www.flickr.com/photos/22168167@N00/3889737939/
DECIDE ON STRATEGY
One or more of:
•Scalability test
•Stress test
•Endurance test
•Regression test
•Resilience test
http://www.flickr.com/photos/timjoyfamily/5935279962/
DECIDE ON TEST DURATION
(which is tricky)
http://www.flickr.com/photos/wwarby/3297205226
PROVIDE HARDWARE
http://www.flickr.com/photos/s_w_ellis/2681151694/
Copy of production?
Only one copy?
Virtualisation?
Sharing between teams?
INTEGRATE INTO PIPELINE
Unit test: very fast
Functional integration test: fast
Load test: takes longer
INTEGRATE INTO PIPELINE
Unit test
Functional integration test
Load test
Very fast Takes longer
PERMANENT LOAD TESTING
Daytime: constant load, teams inspect impact of changes
Nighttime: endurance test
Weekends: refresh test data
http://www.flickr.com/photos/renaissancechambara/5106171956/
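The schedule above could be expressed as a small selector that a permanent load test driver consults. This is only a sketch: the mode names and the 08:00 to 18:00 definition of "daytime" are assumptions, not something the slides specify.

```python
from datetime import datetime


def load_test_mode(now, day_start_hour=8, day_end_hour=18):
    """Pick what the permanent load test environment should be doing.

    The daytime window is an assumed default; adjust it to the team's hours.
    """
    if now.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return "refresh-test-data"
    if day_start_hour <= now.hour < day_end_hour:
        return "constant-load"
    return "endurance-test"
```

For example, `load_test_mode(datetime(2013, 6, 4, 10, 0))` falls on a weekday morning and selects the constant load that teams use to inspect the impact of their changes.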
RESPONSE TIME
DNS lookup (www.xebia.com)
Time to first byte + loading HTML
Time to render
Time to document complete
Browser CPU use
Bandwidth
# connections to a single host
http://www.webpagetest.org/result/130522_FG_10SC/1/details/
SSL handshake
Parse times
Blocking client code
IMPACT OF THE BROWSER
www.browserscope.org
CLEAR REQUIREMENTS
Response time
Fail: 10 Now: 3.5 Goal: 1
Intention: Users get a response quickly so that they are happy and spend more money.
Stakeholder: Marketing dept.
Scale: 95th percentile of “document complete” response times, in seconds, measured over one minute.
Metric: Page load times as reported by our RUM tool.
Inspired by Tom Gilb, Competitive Engineering
WebPageTest: first view + repeat view (median of 3)
95th percentile response times from access logs
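A sketch of how the Scale could be evaluated: group response times into one-minute windows, take the 95th percentile of each, and compare it with the Fail and Goal levels. The nearest-rank percentile method, the input shape and the reading of "Fail: 10" as "10 seconds or worse" are assumptions; the slides do not define them.

```python
from collections import defaultdict

FAIL_SECONDS = 10.0  # "Fail: 10" from the requirement
GOAL_SECONDS = 1.0   # "Goal: 1" from the requirement


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def p95_per_minute(samples):
    """samples: iterable of (epoch_seconds, response_time_seconds)."""
    buckets = defaultdict(list)
    for epoch, response_time in samples:
        buckets[int(epoch // 60)].append(response_time)
    return {minute: percentile(times, 95) for minute, times in sorted(buckets.items())}


def judge(p95_seconds):
    """Classify one window against the Fail and Goal levels."""
    if p95_seconds >= FAIL_SECONDS:
        return "fail"
    if p95_seconds <= GOAL_SECONDS:
        return "goal met"
    return "acceptable, goal not met"
```

With the "Now: 3.5" value from the slide, `judge(3.5)` reports that the site is acceptable but has not reached its goal.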
ADJUST REQUIREMENTS DUE TO LACK OF
REAL BROWSERS
Playground to test changes
No impact on real users
Less pressure
More work
Guesswork and extrapolation
Can take a significant amount of time
More hardware
THINGS WILL BREAK...
... in spite of your best efforts
http://www.flickr.com/photos/jmarty/1239950166/
SO INSTEAD WE SHOULD FOCUS ON
FAST RECOVERY
http://www.flickr.com/photos/19107136@N02/8386567228/
“MTTR is more important than MTBF*”
John Allspaw
* for most types of F
[Chart: 99th percentile response time (s), 0 to 2.0, plotted against test duration]
MTBF LEADS TO FUD
Time→
TTD find cause (RCA) write & test fix build deploy
validate
compile
deploy & test
Monitoring
Alerts
•Skills
•Organisation
•Culture
•Maintainability
•Simple architecture
•Fast workstations
•Good tooling
•Able to quickly test locally
•Automation
•Fast build server
•Efficient tests
Monitoring
•Automation
•Flexible architecture
TTR
DEMING FEEDBACK LOOPS
Plan
Do
Study
Act
OODA LOOPS
Observe
Orient
Decide
Act
AVOID TEST-ONLY MEASUREMENTS
SIMPLE ARCHITECTURE
THE ONLY THING THAT MATTERS IS
WHAT HAPPENS IN PRODUCTION
Everything else is an assumption.
DEPLOYING CHANGES
http://www.flickr.com/photos/39463459@N08/5083733600
BLUE-GREEN DEPLOYMENTS
Amazon Route 53
Elastic Load Balancer → Instances (version n)
Elastic Load Balancer → Instances (version n+1)
DARK LAUNCHING
Web page DB
DARK LAUNCHING
Web page DB Weather SP
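One way to read the dark launch pictured here: the web page keeps serving content from the DB, while the new weather service is called with real production traffic and its result is thrown away. The sketch below assumes that reading; `render_page` and its three callables are hypothetical names, and a real implementation would probably make the call asynchronously so users never wait for it.

```python
import time


def render_page(load_from_db, call_new_service, record):
    """Serve the page as before while exercising the new service in the dark.

    load_from_db, call_new_service and record are hypothetical callables
    supplied by the caller; record(outcome, seconds) stores a measurement.
    """
    content = load_from_db()

    started = time.perf_counter()
    try:
        call_new_service()  # result deliberately ignored: users never see it
        record("ok", time.perf_counter() - started)
    except Exception:
        # A failing dark-launched service must not break the visible page.
        record("error", time.perf_counter() - started)

    return content
```

The measurements collected by `record` show how the new service behaves under production load before any user depends on it.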
FEATURE TOGGLES
CANARY RELEASING
0% 100%
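A sketch of how traffic could be moved from 0% to 100% in steps: each user is hashed into a stable bucket between 0 and 99 and sees the new version once the rollout percentage passes that bucket. Bucketing on a hash of the user id is an assumption for illustration; the slides do not say how traffic is split.

```python
import hashlib


def in_canary(user_id, percent):
    """Return True if this user should get the new version.

    The same user always lands in the same bucket, so raising `percent`
    only ever adds users to the canary group and never removes them.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because the bucket is stable, a user who has seen the new version keeps seeing it as the percentage grows, which keeps measurements for the two groups comparable.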
PRODUCTION-IMMUNE SYSTEMS
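The idea of a production immune system can be sketched as an automated check that compares a freshly deployed version with the previous one and asks for a rollback when it is clearly worse. The metric names and both thresholds below are assumptions chosen for illustration.

```python
def should_roll_back(baseline, candidate,
                     max_error_rate_increase=0.01,
                     max_p95_ratio=1.5):
    """Decide whether the new version should be rolled back.

    baseline and candidate are dicts with the hypothetical keys
    "error_rate" (fraction of failed requests) and "p95_seconds".
    """
    error_increase = candidate["error_rate"] - baseline["error_rate"]
    if error_increase > max_error_rate_increase:
        return True
    if baseline["p95_seconds"] > 0:
        ratio = candidate["p95_seconds"] / baseline["p95_seconds"]
        if ratio > max_p95_ratio:
            return True
    return False
```

Business metrics such as revenue could be added to the same comparison, so that a change that is technically healthy but hurts the business is also caught.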
CONTROLLED LOAD TESTING
[Diagram components: Amazon Route 53, Elastic Load Balancer, instances, RDS DB instance, RDS DB instance read replica]
MONITORING
http://www.flickr.com/photos/smieyetracking/5609671098/
MONITORING
Technical metrics
•CPU use
•Memory use
•TPS
•Response times
•etc
Process metrics
•# bugs
•MTTR, MTTD
•Time from idea to live on site
•etc
Business metrics
•Revenue
•# unique visitors
•etc
http://www.flickr.com/photos/smieyetracking/5609671098/
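The process metrics MTTD and MTTR can be computed from a simple incident log. This sketch assumes each incident records when the failure started, when it was detected and when the fix was validated, and that time to repair is counted from the start of the failure; the field names are hypothetical.

```python
from datetime import datetime, timedelta


def mean_duration(durations):
    """Average a non-empty list of timedelta values."""
    return sum(durations, timedelta()) / len(durations)


def mttd(incidents):
    """Mean time to detect: from start of failure to detection."""
    return mean_duration([i["detected"] - i["started"] for i in incidents])


def mttr(incidents):
    """Mean time to repair: from start of failure to validated fix."""
    return mean_duration([i["resolved"] - i["started"] for i in incidents])
```

Tracking both numbers over time shows whether investments in monitoring, tooling and automation actually shorten recovery.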
MEASURE IMPACT OF CHANGES
tail -f access_log | alstat.pl -i10 -n10 -stt

 Hits  Hits%   TPS  AvgTmTk  TTmTk%  AvgRSize  RSize%  2013-06-04 19:37:40 (08)
   14   0.1%   1.4    1.652    5.7%      2691    0.2%  POST  200  /login.do
   14   0.1%   1.4    0.918    3.2%      3739    0.3%  GET   200  /home.do
   14   0.1%   1.4    0.879    3.1%      3185    0.2%  POST  200  /order.do
    7   0.1%   0.7    0.807    1.4%      1974    0.1%  POST  200  /account.do
    4   0.0%   0.4    0.735    0.7%      3228    0.1%  GET   200  /products.do
    5   0.0%   0.5    0.697    0.9%       969    0.0%  POST  200  /settings.do
    9   0.1%   0.9    0.687    1.5%      1827    0.1%  POST  200  /changeorder.do
   27   0.2%   2.7    0.649    4.3%      2997    0.4%  POST  200  /newpasswd.do
   15   0.1%   1.5    0.580    2.2%      2488    0.2%  GET   200  /offer.do
   95   0.9%   9.5    0.520   12.2%      4801    2.3%  GET   200  /search.do
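A rough Python sketch of the kind of aggregation behind a listing like this one. It is not the alstat.pl implementation: the input tuple shape, the function name and the parameters are assumptions. The listing does show TPS equal to Hits divided by 10 and rows in descending AvgTmTk order, and the sketch mirrors both.

```python
from collections import defaultdict


def summarise(entries, interval_seconds=10, top=10):
    """Aggregate access-log entries per (method, status, path).

    entries: iterable of (method, status, path, seconds_taken, response_bytes),
    all assumed to fall inside one interval of `interval_seconds`.
    """
    totals = defaultdict(lambda: {"hits": 0, "time": 0.0, "bytes": 0})
    for method, status, path, seconds_taken, response_bytes in entries:
        group = totals[(method, status, path)]
        group["hits"] += 1
        group["time"] += seconds_taken
        group["bytes"] += response_bytes

    rows = []
    for (method, status, path), group in totals.items():
        rows.append({
            "method": method,
            "status": status,
            "path": path,
            "hits": group["hits"],
            "tps": group["hits"] / interval_seconds,
            "avg_time": group["time"] / group["hits"],
            "avg_size": group["bytes"] / group["hits"],
        })

    # Slowest requests first, keeping only the top entries.
    rows.sort(key=lambda row: row["avg_time"], reverse=True)
    return rows[:top]
```

Running such a summary before and after a deployment gives a quick view of which URLs a change made slower or faster.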
MEASURE LATENCY
Avg. response times front end vs backend
Number of calls
SMALL DEPLOYMENTS
http://www.flickr.com/photos/rbulmahn/4925464931/
GO/NO-GO MEETINGS
• What are the biggest fears?
• How can we measure this?
• What can be done if it does happen?
RETROSPECTIVES
How can we prevent a failure from happening again?
How can we detect it earlier?
Was there only one root cause?
http://www.flickr.com/photos/katerha/8380451137
INTRODUCE OUTAGES
Chaos monkey
Game day exercises
http://www.flickr.com/photos/frostnova/440551442/
CULTURE
• Dev and Ops work together on providing information.
• Assumptions are dangerous, try to eliminate as many as possible.
• Small changes are easier to fix than large ones.
• Deploy during office hours so everyone is available in case problems happen.
• All information, including business metrics, should be accessible to everyone.
CLAMS
Culture
Lean
Automation
Measurement
Sharing
SIMPLE, FLEXIBLE ARCHITECTURE
• If the site goes down often, its architecture is probably at fault
• Avoid fragile systems
• Resilience is key
• Scalable (redundancy is not waste)
• Prefer many small systems to a few large ones
• State is a “hot brick”
CHANGES FORTHE BUSINESS
• Accept pushing smaller changes.
• Continuous delivery vs continuous deployment.
• Share data.
CONCLUSION
Work on your ability to respond to failure. Trying to prevent failure can slow you down
and make you focus on the wrong things.
Keep assumptions clearly separated from facts. Make your decisions based on evidence.
Measure everything, including the impact of changes to the business.
Look for your own compromise: try permanent load testing first and learn from that.
QUESTIONS?
athomas@xebia.com
@a32an
www.xebia.com
blog.xebia.com
(we’re hiring)
