ONE MAN OPS
      Reliability & Scale in AWS while letting you sleep through the night
                                                         Jos Boumans - @jiboumans
http://www.fwallpaper.net/picture_pics-Sleepy-cat.html
Tuesday 26 March 13
RIPE NCC
                      Engineering manager for RIPE Database
http://www.ripe.net/db
CANONICAL
                    Engineering manager for Ubuntu Server 10.04 & 10.10

http://lukeroberts.deviantart.com/art/Destroy-Ubuntu-93235775   http://www.ubuntu.com/business/server/overview
KRUX
                      VP of Operations & Infrastructure

http://www.krux.com/
GOOD GUYS OF DATA PRIVACY
SOME OF OUR CUSTOMERS
LOTS OF TRAFFIC
http://www.americapictures.net/buenos-aires-traffic-city-night-argentina.html
AVERAGE REQUESTS* / SEC
[Chart, scale 0 to 10,000 requests/sec]
*Twitter: new tweets; Wikipedia: articles read; Krux: new data points
https://twitter.com/tps_watcher   http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
MONTHLY UNIQUE USERS
[Chart, scale 0 to 600,000,000 monthly unique users]
http://techcrunch.com/2012/12/18/twitter-passes-200m-monthly-active-users-a-42-increase-over-9-months/
http://technorati.com/technology/article/wikipedias-nonprofit-parent-raises-20-million/
WE CHOSE 'THE CLOUD'
http://previewnetworks.com/blog/
THERE ARE DOWNSIDES
http://modernsavage.hubpages.com/hub/10-springfield-shopper-headlines
FOCUS ON AWS
http://aws.amazon.com/
APRIL 21, 2011
http://aws.amazon.com/message/65648/
http://businessnerds.wordpress.com/2011/05/28/so-far-so-good…-the-review/   http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
... SOME OUTAGES ...
                 ... SKIPPED FOR BREVITY ...
JUNE 14, 2012
http://www.laczik.org/BMW/repair/E38_wiring_harness/E38_wiring_harness.html   http://blog.pagerduty.com/2012/06/outage-post-mortem-june-14/
JUNE 29, 2012
http://www.fanpop.com/spots/thunderstorm/images/25416163/title/thunderstorms-wallpaper   http://aws.amazon.com/message/67457/
AWS OUTAGE = YOUR OUTAGE
http://it.mario.wikia.com/wiki/Lakitu
THE RULES HAVE CHANGED
                                                        You're not in Kansas anymore

http://entreatmenot.blogspot.com/2011/04/shattered-dreams.html
NETWORK WILL PARTITION
                                                              And it will happen often

http://thevinylvillain.blogspot.com/2010_04_01_archive.html
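If partitions are routine, every remote call needs a bounded retry policy. The sketch below shows one common approach, exponential backoff with full jitter; it is not something the talk prescribes. `retry_with_backoff`, its defaults, and the assumption that transient failures surface as `ConnectionError` or `TimeoutError` are all illustrative.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep, rng=random.random):
    """Call operation(); on a transient error, wait and try again.

    Before retry n the wait is a random value in [0, min(max_delay, base_delay * 2**n)]
    ("full jitter"), so clients recovering from the same partition do not all
    retry in lockstep. The last error is re-raised once attempts run out.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(cap * rng())
```

`sleep` and `rng` are injectable so the policy can be tested without real waiting; in production the defaults apply.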
DISK IO WILL FLUCTUATE
                                                     On a good day, it's mediocre

http://www.freeguidetonwcamping.com/oregon_washington_main/washington/southwest_wa/cape_disappointment_sp.htm
IP ADDRESSES WILL CHANGE
                       IP lease is 8 hours
                      DNS TTL is 60 seconds
www.fantom-xp.com
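Because addresses change, clients should resolve names rather than pin IPs, and should re-resolve once a lookup is older than the record's TTL. A minimal sketch, assuming a fixed 60-second TTL to match the slide: `socket.gethostbyname` does not report the real TTL, so `TTLResolver` (a hypothetical helper) takes it as a parameter.

```python
import socket
import time


class TTLResolver:
    """Cache hostname lookups for at most `ttl` seconds, then resolve again.

    Pinning an instance's IP breaks when the address changes; caching a lookup
    forever has the same problem. Re-resolving after the TTL keeps lookups
    cheap without holding on to a stale address.
    """

    def __init__(self, ttl=60.0, lookup=socket.gethostbyname, clock=time.monotonic):
        self.ttl = ttl
        self.lookup = lookup
        self.clock = clock
        self._cache = {}  # hostname -> (address, expiry time)

    def resolve(self, hostname):
        now = self.clock()
        cached = self._cache.get(hostname)
        if cached and cached[1] > now:
            return cached[0]
        address = self.lookup(hostname)
        self._cache[hostname] = (address, now + self.ttl)
        return address
```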
INSTANCES WILL DIE
                                  And it will always be your Database Master

http://room57.deviantart.com/art/Hangman-188353196
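Surviving the loss of an instance means the client must know more than one place to send a request. A hedged sketch: `call_with_failover` (hypothetical) tries an ordered list of endpoints, for example one client per database replica. Falling back to a replica is only safe for reads, or for writes once that replica has been promoted to master.

```python
def call_with_failover(endpoints, request, errors=(ConnectionError, TimeoutError)):
    """Send `request` to the first endpoint that answers.

    `endpoints` is an ordered list of callables (master first, then replicas).
    A dead instance costs one failed attempt instead of an outage.
    """
    failures = []
    for endpoint in endpoints:
        try:
            return endpoint(request)
        except errors as exc:
            failures.append(exc)
    raise ConnectionError(f"all {len(endpoints)} endpoints failed: {failures!r}")
```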
HUMANS MAKE MISTAKES
                      Including your humans

EMBRACE FAILURE
                                Hardware will fail. Humans will make errors.
                                   Nature will produce thunderstorms.
http://www.freeguidetonwcamping.com/oregon_washington_main/washington/southwest_wa/cape_disappointment_sp.htm
OR, COLLOQUIALLY

ADJUST YOUR STRATEGY
                                                      Don't bring a knife to a gun fight

http://www.flickr.com/photos/statlerhotel/6628770499/sizes/l/in/photostream/
DATA STORES
                                                     Some work better than others

http://gustavhoiland.com/2010/03/10/stacked-boxes/
CAP THEOREM
[Diagram placing data stores on the CAP theorem: RDBMS, Master / Slave based, CouchDB, BigTable based, Dynamo based]
                      Your choice: sacrifice availability or consistency.
                                      Orange is a lie.
MYSQL / ORACLE VS RDS
                      See: Network partitioning & instances dying

AMAZON REDSHIFT
                                      Great for analytics/reports, bad for OLTP
                                           Unburden your RDS instances
http://www.flitemedia.com/music.php   http://aws.amazon.com/redshift
BIGTABLE BASED STORES
                                 HBase, Accumulo, Hypertable
                      Still suffer when network partitioning happens
http://www.cloudera.com/cdh4/

DYNAMO BASED STORES
                                                         Cassandra, Riak, DynamoDB

http://www.fromoldbooks.org/Walker-ElectricLightingForShips/pages/015-Siemens-Alternate-Current-Dynamo//1552x1175-q75.html   http://aws.amazon.com/dynamodb/faqs/
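Dynamo-style stores let you trade consistency against availability through N (replicas per key), R (replicas that must answer a read) and W (replicas that must acknowledge a write). The classic rule is that a read overlaps the latest acknowledged write when R + W > N. The helper below is a sketch of that arithmetic only; `quorum_properties` is a hypothetical name, and sloppy quorums and hinted handoff in real systems weaken the guarantee.

```python
def quorum_properties(n, r, w):
    """Describe a Dynamo-style setting: N replicas, R read acks, W write acks."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return {
        # Read and write sets must overlap for a read to see the latest acknowledged write.
        "read_sees_latest_write": r + w > n,
        # How many replicas can be unreachable while reads / writes still succeed.
        "read_fault_tolerance": n - r,
        "write_fault_tolerance": n - w,
    }
```

With N=3, R=W=2 (a majority quorum) you keep overlap and tolerate one lost replica; R=W=1 survives two lost replicas but may return stale data.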
GO HOSTED?
                                 CouchDB, MongoDB, Riak, Cassandra, HBase
                                          Your Latency May Vary
http://www.fromoldbooks.org/Walker-ElectricLightingForShips/pages/015-Siemens-Alternate-Current-Dynamo//1552x1175-q75.html
CLIENT SIDE STORAGE
                                          Keep a copy of your users' data locally

http://www.wired.com/gadgetlab/2012/03/badass-gadget-ammo-lunch-box/   http://www.w3.org/2001/tag/2010/09/ClientSideStorage.html
FILE STORES
                                                                EBS vs Instance Store ...
                                                                     ... vs RamFS
http://homedezine.blogspot.com/2011/04/day-my-cat-removed-carpet-photo-studio.html
SIMPLE STORAGE SERVICE
                                                        S3: Arguably AWS' best feature

http://www.iwallpaper.us/gold-star-fo-christmas-wallpaper-140/
TRAFFIC SHAPING
                                                Control every part of the request

http://www.visualphotos.com/image/2x4154765/man_standing_with_traffic_cones_in_shape_of_u-turn
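One common building block for shaping traffic is a token bucket, which admits short bursts while capping the sustained request rate. This is an illustrative choice, not necessarily what the talk used; `TokenBucket` is a hypothetical class with an injectable clock so it can be tested deterministically.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill for the time elapsed since the last check, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```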
STAY LOCAL IF YOU CAN
                 Going off box exposes you to risks you need to mitigate

http://southshorewoman.com/issue/june-2010/article/local-character
CACHE WHAT YOU CAN
                                  HTTP Responses, DB Queries, User content
                                         Browsers have caches too!
http://theoatmeal.com/blog/charity_money
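For DB queries and other expensive calls, even a few seconds of caching can absorb a traffic spike. A minimal in-process sketch: `ttl_cache` is a hypothetical decorator keyed on positional arguments only, and it never evicts entries, so a production cache would also need a size bound.

```python
import functools
import time


def ttl_cache(seconds, clock=time.monotonic):
    """Memoize a function's results for `seconds` (e.g. an expensive DB query)."""
    def decorator(func):
        cache = {}  # args -> (value, expiry); unbounded in this sketch

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = cache.get(args)
            if hit is not None and hit[1] > now:
                return hit[0]
            value = func(*args)
            cache[args] = (value, now + seconds)
            return value
        return wrapper
    return decorator
```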
USE ELASTIC LOAD BALANCERS
                                                They will save you more than once

http://wallpapers5.com/wallpaper/Balance-Green-Tree-Frog/
USE GLOBAL LOAD BALANCING
                      Fail over to the closest data center on region failure
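Global load balancing boils down to a per-client decision: the nearest region that is passing health checks. Managed DNS services make this decision at resolution time; the function below only illustrates the logic, and `pick_region`, the region names and the latency figures in the example are made up.

```python
def pick_region(latency_ms, healthy):
    """Return the lowest-latency region that is passing health checks."""
    candidates = [region for region in latency_ms if healthy.get(region, False)]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=latency_ms.__getitem__)
```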

SHOUT OUT: DYN
                      DNS for Bit.ly, Quora, Twitter, Wikia, etc.

USE A CDN
                                        Critical items should always be available

http://kadanthuponanimidangal.blogspot.com/2010/12/blog-post_6992.html
MEASURE EVERYTHING
                Find outliers, deviants & trends before they cause trouble

http://www.themoviedb.org/movie/629-the-usual-suspects
GRAPHITE, STATSD & COLLECTD
                       Use StatsD & collectd for application/system metrics
                           Use Graphite to store, aggregate & visualize
http://hostedgraphite.com/
http://bakingismyzen.blogspot.com/2011/07/beignets-cant-have-just-one.html   http://jiboumans.wordpress.com/2012/07/02/measure-all-the-things/
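StatsD listens for plain-text UDP datagrams of the form `name:value|type`, which is why instrumenting an application is cheap: if the daemon is down, packets are simply dropped. A minimal client sketch, assuming a StatsD daemon on its default port 8125 and supporting only counters, timers and gauges; the helper names are hypothetical, and in practice an existing client library would be used.

```python
import socket


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD line protocol: <name>:<value>|<type>."""
    if metric_type not in ("c", "ms", "g"):  # counter, timer, gauge
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="localhost", port=8125):
    """Fire-and-forget over UDP: if StatsD is down, the application is unaffected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `send_metric(statsd_line("api.login", 1, "c"))` counts one login.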
GRAPH EVENTS
         Deployments, outages, CDN reconfigurations, failed builds, etc.
          Anything that's important to the health of your ecosystem
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
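Events can be stored in Graphite as ordinary metrics using carbon's plaintext protocol (`<path> <value> <timestamp>` over TCP, port 2003 by default) and drawn as vertical lines with Graphite's `drawAsInfinite()`. The `events.` prefix and the helper names below are a hypothetical convention, not taken from the talk.

```python
import socket
import time


def event_line(event, value=1, timestamp=None):
    """One Graphite plaintext-protocol line: '<path> <value> <unix timestamp>\\n'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"events.{event} {value} {ts}\n"


def record_event(event, host="localhost", port=2003):
    """Send the event to carbon over TCP (2003 is carbon's default plaintext port)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(event_line(event).encode("ascii"))
```

A deploy script could call `record_event("deploy.api")` after each release, and a dashboard could overlay `drawAsInfinite(events.deploy.*)` on the response-time graph.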
COMPARE WEEK TO WEEK
                          Overlay week-to-week graphs using timeShift()
                         Quickly spot trends and deviations from them
http://obfuscurity.com/2012/04/Unhelpful-Graphite-Tip-10
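Graphite's `timeShift()` moves a series along the time axis, so plotting a metric next to a week-shifted copy of itself makes deviations stand out. A sketch that builds such a render URL with Graphite's `alias()` for labels; the Graphite host, the metric name and the helper name are placeholders.

```python
from urllib.parse import urlencode


def week_over_week_url(graphite, metric, since="-7d"):
    """Build a Graphite render URL overlaying `metric` on itself one week earlier."""
    params = [
        ("target", f'alias({metric},"this week")'),
        ("target", f'alias(timeShift({metric},"7d"),"last week")'),
        ("from", since),
    ]
    return f"{graphite}/render?{urlencode(params)}"
```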
FORECASTING
                                 Use Holt-Winters confidence bands
                        Verify that your metrics are within normal tolerance
https://github.com/ripienaar/graphite-graph-dsl/wiki/Creating-Holt-Winters-Forecasts
FIND INDIVIDUAL OUTLIERS
                                                      Absolute numbers mean very little
                                                       Use mean & standard deviation
http://en.wikipedia.org/wiki/File:Black_sheep-1.jpg
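Comparing each host with its peers says more than a fixed limit. A sketch using the fleet mean and population standard deviation from Python's `statistics` module; the `outliers` helper, the 2-sigma threshold and the host names are assumptions.

```python
import statistics


def outliers(values_by_host, threshold=2.0):
    """Return {host: z-score} for hosts more than `threshold` standard deviations from the mean."""
    values = list(values_by_host.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return {}  # every host reports the same value: nothing stands out
    return {
        host: round((value - mean) / stdev, 2)
        for host, value in values_by_host.items()
        if abs(value - mean) > threshold * stdev
    }
```

One caveat: with the population standard deviation, a single host in a fleet of n can score at most sqrt(n - 1), so on five hosts a threshold of 2 can never fire. Size the threshold to the fleet.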
ALERT ON TRENDS
                                Once you go over a threshold, it's too late
                              Alert on unwanted trends and fix them preemptively
http://sub-second.blogspot.com/2012/06/reporting-response-times-percentile.html   http://aphyr.github.com/riemann/
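One simple way to alert on a trend is to fit a straight line to recent samples and estimate how long until it crosses the threshold, then alert when that time is short (say, under a day). A least-squares sketch; the function name and the one-day rule are illustrative assumptions.

```python
def seconds_until_threshold(samples, threshold):
    """Fit a line to (timestamp, value) samples and project when it reaches `threshold`.

    Returns 0.0 if the fitted value is already at or over the threshold, and
    None when there is too little data or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    last_t, _ = samples[-1]
    current = mean_v + slope * (last_t - mean_t)  # fitted value at the newest sample
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope
```

For disk usage growing 1% per hour from 53%, the projection to 90% is 37 hours, so a "less than a day" rule would not fire yet.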
MEASURE WITHOUT RETROFIT
# Note: %D is in microseconds; Apache 2.4.13+ can log milliseconds with %{ms}T
LogFormat "http.beacon:%D|ms" stats
CustomLog "|nc -u localhost 8125" stats
http://jiboumans.wordpress.com/2012/07/02/measure-all-the-things/
http://absinthemindedhero.blogspot.com/2012/03/victory-nonetheless.html   http://jiboumans.wordpress.com/2013/02/27/realtime-stats-from-varnish/
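With the Apache configuration above, every request emits one `http.beacon:<time>|ms` line to StatsD. The sketch below approximates what StatsD does with timer lines at each flush: count, mean and an upper percentile. `aggregate_timers` is a hypothetical helper; it uses nearest-rank for the 90th percentile, which may differ slightly from StatsD's own calculation.

```python
import math
from collections import defaultdict


def aggregate_timers(lines):
    """Summarise StatsD timer lines ('name:value|ms') as one flush interval would."""
    samples = defaultdict(list)
    for line in lines:
        name, _, rest = line.strip().partition(":")
        value, _, metric_type = rest.partition("|")
        if metric_type == "ms":  # ignore counters, gauges and malformed lines
            samples[name].append(float(value))
    summary = {}
    for name, values in samples.items():
        values.sort()
        rank = max(1, math.ceil(0.9 * len(values)))  # nearest-rank 90th percentile
        summary[name] = {
            "count": len(values),
            "mean": sum(values) / len(values),
            "upper_90": values[rank - 1],
        }
    return summary
```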
SHOUT OUT: NEW RELIC
             Java, but also Python, Ruby, .NET, PHP & NodeJS support
             In-depth profiling of your app for performance & errors.
CONFIGURATION MANAGEMENT
                                                             Unique snowflakes are bad

http://www.torange.us/Plants/Conifers/spruce-needles-in-hoarfrost-424.html
PUPPET VS CHEF
                            Yes.

http://puppetlabs.com/
http://www.opscode.com/chef
INFRASTRUCTURE AS CODE
                                            Use different environments
                                            Measure and report on it
http://americansingercanary.com/green.htm
SHOUT OUT: UBUNTU
                                      Ubuntu + cloud-init + boto = awesome*
                                                                         *I am biased

http://www.123rf.com/photo_4871141_food-pyramid-isolated-on-white.html   https://github.com/krux/ops-tools
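cloud-init reads user-data at first boot; if the data starts with `#cloud-config`, it is parsed as YAML that can set the hostname, install packages and run commands. A sketch that renders such a document using only the `hostname`, `package_update`, `packages` and `runcmd` keys; the helper and the example package and command are hypothetical.

```python
import json


def cloud_config(hostname, packages, commands):
    """Render minimal #cloud-config user-data for an Ubuntu instance."""
    # json.dumps yields double-quoted strings, which YAML also accepts, so values
    # containing ':' or '#' are not misparsed.
    quote = json.dumps
    lines = ["#cloud-config", f"hostname: {quote(hostname)}", "package_update: true", "packages:"]
    lines += [f"  - {quote(pkg)}" for pkg in packages]
    lines.append("runcmd:")
    lines += [f"  - {quote(cmd)}" for cmd in commands]
    return "\n".join(lines) + "\n"
```

The resulting string can be passed as `UserData` to boto3's EC2 `run_instances` (or `user_data` in boto 2's `run_instances`).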

AWS OPSWORKS
                                  Hosted Chef, no extra charge, Ubuntu 12.04 or Amazon Linux
                                                 Still rough around the edges.

http://thebrandbuilder.files.wordpress.com/2011/08/gordon-01.jpg   http://aws.amazon.com/opsworks/

DEV = PRODUCTION
                          "I dunno, it worked on my laptop"
                                 Instead, use Vagrant
http://vagrantup.com/
ROLL YOUR OWN AMIS
                                                Instantly boot up new deployments
                                                     Reduce Time to Respond
http://bakingismyzen.blogspot.com/2011/07/beignets-cant-have-just-one.html   http://puppetlabs.com/blog/rapid-scaling-with-auto-generated-amis-using-puppet/
CONFIDENT DEPLOYS
                                                   That human error could be yours

http://www.etsy.com/listing/37178125/stormtrooper-regrets-those-were-the
CONTINUOUS INTEGRATION
                         Ours: GitHub + Jenkins + FPM + apt::s3
                      From commit to deployable in one command
http://github.com/
http://jenkins-ci.org/
https://github.com/thekad/apt-s3
https://github.com/jordansissel/fpm/wiki/
ONE CLICK DEPLOYMENTS
                                        Deployments should not be exciting.
                                      Don't create a checklist; automate & track
https://checkmarkable.com
http://www.thegreenhead.com/2012/07/one-click-butter-cutter.php   https://github.com/jib/aws-analysis-tools/
DARK LAUNCHES
               Exercise the code without impacting the user experience
http://www.kissmetrics.com/
http://www.layoutsparks.com/pictures/moon-23   https://github.com/yahoo/boomerang/
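A dark launch runs new code on real requests but throws its result away. A sketch, assuming a deterministic hash of feature name and user ID chooses which users exercise the new path; `in_dark_launch`, `serve` and the 10% default are hypothetical. SHA-256 is used because Python's built-in `hash()` of strings is randomized per process.

```python
import hashlib


def in_dark_launch(feature, user_id, percent):
    """Return True if this user falls in the first `percent` of 100 buckets for `feature`."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # stable across processes and restarts
    return bucket < percent


def serve(feature, user_id, request, current, candidate, on_error, percent=10):
    """Always answer with `current`; bucketed users also exercise `candidate`, whose result is discarded."""
    response = current(request)
    if in_dark_launch(feature, user_id, percent):
        try:
            candidate(request)
        except Exception as exc:  # a bug in the new path is reported, never shown to the user
            on_error(exc)
    return response
```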
SHADOW TRAFFIC
                                                    Test new code against live traffic

http://doppelthingers.tumblr.com/post/12839979386/traffic-light-shadow-hangman-and-possibly-his   https://gist.github.com/3125323
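Shadow traffic replays the same live request against the new backend and compares responses off the request path. A sketch with an injected executor so the user never waits on the shadow; `handle_with_shadow` and its callbacks are hypothetical and not taken from the linked gist.

```python
def handle_with_shadow(request, live, shadow, report, executor):
    """Answer from `live`; replay the request against `shadow` in the background.

    The caller only waits for the live backend. The shadow response is compared
    afterwards, and any mismatch or exception is passed to `report`.
    """
    live_response = live(request)

    def compare():
        try:
            shadow_response = shadow(request)
        except Exception as exc:  # the new code failing must not affect users
            report(request, "error", exc)
            return
        if shadow_response != live_response:
            report(request, "mismatch", shadow_response)

    executor.submit(compare)
    return live_response
```

A `concurrent.futures.ThreadPoolExecutor` works as the executor; in production the shadow call would also need its own timeout.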
SLEEP TIGHT
                                           Slides at: www.Slideshare.net/jiboumans
                                                 We're hiring: www.krux.com
http://raafay-awan.blogspot.com/2011/08/cats-cutest-of-creatures.html
Devoxx UK: Reliability & Scale in AWS while letting you sleep through the night