Operations-Driven Web Services
A Case Study of Service Evolution at Rent the Runway
Camille Fournier, Head of Engineering, @skamille
Carlo Barbara, Senior Systems Engineer, @CarloBarbara
In the Beginning, There Was Drupal
There were also all of these folks…
Can't Just Burn the World Down
Hollow It Out!
Complexity
[Chart: Number of Services in Production, monthly from Dec-11 to Jul-13; y-axis from 0 to 14]
Operations first…
• Availability and performance of our services are critical to running our business
• The software we develop has to make delivering on our SLAs possible
• How (besides sane design):
  • Healthchecks + Nagios
  • Measurements
  • Historical data with graphs
Metrics
• Gauges – an instantaneous value
• Counters – a count that can be incremented and decremented
• Meters – rate of events over time (mean rate plus 1-, 5-, and 15-minute exponentially weighted moving averages)
• Histograms – distribution of values (mean, median, max, std. dev., and the 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles)
• Timers – a meter of requests plus a histogram of their durations (frequency and latency)
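To make the five metric types concrete, here is a minimal JDK-only sketch. It is not the Metrics library's API: the class names (`SimpleCounter`, `SimpleHistogram`, `SimpleTimer`, `SimpleMeter`, `MetricsSketchDemo`) are hypothetical, the histogram keeps every sample rather than sampling into a bounded reservoir, and the meter assumes the caller ticks it every 5 seconds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Counter: a count that can go up and down (e.g. in-flight requests).
class SimpleCounter {
    private final AtomicLong count = new AtomicLong();
    void inc() { count.incrementAndGet(); }
    void dec() { count.decrementAndGet(); }
    long getCount() { return count.get(); }
}

// Histogram: keeps every sample, which is fine for a sketch but unbounded in production.
class SimpleHistogram {
    private final List<Long> values = new ArrayList<>();
    synchronized void update(long value) { values.add(value); }
    synchronized int size() { return values.size(); }
    synchronized double mean() {
        return values.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }
    // Nearest-rank percentile; q is a fraction in (0, 1], e.g. 0.99 for p99.
    synchronized long percentile(double q) {
        if (values.isEmpty()) return 0L;
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}

// Timer: frequency (number of calls) plus latency (histogram of durations in nanoseconds).
class SimpleTimer {
    private final SimpleHistogram durations = new SimpleHistogram();
    <T> T time(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            durations.update(System.nanoTime() - start);
        }
    }
    long count() { return durations.size(); }
    SimpleHistogram durations() { return durations; }
}

// Meter: 1-minute exponentially weighted moving average of events per second.
class SimpleMeter {
    private static final double TICK_SECONDS = 5.0;
    private static final double ALPHA = 1 - Math.exp(-TICK_SECONDS / 60.0);
    private long uncounted;
    private double rate;
    private boolean initialized;

    synchronized void mark(long n) { uncounted += n; }

    // Must be called every TICK_SECONDS; a scheduled executor would do this in real code.
    synchronized void tick() {
        double instantRate = uncounted / TICK_SECONDS;
        uncounted = 0;
        if (initialized) {
            rate += ALPHA * (instantRate - rate);
        } else {
            rate = instantRate;
            initialized = true;
        }
    }
    synchronized double oneMinuteRate() { return rate; }
}

class MetricsSketchDemo {
    public static void main(String[] args) {
        // Gauge: just a function that is sampled whenever a reporter asks for it.
        Supplier<Long> usedHeapBytes =
            () -> Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        SimpleTimer timer = new SimpleTimer();
        int answer = timer.time(() -> 6 * 7);
        System.out.println("gauge=" + usedHeapBytes.get() + " answer=" + answer
            + " timedCalls=" + timer.count());
    }
}
```

The EWMA update means a burst of traffic shows up in the 1-minute rate quickly and then decays, which is why meters suit "is it load?" questions better than a plain lifetime mean.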
Metrics - Healthchecks
• Verify that your service is running correctly
Metrics - Reporting
• HTTP
• JMX
• Graphite
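Graphite is the reporting target behind the graphs later in this deck. Its plaintext protocol is one line per data point: a dotted metric path, a value, and a Unix timestamp in seconds, sent over TCP (conventionally port 2003). The sketch below is a hand-rolled JDK-only illustration, not Metrics' `GraphiteReporter`; `GraphiteSender`, `GraphiteSenderDemo`, and the metric path are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical minimal sender for Graphite's plaintext protocol.
class GraphiteSender {
    private final String host;
    private final int port;

    GraphiteSender(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // One data point: "<dotted.path> <value> <epochSeconds>\n".
    // Whitespace would break the line format, so it is replaced in the path.
    static String formatLine(String path, double value, long epochSeconds) {
        String safePath = path.trim().replaceAll("\\s+", "_");
        return safePath + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a TCP connection and writes every metric with the same timestamp.
    void send(Map<String, Double> metrics, long epochSeconds) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Double> e : metrics.entrySet()) {
                out.write(formatLine(e.getKey(), e.getValue(), epochSeconds));
            }
            out.flush();
        }
    }
}

class GraphiteSenderDemo {
    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        System.out.print(GraphiteSender.formatLine("rtr.web1.requests.count", 1234, now));
        // new GraphiteSender("localhost", 2003).send(...) would ship lines to a running Graphite.
    }
}
```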
Dropwizard: What is it?
• Quality open-source Java web service components glued together in a modular way
• Eliminates the need to pick a platform stack: it's all there
• It's opinionated. If you don't like a Dropwizard core component, that's too bad; don't use Dropwizard
• Developers focus on business logic, not the framework
• It's easy, maintainable, and it works!
A Few Words from Coda…
“I had no one I had to toss a WAR to. I had no one to stand up a Tomcat server and fiddle with it until their eyes bled. I had no one who didn't trust me to spin up my own threads or connection pools. So I wrote something which worked as simply and in as straightforward a manner as possible because my own ass was on the line if it didn't work.”
Dropwizard: The Ingredients
• Jersey for REST
• Jackson for JSON
• Jetty for a web server
• Metrics for measuring
• YAML for configuring
• Dropwizard for weaving everything together
Dropwizard – Healthchecks
• Register hooks that check the health of your app
• An HTTP endpoint iterates over all the hooks
• “The meaning of healthy” is decided by you (e.g., database connections, client connections, deadlock count)
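The pattern above (register named hooks, then have one endpoint run them all) can be sketched with the JDK alone. This is a hypothetical stand-in, not Dropwizard's `HealthCheck`/`HealthCheckRegistry` API; `HealthRegistry`, `Result`, `HealthDemo`, and the choice of HTTP 500 for any failing hook are assumptions. The deadlock hook uses the JDK's `ThreadMXBean.findDeadlockedThreads()`, which returns `null` when no threads are deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical registry of health-check hooks.
class HealthRegistry {
    static final class Result {
        final boolean healthy;
        final String message;
        private Result(boolean healthy, String message) {
            this.healthy = healthy;
            this.message = message;
        }
        static Result ok() { return new Result(true, "OK"); }
        static Result unhealthy(String message) { return new Result(false, message); }
    }

    private final Map<String, Callable<Result>> checks = new LinkedHashMap<>();

    void register(String name, Callable<Result> check) { checks.put(name, check); }

    // Runs every hook; a hook that throws is reported as unhealthy instead of breaking the endpoint.
    Map<String, Result> runAll() {
        Map<String, Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, Callable<Result>> e : checks.entrySet()) {
            Result r;
            try {
                r = e.getValue().call();
            } catch (Exception ex) {
                r = Result.unhealthy(ex.toString());
            }
            results.put(e.getKey(), r);
        }
        return results;
    }

    // Status a healthcheck endpoint could return: 200 only if every hook is healthy.
    static int httpStatus(Map<String, Result> results) {
        for (Result r : results.values()) {
            if (!r.healthy) return 500;
        }
        return 200;
    }

    // Example hook: "healthy" means no deadlocked threads in this JVM.
    static Result deadlockCheck() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        return ids == null ? Result.ok() : Result.unhealthy(ids.length + " deadlocked threads");
    }
}

class HealthDemo {
    public static void main(String[] args) {
        HealthRegistry registry = new HealthRegistry();
        registry.register("deadlocks", HealthRegistry::deadlockCheck);
        registry.register("database", () -> { throw new IllegalStateException("pool exhausted"); });
        Map<String, HealthRegistry.Result> results = registry.runAll();
        results.forEach((name, r) -> System.out.println(name + ": " + (r.healthy ? "healthy" : r.message)));
        System.out.println("HTTP " + HealthRegistry.httpStatus(results));
    }
}
```

Catching exceptions per hook matters: one broken dependency should show up as one red line in the report, not take down the whole health endpoint.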
Dropwizard + Metrics
• Dropwizard has lots of platform instrumentation baked in using Metrics, and it comes for free (e.g., Jetty, JVM, log counts)
• Ability to add Timers to your endpoints with @Timed
• Ability to add arbitrary metrics as you see fit
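Dropwizard hooks `@Timed` into Jersey's resource handling, which needs the framework on the classpath. To show the underlying idea (an annotation marks which methods to time, and a wrapper records call counts and durations), here is a JDK-only sketch built on a dynamic proxy. The `Timed` annotation below is a hypothetical local stand-in, not `com.codahale.metrics.annotation.Timed`, and `TimingProxy`, `GreetingResource`, and `TimedDemo` are illustrative names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Local stand-in for a @Timed annotation; retained at runtime so the proxy can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Timed {}

// Hypothetical resource interface: only greet() is marked for timing.
interface GreetingResource {
    @Timed
    String greet(String name);

    String version();
}

// Wraps an interface implementation and records call count + total time for @Timed methods.
final class TimingProxy {
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    private TimingProxy() {}

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] {iface},
            (proxy, method, args) -> {
                if (!method.isAnnotationPresent(Timed.class)) {
                    return call(method, target, args);
                }
                String name = iface.getSimpleName() + "." + method.getName();
                long start = System.nanoTime();
                try {
                    return call(method, target, args);
                } finally {
                    // Recorded even when the call throws, like a timer around an endpoint.
                    CALLS.computeIfAbsent(name, k -> new LongAdder()).increment();
                    TOTAL_NANOS.computeIfAbsent(name, k -> new LongAdder()).add(System.nanoTime() - start);
                }
            });
    }

    // Invokes the real method and rethrows the target's own exception, not the reflective wrapper.
    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}

class TimedDemo {
    public static void main(String[] args) {
        GreetingResource resource = TimingProxy.wrap(GreetingResource.class, new GreetingResource() {
            @Override public String greet(String name) { return "Hello, " + name; }
            @Override public String version() { return "1.0"; }
        });
        System.out.println(resource.greet("RTR"));
        resource.version(); // not annotated, so not timed
        TimingProxy.CALLS.forEach((name, n) -> System.out.println(
            name + " calls=" + n.sum() + " totalNanos=" + TimingProxy.TOTAL_NANOS.get(name).sum()));
    }
}
```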
Other Frameworks
• Play 1.x
  • Abandoned in favor of Play 2.x, which was still in beta
  • Magic
• Glassfish
  • OSGi hell
  • “Standards”
• Spring
  • Everything and the kitchen sink
  • Also, I hate XML
What do I get out of it? Dev agenda
• Storytelling: causation & correlation
• An integral piece of the operational excellence puzzle
• State of the world: dashboards
• Developers focus on features; operations is mostly a free lunch
• Code review & demo
Disclaimer: you need Graphite to really harness the value
Storytelling
• The grid is slow. Why?
  • Is it load?
  • Is it dependent-service latency?
  • How does that compare to yesterday?
• The JVM throws an OutOfMemoryError. What's the problem?
  • What does the GC sawtooth look like?
  • When did it change?
  • Is it correlated with increased load?
• How is that new “performance” tweak doing?
  • If you never measured, then you didn't tune. True story!
• What does my 5XX graph look like?
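“Is it correlated with increased load?” can be answered with a number once both series are in Graphite. A minimal sketch, assuming you have already fetched two series aligned on the same timestamps (say, requests per minute and p99 latency), using Pearson's correlation coefficient. `Correlation.pearson` and `CorrelationDemo` are hypothetical, and the demo values are illustrative, not RTR data. A high coefficient tells you where to dig; it does not establish causation.

```java
// Pearson correlation between two equally long, time-aligned series.
final class Correlation {
    private Correlation() {}

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two aligned series with at least 2 points");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) return Double.NaN; // a flat line correlates with nothing
        return cov / Math.sqrt(varX * varY);
    }
}

class CorrelationDemo {
    public static void main(String[] args) {
        double[] requestsPerMinute = {1200, 1500, 2100, 2600, 3300}; // illustrative values
        double[] p99LatencyMs = {80, 85, 110, 160, 240};             // illustrative values
        System.out.println("r = " + Correlation.pearson(requestsPerMinute, p99LatencyMs));
    }
}
```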
Operational Excellence: The ingredients
• Application instrumentation (Dropwizard)
• Time-series data & graphing (Graphite, D3)
• Centralized logging & log parsing (Rsyslog, Logstash, Nagios)
• Automated alerting & escalation (PagerDuty)
Dropwizard & Graphite will get you very far, but if you want total control and visibility, you need the rest. This is the stack RTR is moving towards, rather than relying on basic Java logging SMTP appenders.
OMG, we are on GMA. Are we OK?
• 10+ services
• Each service runs in a cluster behind a load balancer
• 'OK' is somewhat service-specific
Basically, you need a lot of information at your fingertips. A picture is worth a thousand words. Get yourself some dashboards!
Graphite Dashboard
Tasseo dashboard (D3)
• Red, yellow, & green lights
• Real-time
• Endless cool things: Graphite + D3
If we see yellow or red, start diagnosing.
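The red/yellow/green lights boil down to comparing a metric's latest value against two thresholds. A sketch of that rule, assuming “higher is worse” (as for latency or 5XX counts) and treating a missing data point as red; `Light`, `ThresholdLight`, `ThresholdDemo`, and the demo thresholds are hypothetical, and this is not Tasseo's own code.

```java
enum Light { GREEN, YELLOW, RED }

// Maps a metric's latest value to a traffic light, assuming higher values are worse.
final class ThresholdLight {
    private final double warning;
    private final double critical;

    ThresholdLight(double warning, double critical) {
        if (warning > critical) {
            throw new IllegalArgumentException("warning must not exceed critical");
        }
        this.warning = warning;
        this.critical = critical;
    }

    Light classify(double latest) {
        if (Double.isNaN(latest)) return Light.RED; // no data is treated as an alert (an assumption)
        if (latest >= critical) return Light.RED;
        if (latest >= warning) return Light.YELLOW;
        return Light.GREEN;
    }
}

class ThresholdDemo {
    public static void main(String[] args) {
        ThresholdLight p99Ms = new ThresholdLight(250, 1000); // hypothetical latency thresholds
        for (double v : new double[] {120, 400, 1500, Double.NaN}) {
            System.out.println(v + " -> " + p99Ms.classify(v));
        }
    }
}
```

Treating “no data” as red is a deliberate choice in this sketch: a service that stops reporting is at least as worrying as one reporting bad numbers.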
Free Lunch? Not really
• DB connection pool monitoring
• HTTP client connection pool monitoring
• JVM heap & GC info
• HTTP server response counts
• HTTP server connection info
• Endpoint duration & throughput stats
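As an illustration of what “HTTP server response counts” means, here is a JDK-only sketch that buckets status codes into 1xx through 5xx classes and derives the server-error ratio, the number behind “what does my 5XX graph look like?”. `ResponseCodeCounter` and `ResponseCodeDemo` are hypothetical names, not classes from Metrics or Jetty.

```java
import java.util.concurrent.atomic.LongAdder;

// Counts responses per status class (1xx..5xx); safe to call from many request threads.
final class ResponseCodeCounter {
    private final LongAdder[] byClass = new LongAdder[5];

    ResponseCodeCounter() {
        for (int i = 0; i < byClass.length; i++) byClass[i] = new LongAdder();
    }

    void record(int status) {
        int statusClass = status / 100; // e.g. 404 -> 4
        if (statusClass >= 1 && statusClass <= 5) byClass[statusClass - 1].increment();
    }

    // statusClass is 1..5
    long count(int statusClass) { return byClass[statusClass - 1].sum(); }

    long total() {
        long sum = 0;
        for (LongAdder a : byClass) sum += a.sum();
        return sum;
    }

    // Fraction of responses that were server errors; 0 when nothing has been recorded.
    double serverErrorRatio() {
        long total = total();
        return total == 0 ? 0.0 : (double) count(5) / total;
    }
}

class ResponseCodeDemo {
    public static void main(String[] args) {
        ResponseCodeCounter counter = new ResponseCodeCounter();
        for (int status : new int[] {200, 200, 302, 404, 500}) counter.record(status);
        System.out.println("5xx=" + counter.count(5) + " ratio=" + counter.serverErrorRatio());
    }
}
```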
Where do I sign up?
• You install Graphite: a one-time hit plus some TLC. Medium difficulty
• You annotate your endpoints and maybe add finer-grained telemetry. Easy
• You configure your service to feed into Graphite, ideally consistently across services via a shared “Bundle”. Easy
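Feeding Graphite “consistently across services” largely comes down to every service reporting under the same naming scheme. A sketch of the kind of helper a shared bundle might expose, assuming a hypothetical `<environment>.<service>.<host>` prefix; because Graphite treats dots as path separators, dots in host names are replaced. `MetricPrefix`, `MetricPrefixDemo`, and the naming scheme are illustrative assumptions, not RTR's actual convention or a Dropwizard API.

```java
import java.util.Locale;

// Hypothetical shared naming helper so every service lands in the same Graphite tree.
final class MetricPrefix {
    private final String prefix;

    MetricPrefix(String environment, String service, String host) {
        this.prefix = String.join(".", segment(environment), segment(service), segment(host));
    }

    // Lower-cases and replaces anything other than letters, digits, '_' and '-' (including dots).
    static String segment(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_-]", "_");
    }

    // Full Graphite path for a metric name that may itself be dotted, e.g. "jvm.memory.heap.used".
    String path(String metricName) {
        return prefix + "." + metricName;
    }

    String prefix() { return prefix; }
}

class MetricPrefixDemo {
    public static void main(String[] args) {
        MetricPrefix names = new MetricPrefix("prod", "checkout", "web1.example.com");
        System.out.println(names.path("jvm.memory.heap.used"));
    }
}
```

Putting this in one shared place means dashboards and alert rules can use the same wildcard pattern (for example, every host of one service) without per-service special cases.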
Demo
• Show a simple Dropwizard codebase
• Do some curls
• Show the admin endpoints
References
• dropwizard.codahale.com
• metrics.codahale.com
• graphite.wikidot.com
Presenters
• @CarloBarbara (www.cabkata.com)
• @Skamille (whilefalse.blogspot.com)
• Rent the Runway is hiring! (renttherunway.com/careers)
