SlideShare a Scribd company logo
Techniques for Scaling the
       Netflix API
      By Daniel Jacobson
       @daniel_jacobson
    djacobson@netflix.com


                             QCon SF 2011
Techniques for Scaling the Netflix API

                      Agenda

•   History of the Netflix API
•   The Cloud
•   Development and Testing
•   Resiliency
•   Future of the Netflix API
Techniques for Scaling the Netflix API

                      Agenda

•   History of the Netflix API
•   The Cloud
•   Development and Testing
•   Resiliency
•   Future of the Netflix API
Netflix
 API
Netflix API Requests by Audience
          At Launch In 2008




                              Netflix Devices
                              Open API Developers
Techniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SF
Netflix
 API
Netflix API Requests by Audience
   At Launch                         Today




               Netflix Devices
               Open API Developers
Public API




             Private API
Current Emphasis of Netflix API




                         Netflix Devices
Techniques for Scaling the Netflix API

                      Agenda

•   History of the Netflix API
•   The Cloud
•   Development and Testing
•   Resiliency
•   Future of the Netflix API
Techniques for Scaling the Netflix API - QCon SF
Discovery
Discovery
Streaming
Netflix API Powers Discovery
Netflix API : Requests Per Month
                       35




                       30




                       25
Requests in Billions




                       20




                       15




                       10




                        5




                       -
Netflix API : Requests Per Month
                       35




                       30




                       25
Requests in Billions




                       20




                       15




                       10




                        5




                       -
Techniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SF
AWS Cloud
Techniques for Scaling the Netflix API - QCon SF
Autoscaling
Autoscaling
Techniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API

                      Agenda

•   History of the Netflix API
•   The Cloud
•   Development and Testing
•   Resiliency
•   Future of the Netflix API
Development / Testing
     Philosophy
    Act fast, react fast
That Doesn’t Mean We Don’t Test
•   Unit tests
•   Functional tests
•   Regression scripts
•   Continuous integration
•   Capacity planning
•   Load / Performance tests
Development         Contiuous
                                 Run Unit Tests
(Feature & Test)   Integration




   Perform
                    Deploy to    Run Functional
 Integration
                   Staging Env       Tests
    Tests




 Deploy to
 Customers
Cloud-Based Deployment Techniques
API Requests from
                   the Internet




                                                           Problem!




Current Code

In Production




                                     Single Canary Instance
                            To Test New Code with Production Traffic
                                (typically around 1-5% of traffic)
API Requests from
                   the Internet




Current Code

In Production
API Requests from
                   the Internet




Current Code                              New Code

In Production                   Getting Prepared for Production
API Requests from
                           the Internet




      Old Code                              Current Code

Prepared For Rollback                       In Production
API Requests from
                                 the Internet




         Old Code                                    New Code

Rolled Back into Production                       Out of Production
API Requests from
                   the Internet




Current Code

In Production
API Requests from
                   the Internet




Current Code                              New Code

In Production                   Getting Prepared for Production
API Requests from
                           the Internet




      Old Code                              Current Code

Prepared For Rollback                       In Production
API Requests from
   the Internet




                    Current Code

                    In Production
Development                          Continuous
                   Run Unit Tests
(Feature & Test)                     Integration




     Run
                     Deploy to      Run Functional
 Integration
                    Staging Env         Tests
    Tests



                      Perform
Deploy Canary
                      Canary
 Instance(s)
                      Analysis




 Deploy to         Perform Black    Deploy Black
 Customers            Analysis       Instances
Techniques for Scaling the Netflix API

                      Agenda

•   History of the Netflix API
•   The Cloud
•   Development and Testing
•   Resiliency
•   Future of the Netflix API
API




Personaliz
                          Movie     Movie     Similar             A/B Test
  ation      User Info                                  Reviews
 Engine
                         Metadata   Ratings   Movies               Engine
API




Personaliz
                          Movie     Movie     Similar             A/B Test
  ation      User Info                                  Reviews
 Engine
                         Metadata   Ratings   Movies               Engine
API




Personaliz
                          Movie     Movie     Similar             A/B Test
  ation      User Info                                  Reviews
 Engine
                         Metadata   Ratings   Movies               Engine
API




Personaliz
                          Movie     Movie     Similar             A/B Test
  ation      User Info                                  Reviews
 Engine
                         Metadata   Ratings   Movies               Engine
Circuit Breaker Dashboard
Techniques for Scaling the Netflix API - QCon SF
Call Volume and Health / Last 10 Seconds
Call Volume / Last 2 Minutes
Successful Requests
Successful, But Slower Than Expected
Short-Circuited Requests, Delivering Fallbacks
Timeouts, Delivering Fallbacks
Thread Pool & Task Queue Full, Delivering Fallbacks
Exceptions, Delivering Fallbacks
# + # + # + # / (# + # + # + # + #) = Error Rate
                                                   Error Rate
Status of Fallback Circuit
Requests per Second, Over Last 10 Seconds
SLA Information
API




Personaliz
                          Movie     Movie     Similar             A/B Test
  ation      User Info                                  Reviews
 Engine
                         Metadata   Ratings   Movies               Engine
API




Personaliz
                          Movie     Movie     Similar             A/B Test
  ation      User Info                                  Reviews
 Engine
                         Metadata   Ratings   Movies               Engine
API




Personaliz
                          Movie     Movie     Similar             A/B Test
  ation      User Info                                  Reviews
 Engine
                         Metadata   Ratings   Movies               Engine
API




                         Fallback


Personaliz
                                Movie     Movie     Similar             A/B Test
  ation      User Info                                        Reviews
 Engine
                               Metadata   Ratings   Movies               Engine
API




                         Fallback


Personaliz
                                Movie     Movie     Similar             A/B Test
  ation      User Info                                        Reviews
 Engine
                               Metadata   Ratings   Movies               Engine
Techniques for Scaling the Netflix API

                      Agenda

•   History of the Netflix API
•   The Cloud
•   Development and Testing
•   Resiliency
•   Future of the Netflix API
Netflix
 API
Netflix API Requests by Audience
     Supporting Streaming Devices Today




                                      Netflix Devices
                                      Open API Developers
Techniques for Scaling the Netflix API - QCon SF
Redesign the API
Netflix API : Requests Per Month
                       35




                       30




                       25
Requests in Billions




                       20




                       15




                       10




                        5




                       -
Growth of the Netflix API




Over 30 Billion requests per month
    (Peaks at about 20,000 requests per second)
Techniques for Scaling the Netflix API - QCon SF
<catalog_titles>
 <number_of_results>1140</number_of_results>
 <start_index>0</start_index>
 <results_per_page>10</results_per_page>
 <catalog_title>
 <id>http://api.netflix.com/catalog/titles/movies/60021896</id><title short="Star" regular="Star"></title>
 <box_art small="http://alien2.netflix.com/us/boxshots/tiny/60021896.jpg"
          medium="http://alien2.netflix.com/us/boxshots/small/60021896.jpg"
          large="http://alien2.netflix.com/us/boxshots/large/60021896.jpg"></box_art>
 <link href="http://api.netflix.com/catalog/titles/movies/60021896/synopsis"
          rel="http://schemas.netflix.com/catalog/titles/synopsis" title="synopsis"></link>
 <release_year>2001</release_year>
 <category scheme="http://api.netflix.com/catalog/titles/mpaa_ratings" label="NR"></category>
 <category scheme="http://api.netflix.com/categories/genres" label="Foreign"></category>
 <link href="http://api.netflix.com/catalog/titles/movies/60021896/cast"
          rel="http://schemas.netflix.com/catalog/people.cast" title="cast"></link>
<link href="http://api.netflix.com/catalog/titles/movies/60021896/screen_formats" rel="http://schemas.netflix.com/catalog/titles/screen_formats" title="screen
formats"></link
 <link href="http://api.netflix.com/catalog/titles/movies/60021896/languages_and_audio" rel="http://schemas.netflix.com/catalog/titles/languages_and_audio"
title="languages and audio"></link>
 <average_rating>1.9</average_rating>
 <link href="http://api.netflix.com/catalog/titles/movies/60021896/similars" rel="http://schemas.netflix.com/catalog/titles.similars" title="similars"></link>
 <link href="http://www.netflix.com/Movie/Star/60021896" rel="alternate" title="webpage"></link>
 </catalog_title>
 <catalog_title>
 <id>http://api.netflix.com/catalog/titles/movies/17985448</id><title short="Lone Star" regular="Lone Star"></title>
 <box_art small="http://alien2.netflix.com/us/boxshots/tiny/17985448.jpg" medium="http://alien2.netflix.com/us/boxshots/small/17985448.jpg" large=""></box_art>
 <link href="http://api.netflix.com/catalog/titles/movies/17985448/synopsis" rel="http://schemas.netflix.com/catalog/titles/synopsis" title="synopsis"></link>
 <release_year>1996</release_year>
 <category scheme="http://api.netflix.com/catalog/titles/mpaa_ratings" label="R"></category>
 <category scheme="http://api.netflix.com/categories/genres" label="Drama"></category>
<link href="http://api.netflix.com/catalog/titles/movies/17985448/awards" rel="http://schemas.netflix.com/catalog/titles/awards" title="awards"></link>
 <link href="http://api.netflix.com/catalog/titles/movies/17985448/format_availability" rel="http://schemas.netflix.com/catalog/titles/format_availability"
title="formats"></link>
 <link href="http://api.netflix.com/catalog/titles/movies/17985448/screen_formats" rel="http://schemas.netflix.com/catalog/titles/screen_formats" title="screen
formats"></link>
 <link href="http://api.netflix.com/catalog/titles/movies/17985448/languages_and_audio" rel="http://schemas.netflix.com/catalog/titles/languages_and_audio"
title="languages and audio"></link>
 <average_rating>3.7</average_rating>
 <link href="http://api.netflix.com/catalog/titles/movies/17985448/previews" rel="http://schemas.netflix.com/catalog/titles/previews" title="previews"></link>
 <link href="http://api.netflix.com/catalog/titles/movies/17985448/similars" rel="http://schemas.netflix.com/catalog/titles.similars" title="similars"></link>
 <link href="http://www.netflix.com/Movie/Lone_Star/17985448" rel="alternate" title="webpage"></link>
 </catalog_title>
</catalog_titles>
{"catalog_title":
{"id":"http://api.netflix.com/catalog/titles/movies/60034967",
"title":{"title_short":"Rosencrantz and Guildenstern Are Dead",
"regular":"Rosencrantz and Guildenstern Are Dead"},
"maturity_level":60,
"release_year":"1990",
"average_rating":3.7,
"box_art":{"284pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/ghd/60034967.jpg",
"110pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/large/60034967.jpg",
"38pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/tiny/60034967.jpg",
"64pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/small/60034967.jpg",
"150pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/150/60034967.jpg",
"88pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/88/60034967.jpg",
"124pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/124/60034967.jpg"},
"language":"en",
"web_page":"http://www.netflix.com/Movie/Rosencrantz_and_Guildenstern_Are_Dead/60034967",
"tiny_url":"http://movi.es/ApUP9"},
"meta":{
"expand":["@directors","@bonus_materials","@cast","@awards","@short_synopsis","@synopsis","@box_art","@screen_formats","
@"links":{"id":"http://api.netflix.com/catalog/titles/movies/60034967",
"languages_and_audio":"http://api.netflix.com/catalog/titles/movies/60034967/languages_and_audio",
"title":"http://api.netflix.com/catalog/titles/movies/60034967/title",
"screen_formats":"http://api.netflix.com/catalog/titles/movies/60034967/screen_formats",
"cast":"http://api.netflix.com/catalog/titles/movies/60034967/cast",
"awards":"http://api.netflix.com/catalog/titles/movies/60034967/awards",
"short_synopsis":"http://api.netflix.com/catalog/titles/movies/60034967/short_synopsis",
"box_art":"http://api.netflix.com/catalog/titles/movies/60034967/box_art",
"synopsis":"http://api.netflix.com/catalog/titles/movies/60034967/synopsis",
"directors":"http://api.netflix.com/catalog/titles/movies/60034967/directors",
"similars":"http://api.netflix.com/catalog/titles/movies/60034967/similars",
"format_availability":"http://api.netflix.com/catalog/titles/movies/60034967/format_availability"}
}}
Improve Efficiency of API Requests




Could it have been 5 billion requests per month? Or less?
             (Assuming everything else remained the same)
Netflix API : Requests Per Month
                      35




                      30




                      25
Request in Billions




                      20




                      15




                      10




                      5




                      0
Netflix API : Requests Per Month
                      35




                      30




                      25
Request in Billions




                      20




                      15




                      10




                      5




                      0
API Billionaires Club
13 billion API calls / day (May 2011)
Over 260 billion objects stored in S3 (January 2011)
5 billion API calls / day     (April 2010)



5 billion API calls / day     (October 2009)



1 billion API calls / day (October 2011)

8 billion API calls / month (Q3 2009)

3.2 billion API-delivered stories / month              (October
2011)

3 billion API calls / month (March 2009)


                       Courtesy of John Musser, ProgrammableWeb
API Billionaires Club
13 billion API calls / day (May 2011)
Over 260 billion objects stored in S3 (January 2011)
5 billion API calls / day     (April 2010)



5 billion API calls / day     (October 2009)



1 billion API calls / day (October 2011)

8 billion API calls / month (Q3 2009)

3.2 billion API-delivered stories / month              (October
2011)

3 billion API calls / month (March 2009)


                       Courtesy of John Musser, ProgrammableWeb
Two Major Objective for
             API Redesign

• Improve performance for devices
  – Minimize network traffic (one API call, if possible)
  – Only deliver bytes that are needed

• Improve ability for device teams to rapidly
  innovate
  – Customized request/response patterns per device
  – Deployment is independent from API schedules
Current Client / Server Interaction
CLIENT APPS        API SERVER
                                 AUTH

                                SETUP

                                 TIME

                                QUEUE

                                 LISTS


                                LIST ( i )


                                TITLES
Future Client / Server Interaction
CLIENT APPS       API SERVER


                  CUSTOM SCRIPTING TIER




                       GENERIC API
Custom Scripting Tier Interaction
CLIENT APPS               API SERVER
                                        AUTH
               CUSTOM
              SCRIPTING                SETUP
                 TIER
                                        TIME
                    PS3
                   HOME                QUEUE
                  SCREEN
                  CUSTOM                LISTS
                 ENDPOINT

                                       LIST ( i )

              Owned and operated
                by the UI teams        TITLES
Techniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SF
For More Titles From a List
Techniques for Scaling the Netflix API - QCon SF
Generic API Interaction
CLIENT APP                API SERVER
                                        AUTH

                GENERIC                CONFIG
                 SCRIPT
                                        TIME

                  SIMPLE               QUEUE
                 CATALOG
                 REQUEST
                  TO API                LISTS


                                       LIST ( i )

              Owned and operated
                by the API team        TITLES
Technologies
                               AUTH

                              CONFIG

                               TIME
               GROOVY
                 SIMPLE
  CLIENT      COMPILED
                CATALOG   JAVAQUEUE
                REQUEST
LANGUAGE         INTO
                 TO API      LOLOMO
                               LISTS
                  JVM
                              LIST ( i )


                              TITLES
API Production Servers




  Cassandra Cluster




                         UI Engineers
Active


  1
      2
           3
               4
                   5
    SCRIPT
 REPOSITORY            6
  (DYNAMIC
DEPLOYMENT)




              API SERVER
          APPLICATION CODE
      (REQUIRES API CODE PUSHES)
Active


  1
      2
           3
               4
                   5
    SCRIPT
 REPOSITORY            6
  (DYNAMIC
                           7
DEPLOYMENT)




              API SERVER
          APPLICATION CODE
      (REQUIRES API CODE PUSHES)
1
      2
          3
              4
                              Active
                  5
    SCRIPT
 REPOSITORY           6
  (DYNAMIC
                          7
DEPLOYMENT)




              API SERVER
          APPLICATION CODE
      (REQUIRES API CODE PUSHES)
Techniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SF
If you are interested in helping us solve these
       problems, please contact me at:

           Daniel Jacobson
          djacobson@netflix.com
             @daniel_jacobson
http://www.linkedin.com/in/danieljacobson
 http://www.slideshare.net/danieljacobson

More Related Content

PPTX
APIs for Internal Audiences - Netflix - App Dev Conference
PPTX
API Revolutions : Netflix's API Redesign
PPTX
Scaling the Netflix API
PPTX
Redesigning the Netflix API - OSCON
PPTX
Set Your Content Free! : Case Studies from Netflix and NPR
PPTX
Netflix API: Keynote at Disney Tech Conference
PPTX
Scaling the Netflix API - OSCON
PPTX
Maintaining the Netflix Front Door - Presentation at Intuit Meetup
APIs for Internal Audiences - Netflix - App Dev Conference
API Revolutions : Netflix's API Redesign
Scaling the Netflix API
Redesigning the Netflix API - OSCON
Set Your Content Free! : Case Studies from Netflix and NPR
Netflix API: Keynote at Disney Tech Conference
Scaling the Netflix API - OSCON
Maintaining the Netflix Front Door - Presentation at Intuit Meetup

What's hot (20)

PPTX
Presentation to ESPN about the Netflix API
PPTX
Netflix API - Presentation to PayPal
PPTX
Netflix API : BAPI 2011 Presentation : SF
PPTX
Scaling the Netflix API - From Atlassian Dev Den
PPTX
Netflix API
PPTX
The future-of-netflix-api
PPTX
Top 10 Lessons Learned from the Netflix API - OSCON 2014
PPTX
Netflix API - Separation of Concerns
PPTX
History and Future of the Netflix API - Mashery Evolution of Distribution
PPTX
Maintaining the Front Door to Netflix
PPTX
Why API? - Business of APIs Conference
PPTX
Huge: Running an API at Scale
PPTX
API Design - When to buck the trend (Webcast)
PPTX
API Trends: What to expect in 2012
PPTX
Developers are People Too! Building a DX-Based API Strategy Ronnie Mitra, Pri...
PDF
Migrating Automation Tests to Postman Monitors and ROI
PPTX
Essential API Facade Patterns - Composition (Episode 1)
PPTX
The API Facade Pattern: Overview - Episode 1
PPTX
The API Facade Pattern: Technology - Episode 3
PPTX
Essential API Facade Patterns: One Phase to Two Phase Conversion (Episode 3)
Presentation to ESPN about the Netflix API
Netflix API - Presentation to PayPal
Netflix API : BAPI 2011 Presentation : SF
Scaling the Netflix API - From Atlassian Dev Den
Netflix API
The future-of-netflix-api
Top 10 Lessons Learned from the Netflix API - OSCON 2014
Netflix API - Separation of Concerns
History and Future of the Netflix API - Mashery Evolution of Distribution
Maintaining the Front Door to Netflix
Why API? - Business of APIs Conference
Huge: Running an API at Scale
API Design - When to buck the trend (Webcast)
API Trends: What to expect in 2012
Developers are People Too! Building a DX-Based API Strategy Ronnie Mitra, Pri...
Migrating Automation Tests to Postman Monitors and ROI
Essential API Facade Patterns - Composition (Episode 1)
The API Facade Pattern: Overview - Episode 1
The API Facade Pattern: Technology - Episode 3
Essential API Facade Patterns: One Phase to Two Phase Conversion (Episode 3)
Ad

Viewers also liked (20)

PDF
OSCON Data 2011 -- NoSQL @ Netflix, Part 2
PPTX
Netflix incloudsmarch8 2011forwiki
PPTX
David Pearl: Analysis Netflix
PDF
Lafarge
PDF
Evolving the Netflix API
PDF
Canary Analyze All the Things
PPTX
Netflix Edge Engineering Open House Presentations - June 9, 2016
DOC
Monografia: A prática da Leitura no Ensino de Lingua Inglesa
PPTX
Lafarge ppt
PPTX
Maintaining the Front Door to Netflix : The Netflix API
PPTX
Competition in indian cement industry a case of collusive
PDF
Lafarge Case Study
PPTX
Netflix
PPTX
From SOA to MSA
PDF
Netflix: From Clouds to Roots
PPTX
MicroServices at Netflix - challenges of scale
PPTX
Lafarge Cement Value Chain
PPTX
Photographs I used & Editing
PPT
Isamar guerra
PDF
会津IT秋フォーラム2012での講演資料
OSCON Data 2011 -- NoSQL @ Netflix, Part 2
Netflix incloudsmarch8 2011forwiki
David Pearl: Analysis Netflix
Lafarge
Evolving the Netflix API
Canary Analyze All the Things
Netflix Edge Engineering Open House Presentations - June 9, 2016
Monografia: A prática da Leitura no Ensino de Lingua Inglesa
Lafarge ppt
Maintaining the Front Door to Netflix : The Netflix API
Competition in indian cement industry a case of collusive
Lafarge Case Study
Netflix
From SOA to MSA
Netflix: From Clouds to Roots
MicroServices at Netflix - challenges of scale
Lafarge Cement Value Chain
Photographs I used & Editing
Isamar guerra
会津IT秋フォーラム2012での講演資料
Ad

Similar to Techniques for Scaling the Netflix API - QCon SF (20)

PPTX
AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army
PPTX
Move Fast;Stay Safe:Developing & Deploying the Netflix API
PDF
The Netflix API for a global service
PDF
My Web Performance Dirty Secrets
PDF
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
PDF
Evolution of the Netflix API
PPTX
API Strategy Evolution at Netflix
PDF
Netflix on Cloud - combined slides for Dev and Ops
ZIP
Performance and Fault Tolerance for the Netflix API
PDF
The Cloud: A game changer to test, at scale and in production, SOA based web...
PDF
The new Netflix API
PDF
Netflix Playback Access Team
PPTX
Open API Strategy, by Sensedia
PDF
How To Train Your Microservice
PDF
Building a Great Web API - Evan Cooke - QCON 2011
PDF
Keeping Movies Running Amid Thunderstorms!
PDF
API Design & Moving from Junior to Senior Developer
PDF
Business of APIs Conference 2011 - Netflix
PDF
Keynote: Faster, Better, Cheaper: Pick all Three! By Miles Ward of Google
PDF
How to Introduce Continuous Delivery
AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army
Move Fast;Stay Safe:Developing & Deploying the Netflix API
The Netflix API for a global service
My Web Performance Dirty Secrets
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Evolution of the Netflix API
API Strategy Evolution at Netflix
Netflix on Cloud - combined slides for Dev and Ops
Performance and Fault Tolerance for the Netflix API
The Cloud: A game changer to test, at scale and in production, SOA based web...
The new Netflix API
Netflix Playback Access Team
Open API Strategy, by Sensedia
How To Train Your Microservice
Building a Great Web API - Evan Cooke - QCON 2011
Keeping Movies Running Amid Thunderstorms!
API Design & Moving from Junior to Senior Developer
Business of APIs Conference 2011 - Netflix
Keynote: Faster, Better, Cheaper: Pick all Three! By Miles Ward of Google
How to Introduce Continuous Delivery

More from Daniel Jacobson (6)

PPT
NPR Presentation at Wolfram Data Summit 2010
PPT
NPR: Digital Distribution Strategy: OSCON2010
PPT
NPR's Digital Distribution and Mobile Strategy
PPT
NPR API Usage and Metrics
PPT
OpenID Adoption UX Summit
PPT
NPR : Examples of COPE
NPR Presentation at Wolfram Data Summit 2010
NPR: Digital Distribution Strategy: OSCON2010
NPR's Digital Distribution and Mobile Strategy
NPR API Usage and Metrics
OpenID Adoption UX Summit
NPR : Examples of COPE

Recently uploaded (20)

PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PPT
Teaching material agriculture food technology
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
NewMind AI Monthly Chronicles - July 2025
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Encapsulation theory and applications.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
Cloud computing and distributed systems.
PDF
KodekX | Application Modernization Development
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
A Presentation on Artificial Intelligence
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
Dropbox Q2 2025 Financial Results & Investor Presentation
Teaching material agriculture food technology
Spectral efficient network and resource selection model in 5G networks
Per capita expenditure prediction using model stacking based on satellite ima...
NewMind AI Monthly Chronicles - July 2025
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Network Security Unit 5.pdf for BCA BBA.
Encapsulation theory and applications.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
Cloud computing and distributed systems.
KodekX | Application Modernization Development
NewMind AI Weekly Chronicles - August'25 Week I
Mobile App Security Testing_ A Comprehensive Guide.pdf
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
A Presentation on Artificial Intelligence
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Advanced methodologies resolving dimensionality complications for autism neur...
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...

Techniques for Scaling the Netflix API - QCon SF

  • 1. Techniques for Scaling the Netflix API By Daniel Jacobson @daniel_jacobson djacobson@netflix.com QCon SF 2011
  • 2. Techniques for Scaling the Netflix API Agenda • History of the Netflix API • The Cloud • Development and Testing • Resiliency • Future of the Netflix API
  • 3. Techniques for Scaling the Netflix API Agenda • History of the Netflix API • The Cloud • Development and Testing • Resiliency • Future of the Netflix API
  • 5. Netflix API Requests by Audience At Launch In 2008 Netflix Devices Open API Developers
  • 9. Netflix API Requests by Audience At Launch Today Netflix Devices Open API Developers
  • 10. Public API Private API
  • 11. Current Emphasis of Netflix API Netflix Devices
  • 12. Techniques for Scaling the Netflix API Agenda • History of the Netflix API • The Cloud • Development and Testing • Resiliency • Future of the Netflix API
  • 17. Netflix API Powers Discovery
  • 18. Netflix API : Requests Per Month 35 30 25 Requests in Billions 20 15 10 5 -
  • 19. Netflix API : Requests Per Month 35 30 25 Requests in Billions 20 15 10 5 -
  • 27. Techniques for Scaling the Netflix API Agenda • History of the Netflix API • The Cloud • Development and Testing • Resiliency • Future of the Netflix API
  • 28. Development / Testing Philosophy Act fast, react fast
  • 29. That Doesn’t Mean We Don’t Test • Unit tests • Functional tests • Regression scripts • Continuous integration • Capacity planning • Load / Performance tests
  • 30. Development Contiuous Run Unit Tests (Feature & Test) Integration Perform Deploy to Run Functional Integration Staging Env Tests Tests Deploy to Customers
  • 32. API Requests from the Internet Problem! Current Code In Production Single Canary Instance To Test New Code with Production Traffic (typically around 1-5% of traffic)
  • 33. API Requests from the Internet Current Code In Production
  • 34. API Requests from the Internet Current Code New Code In Production Getting Prepared for Production
  • 35. API Requests from the Internet Old Code Current Code Prepared For Rollback In Production
  • 36. API Requests from the Internet Old Code New Code Rolled Back into Production Out of Production
  • 37. API Requests from the Internet Current Code In Production
  • 38. API Requests from the Internet Current Code New Code In Production Getting Prepared for Production
  • 39. API Requests from the Internet Old Code Current Code Prepared For Rollback In Production
  • 40. API Requests from the Internet Current Code In Production
  • 41. Development Continuous Run Unit Tests (Feature & Test) Integration Run Deploy to Run Functional Integration Staging Env Tests Tests Perform Deploy Canary Canary Instance(s) Analysis Deploy to Perform Black Deploy Black Customers Analysis Instances
  • 42. Techniques for Scaling the Netflix API Agenda • History of the Netflix API • The Cloud • Development and Testing • Resiliency • Future of the Netflix API
  • 43. API Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 44. API Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 45. API Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 46. API Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 49. Call Volume and Health / Last 10 Seconds
  • 50. Call Volume / Last 2 Minutes
  • 52. Successful, But Slower Than Expected
  • 55. Thread Pool & Task Queue Full, Delivering Fallbacks
  • 57. # + # + # + # / (# + # + # + # + #) = Error Rate Error Rate
  • 59. Requests per Second, Over Last 10 Seconds
  • 61. API Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 62. API Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 63. API Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 64. API Fallback Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 65. API Fallback Personaliz Movie Movie Similar A/B Test ation User Info Reviews Engine Metadata Ratings Movies Engine
  • 66. Techniques for Scaling the Netflix API Agenda • History of the Netflix API • The Cloud • Development and Testing • Resiliency • Future of the Netflix API
  • 68. Netflix API Requests by Audience Supporting Streaming Devices Today Netflix Devices Open API Developers
  • 71. Netflix API : Requests Per Month 35 30 25 Requests in Billions 20 15 10 5 -
  • 72. Growth of the Netflix API Over 30 Billion requests per month (Peaks at about 20,000 requests per second)
  • 74. <catalog_titles> <number_of_results>1140</number_of_results> <start_index>0</start_index> <results_per_page>10</results_per_page> <catalog_title> <id>http://api.netflix.com/catalog/titles/movies/60021896</id><title short="Star" regular="Star"></title> <box_art small="http://alien2.netflix.com/us/boxshots/tiny/60021896.jpg" medium="http://alien2.netflix.com/us/boxshots/small/60021896.jpg" large="http://alien2.netflix.com/us/boxshots/large/60021896.jpg"></box_art> <link href="http://api.netflix.com/catalog/titles/movies/60021896/synopsis" rel="http://schemas.netflix.com/catalog/titles/synopsis" title="synopsis"></link> <release_year>2001</release_year> <category scheme="http://api.netflix.com/catalog/titles/mpaa_ratings" label="NR"></category> <category scheme="http://api.netflix.com/categories/genres" label="Foreign"></category> <link href="http://api.netflix.com/catalog/titles/movies/60021896/cast" rel="http://schemas.netflix.com/catalog/people.cast" title="cast"></link> <link href="http://api.netflix.com/catalog/titles/movies/60021896/screen_formats" rel="http://schemas.netflix.com/catalog/titles/screen_formats" title="screen formats"></link <link href="http://api.netflix.com/catalog/titles/movies/60021896/languages_and_audio" rel="http://schemas.netflix.com/catalog/titles/languages_and_audio" title="languages and audio"></link> <average_rating>1.9</average_rating> <link href="http://api.netflix.com/catalog/titles/movies/60021896/similars" rel="http://schemas.netflix.com/catalog/titles.similars" title="similars"></link> <link href="http://www.netflix.com/Movie/Star/60021896" rel="alternate" title="webpage"></link> </catalog_title> <catalog_title> <id>http://api.netflix.com/catalog/titles/movies/17985448</id><title short="Lone Star" regular="Lone Star"></title> <box_art small="http://alien2.netflix.com/us/boxshots/tiny/17985448.jpg" medium="http://alien2.netflix.com/us/boxshots/small/17985448.jpg" large=""></box_art> <link href="http://api.netflix.com/catalog/titles/movies/17985448/synopsis" rel="http://schemas.netflix.com/catalog/titles/synopsis" title="synopsis"></link> <release_year>1996</release_year> <category scheme="http://api.netflix.com/catalog/titles/mpaa_ratings" label="R"></category> <category scheme="http://api.netflix.com/categories/genres" label="Drama"></category> <link href="http://api.netflix.com/catalog/titles/movies/17985448/awards" rel="http://schemas.netflix.com/catalog/titles/awards" title="awards"></link> <link href="http://api.netflix.com/catalog/titles/movies/17985448/format_availability" rel="http://schemas.netflix.com/catalog/titles/format_availability" title="formats"></link> <link href="http://api.netflix.com/catalog/titles/movies/17985448/screen_formats" rel="http://schemas.netflix.com/catalog/titles/screen_formats" title="screen formats"></link> <link href="http://api.netflix.com/catalog/titles/movies/17985448/languages_and_audio" rel="http://schemas.netflix.com/catalog/titles/languages_and_audio" title="languages and audio"></link> <average_rating>3.7</average_rating> <link href="http://api.netflix.com/catalog/titles/movies/17985448/previews" rel="http://schemas.netflix.com/catalog/titles/previews" title="previews"></link> <link href="http://api.netflix.com/catalog/titles/movies/17985448/similars" rel="http://schemas.netflix.com/catalog/titles.similars" title="similars"></link> <link href="http://www.netflix.com/Movie/Lone_Star/17985448" rel="alternate" title="webpage"></link> </catalog_title> </catalog_titles>
  • 75. {"catalog_title": {"id":"http://api.netflix.com/catalog/titles/movies/60034967", "title":{"title_short":"Rosencrantz and Guildenstern Are Dead", "regular":"Rosencrantz and Guildenstern Are Dead"}, "maturity_level":60, "release_year":"1990", "average_rating":3.7, "box_art":{"284pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/ghd/60034967.jpg", "110pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/large/60034967.jpg", "38pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/tiny/60034967.jpg", "64pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/small/60034967.jpg", "150pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/150/60034967.jpg", "88pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/88/60034967.jpg", "124pix_w":"http://cdn-7.nflximg.com/en_US/boxshots/124/60034967.jpg"}, "language":"en", "web_page":"http://www.netflix.com/Movie/Rosencrantz_and_Guildenstern_Are_Dead/60034967", "tiny_url":"http://movi.es/ApUP9"}, "meta":{ "expand":["@directors","@bonus_materials","@cast","@awards","@short_synopsis","@synopsis","@box_art","@screen_formats"," @"links":{"id":"http://api.netflix.com/catalog/titles/movies/60034967", "languages_and_audio":"http://api.netflix.com/catalog/titles/movies/60034967/languages_and_audio", "title":"http://api.netflix.com/catalog/titles/movies/60034967/title", "screen_formats":"http://api.netflix.com/catalog/titles/movies/60034967/screen_formats", "cast":"http://api.netflix.com/catalog/titles/movies/60034967/cast", "awards":"http://api.netflix.com/catalog/titles/movies/60034967/awards", "short_synopsis":"http://api.netflix.com/catalog/titles/movies/60034967/short_synopsis", "box_art":"http://api.netflix.com/catalog/titles/movies/60034967/box_art", "synopsis":"http://api.netflix.com/catalog/titles/movies/60034967/synopsis", "directors":"http://api.netflix.com/catalog/titles/movies/60034967/directors", "similars":"http://api.netflix.com/catalog/titles/movies/60034967/similars", "format_availability":"http://api.netflix.com/catalog/titles/movies/60034967/format_availability"} }}
  • 76. Improve Efficiency of API Requests Could it have been 5 billion requests per month? Or less? (Assuming everything else remained the same)
  • 77. Netflix API : Requests Per Month 35 30 25 Request in Billions 20 15 10 5 0
  • 78. Netflix API : Requests Per Month 35 30 25 Request in Billions 20 15 10 5 0
  • 79. API Billionaires Club 13 billion API calls / day (May 2011) Over 260 billion objects stored in S3 (January 2011) 5 billion API calls / day (April 2010) 5 billion API calls / day (October 2009) 1 billion API calls / day (October 2011) 8 billion API calls / month (Q3 2009) 3.2 billion API-delivered stories / month (October 2011) 3 billion API calls / month (March 2009) Courtesy of John Musser, ProgrammableWeb
  • 80. API Billionaires Club 13 billion API calls / day (May 2011) Over 260 billion objects stored in S3 (January 2011) 5 billion API calls / day (April 2010) 5 billion API calls / day (October 2009) 1 billion API calls / day (October 2011) 8 billion API calls / month (Q3 2009) 3.2 billion API-delivered stories / month (October 2011) 3 billion API calls / month (March 2009) Courtesy of John Musser, ProgrammableWeb
  • 81. Two Major Objective for API Redesign • Improve performance for devices – Minimize network traffic (one API call, if possible) – Only deliver bytes that are needed • Improve ability for device teams to rapidly innovate – Customized request/response patterns per device – Deployment is independent from API schedules
  • 82. Current Client / Server Interaction CLIENT APPS API SERVER AUTH SETUP TIME QUEUE LISTS LIST ( i ) TITLES
  • 83. Future Client / Server Interaction CLIENT APPS API SERVER CUSTOM SCRIPTING TIER GENERIC API
  • 84. Custom Scripting Tier Interaction CLIENT APPS API SERVER AUTH CUSTOM SCRIPTING SETUP TIER TIME PS3 HOME QUEUE SCREEN CUSTOM LISTS ENDPOINT LIST ( i ) Owned and operated by the UI teams TITLES
  • 88. For More Titles From a List
  • 90. Generic API Interaction CLIENT APP API SERVER AUTH GENERIC CONFIG SCRIPT TIME SIMPLE QUEUE CATALOG REQUEST TO API LISTS LIST ( i ) Owned and operated by the API team TITLES
  • 91. Technologies AUTH CONFIG TIME GROOVY SIMPLE CLIENT COMPILED CATALOG JAVAQUEUE REQUEST LANGUAGE INTO TO API LOLOMO LISTS JVM LIST ( i ) TITLES
  • 92. API Production Servers Cassandra Cluster UI Engineers
  • 93. Active 1 2 3 4 5 SCRIPT REPOSITORY 6 (DYNAMIC DEPLOYMENT) API SERVER APPLICATION CODE (REQUIRES API CODE PUSHES)
  • 94. Active 1 2 3 4 5 SCRIPT REPOSITORY 6 (DYNAMIC 7 DEPLOYMENT) API SERVER APPLICATION CODE (REQUIRES API CODE PUSHES)
  • 95. 1 2 3 4 Active 5 SCRIPT REPOSITORY 6 (DYNAMIC 7 DEPLOYMENT) API SERVER APPLICATION CODE (REQUIRES API CODE PUSHES)
  • 98. If you are interested in helping us solve these problems, please contact me at: Daniel Jacobson djacobson@netflix.com @daniel_jacobson http://www.linkedin.com/in/danieljacobson http://www.slideshare.net/danieljacobson

Editor's Notes

  • #5: When the Netflix API launched three years ago, it was to “let 1,000 flowers bloom”. Today, that API still exists with almost 23,000 flowers.
  • #6: At that time, it was exclusively a public API.
  • #7: Some of the apps developed by the 1,000 flowers.
  • #8: Then streaming started taking off for Netflix, first with computer-based streaming… At that time, it was still experimental and did not draw from the API.
  • #9: But over time, as we added more devices, they started drawing their metadata from the API. Today, almost all of our devices are powered by the API.
  • #10: As a result, today’s consumption is almost entirely from private APIs that service the devices. The Netflix devices account for 99.7% of the API traffic while the public API represents only about .3%.
  • #11: The Netflix API represents the iceberg model for APIs. That is, public APIs represent a relatively small percentage of the value for the company, but they are typically the most visible part of the API program. They equate to the small part of the iceberg that is above water, in open sight. Conversely, the private APIs that drive web sites, mobile phones, device implementations, etc. account for the vast majority of the value for many companies, although people outside of the company often are not aware of them. These APIs equate to the large, hard to see mass of ice underwater. In the API space, most companies get attracted to the tip of the iceberg because that is what they are aware of. As a result, many companies seek to pursue a public API program. Over time, however, after more inspection into the value propositions of APIs, it becomes clear to many that the greatest value is in the private APIs.
  • #12: As a result, the current emphasis for the Netflix API is on the majority case… supporting the Netflix
  • #14: There are basically two types of interactions between Netflix customers and our streaming application… Discovery and Streaming.
  • #15: Discovery is basically any event with a title other than streaming it. That includes browsing titles, looking for something watch, etc.
  • #16: It also includes actions such as rating the title, adding it to your instant queue, etc.
  • #17: Once the customer has identified a title to watch through the Discovery experience, the user can then play that title. Once the Play button is selected, the customer is sent to a different internal service that focuses on handling the streaming. That streaming service also interacts with our CDNs to actually deliver the streaming bits to the device for playback.
  • #18: The API powers the Discovery experience. The rest of these slides will only focus on Discovery, not Streaming.
  • #19: As Discovery events grow, so does the growth of the Netflix API. Discovery continues to grow for a variety of reasons, including more devices, more customers, richer UI experiences, etc.
  • #20: As API traffic grows, so do the infrastructural needs. The more requests, the more servers we need, the more time spent supporting those servers, the higher the costs associated with this support, etc.
  • #21: And our international expansion will only add complexity and more scaling issues.
  • #22: The traditional model is to have systems administrators go into server rooms like this one to build out new servers, etc.
  • #23: Rather than relying on data centers, we have moved everything to the cloud! Enables rapid scaling with relative ease. Adding new servers, in new locations, take minutes. And this is critical when the service needs to grow from 1B requests a month to 1B requests a day in a year.
  • #24: Instead of going into server rooms, we go into a web page like this one. Within minutes, we can spin up new servers to support growing demands.
  • #25: Throughautoscaling in the cloud, we can also dynamically grow our server farm in concert with the traffic that we receive.
  • #26: So, instead of buying new servers based on projected spikes in traffic and having systems administrators add them to the farm, the cloud can dynamically and automatically add and remove servers based on need.
  • #27: And as we continue to expand internationally, we can easily scale up in new regions, closer to the customer base that we are trying to serve, as long as Amazon has a location near there.
  • #29: As a general practice, Netflix focuses on getting code into production as quickly as possible to expose features to new audiences.
  • #30: That said, we do spend a lot of time testing. We have just adopted some new techniques to help us learn more about what the new code will look like in production.
  • #31: Prior to these new changes, our flow looked something like this…
  • #32: That flow has changed with the addition of new techniques, such as canary deployments and what we call red/black deployments.
  • #33: The canary deployments are comparable to canaries in coal mines. We have many servers in production running the current codebase. We will then introduce a single (or perhaps a few) new servers into production running new code. Monitoring the canary servers will show what the new code will look like in production.
  • #34: If the canary shows errors, we pull it/them down, re-evaluate the new code, debug it, etc. We will then repeat the process until the analysis of canary servers look good.
  • #35: If the new code looks good in the canary, we can then use a technique that we call Red/Black Deployments to launch the code. Start with Red, where production code is running. Fire up a new set of servers (Black) equal to the count in Red with the new code.
  • #36: Then switch the pointer to have external requests draw from the Black servers.
  • #37: If a problem is encountered from the Black servers, it is easy to rollback quickly by switching the pointer back to Red. We will then re-evaluate the new code, debug it, etc.
  • #38: Once we have debugged the code, we will put another canary up to evaluate the new changes in production.
  • #39: If the new code looks good in the canary, we can then bring up another set of servers with the new code.
  • #40: Then we will switch production traffic to the new code.
  • #41: Then switch the pointer to have external requests draw from the Black servers. If everything still looks good, we disable the Red servers and the new code becomes the new red servers.
  • #42: So, the development and testing flow now looks more like this…
  • #44: At Netflix, we have a range of engineering teams who focus on specific problem sets. Some teams focus on creating rich presentation layers on various devices. Others focus on metadata and algorithms. For the streaming application to work, the metadata from the services needs to make it to the devices. That is where the API comes in. The API essentially acts as a broken, moving the metadata from inside the Netflix system to the devices.
  • #45: Given the position of the API within the overall system, the API depends on a large number of underlying systems (only some of which are represented here). Moreover, a large number of devices depend on the API (only some of which are represented here). Sometimes, one of these underlying systems experiences an outage.
  • #46: In the past, such an outage could result in an outage in the API.
  • #47: And if that outage cascades to the API, it is likely to have some kind of substantive impact on the devices. The challenge for the API team is to be resilient against dependency outages, to ultimately insulate Netflix customers from low level system problems.
  • #48: To achieve this, we implemented a series of circuit breakers for each library that we depend on. Each circuit breaker controls the interaction between the API and that dependency. This image is a view of the dependency monitor that allows us to view the health and activity of each dependency. This dashboard is designed to give a real-time view of what is happening with these dependencies (over the last two minutes). We have other dashboards that provide insight into longer-term trends, day-over-day views, etc.
  • #49: This is a view of asingle circuit.
  • #50: This circle represents the call volume and health of the dependency over the last 10 seconds. This circle is meant to be a visual indicator for health. The circle is green for healthy, yellow for borderline, and red for unhealthy. Moreover, the size of the circle represents the call volumes, where bigger circles mean more traffic.
  • #51: The blue line represents the traffic trends over the last two minutes for this dependency.
  • #52: The green number shows the number of successful calls to this dependency over the last two minutes.
  • #53: The yellow number shows the number of latent calls into the dependency. These calls ultimately return successful responses, but slower than expected.
  • #54: The blue number shows the number of calls that were handled by the short-circuited fallback mechanisms. That is, if the circuit gets tripped, the blue number will start to go up.
  • #55: The orange number shows the number of calls that have timed out, resulting in fallback responses.
  • #56: The purple number shows the number of calls that fail due to queuing issues, resulting in fallback responses.
  • #57: The red number shows the number of exceptions, resulting in fallback responses.
  • #58: The error rate is calculated from the total number of error and fallback responses divided by the total number calls handled.
  • #59: If the error rate exceeds a certain number, the circuit to the fallback scenario is automatically opened. When it returns below that threshold, the circuit is closed again.
  • #60: The dashboard also shows host and cluster information for the dependency.
  • #61: As well as information about our SLAs.
  • #62: So, going back to the engineering diagram…
  • #63: If that same service fails today…
  • #64: We simply disconnect from that service.
  • #65: And replace it with an appropriate fallback.
  • #66: Keeping our customers happy, even if the experience may be slightly degraded. It is important to note that different dependency libraries have different fallback scenarios. And some are more resilient than others. But the overall sentiment here is accurate at a high level.
  • #68: As discussed earlier, the API was originally built for the 1,000 flowers. Accordingly, today’s API design is very much grounded in the same principles for that same audience.
  • #69: But the audience of the API today is dramatically different.
  • #70: With the emphasis of the API program being on the large mass underwater – the private API.
  • #71: As a result, the current API is no longer the right tool for the job. We need a new API, designed for the present and the future. The following slides talk more about the redesign of the Netflix API to better meet the needs of the key audiences.
  • #72: We already talked about the tremendous growth in API requests…
  • #73: Metrics like 30B requests per month sound great, don’t they? The reality is that this number is concerning…
  • #74: For web sites, like NPR, where page views create ad impressions and ad impressions generate revenue, 30B requests would be amazing.
  • #75: But for systems that yield output that looks like this...
  • #76: Or this… Ad impressions are not part of the game. As a result, the increase in requests don’t translate into more revenue. In fact, they translate into more expenses. That is, to handle more requests requires more servers, more systems-admins, a potentially different application architecture, etc.
  • #77: We are challenging ourselves to redesign the API to see if those same 30B requests could have been 5 billion or perhaps even less. Through more targeted API designs based on what we have learned through our metrics, we will be able to reduce our API traffic as Netflix’ overall traffic grows.
  • #78: Given the same growth charts in the API, it would be great to imagine the traffic patterns being the blue bars instead of the red ones (assuming the customer usage and user experiences remain the same).
  • #79: Similarly, with lower traffic levels for the same user experience, server and administration complexity and costs go down as well.
  • #80: To state the goal another way, John Musser maintains a list of the API Billionaires. Netflix has a pretty lofty position in that club.
  • #81: We aspire to no longer be in that exclusive company. That is one of the things the redesign strives for.
  • #83: Today, the devices call back to the API in a mostly synchronous way to retrieve granular metadata needed to start up the client UI. That model requires a large number of network transactions which are the most expensive part of the overall interaction.
  • #84: We want to break the interaction model into two types of interactions. Custom calls and generic calls.
  • #85: For highly complex or critical interfaces, we want the device to make a single call to the API to a custom endpoint. Behind that endpoint will be a script that the UI teams maintain. That script is the traffic cop for gathering and formatting the metadata needed for that UI. The script will call to backend services to get metadata, but in this model, much of this will be concurrent and progressively rendered to the devices.
  • #86: One way to think of it is to imagine a full grid of movies/TV shows as part of a Netflix UI. The red box represents the viewable area of the grid when the screen loads. In today’s REST-ful resource model, the device needs to make calls for the individual lists, then populate the individual titles with distinct calls for each title. Granted, in today’s model, the are some asynchronous calls and some of them are also performed in bulk. But this still demonstrates the chatty nature of this REST-ful API design.
  • #87: Alternatively, we believe that the custom script could easily return the desiredviewables for each list in one payload much more efficiently.
  • #88: Moreover, this model can return any payload, filling out any portions of the grid structure, in that single response. Now, we are not likely going to want to populate a grid in this way, but it is possible given the highly customizable nature of this model.
  • #89: But once we populate the complex start-up screens through the custom scripting tier, the interactions become much more predictable and device-agnostic. If you want to extend the movies in a given row, you don’t need a custom script. That is why we are exposing the Generic API as well.
  • #90: To populate the grid with more titles for more rows, it is a simple call to get more titles.
  • #91: The call pattern looks like this for the Generic API. Notice, there is no need for some of the session-start requests when using the Generic API.
  • #92: For this model, the technology stack is pretty simple. The client apps have their own languages designed for that particular device. The overall API server codebase is Java. And the custom scripts will be written in Groovy and compiled into the same JVM as the backend API code. This should help with overall performance and library sharing for more complex scripting operations.
  • #93: To publish new scripts to the system, UI engineers will publish the script to Perforce for code management. Then it will be pushed up to a Cassandra cluster in AWS, which acts as a script repository and management system. Every 30 seconds or so, a job will scan the Cassandra cluster looking for new scripts. For all new scripts, they will be pushed to the full farm of API servers to be compiled into the JVM.
  • #94: From the device perspective, there could be many scripts for a given endpoint, only one of which is active at a given time. In this case, the iPad is running off of script #2.
  • #95: New scripts can be dynamically added at any time by any team (most often by the UI engineers). The new script (script 7) will arrive in an inactive state. At that time, the script can be tested from a production server before running the device off of it.
  • #96: When the script looks good and is ready to go live, a configuration change is made and script 7 becomes active immediately across the full server farm.
  • #97: All of these changes in our redesign effort are designed to help the apps and the UI engineers run faster.
  • #98: And achieving those goals will help us keep our customers happy.