Cohort Analysis
at Scale
BLAKE IRVINE
STRATA SAN JOSE
2019.03.06
World Markets
Cohort Analysis at Scale
Partners help us Grow
Partners are companies that make it easier
for people to sign up and engage with our
service and help us retain members.
BLAKE IRVINE | STRATA SAN JOSE 2018
Let’s take a (virtual) trip!
Trip to Strata San Jose
Multiple Partner Associations
Evaluating Partners
● Are partners helping us acquire members?
● Through which channels?
● How does a partner impact the regional market?
● Do members use our service differently on partner devices?
● How do partners compare to each other?
Cohorts are the collections of members
associated with a partner that are relevant to
the business question.
Cohorts can be complex
● Our trip example showed how one member can be associated with many partners.
● Business teams want to explore the nuance of cohorts...
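To make the idea of a cohort concrete, here is a minimal sketch of grouping members by partner under a chosen business question. The `Association` record, its `channel` values, and the `build_cohorts` helper are hypothetical names for illustration, not the schema used in the talk.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class Association(NamedTuple):
    """One link between a member and a partner (hypothetical schema)."""
    member_id: int
    partner: str
    channel: str  # illustrative values: "signup", "device", "billing"


def build_cohorts(
    associations: list[Association],
    relevant: Callable[[Association], bool],
) -> dict[str, set[int]]:
    """Group member ids by partner, keeping only the associations
    that match the business question expressed by `relevant`."""
    cohorts: dict[str, set[int]] = defaultdict(set)
    for assoc in associations:
        if relevant(assoc):
            cohorts[assoc.partner].add(assoc.member_id)
    return dict(cohorts)


# One member (1213) is associated with three partners at once.
associations = [
    Association(1213, "Amazon", "device"),
    Association(1213, "Google", "device"),
    Association(1213, "Apple", "billing"),
    Association(7623, "Google", "device"),
]

# Business question: which members play on each partner's devices?
device_cohorts = build_cohorts(associations, lambda a: a.channel == "device")
```

Changing the predicate changes the cohort, which is why the same member can land in several cohorts depending on the question being asked.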
… for example
Evaluating Cohorts
● 100+ million members
● Dozens of partners
● Many combinations of members and partners
● Leads to…
○ High dimensionality, high cardinality datasets
○ Very large datasets of member-level time-series activity
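A rough back-of-the-envelope sketch shows why the combinations grow so quickly. The partner count of 24 and the per-dimension cardinalities below are assumed figures chosen for illustration; the talk only says "dozens of partners".

```python
import math


def partner_combinations(num_partners: int) -> int:
    """Number of non-empty sets of partners one member could be tied to."""
    return 2 ** num_partners - 1


def dimension_cells(cardinalities: dict[str, int]) -> int:
    """Number of distinct slices if every dimension value can co-occur."""
    return math.prod(cardinalities.values())


# Assumed figure: 24 partners.
combos = partner_combinations(24)

# Assumed cardinalities for a handful of slice-and-dice dimensions.
cells = dimension_cells(
    {"partner": 24, "device_category": 10, "country": 190, "channel": 3}
)
```

With these assumptions `combos` is 16,777,215 and `cells` is 136,800, before multiplying by the length of each member's time series.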
Cohort Analysis at Scale
● Data platform
● Data construction
● Data product for cohort analysis
Data Platform
Simplified Overview
Big Data Portal
Data construction: Data Model
member
device
partner
playback events
isp
billing events
billing processor
Data construction: Cohort Dataset
signup_events
cohort
playback_events
billing_events
x_events
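One way to read this slide: signup events define who is in the cohort, and the other event streams are attached to those members. The sketch below assumes that reading. The field names (`signup_date`, `event_date`, `hours`) and the `cohort_playback` helper are hypothetical, and the sample values are invented.

```python
from datetime import date

# Signup events define cohort membership (sample values are invented).
signup_events = [
    {"member_id": 1213, "partner": "Amazon", "signup_date": date(2017, 4, 1)},
    {"member_id": 7623, "partner": "Google", "signup_date": date(2017, 4, 3)},
]

playback_events = [
    {"member_id": 1213, "event_date": date(2017, 4, 2), "hours": 1.5},
    {"member_id": 1213, "event_date": date(2017, 4, 12), "hours": 2.0},
    {"member_id": 7623, "event_date": date(2017, 4, 3), "hours": 0.5},
    {"member_id": 9999, "event_date": date(2017, 4, 3), "hours": 3.0},  # not in cohort
]

# cohort: member_id -> the signup record that placed the member in the cohort
cohort = {event["member_id"]: event for event in signup_events}


def cohort_playback(cohort: dict, playback_events: list) -> list:
    """Attach cohort attributes to each playback event of a cohort member."""
    rows = []
    for event in playback_events:
        member = cohort.get(event["member_id"])
        if member is None:
            continue  # playback by someone outside the cohort
        rows.append({
            "member_id": event["member_id"],
            "partner": member["partner"],
            "days_since_signup": (event["event_date"] - member["signup_date"]).days,
            "hours": event["hours"],
        })
    return rows


rows = cohort_playback(cohort, playback_events)
```

Measuring activity in days since signup lets cohorts that started on different dates be compared on the same axis.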
Data construction: Flat Tables
playback_f cohort_playback_s
device_d
isp_d
geo_d
partner_d
cohort_d
Data for consumption: Flat Table
key | member_id | device_id | device_name | device_category | partner_name | country | region | data_payload
1 | 1213 | 674 | Amazon Fire TV | Set Top Box | Amazon | US | Americas | [{"id":5025945823792539,"sequence":41,"time":1491962092955},{"id":5025947899236389,"sequence":95,"time":1491962104824}]
2 | 7623 | 1172 | Chromecast | Streaming Stick | Google | DE | EMEA | …
3 | 4291 | 129 | PS3 | Game Console | Sony | ES | EMEA | …
4 | 9013 | 447 | iPad 4 | Tablet | Apple | CA | Americas | …
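A flat table like this can be produced by joining a fact row against its dimension tables ahead of time, so the consuming tool never has to join. The sketch below is a minimal illustration under assumptions: the lookup keys (including `partner_id`) and the `flatten` helper are hypothetical, and only the first sample row is reproduced.

```python
# Dimension tables as simple lookups (keys are assumptions for illustration).
device_d = {674: {"device_name": "Amazon Fire TV", "device_category": "Set Top Box"}}
partner_d = {1: {"partner_name": "Amazon"}}
geo_d = {"US": {"region": "Americas"}}

# Fact rows carry ids only; partner_id is a hypothetical foreign key.
playback_f = [
    {"key": 1, "member_id": 1213, "device_id": 674, "partner_id": 1, "country": "US"},
]


def flatten(fact_rows: list) -> list:
    """Denormalize fact rows by copying dimension attributes onto each row."""
    flat = []
    for row in fact_rows:
        flat.append({
            "key": row["key"],
            "member_id": row["member_id"],
            "device_id": row["device_id"],
            **device_d[row["device_id"]],
            **partner_d[row["partner_id"]],
            "country": row["country"],
            **geo_d[row["country"]],
        })
    return flat


flat_rows = flatten(playback_f)
```

The trade-off is storage for speed: attributes are repeated on every row, but each query reads a single wide table.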
Data construction: Copy forward
cohort_playback_s
Big Data Portal
Data Product for Cohort Analysis
● Goals
○ Serve dozens of users
○ Provide interactive / low-latency tool
○ Provide many different perspectives
● Challenges
○ Manage high dimensionality
○ Very large time-series datasets
Analytic Tool Choices
                   | Choice 1           | Choice 2     | Choice 3
Analytic Tool      | Tool 1             | Tool 2       | Tool 3
Data Engine        | MPP                | Cloud        | In memory
Data Size          | 1B rows            | 10B rows     | 100M rows
Performance (SWAG) | Up to many minutes | Many minutes | Several seconds
Choice 4...
● Data stored in Druid
● Custom app built with JavaScript
● An open source data store for analytic applications
● Distributed, column-oriented, indexed architecture
● Well suited to serve our “flat” tables
Druid white paper: http://static.druid.io/docs/druid.pdf
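Druid accepts native queries as JSON documents posted to a broker over HTTP. As a hedged sketch of what a slice-and-dice request against a flat table could look like, the snippet below builds a `groupBy` query body. Treating `cohort_playback_s` as the Druid datasource name, and the choice of dimensions, filter value, and interval, are assumptions for illustration. Nothing is sent over the network.

```python
import json


def build_groupby_query(dimensions: list, interval: str) -> dict:
    """Build a Druid native groupBy query body (names are assumptions)."""
    return {
        "queryType": "groupBy",
        "dataSource": "cohort_playback_s",  # assumed datasource name
        "granularity": "day",
        "dimensions": dimensions,
        "filter": {
            "type": "selector",
            "dimension": "device_category",
            "value": "Set Top Box",
        },
        # "count" counts the stored rows that match, per group and per day.
        "aggregations": [{"type": "count", "name": "rows"}],
        "intervals": [interval],
    }


query = build_groupby_query(["partner_name", "region"], "2017-04-01/2017-05-01")
payload = json.dumps(query)
```

Because every dimension in the flat table is a column on the same datasource, changing the view amounts to changing the `dimensions` and `filter` fields.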
● Built with Express, React, Redux, D3
● Custom UX / UI to manage views and dimensionality
● Enabled access to data served by Druid
● Enabled management of query execution and caching
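The talk does not describe how the app manages query execution and caching, so the following is only a sketch of one common approach: cache results keyed by the canonical form of the query, with a time-to-live. It is written in Python for consistency with the other examples even though the app itself was built in JavaScript. `QueryCache` and `fake_run_query` are hypothetical names.

```python
import json
import time
from typing import Any, Callable


class QueryCache:
    """Caches query results for a fixed time-to-live (hypothetical helper)."""

    def __init__(
        self,
        run_query: Callable[[dict], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, query: dict) -> Any:
        # Sorting keys makes logically equal queries share one cache entry.
        key = json.dumps(query, sort_keys=True)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        result = self._run_query(query)
        self._entries[key] = (now, result)
        return result


calls = []


def fake_run_query(query: dict) -> dict:
    """Stand-in for a real call to the data store; records each execution."""
    calls.append(query)
    return {"query_number": len(calls)}


cache = QueryCache(fake_run_query, ttl_seconds=60.0)
first = cache.get({"dimensions": ["partner_name"], "granularity": "day"})
second = cache.get({"granularity": "day", "dimensions": ["partner_name"]})
```

Here the second request is served from the cache because it differs from the first only in key order, so the backing store is queried once.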
● Video demo with simulated data
● PED DEMO
Challenges
● Dimensions (aka slice-n-dice)
○ More is always better
○ Changes require restatement
● “Typical” use cases must be met also
○ Not a solution for every data question
○ Analysts and other tools are still needed
Challenges
● Data volume always increases…
○ More members, more partners, more devices, more metrics
● Custom app development time is longer, and ongoing
○ But for the right use cases, worthwhile
Partners help Netflix grow.
We measure partner value through cohorts.
Big data tools enable efficient analysis.
Thank you!
Blake Irvine - Growth Data Products
birvine@netflix.com
@blakeirvine
linkedin.com/in/blakeirvine/



Editor's Notes

  • #3: In 2016, Netflix became available globally in almost all countries!