VSCO→CONFIDENTIAL→DONOTDISTRIBUTE
Data Update - 01/27/2016 vsco.co/blevishkin
Data Update - 03/17/17 vsco.co/prazakj
06 APR 2017
RUBEN KOGEL ( VSCO )
RUBEN@VSCO.CO
@CHILICONDATA
Data-based User Segmentation
What is VSCO?
→ tools for expression
→ a community to share, learn, and discover
vsco.co/mikelyon
Who are VSCO users?
→ 41M monthly audience
→ 12B images served monthly
→ 70% of daily audience create
→ 73% under 25
→ 76% female
→ 81% international
• North America (22%)
• Southeast Asia (20%)
• China (16%)
• Europe (14%)
vsco.co/curtsaunders
Data Stack at VSCO as of 3/31
[Diagram; labels shown on the slide:]
→ iOS client, Android client, VSCO Web, App Store, Prod Database, 3rd party (AppAnnie..)
→ Segment, Mixpanel SDK, SQL client
→ Product, Design, Engineering, Finance, Analysts, Content, Investors, Leadership (everyone)
→ periodic delete
New Data Stack at VSCO (in-progress)
[Diagram; labels shown on the slide:]
→ iOS client, Android client, VSCO Web, App Store, Prod Database, 3rd party (AppAnnie..)
→ Cantor, Kafka, Mixpanel SDK, SQL client (Presto/Spark)
→ Events Analytics, Deep Analysis, Data Exploration, Dashboarding
Why segment?
→ marketing: who should we target?
→ design: what usage do we design for?
→ product / officers: how do we grow usage?
vsco.co/evanhundelt
Why segment?
vsco.co/evanhundelt
            method       insights    goal
Marketers   interviews   persona     define target audience
Designers   interviews   intention   design intuitive UI
Analysts    data         behavior    track usage, conversion
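the theory
[Figure: two example cluster plots. One has the axes meat consumption and dairy consumption, with clusters labeled paleo, German diet, French diet, and mediterranean diet. The other has the axes usage frequency and miles driven, with clusters labeled commuters, taxi drivers, weekenders, and greenies.]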
the practice
where do you draw the line??
[Figure: three histograms titled editing usage, sessions, and publishing usage. x-axis: number of actions (0 to 100); y-axis: number of people (in thousands).]
step 1: choose the right inputs
→ k-means finds the dimensions with the most separation and uses that information to form “clusters”
• each additional dimension will change the output - but does it add information?
→ eliminate unnecessary input variables
• use intuition and data exploration
→ segment only on the things that matter:
• age on the platform
• sum of past behavior
• current behavior - what we want to model
→ this is an iterative process: re-do this step after running the clustering algorithm
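One way to check whether an extra input "adds information" is to look for pairs of candidate inputs that are almost perfectly correlated. The sketch below is only an illustration of that idea, not the method used in the talk: the feature names, the sample counts, and the 0.95 threshold are all assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column cannot be correlated with anything
    return cov / math.sqrt(var_x * var_y)

def redundant_pairs(features, threshold=0.95):
    """Return (name_a, name_b, r) for pairs whose |r| reaches the threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical per-user counts; the column names are made up for illustration.
features = {
    "edits": [1, 2, 4, 8, 16, 32],
    "edit_sessions": [2, 4, 8, 16, 32, 64],  # always 2x edits, so redundant
    "publishes": [5, 0, 3, 1, 0, 2],
}
pairs = redundant_pairs(features)
print(pairs)
```

A flagged pair is only a candidate for removal; intuition about what each variable means still decides which one to keep.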
step 2: log transform (almost) everything
[Figure: two histograms, with x-axes running 0 to 100 and 0 to 4.]
→ otherwise your model assumes the gap between people editing 1 and 2 photos counts the same as the gap between people editing 101 and 102 photos
→ log transform so that the gap between small numbers of actions gets blown up and the gap between large numbers gets shrunk
• log(2) - log(1) = 0.69
• log(102) - log(101) = 0.01
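The two gaps above can be reproduced with the natural log from Python's standard library. The helper `log_transform` is a hypothetical name, and its use of `log1p` (log of 1 + count) is an assumption added here so that users with zero actions do not break the transform; the slide itself quotes the plain log.

```python
import math

def log_transform(counts):
    """Natural log of (1 + count), so that zero counts stay defined."""
    return [math.log1p(c) for c in counts]

# The two gaps quoted on the slide, using the plain natural log.
gap_small = math.log(2) - math.log(1)
gap_large = math.log(102) - math.log(101)
print(round(gap_small, 2), round(gap_large, 2))  # 0.69 0.01

# Hypothetical edit counts for five users.
edits = [0, 1, 2, 101, 102]
transformed = log_transform(edits)
print([round(v, 2) for v in transformed])
```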
step 3: choose the number of clusters that make sense
balance:
→ sparseness
→ interpretability
• does it match intuition?
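To make the trade-off concrete, the sketch below runs a small k-means for several values of k and reports two numbers per run: the within-cluster squared distance (inertia) and the size of the smallest cluster, used here as a rough stand-in for sparseness. Everything in it is an assumption for illustration: the sample points, the deterministic seeding, and the choice of these two metrics. Interpretability still has to be judged by a person.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm, seeded with evenly spaced points of the sorted data."""
    ordered = sorted(points)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old center if a cluster ends up empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]

def summarize(points, k):
    """Inertia and smallest cluster size for a given number of clusters."""
    centers, labels = kmeans(points, k)
    inertia = sum(math.dist(p, centers[label]) ** 2
                  for p, label in zip(points, labels))
    sizes = [labels.count(c) for c in range(k)]
    return {"k": k, "inertia": round(inertia, 2), "smallest_cluster": min(sizes)}

# Hypothetical log-scaled (editing, publishing) values for 12 users.
points = [
    (0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.2),
    (2.0, 2.1), (2.1, 2.0), (2.2, 2.1), (2.1, 2.2),
    (4.0, 0.1), (4.1, 0.0), (4.2, 0.1), (4.1, 0.2),
]
for k in (2, 3, 4):
    print(summarize(points, k))
```

In this toy data the inertia drops sharply from k=2 to k=3, while k=4 can only split an existing group into smaller clusters, which is the kind of signal to weigh against whether the extra segment matches intuition.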
step 4: deliver the insights in an intuitive way
cluster        1      2      3      4      5      6
dimension    0.0    0.0    0.0    0.9    2.8    0.5
dimension    0.0    0.0    0.0    0.6    1.9    0.3
dimension    0.0    0.0    0.0    0.5    1.5    0.3
dimension    0.2    0.1    0.1    8.5   18.4    2.5
dimension    0.2    0.1    0.1    3.1    3.9    1.4
dimension    0.3    4.8   27.1    2.1   20.5   22.7
dimension    0.3    2.5    7.6    1.3    7.7    6.9
dimension    0.3    1.9    3.3    1.1    3.4    3.3
dimension    0.2    3.6   21.4    0.3    3.4    7.3
dimension    0.1    0.2    0.1    2.7   13.0   10.5
dimension    0.1    0.1    0.1    1.6    6.5    4.1
dimension    0.1    0.1    0.1    1.3    3.2    2.5
dimension    0.0    0.0    0.0    0.5    6.4    0.1
dimension    0.0    0.0    0.0    0.4    4.2    0.1
dimension    0.0    0.0    0.0    0.4    2.5    0.1
step 5: use programmatic rules to track cohorts
→ what happens if we re-compute the clusters every month?
• the algorithm will find different cohorts with different centers with every new dataset
• a user that was classified as a “super editor” one month might, with the same behavior, be classified as a “casual editor” the next month
→ instead, deduce the boundaries between the different groups from the initial cluster analysis and design programmatic rules to classify users on an on-going basis
• more stable
• easier to explain
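A minimal sketch of such rules, assuming a single input (edits per month) and assuming the boundary between two neighbouring groups is placed halfway between their cluster centers in log space. The center values, the "regular editor" label, and both function names are hypothetical; only "casual editor" and "super editor" come from the slide.

```python
import bisect
import math

def boundaries_from_centers(log_centers):
    """Midpoints between adjacent 1-D centers in log space, as raw counts."""
    ordered = sorted(log_centers)
    return [math.exp((a + b) / 2) for a, b in zip(ordered, ordered[1:])]

def classify(value, boundaries, names):
    """Fixed rule: pick the segment whose range contains the raw count."""
    return names[bisect.bisect_right(boundaries, value)]

# Hypothetical centers from a one-off cluster analysis (1, 9, and 81 edits).
log_centers = [math.log(1), math.log(9), math.log(81)]
boundaries = boundaries_from_centers(log_centers)  # close to [3, 27]
names = ["casual editor", "regular editor", "super editor"]

for edits in (2, 10, 100):
    print(edits, classify(edits, boundaries, names))
```

Because the boundaries are frozen after the initial analysis, a user with the same behavior gets the same label every month, which is the stability the slide asks for.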
step 6: deliver dashboard or on-going classification
[Charts: segmentation over time; source of the “green” segment in each month]
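The two charts could be fed by aggregations along these lines. This is a sketch under assumptions: the month keys, user ids, and the "blue" segment are invented, and "new" is a made-up label for users with no classification in the previous month.

```python
from collections import Counter

def segment_sizes(assignments):
    """Number of users per segment for each month."""
    return {month: Counter(users.values()) for month, users in assignments.items()}

def segment_sources(previous, current, segment):
    """Where this month's members of `segment` were classified last month."""
    return Counter(
        previous.get(user, "new")
        for user, seg in current.items()
        if seg == segment
    )

# Hypothetical monthly classifications: month -> {user id -> segment}.
assignments = {
    "2017-01": {"u1": "green", "u2": "blue", "u3": "blue"},
    "2017-02": {"u1": "green", "u2": "green", "u3": "blue", "u4": "green"},
}
sizes = segment_sizes(assignments)
sources = segment_sources(assignments["2017-01"], assignments["2017-02"], "green")
print(sizes["2017-02"])
print(sources)
```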
Summary
→ marketers, designers, and analysts use different but complementary segmentation approaches
→ data-based segmentation is useful to track usage; it should be based on behavioral data only
→ most usage data is exponential, so you need algorithms to identify cluster boundaries
6 steps to doing a clustering analysis
1. choose the right inputs
2. log transform (almost) everything
3. choose the number of clusters that make sense
4. deliver the insights in an intuitive way
5. use programmatic rules to track cohorts
6. deliver dashboard or on-going classification
vsco.co/sannalinn
Questions?
