Large Scale Modeling Overview
Ferris Jumah
Prediction Analytics Innovation Summit 2013
November 15th, 2013
  
Large Scale Modeling
•  What does large scale modeling mean to you?
“Building models that consume and process data sets so large that it is difficult to use current modeling tools and methods”
  
LinkedIn News
•  Anytime a user lands on their homepage, a few items from our news product are recommended to them
•  This is powered by a large-scale recommendation engine
•  For every user, at LinkedIn scale
  
The World’s Largest Professional Network: 259,000,000+
•  3M+ Company Pages
•  2 new members per second
•  184M+ monthly unique visitors
•  2.5B+ monthly page views
  
Use It All
•  Use all of the data you have
•  Why not store, process, and model all of it?
•  “The accuracy & nature of answers you get on large data sets can be completely different from what you see on small samples”
•  Not using it is losing competitive edge
  
Classic Justification
Norvig, The Unreasonable Effectiveness of Data, 2013
  
More Data Beats Better Algorithms
Banko and Brill, 2001
•  As data set size increases, your specific model and its tuning matter a lot less
•  You can worry less about sample size, biases, and generalizing
•  Spend your time on
    •  Exploratory Analysis
    •  Feature Engineering
	
  
Exploratory Analysis
•  With large amounts of data, insights and hypotheses present themselves
•  Group By And Count
•  With large amounts of data, you can worry less about the distribution being reflective of the population
•  Summary Statistics
•  Simple Correlations
•  Constantly Visualize
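
The "group by and count" and "summary statistics" steps above can be sketched in a few lines. This is only an illustration using the Python standard library; the records and field names (`industry`, `connections`) are hypothetical and are not LinkedIn data or LinkedIn's tooling.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records standing in for member profiles (not real data).
members = [
    {"industry": "Software", "connections": 420},
    {"industry": "Software", "connections": 180},
    {"industry": "Finance", "connections": 95},
    {"industry": "Finance", "connections": 310},
    {"industry": "Software", "connections": 75},
]

# Group by and count: how many members fall into each industry.
counts = Counter(m["industry"] for m in members)

# Summary statistic per group: mean connection count by industry.
by_industry = {}
for m in members:
    by_industry.setdefault(m["industry"], []).append(m["connections"])
mean_connections = {name: mean(values) for name, values in by_industry.items()}

print(counts.most_common())
print(mean_connections)
```

At real scale the same two operations would typically run as a distributed job rather than in memory, but the shape of the question (group, count, summarize) stays the same.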
  
	
  
	
  
	
  
Exploratory Analysis Across LinkedIn Members
•  Grouped by name letter length and title and counted
•  Noticed that name length is heavily correlated with industry
•  Able to start bootstrapping models
•  Quickly validate or invalidate a model hypothesis
•  Generalized the results into development of the title standardization models used today
  
Go Deep
•  Massive datasets lend themselves well to very granular demographic slicing or bucketing
•  Get a very strong sense for customer segments
•  Reduce the size of your data without losing too much information
•  Notice very specific trends that you can be confident are real
•  Personalize deeply

Say LinkedIn wants to sell me something…
  
Keep Going
•  When operating with massive data sets, combine several of them
•  The combination tells you more than each set would individually
  
Pitfalls Still Apply
Simpson’s paradox
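
A small worked example may help show why Simpson’s paradox matters when aggregating across segments. The counts below are illustrative only (not LinkedIn data), and the variant and segment names are hypothetical: variant A has the higher success rate inside each segment, yet variant B looks better once the segments are pooled, because the two variants were applied to very different mixes of segments.

```python
# Illustrative counts only: (successes, trials) per variant and segment.
results = {
    "A": {"segment_1": (81, 87), "segment_2": (192, 263)},
    "B": {"segment_1": (234, 270), "segment_2": (55, 80)},
}


def rate(successes, trials):
    """Success rate for one cell."""
    return successes / trials


def overall(variant):
    """Success rate after pooling all segments of one variant."""
    successes = sum(cell[0] for cell in results[variant].values())
    trials = sum(cell[1] for cell in results[variant].values())
    return rate(successes, trials)


for segment in ("segment_1", "segment_2"):
    a = rate(*results["A"][segment])
    b = rate(*results["B"][segment])
    print(f"{segment}: A={a:.3f} B={b:.3f}")

print(f"overall: A={overall('A'):.3f} B={overall('B'):.3f}")
```

Having more data does not remove this effect; checking a result within segments as well as in aggregate is one way to catch it.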
  
Large Datasets Allow More Creativity with Features

Mapping LinkedIn Skills: +1 to Edge Weight When Listed Concurrently
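
The "+1 to edge weight when listed concurrently" rule can be sketched as a co-occurrence count over profiles. This is a minimal sketch under assumptions: the profiles are hypothetical, `build_skill_graph` is an illustrative name, and the slide does not say how the actual skills map was built.

```python
from collections import Counter
from itertools import combinations

# Hypothetical profiles: each entry is the set of skills one member lists.
profiles = [
    {"python", "machine learning", "hadoop"},
    {"python", "hadoop"},
    {"machine learning", "statistics"},
]


def build_skill_graph(profiles):
    """Return undirected edge weights keyed by (skill_a, skill_b)."""
    edge_weight = Counter()
    for skills in profiles:
        # sorted() gives every unordered pair a single canonical key.
        for a, b in combinations(sorted(skills), 2):
            edge_weight[(a, b)] += 1  # +1 each time two skills are listed together
    return edge_weight


graph = build_skill_graph(profiles)
for (a, b), weight in graph.most_common():
    print(f"{a} -- {b}: {weight}")
```

Heavier edges then suggest related skills, which is the kind of feature that only becomes reliable once many profiles contribute counts.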
  
Feature Engineering

Can Your Infrastructure Hang?
First question…
  
Online or Offline?
If the problem domain can be scoped into an offline system, it usually should be

Online Is Appropriate When
•  Data is best modeled in transient data streams rather than persistent relations
•  Data relevance or freshness fades fast
•  There is too much data to store (infra, latency, etc.) and it must be tossed
•  Examples: News, Advertising, Gaming (A.I.), Stock Markets
  
Online or Offline?
Benefits of Online
•  Instant Gratification
    –  Immediate integration of data into modeling outcomes
    –  Yahoo invented S4 to process user feedback in real time to optimize search advertising ranking algorithms
•  Mine more
    –  In some systems it’s only possible to use all of your data in an online setting because there is simply too much
•  Highly relevant now (matters for news)
•  Personalized + Real time = Great User Experience
  
Online or Offline?
Challenges of Online
•  YOLO (You Only Learn Once)
•  Requires specific expertise
•  Evaluate/Interpret is Harder
    –  YOLO makes it difficult to evaluate why a model is performing poorly and, inherently related, why a result is what it is
•  Difficult to maintain
    –  Data changing, adapting to new features, latency, evaluation
•  Infrastructure that can support it. Supporting real-time learning is a whole different ballgame
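
To make "You Only Learn Once" concrete, here is a minimal sketch of online learning: a logistic regression updated by one stochastic gradient step per event, where each event is scored, learned from, and then discarded. The class, the synthetic stream, and the learning rate are all assumptions for illustration; this is not S4 or any LinkedIn system.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression trained one example at a time (single pass)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # Gradient of the log loss for a single example.
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


def event_stream():
    """Hypothetical stream: one binary feature that fully predicts the label."""
    for i in range(200):
        signal = float(i % 2)
        yield [signal], int(signal)


model = OnlineLogisticRegression(n_features=1)
for x, y in event_stream():
    score = model.predict_proba(x)  # serve a prediction first
    model.learn_one(x, y)           # then update; the event is never stored

print(model.predict_proba([1.0]), model.predict_proba([0.0]))
```

Because past events are gone, a poorly performing model cannot simply be retrained or inspected against its training set, which is the evaluation difficulty the list above describes.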
  
Big Data Tech is Young

Google Trends: Hadoop & NoSQL

LinkedIn Open Source Data Tech

Developing Bleeding Edge Tech is Great
…What About Using It?

It can be a pain to use… as a user
  
High-level infrastructure needs
•  A/B testing platform
•  Data/schema viewer
•  Workflow manager
•  Access
•  Modeling algorithms implementation
  
Is the system set up to iterate and test new models as fast as possible?
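
One small piece of such a setup is deciding which model variant each member sees. The sketch below shows deterministic hash-based bucketing, a common approach; the function name, experiment key, and variant labels are hypothetical, and the slides do not describe how LinkedIn's A/B platform assigns traffic.

```python
import hashlib
from collections import Counter


def assign_variant(member_id, experiment, variants=("control", "treatment")):
    """Map a member to a variant; the same inputs always give the same answer."""
    key = f"{experiment}:{member_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical experiment name and member ids.
split = Counter(assign_variant(f"member-{i}", "news-ranker-v2") for i in range(1000))
print(split)
```

Including the experiment name in the hash keeps assignments independent across experiments, so a new model test can start without coordinating with earlier ones.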
  	
  
High-level LinkedIn Data Flow

Evaluating Models
CROWDSOURCE!!!
Is this real?
Are we using feedback?
  
Summary
•  Large-scale modeling
    •  Isn’t easy but takes advantage of the large amounts of data we are storing
    •  Sees noticeable increases in solution quality
•  More data beats better algorithms
    •  Spend more time on exploratory analysis and feature engineering
•  Benefits from large scale data
    •  Build infrastructure that lets you iterate and A/B test as fast as possible
  
	
  
rumah@linkedin.com	
  
	
  

