Hadoop Summit 2011
Online Content Optimization using Hadoop
Shail Aditya
shailg@yahoo-inc.com
What do we do?
“Deliver the right CONTENT to the right USER at the right TIME”
Learn effectively and “pro-actively” from user interactions with displayed content, in order to maximize our objectives
A new scientific discipline at the interface of
Large scale Machine Learning and Statistics
Multi-objective optimization in the presence of uncertainty
User understanding
Content understanding

Content Relevance at Yahoo!
[Diagram labels: Editorial; 10s of Items; Important; Editors; Popular; Personal / Social; Science; Millions of Items]

Content Ranking Problems
  • Most Popular: Most engaging overall based on objective metrics
  • Most Popular + Per User History: Rotate stories I’ve already seen
  • Light Personalization: More relevant to me based on my age, gender, location, and property usage
  • Deep Personalization: Most relevant to me based on my deep interests (entities, sources, categories, keywords)
  • Related Items and Context-Sensitive Models
      Behavioral Affinity: People who did X, did Y
      Most engaging in this page/section/property/device/referral context?
  • Layout Optimization: Which modules/ad units should be shown to this user in this context?
  • Revenue Optimization
  • Voice and Business Rules
  • Real-time Dashboard
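
As a rough illustration of the first two ranking modes listed above, the sketch below orders a pool by click-through rate and then rotates out stories a user has already seen. The names (`Story`, `rank_most_popular`, `rank_with_history`) and the choice of CTR as the "objective metric" are assumptions made for illustration; the slides do not specify either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    story_id: str
    views: int
    clicks: int

    @property
    def ctr(self) -> float:
        # Click-through rate; 0.0 when the story has not been shown yet.
        return self.clicks / self.views if self.views else 0.0


def rank_most_popular(pool):
    """Most Popular: order the whole pool by CTR, highest first."""
    return sorted(pool, key=lambda s: s.ctr, reverse=True)


def rank_with_history(pool, seen_ids):
    """Most Popular + Per User History: unseen stories first, seen ones moved to the back."""
    ranked = rank_most_popular(pool)
    unseen = [s for s in ranked if s.story_id not in seen_ids]
    seen = [s for s in ranked if s.story_id in seen_ids]
    return unseen + seen


# Hypothetical pool; the counts are made up for the example.
pool = [
    Story("a", views=1000, clicks=50),  # CTR 0.05
    Story("b", views=1000, clicks=80),  # CTR 0.08
    Story("c", views=500, clicks=10),   # CTR 0.02
]
popular_order = [s.story_id for s in rank_most_popular(pool)]
personal_order = [s.story_id for s in rank_with_history(pool, seen_ids={"b"})]
```

Here a user who has already seen story "b" still gets the popularity order for everything else, with "b" pushed to the end rather than dropped, which is one plausible reading of "rotate".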

Yahoo Frontpage
  • Trending Now (Most Popular)
  • Today Module (Light Personalization)
  • Personal Assistant (Light Personalization)
  • National News (Most Popular + User History bucket)
  • Deals (Most Popular)

Recommendation: A Match-making Problem
Recommendation problems
  • Search: Web, Vertical
  • Online advertising
  • …
Item Inventory: Articles, web pages, ads, …
Opportunity: Users, queries, pages, …
Use an automated algorithm to select item(s) to show
Get feedback (click, time spent, ...)
Refine the models
Repeat (large number of times)
Measure metric(s) of interest (Total clicks, Total revenue, …)
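
The select / feedback / refine / repeat loop described above can be sketched as a small simulation. This is only an illustration: epsilon-greedy selection is a stand-in chosen here, not a method the slides name, and the item names, the "true" click rates, and the `run_loop` function are all hypothetical.

```python
import random


def run_loop(true_ctr, rounds=20000, explore_rate=0.01, seed=0):
    """Simulate select -> feedback -> refine -> repeat over a fixed item inventory."""
    rng = random.Random(seed)
    items = list(true_ctr)
    views = {item: 0 for item in items}
    clicks = {item: 0 for item in items}

    def estimated_ctr(item):
        # Unseen items get an optimistic estimate so each is tried at least once.
        return clicks[item] / views[item] if views[item] else 1.0

    total_clicks = 0
    for _ in range(rounds):
        # Select: mostly show the current best estimate, occasionally a random item.
        if rng.random() < explore_rate:
            shown = rng.choice(items)
        else:
            shown = max(items, key=estimated_ctr)
        # Feedback: a simulated click drawn from the (hypothetical) true CTR.
        clicked = rng.random() < true_ctr[shown]
        # Refine: update the running counts behind the estimate.
        views[shown] += 1
        clicks[shown] += int(clicked)
        total_clicks += int(clicked)
    # Measure: total clicks is the metric of interest in this sketch.
    return total_clicks, views


total_clicks, views = run_loop({"article": 0.02, "video": 0.05, "ad": 0.01})
```

In a production setting the "refine" step would be a model update over many features rather than two counters per item; the counters are used only to keep the loop visible.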

Problem Characteristics: Today module
Traffic obtained from a controlled randomized experiment
Things to note: a) short lifetimes, b) temporal effects, c) often breaking news stories

Scale: Why use Hadoop?
  • Million events per second (user view/click, content update)
  • Hundreds of GB of data collected and modeled per run
  • Millions of items in pool
  • Millions of user profiles
  • Tens of thousands of features (content and/or user)

Data Flow
[Diagram labels: Optimization Engine; Content feed with biz rules; Rules Engine; Content Metadata; Exploit ~99%; Explore ~1%; Near Real-time Feedback; Real-time Insights; Dashboard; Optimized Module]
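
One way to picture the ~99% exploit / ~1% explore split in the data flow is a deterministic bucket assignment. The sketch below is an assumption-laden illustration: the slides do not say whether the split is made per user or per request, nor how it is computed, so the per-user hashing, the `assign_bucket` function, and the `user-N` ids are all hypothetical.

```python
import hashlib


def assign_bucket(user_id, explore_percent=1):
    """Deterministically map a user to 'explore' or 'exploit' by hashing the id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    slot = int(digest, 16) % 100  # stable slot in 0..99
    return "explore" if slot < explore_percent else "exploit"


# Check the split over a batch of made-up user ids.
counts = {"explore": 0, "exploit": 0}
for i in range(100_000):
    counts[assign_bucket(f"user-{i}")] += 1
explore_share = counts["explore"] / 100_000
```

Hashing keeps a given user in the same bucket across visits, which makes the explore traffic usable as a randomized sample; a per-request random draw would be an equally reasonable alternative.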

Apache Hadoop India Summit 2011 talk "Online Content Optimization using Hadoop" by Shail Aditya

Editor's Notes

  • #2: This is the Title slide. Please use the name of the presentation that was used in the abstract submission.
  • #9, #12 to #18: This is the agenda slide. There is only one of these in the deck. NOTES: What does “X stories to run” mean? Can we be more clear on that? Also, this should be more of a punch line of what we do. This slide to me is very broad and not clear. Following are the things that I would describe: the problem of matching the best content to the interest of a user; scale: millions of content slices, millions of users.
  • #19: This is the final slide; generally for questions at the end of the talk. Please post your contact information here.