Using Simple Machine Learning Models in a New Ads Manager
Ruth Garcia
September 25, 2018
London, UK
Data Science in Skyscanner
Barcelona, Beijing, Edinburgh, Glasgow, London, Singapore
Data Science at Skyscanner
1. Decision Science: ensuring we have the right data, insights and information to make the most impactful, scientific decisions in every aspect of our operations.
2. Building Data Products: leveraging our vast wealth of data to build more contextual, relevant products for Travellers and Travel Suppliers (our Partners).
Online Advertising for Mobile
[Chart: internet advertising revenue by year, Mobile vs. Non-Mobile]
Year   Non-Mobile   Mobile   Total
2008   $5.7         -        $5.7
2009   $5.4         -        $5.4
2010   $6.2         -        $6.2
2011   $7.7         -        $7.7
2012   $8.1         $0.7     $8.8
2013   $8.6         $1.6     $10.3
2014   $8.9         $2.8     $11.7
2015   $9.9         $4.4     $14.3
2016   $8.7         $8.2     $16.9
2017   $9.4         $11.4    $20.8
Overall CAGR: 15.4%
Mobile CAGR: 76.8%
Source: IAB/PwC Internet Ad Revenue Report, HY 2017
CAGR: Compound Annual Growth Rate
Find balance: Increase our revenue and engage users
Revenue Engaged user
Delivery of Ads
External Ads Managers:
• Data ownership?
• Technical difficulties
• Black box algorithms
Solution: Skyscanner Ads Manager
Click prediction algorithm
Goal: Click prediction model
Test: Is it better than random?
Ranking algorithm
Candidates
Cloud services and tools at Skyscanner
Languages
AWS services
Batch
S3
Overview of an Advertising System
Expectation vs. reality
Expectation: flexible, but with the risk of not being fast enough.
Reality: not flexible, but fast and easier to implement.
Challenges: Which model to use?
Model possibilities (easy to read in Node.js):
• Logistic regression
• Random Forest: gets lost
• Neural networks: too slow, hard to put in JSON
Solvers:
• Logistic regression (liblinear, sag): train on all data at once
• SGDClassifier: train on data in batches; efficient, but a lot of hyperparameters
Challenge: get SGDClassifier as accurate as sklearn.LogisticRegression(), using grid search for the hyperparameters
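As a rough illustration of training in batches, here is a minimal pure-Python sketch of logistic regression fitted by mini-batch gradient descent, with the result exported as JSON so that another runtime (such as Node.js) could read the weights. It is not the Skyscanner implementation and does not use scikit-learn; the function names, toy data, learning rate, and batch size are all assumptions chosen for the example.

```python
import json
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, x):
    """Predicted click probability for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_in_batches(rows, labels, n_features, epochs=200, batch_size=4,
                     learning_rate=0.5, seed=0):
    """Mini-batch gradient descent on the log-loss (hypothetical helper)."""
    rng = random.Random(seed)
    weights, bias = [0.0] * n_features, 0.0
    indices = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w, grad_b = [0.0] * n_features, 0.0
            for i in batch:
                error = predict(weights, bias, rows[i]) - labels[i]
                grad_b += error
                for j, xj in enumerate(rows[i]):
                    grad_w[j] += error * xj
            # Average the gradient over the batch, then take one step.
            bias -= learning_rate * grad_b / len(batch)
            weights = [w - learning_rate * g / len(batch)
                       for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy data (made up): feature 0 drives clicks, feature 1 is noise.
rows = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
labels = [1, 1, 0, 0, 1, 0, 1, 0]
weights, bias = train_in_batches(rows, labels, n_features=2)

# The whole model is a handful of numbers, so it serialises to JSON directly.
model_json = json.dumps({"weights": weights, "bias": bias})
```

The learning rate, batch size, and number of epochs are exactly the kind of hyperparameters the slide refers to; in practice they would be tuned, for example with a grid search, until the result matches a model trained on all data at once.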
Challenges: Categorical values
One-hot encoding:

Creatives   C1  C2  C3
C1          1   0   0
C2          0   1   0
C3          0   0   1

With a fourth creative:

Creatives   C1  C2  C3  C4
C1          1   0   0   0
C2          0   1   0   0
C3          0   0   1   0
C4          0   0   0   1

Pros:
• No collisions
• Inverse mapping
Cons:
• Need to know all values in advance
• Not good for online learning
• Keep dictionary in prod
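One-hot encoding with an explicit dictionary can be sketched as follows. The helper names are hypothetical; the point is to show where the dictionary comes from, why it has to be kept in production, and what happens when an unseen value such as C4 arrives.

```python
def fit_one_hot(values):
    """Build the value -> column dictionary (this must be kept in production)."""
    return {value: column for column, value in enumerate(sorted(set(values)))}


def one_hot(value, dictionary):
    """Encode one categorical value; unseen values raise KeyError."""
    vector = [0] * len(dictionary)
    vector[dictionary[value]] = 1
    return vector


def inverse(vector, dictionary):
    """Inverse mapping: recover the original value from its vector."""
    columns = {column: value for value, column in dictionary.items()}
    return columns[vector.index(1)]


dictionary = fit_one_hot(["C1", "C2", "C3"])
encoded_c2 = one_hot("C2", dictionary)   # [0, 1, 0]
decoded = inverse(encoded_c2, dictionary)  # "C2"
```

Calling `one_hot("C4", dictionary)` fails here because C4 was not known when the dictionary was built, which is why this encoding is awkward for online learning.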
Challenges: Categorical values
Hashing trick: map data of arbitrary size to data of a fixed size

Input:
id    features
123   creative1, advertiser2, mobile, etc.
321   creative2, advertiser4, mobile, etc.

Output:
id    Feat_1  Feat_2  Feat_3  …  Feat_k
123   0.1     0       1       …  0
321   0.5     0       0       …  1

Pros:
• Memory efficient
• Online learning
• No dictionary
Cons:
• No inverse mapping
• Hash collisions
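A minimal sketch of the hashing trick, assuming string features and a count in each bucket. The bucket count of 16 and the use of MD5 are illustrative choices, not details from the talk; MD5 is used only because Python's built-in `hash()` for strings changes between processes, while a model needs the same bucket every time.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map any number of string features into a fixed-size vector."""
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        # Same token -> same bucket, with no dictionary to store.
        vector[int(digest, 16) % n_buckets] += 1
    return vector


row_123 = hash_features(["creative1", "advertiser2", "mobile"])
row_321 = hash_features(["creative2", "advertiser4", "mobile"])

# A value never seen in training still gets a bucket.
unseen = hash_features(["creative_never_seen_before"])
```

Two different tokens may land in the same bucket (a collision), and nothing in the vector says which token produced a given count, which is the missing inverse mapping listed under the cons.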
Machine Learning Performance: offline
• AUC: if caring about ranking
• Log-Loss: if caring about the value of CTR
Other metrics:
• Precision at 1: based on target groups
• Mean Reciprocal Rank: order of ranked ads
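The four offline metrics can be computed in a few lines each. The sketch below is an illustration in plain Python; it assumes that a "group" is the list of candidate ads scored for one request, each given as a (score, clicked) pair, which is one possible reading of "target groups" and is not confirmed by the slides.

```python
import math


def log_loss(labels, probabilities, eps=1e-15):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def auc(labels, scores):
    """Probability that a random clicked ad outscores a random non-clicked one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def precision_at_1(groups):
    """Share of groups whose top-scored ad was clicked."""
    hits = sum(max(group, key=lambda ad: ad[0])[1] for group in groups)
    return hits / len(groups)


def mean_reciprocal_rank(groups):
    """Average of 1 / rank of the first clicked ad in each group."""
    total = 0.0
    for group in groups:
        ranked = sorted(group, key=lambda ad: ad[0], reverse=True)
        for rank, (_, clicked) in enumerate(ranked, start=1):
            if clicked:
                total += 1.0 / rank
                break
    return total / len(groups)


# Made-up example: two requests with two candidate ads each.
groups = [[(0.9, 1), (0.8, 0)], [(0.7, 0), (0.4, 1)]]
example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])   # 0.75
example_p_at_1 = precision_at_1(groups)                  # 0.5
example_mrr = mean_reciprocal_rank(groups)               # 0.75
```

AUC depends only on the order of the scores, while log-loss penalises probabilities that are far from the true labels, which matches the slide's split between caring about ranking and caring about the value of CTR.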
Optimizing evaluation metric
Updating model based on different
sampling methods and training days.
[Figure: Histogram of training days (2 to 7)]
[Figure: AUC over time, weekly from 6/4/18 to 9/3/18; series: Best AUC, Worst AUC]
Satisficing metric: Precision at 1
Choose the best AUC, conditioned on precision at 1 being better than random.
[Figure: Precision at 1 over time, weekly from 6/4/18 to 9/3/18; series: precision_at_1, random_precision_at_1]
Last time checked: 60% increase with regard to random.
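The selection rule (optimise AUC, subject to precision at 1 beating random) might look like the sketch below. The candidate names and all numbers are invented for illustration and are not results from the talk.

```python
def choose_model(candidates, random_precision_at_1):
    """Best AUC among candidates whose precision at 1 beats random."""
    acceptable = [c for c in candidates
                  if c["precision_at_1"] > random_precision_at_1]
    if not acceptable:
        return None  # no candidate satisfies the constraint
    return max(acceptable, key=lambda c: c["auc"])


# Hypothetical candidates trained on different numbers of days.
candidates = [
    {"name": "3_training_days", "auc": 0.71, "precision_at_1": 0.09},
    {"name": "5_training_days", "auc": 0.69, "precision_at_1": 0.16},
    {"name": "7_training_days", "auc": 0.66, "precision_at_1": 0.14},
]
chosen = choose_model(candidates, random_precision_at_1=0.10)
```

In this made-up data the candidate with the highest AUC is rejected because its precision at 1 is below random, so the rule falls back to the best of the remaining candidates.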
The road ahead: Balancing exploitation and exploration
Exploitation: choose ad based on ONLY CTR
Exploration: choose ad based on OTHER criteria
Most common approaches:
• ε-greedy
• ε-decreasing
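The two approaches named above can be sketched in a few lines. This is a generic illustration of ε-greedy and ε-decreasing, not the system described in the talk; the ad names, CTR estimates, and the particular decay schedule are assumptions.

```python
import random


def choose_ad(estimated_ctr, epsilon, rng):
    """With probability epsilon explore a random ad, otherwise exploit the best CTR."""
    if rng.random() < epsilon:
        return rng.choice(list(estimated_ctr))
    return max(estimated_ctr, key=estimated_ctr.get)


def decreasing_epsilon(step, initial=1.0, minimum=0.01):
    """Epsilon-decreasing schedule: explore a lot early, less as data accumulates."""
    return max(minimum, initial / (1 + step))


rng = random.Random(0)
estimated_ctr = {"ad_a": 0.02, "ad_b": 0.05, "ad_c": 0.01}  # made-up estimates

# epsilon = 0 is pure exploitation: always the ad with the highest CTR.
greedy_pick = choose_ad(estimated_ctr, epsilon=0.0, rng=rng)

# Epsilon-decreasing: the exploration rate shrinks with each step.
picks = [choose_ad(estimated_ctr, decreasing_epsilon(step), rng)
         for step in range(100)]
```

With a fixed ε the system keeps exploring at the same rate forever; the decreasing schedule trades that for less exploration later, which may be a poor fit if ads change often.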
Learnings
1. Proof of concept first: start lean to prove the value of your Machine Learning project.
2. Speak up from the beginning about the benefits and requirements of using ML in the product (talk about time and costs).
3. If you have problems with dimensionality, explore different ways of optimizing your resources, e.g., mini-batch training or the hashing trick.
4. Advertising systems are very dynamic, so be aware of how often you need to update the model. It is likely that a single model will not work every time in advertising.
Thank you
@ruthygarcia
Questions?


Editor's Notes

  • #4: The distinction between Decision Science and Building Data Products is useful to make sure we recognise that data science involves a variety of different tasks, and it sometimes makes it easier to think about our problems. But please bear in mind that while this distinction is useful, there are no "Decision Science" or "Data Product" Data Scientists: there are just Data Scientists.
  • #5: The IAB's Internet Advertising Revenue Report, a survey conducted independently by PricewaterhouseCoopers
  • #7: Slow. Operationally opaque: they weren't able to know how certain things were implemented. Less customizable: many changes they wanted to make weren't supported by the third party. Couldn't control the quality of ads, etc.
  • #8: Motivation: make it clear. Briefing: exact sentence. Skyscanner value: deliver more. Traveller, partner, Skyscanner.
  • #12: Ensembles of decision trees (such as Random Forests, which is a trademarked term for one particular implementation) are very fast to train, but quite slow to create predictions once trained. More accurate ensembles require more trees, which means using the model becomes slower. In most practical situations this approach is fast enough, but there can certainly be situations where run-time performance is important and therefore other approaches would be preferred. They also don't deal well with a large number of categories in categorical variables.
  • #13: To train a model you need to convert categorical values into numbers. With one-hot encoding, a categorical feature becomes an array whose size is the number of possible choices for that feature.
  • #14: If we feed the same input to a hash function, it will always give the same output. The choice of hash function determines the range of possible outputs, i.e. the range is always fixed (e.g. numbers from 0 to 1024). Hash functions are one-way: given a hash, we can’t perform a reverse lookup to determine what the input was. Hash functions may output the same value for different inputs (collision).
  • #15: Click. Stochastic average gradient. Talk more about random. In simple words, log loss measures the uncertainty of the probabilities of your model by comparing them to the true labels. I found it easier to communicate the offline metrics based on the precision at 1 metric.
  • #16: Satisficing
  • #17: Stochastic average gradient
  • #18: Explain more: simulation and trade-offs; urgency metric? Maybe not mention bandit. We need to take care of clients, advertisers, our customers. Return vs. variety. Highlight: need are delivery. Happy clients.