WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Arnoud de Munnik & Jerry Vos, wehkamp
Applied Machine Learning for Ranking Products in an Ecommerce Setting
#UnifiedDataAnalytics #SparkAISummit
Data Scientist
@wehkamp since 2001
Education: Econometrics
Jerry Vos
Data Scientist
@wehkamp since 2011
Education: Marketing Research
Arnoud de Munnik
Agenda
• Intro wehkamp
• E-commerce ranking problem
• Our learning-to-rank pipeline
• Ranking model
• Q&A
the online department store for families in the Netherlands
our history: where we come from
1952 - first advertisement
1955 - first catalog
1995 - first steps online
2010 - completely online
2018 - mobile first
2019 - a great shop experience
over 2,000 brands
C&A // Vingino // Hunkemöller // Mango // Tommy Hilfiger // Scotch & Soda // ONLY
HK Living // House Doctor // Woood // Bloomingville // Zuiver // whkmp’s own
our categories
Fashion // Home & garden // Electronics // Entertainment // Household // Sports & Leisure // Beauty & Health
>400,000 products
>500,000 daily visitors
661 million sales 18/19
11 million packages
>950 colleagues
60% of customers shopping mobile
72% of our customers are female
Our journey
• We work(ed) with a traditional corporate data warehouse
• Need: ML, flexibility, speed, enabling, etc.
• 2 years ago: pilot Spark on Databricks
– Challenges: Training of people, data in cloud
• Today:
– Transformation to Databricks / Cloud (S3)
– Lots of new (ML) products/prototypes and colleagues on the Databricks platform
Machine learning @ wehkamp
• Recommenders
• Forecasting
• Image classification
• Search
• Personalisation
• Product ranking
• Fraud detection
• And a lot more
Ranking problem for ecommerce
User searches for ‘jeans’: we return 4401 products. Are they relevant?
User navigates to the ‘ladies jeans’ overview page: we return 2176 products. Are they relevant?
● Consider a visit to a ‘product overview page’ (e.g. ‘ladies jeans’) as a user query
● Main problem: rank the returned products so that the most relevant products for a given user query come first
● How good is this list?
● Suppose we know how relevant each item is: can we define an overall score for the relevancy of the whole list?
● Yes we can: NDCG (Normalized Discounted Cumulative Gain)
https://en.wikipedia.org/wiki/Discounted_cumulative_gain
● Suppose we know the relevancy scores; list them in the order in which we ranked the products
● Correct for position by dividing the score at position i by log2(i+1)
● Sum the discounted scores to get the discounted cumulative gain: DCG = 7.84
● Do the same for the list in perfect (descending) order to get the ideal DCG: IDCG = 9.00
● Divide DCG by IDCG to get the normalized discounted cumulative gain: NDCG = 7.84 / 9.00 = 0.87
i   rel_i   log2(i+1)   rel_i / log2(i+1)
1   2       1.00        2.00
2   3       1.58        1.89
3   4       2.00        2.00
4   1       2.32        0.43
5   3       2.58        1.16
6   1       2.81        0.36
Sum                     7.84
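The worked example above can be reproduced in a few lines of Python. This minimal sketch uses the same linear gain as the slide, rel_i / log2(i+1). Some implementations use an exponential gain (2^rel - 1) instead, which gives a different number for the same list.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1), i from 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the list as returned, divided by the DCG of the same list in ideal order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance scores in the order we returned the products (the example above)
returned = [2, 3, 4, 1, 3, 1]
print(f"DCG  = {dcg(returned):.2f}")                        # DCG  = 7.84
print(f"IDCG = {dcg(sorted(returned, reverse=True)):.2f}")  # IDCG = 9.00
print(f"NDCG = {ndcg(returned):.2f}")                       # NDCG = 0.87
```

A list that is already in perfect order scores exactly 1.0, so NDCG can be compared across queries that return different numbers of products.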
● Relevancy scores: explain the scores with features (title match, article match, reviews, seasonality, price, …)
● Maximize the NDCG by giving weights to the features
Learning to rank pipeline
Special thanks
Wikimedia
MjoLniR: https://github.com/wikimedia/search-MjoLniR
Pipeline
1. Data collection
2. Click model: for relevancy scores
3. Feature generation: for explaining relevance
4. Ranking model: for estimating weights to features
5. Serve model (Elasticsearch LTR): for productionising
6. Evaluation (Tableau)
Efforts
• Initial effort of building the pipeline: 2 data scientists and 1 data engineer (for search and the Product Overview Page), for a couple of months
• New click/ranking model: 1 data scientist can train, test and push a new ranking model to production within 1 hour
Data collection
● Source: raw Google Analytics feed (daily)
● Per product list (e.g. search results, overview page):
○ ProductID
○ Position / Page
○ Impression / Click
● Challenges:
○ tagging differs between web and app
○ devices have different display formats
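One way to handle the different display formats is to convert (page, position) into an absolute rank before aggregating impressions and clicks. The sketch below uses only the standard library. The row layout and the per-device page sizes in `PAGE_SIZE` are hypothetical and are not taken from the Google Analytics feed.

```python
from collections import defaultdict

# Hypothetical products-per-page for each device type; real values depend on the layouts
PAGE_SIZE = {"desktop": 48, "mobile": 24, "app": 20}


def absolute_position(page, position, device):
    """Map a 1-based (page, position-on-page) pair to a 1-based rank in the full list."""
    return (page - 1) * PAGE_SIZE[device] + position


def aggregate(rows):
    """Count impressions and clicks per (product list, product, absolute position).

    Each row is a hypothetical (list_id, product_id, page, position, device, event)
    tuple, where event is either "impression" or "click".
    """
    stats = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for list_id, product_id, page, position, device, event in rows:
        key = (list_id, product_id, absolute_position(page, position, device))
        stats[key][event + "s"] += 1
    return dict(stats)


rows = [
    ("jeans", "0123456", 2, 3, "mobile", "impression"),   # page 2, slot 3 on mobile
    ("jeans", "0123456", 2, 3, "mobile", "click"),
    ("jeans", "0123456", 1, 27, "desktop", "impression"),  # same rank on desktop
]
print(aggregate(rows))
# {('jeans', '0123456', 27): {'impressions': 2, 'clicks': 1}}
```

Both impressions land on rank 27. Without the conversion, the mobile and desktop records would be counted as different positions.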
Click model
Reality: we don’t know the relevancy scores, so we use a click model.
Goal: determine the relevance of products in each SOP/POP
Approach: predict the relevance of products from their impressions and clicks, given their position
• Clicks Over Expected Clicks (COEC)
– Corrected for small search queries
– In our case: better results, easier to train and explain
• DBN click model (https://github.com/varepsilon/clickmodels)
– Paper: Chapelle, O. and Zhang, Y. 2009. A dynamic Bayesian network click model for web search ranking. WWW 2009.
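The COEC idea can be sketched in three steps:
1. Estimate a global click-through rate per position.
2. Sum those rates over a product's impressions to get its expected clicks.
3. Divide actual clicks by expected clicks.

The additive `prior` used here as a "correction for small search queries" is our assumption; the slides do not say which correction was used.

```python
from collections import defaultdict


def ctr_per_position(log):
    """Global click-through rate per position, pooled over all queries."""
    impressions, clicks = defaultdict(int), defaultdict(int)
    for _query, _product, position, clicked in log:
        impressions[position] += 1
        clicks[position] += clicked
    return {pos: clicks[pos] / impressions[pos] for pos in impressions}


def coec(log, prior=1.0):
    """Clicks over expected clicks per (query, product).

    log:   (query, product_id, position, clicked) impression records, clicked in {0, 1}.
    prior: pseudo-count added to both clicks and expected clicks (assumed correction);
           it pulls products with little traffic towards a neutral COEC of 1.0 and,
           when > 0, avoids dividing by zero. Set it to 0 for the raw ratio.
    """
    log = list(log)
    ctr = ctr_per_position(log)
    clicks, expected = defaultdict(float), defaultdict(float)
    for query, product, position, clicked in log:
        clicks[(query, product)] += clicked
        expected[(query, product)] += ctr[position]  # clicks we "expect" at this position
    return {key: (clicks[key] + prior) / (expected[key] + prior) for key in clicks}


log = [
    ("jeans", "A", 1, 1), ("jeans", "B", 1, 0),
    ("jeans", "A", 2, 0), ("jeans", "B", 2, 0),
]
print(coec(log, prior=0.0))  # {('jeans', 'A'): 2.0, ('jeans', 'B'): 0.0}
```

Product A got one click where, given its positions, 0.5 clicks were expected, so its COEC is 2.0. A value above 1 means the product attracts more clicks than its position alone explains.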
COEC click model
Example:
search phrase   Product ID   clicks   expected clicks   bucket (relevancy)
Jeans           0123456      250      50                3
Jeans           6543210      200      20                4
Jeans           3211231      300      300               2
Jeans           4566543      400      800               1
Jeans           9997979      -        -                 0
The last row is a random product ID: random products are added to the data with relevancy 0.
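Turning COEC ratios into graded labels could look like the sketch below. The cut-offs in `THRESHOLDS` are hypothetical, chosen only so that the rows in the table above land in the buckets shown. Randomly sampled products that were not labelled for the query get label 0, as in the last table row. The helper `add_random_negatives` is an illustrative name.

```python
import random

# Hypothetical COEC cut-offs: < 1 -> 1, [1, 2) -> 2, [2, 8) -> 3, >= 8 -> 4
THRESHOLDS = (1.0, 2.0, 8.0)


def relevance_bucket(clicks, expected_clicks, thresholds=THRESHOLDS):
    """Graded relevance label (1 to 4) from clicks over expected clicks."""
    ratio = clicks / expected_clicks
    return 1 + sum(ratio >= t for t in thresholds)


def add_random_negatives(labels, query, catalogue, n, seed=42):
    """Add up to n products not yet labelled for this query, with relevance 0."""
    seen = {product for (q, product) in labels if q == query}
    candidates = sorted(p for p in catalogue if p not in seen)
    rng = random.Random(seed)
    for product in rng.sample(candidates, min(n, len(candidates))):
        labels[(query, product)] = 0
    return labels


table = [("0123456", 250, 50), ("6543210", 200, 20), ("3211231", 300, 300), ("4566543", 400, 800)]
labels = {("jeans", pid): relevance_bucket(c, e) for pid, c, e in table}
print([labels[("jeans", pid)] for pid, _, _ in table])  # [3, 4, 2, 1]

add_random_negatives(labels, "jeans", ["0123456", "9997979"], n=1)
print(labels[("jeans", "9997979")])  # 0
```

The random negatives give the ranker examples of clearly irrelevant products, which clicks alone rarely provide.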
Demo click model
Query: “Flared jeans” [screenshots: a product with relevancy 1 and a product with relevancy 4]
Feature generation
Try to explain and predict which attributes (i.e. features) of a product, with respect to the user query, contribute to its relevance score.
Feature examples:
- Title match
- Description match
- Tf-idf
- …
● Limit the number of features to < 100 (latency issues)
● For POP features we did not use one-hot encoding (OHE) but a Bayesian encoder, to limit the number of features
- Popularity
- Discount / Promo
- Seasonality
- Reviews
- Days online
- Brand
- …
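The slides do not say which Bayesian encoder was used. A common choice is an m-estimate (smoothed target) encoding. It replaces a category such as Brand by its mean relevance, shrunk towards the global mean when the category has few observations. The result is one numeric feature instead of one column per brand, which keeps the feature count low. The sketch below is illustrative; the function name and the smoothing weight `m` are assumptions.

```python
from collections import defaultdict


def bayesian_target_encode(categories, targets, m=10.0):
    """Smoothed mean target per category: (sum_y + m * prior) / (n + m).

    categories: category value per row (e.g. brand)
    targets:    numeric target per row (e.g. relevance label)
    m:          pseudo-count; a larger m pulls rare categories harder towards the prior
    Returns (encoding, prior); unseen categories should fall back to the prior.
    """
    prior = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {cat: (sums[cat] + m * prior) / (counts[cat] + m) for cat in counts}
    return encoding, prior


brands = ["A", "A", "B"]
relevance = [4, 2, 0]
encoding, prior = bayesian_target_encode(brands, relevance, m=1.0)
print(encoding)                   # A: (6 + 2) / 3, roughly 2.67; B: (0 + 2) / 2 = 1.0
print(encoding.get("C", prior))   # unseen brand falls back to the global mean, 2.0
```

To avoid leaking the label into the feature, the encoding is usually fitted on training data only (or out-of-fold) and then applied to test data.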
[Diagram: Feature notebooks, Kafka, Search processor and ES index; a query (“jeans”) is run against the index and the results are logged with their feature values to S3]
● Initial training and query building with snapshot data
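Query-dependent features such as title match are typically defined as a feature set in the Elasticsearch LTR plugin. Search results are then logged with an `sltr` query. The sketch below only builds the request bodies as Python dicts. The feature-set name, field names and feature names are hypothetical. The body layout follows the o19s Elasticsearch LTR plugin documentation as we understand it, so check it against the plugin version you run.

```python
import json

FEATURESET = "product_features"  # hypothetical feature-set name


def match_feature(name, field):
    """One query-dependent feature: a mustache-templated match query on a text field."""
    return {
        "name": name,
        "params": ["keywords"],
        "template_language": "mustache",
        "template": {"match": {field: "{{keywords}}"}},
    }


# Body for POST _ltr/_featureset/product_features (field names are hypothetical)
featureset_body = {"featureset": {"features": [
    match_feature("title_match", "title"),
    match_feature("description_match", "description"),
]}}


def logging_query(keywords, product_ids):
    """Search body that returns the given products and logs their feature values."""
    return {
        "query": {"bool": {"filter": [
            {"terms": {"_id": product_ids}},
            {"sltr": {"_name": "logged_featureset", "featureset": FEATURESET,
                      "params": {"keywords": keywords}}},
        ]}},
        "ext": {"ltr_log": {"log_specs": {"name": "log_entry",
                                          "named_query": "logged_featureset"}}},
    }


print(json.dumps(logging_query("jeans", ["0123456", "6543210"]), indent=2))
```

Logging feature values at query time, rather than recomputing them offline, means the training data sees exactly the scores Elasticsearch computed for that query.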
[Diagram: Feature notebooks; Delta pre-processed data; Delta feature DB; fetch features and send them to the ES index. Inputs include aggregated view/sales/promo data and the OpenWeatherMap API. Seasonality is estimated per article type with time-series models → predictions, scaled via a Pandas UDF.]
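The slides mention time-series models per article type, scaled out with a Pandas UDF. As a much simpler stand-in (not the models used in the talk), a multiplicative monthly seasonal index shows the grouped shape of that computation. Each article type is processed independently, which is what makes it a good fit for a grouped Pandas UDF (for example `groupBy(...).applyInPandas(...)` in Spark 3.x).

```python
from collections import defaultdict


def monthly_seasonal_index(sales):
    """Multiplicative seasonal index per (article_type, month).

    sales: iterable of (article_type, month, units) records, month in 1..12.
    The index is the mean units in a month divided by the mean of the monthly
    means for that article type: 1.0 is a normal month, 1.5 is 50% busier.
    """
    by_type = defaultdict(lambda: defaultdict(list))
    for article_type, month, units in sales:
        by_type[article_type][month].append(units)

    index = {}
    for article_type, by_month in by_type.items():  # one independent group per type
        month_means = {m: sum(v) / len(v) for m, v in by_month.items()}
        overall = sum(month_means.values()) / len(month_means)
        for m, mean in month_means.items():
            index[(article_type, m)] = mean / overall if overall else 0.0
    return index


sales = [("coat", 1, 30), ("coat", 7, 10), ("swimwear", 1, 5), ("swimwear", 7, 15)]
print(monthly_seasonal_index(sales))
# {('coat', 1): 1.5, ('coat', 7): 0.5, ('swimwear', 1): 0.5, ('swimwear', 7): 1.5}
```

The index for the current month can then be stored as a query-independent "seasonality" feature per product.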
[Diagram: Feature notebooks, Kafka, Search processor and ES index; a query (“jeans”) is logged with its feature values to S3; then: add click-model labels → train model → add model to ES]
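The "add click-model labels" step joins the click-model labels to the logged feature values per (query, product). The sketch below uses hypothetical in-memory structures; the function names are illustrative. It drops products that have a label but no logged features, which is one possible choice. Rows are written in the common SVMrank/LETOR text layout.

```python
def build_training_rows(labels, logged_features, feature_names):
    """Join click-model labels with logged feature values.

    labels:          {(query, product_id): relevance_label}
    logged_features: {(query, product_id): {feature_name: value}}
    Returns (rows, query_ids): rows are (qid, label, feature_vector) tuples sorted
    by qid; features missing from the log are filled with 0.0.
    """
    query_ids, rows = {}, []
    for (query, product), label in labels.items():
        features = logged_features.get((query, product))
        if features is None:
            continue  # labelled, but never logged with features: leave it out
        qid = query_ids.setdefault(query, len(query_ids) + 1)
        vector = [float(features.get(name, 0.0)) for name in feature_names]
        rows.append((qid, label, vector))
    rows.sort(key=lambda row: row[0])  # stable sort keeps each query's rows together
    return rows, query_ids


def to_svmrank_line(qid, label, vector):
    """One row in the SVMrank/LETOR text layout: 'label qid:N 1:v1 2:v2 ...'."""
    features = " ".join(f"{i}:{v:g}" for i, v in enumerate(vector, start=1))
    return f"{label} qid:{qid} {features}"


labels = {("jeans", "A"): 3, ("jeans", "B"): 0, ("dress", "C"): 2, ("jeans", "Z"): 1}
logged = {("jeans", "A"): {"title_match": 9.5, "popularity": 0.01},
          ("jeans", "B"): {"title_match": 0.0},
          ("dress", "C"): {"popularity": 0.2}}
rows, query_ids = build_training_rows(labels, logged, ["title_match", "popularity"])
for row in rows:
    print(to_svmrank_line(*row))
# 3 qid:1 1:9.5 2:0.01
# 0 qid:1 1:0 2:0
# 2 qid:2 1:0 2:0.2
```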
Ranking model
• Many machine learning techniques could be used
• The Elasticsearch LTR plugin supports XGBoost models
• XGBoost → eXtreme Gradient Boosting
– Variant of the gradient boosting technique (tree-based model)
– Captures non-linearity
– Good results (e.g. in Kaggle competitions)
– Easy to use, tune and evaluate
– Fast (parallel computation on a single machine, plus cluster support, e.g. Spark)
• XGBoost has many parameters to tune; we use Hyperopt for this: https://hyperopt.github.io/hyperopt/
• XGBoost offers rank:ndcg as an objective
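For the `rank:ndcg` objective, XGBoost needs to know which rows belong to the same query. With the classic `DMatrix` API, you keep each query's rows together and pass the number of rows per query via `DMatrix.set_group`. The sketch below only prepares those inputs with the standard library. The parameter values are placeholders that a Hyperopt search would tune, not the values used at wehkamp.

```python
from itertools import groupby


def group_sizes(query_ids):
    """Rows per query, in row order; each query's rows must be contiguous."""
    sizes, seen = [], set()
    for qid, run in groupby(query_ids):
        if qid in seen:
            raise ValueError(f"rows for query {qid!r} are not contiguous")
        seen.add(qid)
        sizes.append(len(list(run)))
    return sizes


# Placeholder values; a Hyperopt search would tune eta, max_depth, subsample, etc.
params = {
    "objective": "rank:ndcg",   # LambdaMART-style training that maximizes NDCG
    "eval_metric": "ndcg@10",
    "eta": 0.1,
    "max_depth": 6,
    "subsample": 0.8,
}

qids = ["jeans", "jeans", "jeans", "dress", "dress"]  # rows kept together per query
print(group_sizes(qids))  # [3, 2]

# With xgboost installed (not executed here), X and y being features and labels:
# dtrain = xgboost.DMatrix(X, label=y)
# dtrain.set_group(group_sizes(qids))
# booster = xgboost.train(params, dtrain, num_boost_round=200)
```

Raising an error on non-contiguous queries is deliberate: a silently wrong grouping would let the model compare products across unrelated queries.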
• Each Hyperopt run stores its best parameters and their result in MLflow
After training, we store in MLflow:
• Feature importances (SHAP)
• Test and training datasets
• Feature map
• XGBoost model
For examining the feature importance of each model we use SHAP: https://github.com/slundberg/shap
[Figure: SHAP feature-importance plot for features F1, F2, F3, F4, …]
Serve model
With a few lines of code we save our model to the Elasticsearch index
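Saving a trained XGBoost model to Elasticsearch can be sketched as follows: dump the booster's trees as JSON and post them to the LTR plugin's `_createmodel` endpoint for the feature set. The request layout follows the o19s Elasticsearch LTR plugin documentation as we understand it. The host, feature-set name and model name are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

ES_HOST = "http://localhost:9200"   # hypothetical
FEATURESET = "product_features"     # hypothetical feature-set name


def create_model_request(model_name, tree_dumps):
    """Build (but do not send) the request that stores an XGBoost model in the LTR plugin.

    tree_dumps: one JSON string per tree, e.g. from booster.get_dump(dump_format="json").
    """
    definition = "[" + ",".join(tree_dumps) + "]"
    body = {"model": {"name": model_name,
                      "model": {"type": "model/xgboost+json", "definition": definition}}}
    url = f"{ES_HOST}/_ltr/_featureset/{FEATURESET}/_createmodel"
    return urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


req = create_model_request("ranker_v1", ['{"nodeid": 0, "leaf": 0.5}'])
print(req.get_method(), req.full_url)
# To actually store the model: urllib.request.urlopen(req)
```

The split names inside the dumped trees presumably have to match the feature names in the feature set. That is one reason to keep the feature map alongside the model in MLflow.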
[Figure: one tree of the model. Node 0 splits on “Popularity score > 0.006?” (Yes/No) into Node 1 and Node 2; a further split on “Title match > 9.25?” (Yes/No) leads to Node 3 and Node 4.]
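A tree like the one on these slides can be scored in the spirit of an XGBoost JSON dump. Start at node 0, compare the feature with `split_condition`, and follow `yes`/`no` until a leaf; the model score is the sum of the leaf values over all trees. XGBoost dumps send `feature < split_condition` to the `yes` branch, so the slide's "Popularity score > 0.006?" corresponds to the `no` branch here. The tree below is hypothetical: the placement of the title-match split and all leaf values are made up for illustration.

```python
# Hypothetical tree in XGBoost's JSON dump layout
TREE = {
    "nodeid": 0, "split": "popularity", "split_condition": 0.006,
    "yes": 1, "no": 2, "missing": 1,
    "children": [
        {"nodeid": 1, "leaf": -0.2},
        {"nodeid": 2, "split": "title_match", "split_condition": 9.25,
         "yes": 3, "no": 4, "missing": 3,
         "children": [
             {"nodeid": 3, "leaf": 0.1},
             {"nodeid": 4, "leaf": 0.4},
         ]},
    ],
}


def score_tree(node, features):
    """Walk one tree: feature < split_condition goes to 'yes', otherwise 'no'."""
    while "leaf" not in node:
        value = features.get(node["split"])
        if value is None:
            target = node["missing"]          # feature absent: follow the default branch
        elif value < node["split_condition"]:
            target = node["yes"]
        else:
            target = node["no"]
        node = next(child for child in node["children"] if child["nodeid"] == target)
    return node["leaf"]


def score_model(trees, features):
    """Sum of leaf values over all trees (before any base score or transformation)."""
    return sum(score_tree(tree, features) for tree in trees)


print(score_model([TREE], {"popularity": 0.001}))                     # -0.2
print(score_model([TREE], {"popularity": 0.01, "title_match": 12.0}))  # 0.4
```

Each product's score is computed this way at query time, and the products are sorted by it.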
Evaluation
• We use A/B tests to check whether ranking models outperform the standard implementation. Tests are configured with PlanOut: https://github.com/facebook/planout
• An automated Tableau report shows the results of the A/B test
• We report quite a few metrics, but look mostly at:
- Click-Through Rate
- Revenue per session
- PaulScore: https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary
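Per test variant, CTR and PaulScore could be computed from session logs as below. The PaulScore follows our reading of the Wikimedia glossary linked above:
1. For each query, sum F^k over the positions k of the clicked results (k = 0 for the top result, an assumption to verify against the glossary).
2. Average the query scores per session.
3. Average over sessions.

The log layout and the default F = 0.5 are hypothetical. For CTR we add a standard two-proportion z-test; the slides do not say which significance test the Tableau report uses.

```python
import math
from statistics import mean


def paul_score(sessions, factor=0.5):
    """PaulScore over sessions (our reading of the Wikimedia definition).

    sessions: list of sessions; each session is a non-empty list of queries; each query
              is a list of 0-based positions of clicked results (empty list = no click).
    """
    session_scores = []
    for queries in sessions:
        query_scores = [sum(factor ** k for k in clicks) for clicks in queries]
        session_scores.append(mean(query_scores))
    return mean(session_scores)


def ctr_z_test(clicks_a, views_a, clicks_b, views_b):
    """CTR per variant plus a two-proportion z statistic and two-sided p-value (B vs A)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_a, p_b, z, p_value


sessions = [[[0], []], [[0, 2]]]  # session 1: click on top result, then no click
print(paul_score(sessions))       # 0.875
print(ctr_z_test(100, 1000, 130, 1000))
```

A higher factor F rewards clicks further down the list more generously, so the chosen F should be reported together with the score.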
Our journey ahead
• For search: build multiple models for different categories, based on search-phrase classification
• Add more product-specific attributes
• Test with personalisation
Wrap up
Automating a learning-to-rank pipeline requires many different parts working together:
- Google Analytics
- Databricks / Spark
- Elasticsearch
- S3
- XGBoost
- Hyperopt / SHAP
- MLflow
- Planout
- Tableau
Questions?
