Applied Machine Learning for Ranking Products in an Ecommerce Setting


Arnoud de Munnik & Jerry Vos, wehkamp
#UnifiedDataAnalytics #SparkAISummit

Data Scientist
@wehkamp since 2001
Education: Econometrics
Jerry Vos
Data Scientist
@wehkamp since 2011
Education: Marketing Research
Arnoud de Munnik

Agenda
• Intro wehkamp
• E-commerce ranking problem
• Our learning-to-rank pipeline
• Ranking model
• Q&A

the online department store for families in the Netherlands

our history: where we come from
• 1952 - first advertisement
• 1955 - first catalog
• 1995 - first steps online
• 2010 - completely online
• 2018 - mobile first
• 2019 - a great shop experience

over 2,000 brands
C&A // Vingino // Hunkemöller // Mango // Tommy Hilfiger // Scotch & Soda // ONLY
HK Living // House Doctor // Woood // Bloomingville // Zuiver // whkmp’s own
our categories
Fashion // Home & garden // Electronics // Entertainment // Household // Sports & Leisure // Beauty & Health
• >400,000 products
• >500,000 daily visitors
• 661 million sales (2018/19)
• 11 million packages
• >950 colleagues
• 60% of customers shop on mobile
• 72% of our customers are female

Our journey
• We work(ed) with a traditional corporate data warehouse
• Need: ML, flexibility, speed, enabling, etc.
• 2 years ago: pilot Spark on Databricks
– Challenges: Training of people, data in cloud
• Today:
– Transformation to Databricks / Cloud (S3)
– Lots of new (ML) products/prototypes and colleagues on the Databricks platform

Machine learning @ wehkamp
• Recommenders
• Forecasting
• Image classification
• Search
• Personalisation
• Product ranking
• Fraud detection
• And a lot more

Ranking problem for ecommerce
User searches for ‘jeans’: we return 4401 products. Relevant?

User navigates to the ‘ladies jeans’ overview page: we return 2176 products. Relevant?

● Consider a visit to a ‘product overview page’ (for example ‘ladies jeans’) as a user query
● Main problem: order the returned products by relevance, given a user query

● How good is this list?
● Suppose we know how relevant each item is: can we define an overall score for the relevance of this list?
● Yes we can: the answer is NDCG (Normalized Discounted Cumulative Gain)
https://en.wikipedia.org/wiki/Discounted_cumulative_gain

● Suppose we know the relevance scores of the items, in the order we show them: 2, 3, 4, 1, 3, 1
● Correct for position i by dividing each score by log2(i+1)
● Sum the results to get the discounted cumulative gain: DCG = Σ rel_i / log2(i+1) = 7.84
● Do the same for the list in perfect order (4, 3, 3, 2, 1, 1) to get the ideal DCG: IDCG = 9.00
● Divide DCG by IDCG to get the normalized discounted cumulative gain: NDCG = 7.84 / 9.00 = 0.87

i   relevance   log2(i+1)   relevance / log2(i+1)
1   2           1.00        2.00
2   3           1.58        1.89
3   4           2.00        2.00
4   1           2.32        0.43
5   3           2.58        1.16
6   1           2.81        0.36
Sum: 7.84
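The worked example above can be reproduced in a few lines. This is a minimal sketch using the same linear gain as the slide (relevance / log2(i+1)); note that some libraries use an exponential gain (2^rel - 1) instead, which gives different numbers for the same list.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), positions i starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the shown order divided by the DCG of the ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


shown = [2, 3, 4, 1, 3, 1]  # relevance of the six products, in the order we show them
print(round(dcg(shown), 2))                        # 7.84
print(round(dcg(sorted(shown, reverse=True)), 2))  # 9.0
print(round(ndcg(shown), 2))                       # 0.87
```

A list that is already in perfect order scores exactly 1.0, which is what makes NDCG comparable across queries with different numbers of results.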

Relevancy scores → explain the scores with features
● Features, for example: title match, article match, reviews, seasonality, price, …
● Maximize the NDCG by giving weights to the features

Special thanks
Wikimedia
MjoLniR: https://github.com/wikimedia/search-MjoLniR

Pipeline
1. Data collection
2. Click model: for relevancy scores
3. Feature generation: for explaining relevance
4. Ranking model: for estimating the weights of the features
5. Serve model (Elasticsearch LTR): for productionising
6. Evaluation (Tableau)

Efforts
• Initial effort of building the pipeline:
2 data scientists and 1 data engineer (for search and Product Overview Page) for a couple of
months
• New click/ranking model:
1 data scientist can train, test and push a new ranking model to production within 1 hour

Data collection
● Source: raw Google Analytics feed (daily)
● Per product list (e.g. search, overview page):
○ ProductID
○ Position / Page
○ Impression / Click
● Challenges:
○ tagging differs between web and app
○ devices have different display formats
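Before a click model can be fitted, the raw hits have to be counted per product list, product and position. A minimal sketch, assuming each analytics row carries the page, the position on that page and the page size; all field names are hypothetical, and the page size is kept per row because devices display a different number of products per page.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One hypothetical row from the daily analytics feed (field names are illustrative)."""
    list_id: str           # search phrase or overview page, e.g. "jeans"
    product_id: str
    page: int              # 1-based result page
    position_on_page: int  # 1-based position within that page
    page_size: int         # products per page; may differ per device
    event: str             # "impression" or "click"


def absolute_position(hit: Hit) -> int:
    """Turn page + position-on-page into one 1-based position in the full list."""
    return (hit.page - 1) * hit.page_size + hit.position_on_page


def aggregate(hits: Iterable[Hit]):
    """Count impressions and clicks per (list_id, product_id, absolute position)."""
    impressions, clicks = Counter(), Counter()
    for hit in hits:
        key = (hit.list_id, hit.product_id, absolute_position(hit))
        if hit.event == "impression":
            impressions[key] += 1
        elif hit.event == "click":
            clicks[key] += 1
    return impressions, clicks
```

Using `Counter` means a (list, product, position) combination that was never clicked simply reads as 0 clicks.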

Click model
Reality: we don’t know the relevancy scores, so we use a click model.
Goal: determine the relevance of products in each SOP/POP
Approach: predict the relevance of products from their impressions and clicks, given the position at which they were shown
• Clicks over Expected Clicks (COEC)
– Corrected for small (low-traffic) search queries
– In our case: better results, easier to train & explain
• DBN click model (https://github.com/varepsilon/clickmodels)
– Paper: Dynamic Bayesian Network (DBN) model: Chapelle, O. and Zhang, Y. 2009. A dynamic Bayesian network click model for web search ranking. WWW (2009)
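COEC compares the clicks a product actually got with the clicks an average product would have got at the same positions. A minimal sketch, assuming impressions and clicks are already counted per (list, product, position) and that expected clicks use the average CTR per position pooled over all lists. The slides do not say how small queries are corrected; the additive smoothing below (pseudo-counts pulling low-traffic pairs towards COEC = 1) and its default strength are assumptions.

```python
from collections import defaultdict


def position_ctr(impressions, clicks):
    """Average click-through rate per position, pooled over all lists.

    impressions, clicks: dicts mapping (list_id, product_id, position) -> count.
    """
    imp_per_pos, clk_per_pos = defaultdict(int), defaultdict(int)
    for (_, _, pos), n in impressions.items():
        imp_per_pos[pos] += n
    for (_, _, pos), n in clicks.items():
        clk_per_pos[pos] += n
    return {pos: clk_per_pos[pos] / n for pos, n in imp_per_pos.items() if n > 0}


def coec(impressions, clicks, smoothing=10.0):
    """Clicks over expected clicks per (list_id, product_id).

    Expected clicks = sum over impressions of the average CTR of the position shown.
    `smoothing` (hypothetical value) adds pseudo-counts that pull pairs with little
    traffic towards COEC = 1, i.e. "as many clicks as expected for its positions".
    """
    ctr = position_ctr(impressions, clicks)
    observed, expected = defaultdict(float), defaultdict(float)
    for (list_id, product_id, pos), n in impressions.items():
        expected[(list_id, product_id)] += n * ctr.get(pos, 0.0)
    for (list_id, product_id, _), n in clicks.items():
        observed[(list_id, product_id)] += n
    return {
        key: (observed[key] + smoothing) / (exp + smoothing)
        for key, exp in expected.items()
        if exp + smoothing > 0
    }
```

A COEC above 1 means the product is clicked more often than its positions explain, which is the signal the click model turns into a relevancy label.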

COEC click model
Example

COEC click model
search phrase   product ID   clicks   expected clicks   bucket (aka relevancy)
Jeans           0123456      250      50                3
Jeans           6543210      200      20                4
Jeans           3211231      300      300               2
Jeans           4566543      400      800               1
Jeans           9997979      -        -                 0
Add random data: 9997979 is a random product ID, added with relevancy 0.
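The table above maps the ratio clicks / expected clicks to a relevancy bucket and adds random, unshown products as relevancy 0. A minimal sketch of both steps; the thresholds are hypothetical, chosen only so that the example rows are reproduced (ratio 10 → 4, 5 → 3, 1 → 2, 0.5 → 1).

```python
import random

# Hypothetical COEC thresholds, picked only to reproduce the example table.
THRESHOLDS = [(8.0, 4), (2.0, 3), (1.0, 2)]


def bucket(clicks, expected_clicks):
    """Map clicks over expected clicks to a relevancy label 1..4."""
    ratio = clicks / expected_clicks
    for threshold, label in THRESHOLDS:
        if ratio >= threshold:
            return label
    return 1


def add_random_negatives(labelled, catalogue, k, seed=0):
    """Add up to k catalogue products not shown for this query, with relevancy 0.

    labelled: dict product_id -> relevancy label for one search phrase.
    """
    rng = random.Random(seed)
    candidates = sorted(set(catalogue) - set(labelled))
    for product_id in rng.sample(candidates, min(k, len(candidates))):
        labelled[product_id] = 0
    return labelled


rows = {"0123456": (250, 50), "6543210": (200, 20), "3211231": (300, 300), "4566543": (400, 800)}
labels = {pid: bucket(c, e) for pid, (c, e) in rows.items()}
print(labels)  # {'0123456': 3, '6543210': 4, '3211231': 2, '4566543': 1}
```

The random negatives give the ranking model examples of clearly irrelevant products, which clicks alone never provide.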

Demo click model
Query: “Flared jeans”, with example products at relevancy 1 and relevancy 4

Feature generation
Try to explain and predict which attributes (i.e. features) of products, with respect to the user query, contribute to their relevance score.
Feature examples:
- Title match
- Description match
- Tf-idf
- …
- Popularity
- Discount / Promo
- Seasonality
- Reviews
- Days online
- Brand
- …
● Limit the number of features to fewer than 100 (latency issues)
● For POP features we did not use one-hot encoding (OHE), but a Bayesian encoder, to limit the number of features
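The slides do not specify which Bayesian encoder is used. A common variant, sketched here as an assumption, is smoothed mean-target (m-estimate) encoding: a categorical attribute such as brand becomes one numeric feature instead of one column per value, and categories with few observations are pulled towards the global mean. The smoothing strength `m` is hypothetical; the encoder should be fitted on training data only, otherwise the target leaks into the feature.

```python
from collections import defaultdict


def fit_bayesian_encoder(categories, targets, m=20.0):
    """Fit a smoothed (m-estimate) mean-target encoder.

    Each category is encoded as (sum of its targets + m * prior) / (count + m),
    a blend of its own mean target and the global mean (prior).
    """
    prior = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {cat: (sums[cat] + m * prior) / (counts[cat] + m) for cat in counts}
    return encoding, prior


def transform(categories, encoding, prior):
    """Encode categories; unseen categories fall back to the global mean."""
    return [encoding.get(cat, prior) for cat in categories]


# Hypothetical example: brand vs. click-model relevancy label.
enc, prior = fit_bayesian_encoder(["A", "A", "B"], [4, 2, 0], m=1.0)
print(transform(["A", "B", "C"], enc, prior))
```

With `m=1.0`, brand A (mean 3 over two products) is encoded as 8/3 rather than 3, brand B as 1.0 rather than 0, and the unseen brand C gets the global mean 2.0.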

Feature generation
Feature notebooks → ES index; a query (e.g. “jeans”) goes via the search processor to the ES index, and the results are logged with their feature values (Kafka → S3)
● Initial training and query building with snapshot data

Feature generation
● Feature notebooks read Delta pre-processed data and write to a Delta feature DB; features are fetched from there and sent to the ES index
● Seasonality estimate per article type, based on the OpenWeatherMap API and aggregated view/sales/promo data: time-series models with a prediction → scaled via Pandas UDF

Feature generation
Feature notebooks → ES index; a query (e.g. “jeans”) goes via the search processor to the ES index; the results are logged with their feature values (Kafka → S3) → add click model labels → train model → add model to ES
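The training step needs the logged feature values joined with the click model labels into one row per (query, product). A minimal sketch that writes the SVMrank/RankLib-style text format (`label qid:N index:value ... # comment`); the input layout and the feature names are hypothetical, and missing feature values are written as 0.0.

```python
def to_svmrank_lines(logged, labels, feature_names):
    """Join logged feature values with click-model labels.

    logged: dict query -> list of (product_id, {feature_name: value}) as logged at query time
    labels: dict (query, product_id) -> relevancy label from the click model
    Returns lines "label qid:N 1:v1 2:v2 ... # query product_id".
    """
    lines = []
    for qid, query in enumerate(sorted(logged), start=1):
        for product_id, features in logged[query]:
            label = labels.get((query, product_id))
            if label is None:  # product has no click-model label: skip it
                continue
            feats = " ".join(
                f"{i}:{features.get(name, 0.0)}" for i, name in enumerate(feature_names, start=1)
            )
            lines.append(f"{label} qid:{qid} {feats} # {query} {product_id}")
    return lines


logged = {"jeans": [("0123456", {"title_match": 9.5, "popularity": 0.01}), ("x", {"title_match": 1.0})]}
print(to_svmrank_lines(logged, {("jeans", "0123456"): 3}, ["title_match", "popularity"]))
# ['3 qid:1 1:9.5 2:0.01 # jeans 0123456']
```

Logging the feature values at query time, as on the slide, means the training rows contain exactly what the serving side computed, rather than values recomputed later.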

Ranking model
• Many machine learning techniques can be used
• The Elasticsearch LTR plugin supports XGBoost
• XGBoost → eXtreme Gradient Boosting
– Variant of the gradient boosting technique (tree-based model)
– Non-linearity
– Good results (e.g. Kaggle competitions)
– Easy to use, tune, and evaluate
– Fast (parallel computation on a single machine, but also cluster support, e.g. Spark)
• XGBoost has lots of parameters to tune; we use Hyperopt for this
https://hyperopt.github.io/hyperopt/
• XGBoost offers rank:ndcg as a learning objective
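For a ranking objective such as rank:ndcg, XGBoost needs to know which rows belong to the same query: the rows of one query must be contiguous, and the number of rows per query is passed as group sizes. A minimal sketch of that preparation step in plain Python; the row layout is an assumption.

```python
from itertools import groupby


def rows_and_groups(rows):
    """Sort rows by query and return (features, labels, group_sizes).

    rows: iterable of (query_id, label, feature_vector). Sorting is stable, so rows
    keep their original order within a query.
    """
    rows = sorted(rows, key=lambda r: r[0])
    X, y, groups = [], [], []
    for _, block in groupby(rows, key=lambda r: r[0]):
        block = list(block)
        groups.append(len(block))
        for _, label, features in block:
            X.append(features)
            y.append(label)
    return X, y, groups


X, y, groups = rows_and_groups([("q2", 1, [0.1]), ("q1", 3, [0.5]), ("q1", 0, [0.2])])
print(groups)  # [2, 1]
```

With XGBoost installed, these could then be used as `dtrain = xgboost.DMatrix(X, label=y)`, `dtrain.set_group(groups)` and `xgboost.train({"objective": "rank:ndcg"}, dtrain)`, with Hyperopt searching over the other parameters.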

Ranking model
• Each Hyperopt run stores its best parameters and their result in MLflow

Ranking model
After training, we store in MLflow:
• Feature importances (SHAP)
• Test and training datasets
• Feature map
• XGBoost model

Ranking model
For examining the feature importance of each model we use SHAP
https://github.com/slundberg/shap
(Figure: SHAP feature importance plot for features F1, F2, F3, F4, …)

Serve model
With a few lines of code we save our model to the Elasticsearch index
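The slide does not show the code itself. One way to do it, assuming the Elasticsearch LTR plugin's `_createmodel` endpoint and its `model/xgboost+json` model type: dump the trained booster with `Booster.get_dump(dump_format="json")`, parse each tree, and wrap the list in a create-model request. The feature set name and model name are hypothetical, and as far as we know the split feature names in the dump must match the feature names of the feature set (XGBoost can write them via a feature map passed as `fmap`).

```python
import json


def ltr_create_model_request(featureset, model_name, xgb_trees):
    """Build the path and body for uploading an XGBoost model to the Elasticsearch LTR plugin.

    xgb_trees: list of trees in XGBoost's JSON dump format, e.g.
    [json.loads(t) for t in booster.get_dump(dump_format="json")].
    """
    path = f"_ltr/_featureset/{featureset}/_createmodel"
    body = {
        "model": {
            "name": model_name,
            "model": {
                "type": "model/xgboost+json",
                # The plugin expects the tree list as a JSON string.
                "definition": json.dumps(xgb_trees),
            },
        }
    }
    return path, body


path, body = ltr_create_model_request("products", "pop_ranker_v1", [{"nodeid": 0, "leaf": 0.5}])
print(path)  # _ltr/_featureset/products/_createmodel
```

The body would then be POSTed to `path` on the cluster, for example with `requests.post(f"{es_url}/{path}", json=body)`, where `es_url` is a hypothetical cluster URL.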

Each tree of the model is built up split by split (Split 1, Split 2, Split 3). Example of the first splits of one tree:
● Node 0: Popularity score > 0.006? No → Node 1, Yes → Node 2
● Next split: Title match > 9.25? Yes → Node 3, No → Node 4
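Scoring a product means walking every tree from the root to a leaf and summing the leaf values (XGBoost adds a constant base score on top, which does not change the ordering). A minimal sketch over XGBoost's JSON dump format, where a row goes to `yes` when `feature < split_condition`, to `no` otherwise, and to `missing` when the feature is absent; so the slide's “Popularity score > 0.006? Yes” corresponds to the `no` branch here. The tree below mirrors the slide, but which child holds the title-match split and all leaf values are hypothetical.

```python
def score_tree(node, features):
    """Walk one tree in XGBoost's JSON dump format down to a leaf value."""
    while "leaf" not in node:
        value = features.get(node["split"])
        if value is None:
            next_id = node["missing"]
        elif value < node["split_condition"]:
            next_id = node["yes"]
        else:
            next_id = node["no"]
        node = next(child for child in node["children"] if child["nodeid"] == next_id)
    return node["leaf"]


def score(trees, features):
    """Model score: sum of the leaf values over all trees."""
    return sum(score_tree(tree, features) for tree in trees)


# Hypothetical tree mirroring the slide; leaf values are made up.
example_tree = {
    "nodeid": 0, "split": "popularity", "split_condition": 0.006,
    "yes": 1, "no": 2, "missing": 1,
    "children": [
        {"nodeid": 1, "leaf": -0.2},
        {"nodeid": 2, "split": "title_match", "split_condition": 9.25,
         "yes": 3, "no": 4, "missing": 3,
         "children": [{"nodeid": 3, "leaf": 0.1}, {"nodeid": 4, "leaf": 0.4}]},
    ],
}
print(score([example_tree], {"popularity": 0.01, "title_match": 10.0}))  # 0.4
```

This is also why the number of features and trees matters for latency: every product in the result list is pushed through every tree at query time.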

Evaluation
• We use A/B testing to check whether ranking models outperform the standard implementation. Tests are configured with PlanOut
https://github.com/facebook/planout
• An automated Tableau report shows the results of the A/B test
• We report quite a few metrics, but mostly look at:
- Click-Through Rate
- Revenue per session
- Paul Score https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary
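As we read the Wikimedia glossary, the PaulScore of one search is the sum of F^k over its clicked results, with k the 0-based position of each click and F a factor between 0 and 1, averaged over searches. A minimal sketch under that reading; the CTR definition next to it (share of searches with at least one click) is an assumption, and the actual report may define both metrics differently.

```python
def paul_score(searches, factor=0.5):
    """Average over searches of sum(factor ** k) over clicked 0-based positions k.

    searches: list of lists with the clicked positions per search; a search
    without clicks contributes 0.
    """
    if not searches:
        return 0.0
    return sum(sum(factor ** k for k in clicked) for clicked in searches) / len(searches)


def click_through_rate(searches):
    """Hypothetical definition: share of searches with at least one click."""
    if not searches:
        return 0.0
    return sum(1 for clicked in searches if clicked) / len(searches)


variant_a = [[0], [2], []]  # clicked positions per search, per A/B variant
print(paul_score(variant_a), click_through_rate(variant_a))
```

Unlike CTR, the Paul Score rewards clicks near the top of the list more than clicks further down, so it also reflects the ordering, which is what a ranking model changes.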


Our journey ahead
• For search: build multiple models for multiple categories, based on search-phrase classification
• Add more product-specific attributes
• Test with personalisation

Wrap up
Automating a learning-to-rank pipeline requires a lot of different parts
working together.
- Google Analytics
- Databricks / Spark
- Elasticsearch
- S3
- XGBoost
- Hyperopt / SHAP
- MLflow
- Planout
- Tableau

Questions?

