When Deep Learning Meets Recommender System

© 2019 Fiverr Int. Lmt. All Rights Reserved. Proprietary &
Confidential.
Deep Learning for
Recommender
Systems
Asi Messica
April 2020

Fiverr is the world’s largest
marketplace for freelance
services. We change how the
world works together by
connecting business decision
makers with talented
freelancers.

Personalization - Why?
Help users find
services/products/content
that are relevant to them, to
maximize user satisfaction
and retention

~2016
Deep Learning becomes popular in
Recommender Systems
~2012
Deep Learning becomes popular in
Machine Learning
• Feature extraction directly from the content
• Heterogeneous data handled easily
• Sequential behavior modeling with RNNs/CNNs
• More accurate representation learning of users
and items
• Non-linear transformations
• Deep learning worked well in other complex
domains. Worth a try!

Item to Item Collaborative
Filtering
Recommend items that similar users have
chosen
Similar items are items which were chosen by
the same users
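
A minimal sketch of the idea above, assuming cosine similarity over the sets of users who chose each item. The toy data and the function names (item_similarities, recommend) are made up for illustration and do not come from the talk.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(user_items):
    """Cosine similarity between items, based on the users who chose them."""
    users_of = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            users_of[item].add(user)
    sims = {}
    for a in users_of:
        for b in users_of:
            if a == b:
                continue
            common = len(users_of[a] & users_of[b])
            if common:
                sims[(a, b)] = common / sqrt(len(users_of[a]) * len(users_of[b]))
    return sims


def recommend(user, user_items, sims, k=3):
    """Score unseen items by their summed similarity to the user's items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


# Hypothetical toy data: user -> set of chosen items.
toy = {
    "u1": {"logo", "banner"},
    "u2": {"logo", "banner", "video"},
    "u3": {"logo", "video"},
    "u4": {"translation"},
}
sims = item_similarities(toy)
print(recommend("u1", toy, sims))
```

Items that share no users get no similarity entry at all, which is one reason sparse catalogs are hard for this approach.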

Matrix Factorization
Approximate the rating matrix by the product of
lower-rank matrices
Latent variables are introduced to represent the
underlying reasons a user purchases a product
Each user and each item is represented by a
d-dimensional vector of latent features
…
Koren et al. (2009)
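
One possible way to fit such a factorization is stochastic gradient descent over the observed ratings. The sketch below assumes a plain squared-error loss with L2 regularization and no bias terms. The hyperparameters and toy ratings are illustrative, not taken from the talk.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def mse(P, Q, ratings):
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def factorize(ratings, n_users, n_items, d=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn d-dimensional user (P) and item (Q) vectors with SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(d):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical (user, item, rating) triples; unobserved pairs are simply absent.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(mse(P, Q, ratings), 3))
print(round(predict(P, Q, 0, 2), 2))  # score for an unobserved user/item pair
```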

Factorization Machines
In many cases we want to incorporate user or
item metadata or context into the
recommendation
Matrix Factorization: (U, I, R)
Factorization Machines: (U, I, F1, F2, ..., R)
Rendle (2010)
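
A sketch of the second-order factorization machine prediction, assuming the standard form w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The feature layout and all weights below are made-up illustration values. The naive version is included only to check the linear-time reformulation against it.

```python
def fm_predict(x, w0, w, V):
    """FM prediction in O(k * n), using the sum-of-squares identity."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, written as an explicit loop over all feature pairs."""
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


# Hypothetical feature vector: [user one-hot (2) | item one-hot (2) | one context feature].
x = [1.0, 0.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3, -0.4]
V = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1], [-0.2, 0.5], [0.3, 0.3]]  # k = 2 factors
print(fm_predict(x, w0, w, V))
```

With only the user and item one-hot blocks active, the pairwise term reduces to the dot product of one user vector and one item vector, which is the matrix factorization case.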

Use Cases
Entity (Item)
Embedding
Sequence
Prediction
Hybrid
Explore-
Exploit

Word Embedding
Goal: Learning vector space representations of
words capturing fine-grained semantic
regularities among words
Mikolov et al. (2013) – Word2Vec
Pennington et al. (2014) - GloVe
Word Analogy
Word Similarity

Word2Vec CBOW
Goal: Learning vector space representations of
words capturing fine-grained semantic
regularities among words
● Continuous Bag of Words
● Maximizes the probability of the target word
given its context words
● Input: one-hot encoded words
● Input to hidden weights: Embedding
matrix of words
● Hidden to output weights
● Softmax transformation
Word2Vec - CBOW
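
The forward pass described in the bullets above can be sketched as follows. This is a toy illustration with random, untrained weights. The vocabulary, dimensions and variable names are assumptions, and real implementations avoid the full softmax.

```python
import math
import random


def cbow_forward(context_ids, W_in, W_out):
    """Return softmax probabilities over the vocabulary for the target word."""
    d = len(W_in[0])
    # A one-hot input times W_in selects one row; CBOW averages the context rows.
    hidden = [sum(W_in[c][f] for c in context_ids) / len(context_ids) for f in range(d)]
    # Hidden-to-output weights give one score per vocabulary word.
    scores = [sum(hidden[f] * W_out[f][v] for f in range(d)) for v in range(len(W_out[0]))]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


vocab = ["logo", "design", "for", "my", "startup"]  # hypothetical vocabulary
rng = random.Random(0)
d = 3
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in vocab]   # embedding matrix, V x d
W_out = [[rng.uniform(-0.5, 0.5) for _ in vocab] for _ in range(d)]  # output weights, d x V

probs = cbow_forward([0, 2], W_in, W_out)  # context: "logo", "for"
loss = -math.log(probs[1])                 # target: "design"
print(loss)
```

Training adjusts W_in and W_out to reduce this loss. The rows of W_in are then used as the word embeddings.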

Item (Entity) Embedding
Replace words with items in a session/user
profile
-Embedding: a (learned) real-valued vector
representing an entity
-Similar entities have similar embeddings
Used in recommenders
Features in more advanced algorithms
(Grbovic & Cheng 2018, Airbnb)
Item-to-item recommendations
Model clustering (e.g. user country or
sub-categories)

Prod2Vec
-Skip-gram model on products
Input: i-th product purchased by the user
Context: the other purchases of the user
-Learning user representation
Follows paragraph2vec
Input: user + products purchased except for the i-th
Target: i-th product purchased by the user
[Grbovic et al. 2015]
-Skip-gram with Negative Sampling (SGNS) is
applied to event data
[Barkan & Koenigstein, 2016]
User Embedding
Grbovic et al. 2015
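
A sketch of how skip-gram training examples might be built from purchase data, assuming each user's purchase list plays the role of a sentence. The optional window, the uniform negative sampler and the function names are illustrative assumptions, not details from the cited papers.

```python
import random


def skipgram_pairs(purchases, window=None):
    """(input, context) product pairs; window=None uses all other purchases as context."""
    pairs = []
    for i, product in enumerate(purchases):
        for j, other in enumerate(purchases):
            if i == j:
                continue
            if window is None or abs(i - j) <= window:
                pairs.append((product, other))
    return pairs


def negative_samples(purchases, catalog, k, rng):
    """Draw up to k products the user did not purchase (uniform sampling, for simplicity)."""
    candidates = [p for p in catalog if p not in purchases]
    return rng.sample(candidates, min(k, len(candidates)))


catalog = ["a", "b", "c", "d", "e", "f"]  # hypothetical product IDs
basket = ["a", "b", "c"]
print(skipgram_pairs(basket, window=1))
print(negative_samples(basket, catalog, k=2, rng=random.Random(0)))
```

Each positive pair is then trained against the sampled negatives, as in word-level SGNS.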

Item Similarity
Similarity & Analogy Tests
Greenstein-Messica et al. (2017)

Sequence Prediction
Session based recommendation
-Anonymous user recommendation/Intent
-Sequence of events (sometimes short)
-Predict next event. Ranking
GRU4REC
Network architecture:
Input: one-hot encoded item ID
Optional embedding layer
GRU layer(s)
Output: scores over all items
Target: the next item in the session
Hidasi et al. (2015)

GRU4REC
Output sampling
Computing scores for all items in every step is slow
One positive item + several negative samples
Loss functions
Cross-entropy + Softmax
Average of BPR scores
Top1 score
Filtering:
1-click sessions and unrealistically long sessions
Items with support lower than 5
Hidasi et al. (2015)
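
The three losses listed above can be sketched for a single positive item scored against sampled negatives. The formulas follow the usual statements of softmax cross-entropy, BPR and TOP1. The function names and example scores are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy_loss(pos, negs):
    """Softmax cross-entropy over the positive item and the sampled negatives."""
    scores = [pos] + list(negs)
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos


def bpr_loss(pos, negs):
    """Average of pairwise BPR terms: -log sigmoid(pos - neg)."""
    return -sum(math.log(sigmoid(pos - n)) for n in negs) / len(negs)


def top1_loss(pos, negs):
    """TOP1: penalize negatives ranked above the positive, plus a score regularizer."""
    return sum(sigmoid(n - pos) + sigmoid(n * n) for n in negs) / len(negs)


# Hypothetical scores from the network: one positive item, three sampled negatives.
pos_score, neg_scores = 2.0, [0.5, -1.0, 0.0]
print(cross_entropy_loss(pos_score, neg_scores))
print(bpr_loss(pos_score, neg_scores))
print(top1_loss(pos_score, neg_scores))
```

Because only the positive and a handful of negatives are scored, each step avoids computing scores over the full catalog.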

GRU4REC
Key Observations
Similar accuracy with/without embedding
Multiple layers rarely help
Quick convergence (small changes after 5-10
epochs)
Overall 20 – 30% improvement vs. item to item
recommendations

Session Based
Recommendation
Apply the advances in sequence modeling from
deep learning
- RNN architectures trained on the sequence
of user events in a session to predict next
item in session
- Report more than 10% accuracy gain over
baselines
- Adding context and attention
15,000 products, 999,000 filtered sessions, embedding layer size = 50

Feature Rich Session Based
Recommendations
-Items have rich feature representations such as
pictures and text descriptions
-Incorporating image and text into the GRU4REC will
improve prediction accuracy
Images encoding:
-GoogLeNet implementation, pre-trained on
ImageNet
-Features were extracted from the last average
pooling layer
Text encoding:
-Bag-of-words + TF-IDF
….

Feature Rich Session Based
Recommendations
Key Observations
-The sequence of item features by itself is not
enough to model the session well
-Incorporating the item's features increases the
MRR (about 6%), but does not find more relevant
items (Recall)
Youtube-like dataset, not English
….
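
To make the MRR vs. Recall distinction concrete, here is a small sketch of both metrics. The ranked lists are invented to show a case where the target moves up the list (higher MRR) while the set of retrieved targets is unchanged (same Recall). They are not results from the talk.

```python
def recall_at_k(ranked_lists, targets, k=20):
    """Fraction of sessions whose target item appears in the top k."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k=20):
    """Mean reciprocal rank of the target item; 0 if it is outside the top k."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            total += 1.0 / (top.index(t) + 1)
    return total / len(targets)


targets = ["c", "x"]                                # hypothetical next items
baseline = [["a", "b", "c"], ["d", "e", "f"]]       # target "c" is ranked 3rd
with_features = [["c", "a", "b"], ["d", "e", "f"]]  # target "c" is ranked 1st

print(recall_at_k(baseline, targets, k=3), recall_at_k(with_features, targets, k=3))
print(mrr_at_k(baseline, targets, k=3), mrr_at_k(with_features, targets, k=3))
```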

Wide & Deep
- Combines the strengths of linear models with deep learning models
- Used by Google Play app store recommendations
- Sparsity of the deep model handled by using embeddings
- The feature set includes raw input features and transformed features
- Both models are trained jointly, at the same time
- Deep works better than wide (+2.9%), but Wide & Deep works best (+3.9%)
Cheng et al. 2016

YouTube Recommender
Key challenges: scale, freshness, noise, latency
Two deep neural nets: one for candidate generation,
one for ranking
Candidate generation
-Extreme multi-class
-Embeddings of both user and video
-The embeddings are learned jointly with the rest of
the architecture
-To train the model, they used negative sampling
Ranking
-The evaluation metric is watch time
-Hundreds of features (including image thumbnail
and hand-crafted)
[Covington et al. 2016]
Overview
Candidate Generation

DeepFM
Wide and Deep architecture aiming to leverage
strengths of Factorization Machines for the
linear component
- The two parts are trained together and share
the same input embeddings
- Flexible handling of mixed real/categorical
variables
Guo et al. 2017

AB Test – Huawei App Store
Control: Logistic regression; Test: DeepFM

Multi-Modal Recommendations
Personalized tag recommendation for images
using deep transfer learning
- Visual image feature extraction via pre-
trained VGG-16
- Object detection via pre-trained YOLOv2
- Tagging history and image features are fed
into adapted factorization model
Ge et al. 2018
Nguyen et al. 2017

Latest
xDeepFM (2018) – Combining Explicit and
Implicit Feature Interactions for Recommender
Systems
Deep Interest Evolution Network (2018)
NFFM (2019) - Operation-aware Neural
Networks for User Response Prediction
https://github.com/shenweichen/DeepCTR
Lian et al. 2018

Gating
Goal: Optimize Exploration-Exploitation for
Payoff/Reward Maximization
In RecSys: the reward can be hits, and the machines (bandit arms) are the items to
recommend.

Exploration Principles
The best long-term strategy may involve
short-term sacrifices
Gather information to make the best overall
decision
● Naive exploration: add noise to the
greedy policy [ε-greedy]
● Optimism in the face of uncertainty:
prefer actions with uncertain values.
[Upper Confidence Bound (UCB)]
● Probability matching: select the
actions according to the probability they
are the best. [Thompson Sampling]
Probability density over mean reward
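
The three principles can be compared on a toy Bernoulli bandit. This sketch assumes a fixed ε of 0.1, the UCB1 index and a Beta(1, 1) prior for Thompson Sampling. The click-through rates and the simulate helper are invented for illustration.

```python
import math
import random


def epsilon_greedy(successes, trials, rng, epsilon=0.1):
    """Naive exploration: random arm with probability epsilon, else the best mean so far."""
    if rng.random() < epsilon:
        return rng.randrange(len(trials))
    means = [s / t if t else 0.0 for s, t in zip(successes, trials)]
    return max(range(len(means)), key=means.__getitem__)


def ucb1(successes, trials, rng=None):
    """Optimism under uncertainty: mean plus a bonus that shrinks with more pulls."""
    for arm, t in enumerate(trials):
        if t == 0:
            return arm  # pull every arm once first
    total = sum(trials)
    return max(range(len(trials)),
               key=lambda a: successes[a] / trials[a] + math.sqrt(2 * math.log(total) / trials[a]))


def thompson(successes, trials, rng):
    """Probability matching: sample each arm's CTR from its Beta posterior."""
    samples = [rng.betavariate(1 + s, 1 + t - s) for s, t in zip(successes, trials)]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(policy, true_ctr, steps=5000, seed=0):
    """Run a policy against simulated Bernoulli rewards; return pull counts per arm."""
    rng = random.Random(seed)
    successes, trials = [0] * len(true_ctr), [0] * len(true_ctr)
    for _ in range(steps):
        arm = policy(successes, trials, rng)
        trials[arm] += 1
        successes[arm] += 1 if rng.random() < true_ctr[arm] else 0
    return trials


true_ctr = [0.05, 0.10, 0.30]  # hypothetical items; the last one is best
for policy in (epsilon_greedy, ucb1, thompson):
    print(policy.__name__, simulate(policy, true_ctr))
```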

Evaluation
Policy evaluation*
Assumption: logging policy that was used to
gather the logged data chose each arm at each
time step uniformly at random
Li et al. 2010. A contextual-bandit approach to
personalized news article recommendation.
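
Under the uniform-random logging assumption above, a policy can be evaluated offline by replaying the log and keeping only the events where the policy agrees with the logged arm. The sketch below is a simplified version of that replay idea. The simulated log and the constant policies are made up for illustration.

```python
import random


def replay_evaluate(policy, logged_events):
    """Estimate a policy's average reward from uniformly random logged data.

    Only events where the policy picks the logged arm are kept; the rest are skipped.
    """
    history = []
    total = 0.0
    for context, logged_arm, reward in logged_events:
        if policy(context, history) == logged_arm:
            history.append((context, logged_arm, reward))
            total += reward
    estimate = total / len(history) if history else 0.0
    return estimate, len(history)


def make_uniform_log(true_ctr, n_events, seed=0):
    """Simulated log: each arm is chosen uniformly at random at every step."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_events):
        arm = rng.randrange(len(true_ctr))
        reward = 1 if rng.random() < true_ctr[arm] else 0
        log.append((None, arm, reward))  # no context in this toy example
    return log


log = make_uniform_log(true_ctr=[0.1, 0.4], n_events=20000)
always_arm_1 = lambda context, history: 1  # hypothetical policy under evaluation
print(replay_evaluate(always_arm_1, log))
```

With two arms, roughly half of the log is discarded. The matched fraction shrinks as the number of arms grows, so this approach needs a lot of logged data.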

Estimate Probability Density
over CTR in DLRS?
Deep Bayesian Bandits Showdown (ICLR
2018) – An empirical comparison + a
Python library
Simple methods (not the best)
● Neural Greedy
● Dropout
● Bootstrap
Online decision algorithm
https://github.com/tensorflow/models/tree/master/research/deep_contextual_bandits
Russo et al. A tutorial on Thompson sampling. 2018

MCDropout as a Bayesian
Approximation: Representing
Model Uncertainty in Deep
Learning [Gal et al. 2016]
Use MC Dropout at inference time to estimate
model uncertainty
The authors argue that applying dropout at inference
time is equivalent to approximate Bayesian
inference
Can be leveraged for Thompson Sampling (TS) or UCB to balance
exploration and exploitation in recommender systems
[Zeldes et al. 2017 (Taboola)]
https://github.com/yaringal/DropoutUncertaintyExps
https://www.cs.ox.ac.uk/people/yarin.gal/website/blog_3d801aa532c1ce.html
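
A toy sketch of MC Dropout at inference time, assuming a one-hidden-layer CTR model with fixed, hand-picked weights. The network, weights and input below are invented for illustration; this is neither the referenced code nor a trained model.

```python
import math
import random


def forward(x, W1, w2, rng=None, p_drop=0.5):
    """Tiny MLP: ReLU hidden layer, sigmoid output. Dropout is applied when rng is given."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in W1]
    if rng is not None:
        # MC Dropout: keep dropout switched ON at inference (inverted-dropout scaling).
        hidden = [0.0 if rng.random() < p_drop else h / (1 - p_drop) for h in hidden]
    logit = sum(w * h for w, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout_predict(x, W1, w2, n_samples=200, seed=0):
    """Run several stochastic passes; the spread of outputs approximates model uncertainty."""
    rng = random.Random(seed)
    samples = [forward(x, W1, w2, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n_samples)
    return mean, std, samples


# Hypothetical weights and input features.
W1 = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.4], [0.9, 0.1]]
w2 = [0.7, -0.5, 0.3, 0.6]
x = [1.0, 2.0]

mean, std, samples = mc_dropout_predict(x, W1, w2)
ucb_score = mean + 1.0 * std  # UCB-style score; the multiplier 1.0 is an arbitrary choice
ts_score = samples[0]         # Thompson-style score: a single stochastic pass
print(mean, std)
```

Ranking items by ucb_score or ts_score instead of the deterministic prediction is one way the uncertainty estimate could drive exploration.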

Take Home Points
● There are advantages in using deep learning for recommender systems
● Proceed with caution
● Stay tuned

Questions?

Thank You!
asi.messica@gmail.com
