How Can Pre-Training Help Solve the Cold Start Problem?
Lokesh Vadlamudi
San Jose State University
Why do we need to solve the cold start problem?
• Modern recommendation systems suffer from a major issue: a lack of
interaction data for new users and new items. This is generally known as
the cold start problem (data sparsity).
• The two types of models that leverage pre-training are:
• 1. Feature-based models
• 2. Fine-tuning models
• In feature-based models, feature information for users and items is
collected from side information (knowledge graphs and item content)
using pre-trained models.
• In fine-tuning models, the model is first pre-trained on user-item
interaction data and later fine-tuned to suit a specific recommendation
task. A minimal sketch contrasting the two paradigms follows.
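Below is a rough PyTorch sketch of the contrast, assuming a generic pre-trained encoder; all class, function, and parameter names are illustrative, not from any specific library.

```python
import torch
import torch.nn as nn

# --- Feature-based: a frozen pre-trained encoder supplies item features ---
class FeatureBasedRecommender(nn.Module):
    def __init__(self, pretrained_encoder, feat_dim, n_users, emb_dim=64):
        super().__init__()
        self.encoder = pretrained_encoder      # e.g. a text/image encoder
        for p in self.encoder.parameters():    # features only; no updates
            p.requires_grad = False
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.proj = nn.Linear(feat_dim, emb_dim)

    def forward(self, user_ids, item_content):
        item_vec = self.proj(self.encoder(item_content))
        return (self.user_emb(user_ids) * item_vec).sum(-1)  # dot-product score

# --- Fine-tuning: the pre-trained model itself is updated end-to-end ---
def fine_tune(pretrained_model, head, loader, lr=1e-4):
    params = list(pretrained_model.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for users, items, labels in loader:
        logits = head(pretrained_model(users, items))
        loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
```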
Different types of feature-based models:
• 1. Content-Based Recommendation:
• Item content and item-to-item relations help in recommending items to
users. Pre-trained models help extract useful features from the items'
text, images, etc.
• 2. Knowledge Graph-Based Recommendation:
• A knowledge graph contains connections between users, items, and
related entities; it generally covers user profiles, item attributes, and
cross-domain item relations.
• 3. Social Recommendation:
• This type of recommendation needs a social graph, which is built from
relations between users: a user tends to like items that his/her friends
already liked. Here, pre-trained social-network embeddings can improve
the recommendation model, as in the sketch below.
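A minimal sketch of the feature-based idea with social side information, assuming pre-computed embeddings (`social_emb` from a graph-embedding method such as node2vec, `content_emb` from a content encoder) and a learned interaction matrix `W`; one of many possible designs, not a method from the slides:

```python
import numpy as np

def scores_for_user(user_id, social_emb, content_emb, W):
    """Score all items for one user: the user's pre-trained social-graph
    vector is matched against pre-trained item-content vectors through a
    learned matrix W; the embeddings themselves stay frozen."""
    return social_emb[user_id] @ W @ content_emb.T

def recommend(user_id, social_emb, content_emb, W, k=10):
    s = scores_for_user(user_id, social_emb, content_emb, W)
    return np.argsort(-s)[:k]   # indices of the top-k items
```

Only `W` is learned here; keeping the pre-trained embeddings frozen is exactly what makes the approach feature-based.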
Fine-tuning models:
• Shallow Neural Network:
• Shallow neural networks, such as a shallow MLP or recurrent neural
networks, serve as the base models for many knowledge-transfer
experiments.
• Ni et al. (2018) proposed the DUPN model, which introduces richer pre-
training tasks. User representations are captured by an LSTM and an
attention layer. The model is pre-trained with multiple task objectives,
including click-through-rate prediction, price prediction, and shop
prediction, from which it can learn universal user representations.
Though the results were impressive, with accurate predictions, the model
requires a lot of extra information on user preferences for the enhanced
pre-training tasks. A rough sketch of the architecture follows.
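A simplified PyTorch sketch of a DUPN-style multi-task setup; layer sizes and head choices are illustrative, and the real DUPN has more components:

```python
import torch
import torch.nn as nn

class DUPNSketch(nn.Module):
    def __init__(self, n_items, emb=64, hidden=128, n_shops=1000):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        # one head per pre-training objective
        self.ctr_head = nn.Linear(hidden, 1)         # click-through rate
        self.price_head = nn.Linear(hidden, 1)       # price preference
        self.shop_head = nn.Linear(hidden, n_shops)  # shop prediction

    def forward(self, item_seq):
        h, _ = self.lstm(self.item_emb(item_seq))   # (B, T, H)
        w = torch.softmax(self.attn(h), dim=1)      # attention over time
        user_repr = (w * h).sum(dim=1)              # universal user vector
        return (self.ctr_head(user_repr),
                self.price_head(user_repr),
                self.shop_head(user_repr))

# Pre-training loss = weighted sum of the per-task losses, e.g.
# loss = bce(ctr, y_ctr) + mse(price, y_price) + ce(shop, y_shop)
```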
BERT-based Models
• Masked Item Prediction (MIP)
• Given the input sequence, some of the items are randomly masked with
the special token [MASK]; the model must reconstruct the masked items.
The interaction sequence, in chronological order, consists of the items
the user interacted with over time.
• This model uses the user's interaction sequence while considering the
whole bidirectional context for representations, unlike the left-to-right
next-item prediction task commonly used in session-based recommendation
systems. Hence models pre-trained this way provide accurate results; the
masking step is sketched below.
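A minimal sketch of the masking step, mirroring BERT's Cloze setup; the 15% rate and the sentinel ids are conventional choices, not fixed by the slides:

```python
import random

MASK_ID = 0  # reserved id for the special [MASK] token

def mask_sequence(item_seq, mask_prob=0.15):
    """Randomly replace items with [MASK]; the model must reconstruct them."""
    inputs, labels = [], []
    for item in item_seq:
        if random.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(item)      # predict the original item here
        else:
            inputs.append(item)
            labels.append(-100)      # position ignored by the loss
    return inputs, labels

# Example: mask_sequence([12, 7, 33, 5]) might yield
# inputs=[12, 0, 33, 5], labels=[-100, 7, -100, -100]
```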
BERT for Recommendation Systems
• BERT is a multi-layer bidirectional Transformer. Each Transformer layer
comprises two sub-layers: a multi-head attention sub-layer and a
point-wise feed-forward network.
• Multi-Head Attention
• The Transformer uses multi-head self-attention, which jointly attends
to information from different vector sub-spaces. Specifically, this
mechanism first linearly projects the input sequence into sub-spaces and
then produces the output representation with attention functions, as in
the standard formulation below.
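In the standard Transformer formulation (Vaswani et al., 2017), each head applies scaled dot-product attention to linear projections of the input:

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}), \qquad \mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}$$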
• Point-wise Feed-Forward Network
• The point-wise feed-forward network provides the model's non-linearity.
A fully connected feed-forward network is applied individually and
identically to each position. The sub-layer consists of two linear
transformations with a ReLU activation in between:
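In the standard Transformer notation, with $\max(0,\cdot)$ denoting the ReLU:

$$\mathrm{FFN}(x) = \max(0,\; xW_1 + b_1)\,W_2 + b_2$$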
• Chen et al. fine-tune BERT4RS with a content-based click-through
prediction task. The pre-trained BERT produces the user representation
from the user's historical behavior sequence, while the item
representation is produced from the item's content.
• Yang et al. (2019) follow BERT4RS for the next-basket recommendation
task, in which the model is pre-trained with MIP and next-basket
prediction (NBP) tasks. In reality, a user usually buys or browses a
series of items (a basket) at a time.
• Parameter-Efficient Pre-trained Model:
• Fine-tuning a separate model for each task can be computationally
expensive. To solve this issue, Yuan et al. proposed PeterRec, which uses
a grafted neural network, also known as a model patch. After model
patching, the network keeps all pre-trained parameters unchanged; only
the patches are trained per task, as in the sketch below.
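A PyTorch sketch of the model-patch idea: a small residual bottleneck is grafted onto a frozen backbone, and only the patch is trained per downstream task. Module shapes are illustrative and do not reproduce PeterRec's exact architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class ModelPatch(nn.Module):
    """Small residual bottleneck grafted between frozen layers; only the
    patch's parameters are trained for each downstream task."""
    def __init__(self, dim, bottleneck=8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # tiny, task-specific
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.relu(self.down(x)))   # residual graft

def patch_for_task(pretrained_net, dim):
    for p in pretrained_net.parameters():
        p.requires_grad = False       # pre-trained weights stay unchanged
    return nn.Sequential(pretrained_net, ModelPatch(dim))
```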
Experimentation
• The experiments are run on the MovieLens dataset. The Caser and
BERT4Rec models are used to test the usefulness of pre-training in
recommendation. Deep knowledge transfer performs best with the deep
BERT4Rec model, while next-item predictions are better with the Caser
model. When external knowledge is injected, BERT4Rec performs better
than Caser. Thus we can conclude that pre-training does help improve
recommendations where cold start is present.