How Can Pre-Training Help Solve the Cold Start Problem?
Lokesh Vadlamudi
San Jose State University
Why do we need to solve the cold start problem?
• Modern recommendation systems suffer from a major issue: a lack of
interaction data for new users and new items. This is generally known as
the cold start problem (data sparsity).
• The two types of models that leverage pre-training are:
• 1. Feature-based models
• 2. Fine-tuning models
• In feature-based models, feature information for users and items is
collected from side information (knowledge graphs and item content)
using pre-trained models.
• In fine-tuning models, the model is first pre-trained on user-item
interaction data and later fine-tuned to suit a specific recommendation
task. A minimal sketch contrasting the two paradigms follows.
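Below is a rough PyTorch sketch of the contrast, assuming a generic pre-trained encoder; all class, function, and parameter names are illustrative, not from any specific library.

```python
import torch
import torch.nn as nn

# --- Feature-based: a frozen pre-trained encoder supplies item features ---
class FeatureBasedRecommender(nn.Module):
    def __init__(self, pretrained_encoder, feat_dim, n_users, emb_dim=64):
        super().__init__()
        self.encoder = pretrained_encoder      # e.g. a text/image encoder
        for p in self.encoder.parameters():    # features only; no updates
            p.requires_grad = False
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.proj = nn.Linear(feat_dim, emb_dim)

    def forward(self, user_ids, item_content):
        item_vec = self.proj(self.encoder(item_content))
        return (self.user_emb(user_ids) * item_vec).sum(-1)  # dot-product score

# --- Fine-tuning: the pre-trained model itself is updated end-to-end ---
def fine_tune(pretrained_model, head, loader, lr=1e-4):
    params = list(pretrained_model.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for users, items, labels in loader:
        logits = head(pretrained_model(users, items))
        loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
```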
Different types of feature-based models:
• 1. Content-Based Recommendation:
• Item content and item-to-item relations help in recommending items to
users. Pre-trained models help extract useful features from the items'
text, images, etc.
• 2. Knowledge Graph-Based Recommendation:
• A knowledge graph contains connections between users, items, and
related entities; it generally covers user profiles, item attributes, and
cross-domain item relations.
• 3. Social Recommendation:
• This type of recommendation needs a social graph, which is built from
relations between users: a user tends to like items that his/her friends
already liked. Here, pre-trained social-network embeddings can improve
the recommendation model, as in the sketch below.
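A minimal sketch of the feature-based idea with social side information, assuming pre-computed embeddings (`social_emb` from a graph-embedding method such as node2vec, `content_emb` from a content encoder) and a learned interaction matrix `W`; one of many possible designs, not a method from the slides:

```python
import numpy as np

def scores_for_user(user_id, social_emb, content_emb, W):
    """Score all items for one user: the user's pre-trained social-graph
    vector is matched against pre-trained item-content vectors through a
    learned matrix W; the embeddings themselves stay frozen."""
    return social_emb[user_id] @ W @ content_emb.T

def recommend(user_id, social_emb, content_emb, W, k=10):
    s = scores_for_user(user_id, social_emb, content_emb, W)
    return np.argsort(-s)[:k]   # indices of the top-k items
```

Only `W` is learned here; keeping the pre-trained embeddings frozen is exactly what makes the approach feature-based.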
Fine-tuning models:
• Shallow Neural Network:
• Shallow neural networks, such as a shallow MLP or recurrent neural
networks, serve as the base models for many knowledge-transfer
experiments.
• Ni et al. (2018) proposed the DUPN model, which introduces richer pre-
training tasks. User representations are captured by an LSTM and an
attention layer. The model is pre-trained with multiple task objectives,
including click-through-rate prediction, price prediction, and shop
prediction, from which it can learn universal user representations.
Though the results were impressive, with accurate predictions, the model
requires a lot of extra information on user preferences for the enhanced
pre-training tasks. A rough sketch of the architecture follows.
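A simplified PyTorch sketch of a DUPN-style multi-task setup; layer sizes and head choices are illustrative, and the real DUPN has more components:

```python
import torch
import torch.nn as nn

class DUPNSketch(nn.Module):
    def __init__(self, n_items, emb=64, hidden=128, n_shops=1000):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        # one head per pre-training objective
        self.ctr_head = nn.Linear(hidden, 1)         # click-through rate
        self.price_head = nn.Linear(hidden, 1)       # price preference
        self.shop_head = nn.Linear(hidden, n_shops)  # shop prediction

    def forward(self, item_seq):
        h, _ = self.lstm(self.item_emb(item_seq))   # (B, T, H)
        w = torch.softmax(self.attn(h), dim=1)      # attention over time
        user_repr = (w * h).sum(dim=1)              # universal user vector
        return (self.ctr_head(user_repr),
                self.price_head(user_repr),
                self.shop_head(user_repr))

# Pre-training loss = weighted sum of the per-task losses, e.g.
# loss = bce(ctr, y_ctr) + mse(price, y_price) + ce(shop, y_shop)
```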
BERT-based Models
• Masked Item Prediction (MIP)
• Given the input sequence, some of the items are randomly masked with
the special token [MASK]; the model must reconstruct the masked items.
The interaction sequence, in chronological order, consists of the items
the user interacted with over time.
• This model uses the user's interaction sequence while considering the
whole bidirectional context for representations, unlike the left-to-right
next-item prediction task commonly used in session-based recommendation
systems. Hence models pre-trained this way provide accurate results; the
masking step is sketched below.
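A minimal sketch of the masking step, mirroring BERT's Cloze setup; the 15% rate and the sentinel ids are conventional choices, not fixed by the slides:

```python
import random

MASK_ID = 0  # reserved id for the special [MASK] token

def mask_sequence(item_seq, mask_prob=0.15):
    """Randomly replace items with [MASK]; the model must reconstruct them."""
    inputs, labels = [], []
    for item in item_seq:
        if random.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(item)      # predict the original item here
        else:
            inputs.append(item)
            labels.append(-100)      # position ignored by the loss
    return inputs, labels

# Example: mask_sequence([12, 7, 33, 5]) might yield
# inputs=[12, 0, 33, 5], labels=[-100, 7, -100, -100]
```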
BERT for Recommendation Systems
• BERT is a multi-layer bidirectional Transformer. Each Transformer layer
comprises two sub-layers: a multi-head attention sub-layer and a
point-wise feed-forward network.
• Multi-Head Attention
• The Transformer uses multi-head self-attention, which jointly attends
to information from different vector sub-spaces. Specifically, this
mechanism first linearly projects the input sequence into sub-spaces and
then produces the output representation with attention functions, as in
the standard formulation below.
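In the standard Transformer formulation (Vaswani et al., 2017), each head applies scaled dot-product attention to linear projections of the input:

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}), \qquad \mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}$$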
• Point-wise Feed-Forward Network
• The point-wise feed-forward network provides the model's non-linearity.
A fully connected feed-forward network is applied individually and
identically to each position. The sub-layer consists of two linear
transformations with a ReLU activation in between:
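In the standard Transformer notation, with $\max(0,\cdot)$ denoting the ReLU:

$$\mathrm{FFN}(x) = \max(0,\; xW_1 + b_1)\,W_2 + b_2$$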
• Chen et al. fine-tune BERT4RS with a content-based click-through
prediction task. The pre-trained BERT produces the user representation
from the user's historical behavior sequence, while the item
representation is produced from the item's content.
• Yang et al. (2019) follow BERT4RS for the next-basket recommendation
task, in which the model is pre-trained with MIP and next-basket
prediction (NBP) tasks. In reality, a user usually buys or browses a
series of items (a basket) at a time.
• Parameter-Efficient Pre-trained Model:
• Fine-tuning a separate model for each task can be computationally
expensive. To solve this issue, Yuan et al. proposed PeterRec, which uses
a grafted neural network, also known as a model patch. After model
patching, the network keeps all pre-trained parameters unchanged; only
the patches are trained per task, as in the sketch below.
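A PyTorch sketch of the model-patch idea: a small residual bottleneck is grafted onto a frozen backbone, and only the patch is trained per downstream task. Module shapes are illustrative and do not reproduce PeterRec's exact architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class ModelPatch(nn.Module):
    """Small residual bottleneck grafted between frozen layers; only the
    patch's parameters are trained for each downstream task."""
    def __init__(self, dim, bottleneck=8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # tiny, task-specific
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.relu(self.down(x)))   # residual graft

def patch_for_task(pretrained_net, dim):
    for p in pretrained_net.parameters():
        p.requires_grad = False       # pre-trained weights stay unchanged
    return nn.Sequential(pretrained_net, ModelPatch(dim))
```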
Experimentation
• The experiments are run on the MovieLens dataset. The Caser and
BERT4Rec models are used to test the usefulness of pre-training in
recommendation. Deep knowledge transfer performs best with the deep
BERT4Rec model, while next-item predictions are better with the Caser
model. When external knowledge is injected, BERT4Rec performs better
than Caser. Thus we can conclude that pre-training does help improve
recommendations where cold start is present.