“Deep Reinforcement Learning
based Recommendation with
Explicit User-Item Interactions
Modeling”
by Feng Liu∗, Ruiming Tang†, Xutao Li∗, Weinan Zhang‡, Yunming Ye∗, Haokun Chen‡, Huifeng Guo† and Yuzhou Zhang
Presented by Kishor Datta Gupta
Problem
• A Recommender System is a
system capable of predicting a
user’s future preference over a
set of items and recommending
the top items.
• How to build an effective
recommender system?
Recommender
System
Analyzed
Content-based collaborative filtering
Matrix factorization based methods
Logistic regression
Factorization machines and its variants
Deep learning models
Multi-armed bandits
Problem in
Existing systems
• They treat the recommendation
procedure as a static process,
i.e., they assume the underlying
user preference remains static,
and they aim to learn that
preference as precisely as
possible.
• They are trained to maximize
the immediate rewards of
recommendations, ignoring the
long-term benefits that the
recommendations can bring.
Analysis of sequential patterns in user behavior on the
MovieLens and Yahoo! Music datasets
Proposed Solution
• A deep reinforcement learning based recommendation
framework, DRR. Unlike conventional studies, DRR
adopts an “Actor-Critic” structure and treats
recommendation as a sequential decision-making
process, taking both the immediate and long-term
rewards into consideration.
Deep RL based Recommendation (DRR) Framework
Deep RL based Recommendation (DRR) Framework
The Actor network:
The user state, denoted by the embeddings of
the user’s n latest positively interacted items, is
taken as the input. The embeddings are
fed into a state representation module (introduced
in detail later) to produce a
summarized representation of the user.
The top-ranked item (w.r.t. the ranking scores) is
recommended to the user.
The ε-greedy exploration technique is used.
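The Actor's output can be viewed as an action vector in the item-embedding space. A minimal numpy sketch of ranking by inner product with ε-greedy exploration (the sizes and random vectors are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 100 candidate items, 16-dim embeddings.
item_embeddings = rng.normal(size=(100, 16))
action = rng.normal(size=16)  # stand-in for the Actor output a = pi_theta(s)

def recommend(action, item_embeddings, epsilon=0.1):
    """Score items by inner product with the action vector; with
    probability epsilon, explore a uniformly random item instead."""
    if rng.random() < epsilon:
        return int(rng.integers(len(item_embeddings)))
    scores = item_embeddings @ action
    return int(np.argmax(scores))

top_item = recommend(action, item_embeddings, epsilon=0.0)
```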
Deep RL based Recommendation (DRR) Framework
The Critic network:
According to the Q-value, the
parameters of the Actor
network are updated in the
direction of improving the
performance of action a, based
on the deterministic policy
gradient theorem.
The Critic network is updated
by the temporal-difference
learning approach, i.e., by
minimizing the mean
squared error.
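The temporal-difference update the slide refers to can be sketched in numpy; the Q-values and rewards below are toy numbers, not outputs of the paper's networks:

```python
import numpy as np

q_sa = np.array([0.5, 0.2, 0.8])    # Critic's Q(s_t, a_t) for a minibatch
q_next = np.array([0.6, 0.1, 0.4])  # target network's Q'(s_{t+1}, pi'(s_{t+1}))
rewards = np.array([1.0, 0.0, 0.5])
gamma = 0.9                          # discount factor

# TD target y_t = r_t + gamma * Q'(s_{t+1}, pi'(s_{t+1})) and the
# mean squared error the Critic minimizes.
td_target = rewards + gamma * q_next
mse = float(np.mean((td_target - q_sa) ** 2))
```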
Deep RL based Recommendation (DRR) Framework
State Representation Module:
Modeling the feature
interactions explicitly can boost
performance.
DRR-p
DRR-u
DRR-Ave
Deep RL based Recommendation (DRR) Framework
DRR-p:
A product-based neural network for the state representation module; it
utilizes a product operator to capture the pairwise local dependency between
items.
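A minimal numpy sketch of the idea (the paper additionally learns weights for these products; plain element-wise products and made-up sizes are shown):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 8))  # embeddings of the n = 4 latest positive items

# The element-wise product of every item pair captures pairwise local
# dependencies; concatenate them with the raw item embeddings.
pair_products = [H[i] * H[j] for i, j in combinations(range(len(H)), 2)]
state = np.concatenate([H.reshape(-1)] + pair_products)
```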
Deep RL based Recommendation (DRR) Framework
DRR-u:
In DRR-u, the user embedding is also incorporated. In addition to
the local dependency between items, the pairwise user-item interactions are
also taken into account.
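A sketch of the extra user-item term (again with illustrative sizes; any learned weights from the paper are omitted here):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.normal(size=8)       # user embedding
H = rng.normal(size=(4, 8))  # embeddings of the n = 4 latest positive items

# Element-wise user-item products add pairwise user-item interactions
# on top of the item-item dependencies used by DRR-p.
user_item = np.concatenate([u * h for h in H])
```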
Deep RL based Recommendation (DRR) Framework
DRR-Ave:
The embeddings of the items are first transformed by a weighted average pooling
layer. Then, the resulting vector is leveraged to model the interactions with the
input user. Finally, the embedding of the user, the interaction vector, and the
average pooling result of the items are concatenated into a vector to form the state
representation.
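The three steps can be sketched directly in numpy (uniform pooling weights are assumed here; the paper learns them):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.normal(size=8)       # user embedding
H = rng.normal(size=(4, 8))  # item embeddings
w = np.full(4, 0.25)         # pooling weights (assumed uniform)

ave = w @ H            # weighted average pooling over the items
interaction = u * ave  # element-wise user-item interaction vector
state = np.concatenate([u, interaction, ave])
```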
Deep RL based Recommendation (DRR) Training
DRR utilizes the users’ interaction history with the recommender agent as training data.
During the procedure, the recommender takes an action a_t following the current recommendation policy πθ(s_t) after observing the
user (environment) state s_t, then obtains the feedback (reward) r_t from the user, and the user state is updated to s_{t+1}.
According to the feedback, the recommender updates its recommendation policy.
The training procedure mainly includes two phases, i.e., transition generation and model updating.
• In the first stage, the recommender observes the current state s_t, calculated by the proposed state representation module, then generates an
action a_t = πθ(s_t) according to the current policy πθ with ε-greedy exploration, and recommends an item i_t according to the action. Subsequently, the reward r_t is
calculated based on the user’s feedback to the recommended item i_t, and the user state is updated. Finally, the recommender agent stores the
transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer D.
• In the second stage, model updating, the recommender samples a minibatch of N transitions with the widely used prioritized experience replay sampling
technique. Then, the recommender updates the parameters of the Actor and Critic networks. Finally, the recommender updates the target networks’
parameters with the soft replace strategy.
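The two phases can be sketched as a loop with a replay buffer. The policy and user simulator below are stand-ins (the paper's are learned networks), and uniform minibatch sampling replaces prioritized experience replay for brevity:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(4)
replay_buffer = deque(maxlen=10_000)

def policy(state):            # stand-in for pi_theta(s_t)
    return rng.normal(size=state.shape)

def env_step(state, action):  # stand-in for the user / simulator
    reward = float(np.tanh(state @ action))
    next_state = 0.9 * state + 0.1 * action
    return reward, next_state

# Phase 1: transition generation over one episode of T = 20 steps.
state = rng.normal(size=8)
for t in range(20):
    action = policy(state)
    reward, next_state = env_step(state, action)
    replay_buffer.append((state, action, reward, next_state))
    state = next_state

# Phase 2: model updating would sample a minibatch of N transitions
# and apply the Actor/Critic gradient steps and soft target updates.
idx = rng.choice(len(replay_buffer), size=8, replace=False)
minibatch = [replay_buffer[int(i)] for i in idx]
```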
Experiments
Dataset
•MovieLens 100K
•MovieLens 1M
•Yahoo! Music
•Jester
Comparison Method
•Popularity recommends the most popular item, i.e., the item with the highest average rating or the largest number
of positive ratings among the currently available items, to the users at each timestep.
•PMF performs a matrix decomposition similar to SVD, but takes into account only the non-zero elements.
•SVD++ mixes the strengths of the latent model and the neighborhood model.
•DRR-n simply utilizes the concatenation of the item embeddings to represent the user state, which is widely used in previous studies.
Reward Function
•At timestep t, the recommender agent recommends an item j to user i (denoted as action a in state s), and the rating rate_{i,j}
comes from the interaction logs if user i actually rated item j, or from a value predicted by the simulator otherwise. Therefore,
the reward function is defined as: R(s, a) = rate_{i,j} / 10
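As a trivially checkable rendering of the definition above (rate_{i,j} is assumed to lie on a 0-10 scale, so rewards fall in [0, 1]):

```python
def reward(rate_ij):
    # R(s, a) = rate_{i,j} / 10, per the slide's reward definition.
    return rate_ij / 10
```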
Results
Normalized Discounted Cumulative Gain (NDCG)
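For reference, NDCG@k can be computed as below (the standard log2-discount formulation; this is not code from the paper):

```python
import numpy as np

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the list as ranked, divided by the DCG of the
    same relevances sorted into the ideal (descending) order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(rel @ discounts)
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float(ideal @ discounts)
    return dcg / idcg if idcg > 0 else 0.0
```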
Claims
A DRR framework that takes both the
immediate and long-term rewards into
account is proposed, and three
instantiation structures are designed,
which can explicitly model the
interactions between users and items.
Extensive experiments on four real-world
datasets demonstrate the superiority of
the proposed DRR method over state-of-
the-art competitors.
My thoughts
How a neural network converts the user-
item relation into a state is not clear
to me.
The authors didn’t consider the cold-start
problem.
Questions?
