“Deep Reinforcement Learning
based Recommendation with
Explicit User-Item Interactions
Modeling”
by Feng Liu∗, Ruiming Tang†, Xutao Li∗, Weinan Zhang‡, Yunming Ye∗, Haokun Chen‡, Huifeng Guo† and Yuzhou Zhang
Presented by Kishor Datta Gupta
Problem
• A Recommender System is a
system capable of predicting a
user’s future preference over a
set of items and recommending
the top items.
• How to build an effective
recommender system?
Recommender
System
Analyzed
Content-based collaborative filtering
Matrix factorization based methods
Logistic regression
Factorization machines and its variants
Deep learning models
Multi-armed bandits
Problem in
Existing systems
• They treat the recommendation
procedure as a static process,
i.e., they assume the underlying
user preference remains static,
and they aim to learn that
preference as precisely as
possible.
• They are trained to maximize
the immediate rewards of
recommendations, ignoring the
long-term benefits that the
recommendations can bring.
Analysis of sequential patterns in user behavior on the
MovieLens and Yahoo! Music datasets
Proposed Solution
• A deep reinforcement learning based recommendation
framework, DRR. Unlike conventional studies, DRR
adopts an “Actor-Critic” structure and treats
recommendation as a sequential decision-making
process, taking both the immediate and long-term
rewards into consideration.
Deep RL based Recommendation (DRR) Framework
Deep RL based Recommendation (DRR) Framework
The Actor network:
The user state, denoted by the embeddings of
the user’s n latest positively interacted items, is
taken as the input. The embeddings are
fed into a state representation module (introduced
in detail later) to produce a
summarized representation of the user.
The top-ranked item (w.r.t. the ranking scores) is
recommended to the user.
The ε-greedy exploration technique is used.
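The Actor's output can be viewed as an action vector in the item-embedding space. A minimal numpy sketch of ranking by inner product with ε-greedy exploration (the sizes and random vectors are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 100 candidate items, 16-dim embeddings.
item_embeddings = rng.normal(size=(100, 16))
action = rng.normal(size=16)  # stand-in for the Actor output a = pi_theta(s)

def recommend(action, item_embeddings, epsilon=0.1):
    """Score items by inner product with the action vector; with
    probability epsilon, explore a uniformly random item instead."""
    if rng.random() < epsilon:
        return int(rng.integers(len(item_embeddings)))
    scores = item_embeddings @ action
    return int(np.argmax(scores))

top_item = recommend(action, item_embeddings, epsilon=0.0)
```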
Deep RL based Recommendation (DRR) Framework
The Critic network:
According to the Q-value, the
parameters of the Actor
network are updated in the
direction of improving the
performance of action a, based
on the deterministic policy
gradient theorem.
The Critic network is updated
by the temporal-difference
learning approach, i.e., by
minimizing the mean
squared error.
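The temporal-difference update the slide refers to can be sketched in numpy; the Q-values and rewards below are toy numbers, not outputs of the paper's networks:

```python
import numpy as np

q_sa = np.array([0.5, 0.2, 0.8])    # Critic's Q(s_t, a_t) for a minibatch
q_next = np.array([0.6, 0.1, 0.4])  # target network's Q'(s_{t+1}, pi'(s_{t+1}))
rewards = np.array([1.0, 0.0, 0.5])
gamma = 0.9                          # discount factor

# TD target y_t = r_t + gamma * Q'(s_{t+1}, pi'(s_{t+1})) and the
# mean squared error the Critic minimizes.
td_target = rewards + gamma * q_next
mse = float(np.mean((td_target - q_sa) ** 2))
```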
Deep RL based Recommendation (DRR) Framework
State Representation Module:
Modeling the feature
interactions explicitly can boost
performance.
DRR-p
DRR-u
DRR-Ave
Deep RL based Recommendation (DRR) Framework
DRR-p:
A product-based neural network for the state representation module; it
utilizes a product operator to capture the pairwise local dependency between
items.
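A minimal numpy sketch of the idea (the paper additionally learns weights for these products; plain element-wise products and made-up sizes are shown):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 8))  # embeddings of the n = 4 latest positive items

# The element-wise product of every item pair captures pairwise local
# dependencies; concatenate them with the raw item embeddings.
pair_products = [H[i] * H[j] for i, j in combinations(range(len(H)), 2)]
state = np.concatenate([H.reshape(-1)] + pair_products)
```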
Deep RL based Recommendation (DRR) Framework
DRR-u:
In DRR-u, the user embedding is also incorporated. In addition to
the local dependency between items, the pairwise user-item interactions are
also taken into account.
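A sketch of the extra user-item term (again with illustrative sizes; any learned weights from the paper are omitted here):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.normal(size=8)       # user embedding
H = rng.normal(size=(4, 8))  # embeddings of the n = 4 latest positive items

# Element-wise user-item products add pairwise user-item interactions
# on top of the item-item dependencies used by DRR-p.
user_item = np.concatenate([u * h for h in H])
```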
Deep RL based Recommendation (DRR) Framework
DRR-Ave:
The embeddings of the items are first transformed by a weighted average pooling
layer. Then, the resulting vector is leveraged to model the interactions with the
input user. Finally, the embedding of the user, the interaction vector, and the
average pooling result of the items are concatenated into a vector to form the state
representation.
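The three steps can be sketched directly in numpy (uniform pooling weights are assumed here; the paper learns them):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.normal(size=8)       # user embedding
H = rng.normal(size=(4, 8))  # item embeddings
w = np.full(4, 0.25)         # pooling weights (assumed uniform)

ave = w @ H            # weighted average pooling over the items
interaction = u * ave  # element-wise user-item interaction vector
state = np.concatenate([u, interaction, ave])
```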
Deep RL based Recommendation (DRR) Training
DRR utilizes the users’ interaction history with the recommender agent as training data.
During the procedure, the recommender takes an action a_t following the current recommendation policy πθ(s_t) after observing the
user (environment) state s_t, then obtains the feedback (reward) r_t from the user, and the user state is updated to s_{t+1}.
According to the feedback, the recommender updates its recommendation policy.
The training procedure mainly includes two phases, i.e., transition generation and model updating.
• In the first stage, the recommender observes the current state s_t, calculated by the proposed state representation module, then generates an
action a_t = πθ(s_t) according to the current policy πθ with ε-greedy exploration, and recommends an item i_t according to the action. Subsequently, the reward r_t is
calculated based on the user’s feedback to the recommended item i_t, and the user state is updated. Finally, the recommender agent stores the
transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer D.
• In the second stage, model updating, the recommender samples a minibatch of N transitions with the widely used prioritized experience replay sampling
technique. Then, the recommender updates the parameters of the Actor and Critic networks. Finally, the recommender updates the target networks’
parameters with the soft replace strategy.
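The two phases can be sketched as a loop with a replay buffer. The policy and user simulator below are stand-ins (the paper's are learned networks), and uniform minibatch sampling replaces prioritized experience replay for brevity:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(4)
replay_buffer = deque(maxlen=10_000)

def policy(state):            # stand-in for pi_theta(s_t)
    return rng.normal(size=state.shape)

def env_step(state, action):  # stand-in for the user / simulator
    reward = float(np.tanh(state @ action))
    next_state = 0.9 * state + 0.1 * action
    return reward, next_state

# Phase 1: transition generation over one episode of T = 20 steps.
state = rng.normal(size=8)
for t in range(20):
    action = policy(state)
    reward, next_state = env_step(state, action)
    replay_buffer.append((state, action, reward, next_state))
    state = next_state

# Phase 2: model updating would sample a minibatch of N transitions
# and apply the Actor/Critic gradient steps and soft target updates.
idx = rng.choice(len(replay_buffer), size=8, replace=False)
minibatch = [replay_buffer[int(i)] for i in idx]
```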
Experiments
Dataset
•MovieLens 100K
•MovieLens 1M
•Yahoo! Music
•Jester
Comparison Method
•Popularity recommends the most popular item, i.e., the item with the highest average rating or the largest number
of positive ratings among the currently available items, to the users at each timestep.
•PMF performs a matrix decomposition similar to SVD, but takes into account only the non-zero elements.
•SVD++ mixes the strengths of the latent model and the neighborhood model.
•DRR-n simply utilizes the concatenation of the item embeddings to represent the user state, which is widely used in previous studies.
Reward Function
•At timestep t, the recommender agent recommends an item j to user i (denoted as action a in state s), and the rating rate_{i,j}
comes from the interaction logs if user i actually rated item j, or from a value predicted by the simulator otherwise. Therefore,
the reward function is defined as: R(s, a) = rate_{i,j} / 10
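As a trivially checkable rendering of the definition above (rate_{i,j} is assumed to lie on a 0-10 scale, so rewards fall in [0, 1]):

```python
def reward(rate_ij):
    # R(s, a) = rate_{i,j} / 10, per the slide's reward definition.
    return rate_ij / 10
```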
Results
Normalized Discounted Cumulative Gain (NDCG)
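For reference, NDCG@k can be computed as below (the standard log2-discount formulation; this is not code from the paper):

```python
import numpy as np

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the list as ranked, divided by the DCG of the
    same relevances sorted into the ideal (descending) order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(rel @ discounts)
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float(ideal @ discounts)
    return dcg / idcg if idcg > 0 else 0.0
```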
Claims
A DRR framework that takes both the
immediate and long-term rewards into
account is proposed, and three
instantiation structures are designed,
which can explicitly model the
interactions between users and items.
Extensive experiments on four real-world
datasets demonstrate the superiority of
the proposed DRR method over state-of-
the-art competitors.
My thoughts
How a neural network converts the user-
item relation into a state is not clear
to me.
The authors didn’t consider the cold-start
problem.
Questions?
