GIUSEPPE SPILLO, CATALDO MUSTO, MARCO DE GEMMIS, PASQUALE LOPS, GIOVANNI
SEMERARO
UNIVERSITA’ DEGLI STUDI DI BARI «ALDO MORO»
SWAP RESEARCH GROUP – HTTPS://WWW.DI.UNIBA.IT/~SWAP
EXPLOITING DISTRIBUTIONAL SEMANTICS MODELS FOR
NATURAL LANGUAGE CONTEXT-AWARE
JUSTIFICATIONS FOR RECOMMENDER SYSTEMS
CLIC-IT 2020
Seventh Italian Conference on Computational Linguistics
Bologna, 1-3 March 2021 (online conference)
linkedin.com/giuseppe-spillo-89542b1b6/
@spillo_giuseppe
INTRODUCTION
 In this project we present a methodology for generating context-aware natural language justifications that support the suggestions produced by a recommendation algorithm, using distributional semantics models.
«I suggest you Lost in Translation»
WHY CONTEXT-AWARE JUSTIFICATIONS?
 Intuition: just like the selection of the
most suitable item is influenced by the
contexts of usage, a justification that
supports a recommendation should
vary depending on the different
contextual situations in which the item
will be consumed.
Context: company (alone, friends, couple)
AN EXAMPLE OF CONTEXT-AWARE
JUSTIFICATION
I recommend you Lost in Translation because people who liked this movie think that…
 Context «low attention»: …it's suitable if you don't want to be focused on it because it is simple in plot and direction.
 Context «couple»: …it's perfect to spend an evening in sweet company because this entirely unexpected ending is one of the most romantic and hopeful moments you will ever see on screen.
DISTRIBUTIONAL SEMANTICS MODELS
 We designed a natural language processing
pipeline that exploits distributional semantics
models to build a term-context matrix that
encodes the importance of terms and concepts
in each contextual dimension.
 In this way we can obtain a vector space
representation of each context, which is used to
identify the most suitable pieces of information
to be combined in a justification.
JUSTIFICATIONS, NOT EXPLANATIONS
 We are not referring to explanations, since these are post-hoc justifications: a recommender system suggests an item, and this framework generates a justification that is independent of the recommendation mechanism but adapts to the user's context of consumption.
 In this way we can justify a recommendation even for items that do not have a minimum number of ratings.
THE PIPELINE OF THE FRAMEWORK
 CONTEXT LEARNER: it uses DSMs to learn a vector space representation of each context.
 RANKER: it implements a scoring mechanism to identify the most suitable review excerpts that can
support the recommendation.
 GENERATOR: it puts together previously retrieved pieces of information and provides the user with a
context-aware justification of the item.
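Before detailing each module, here is a minimal sketch of how the three stages could compose end to end; every name and signature below is our illustrative assumption, not the authors' code.

```python
# Illustrative composition of the three modules (hypothetical names/signatures).

def generate_justification(item_reviews, user_contexts,
                           context_learner, ranker, generator):
    """End-to-end flow: context vectors -> top sentences -> justification."""
    # CONTEXT LEARNER: one vector per active contextual setting (built offline).
    context_vectors = {c: context_learner.vector(c) for c in user_contexts}
    # RANKER: the most relevant review sentence for each active context.
    best_sentences = {c: ranker.top_sentence(item_reviews, v)
                      for c, v in context_vectors.items()}
    # GENERATOR: merge the selected sentences into one natural language text.
    return generator.render(best_sentences)
```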
[Pipeline diagram]
THE CONTEXT LEARNER
 As said before, the idea of this module is to construct a context learner exploiting Distributional Semantics Models.
 Given a set of terms T = {t_1, …, t_n} and a set of contexts C = {c_1, …, c_k}, this module constructs a term-context matrix C_{n,k} that encodes the importance of a term t_i for the context c_j.
 How can we construct this matrix?
MANUAL ANNOTATION
 Starting from a set R of user reviews, we split each review r ∈ R into sentences, obtaining a set of sentences S.
 Then, given this set of sentences S, we manually annotated a subset of them to obtain a set S′ = {s_1, s_2, …, s_m}, where each s_i is labelled with one or more contextual settings, based on the concepts mentioned in the sentence. Each s_i can be annotated with more than one context.

Sent  c1  c2  c3  c4  c5
s1    ✓           ✓
s2        ✓           ✓
s3    ✓           ✓
s4        ✓   ✓       ✓
MANUAL ANNOTATION
 For example, the sentence ‘very romantic movie’ can be annotated with the context company=couple.
 The intuition is that a sentence expressing the concept of “romantic” can be very useful to support the recommendation of an item for a user who expresses her desire to spend time with the partner.
SENTENCE-CONTEXT MATRIX
 In this way we built the sentence-context matrix A_{m,k}, in which each A_{s_i,c_j} is equal to 1 if the sentence s_i is annotated with the context c_j (i.e., the concepts mentioned in that sentence are relevant for that particular context), and 0 otherwise.

Sent  c1  c2  c3  c4  c5
s1    1   0   0   1   0
s2    0   1   0   0   1
s3    1   0   0   1   0
s4    0   1   1   0   1
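As a concrete illustration, a minimal sketch of how the annotation table above could be turned into the binary matrix A (NumPy assumed; all variable names are ours):

```python
import numpy as np

# Toy annotations from the table above: sentence index -> annotated contexts.
annotations = {0: {"c1", "c4"}, 1: {"c2", "c5"},
               2: {"c1", "c4"}, 3: {"c2", "c3", "c5"}}
contexts = ["c1", "c2", "c3", "c4", "c5"]

# A[i, j] = 1 if sentence s_i is annotated with context c_j, 0 otherwise.
A = np.zeros((len(annotations), len(contexts)))
for i, labels in annotations.items():
    for j, name in enumerate(contexts):
        A[i, j] = 1.0 if name in labels else 0.0
```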
TERM-SENTENCE MATRIX
 The next step is to split all the annotated sentences s ∈ S′ into terms t_i ∈ T = {t_1, t_2, …, t_n}, to identify the specific concepts expressed in each annotated sentence and build a term-sentence matrix V_{n,m}.
 Each value of this matrix contains the TF-IDF of the term t_i in the sentence s_j; the IDF values are computed on all the annotated sentences.
 We also used NLP techniques to reduce the size of the vocabulary of terms, including tokenization, lemmatization and POS-tag filtering, using the CoreNLP framework.
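A minimal sketch of how V could be built with scikit-learn's TfidfVectorizer (a library choice we assume; the slides do not name one). The sentences are toy stand-ins for the annotated, already preprocessed set S′:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the annotated (and already preprocessed) sentences S'.
sentences = [
    "very romantic movie",
    "engaging plot",
    "perfect for a relaxing night",
    "happy and fun story",
]

# The vectorizer yields a sentences x terms matrix; IDF is computed over all
# annotated sentences, as in the slides. Transposing gives V (terms x sentences).
vectorizer = TfidfVectorizer()
V = vectorizer.fit_transform(sentences).T.toarray()
terms = vectorizer.get_feature_names_out()
```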
NLP TECHNIQUES
Example: the sentence ‘These scenes are incredibly romantic’ is first tokenized into ‘These’, ‘scenes’, ‘are’, ‘incredibly’, ‘romantic’, then lemmatized, and finally filtered by POS tag.
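The slides use the CoreNLP framework; as a sketch of the same preprocessing in Python, here is an equivalent pass with spaCy (our substitution, not the authors' tooling):

```python
import spacy

CONTENT_POS = {"NOUN", "ADJ", "VERB"}  # POS tags kept after filtering

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def preprocess(sentence):
    """Tokenization, lemmatization and POS-tag filtering in one pass."""
    return [tok.lemma_.lower() for tok in nlp(sentence)
            if tok.pos_ in CONTENT_POS and not tok.is_stop]

print(preprocess("These scenes are incredibly romantic"))
# e.g. ['scene', 'romantic'] (exact output depends on the model)
```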
TERM-CONTEXT MATRIX
 Once the term-sentence matrix V_{n,m} and the sentence-context matrix A_{m,k} have been obtained, the term-context matrix C_{n,k} can be computed by simply multiplying them:

C_{n,k} = V_{n,m} \times A_{m,k} =
\begin{pmatrix} v_{1,1} & \cdots & v_{1,m} \\ \vdots & \ddots & \vdots \\ v_{n,1} & \cdots & v_{n,m} \end{pmatrix}
\times
\begin{pmatrix} a_{1,1} & \cdots & a_{1,k} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,k} \end{pmatrix}
=
\begin{pmatrix} c_{1,1} & \cdots & c_{1,k} \\ \vdots & \ddots & \vdots \\ c_{n,1} & \cdots & c_{n,k} \end{pmatrix}

Term      Good mood  High attention  Alone  Couple
happy     3.4        1.5             1.7    2.4
fun       2.8        1.3             2.1    2.8
focusing  1.0        3.9             2.6    0.4
romantic  1.1        0.7             0.4    4.7
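Continuing the NumPy sketches above (and assuming the toy V and A line up, with four sentences in both), the multiplication itself is one line:

```python
# C[i, j] accumulates the TF-IDF weight of term i over all sentences
# annotated with context j; shapes: (n, m) @ (m, 5) -> (n, 5).
C = V @ A
```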
CONTEXT VECTORS
 We can obtain two different outputs.
 First, we can extract column vectors from matrix C. Each column vector c_j is the vector space representation of the context c_j, obtained by exploiting DSMs: in the table above, for example, the «good mood» and «couple» columns are the vectors of those two contexts.
LEXICON GENERATED
 Second, we can obtain the lexicon of a contextual dimension by extracting the first k lemmas with the highest TF-IDF scores for each column. Sorting the «good mood» and «couple» columns of the example by score:
Good mood: happy, fun, focusing, romantic
Couple: romantic, fun, happy, focusing
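A minimal sketch of this top-k extraction, reusing the C and terms from the sketches above (function name ours):

```python
import numpy as np

def context_lexicon(C, terms, j, k=10):
    """Top-k lemmas of context j: sort column j of C by descending score."""
    top = np.argsort(C[:, j])[::-1][:k]
    return [terms[i] for i in top]

# e.g. context_lexicon(C, terms, j=0) ranks the lemmas of the first context
```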
THE PIPELINE OF THE FRAMEWORK
[Pipeline diagram]
THE RANKER
 Given the set of contextual vectors generated by the context learner, a recommended item with its reviews, and the current contextual situation of the user, the aim of the ranker is to choose from the user reviews the most relevant sentences for the user's current contexts; these sentences will then be included in the justification.
Example: among «Very romantic movie», «Engaging plot» and «Perfect for a relaxing night», the ranker selects «Very romantic movie» for the context company=couple.
SENTENCE REPRESENTATION
 To establish the relevance of a sentence for a context, we used the cosine similarity between the vector representation of the context (given by the context learner) and the vector representation of the sentence, which is built at this step.
 We chose only sentences with a positive sentiment, because the justification has to convince the user to consume the item.
 Since each contextual vector has an n-dimensional representation, the ranker has to build the same n-dimensional representation for the sentence.
SENTENCE VECTOR
 To build this representation, the reviews of an item are first split into sentences.
 Then, these sentences are filtered by sentiment (only positive ones are kept), tokenized and lemmatized (as done before).
 Finally, the vector s_i is instantiated in the same space defined by the term-context matrix C_{n,k}.
 In particular, s_i = (v_{t_1}, v_{t_2}, …, v_{t_n})^T, where each v_{t_j} is the TF-IDF score of the term t_j (TF counts how many times t_j appears in s_i, while IDF is computed in the canonical way).
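A minimal sketch of this step, assuming the term vocabulary `terms` from before and a precomputed IDF vector over the annotated sentences:

```python
import numpy as np

def sentence_vector(lemmas, terms, idf):
    """TF-IDF vector of one sentence in the term space of C (n terms).
    `idf` is the precomputed IDF array over the annotated sentences."""
    index = {t: i for i, t in enumerate(terms)}
    v = np.zeros(len(terms))
    for lemma in lemmas:            # TF: count occurrences of each known term
        if lemma in index:
            v[index[lemma]] += 1.0
    return v * idf                  # elementwise IDF weighting
```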
COSINE SIMILARITY
 At this point we have both the contextual vectors and the sentence vectors, so it is possible to compute the cosine similarity between them.
 The sentence with the highest cosine similarity score is taken as the most relevant sentence for that context.
 This is performed for each context of consumption of the user: one sentence is chosen for each of them.
 Let's see a practical example.
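A minimal sketch of the selection step (names ours):

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def top_sentence(sentences, vectors, context_vector):
    """Pick the review sentence most similar to the given context vector."""
    scores = [cosine(v, context_vector) for v in vectors]
    return sentences[int(np.argmax(scores))]
```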
VISUAL EXAMPLE
 Let us suppose that we instantiated two different sentence vectors, related to the same item:
 s1 = ‘the plot is really interesting and engaging’
 s2 = ‘wonderful love story’
 Let us also suppose that the user's consumption contexts are:
 c1 = ‘attention:high’
 c2 = ‘company:couple’
 The closest sentence vector to the context vector c1 is s1, so s1 will be chosen for that context.
 The closest sentence vector to the context vector c2 is s2, so s2 will be chosen for that context.
THE PIPELINE OF THE FRAMEWORK
[Pipeline diagram]
THE GENERATOR
 The goal of the generator module is to put together the chosen sentences into a single natural language justification to be presented to the user.
 The generated justifications combine a fixed part, which is common to all justifications, with a dynamic part that depends on the outputs returned by the previous steps.
 The top-1 sentence for each current contextual dimension is selected, and the different sentences are merged by exploiting simple connectives, such as adverbs and conjunctions.
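A minimal sketch of such a template, under the caveat that the fixed opening and the connectives below are hypothetical; the slides only state that a fixed part is merged with the selected sentences through simple connectives:

```python
# Hypothetical connectives and opening; the slides do not list the actual ones.
CONNECTIVES = ["moreover", "also", "in addition"]

def render(item, best_sentences):
    """One justification from the top-1 sentence of each active context.
    Assumes at least one active context."""
    parts = list(best_sentences.values())
    body = parts[0]
    for i, part in enumerate(parts[1:]):
        body += f"; {CONNECTIVES[i % len(CONNECTIVES)]}, {part}"
    return (f"I recommend you {item} because people who liked "
            f"this movie think that {body}.")
```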
THE GENERATOR
 Following the previous example, and supposing that the recommended item is Lost in Translation, a real justification provided by this framework would combine the sentences selected for the ‘attention:high’ and ‘company:couple’ contexts into a single text.
FILMANDO
 We tested this methodology in the movie domain.
 We defined a set of consumption contexts for movies and, given a set of movies with their reviews, we applied the pipeline and built a web app integrating the results.
CONTEXTS OF CONSUMPTION CHOSEN
 We defined 3 different contextual dimensions, each of which can assume different values:
 Attention level: high, low
 Mood: good mood, bad mood
 Company: alone, friends, couple
EXPERIMENT SPECIFICATIONS
 For both context learners (the one based on matrix multiplication and the one based on PMI) we generated 3 kinds of matrix configurations:
 the first based on unigrams;
 the second based on bigrams;
 the third based on the combination of unigrams and bigrams.
 The intuition behind this choice is that two words taken alone may each have one meaning, but considered together they can mean something else (e.g., ‘love’ and ‘story’ vs. ‘love story’).
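The slides do not specify how the PMI-based context learner weights the matrix; as a rough sketch under that caveat, a positive-PMI weighting of term-context co-occurrence counts could look like this:

```python
import numpy as np

def ppmi_matrix(counts, eps=1e-12):
    """Positive PMI weighting of a term-context co-occurrence matrix.
    counts[i, j]: occurrences of term i in sentences annotated with context j."""
    p_tc = counts / counts.sum()
    p_t = p_tc.sum(axis=1, keepdims=True)    # term marginals
    p_c = p_tc.sum(axis=0, keepdims=True)    # context marginals
    pmi = np.log((p_tc + eps) / (p_t @ p_c + eps))
    return np.maximum(pmi, 0.0)              # keep only positive associations
```

For the n-gram configurations, passing ngram_range=(1, 2) to the TfidfVectorizer used earlier produces the combined unigram + bigram vocabulary.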
RESEARCH QUESTIONS
 1) How effective are DSM-based justifications, when varying different combinations of the parameters?
 2) Do DSM-based justification algorithms obtain performance at least comparable to a static justification algorithm?
 3) Do context-aware justifications obtain better performance than non-contextual justifications?
FILMANDO
[Screenshots of the web app: the welcome screen and the context selection; the justification provided, with user ratings of transparency, persuasion, engagement and trust; the comparison with baselines, where users express a style preference on the same four dimensions.]
1) EFFECTIVENESS OF THE MODEL
DSMs effectiveness:

Dimension | Question | Unigrams | Bigrams | Unigrams + Bigrams
Transparency | «I understood why the movie was suggested to me» | 3.38 | 3.81 | 3.64
Persuasion | «The justification made the recommendation more convincing» | 3.56 | 3.62 | 3.54
Engagement | «The justification allowed me to discover more information about the movie» | 3.54 | 3.72 | 3.70
Trust | «The justification increased my trust in recommender systems» | 3.44 | 3.66 | 3.61

 Bigrams behave better.
2) VS STATIC CONTEXTUAL BASELINE
Preferences: DSMs vs static contextual baseline

Dimension | CA + DSMs | Baseline | Indifferent
Transparency | 53.28% | 38.10% | 19.52%
Persuasion | 24.10% | 36.33% | 19.57%
Engagement | 49.31% | 39.23% | 11.56%
Trust | 42.86% | 39.31% | 17.83%

 Improvements over a contextual baseline based on a static lexicon, except for persuasion, where the baseline prevails.
3) VS NON-CONTEXTUAL DISTRIBUTIONAL BASELINE
Preferences: DSMs vs distributional non-contextual baseline

Dimension | CA + DSMs | Baseline | Indifferent
Transparency | 52.38% | 38.10% | 19.32%
Persuasion | 54.10% | 36.33% | 19.57%
Engagement | 49.31% | 39.23% | 11.56%
Trust | 42.86% | 39.31% | 17.83%

 Great improvements over a non-contextual baseline based on DSMs.
RECAP
 The model seems to be appreciated by users.
 A representation based on bigrams better captures the semantics of the different contexts of consumption.
 Users tend to prefer context-aware justifications, and DSMs allow us to build a more effective representation.
THANK YOU FOR YOUR ATTENTION!
Similar to Exploiting Distributional Semantics Models for Natural Language Context-aware Justifications for Recommender Systems (20):
Exploiting Distributional Semantics Models for Natural Language Context-aware…
[WI 2017] Context Suggestion: Empirical Evaluations vs User Studies
Context Aware User Modeling for Recommendation
[UMAP2013]Tutorial on Context-Aware User Modeling for Recommendation by Bamsh…
AUTOMATIC ANALYSIS OF DOCUMENT SENTIMENT
Tutorial: Context-awareness In Information Retrieval and Recommender Systems
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom…
Natural Language Justifications for Recommender Systems Exploiting Text Summa…
[UMAP 2016] User-Oriented Context Suggestion
A domain-independent Framework for building Conversational Recommender Systems
Contextual Definition Generation
Deep Neural Methods for Retrieval
Combining Distributional Semantics and Entity Linking for Context-aware Conte…
An in-depth review on News Classification through NLP
Presentation Session 2 - Context Grounding.pdf
Context-aware Recommendation: A Quick View
You Should This! Let me explain to you why.
Personalized News Recommendation System via LLM Embedding and Co-Occurrence P…
IRJET- Conextualization: Generalization and Empowering Content Domain
From Natural Language Processing to Artificial Intelligence