Towards Diverse Recommendation

Neil Hurley

Complex Adaptive Systems Laboratory
Computer Science and Informatics
University College Dublin

Clique Strategic Research Cluster
clique.ucd.ie

October 2011

DiveRS: International Workshop on Novelty and Diversity in Recommender Systems

Outline

      1 Setting the Context

      2 Novelty and Diversity in Information Retrieval
              IR Measures of Diversity
              IR Measures of Novelty

      3 Diversity Research in Recommender Systems
              Concentration Measures of Diversity
              Serendipity


Setting the Context

Recommendation Performance I

        Much effort has been spent on improving the performance of
        recommenders from the point of view of rating prediction.
                It is a well-defined statistical problem;
                We have agreed, objective measures of prediction quality.
        Efficient algorithms have been developed that are good at
        maximising predictive accuracy.
                Not a completely solved problem – e.g. dealing with dynamic
                data.
        But there are well-accepted evaluation methodologies and
        quality measures.

Recommendation Performance II

        But good recommendation is not just about the ability to predict
        past ratings.
                Recommendation quality is subjective;
                People’s tastes fluctuate;
                People can be influenced and persuaded;
                Recommendation can be as much about psychology as
                statistics.
        A number of ‘qualities’ are increasingly discussed with regard
        to other dimensions of recommendation:
                Novelty
                Interestingness
                Diversity
                Serendipity
                User satisfaction

Recommendation Performance III

        Clearly, user surveys may be the only way to determine subjective
        satisfaction with a system.
                (Castagnos et al., 2010) present useful survey results on the
                importance of diversity.
        In order to make progress on recommendation algorithms that
        seek improvements along these dimensions, we need
                agreed (objective?) measures of these qualities and agreed
                evaluation methodologies.

Agenda

        This talk focuses on measures of novelty and diversity, rather
        than on algorithms for diversification.
        First, look at how these concepts are defined in IR research.
        Then, examine ideas that have emerged from the RS community.


Novelty and Diversity in Information Retrieval

      The Probability Ranking Principle
      “If a reference retrieval system’s response to each request is a
      ranking of the documents in the collection in order of decreasing
      probability of relevance . . . the overall effectiveness of the system
      to its user will be the best that is obtainable” (W.S. Cooper)

        Nevertheless, relevance measured for each single document in
        isolation has been challenged since as long ago as 1964:
                Goffman (1964): “. . . one must define relevance in relation to
                the entire set of documents rather than to only one document”
                Boyce (1982): “. . . A retrieval system which aspires to the
                retrieval of relevant documents should have a second stage
                which will order the topical set in a manner so as to provide
                maximum informativeness”

      The Maximal Marginal Relevance (MMR) criterion
      “reduce redundancy while maintaining query relevance in
      re-ranking retrieved documents” (Carbonell and Goldstein 1998)
      Given a set R of documents retrieved for a query Q, incrementally
      rank the documents according to

          MMR := \arg\max_{D_i \in R \setminus S} \Big[ \lambda\, sim_1(D_i, Q) - (1 - \lambda) \max_{D_j \in S} sim_2(D_i, D_j) \Big]

      where S is the set of documents already ranked from R.

        An iterative, greedy approach to increasing the diversity of a
        ranking (sketched below).
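A minimal sketch of the greedy MMR loop. The input names (`sim_to_query`, a per-document query-similarity map, and `sim`, a pairwise document-similarity map) are illustrative, not from the paper:

```python
def mmr_rerank(docs, sim_to_query, sim, lam=0.5, k=10):
    """Greedily re-rank docs, trading off relevance against redundancy."""
    candidates, ranked = set(docs), []
    while candidates and len(ranked) < k:
        def mmr_score(d):
            # Redundancy: similarity to the closest already-ranked document.
            redundancy = max((sim[d][s] for s in ranked), default=0.0)
            return lam * sim_to_query[d] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        ranked.append(best)
        candidates.remove(best)
    return ranked
```

With λ = 1 this reduces to a pure relevance ranking; lowering λ trades relevance for reduced redundancy.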

      The Expected Metric Principle
      “in a probabilistic context, one should directly optimize for the
      expected value of the metric of interest” Chen and Karger (2006)

        Chen and Karger (2006) introduce a greedy optimisation
        framework in which the next document is selected to greedily
        optimise the chosen objective.
        An objective such as mean k-call at n, where k-call is 1 if the
        top-n result contains at least k relevant documents, naturally
        increases result-set diversity.
        For 1-call, this results in an approach of selecting the next
        document under the assumption that all documents selected so
        far are not relevant.
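A sketch of the expected-metric computation for k-call at n, under the simplifying assumption of independent per-document relevance probabilities. Note that Chen and Karger's point is precisely that relevances are dependent: under independence the greedy choice collapses to ranking by probability, and it is the dependence structure that makes the selection diversify. The sketch only illustrates the objective itself:

```python
def prob_at_least_k(rel_probs, k):
    """Pr(at least k documents are relevant), assuming independence.

    Dynamic programme over the count of relevant documents seen so far.
    """
    dp = [1.0] + [0.0] * len(rel_probs)  # dp[j] = Pr(exactly j relevant)
    for p in rel_probs:
        for j in range(len(dp) - 1, 0, -1):
            dp[j] = dp[j] * (1 - p) + dp[j - 1] * p
        dp[0] *= 1 - p
    return sum(dp[k:])

def greedy_expected_k_call(rel_probs, n, k):
    """Greedily pick n documents to maximise expected k-call at n."""
    chosen = []
    remaining = set(range(len(rel_probs)))
    for _ in range(min(n, len(rel_probs))):
        best = max(remaining, key=lambda d: prob_at_least_k(
            [rel_probs[c] for c in chosen] + [rel_probs[d]], k))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```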
        PMP – rank according to

            \Pr(r|d) \;\Longrightarrow\; \frac{\Pr(d|r)}{\Pr(d|\neg r)}

        k-call at n – rank according to

            \Pr(\text{at least } k \text{ of } r_0, \ldots, r_{n-1} \mid d_0, d_1, \ldots, d_{n-1})

        Consider a query such as Trojan Horse, whose meaning is
        ambiguous. The PMP criterion would determine the most
        likely meaning and present a ranked list reflecting that
        meaning. A 1-call at n criterion would present a result
        pertaining to each possible meaning, with the aim of getting at
        least one right.
      Figure: Results from Chen and Karger (2006) on the TREC 2004
      Robust Track.

      MSL = Mean Search Length (mean of the rank of the first relevant
      document, minus one)
      MRR = Mean Reciprocal Rank (mean of the reciprocal rank of the
      first relevant document)
        Agrawal et al. (2009) propose a similar approach: an objective
        function that maximises the probability of finding at least one
        relevant result.
        They dub their approach the result diversification problem and
        state it as

            S^* = \arg\max_{S \subseteq D,\, |S| = k} \Pr(S|q)

            \Pr(S|q) = \sum_{c \in C} \Pr(c|q) \Big( 1 - \prod_{d \in S} \big( 1 - V(d|q, c) \big) \Big)

        where
                S is the retrieved result set of k documents
                C is the set of categories (query intents) over which c ranges
                V(d|q, c) is the likelihood of document d satisfying the user
                intent c, given the query q.
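A sketch of this objective and its greedy optimisation. `p_cat` (mapping category to Pr(c|q)) and `v` (mapping a (document, category) pair to V(d|q,c)) are assumed inputs:

```python
def coverage_objective(S, p_cat, v):
    """Pr(S|q): probability that S satisfies at least one intent."""
    total = 0.0
    for c, pc in p_cat.items():
        p_fail = 1.0
        for d in S:
            p_fail *= 1.0 - v.get((d, c), 0.0)  # all docs miss intent c
        total += pc * (1.0 - p_fail)
    return total

def greedy_diversify(docs, p_cat, v, k):
    """Greedy selection; the objective is submodular, so greedy achieves
    a (1 - 1/e) approximation guarantee, as shown in the paper."""
    S = []
    for _ in range(min(k, len(docs))):
        best = max((d for d in docs if d not in S),
                   key=lambda d: coverage_objective(S + [d], p_cat, v))
        S.append(best)
    return S
```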
        Zhai and Lafferty (2006) – risk minimisation of a loss function
        over the possible returned document rankings, measuring how
        unhappy the user is with the returned set.




Axioms of Diversification (Gollapudi and Sharma 2009)

        r(\cdot) : D \times Q \to \mathbb{R}^+, a measure of relevance
        d(\cdot, \cdot) : D \times D \to \mathbb{R}^+, a distance (dissimilarity) function
        Diversification objective:

            R_k^* = \arg\max_{R_k \subseteq D,\, |R_k| = k} f(R_k, q, r(\cdot), d(\cdot, \cdot))

        What properties should f(\cdot) satisfy?




Axioms of Diversification (Gollapudi and Sharma 2009) I

          1   Scale Invariance – insensitive to scaling the distance and
              relevance values by a constant.
          2   Consistency – Making the output documents more relevant
              and more diverse, and the other documents less relevant and
              less diverse, should not change the output of the ranking.
          3   Richness – Should be able to obtain any possible set as
              output by an appropriate choice of r(\cdot) and d(\cdot, \cdot).
          4   Stability – Output should not change arbitrarily with size:
              R_k^* \subseteq R_{k+1}^*.
          5   Independence of Irrelevant Attributes – f(R) is independent
              of r(u) and d(u, v) for u, v \notin R.



Axioms of Diversification (Gollapudi and Sharma 2009) II

          6   Monotonicity – Adding a document to R should not
              decrease the score: f(R \cup \{d\}) \geq f(R).
          7   Strength of Relevance – f(\cdot) should not ignore the
              relevance scores.
          8   Strength of Similarity – f(\cdot) should not ignore the
              similarity scores.

                          No function satisfies all 8 axioms.

        MaxSum Diversification
                Weighted sum of the sums of relevance and dissimilarity of
                items in the selected set.
                Satisfies all axioms except stability.
        MaxMin Diversification
                Weighted sum of the min relevance and min dissimilarity of
                items in the selected set.
                Satisfies all axioms except consistency and stability.
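A sketch of the two objectives over a candidate set S, with a weight `w` trading relevance against diversity. `rel` maps items to relevance scores and `dist` maps unordered item pairs to distances; the exact weighting constants in Gollapudi and Sharma (2009) differ slightly, so treat this as illustrative:

```python
from itertools import combinations

def max_sum_objective(S, rel, dist, w=0.5):
    """Sum-of-relevance plus sum-of-pairwise-distance objective."""
    return (w * sum(rel[i] for i in S)
            + (1 - w) * sum(dist[frozenset(p)] for p in combinations(S, 2)))

def max_min_objective(S, rel, dist, w=0.5):
    """Min-relevance plus min-pairwise-distance objective."""
    return (w * min(rel[i] for i in S)
            + (1 - w) * min(dist[frozenset(p)] for p in combinations(S, 2)))
```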

IR Measures of Diversity

      S-recall (Zhai and Lafferty 2006)
      S-recall at rank n is defined as the number of subtopics retrieved
      up to rank n, divided by the total number of subtopics. Let
      S_i \subseteq S be the set of subtopics in the i-th document d_i; then

          \text{S-recall@} n = \frac{\left| \bigcup_{i=1}^{n} S_i \right|}{|S|}

        Let minrank(S, k) = the size of the smallest subset of documents
        that covers at least k subtopics.
        It is usually most useful to consider S-recall@n where
        n = minrank(S, |S|).
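A direct transcription of the definition. `ranked_subtopics` is an assumed input: a list whose i-th element is the set of subtopics of the i-th ranked document; `all_subtopics` is the full subtopic set S:

```python
def s_recall_at_n(ranked_subtopics, all_subtopics, n):
    """Fraction of all subtopics covered by the top-n documents."""
    covered = set().union(*ranked_subtopics[:n]) if n > 0 else set()
    return len(covered) / len(all_subtopics)
```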

      S-precision (Zhai and Lafferty 2006)
      S-precision at rank n is the ratio of the minimum rank at which a
      given recall value could optimally be achieved to the first rank at
      which the same recall value actually has been achieved.
      Let k = \left| \bigcup_{i=1}^{n} S_i \right|. Then

          \text{S-precision@} n = \frac{minrank(S, k)}{m^*}, \quad
          m^* = \min \Big\{ j : \big| \bigcup_{i=1}^{j} S_i \big| \ge k \Big\}
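A sketch of the definition. Computing minrank is a set-cover problem (NP-hard), so the brute force below is only viable for small collections; it is enough to illustrate the measure:

```python
from itertools import combinations

def minrank(doc_subtopics, k):
    """Size of the smallest subset of documents covering >= k subtopics."""
    for size in range(1, len(doc_subtopics) + 1):
        for combo in combinations(doc_subtopics, size):
            if len(set().union(*combo)) >= k:
                return size
    return None

def s_precision_at_n(ranked_subtopics, collection_subtopics, n):
    """ranked_subtopics: subtopic sets of the ranked documents, in rank order."""
    k = len(set().union(*ranked_subtopics[:n]))  # recall actually achieved at n
    m_star = next(j for j in range(1, len(ranked_subtopics) + 1)
                  if len(set().union(*ranked_subtopics[:j])) >= k)
    return minrank(collection_subtopics, k) / m_star
```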




      α-NDCG (Clarke et al. 2008)
      Standard NDCG (Normalised Discounted Cumulative Gain)
      calculates a gain for each document based on its relevance, with a
      logarithmic discount for the rank it appears at. Extended for
      diversity evaluation, the gain is incremented by 1 for each new
      subtopic, and by α^k (0 ≤ α ≤ 1) for a subtopic that has already
      been seen k times in previously-ranked documents.
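A sketch of the discounted, diversity-aware gain, following the slide's convention that a subtopic already seen k times contributes α^k (so a fresh subtopic contributes α^0 = 1):

```python
import math
from collections import Counter

def alpha_dcg(ranked_subtopics, alpha=0.5):
    """ranked_subtopics[i]: set of relevant subtopics of the i-th ranked doc."""
    seen = Counter()  # how often each subtopic has appeared so far
    dcg = 0.0
    for rank, subs in enumerate(ranked_subtopics, start=1):
        gain = sum(alpha ** seen[s] for s in subs)
        for s in subs:
            seen[s] += 1
        dcg += gain / math.log2(rank + 1)
    return dcg
```

For the normalised measure this is divided by the DCG of an ideal ordering, which is itself hard to compute exactly and is usually approximated greedily.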




      Intent-aware Precision (Agrawal et al. 2009)
      Intent-aware precision precIA is calculated by first computing
      precision for each distinct subtopic separately, then averaging these
      precisions according to the distribution of the proportion of users
      interested in each subtopic:

          prec_{IA}@n = \sum_{s \in S} \Pr(s|q) \, \frac{1}{n} \sum_{i=1}^{n} I(s \in d_i)
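A direct transcription; `p_subtopic` is an assumed input mapping each subtopic s to Pr(s|q):

```python
def prec_ia_at_n(ranked_subtopics, p_subtopic, n):
    """Per-subtopic precision at n, averaged under the intent distribution."""
    top = ranked_subtopics[:n]
    return sum(p * sum(1 for subs in top if s in subs) / n
               for s, p in p_subtopic.items())
```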





IR Measures of Novelty

      Novelty Measures (Agrawal et al. 2009)

        The KL-divergence D(d_i \,\|\, d_j) is used to measure the novelty
        of d_i w.r.t. d_j.
        Alternatively, d_i can be modelled as a mixture of d_j and a
        background model. The higher the weight of d_j in the
        mixture, the less novel d_i is w.r.t. d_j.
        Pairwise measures are combined to give an overall measure of
        novelty w.r.t. all documents in the result set.
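A sketch of KL-divergence novelty between unigram language models of two documents (token lists). The add-one smoothing scheme is an assumption for the sake of a runnable example, not a detail from the slide:

```python
import math
from collections import Counter

def kl_novelty(doc_i, doc_j):
    """D(d_i || d_j) over the shared vocabulary; higher = more novel."""
    ci, cj = Counter(doc_i), Counter(doc_j)
    vocab = set(ci) | set(cj)
    ni = sum(ci.values()) + len(vocab)  # add-one smoothed totals
    nj = sum(cj.values()) + len(vocab)
    return sum(((ci[w] + 1) / ni)
               * math.log(((ci[w] + 1) / ni) / ((cj[w] + 1) / nj))
               for w in vocab)
```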




Summary of IR Research

        It has long been recognised that the probability ranking
        principle does not adequately measure result-list quality:
                the usefulness of a document depends on what other
                documents are on the list.
        Consider each document to consist of a set of subtopics,
        information nuggets or facets:
                The novelty of a document is a measure of how much
                redundancy it contains, where it is redundant w.r.t. a facet if
                that facet is already covered by another document.
                The diversity of a result list is a measure of the number of
                relevant facets it contains.
                No complete consensus here – e.g. Gollapudi and Sharma
                (2009) define “novelty” as the fraction of topics covered.
        Consider selecting the document with least redundancy vs
        selecting the document that most improves overall diversity.

        In general, the IR lines of research on diversity and novelty
        consider the following:
                Relevance scores for documents are not independent – one
                must consider relevance w.r.t. the entire result set, rather
                than each document in turn.
                Diversity is related to query ambiguity –
                        the difference between selecting documents according
                        to the most probable meaning, or
                        selecting documents to cover all meanings, so that at
                        least one is relevant.
        Diversity is a measure of a set; novelty is a measure of each
        document w.r.t. a particular set in which it is contained.





Diversity Research in Recommender Systems

Diversity – The Long Tail Problem

      Figure: Sales Demand for 1000 Products
      Figure: Top 2% of Most Popular Products Account for 13% of Sales
      Figure: Least Popular Items Account for 30% of Sales

                                “Less is More”
      – Chris Anderson [The Long Tail: Why the Future of Business is
      Selling Less of More]




Recommenders and The Long Tail Problem

        To support an increase in sales, we need to increase the
        diversity of the set of recommendations made to end-users.
        Recommend items in the long tail that are highly likely to be
        liked by the current user.
        This implies finding those items that are liked by the current
        user and by relatively few other users.




Diversity – The End-user Perspective

      Definition
      The diversity of a set L of size p is the average dissimilarity of the
      items in the set:

          f_D(L) = \frac{2}{p(p-1)} \sum_{i \in L} \sum_{j \in L,\, j < i} \big( 1 - s(i, j) \big)

      We have found it useful to define novelty (or relative diversity) as
      follows:

      Definition
      The novelty of an item i in a set L is

          n_L(i) = \frac{1}{p-1} \sum_{j \in L,\, j \neq i} \big( 1 - s(i, j) \big)
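Direct transcriptions of the two definitions; `sim(i, j)` is an assumed similarity function with values in [0, 1]:

```python
def set_diversity(L, sim):
    """Average pairwise dissimilarity of the items in L."""
    p = len(L)
    total = sum(1 - sim(i, j) for a, i in enumerate(L) for j in L[:a])
    return 2 * total / (p * (p - 1))

def item_novelty(i, L, sim):
    """Average dissimilarity of item i to the other items in L."""
    return sum(1 - sim(i, j) for j in L if j != i) / (len(L) - 1)
```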
        User profile from the MovieLens dataset, |P_u| = 764, N = 20,
        |T_u| = 0.1 × |P_u|.
        The 40% most novel items accrue no hits at all.




Other Definitions of Novelty/Diversity in RS

        Castells et al. (2011) outline some of the ways that novelty
        impacts on recommender system design.
        They distinguish item popularity from item similarity, and
        user-relative measures from global measures.
        Popularity-based novelty:

            novelty(i) = -\log p(i) \quad \text{or} \quad 1 - p(K|i)        (global measure)

            novelty(i) = -\log p(i|u) \quad \text{or} \quad 1 - p(K|i, u)   (user perspective)

        Similarity perspective:

            novelty(i|S) = \sum_{j \in S} p(j|S) \, d(i, j)

Novelty for Recommender Systems

        Pablo Castells introduces rank-sensitive and relevance-aware
        measures of recommendation-set diversity and novelty.
        Recommendation Novelty Metric:

            m(R|u) = \sum_{n} disc(n) \, p(rel|i_n, u) \, novelty(i_n|u)

        Novelty-Based Diversity Metrics:

            novelty(R|u) = \sum_{n} \sum_{j \in u} disc(n) \, p(rel|i_n, u) \, p(j|u) \, d(i_n, j)

            diversity(R|u) = \sum_{n} \sum_{k < n} disc(n) \, disc(k) \, p(rel|i_n, u) \, p(rel|i_k, u) \, d(i_n, i_k)



Concentration Measures of Diversity

Evaluating Diversity

              In our 2009 RecSys paper, we evaluated our diversification
              method on test sets T (µ) consisting of items chosen from the
              top 100 × (1 − µ)% most novel items in the user profiles.




Toy Example

        We motivate our diversity methodology using a toy example in
        which a user base of four users, u_1, u_2, u_3, u_4, is recommended
        items from a catalogue of four items i_1, i_2, i_3, i_4.
        The system recommends N = 2 items to each user.
        Any particular scenario can be represented in a table that
        indicates whether a user actually likes an item or not (1 or 0),
        along with the probability that the recommender system will
        recommend the corresponding item to the user.
        Assume that G_1 = {i_1, i_2} is a single genre (e.g. horror
        movies) and G_2 = {i_3, i_4} is another.
        Use a simple similarity measure: s(i_1, i_2) = s(i_3, i_4) = 1, and
        cross-genre similarities are zero.


      Biased but Full Recommended Set Diversity

                      i1         i2         i3         i4
            u1      1 (1)      1 (0)      1 (1/2)    0 (1/2)
            u2      1 (0)      1 (1)      1 (1/2)    0 (1/2)
            u3      0 (1/2)    1 (1/2)    1 (1)      1 (0)
            u4      1 (1)      0 (0)      1 (1/2)    1 (1/2)

        This scheme always recommends an item from G1 and an item
        from G2.
        The probability of i1 being recommended to a randomly selected
        user – (1/4)(1 + 0 + 1/2 + 1) = 5/8 – is higher than that of i2
        (3/8), for instance:
                recommendations do not spread evenly across the product
                catalogue.
        It is also biased towards consistently recommending i1 to u1,
        while never recommending i2 to u1.
      No System Level Biases

                      i1         i2         i3         i4
            u1      1 (1)      1 (1/3)    1 (1/3)    0 (1/3)
            u2      0 (1/3)    1 (1)      1 (1/3)    1 (1/3)
            u3      1 (1/3)    0 (1/3)    1 (1)      1 (1/3)
            u4      1 (1/3)    1 (1/3)    0 (1/3)    1 (1)

        The probability of recommending i1 to a randomly chosen
        relevant user (i.e. u1, u3 or u4) is (1/3)(1 + 1/3 + 1/3) = 5/9;
        similarly for i2, i3 and i4.
        Focusing on the set of items that are relevant to u1 (i.e. i1, i2
        and i3), the algorithm is three times as likely to recommend i1
        as either of the other relevant items.
      No System or User Level Biases

                      i1         i2         i3         i4
            u1      1 (1/3)    1 (1/3)    1 (1/3)    0 (1)
            u2      0 (1)      1 (1/3)    1 (1/3)    1 (1/3)
            u3      1 (1/3)    0 (1)      1 (1/3)    1 (1/3)
            u4      1 (1/3)    1 (1/3)    0 (1)      1 (1/3)

        Same probability of recommending any relevant item to a user.
        Same probability that an item is recommended when it is
        relevant.




Algorithm Diversity

      Definition
      We define an algorithm to be fully diverse from the user
      perspective if it recommends any of the user’s set of relevant items
      with equal probability.

      Definition
      We define an algorithm to be fully diverse from the system
      perspective if the probability of recommending an item, when it is
      relevant, is equal across all items.




Lorenz Curve and the Gini Index

      A plot of the cumulative proportion of the product catalogue
      against the cumulative proportion of sales.

        In the example, 69% of the sales are of the 10% top-selling
        products.
        G = 0 implies equal sales across all products; G = 1 when a
        single product gets all sales.
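A sketch of the Gini index computed from per-product sales (or hit) counts, via the area under the Lorenz curve with products sorted in ascending order of sales:

```python
def gini(sales):
    """Gini index of a list of per-product sales counts."""
    xs = sorted(sales)           # ascending, as on the Lorenz curve
    n, total = len(xs), sum(xs)
    cum, lorenz_area = 0.0, 0.0
    for x in xs:
        prev = cum
        cum += x / total
        lorenz_area += (prev + cum) / (2 * n)  # trapezoid per product
    return 1 - 2 * lorenz_area
```

For perfectly equal sales the curve is the diagonal (area 1/2) and G = 0; when one product takes everything, G approaches 1.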




Measuring Recommendation Success

        The unit of measurement of success in recommender systems is
        the Hit:
                interpreted as the recommendation of a product known to be
                liked by the user.

Hits Inequality – Concentration Curves of Hits

        The Lorenz curve and Gini index measure inequality within
        the hits distribution over all items in the product catalogue.
        The concentration curve and concentration index of hits vs
        popularity measure the bias of the hits distribution towards
        popular items.
        The concentration curve and concentration index of hits vs
        novelty measure the bias of the hits distribution towards novel
        items.
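A sketch of a concentration index of hits against popularity: items are ordered by popularity (not by hits) and the cumulative hit share is accumulated Gini-style, so the index is positive when hits concentrate on popular items and negative when they concentrate on unpopular ones. The function names and inputs are illustrative:

```python
def concentration_index(items, popularity, hits):
    """Concentration index of hits, with items ordered by popularity."""
    order = sorted(items, key=lambda i: popularity[i])  # ascending popularity
    n, total = len(items), sum(hits[i] for i in items)
    cum, area = 0.0, 0.0
    for i in order:
        prev = cum
        cum += hits[i] / total
        area += (prev + cum) / (2 * n)
    return 1 - 2 * area
```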




Concentration Curves

      n products accrue hits {h_1, . . . , h_n} – the concentration curve
      depends on the correlation between hits and popularity.

Temporal Diversity

        Lathia et al. (2010) investigate diversity over time – do
        recommendations change over time?
        Here diversity is measured between two recommended sets,
        formed at different points in time:

            diversity(R_{i+1}, R_i) = \frac{1}{n} \left| R_{i+1} \setminus R_i \right|

        And novelty is measured as the fraction of the new list never
        recommended before:

            novelty(R_{i+1}) = \frac{1}{n} \Big| R_{i+1} \setminus \bigcup_{j=1}^{i} R_j \Big|

        kNN algorithms exhibit more temporal diversity than SVD
        matrix factorisation.
        Switching between multiple algorithms is offered as one means
        to improve temporal diversity.
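Direct transcriptions of the two temporal measures; recommendation lists are sets of item ids and n is the list size:

```python
def temporal_diversity(r_next, r_prev, n):
    """diversity(R_{i+1}, R_i): fraction of the new list absent from the old."""
    return len(r_next - r_prev) / n

def temporal_novelty(r_next, past_lists, n):
    """novelty(R_{i+1}): fraction of the new list never recommended before."""
    seen = set().union(*past_lists) if past_lists else set()
    return len(r_next - seen) / n
```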

Serendipity

Measuring the Unexpected

              Serendipity – the extent to which recommendations may
              positively surprise users.
              Murakami et al. (2008) propose to measure unexpectedness as
              the “distance between results produced by the method to be
              evaluated and those produced by a primitive prediction
              method”.
                unexpectedness = \frac{1}{n} \sum_{i=1}^{n} \max(\Pr(s_i) - \mathrm{Prim}(s_i), 0) \times rel(s_i) \times \frac{\sum_{j=1}^{i} rel(s_j)}{i}
              Ge et al. (2010) follow a similar approach: if R1 is the
              recommended set returned by the RS and R2 is the set
              returned by Prim, then

                serendipity = \frac{1}{|R_1 \setminus R_2|} \sum_{s_j \in R_1 \setminus R_2} rel(s_j)
      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
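
      The sketch below implements both measures as given by the formulas
      above: Murakami et al.'s rank-weighted unexpectedness and Ge et al.'s
      average relevance over the unexpected set. It assumes 0/1 relevance
      labels and per-item confidence scores are available; all names are
      illustrative.

          def unexpectedness(pr, prim, rel):
              """Murakami et al. (2008): rank-weighted surprise over a ranked list."""
              n = len(rel)
              total, rel_so_far = 0.0, 0
              for i in range(n):               # i is the 0-based rank
                  rel_so_far += rel[i]
                  total += max(pr[i] - prim[i], 0.0) * rel[i] * rel_so_far / (i + 1)
              return total / n

          def serendipity(rs_recs, prim_recs, rel):
              """Ge et al. (2010): mean relevance of items only the RS recommends."""
              unexpected = set(rs_recs) - set(prim_recs)
              if not unexpected:
                  return 0.0
              return sum(rel[i] for i in unexpected) / len(unexpected)

          # Toy usage: a popularity baseline returns items {1, 2}; the RS
          # also surfaces 3 and 4, of which only item 3 is relevant.
          rel_by_item = {1: 1, 2: 1, 3: 1, 4: 0}
          print(serendipity([1, 3, 4], [1, 2], rel_by_item))      # 0.5
          print(unexpectedness([0.9, 0.8], [0.9, 0.2], [1, 1]))   # 0.3
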
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity


Novelty vs Serendipity

              Novelty with regard to a given set is a measure of how
              different an item is from the other items in the set;
                     It does not involve any notion of relevance
              Is a serendipitous recommendation equivalent to a relevant
              novel recommendation?
              To me, serendipity encapsulates a higher degree of risk – a
              novel item that our model gives a low chance of relevance,
              but which nonetheless turns out to be relevant.




      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity


Conclusions

              IR research gives some direction on how to define and
              evaluate diversity and novelty
              We can ask
                     Are these adequate for RS research?
                     Can we map them to the needs of RS evaluation?
                     How are they deficient?
              Recent research is beginning to clarify these issues for RS
              I believe that objective measures are possible




      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity


Conclusions

              IR research gives some direction on how to define and
              evaluate diversity and novelty
              We can ask
                     Are these adequate for RS research?
                     Can we map them to the needs of RS evaluation?
                     How are they deficient?
              Recent research is beginning to clarify these issues for RS
              I believe that objective measures are possible
       I look forward to some interesting discussions on these issues!




      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity




                                                 Thank You

         My research is sponsored by Science Foundation Ireland under
       grant 08/SRC/I1407: Clique: Graph and Network Analysis Cluster




      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity


References I


      Agrawal, R., Gollapudi, S., Halverson, A. and Ieong, S.: 2009,
        Diversifying search results, Proceedings of the Second ACM
        International Conference on Web Search and Data Mining,
        WSDM ’09, ACM, New York, NY, USA, pp. 5–14.
        URL: http://doi.acm.org/10.1145/1498759.1498766
      Boyce, B. R.: 1982, Beyond topicality: A two-stage view of
        relevance and the retrieval process, Inf. Process. Manage.
        18(3), 105–109.




      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity


References II

      Carbonell, J. and Goldstein, J.: 1998, The use of MMR,
        diversity-based reranking for reordering documents and
        producing summaries, Proceedings of the 21st annual
        international ACM SIGIR conference on Research and
        development in information retrieval, SIGIR ’98, ACM, New
        York, NY, USA, pp. 335–336.
        URL: http://doi.acm.org/10.1145/290941.291025
      Castells, P., Vargas, S. and Wang, J.: 2011, Novelty and Diversity
        Metrics for Recommender Systems: Choice, Discovery and
        Relevance, International Workshop on Diversity in Document
        Retrieval (DDR 2011) at the 33rd European Conference on
        Information Retrieval (ECIR 2011).



      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity


References III

      Chen, H. and Karger, D. R.: 2006, Less is more: probabilistic
        models for retrieving fewer relevant documents, in E. N.
        Efthimiadis, S. T. Dumais, D. Hawking and K. Järvelin (eds),
        SIGIR, ACM, pp. 429–436.
      Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan,
        A., Böttcher, S. and MacKinnon, I.: 2008, Novelty and diversity
        in information retrieval evaluation, Proceedings of the 31st
        annual international ACM SIGIR conference on Research and
        development in information retrieval, SIGIR ’08, ACM, New
        York, NY, USA, pp. 659–666.
        URL: http://doi.acm.org/10.1145/1390334.1390446




      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity


References IV

      Ge, M., Delgado-Battenfeld, C. and Jannach, D.: 2010, Beyond
        accuracy: evaluating recommender systems by coverage and
        serendipity, Proceedings of the fourth ACM conference on
        Recommender systems, RecSys ’10, ACM, New York, NY, USA,
        pp. 257–260.
        URL: http://doi.acm.org/10.1145/1864708.1864761
      Goffman, W.: 1964, On relevance as a measure, Information
        Storage and Retrieval 2(3), 201–203.
      Gollapudi, S. and Sharma, A.: 2009, An axiomatic approach for
        result diversification, Proceedings of the 18th international
        conference on World wide web, WWW ’09, ACM, New York,
        NY, USA, pp. 381–390.
        URL: http://doi.acm.org/10.1145/1526709.1526761

      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
     Serendipity


References V

      Lathia, N., Hailes, S., Capra, L. and Amatriain, X.: 2010,
        Temporal diversity in recommender systems, in F. Crestani,
        S. Marchand-Maillet, H.-H. Chen, E. N. Efthimiadis and
        J. Savoy (eds), SIGIR, ACM, pp. 210–217.
      Murakami, T., Mori, K. and Orihara, R.: 2008, Metrics for
       evaluating the serendipity of recommendation lists, in K. Satoh,
       A. Inokuchi, K. Nagao and T. Kawamura (eds), New Frontiers
       in Artificial Intelligence, Vol. 4914 of Lecture Notes in Computer
       Science, Springer Berlin / Heidelberg, pp. 40–46.
      Zhai, C. and Lafferty, J.: 2006, A risk minimization framework for
        information retrieval, Inf. Process. Manage. 42, 31–55.
        URL: http://dx.doi.org/10.1016/j.ipm.2004.11.003



      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems

More Related Content

PDF
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
PDF
PPMA National Service Debate at CIPD Conf 8 Nov 2012 - Rob Briner
PDF
Collaborating for Innovation Success through Research-as-a-Service
PDF
Siop 2012 - Contrasting Culture Strength and Climate Strength
PDF
Microsoft power point makingsenseofsensemaker
PPTX
Talent Acquisition in The Face of Diversity
PDF
Market research how well do you listen
PDF
How well do you listen marketing research in healthcare
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
PPMA National Service Debate at CIPD Conf 8 Nov 2012 - Rob Briner
Collaborating for Innovation Success through Research-as-a-Service
Siop 2012 - Contrasting Culture Strength and Climate Strength
Microsoft power point makingsenseofsensemaker
Talent Acquisition in The Face of Diversity
Market research how well do you listen
How well do you listen marketing research in healthcare

Similar to Towards Diverse Recommendation (20)

PDF
Systematic Reviews and Research Synthesis, Part 1
PDF
Social Media Measurement by Daniel Backhaus at Infuz
PDF
Presentation at Hargraves CONNECT: The evidence-based organization
PPTX
Tutorial 1 kaedah penyelidikan
PDF
21st century market research.pdf
PDF
MILC project - 21century handbook - module 1
PDF
Critical-thinking Skill
PPTX
Workshop: Mindful Moderating
PDF
DAIR™ to Be DIVERSE
PDF
A manifesto for qualitative research
 
PDF
ABP 2010 Conference Programme
PDF
How organisations are considering their audience in business decisions
PPTX
Denise Rousseau's Generic EBMgt Class 5
PPTX
تجارب مميزة في البحث النوعي وتطبيقاته
PDF
Navigating Qualitative Research: New & Different Ideas
PPTX
The use of evidence in achieving ir cs mission
 
PPTX
Charity Navigator's CEO Debates Hudson Institute Director on the Realities of...
PDF
The Value of Values: The Case for Assessing Motives, Needs and Preferences
PPTX
Myths and Realities of Psychometric Testing
PPT
Critical thinking 22 08-55
Systematic Reviews and Research Synthesis, Part 1
Social Media Measurement by Daniel Backhaus at Infuz
Presentation at Hargraves CONNECT: The evidence-based organization
Tutorial 1 kaedah penyelidikan
21st century market research.pdf
MILC project - 21century handbook - module 1
Critical-thinking Skill
Workshop: Mindful Moderating
DAIR™ to Be DIVERSE
A manifesto for qualitative research
 
ABP 2010 Conference Programme
How organisations are considering their audience in business decisions
Denise Rousseau's Generic EBMgt Class 5
تجارب مميزة في البحث النوعي وتطبيقاته
Navigating Qualitative Research: New & Different Ideas
The use of evidence in achieving ir cs mission
 
Charity Navigator's CEO Debates Hudson Institute Director on the Realities of...
The Value of Values: The Case for Assessing Motives, Needs and Preferences
Myths and Realities of Psychometric Testing
Critical thinking 22 08-55
Ad

Recently uploaded (20)

PDF
A review of recent deep learning applications in wood surface defect identifi...
PPTX
Chapter 5: Probability Theory and Statistics
PDF
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
PDF
How IoT Sensor Integration in 2025 is Transforming Industries Worldwide
PDF
UiPath Agentic Automation session 1: RPA to Agents
PPTX
Microsoft Excel 365/2024 Beginner's training
PDF
sustainability-14-14877-v2.pddhzftheheeeee
PDF
“A New Era of 3D Sensing: Transforming Industries and Creating Opportunities,...
PDF
Enhancing plagiarism detection using data pre-processing and machine learning...
PDF
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...
DOCX
search engine optimization ppt fir known well about this
PPTX
AI IN MARKETING- PRESENTED BY ANWAR KABIR 1st June 2025.pptx
PDF
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
PDF
OpenACC and Open Hackathons Monthly Highlights July 2025
PDF
1 - Historical Antecedents, Social Consideration.pdf
PDF
sbt 2.0: go big (Scala Days 2025 edition)
PDF
Flame analysis and combustion estimation using large language and vision assi...
PPTX
Benefits of Physical activity for teenagers.pptx
PPTX
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
PPT
Module 1.ppt Iot fundamentals and Architecture
A review of recent deep learning applications in wood surface defect identifi...
Chapter 5: Probability Theory and Statistics
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
How IoT Sensor Integration in 2025 is Transforming Industries Worldwide
UiPath Agentic Automation session 1: RPA to Agents
Microsoft Excel 365/2024 Beginner's training
sustainability-14-14877-v2.pddhzftheheeeee
“A New Era of 3D Sensing: Transforming Industries and Creating Opportunities,...
Enhancing plagiarism detection using data pre-processing and machine learning...
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...
search engine optimization ppt fir known well about this
AI IN MARKETING- PRESENTED BY ANWAR KABIR 1st June 2025.pptx
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
OpenACC and Open Hackathons Monthly Highlights July 2025
1 - Historical Antecedents, Social Consideration.pdf
sbt 2.0: go big (Scala Days 2025 edition)
Flame analysis and combustion estimation using large language and vision assi...
Benefits of Physical activity for teenagers.pptx
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
Module 1.ppt Iot fundamentals and Architecture
Ad

Towards Diverse Recommendation

  • 1. Towards Diverse Recommendation Towards Diverse Recommendation Neil Hurley Complex Adaptive System Laboratory Computer Science and Informatics University College Dublin Clique Strategic Research Cluster clique.ucd.ie October 2011 DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 2. Towards Diverse Recommendation Outline 1 Setting the Context DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 3. Towards Diverse Recommendation Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 4. Towards Diverse Recommendation Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty 3 Diversity Research in Recommender Systems Concentration Measures of Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 5. Towards Diverse Recommendation Setting the Context Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty 3 Diversity Research in Recommender Systems Concentration Measures of Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 6. Towards Diverse Recommendation Setting the Context Recommendation Performance I Much effort has been spent on improving the performance of recommenders from the point of view of rating prediction. It is a well-defined statistical problem; We have agreed objective measure of prediction quality. Efficient algorithms have been developed that are good at maximising predictive accuracy. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 7. Towards Diverse Recommendation Setting the Context Recommendation Performance I Much effort has been spent on improving the performance of recommenders from the point of view of rating prediction. It is a well-defined statistical problem; We have agreed objective measure of prediction quality. Efficient algorithms have been developed that are good at maximising predictive accuracy. Not a completely solved problem – e.g. dealing with dynamic data. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 8. Towards Diverse Recommendation Setting the Context Recommendation Performance I Much effort has been spent on improving the performance of recommenders from the point of view of rating prediction. It is a well-defined statistical problem; We have agreed objective measure of prediction quality. Efficient algorithms have been developed that are good at maximising predictive accuracy. Not a completely solved problem – e.g. dealing with dynamic data. But, there are well accepted evaluation methodologies and quality measures. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 9. Towards Diverse Recommendation Setting the Context Recommendation Performance II But good recommendation is not about ability to predict past ratings. Recommendation quality is subjective; People’s tastes fluctuate; People can be influenced and persuaded; Recommendation can be as much about psychology as statistics. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 10. Towards Diverse Recommendation Setting the Context Recommendation Performance II But good recommendation is not about ability to predict past ratings. Recommendation quality is subjective; People’s tastes fluctuate; People can be influenced and persuaded; Recommendation can be as much about psychology as statistics. A number of ‘qualities’ are being more and more talked about with regard to other dimensions of recommendation: DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 11. Towards Diverse Recommendation Setting the Context Recommendation Performance II But good recommendation is not about ability to predict past ratings. Recommendation quality is subjective; People’s tastes fluctuate; People can be influenced and persuaded; Recommendation can be as much about psychology as statistics. A number of ‘qualities’ are being more and more talked about with regard to other dimensions of recommendation: Novelty DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 12. Towards Diverse Recommendation Setting the Context Recommendation Performance II But good recommendation is not about ability to predict past ratings. Recommendation quality is subjective; People’s tastes fluctuate; People can be influenced and persuaded; Recommendation can be as much about psychology as statistics. A number of ‘qualities’ are being more and more talked about with regard to other dimensions of recommendation: Novelty Interestingness DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 13. Towards Diverse Recommendation Setting the Context Recommendation Performance II But good recommendation is not about ability to predict past ratings. Recommendation quality is subjective; People’s tastes fluctuate; People can be influenced and persuaded; Recommendation can be as much about psychology as statistics. A number of ‘qualities’ are being more and more talked about with regard to other dimensions of recommendation: Novelty Interestingness Diversity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 14. Towards Diverse Recommendation Setting the Context Recommendation Performance II But good recommendation is not about ability to predict past ratings. Recommendation quality is subjective; People’s tastes fluctuate; People can be influenced and persuaded; Recommendation can be as much about psychology as statistics. A number of ‘qualities’ are being more and more talked about with regard to other dimensions of recommendation: Novelty Interestingness Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 15. Towards Diverse Recommendation Setting the Context Recommendation Performance II But good recommendation is not about ability to predict past ratings. Recommendation quality is subjective; People’s tastes fluctuate; People can be influenced and persuaded; Recommendation can be as much about psychology as statistics. A number of ‘qualities’ are being more and more talked about with regard to other dimensions of recommendation: Novelty Interestingness Diversity Serendipity User satisfaction DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 16. Towards Diverse Recommendation Setting the Context Recommendation Performance III Clearly user-surveys may be the only way to determine subject satisfaction with a system. (Castagnos et al, 2010) present useful survey results on the importance of diversity. In order to make progress on recommendation algorithms that seek improvements along these dimensions, we need Agreed (objective?) measures of these qualities and agreed evaluation methodologies DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 17. Towards Diverse Recommendation Setting the Context Agenda Focus in this talk on measures of novelty and diversity, rather than algorithms for diversification. Initially look at how these concepts are defined in IR research. Then examine ideas that have emerged from the RS community. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 18. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty 3 Diversity Research in Recommender Systems Concentration Measures of Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 19. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval The Probability Ranking Principle “If a reference retrieval system’s response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance . . . the overall effectiveness of the system to its user will be the best that is obtainable” (W.S. Cooper) Nevertheless, relevance measured for each single document has been challenged since as long ago as 1964 DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 20. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval The Probability Ranking Principle “If a reference retrieval system’s response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance . . . the overall effectiveness of the system to its user will be the best that is obtainable” (W.S. Cooper) Nevertheless, relevance measured for each single document has been challenged since as long ago as 1964 Goffman (1964). . . one must define relevance in relation to the entire set of documents rather than to only one document DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 21. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval The Probability Ranking Principle “If a reference retrieval system’s response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance . . . the overall effectiveness of the system to its user will be the best that is obtainable” (W.S. Cooper) Nevertheless, relevance measured for each single document has been challenged since as long ago as 1964 Goffman (1964). . . one must define relevance in relation to the entire set of documents rather than to only one document Boyce (1982) . . . A retrieval system which aspires to the retrieval of relevant documents should have a second stage which will order the topical set in a manner so as to provide maximum informativeness DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 22. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval The Maximal Marginal Relevance (MMR) criterion “ reduce redundancy while maintaining query relevance in re-ranking retrieved documents” (Carbonell and Goldstein 1998) Given a set of retrieved documents R, for a query Q incrementally rank the documents according to MMR arg max λsim1 (Di , Q) − (1 − λ) max sim2 (Di , Dj ) Di ∈RS Dj ∈S where S is the set of documents already ranked from R. Iterative greedy approach to increasing the diversity of a ranking. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 23. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval The Expected Metric Principle “in a probabilistic context, one should directly optimize for the expected value of the metric of interest” Chen and Karger (2006). Chen and Karger (2006) introduces a greedy optimisation framework in which the next document is selected to greedily optimise the selected objective. An objective such as mean k-call at n where k-call is 1 if the top-n result contains at least k relevant documents, naturally increases result-set diversity. For 1-call, this results in an approach of selecting the next document, assuming that all documents selected so far are not relevant. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 24. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval PMP – rank according to Pr(d|r) Pr(r|d) =⇒ Pr(d| r) k-call at n – rank according to Pr(at least k of r0 , ..., rn−1 |d0 , d1 , ...dn−1 ) Consider a query such as Trojan Horse, whose meaning is ambiguous. The PMP criterion would determine the most likely meaning and present a ranked list reflecting that meaning. A 1-call at n criterion would present a result pertaining to each possible meaning, with an aim of getting at least one right. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 25. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval Figure: Results from Chen and Karger (2006) on TREC2004 Robust Track. MSL = Mean Search Length (mean of rank of first relevant document minus one) MRR = Mean Reciprocal Rank (mean of the reciprocal rank of first relevant document) DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 26. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval Agrawal et al. (2009) propose a similar approach of an objective function to maximise the probability of finding at least one relevant result. They dub their approach the result diversification problem and state it as S = arg max Pr(S|q) S⊆D,|S|=k Pr(S|q) = Pr(c|q)(1 − (1 − V (d|q, c))) c d∈S where S is the retrieved result set of k documents c ∈ C is a set of categories V (d|q, c) is the likelihood of the document satisfying the user intent, given the query q. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 27. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Novelty and Diversity in Information Retrieval Zhai and Lafferty (2006) – risk minimization of a loss function over possible returned document rankings measuring how unhappy the user is with that set. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 28. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) r(.) : D × Q → R+ a measure of relevance d(., .) : D × D → R+ a similarity function Diversification objective ∗ Rk = arg max{Rk ⊆D,|Rk |=k} f (Rk , q, r(.), d(., .)) What properties should f () satisfy? DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 29. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) I 1 Scale Invariance – insensitive to scaling distance and relevance by constant. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 30. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) I 1 Scale Invariance – insensitive to scaling distance and relevance by constant. 2 Consistency – Making output more relevance and more diverse and other documents less relevant and less diverse should not change output of the ranking. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 31. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) I 1 Scale Invariance – insensitive to scaling distance and relevance by constant. 2 Consistency – Making output more relevance and more diverse and other documents less relevant and less diverse should not change output of the ranking. 3 Richness – Should be able to obtain any possible set as output by appropriate choice of r(.) and d(., .). DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 32. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) I 1 Scale Invariance – insensitive to scaling distance and relevance by constant. 2 Consistency – Making output more relevance and more diverse and other documents less relevant and less diverse should not change output of the ranking. 3 Richness – Should be able to obtain any possible set as output by appropriate choice of r(.) and d(., .). 4 Stability – Output should not change arbitrarily with size: ∗ ∗ Rk ⊆ Rk+1 . DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 33. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) I 1 Scale Invariance – insensitive to scaling distance and relevance by constant. 2 Consistency – Making output more relevance and more diverse and other documents less relevant and less diverse should not change output of the ranking. 3 Richness – Should be able to obtain any possible set as output by appropriate choice of r(.) and d(., .). 4 Stability – Output should not change arbitrarily with size: ∗ ∗ Rk ⊆ Rk+1 . 5 Independence of Irrelevant Attributes f (R) independent of r(u) and d(u, v) for u, v ∈ S. / DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 34. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) II 6 Monotonicity – Addition of a document to R should not decrease the score : f (R ∪ {d}) ≥ f (R). DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 35. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) II 6 Monotonicity – Addition of a document to R should not decrease the score : f (R ∪ {d}) ≥ f (R). 7 Strength of Relevance – No f (.) ignores the relevance scores. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 36. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) II 6 Monotonicity – Addition of a document to R should not decrease the score : f (R ∪ {d}) ≥ f (R). 7 Strength of Relevance – No f (.) ignores the relevance scores. 8 Strength of Similarity – No f (.) ignores the similarity scores. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 37. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) II 6 Monotonicity – Addition of a document to R should not decrease the score : f (R ∪ {d}) ≥ f (R). 7 Strength of Relevance – No f (.) ignores the relevance scores. 8 Strength of Similarity – No f (.) ignores the similarity scores. No Function satisfies all 8 axioms DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 38. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) II 6 Monotonicity – Addition of a document to R should not decrease the score : f (R ∪ {d}) ≥ f (R). 7 Strength of Relevance – No f (.) ignores the relevance scores. 8 Strength of Similarity – No f (.) ignores the similarity scores. No Function satisfies all 8 axioms MaxSum Diversification Weighted sum of the sums of relevance and dissimilarity of items in the selected set. safisfies all axioms except stability. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 39. Towards Diverse Recommendation Novelty and Diversity in Information retrieval Axioms of Diversification (Gollapudi and Sharma 2009) II 6 Monotonicity – Addition of a document to R should not decrease the score : f (R ∪ {d}) ≥ f (R). 7 Strength of Relevance – No f (.) ignores the relevance scores. 8 Strength of Similarity – No f (.) ignores the similarity scores. No Function satisfies all 8 axioms MaxSum Diversification Weighted sum of the sums of relevance and dissimilarity of items in the selected set. safisfies all axioms except stability. MaxMin Diversification Weighted sum of the min relevance and min dissimilarity of items in the selected set. satisfies all axioms except consistency and stability. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 40. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Diversity Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty 3 Diversity Research in Recommender Systems Concentration Measures of Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 41. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measure of Diversity S-recall (Zhai and Lafferty 2006) S-recall at rank n is defined as the number of subtopics retrieved up to a given rank n divided by the total number of subtopics : Let Si ⊆ S be the number of subtopics in the ith document di then n | i=1 Si | S − recall@n = |S| Let minrank(S, k) = size of the smallest subset of documents that cover at least k subtopics. Usually most useful to consider S − recall@n where n = minrank(S, |S|) DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 42. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Diversity S-precision (Zhai and Lafferty 2006) S-precision at rank n is the ratio of the minimum rank at which a given recall value can optimally be achieved to the first rank at which the same recall value actually has been achieved. Let k = | n Si |. Then i=1 j minrank(S, k) S − precision@n = where m∗ = arg min | Si | ≥ k m∗ j i=1 DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 43. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Diversity α-NDCG (Clarke et al. 2008) Standard NDCG (Normalised Cumulative Discounted Gain) calculates a gain for each document based on its relevance and a logarithmic discount for the rank it appears at. Extended for diversity evaluation, the gain is incremented by 1 for each new subtopic, and αk (0 ≤ α ≤ 1) for a subtopic that has been seen k times in previously-ranked documents. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 44. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Diversity Intent-aware Precision (Agrawal et al. 2009) Intent-aware precision precIA is calculated by first calculating precision for each distinct subtopic separately, then averaging these precisions according to a distribution of the proportion of users that are interested in that subtopic: n 1 precIA@n = Pr(s|q) I(s ∈ di ) n s∈S i=1 DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 45. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty 3 Diversity Research in Recommender Systems Concentration Measures of Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 46. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty IR Measures of Novelty Novelty Measures (Agrawal et al. 2009) KL-divergence D(di ||dj ) is used to measure novelty of di wrt dj . Alternatively, di can be modelled as a mixture of dj and a background model. The higher the weight of dj in the mixture, the less novel is di wrt dj . Pairwise measures are combined to give overall measure of novelty wrt all documents in result set. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 47. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty Summary of IR Research Long recognised that the probability ranking principle does not adequately measure result list quality – the usefulness of a document depends on what other documents are on the list. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 48. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty Summary of IR Research Long recognised that the probability ranking principle does not adequately measure result list quality – the usefulness of a document depends on what other documents are on the list. Considering that each document consists of a set of subtopics, information nuggets or facets DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 49. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty Summary of IR Research Long recognised that the probability ranking principle does not adequately measure result list quality – the usefulness of a document depends on what other documents are on the list. Considering that each document consists of a set of subtopics, information nuggets or facets The novelty of a document is a measure of how much redundancy it contains, where it is redundant w.r.t. a facet, if that facet is already covered by another document. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 50. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty Summary of IR Research Long recognised that the probability ranking principle does not adequately measure result list quality – the usefulness of a document depends on what other documents are on the list. Considering that each document consists of a set of subtopics, information nuggets or facets The novelty of a document is a measure of how much redundancy it contains, where it is redundant w.r.t. a facet, if that facet is already covered by another document. The diversity of a result list is a measure of the number of relevant facets it contains. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 51. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty Summary of IR Research Long recognised that the probability ranking principle does not adequately measure result list quality – the usefulness of a document depends on what other documents are on the list. Considering that each document consists of a set of subtopics, information nuggets or facets The novelty of a document is a measure of how much redundancy it contains, where it is redundant w.r.t. a facet, if that facet is already covered by another document. The diversity of a result list is a measure of the number of relevant facets it contains. No complete consensus here – e.g. Gollapudi and Sharma (2009)define “novelty” as fraction of topics covered DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 52. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty Summary of IR Research Long recognised that the probability ranking principle does not adequately measure result list quality – the usefulness of a document depends on what other documents are on the list. Considering that each document consists of a set of subtopics, information nuggets or facets The novelty of a document is a measure of how much redundancy it contains, where it is redundant w.r.t. a facet, if that facet is already covered by another document. The diversity of a result list is a measure of the number of relevant facets it contains. No complete consensus here – e.g. Gollapudi and Sharma (2009)define “novelty” as fraction of topics covered Consider selecting document with least redundancy vs selecting document that improves overall diversity. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 53. Towards Diverse Recommendation Novelty and Diversity in Information retrieval IR Measures of Novelty Summary of IR Research In general, IR lines of research wrt diversity and novelty consider the following: Relevance scores for documents are not independent – need to consider relevance wrt to the entire result set, rather than each document in turn. Diversity is related to query ambiguity – Difference between selecting documents according to the probability of meaning; or Selecting documents to cover all meanings, so that at least one is relevant. Diversity is a measure of set; novelty is a measure of each document wrt a particular set in which it is contained. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 54. Towards Diverse Recommendation Diversity Research in Recommender Systems Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty 3 Diversity Research in Recommender Systems Concentration Measures of Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 55. Towards Diverse Recommendation Diversity Research in Recommender Systems Diversity – The Long Tail Problem Figure: Sales Demand for 1000 products DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 56. Towards Diverse Recommendation Diversity Research in Recommender Systems Diversity – The Long Tail Problem Figure: Top 2% of Most Popular Products Account for 13% of Sales DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 57. Towards Diverse Recommendation Diversity Research in Recommender Systems Diversity – The Long Tail Problem Figure: Least Popular Items Account for 30% of Sales DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 58. Towards Diverse Recommendation Diversity Research in Recommender Systems Diversity – The Long Tail Problem “Less is More” – Chris Anderson [Why the Future of Business is Selling Less of More] DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 59. Towards Diverse Recommendation Diversity Research in Recommender Systems Recommenders and The Long Tail Problem To support an increase in sales, need to increase the diversity of the set of recommendations made to the end-user. Recommend items in the long-tail that are highly likely to be liked by the current-user. Implies finding those items that are liked by the current user and relatively few other users. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 60. Towards Diverse Recommendation Diversity Research in Recommender Systems Diversity – The End-user Perspective Definition The diversity of a set L of size p is the average dissimilarity of the items in the set 2 fD (L) = (1 − s(i, j)) p(p − 1) i∈L j<i∈L We have found it useful to define novelty (or relative diversity) as follows: Definition The novelty of an item i in a set L is 1 nL (i) = (1 − s(i, j)) p−1 j∈L,j=i DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 61. Towards Diverse Recommendation Diversity Research in Recommender Systems Diversity – The End-user Perspective User Profile from Movielens Dataset, |Pu | = 764, N = 20, |Tu | = 0.1 × |Pu | 40% of most novel items accrue no hits at all. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 62. Towards Diverse Recommendation Diversity Research in Recommender Systems Other Definitions of Novelty/Diversity in RS Castells et al. (2011) outlines some of the ways that novelty impacts on recommender system design. Distinguishes item popularity and item similarity; user-relative measures and global measures Popularity-based novelty: novelty(i) = − log p(i) global measure or 1 − log(p(K|i)) novelty(i) = − log(p(i|u)) user perspective or 1 − log(p(K|i, u)) Similarity perspective novelty(i|S) = p(j|S)d(i, j) j∈S DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 63. Towards Diverse Recommendation Diversity Research in Recommender Systems Novelty for Recommender Systems Pablo introduces rank-sensitive and relevance-aware measures of recommendation set diversity and novelty Recommendation Novelty Metric m(R|u) = disc(n)p(rel|in , u)novelty(in |u) n Novelty-Based Diversity Metric novelty(R|u) = disc(n)p(rel|in , u)p(j|u)d(in , j) n,j∈u diversity(R|u) = disc(n)disc(k)p(rel|in , u)p(j|u)d(in , ik ) k<n DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 64. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty 3 Diversity Research in Recommender Systems Concentration Measures of Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 65. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Evaluating Diversity In our 2009 RecSys paper, we evaluated our diversification method on test sets T (µ) consisting of items chosen from the top 100 × (1 − µ)% most novel items in the user profiles. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 66. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Toy Example Motivate our diversity methodology using a toy example in which a user-base of four users, u1 , u2 , u3 , u4 is recommended items from a catalogue of 4 items i1 , i2 , i3 , i4 . The system recommends N = 2 items to each user. Any particular scenario can be represented in a table that indicates whether a user actually likes an item or not (1 or 0) and the probability that the recommender system will recommend the corresponding item to the user. Assume that G1 = {i1 , i2 } is a single genre (e.g. horror movies) and G2 = {i3 , i4 } is another. Simple similarity measure s(i1 , i2 ) = s(i3 , i4 ) = 1 and cross-genre similarities are zero. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 67. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Toy Example Biased but Full Recommended Set Diversity i1 i2 i3 i4 u1 1 (1) 1 (0) 1 (1) 2 1 0 (2) u2 1 (0) 1 (1) 1 (1) 2 1 0 (2) u3 0 (1) 2 1 1 (2) 1 (1) 1 (0) u4 1 (1) 0 (0) 1( 1 ) 2 1 (1) 2 Always recommends an item from G1 and an item from G2 . Probability of i1 being recommended to a randomly selected user – 1 (1 + 0 + 1 + 1) = 5 – is higher than that of i2 ( 8 ), 4 2 8 3 for instance. Recommendations do not spread evenly across the product catalogue. Biased towards consistently recommending i1 to u1 but never recommending i2 to u1 . DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 68. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Toy Example No System Level Biases i1 i2 i3 i4 1 u1 1 (1) 1 (3) 1 (1) 3 1 0 (3) u2 0 (1) 3 1 (1) 1 1 (3) 1 1 (3) u3 1 (1) 3 1 0 (3) 1 (1) 1 1 (3) u4 1 (1) 3 1 1 (3) 0 (1) 3 1 (1) The probability of recommending i1 , to a randomly chosen relevant user (i.e. u1 , u3 or u4 ) is 3 (1 + 1 + 1 ) = 9 . 1 3 3 5 Similarly, for i2 , i3 and i4 . Focusing on the set of items that are relevant to u1 (i.e. i1 , i2 and i3 ), the algorithm is three times as likely to recommend i1 as either of the other relevant items. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 69. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Toy Example No System or User Level Biases i1 i2 i3 i4 u1 1 (1) 3 1 1 (3) 1 (1) 3 0 (1) 1 u2 0 (1) 1 (3) 1 (1) 3 1 1 (3) u3 1 (1) 3 0 (1) 1 1 (3) 1 1 (3) u4 1 (1) 3 1 1 (3) 0 (1) 1 1 (3) Same probability of recommending any relevant item to a user Same probability that an item is recommended when it is relevant. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 70. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Algorithm Diversity Definition We define an algorithm to be fully diverse from the user perspective if it recommends any of the user’s set of relevant items with equal probability. Definition We define an algorithm to be fully diverse from the system perspective if the probability of recommending an item, when it is relevant is equal across all items. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 71. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Lorenz Curve and the Gini Index A plot of the cumulative proportion of the product catalogue against cumulative proportion of sales DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 72. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Lorenz Curve and the Gini Index 69% of the sales are of the 10% top selling products. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 73. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Lorenz Curve and the Gini Index G = 0 implies equal sales to all products. G = 1 when single product gets all sales. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 74. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Measuring Recommendation Success Measurement unit of success in recommender systems = Hit Interpret as the recommendation of a product known to be liked by the user. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 75. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Hits Inequality – Concentration Curves of Hits Lorenz curve and gini index measure inequality within the hits distribution over all items in the product catalogue. Concentration curve and concentration index of hits vs popularity measures bias of hits distribution towards popular items. Concentraion curve and concentration index of hits vs novelty measures bias of hits distribution towards novel items. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 76. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Concentration Curves n products accrue hits {h1 , . . . , hn } – concentration curve depends on correlation between hits and popularity. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 77. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Concentration Curves n products accrue hits {h1 , . . . , hn } – concentration curve depends on correlation between hits and popularity. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 78. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Concentration Curves n products accrue hits {h1 , . . . , hn } – concentration curve depends on correlation between hits and popularity. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 79. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Concentration Curves n products accrue hits {h1 , . . . , hn } – concentration curve depends on correlation between hits and popularity. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 80. Towards Diverse Recommendation Diversity Research in Recommender Systems Concentration Measures of Diversity Temporal Diversity Lathia et al. (2010) investigates diversity over time – do recommendations change over time? Now diversity is measured between two recommended sets, formed at different points in time 1 |Ri+1 Ri | diversity(Ri+1 , Ri ) = n And novelty is measured as the number of new items over all time 1 novelty(Ri+1 ) = |Ri+1 ∪i Ri | j=1 n kNN algorihms exhibit more temporal diversity than SVD matrix factorisation Switching between multiple algorithms is offered as one means to improve temporal diversity. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 81. Towards Diverse Recommendation Diversity Research in Recommender Systems Serendipity Outline 1 Setting the Context 2 Novelty and Diversity in Information retrieval IR Measures of Diversity IR Measures of Novelty 3 Diversity Research in Recommender Systems Concentration Measures of Diversity Serendipity DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 82. Towards Diverse Recommendation Diversity Research in Recommender Systems Serendipity Measuring the Unexpected Serendipity – the extent to which recommendations may positively surprise users. Murakami et al. (2008) propose to measure unexpectedness as the “distance between results produced by the method to be evaluated and those produced by a primitive prediction method”. n i 1 j=1 rel(sj ) = max(Pr(si ) − Prim(si ), 0) × rel(si ) × n i i=1 Ge et al. (2010) follow a similar approach, such that if R1 is the recommended set returned by the RS and R2 is a set returned by Prim then 1 serendipity = rel(sj ) |R1 R2 | sj ∈R1 R2 DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 83. Towards Diverse Recommendation Diversity Research in Recommender Systems Serendipity Novelty vs Serendipity Novelty with regard to a given set is a measure of how different an item is to other items in the set; It does not involve any notion of relevance Is a serendipitous recommendation equivalent to a relevant novel recommendation? To me, serendipity encapsulates a higher degree of risk – a novel item with a low chance of relevance, according to our model, which yet turns out to be relevant. DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 84. Towards Diverse Recommendation Diversity Research in Recommender Systems Serendipity Conclusions IR research gives some directions in how to define and evaluate diversity and novelty We can ask Are these adequate for RS research? Can we map them to the needs of RS evaluation? How are they deficient? Recent research is beginning to clarify these issues for RS I believe that objective measures are possible DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
  • 85. Towards Diverse Recommendation Diversity Research in Recommender Systems Serendipity Conclusions IR research gives some directions in how to define and evaluate diversity and novelty We can ask Are these adequate for RS research? Can we map them to the needs of RS evaluation? How are they deficient? Recent research is beginning to clarify these issues for RS I believe that objective measures are possible I look forward to some interesting discussions on these issues!! DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
    Serendipity



Thank You

              My research is sponsored by Science Foundation Ireland
              under grant 08/SRC/I1407: Clique: Graph and Network
              Analysis Cluster.



      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
    Serendipity



References I

      Agrawal, R., Gollapudi, S., Halverson, A. and Ieong, S.: 2009,
      Diversifying search results, Proceedings of the Second ACM
      International Conference on Web Search and Data Mining, WSDM ’09,
      ACM, New York, NY, USA, pp. 5–14.
      URL: http://doi.acm.org/10.1145/1498759.1498766

      Boyce, B. R.: 1982, Beyond topicality: A two-stage view of
      relevance and the retrieval process, Inf. Process. Manage. 18(3),
      105–109.



      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
    Serendipity



References II

      Carbonell, J. and Goldstein, J.: 1998, The use of MMR,
      diversity-based reranking for reordering documents and producing
      summaries, Proceedings of the 21st annual international ACM SIGIR
      conference on Research and development in information retrieval,
      SIGIR ’98, ACM, New York, NY, USA, pp. 335–336.
      URL: http://doi.acm.org/10.1145/290941.291025

      Castells, P., Vargas, S. and Wang, J.: 2011, Novelty and Diversity
      Metrics for Recommender Systems: Choice, Discovery and Relevance,
      International Workshop on Diversity in Document Retrieval (DDR
      2011) at the 33rd European Conference on Information Retrieval
      (ECIR 2011).



      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
    Serendipity



References III

      Chen, H. and Karger, D. R.: 2006, Less is more: probabilistic
      models for retrieving fewer relevant documents, in
      E. N. Efthimiadis, S. T. Dumais, D. Hawking and K. Järvelin (eds),
      SIGIR, ACM, pp. 429–436.

      Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O.,
      Ashkan, A., Büttcher, S. and MacKinnon, I.: 2008, Novelty and
      diversity in information retrieval evaluation, Proceedings of the
      31st annual international ACM SIGIR conference on Research and
      development in information retrieval, SIGIR ’08, ACM, New York,
      NY, USA, pp. 659–666.
      URL: http://doi.acm.org/10.1145/1390334.1390446



      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
    Serendipity



References IV

      Ge, M., Delgado-Battenfeld, C. and Jannach, D.: 2010, Beyond
      accuracy: evaluating recommender systems by coverage and
      serendipity, Proceedings of the fourth ACM conference on
      Recommender systems, RecSys ’10, ACM, New York, NY, USA,
      pp. 257–260.
      URL: http://doi.acm.org/10.1145/1864708.1864761

      Goffman, W.: 1964, On relevance as a measure, Information Storage
      and Retrieval 2(3), 201–203.

      Gollapudi, S. and Sharma, A.: 2009, An axiomatic approach for
      result diversification, Proceedings of the 18th international
      conference on World wide web, WWW ’09, ACM, New York, NY, USA,
      pp. 381–390.
      URL: http://doi.acm.org/10.1145/1526709.1526761



      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems
Towards Diverse Recommendation
  Diversity Research in Recommender Systems
    Serendipity



References V

      Lathia, N., Hailes, S., Capra, L. and Amatriain, X.: 2010,
      Temporal diversity in recommender systems, in F. Crestani,
      S. Marchand-Maillet, H.-H. Chen, E. N. Efthimiadis and J. Savoy
      (eds), SIGIR, ACM, pp. 210–217.

      Murakami, T., Mori, K. and Orihara, R.: 2008, Metrics for
      evaluating the serendipity of recommendation lists, in K. Satoh,
      A. Inokuchi, K. Nagao and T. Kawamura (eds), New Frontiers in
      Artificial Intelligence, Vol. 4914 of Lecture Notes in Computer
      Science, Springer Berlin / Heidelberg, pp. 40–46.

      Zhai, C. and Lafferty, J.: 2006, A risk minimization framework for
      information retrieval, Inf. Process. Manage. 42, 31–55.
      URL: http://dx.doi.org/10.1016/j.ipm.2004.11.003



      DiveRS: International Workshop on Novelty and Diversity in Recommender Systems