Empirical Evaluation of Active
Learning in Recommender Systems
Mehdi Elahi
Postdoc Researcher
Politecnico di Milano, Italy
1
seminar @ Politecnico di Milano
July 2015
www.linkedin.com/in/mehdielahi
My Previous Research Group
2
Ph.D. Adviser:
Francesco Ricci
Full Professor
Dean of the Faculty of Computer Science
https://www.inf.unibz.it/idse/
My Current Research Group
3
Research Adviser:
Paolo Cremonesi
Associate Professor
DEIB @ Politecnico di Milano
http://recsys.deib.polimi.it
Outline
¤ Introduction
¤ Active Learning in RS
¤ Offline Evaluation and Results
¤ Application to Mobile RS 
¤ Conclusion and Future Works
4
Introduction
¤ Recommender Systems (RSs) are tools that support users'
decision making by suggesting products that may be
interesting to them.
¤ Examples of Recommender Systems:
5
Introduction
¤ Collaborative Filtering:
¤ A technique that predicts unknown ratings by exploiting the
ratings given by users, and recommends the items with the
highest predicted ratings (a minimal sketch follows this slide).
6
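As an illustration of the prediction step described above, here is a minimal matrix-factorization sketch of the kind commonly used for collaborative filtering. It is not the exact model used in the cited experiments; all function names and hyper-parameters are assumptions.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=10, lr=0.01, reg=0.05, epochs=50):
    """Learn latent user/item factors from (user, item, rating) triples via SGD."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                  # prediction error for a known rating
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

def recommend(P, Q, user, known_items, top_n=5):
    """Recommend the items with the highest predicted ratings that the user has not rated."""
    scores = Q @ P[user]                           # predicted ratings for every item
    scores[list(known_items)] = -np.inf            # never re-recommend already rated items
    return np.argsort(scores)[::-1][:top_n]

# toy usage with a handful of (user, item, rating) triples
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
P, Q = train_mf(ratings, n_users=3, n_items=4)
print(recommend(P, Q, user=0, known_items={0, 1}))
```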
Sparsity of the Data
¤ In Netflix: 98.8% of the ratings are unknown
¤ In Movielens: 95.7% of the ratings are unknown
[Figure: a sparse user-item rating matrix; most user-item entries have no rating]
7
Active Learning for Collaborative
Filtering
¤ Active Learning:
¤ Requests and tries to collect more ratings from the users
before offering recommendations
8
Which Items should be chosen?
¤ Not all the ratings are equally useful, i.e., they do not
bring equal information to the system.
¤ To minimize the user's rating effort, only some of them
should be requested and acquired
9
Definition of AL Strategy
¤ An active learning strategy for collaborative
filtering is a set of rules for choosing the best items
for the users to rate
10
Non-Personalized Strategies
¤  Random: selects items randomly (baseline)
¤  Popularity: scores an item according to the frequency of its ratings
and then chooses the highest scored items (Carenini, 2003)
¤  Entropy: scores each item with the entropy of its ratings and then
chooses the highest scored items (Rashid, 2002 and 2008)
¤  Variance: scores each item with the variance of its ratings and
then chooses the highest scored items (Rubens, 2011)
¤  log(Popularity)*Entropy: combines the popularity and entropy
scores and then chooses the highest scored items (Rashid, 2002
and 2008); a scoring sketch follows this slide
11
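A hedged sketch of how these non-personalized scores could be computed from the currently known ratings; the exact formulations in the cited papers may differ slightly (e.g. in how log-popularity and entropy are combined), so treat the helper below as illustrative only.

```python
import numpy as np

def _entropy(r):
    """Shannon entropy of the rating-value distribution (ratings on a 1-5 scale)."""
    if len(r) == 0:
        return 0.0
    p = np.bincount(r.astype(int), minlength=6)[1:] / len(r)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def item_scores(R, strategy, rng=np.random.default_rng(0)):
    """R: user x item rating matrix with np.nan for unknown entries. Returns one score per item."""
    scores = np.zeros(R.shape[1])
    for i in range(R.shape[1]):
        r = R[:, i][~np.isnan(R[:, i])]              # known ratings of item i
        if strategy == "random":
            scores[i] = rng.random()
        elif strategy == "popularity":
            scores[i] = len(r)                       # number of known ratings
        elif strategy == "variance":
            scores[i] = r.var() if len(r) > 1 else 0.0
        elif strategy == "entropy":
            scores[i] = _entropy(r)
        elif strategy == "log_pop_entropy":
            scores[i] = np.log(len(r) + 1) * _entropy(r)
    return scores

# the strategy then asks the user to rate the highest-scored items, e.g.:
# top_items = np.argsort(item_scores(R, "log_pop_entropy"))[::-1][:10]
```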
Personalized Single Strategies
¤  Highest Predicted: scores an item according to the prediction of
its ratings and then chooses the highest scored items (Elahi, 2011)
¤  Lowest Predicted: scores an item according to the prediction of its
ratings and then chooses the lowest scored items (Elahi, 2011)
¤  Highest-Lowest Predicted: combines the highest predicted and
lowest predicted scores and chooses the highest and lowest
scored items (Elahi, 2011)
¤  Binary Prediction: scores an item according to the prediction of its
ratings (using transformed matrix of user-item) and then chooses
the highest scored items (Elahi, 2011)
¤  Personality-based binary prediction: extends the binary prediction
strategy by using user attributes, such as the scores for the Big Five
personality traits on a scale from 1 to 5 (Elahi, 2013); a sketch follows this slide.
12
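One possible reading of the binary-prediction idea, sketched below: the rating matrix is turned into a rated/not-rated matrix, a low-rank model predicts how likely each user is to have experienced each item, and the personality-based variant adds the Big Five scores as extra user features with item-specific weights. This is an assumed stand-in only; the factorization details of the cited papers are not reproduced here.

```python
import numpy as np

def binary_prediction_scores(R, personality=None, k=8, lr=0.05, reg=0.02, epochs=30):
    """R: user x item rating matrix (np.nan = unknown). personality: optional user x 5 matrix of
    Big Five scores. Returns user x item scores estimating how likely each user can rate each item."""
    B = (~np.isnan(R)).astype(float)                 # 1 = rated, 0 = not rated (yet)
    n_users, n_items = B.shape
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    W = None if personality is None else rng.normal(scale=0.1, size=(n_items, personality.shape[1]))
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                pred = P[u] @ Q[i] + (W[i] @ personality[u] if W is not None else 0.0)
                err = B[u, i] - pred
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * P[u] - reg * Q[i])
                if W is not None:                    # personality-based extension
                    W[i] += lr * (err * personality[u] - reg * W[i])
    S = P @ Q.T + (personality @ W.T if W is not None else 0.0)
    S[B == 1] = -np.inf                              # do not ask for ratings the system already has
    return S                                         # request the highest-scored items per user
```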
Personalized Combined Strategies
¤ Combined with Voting: scores an item according to the
votes given by a committee of different strategies and then
chooses the highest scored items (Elahi, 2011); a sketch follows this slide
¤ Combined with Switching: adaptively selects a strategy from
a pool of individual AL strategies, based on the estimation of how
well each strategy is able to cope with the conditions at hand.
Then the selected strategy scores an item according to its
criterion (Elahi, 2012)
13
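A minimal sketch of the voting combination, assuming each committee strategy votes for its own top-N items and the items with the most votes are requested; the switching variant would instead pick one strategy per iteration based on an estimate of how well it currently performs. The helper names reuse the hypothetical item_scores() sketched earlier.

```python
import numpy as np
from collections import Counter

def voting_strategy(R, committee, score_fn, n_request=10, votes_per_strategy=20):
    """Each strategy in the committee votes for its top items; the most-voted items are requested."""
    votes = Counter()
    for strategy in committee:
        scores = score_fn(R, strategy)                       # e.g. item_scores() from the earlier sketch
        top = np.argsort(scores)[::-1][:votes_per_strategy]  # this strategy's votes
        votes.update(int(i) for i in top)
    return [item for item, _ in votes.most_common(n_request)]

# example: a committee of three non-personalized strategies
# to_request = voting_strategy(R, ["popularity", "entropy", "variance"], item_scores)
```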
Offline Evaluation (A)
¤  Datasets are partitioned into three subsets (a partitioning sketch follows this slide):
¤ Known (K): contains the rating values that are considered to be known by the
system at a certain point in time.
¤ Unknown (X): contains the ratings that are considered to be known by the
users but not by the system. These ratings are incrementally elicited, i.e., they are
transferred into K if the system asks the (simulated) users for them.
¤ Test (T): contains the ratings that are never elicited and are used only to test the
recommendation effectiveness after the system has acquired the new elicited
ratings.
Netflix
No. of users: 480189
No. of items: 17770
No. of ratings: 100M*
Time span: 1998 – 2005
*We used the first 1M ratings
Movielens
No. of users: 6040
No. of items: 3900
No. of ratings: 1M
Time span: 2000 – 2003
14
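A sketch of how such a K / X / T split can be simulated offline; the split proportions below are placeholders, not the ones used in the reported experiments.

```python
import numpy as np

def split_k_x_t(R, known_frac=0.1, test_frac=0.2, seed=0):
    """Partition the observed entries of a rating matrix into Known (K), Unknown (X) and Test (T)."""
    rng = np.random.default_rng(seed)
    users, items = np.where(~np.isnan(R))            # positions of all observed ratings
    idx = rng.permutation(len(users))
    n_known = int(known_frac * len(idx))
    n_test = int(test_frac * len(idx))
    K = np.full_like(R, np.nan)                      # known to the system
    T = np.full_like(R, np.nan)                      # held out for testing, never elicited
    X = np.full_like(R, np.nan)                      # known to the (simulated) users only
    parts = ((K, idx[:n_known]), (T, idx[n_known:n_known + n_test]), (X, idx[n_known + n_test:]))
    for matrix, sel in parts:
        matrix[users[sel], items[sel]] = R[users[sel], items[sel]]
    return K, X, T
```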
Learning Iteration
Item Score
1 151
2 44
3 7
4 1
5 42
6 34
7 9
8 55
9 20
… …
N 12
The system computes the
scores of all the items
that can be scored
(according to a strategy)
15
Learning Iteration
Top-10 items Score
1 151
8 55
43 54
11 50
2 44
5 42
6 34
22 33
75 29
13 25
The system selects
the top 10 items
and presents them
to the simulated
user
16
Learning Iteration
The items that are
rated in the unknown
set (X) are found and
transferred to the
known set (K)
Rated
items
1
2
5
75
13
17
Learning Iteration
The items that are
rated in the unknown
set (X) are found and
transferred to the
known set (K); a simulation sketch follows this slide
18
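Putting the pieces together, one learning iteration of the simulation can be sketched as below, building on the hypothetical helpers from the previous slides: items are scored on the known set K, the top items are requested from each simulated user, and the requests whose ratings exist in X are transferred into K.

```python
import numpy as np

def learning_iteration(K, X, strategy, score_fn, n_request=10):
    """One active-learning iteration; returns the number of successfully acquired ratings."""
    scores = score_fn(K, strategy)                   # score items using only the known ratings
    top = np.argsort(scores)[::-1][:n_request]       # items to request (non-personalized case)
    acquired = 0
    for user in range(K.shape[0]):
        for i in top:
            if not np.isnan(X[user, i]):             # the simulated user knows this rating...
                K[user, i] = X[user, i]              # ...so it is transferred to the known set
                X[user, i] = np.nan
                acquired += 1
    return acquired
```

A personalized strategy would compute a separate score vector per user instead of one global ranking; the transfer step stays the same.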
System-wide vs User-centered
19
We have conducted a system-wide evaluation.
Results: MAE
¤ Mean Absolute Error
¤ The lower the better.
¤ Measures the average absolute deviation of the predicted rating
from the user's true rating (see the formula after this slide).
[Figure: MAE over 200 learning iterations for the random, popularity, lowest-pred, highest-pred, and voting strategies]
(Elahi, 2011)
20
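For reference, the standard definition of MAE over the test set T, in my notation:

```latex
\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left| \hat{r}_{ui} - r_{ui} \right|
```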
Effect on data distribution
[Figure: the known user-item rating matrix before and after rating elicitation; the newly acquired ratings are mostly high values]
21
Histogram of Known Set
¤ Prediction Bias
¤ Since the majority of the ratings added by the highest-predicted
strategy have high values, the predictions for the test set are biased
[Figure: rating-value histograms (probability of rating values 1-5) of the known set at iterations 1, 20, 40, and 60]
22
Evaluation: NDCG
¤ Normalized Discounted Cumulative Gain:
¤ The higher the better.
¤ The recommendations for u are sorted according to the
predicted rating values, then DCG_u is computed
(see the formula after this slide).
[Figure: NDCG over 200 learning iterations for the random, popularity, lowest-pred, highest-pred, and voting strategies]
(Elahi, 2011)
23
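For reference, a standard formulation with relevance taken from the true ratings and the ranking induced by the predicted ratings (my notation; i_p is the item ranked at position p and IDCG_u is the DCG of the ideal ordering):

```latex
\mathrm{DCG}_u = \sum_{p=1}^{N} \frac{r_{u, i_p}}{\log_2(p + 1)}, \qquad
\mathrm{NDCG}_u = \frac{\mathrm{DCG}_u}{\mathrm{IDCG}_u}
```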
Evaluation: Precision
¤ Precision: percentage of the items with rating values (as in T)
equal to 4 or 5 in the top 10 recommended items
(see the formula after this slide).
[Figure: precision over 200 learning iterations for the random, popularity, lowest-pred, highest-pred, and voting strategies]
(Elahi, 2011)
24
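Written out for the top-10 list of user u (my notation):

```latex
\mathrm{Precision@10}_u = \frac{\bigl|\{\, i \in \mathrm{Top10}(u) \;:\; r_{ui} \in T,\; r_{ui} \geq 4 \,\}\bigr|}{10}
```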
Successful Requests
¤ The ratio of the ratings acquired over those requested at
different iterations.
Offline Evaluation (B)
26
Offline Evaluation (B)
¤ All the strategies show non-monotonic behavior, with many
fluctuations, since the test set changes dynamically every week.
¤ However, the proposed strategies still perform very well in this
setting compared to the baseline.
[Figure: left, MAE over the learning iterations in the traditional evaluation setting without natural acquisition (Elahi, 2011), for the random, highest-pred, log(pop)*entropy, and voting strategies; right, MAE over the weeks in the proposed evaluation setting with natural acquisition (Elahi, 2012), additionally including the switching strategy and the natural-acquisition baseline]
27
Evaluation: MAE (normalized)
¤  The highest predicted strategy (the default strategy of RSs) does not
perform very differently from the natural acquisition of ratings.
¤  In fact, it does not acquire ratings in addition to those collected by
the natural rating acquisition, i.e., the users rate these items on their
own initiative anyway
[Figure: normalized MAE over the weeks for natural acquisition and the random, highest-pred, log(pop)*entropy, voting, and switching strategies]
(Elahi, 2012)
28
Evaluation: NDCG (normalized)
[Figure: normalized NDCG over the weeks for natural acquisition and the random, highest-pred, log(pop)*entropy, voting, and switching strategies]
(Elahi, 2012)
29
¤ Our proposed Voting and Switching strategies both
perform very well.
Conclusion of Offline Evaluations
¤  We demonstrate that it is possible to adapt to the changes in
the characteristics of the rating dataset by proposing two novel
AL strategies:
¤  Combined with Voting
¤  Combined with Switching
¤  We propose a more realistic active learning evaluation setting in
which ratings are added not only by the AL strategies, but also by
users without being prompted to rate (natural rating
acquisition).
¤  Our results show that the natural rating acquisition
considerably influences and changes the performance of
the AL strategies.
30
Application: South Tyrol Suggests(STS)
¤ A mobile Android context-aware
RS that recommends places of
interests (POIs) in the South Tyrol
region.
¤ The system was in an extreme
cold-start situation (only 700
ratings for total of 27,000 POIs).
31
STS: Personality Questionnaire
Big Five Personality Traits:
Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
32
STS: Personality Questionnaire
Big Five Personality Traits:
Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
33
STS: Active Learning
¤ Using the personality of the user in
the prediction model, the system
estimates which POIs the user has
likely experienced, and hence can
rate.
34
STS: Contextual Factors
35
STS: Recommendations
¤ STS computes rating predictions for
all POIs from the database, using
the personality information of the
users and the ratings they have
given to the POIs.
36
User Study
Research Hypotheses:
¤ Our proposed personality-based active learning
strategy leads to a larger number of acquired user
ratings and related contextual conditions.
¤ The prediction accuracy and context-awareness of the
recommendation model improves the most when
utilizing our proposed active learning strategy.
37
Results: MAE
[Figure: MAE of the compared strategies in the live user study]
38
Results: Ratio of the Rating Acquisition
Pairs of strategies                                               Means         p-value   # of ratings
Random / log(popularity) * entropy                                1.35 / 2.07   < 0.001   73 / 112
Random / personality-based binary prediction                      1.35 / 2.31   < 0.001   73 / 125
Personality-based binary prediction / log(popularity) * entropy   2.31 / 2.07   0.005     125 / 112
39
Results: Ratio of the Rating Acquisition
[Figure: number of acquired ratings per user over time, with regression lines, for the Random, Log(popularity) * Entropy, and Personality-Based Binary Prediction strategies]
40
Results: Context-Awareness
41
                 log(popularity) * entropy    personality-based binary pred.
Q1               3.58                          3.56
Q2               2.95                          3.31
# of contexts    1.01                          1.52
Results: Context-Awareness
42
Comparison of MAE (the lower the better) and
nDCG (the higher the better)
Conclusion of User Study
In a live user study, we have:
ü  shown that user personality has an important impact on
users' rating behavior.
ü  successfully verified both research hypotheses, i.e., the
personality-based active learning strategy acquired
more ratings and improved the rating prediction
accuracy the most.
43
Main Contributions
¤  Proposing several novel personalized active learning
strategies for collaborative filtering.
¤  Offline evaluation of several active learning strategies with
regard to their system-wide effectiveness.
¤  Comprehensive evaluation of active learning strategies with
regard to several evaluation measures.
¤  Evaluation of active learning strategies with and without
natural acquisition of ratings.
¤  Application of active learning in an up-and-running mobile
context-aware recommender system.
44
Future Works
45
¤ Gamification in Active Learning for RS: making the
rating process more fun and enjoyable for the user.
Shoot the ball to the
place you visited
and liked the most
Future Works
46
¤ Active Learning for Relevant
Context Selection: how to
select context factors that are
relevant to the items.
Which contextual condition is more
relevant to this item?
Future Works
47
¤ Sequential Active Learning: selecting and presenting
the items for the user to rate incrementally, one at a time.
¤ Hence, the system can immediately adapt the
remaining rating requests.
item 1 item 2 item 3 item 4
Future Works
48
[Figure: cross-domain active learning for a new user: rating matrices of a target domain and an auxiliary source domain, linked by user personality, with active learning applied in both domains]
Future Works
49
[Figure: example items with low color variance vs. high color variance]
Publications on AL
Book Chapter:
2015
¤  N. Rubens, M. Elahi, M. Sugiyama, and D. Kaplan, Active Learning in Recommender Systems. Book
chapter in Recommender Systems Handbook, Springer Verlag, 2015
Journal:
2016
¤  M. Elahi, F. Ricci, N. Rubens. A survey of active learning in collaborative filtering recommender systems.
Computer Science Review, 2016, Elsevier
¤  I. Fernández-Tobías, M. Braunhofer, M. Elahi, F. Ricci, and I. Cantador. Alleviating the New User Problem
in Collaborative Filtering by Exploiting Personality Information. User Modeling and User-Adapted
Interaction (UMUAI), Personality in Personalized Systems, 2016, Springer
2014
¤  M. Braunhofer, M. Elahi, and F. Ricci. Techniques for cold-starting context-aware mobile recommender
systems for tourism. Intelligenza Artificiale, 8(2):129–143, 2014
2013
¤  M. Elahi, F. Ricci, and N. Rubens. Active learning strategies for rating elicitation in collaborative filtering:
A system-wide perspective. ACM Transactions on Intelligent Systems and Technology (TIST), 5(1):13, 2013
50
Full list @ www.researchgate.net/profile/Mehdi_Elahi2
Publications on AL
Conference:
2015
¤  M. Braunhofer, M. Elahi, and F. Ricci. User personality and the new user problem in a context-aware
points of interest recommender system. In Information and Communication Technologies in Tourism
2015. Springer International Publishing, 2015
2014
¤  M. Elahi, F. Ricci, and N. Rubens. Active learning in collaborative filtering recommender systems. In
E-Commerce and Web Technologies (EC-Web), pages 113–124. Springer International Publishing, 2014
¤  M. Braunhofer, M. Elahi, M. Ge, and F. Ricci. Context dependent preference acquisition with personality-
based active learning in mobile recommender systems. In Learning and Collaboration Technologies.
Technology-Rich Environments for Learning and Collaboration, pages 105–116. Springer International
Publishing, 2014
2013
¤  M. Elahi, M. Braunhofer, F. Ricci, and M. Tkalcic. Personality-based active learning for collaborative
filtering recommender systems. In AI*IA 2013: Advances in Artificial Intelligence, pages 360–371.
Springer International Publishing, 2013
51
Full list @ www.researchgate.net/profile/Mehdi_Elahi2
Publications on AL
Conference:
2012
¤  M. Elahi, F. Ricci, and N. Rubens. Adapting to natural rating acquisition with combined active learning
strategies. In Foundations of Intelligent Systems, pages 254–263. Springer Berlin Heidelberg, 2012
2011
¤  M. Elahi, V. Repsys, and F. Ricci. Rating elicitation strategies for collaborative filtering. In E-Commerce
and Web Technologies (EC-Web), pages 160–171. Springer Berlin Heidelberg, 2011
52
Full list @ www.researchgate.net/profile/Mehdi_Elahi2
Thank you!
53
seminar @ Politecnico di Milano
July 2015
