Letting Users Choose Recommender Algorithms

Michael Ekstrand
(Texas State University)
Daniel Kluver, Max Harper, and Joe Konstan
(GroupLens Research / University of Minnesota)

Research Objective
If we give users control over the algorithm providing
their recommendations, what happens?

Why User Control?
• Different users, different needs/wants
• Allow users to personalize the recommendation
experience to their needs and preferences.
• Transparency and control may promote trust

Research Questions
• Do users make use of a switching feature?
• How much do they use it?
• What algorithms do they settle on?
• Do algorithm or user properties predict choice?

Relation to Previous Work
Paper you just saw: tweak algorithm output
We change the whole algorithm
Previous study (RecSys 2014): what do users perceive
to be different, and say they want?
We see what their actions say they want

Outline
1. Introduction (just did that)
2. Experimental Setup
3. Findings
4. Conclusion & Future Work

Context: MovieLens
• Let MovieLens users switch between algorithms
• Algorithm produces:
• Recommendations (in sort-by-recommended mode)
• Predictions (everywhere)
• Change is persistent until next tweak
• Switcher integrated into top menu

Algorithms
• Four algorithms
• Peasant (baseline): personalized (user-item) mean rating
• Bard: group-based recommender (Chang et al. CSCW
2015)
• Warrior: item-item CF
• Wizard: FunkSVD CF
• Each modified with 10% blend of popularity rank
for top-N recommendation
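
The slides do not say how the 10% popularity blend is computed. One plausible reading, sketched below, is a linear mix of two rank scores: 90% from the algorithm's predicted order and 10% from popularity order. The function names `rank_scores` and `blend_top_n`, the `pop_weight` parameter, and the rank-score formula are all assumptions made for illustration, not the MovieLens implementation.

```python
def rank_scores(items_best_first):
    """Map each item to a score in (0, 1]; the best-ranked item gets 1.0."""
    n = len(items_best_first)
    return {item: (n - pos) / n for pos, item in enumerate(items_best_first)}


def blend_top_n(predicted, popularity, n=24, pop_weight=0.1):
    """Rank items by a weighted mix of prediction rank and popularity rank.

    predicted:  dict item -> predicted rating (higher is better)
    popularity: dict item -> rating count (higher is more popular)
    Assumption: the blend is a linear mix of rank scores; the slides only
    say "10% blend of popularity rank".
    """
    items = list(predicted)
    by_pred = rank_scores(sorted(items, key=lambda i: predicted[i], reverse=True))
    by_pop = rank_scores(sorted(items, key=lambda i: popularity.get(i, 0), reverse=True))
    blended = {
        i: (1 - pop_weight) * by_pred[i] + pop_weight * by_pop[i] for i in items
    }
    return sorted(items, key=lambda i: blended[i], reverse=True)[:n]
```

With a small weight the predicted order dominates and popularity mostly breaks near-ties; raising `pop_weight` pulls popular items toward the top of the list.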

Experiment Design
• Only consider established users
• Each user randomly assigned an initial algorithm
(not the Bard)
• Allow users to change algorithms
• Interstitial highlighted feature on first login
• Log interactions

Users Switch Algorithms
• 3005 total users
• 25% (748) switched at least once
• 72.1% of switchers (539) settled on a different
algorithm
Finding 1: Users do use the control

Ok, so how do they switch?
• Many times or just a few?
• Repeatedly throughout their use, or find an
algorithm and stick with it?

Switching Behavior: Few Times
[Figure: Transition Count Histogram. x-axis: # of transitions; y-axis: number of users]
# of Transitions: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
Users: 196, 157, 118, 63, 54, 32, 12, 21, 22, 12, 11, 4, 7, 3, 5, 4, 1, 4, 2

Switching Behavior: Few Sessions
• Break sessions at 60 mins of inactivity
• 63% only switched in 1 session, 81% in at most 2 sessions
• 44% only switched in 1st session
• Few intervening events (switches concentrated)
Finding 2: users use the menu some, then leave it
alone
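
The session rule above (break at 60 minutes of inactivity) can be sketched as follows. This is a minimal illustration, assuming each user's log is available as a list of `datetime` values; the helper names `split_sessions` and `sessions_with_switch` are hypothetical and not taken from the study's code.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=60)


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group event times into sessions, breaking at gaps of `gap` or more."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # inactivity gap reached: new session
    return sessions


def sessions_with_switch(event_times, switch_times, gap=SESSION_GAP):
    """Return the 0-based indices of sessions containing at least one switch."""
    sessions = split_sessions(list(event_times) + list(switch_times), gap)
    switches = set(switch_times)
    return [idx for idx, s in enumerate(sessions) if any(t in switches for t in s)]
```

A user whose result is `[0]` switched only in their first session; a result of length 1 or 2 corresponds to the "1 session" and "at most 2 sessions" groups.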

I’ll just stay here…
Question: do users find some algorithms more
initially satisfactory than others?

[Figure: Frac. of Users Switching, by Initial Algorithm]
• Baseline: 29.69%
• Item-Item: 22.07%
• SVD: 17.67%
(all diffs. significant, χ² p<0.05)

…or go over there…
Question: do users tend to find some algorithms
more finally satisfactory than others?

…by some path
What do users do between initial and final?
• As stated, not many flips
• Most common: change to other personalized,
maybe change back (A -> B, A -> B -> A)
• Users starting w/ baseline usually tried one or both
personalized algorithms
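
Patterns such as A -> B and A -> B -> A can be read off a per-user switch log. The sketch below assumes the log is an ordered list of selected algorithm names and that re-selecting the current algorithm does not count as a transition; both the log format and the helper names are assumptions for illustration.

```python
def algorithm_path(initial, switches):
    """Collapse a user's switch log into the sequence of algorithms used.

    initial:  the randomly assigned starting algorithm
    switches: algorithm names selected, in chronological order
    Assumption: re-selecting the current algorithm is not a transition.
    """
    path = [initial]
    for algo in switches:
        if algo != path[-1]:
            path.append(algo)
    return path


def summarize(initial, switches):
    """Summarize one user's switching: count, final choice, and path pattern."""
    path = algorithm_path(initial, switches)
    return {
        "transitions": len(path) - 1,
        "final": path[-1],
        "settled_elsewhere": path[-1] != initial,
        "pattern": " -> ".join(path),
    }
```

Counting `pattern` values across users gives the most common paths, and `settled_elsewhere` separates users who ended on a different algorithm from those who returned to their initial one.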

[Figure: Final Choice of Algorithm (for users who tried menu), number of users]
• Baseline: 53
• Group: 62
• Item-Item: 292
• SVD: 341

Algorithm Preferences
• Users prefer personalized (more likely to stay
initially or finally)
• Small preference of SVD over item-item
• Caveat: algorithm naming may confound

Interlude: Offline Experiment
• For each user:
• Discard all ratings after starting experiment
• Use 5 most recent pre-experiment ratings for testing
• Train recommenders
• Measure:
• RMSE for test ratings
• Boolean recall: is a rated movie in first 24 recs?
• Diversity (intra-list similarity over tag genome)
• Mean pop. rank of 24-item list
• Why 24? Size of single page of MovieLens results
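
Three of the measures above can be sketched in a few lines. This is an illustrative version, assuming predictions and held-out ratings are dicts keyed by item id and recommendations are ordered lists; the function names and the `pop_rank` mapping are hypothetical. The diversity measure is omitted because it needs tag genome data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over the test items present in both dicts."""
    common = [i for i in actual if i in predicted]
    if not common:
        raise ValueError("no overlapping items to score")
    return math.sqrt(sum((predicted[i] - actual[i]) ** 2 for i in common) / len(common))


def boolean_recall(recommended, test_items, n=24):
    """1 if any held-out test item appears in the first n recommendations, else 0."""
    held_out = set(test_items)
    return int(any(item in held_out for item in recommended[:n]))


def mean_popularity_rank(recommended, pop_rank, n=24):
    """Average popularity rank (1 = most popular) of the first n recommendations."""
    top = recommended[:n]
    return sum(pop_rank[i] for i in top) / len(top)
```

Averaging `boolean_recall` over users gives the fraction of users for whom at least one of the 5 held-out movies appeared on the first page of results.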

Algorithms Made Different Recs
• Average of 53.8 unique items/user (out of 72
possible)
• Baseline and Item-Item most different (Jaccard
similarity)
• Accuracy is another story…
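
The two list-level comparisons above, unique items per user and pairwise Jaccard similarity, can be sketched as follows. The figure of 72 possible items is consistent with three 24-item lists per user. The input format (a dict from algorithm name to a list of item ids) and the function names are assumptions for illustration.

```python
from itertools import combinations


def jaccard(list_a, list_b):
    """Jaccard similarity of two recommendation lists, treated as sets."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return 1.0  # convention: two empty lists are identical
    return len(a & b) / len(a | b)


def compare_lists(recs_by_algorithm):
    """Count unique items across all lists and compute pairwise Jaccard.

    recs_by_algorithm: dict algorithm name -> list of recommended item ids
    Returns (number of unique items, {(algo_x, algo_y): similarity}).
    """
    unique_items = set().union(*recs_by_algorithm.values())
    pairwise = {
        (x, y): jaccard(recs_by_algorithm[x], recs_by_algorithm[y])
        for x, y in combinations(sorted(recs_by_algorithm), 2)
    }
    return len(unique_items), pairwise
```

A lower Jaccard value means two algorithms recommended more different items; the pair with the lowest average over users is the "most different" pair.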

Algorithm Accuracy
[Figure: two bar charts of algorithm accuracy, one for RMSE and one for Boolean Recall]

Not Predicting User Preference
• Algorithm properties do not directly predict user
preference, or whether they will switch
• Little ability to predict user behavior overall
• If user starts with baseline, diverse baseline recs
increase likelihood of trying another algorithm
• If user starts w/ item-item, novel baseline recs increase
likelihood of trying another algorithm
• No other significant effects found
• Basic user properties do not predict behavior

What does this mean?
• Users take advantage of the feature
• Users experiment a little bit, then leave it alone
• Observed preference for personalized recs,
especially SVD
• Impact on long-term user satisfaction unknown

Future Work
• Disentangle preference and naming
• More domains
• Understand impact on long-term user satisfaction
and retention

Questions?
This work was supported by the National Science Foundation under grants
IIS 08-08692 and 10-17697.
