Big data - A critical appraisal

+
Thomas Debeauvais
tdebeauv@uci.edu

Bart Knijnenburg
bart.k@uci.edu


+ 2

Outline

 The Wonders of Big Data

 The Perils of Big Data

 User Experiments

 A Note on Privacy

+

The Wonders
of Big Data
How Big Data will put
the personal back
in e-commerce

+ 4

Large vs small datasets

 Everything is significant!

 Data from most/all of your customers
 More than just an educated guess
 This is what really happens!

 Large datasets can improve business intelligence
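
Why "everything is significant" in a large dataset can be sketched with a two-sided z-test for two proportions. The example below is an illustration only: the conversion rates (10.0% versus 10.1%), the group sizes, and the function name `two_proportion_z_test` are all hypothetical and not taken from the slides. It shows how the same small difference goes from non-significant to highly significant as the number of observations per group grows.

```python
import math

def two_proportion_z_test(p1, p2, n):
    """Two-sided z-test for two observed proportions, n observations per group."""
    pooled = (p1 + p2) / 2                      # pooled rate (equal group sizes)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Hypothetical rates: the difference stays fixed, only the sample size changes.
p_values = {}
for n in (1_000, 100_000, 10_000_000):
    z, p = two_proportion_z_test(0.100, 0.101, n)
    p_values[n] = p
    print(f"n per group = {n:>10,}   z = {z:6.2f}   p = {p:.4g}")
```

The test statistic grows with the square root of n, so with enough customers even a 0.1 percentage point difference clears any conventional significance threshold.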

+ 5

The Netflix challenge

 Recommendations seen as Netflix’s strongest asset

 $1M prize for predictions 10% better (lower RMSE) than Netflix’s Cinematch

 2006-2009

 Data: 18k movies, 500k users, 100M ratings

+ 6

The Netflix challenge

 Netflix’s rationale:
 “Improve our ability to connect people to the movies they love”
 Improve recommendations = improve satisfaction and retention
 Small R&D team, slow progress
 $1M will pay for itself

 Based on Padhraic Smyth’s report at
http://www.ics.uci.edu/~smyth/courses/cs277/slides/netflix_overview.pdf

+ 7

Matrix approximation

 Distinguish noise from signal: variance and eigenvalues

 Singular value decomposition
 Ratings(m*n) = U(m*n) E(n*n) V(n*n)

 Rank-k approximation
 Ratings(m*n) ≈ U(m*k) E(k*k) V(k*n)
[Diagram: Ratings (m users × n movies) ≈ U (m users × k) · E (k × k) · V (k × n movies)]

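+ 8

Plot of V with k=2

[Figure: movies plotted on the two latent dimensions. One axis runs between "mainstream, formulaic" and "independent, quirky, critically acclaimed"; the other runs between "lowbrow comedies, horror, male or adolescent audience" and "drama, serious comedy, strong female lead".]

[Koren et al. 2009]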
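
The rank-k approximation can be sketched with NumPy's `numpy.linalg.svd`. This is a minimal illustration, not the method used by the Netflix Prize teams: the toy 4×5 ratings matrix is invented and fully observed, whereas real rating data is mostly missing and needs a factorization that handles unobserved entries. The helper name `rank_k_approximation` is hypothetical.

```python
import numpy as np

# Toy ratings matrix: 4 users (rows) x 5 movies (columns), fully observed.
# The values are made up; two user groups with opposite tastes are built in.
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0, 2.0],
    [4.0, 5.0, 2.0, 1.0, 1.0],
    [1.0, 2.0, 5.0, 4.0, 4.0],
    [2.0, 1.0, 4.0, 5.0, 5.0],
])

def rank_k_approximation(matrix, k):
    """Keep only the k largest singular values: U(m*k) E(k*k) V(k*n)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

approx = rank_k_approximation(ratings, k=2)
error = np.linalg.norm(ratings - approx)  # Frobenius norm of what was discarded

print(np.round(approx, 2))
print("residual norm with k=2:", round(float(error), 3))
```

The rows of the truncated V give each movie's coordinates on the k latent dimensions, which is what a plot of V with k=2 displays.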

+ 9

Bias is information

[Smyth 2010]

+ 10

Take-aways

 Matrix decomposition
 Meaningful movie categories!
 For example: lowbrow, quirky, indie, strong female lead

 Older movies are rated higher
 So ...?
 Should recommend older movies more often or less often?
 Why are they rated higher?

+

The Perils
of Big Data
How overfitting and
a lack of domain knowledge
can lead to suboptimal solutions

+ 12

What about random?

 “We were demonstrating our new recommender to a client.
They were amazed by how well it predicted their preferences!”

 “Later we found out that we forgot to activate the algorithm: the
system was giving completely random recommendations.”

+ 14

Model complexity

 “Our winning entries consist of more than 100 different predictor sets” [Koren et al. 2009]

 Only 10% better than Netflix
 Why?

 Intrinsic noise
 Example: children watch cartoons, Mum is recommended cartoons
 Should Netflix implement a “switch user” feature?
 Domain knowledge!

+ 15

More gotchas

 Obvious truisms and correlation fallacies
 Still present in large datasets
 Domain knowledge!

 Overfitting: simple models that make sense vs complex models
that fit the data
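
The overfitting point can be illustrated with a small polynomial-fitting sketch. Everything here is an assumption made for illustration: the data is synthetic (a straight line plus Gaussian noise), and the degrees 1 and 8 are arbitrary choices for a "simple" and a "complex" model. The more flexible model fits the training points at least as well as the simple one, while its error on fresh data from the same process is typically worse.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_line(n):
    """Synthetic data: y = 2x + 1 plus Gaussian noise (hypothetical process)."""
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = noisy_line(12)   # small training sample
x_test, y_test = noisy_line(200)    # fresh data from the same process

def rmse(coeffs, x, y):
    """Root mean squared error of a fitted polynomial on (x, y)."""
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

results = {}
for degree in (1, 8):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (rmse(coeffs, x_train, y_train),
                       rmse(coeffs, x_test, y_test))
    print(f"degree {degree}: train RMSE {results[degree][0]:.3f}, "
          f"test RMSE {results[degree][1]:.3f}")
```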

+

User Experiments
How user evaluations
can be used to create
meaningful experiences

+ 17

Offline evaluations

 Calibration/Evaluation
 Gather rating data
 Remove 10% of the ratings of each user
 Optimize the algorithm to predict those 10%

 Execution
 Predict the rating of unknown items
 Recommend items with highest predicted rating
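
The offline procedure above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the ratings are randomly generated, the "algorithm" is a stand-in that predicts each item's mean training rating, and all function names (`split_per_user`, `item_mean_predictor`, `rmse`, `recommend`) are hypothetical. A real system would substitute its own predictor and tune it against the held-out ratings.

```python
import math
import random

def split_per_user(ratings, holdout_fraction=0.1, seed=0):
    """Hold out a fraction of each user's ratings (at least one) for testing."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in ratings.items():
        item_ids = sorted(items)
        rng.shuffle(item_ids)
        n_test = max(1, round(holdout_fraction * len(item_ids)))
        test[user] = {i: items[i] for i in item_ids[:n_test]}
        train[user] = {i: items[i] for i in item_ids[n_test:]}
    return train, test

def item_mean_predictor(train):
    """Stand-in algorithm: predict an item's mean rating in the training data."""
    sums, counts = {}, {}
    for items in train.values():
        for item, rating in items.items():
            sums[item] = sums.get(item, 0.0) + rating
            counts[item] = counts.get(item, 0) + 1
    global_mean = sum(sums.values()) / sum(counts.values())
    means = {item: sums[item] / counts[item] for item in sums}
    return lambda user, item: means.get(item, global_mean)

def rmse(predict, test):
    """Root mean squared error on the held-out ratings."""
    errors = [(predict(u, i) - r) ** 2
              for u, items in test.items() for i, r in items.items()]
    return math.sqrt(sum(errors) / len(errors))

def recommend(predict, train, user, all_items, n=3):
    """Recommend the items with the highest predicted rating among unknown items."""
    unknown = [i for i in all_items if i not in train[user]]
    return sorted(unknown, key=lambda i: predict(user, i), reverse=True)[:n]

# Synthetic data: 20 users, each rating 8 of 10 movies on a 1-5 scale.
data_rng = random.Random(42)
movies = [f"movie_{j}" for j in range(10)]
ratings = {f"user_{u}": {i: data_rng.randint(1, 5)
                         for i in data_rng.sample(movies, 8)}
           for u in range(20)}

train, test = split_per_user(ratings)
predict = item_mean_predictor(train)
score = rmse(predict, test)
picks = recommend(predict, train, "user_0", movies)
print("held-out RMSE:", round(score, 3))
print("recommendations for user_0:", picks)
```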

+ 18

Offline evaluations
http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html

 Problem: Offline evaluations may not give the same outcome as online evaluations (Cosley et al., 2002; McNee et al., 2002)
 Solution: Test with real users (A/B testing)

 Problem: Higher rating does not mean good recommendation (McNee et al., 2006)
 Solution: Consider other behaviors (consumption, retention)

 Problem: The algorithm counts for only 5% of the relevance of a recommender system (Francisco Martin, 2009)
 Solution: A/B test other aspects (interaction, presentation)

+ 19

Online evaluations

 Testing a recommender against a random videoclip system (A/B test)

 Expectation: Consumption will increase

 Reality: The number of clicked clips and total viewing time went down!

 Insight: Recommender is more effective
 More clips watched from beginning to end
 Users browse less, consume more

[Path diagram: personalized recommendations (OSA), perceived recommendation quality (SSA), perceived system effectiveness (EXP), and choice satisfaction (EXP), together with three behaviors: number of clips watched from beginning to end, total viewing time, and number of clips clicked. Paths are marked + or −.]

+ 20

Behavior vs Questionnaires

 Behavior is hard to interpret
 Relationship between behavior and satisfaction is not always trivial

 Questionnaires are a better predictor of long-term retention
 With behavior only, you will need to run the experiment for a long time

 Questionnaire data is more robust
 Fewer participants needed

+ 21

A guide to user experiments
http://bit.ly/recsys2011short http://bit.ly/recsystutorialhandout

 “Is my system good?”
 What does good mean?
 We need to define measures

 “Does my system score high on this satisfaction scale?”
 What does high mean?
 We need to compare it against something

 “Does my system score higher than this other system?”
 Say we find that it scores higher on satisfaction... why does it?
 Apply the concept of ceteris paribus

+ 22

An example…

 We compared three
recommender systems
 Three different algorithms

 System effectiveness scale:
 The system has no real benefit
for me.
 I would recommend the system
to others.
 The system is useful.
 I can save time using the
system.
 I can find better TV programs
without the help of the system.

+ 23

An example…

The mediating variables tell the entire story

+ 24

An example…

[Path diagram]

 Matrix Factorization recommender with explicit feedback (MF-E) (versus generally most popular; GMP) [OSA]
 + perceived recommendation variety [SSA]

 Matrix Factorization recommender with implicit feedback (MF-I) (versus most popular; GMP) [OSA]
 + perceived recommendation quality [SSA]

 perceived recommendation variety [SSA] + perceived recommendation quality [SSA] + perceived system effectiveness [EXP]

+

A Note on Privacy
How to avoid
this looming danger
of our Big Data future

+ 26

Personalization… with control

+ 27

Privacy concerns

 Second Netflix challenge

 Anonymized dataset

 Lawsuit from Californian closeted lesbian Mum

 Netflix withdraws their second challenge

 http://arstechnica.com/tech-policy/2012/07/class-action-lawsuit-settlement-forces-netflix-privacy-changes/

+ 28

Privacy directive

 Transparency
 “companies should provide
clear descriptions of [...] why
they need the data, how they
will use it”
 Informed consent

 Control
 “companies should offer
consumers clear and simple
choices [...] about personal
data collection, use, and
disclosure”
 User empowerment

+ 29

Transparency Paradox

+ 30

Control Paradox

 “bewildering tangle of options” (New York Times, 2010)

 “labyrinthian controls” (U.S. Consumer Magazine, 2012)

 Researchers asked: “what do your privacy settings mean?”
 86% of Facebook users got it wrong!

+ 31

Control Paradox
http://bit.ly/chi2013privacy

 Introducing an “extreme” sharing option
 Nothing - City - Block
 Add the option Exact

 Expected:
 Some will choose Exact instead of Block

 Unexpected:
 Sharing increases across the board!

[Figure: the options Nothing (N), City (C), Block (B), and Exact (E) plotted on axes of privacy versus benefits]

+ 32

Bounded rationality

A 25%
?
B 37%
?
C 53%
?
D 0%
?

+ 33

Idea: nudging

 People do not always choose
what is best for them

 Idea: use defaults to “nudge”
users in the right direction

+ 34

What is the right direction?

 “More information = better, e.g. for personalization”
 Techniques to increase disclosure cause reactance in the more
privacy-minded users

 “Privacy is an absolute right”
 More difficult for less privacy-minded users to enjoy the benefits that
disclosure would provide

+ 35

It depends on the user!

 “What is best for consumers
depends upon characteristics
of the consumer

 An outcome that maximizes
consumer welfare may be
suboptimal for some consumers
in a context where there is
heterogeneity in preferences”
(Smith, Goldstein & Johnson, 2009)

+ 36

Privacy Adaptation Procedure
http://bit.ly/privdim

 Idea:
 Personalize users’ privacy settings!
 Automatic defaults in line with “disclosure profile”
 Using big data to improve big data privacy

 Relieves some of the burden of the privacy decision:
 The right privacy-related information
 The right amount of control

 “Realistic empowerment”

+

Conclusions

 The Wonders of Big Data
 Big Data can be used to create powerful personalized e-commerce experiences

 The Perils of Big Data
 Big Data solutions will only work if the developers have an adequate amount of domain knowledge

 User Experiments
 Big Data solutions need to be tested on real users, with a focus on user experience

 A Note on Privacy
 Big Data can raise privacy concerns, but it can at the same time be used to alleviate these concerns

+

Questions?
