Rus M. Mesas, Alejandro Bellogín
Universidad Autónoma de Madrid
Spain
RecSys, August 2017
Evaluating Decision-Aware
Recommender Systems
Alejandro Bellogín – RecSys, August 2017
Main idea
▪ How to balance coverage and precision
Method  Precision  Coverage  Best?
R1      0.093      100%      ✓
R2      0.094      97.8%
Method  Precision  Coverage  Best?
R1      0.037      100%
R2      0.133      100%
R3      0.245      99.7%     ✓
Method  Precision  Coverage  Best?
R1      0.093      100%
R2      0.181      95.6%     ?
R3      0.283      59.0%     ?
R4      0.326      28.2%     ?
▪ To force different coverage levels, we allow
recommenders to decide if a recommendation is
worthy of being presented to the user or not
Estimations
Balancing coverage and precision
▪ [Herlocker et al 2004]: “there is no general
coverage metric that, at the same time, gives more
weight to relevant items when accounting for
coverage, and combines coverage and accuracy
measures”
▪ [Gunawardana & Shani 2015] leave the problem of
balancing coverage and precision as an open issue
in the area
Combination metrics
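The original content of this slide was a figure and is not available here. As a rough illustration of what combining the two dimensions can look like, the sketch below assumes two generic combinations, the harmonic and the geometric mean of precision and coverage, and applies them to the four-method table from the main idea. These two functions are assumptions for illustration only, not necessarily the combinations shown in the talk.

```python
from math import sqrt


def harmonic_mean(precision: float, coverage: float) -> float:
    """F1-style combination (assumed here, not taken from the slides)."""
    if precision + coverage == 0:
        return 0.0
    return 2 * precision * coverage / (precision + coverage)


def geometric_mean(precision: float, coverage: float) -> float:
    """G-style combination (assumed here, not taken from the slides)."""
    return sqrt(precision * coverage)


# (precision, coverage) pairs copied from the four-method table.
methods = {
    "R1": (0.093, 1.000),
    "R2": (0.181, 0.956),
    "R3": (0.283, 0.590),
    "R4": (0.326, 0.282),
}

harmonic = {name: harmonic_mean(p, c) for name, (p, c) in methods.items()}
geometric = {name: geometric_mean(p, c) for name, (p, c) in methods.items()}

best_by_harmonic = max(harmonic, key=harmonic.get)
best_by_geometric = max(geometric, key=geometric.get)

print("best by harmonic mean:", best_by_harmonic)
print("best by geometric mean:", best_by_geometric)
```

On these numbers the two combinations pick different winners, which suggests why a simple combination does not settle the "Best?" column on its own.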
Our proposal: Correctness metric
▪ Adapted from Question Answering:
• Several questions to be answered by a system
• Each question has several options
• Only one option is correct
• If an answer is not given, it should not be considered as
an incorrect answer
• Hence, if two systems have the same number of correct
answers but one has failed fewer questions (it has decided
not to respond), it should be better than the other one
A. Peñas & Á. Rodrigo. 2011. A simple measure to assess non response. ACL.
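The measure in the cited paper is usually written as c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of correctly answered questions, n_U the number left unanswered, and n the total. The formula is not on the slide, so the sketch below should be read as a recollection of that paper rather than a quotation; the two systems are hypothetical.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1: unanswered questions are credited at the system's overall accuracy."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0 or n_correct + n_unanswered > n_total:
        raise ValueError("inconsistent counts")
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total


# Two hypothetical systems on 10 questions, both with 4 correct answers.
system_a = c_at_1(n_correct=4, n_unanswered=0, n_total=10)  # 6 wrong answers
system_b = c_at_1(n_correct=4, n_unanswered=3, n_total=10)  # 3 wrong, 3 withheld

print(system_a, system_b)
```

With no unanswered questions the measure reduces to plain accuracy, and the system that withholds answers instead of failing scores higher, as the last bullet requires.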
Correctness metric for recommendation
▪ Each recommendation algorithm is a system
▪ Each candidate item to be ranked is a question
▪ If an item is recommended, it could be relevant or
not
▪ The same set of items is presented to each system
Recommended list Precision@5 Correctness
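The example rows of this table were images and are not recoverable. The sketch below illustrates the mapping under stated assumptions: each of the N slots in the list is treated as a question, a recommended relevant item is a correct answer, and a slot the recommender leaves empty is an unanswered question. The function names, the toy data, and the choice to divide precision by N regardless of list length are all assumptions, not the authors' exact definitions.

```python
from typing import List, Set


def precision_at_n(recommended: List[str], relevant: Set[str], n: int) -> float:
    """Hits in the top n divided by n, even when fewer than n items are returned."""
    hits = sum(1 for item in recommended[:n] if item in relevant)
    return hits / n


def user_correctness_at_n(recommended: List[str], relevant: Set[str], n: int) -> float:
    """c@1-style score: empty slots are credited at the list's hit rate."""
    top = recommended[:n]
    hits = sum(1 for item in top if item in relevant)
    not_recommended = n - len(top)  # slots the system chose to leave empty
    return (hits + not_recommended * hits / n) / n


relevant = {"a", "b", "c"}
full_list = ["a", "x", "b", "y", "z"]  # 2 hits, 3 misses
short_list = ["a", "b"]                # 2 hits, 3 slots withheld

p_full = precision_at_n(full_list, relevant, 5)
p_short = precision_at_n(short_list, relevant, 5)
c_full = user_correctness_at_n(full_list, relevant, 5)
c_short = user_correctness_at_n(short_list, relevant, 5)

print(p_full, p_short, c_full, c_short)
```

Both lists get the same Precision@5, but the list that avoids the three bad recommendations gets the higher correctness score.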
Correctness metrics for recommendation
▪ Four instantiations:
• Based on users
• Based on items
What about the decision-aware
recommenders?
Estimations
Decision-aware recommender systems
▪ Exploiting the confidence a system has in its own
recommendations
▪ Not completely new
• Significance weighting
• Support and confidence in case-based recommenders
▪ Focus on Collaborative Filtering algorithms
• Support of prediction score of nearest-neighbour
methods
• Uncertainty in prediction score of a probabilistic matrix
factorisation algorithm
Estimating confidence in
decision-aware recommendation
▪ For user-based KNN: have at least n (out of k)
neighbours participated in the rating estimation?
▪ For probabilistic MF: uncertainty (standard deviation)
of the predicted score
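The support check for user-based KNN can be sketched as follows. This is a minimal illustration assuming a similarity-weighted average as the rating estimate; the function name, the dictionaries, and the normalisation are hypothetical and not taken from RankSys or from the talk.

```python
from typing import Dict, Optional


def predict_with_support(
    neighbour_sims: Dict[str, float],
    neighbour_ratings: Dict[str, float],
    min_support: int,
) -> Optional[float]:
    """Return a rating estimate, or None when the system decides not to recommend.

    neighbour_sims: similarity to the target user for each of the k neighbours.
    neighbour_ratings: ratings of the target item by the neighbours who rated it.
    min_support: the n in "at least n (out of k) neighbours".
    """
    contributors = [v for v in neighbour_sims if v in neighbour_ratings]
    if len(contributors) < min_support:
        return None  # too few neighbours took part in the estimation
    norm = sum(abs(neighbour_sims[v]) for v in contributors)
    if norm == 0:
        return None
    weighted = sum(neighbour_sims[v] * neighbour_ratings[v] for v in contributors)
    return weighted / norm


sims = {"u1": 0.9, "u2": 0.5, "u3": 0.4}  # k = 3 neighbours
ratings = {"u1": 5.0, "u2": 3.0}          # only two of them rated the item

lenient = predict_with_support(sims, ratings, min_support=2)
strict = predict_with_support(sims, ratings, min_support=3)

print(lenient, strict)
```

Raising `min_support` trades coverage for confidence: the same item is scored when n = 2 and withheld when n = 3.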
Experimental setup
▪ Datasets
• MovieLens 100K, MovieLens 1M, Jester
• Random 5-fold training/test split
▪ Evaluation
• Generate a ranking with every item in the test set
• Metrics at cutoff 10: precision (P), user space coverage
(USC), item space coverage (ISC), correctness (UC, RUC,
IC, RIC), novelty (EPC), diversity (AggrDiv)
▪ Frameworks
• RankSys: evaluation metrics, KNN recommenders
• RiVal: data splitting
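A few of the listed metrics can be sketched in plain Python. The definitions below are assumptions made for illustration: precision divides by the cutoff, users with an empty list count as zero precision, user space coverage is the fraction of users who receive at least one item, and aggregate diversity is the number of distinct items recommended. They are not the RankSys implementations, and the toy data is made up.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Hits in the top k divided by k."""
    hits = sum(1 for item in ranking[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(
    recs: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int
) -> float:
    """Average over all test users; a user with no list contributes 0."""
    users = list(relevant)
    total = sum(precision_at_k(recs.get(u, []), relevant[u], k) for u in users)
    return total / len(users)


def user_space_coverage(recs: Dict[str, List[str]], users: Set[str]) -> float:
    """Fraction of users who receive at least one recommendation."""
    return sum(1 for u in users if recs.get(u)) / len(users)


def aggregate_diversity(recs: Dict[str, List[str]], k: int) -> int:
    """Number of distinct items appearing in any user's top k."""
    return len({item for ranking in recs.values() for item in ranking[:k]})


recs = {"u1": ["i1", "i2", "i3"], "u2": ["i2"], "u3": []}
relevant = {"u1": {"i1", "i3"}, "u2": {"i4"}, "u3": {"i1"}}

p = mean_precision_at_k(recs, relevant, k=3)
usc = user_space_coverage(recs, set(relevant))
aggr_div = aggregate_diversity(recs, k=3)

print(p, usc, aggr_div)
```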
Performance: prediction uncertainty
Impact on novelty and diversity
▪ Prediction uncertainty
• Stricter constraints (smaller uncertainty) decrease
novelty and diversity
Conclusions
▪ We have proposed a family of metrics based on
the assumption that it is better to avoid a
recommendation than to provide a bad
recommendation
▪ We have shown that a balance between precision,
coverage, diversity, and novelty is critical
▪ We have proposed two strategies to decide if an
item should be presented to the user
Future work
▪ Extend the correctness metrics to combine other
evaluation dimensions
▪ Objective way to discriminate between systems:
which one is really the best one?
▪ Consider the psychological aspect of the
recommendation: the user is expecting to receive
N recommendations (better bad than none?)
Thank you
Performance: prediction support
Impact on novelty and diversity
▪ Prediction support
• Larger n decreases the
diversity and novelty of
the lists
• More popular items are
being recommended
Motivation
▪ Typical evaluation: it is better to fail than to avoid
a recommendation
• Assumption: not returning an item implies that the
item is considered not relevant
▪ In this work: a recommender system may decide
not to recommend a specific item
• We need a metric where “no recommendation” means
neither relevant nor not relevant. If possible, it should
mean “better than not relevant”
Definition of uncertainty for PMF
▪ PMF: probabilistic matrix factorisation using a
Bayesian approximation proposed in [Lim & Teh
2007]
▪ The standard deviation is derived using mean-field
variational inference:
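The closed-form expression for the standard deviation is not reproduced here. Independently of its exact form, the decision step it feeds can be sketched as a filter: an item is presented only when the standard deviation of its predicted score is small enough. The `Prediction` type, the `max_std` threshold, and the toy values are hypothetical and serve only to illustrate the idea.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    item: str
    mean: float  # predicted score
    std: float   # uncertainty of the predicted score


def recommend_confident(
    predictions: List[Prediction], max_std: float, n: int
) -> List[str]:
    """Keep items whose uncertainty is at most max_std, then rank by score."""
    confident = [p for p in predictions if p.std <= max_std]
    ranked = sorted(confident, key=lambda p: p.mean, reverse=True)
    return [p.item for p in ranked[:n]]


predictions = [
    Prediction("i1", 4.8, 1.2),
    Prediction("i2", 4.5, 0.3),
    Prediction("i3", 3.9, 0.2),
    Prediction("i4", 4.9, 0.9),
]

strict_list = recommend_confident(predictions, max_std=0.5, n=2)
lenient_list = recommend_confident(predictions, max_std=2.0, n=2)

print(strict_list, lenient_list)
```

A smaller `max_std` is the stricter constraint: it removes the highest-scored but least certain items from the list, and can leave fewer than n items to present.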
