Recommendation System, Session 2
Eng. Maryam Mostafa
Table of contents for today
• Evaluation metrics in recommenders.
• What makes a good recommender system.
• Example of a rank-based recommender.
• Task 1.
• Example of a collaborative filtering recommender.
• Task 2.
• Try content-based filtering at home.
• Offline Evaluation: Offline evaluation is done in much the same way we evaluate
machine learning models: we usually have a fixed dataset, collected and
immutable before the evaluation begins, and the dataset is then split into two
parts, the train and test sets. The RS is trained on the train set and then
evaluated on the test set.
• Online Evaluation: As the name states, online evaluation is performed online,
with real users interacting with different versions or algorithms of an RS, and the
evaluation is performed by collecting metrics associated with user behavior in
real time.
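As a rough illustration of the offline setup, here is a minimal sketch (not from the slides) of holding out part of each user's ratings as a test set. It assumes a pandas DataFrame `ratings` with hypothetical columns user_id, item_id, and rating.

```python
# A minimal sketch of an offline split (assumed setup, not from the slides):
# `ratings` is a pandas DataFrame with hypothetical columns
# user_id, item_id, rating.
import numpy as np
import pandas as pd

def split_ratings(ratings: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    """Hold out a random fraction of each user's ratings as the test set."""
    rng = np.random.default_rng(seed)
    test_mask = np.zeros(len(ratings), dtype=bool)
    for _, positions in ratings.groupby("user_id").indices.items():
        n_test = max(1, int(len(positions) * test_frac))
        test_mask[rng.choice(positions, size=n_test, replace=False)] = True
    return ratings[~test_mask], ratings[test_mask]

# train, test = split_ratings(ratings)
# The RS is fitted on `train`; its predictions are scored against `test`.
```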
When do I perform one or another?
Both approaches have their pros and cons.

Offline:
• Pros - This type of evaluation can be easier to set up. With many already published datasets and their respective
ratings or evaluations, people can easily set up and evaluate their algorithms by comparing their output with the
expected output from the already published results. With a fixed dataset and a fixed set of user interactions
with it (all existing ratings in the dataset), the results of an offline evaluation are also more easily reproducible
than those of an online evaluation.
• Cons - There is some debate about the validity of offline evaluations. The most criticized aspect is evaluating the
trained algorithm's performance on a held-out test set. The idea of an RS is to provide new recommendations that
the user probably doesn't know yet. The problem with testing on a test set is that we must already have the user's
evaluations for each item/recommendation, i.e., we end up testing only items we are sure the user knows. Moreover,
if the RS recommends an item the user hasn't evaluated yet but that could be a good recommendation, we penalize it
because that item isn't in our test set. In the end, we penalize the RS for doing its job.
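A toy sketch of this pitfall, with entirely hypothetical item names: precision@k scored against the held-out test set counts any recommended item missing from that set as a miss, even if the user might actually have liked it.

```python
# A toy sketch of the pitfall above (hypothetical item names):
# precision@k against a held-out test set counts any recommended item
# that is absent from the test set as a miss, even if the user might
# actually have liked it.

def precision_at_k(recommended, relevant_in_test, k=3):
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant_in_test)
    return hits / k

recommended = ["item_A", "item_B", "item_C"]   # item_C is new to this user
relevant_in_test = {"item_A", "item_B"}        # only items the user already rated

print(precision_at_k(recommended, relevant_in_test))  # ~0.67: item_C is penalized
```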
Online:
• Pros - In contrast to offline evaluation, in an online context we can collect real-time user
interactions with the RS, such as reviews, clicks, and preferences. This gives a much fuller picture
when evaluating the RS's performance. Besides, because we are evaluating real-time data rather than a
static dataset, we can carry out further analysis if desired.
• Cons - Dynamic real-time data also brings a drawback: the reproducibility of the experiment can be
worse than with a static script and dataset. Besides, preparing (and perhaps even creating) the
environment to test the RS requires considerably more time to set up.
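For a concrete flavor of online evaluation, here is a minimal sketch of comparing the click-through rate of two recommender variants in an A/B test; the `variant` and `clicked` fields and the toy log are assumptions, not from the slides.

```python
# A minimal sketch of an online A/B comparison (assumed logging schema):
# one row per recommendation impression, with a `variant` label and a
# binary `clicked` flag. The toy log below is made up.
import pandas as pd

def click_through_rate(log: pd.DataFrame) -> pd.Series:
    """Click-through rate per recommender variant."""
    return log.groupby("variant")["clicked"].mean()

log = pd.DataFrame({
    "variant": ["A", "A", "A", "B", "B", "B"],
    "clicked": [1, 0, 0, 1, 1, 0],
})
print(click_through_rate(log))  # variant B has the higher CTR in this toy log
```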
DIFFERENT OFFLINE METRICS
The different offline metrics and other measures that define our Recommendation System are
mentioned below.
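As a reference point, here is a minimal sketch of two widely used offline accuracy metrics, MAE and RMSE, computed over predicted versus actual ratings on the held-out test set; the sample numbers are made up.

```python
# A minimal sketch of two widely used offline accuracy metrics, MAE and
# RMSE, computed over predicted vs. actual ratings on the held-out test
# set (sample numbers are made up).
import numpy as np

def mae(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted))

def rmse(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))

actual    = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
print(mae(actual, predicted), rmse(actual, predicted))
```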
Note
• These metrics are not complete in themselves for Recommendation Systems,
i.e., an RMSE value of 0.8766 for an algorithm doesn't mean anything until there is
another algorithm's RMSE value against which we can compare it.
• MSE or RMSE alone doesn't matter much in the real world. What matters most is which
movies you put in front of a user in the top-N recommendations and how users react
to them.
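A toy illustration of both points, with made-up numbers: an RMSE is only informative next to a competing algorithm's RMSE, and the top-N list is what the user actually sees and reacts to.

```python
# A toy illustration of the note above (all numbers are made up):
# an RMSE only means something next to a competing algorithm's RMSE,
# and what the user actually sees is the top-N list.
import numpy as np

actual = np.array([4.0, 3.0, 5.0, 2.0, 4.5])
pred_a = np.array([3.6, 3.2, 4.4, 2.5, 4.0])   # algorithm A's predictions
pred_b = np.array([3.0, 2.0, 4.0, 3.5, 3.0])   # algorithm B's predictions

rmse_a = np.sqrt(np.mean((actual - pred_a) ** 2))
rmse_b = np.sqrt(np.mean((actual - pred_b) ** 2))
print(rmse_a < rmse_b)  # True: A is "better" only relative to B

# Top-N view: of the movies we actually put in front of the user,
# how many did they interact with?
top_n = ["movie_12", "movie_7", "movie_30"]
interacted = {"movie_7"}
print(len(interacted & set(top_n)) / len(top_n))  # ~0.33 hit rate
```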
WHAT TO FOCUS ON — WHICH METRIC?
Given that we have covered various metrics and dimensions for
evaluating our Recommendation System, you might be wondering which
metric is the best. Well, it depends. There are many factors to
consider before giving priority to one metric over another.
Metrics must be looked at together, and we must understand the
trade-offs between them. We must also focus on the requirements and
the main objective for building the Recommendation System.
Python code implementing each of the above metrics can be found in
the Kaggle kernel (here).
Now let's practice
Thank you