This document provides an overview of recommender system evaluation, covering offline and online methods and the trade-offs of each. Offline evaluation splits a pre-existing dataset into training and test sets; it is easy to reproduce, but it cannot capture how users respond to recommendations they have never seen. Online evaluation collects real-time user interaction data, giving a more faithful picture of live performance at the cost of reproducibility. The document also covers common evaluation metrics such as RMSE and discusses how different metrics should be weighed together according to the system's objectives. Code examples for calculating these metrics are available online.
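As a minimal sketch of one such offline metric, RMSE measures how far predicted ratings deviate from observed ratings on a held-out test split. The rating values below are illustrative, not drawn from any real dataset:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and observed ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Held-out test ratings vs. model predictions (illustrative values)
actual = [4.0, 3.5, 5.0, 2.0]
predicted = [3.8, 3.0, 4.5, 2.5]
print(round(rmse(predicted, actual), 4))  # → 0.4444
```

Because RMSE squares each error, it penalizes a few large misses more heavily than many small ones, which is one reason it is usually reported alongside other metrics rather than alone.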
Related topics: