Users provide inconsistent ratings when rating the same items multiple times, introducing natural noise that limits recommendation accuracy. An experiment with 118 users rating 100 movies over 3 trials found:
1. Overall test-retest reliability was high (0.924), but mild and negative ratings were less reliable.
2. Pairwise RMSE between trials ranged from 0.557 to 0.765, with the largest error between the most widely separated trials.
3. Common recommendation algorithms (user-based kNN, item-based kNN, and SVD) were robust to this noise, with less than 5% difference in prediction RMSE across trials. The second trial consistently had the lowest noise.
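The pairwise trial comparison above can be sketched as follows. This is a minimal illustration, not the study's actual data or code: the rating matrices are synthetic, with per-trial noise levels chosen arbitrarily to mimic the idea that later re-ratings drift further from the first trial.

```python
import numpy as np

# Hypothetical ratings: rows = 118 users, cols = 100 movies, one matrix per trial.
# A shared "true preference" matrix plus per-trial noise stands in for real re-ratings.
rng = np.random.default_rng(0)
base = rng.integers(1, 6, size=(118, 100)).astype(float)
noise_levels = (0.0, 0.4, 0.7)  # assumed values, purely for illustration
trials = [np.clip(base + rng.normal(0.0, s, base.shape), 1, 5) for s in noise_levels]

def rmse(a, b):
    """Root-mean-square error between two rating matrices of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Pairwise RMSE across the three trials; under this toy setup the most
# distant pair (trial 1 vs. trial 3) shows the largest error, matching
# the pattern reported in the experiment.
for i in range(3):
    for j in range(i + 1, 3):
        print(f"trial {i + 1} vs trial {j + 1}: RMSE = {rmse(trials[i], trials[j]):.3f}")
```

The same `rmse` helper applied to a recommender's predictions against each trial's ratings is what the under-5% robustness comparison refers to.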