Collaborative Filtering with Binary, Positive-only Data
Tutorial @ ECML PKDD, September 2015, Porto
Koen Verstrepen, Kanishka Bhaduri, Bart Goethals
Agenda
•  Introduction
•  Algorithms
•  Netflix
Binary, Positive-Only Data
Collaborative Filtering
Movies
Music
Social Networks
Tagging / Annotation
Paris
New York
Porto
Statue of Liberty
Eiffel Tower
Also Explicit Feedback
Matrix Representation
1 · · 1 · · 1
· · 1 · · · ·
1 · · · · 1 ·
· 1 · · · · 1
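A minimal sketch of how such a matrix can be assembled from observed (user, item) events, using hypothetical example pairs rather than data from the tutorial:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical positive-feedback events: only (user, item) pairs are recorded.
pairs = [(0, 0), (0, 3), (0, 6), (1, 2), (2, 0), (2, 5), (3, 1), (3, 6)]
users, items = zip(*pairs)

# R[u, i] = 1 for every known preference; all other cells stay empty.
R = csr_matrix((np.ones(len(pairs)), (users, items)), shape=(4, 7))
print(R.toarray())
```

A sparse format matters in practice: with on the order of 10 known preferences per 10 000 items, a dense matrix would waste almost all of its cells.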
Unknown = 0: no negative information
1 0 0 1 0 0 1
0 0 1 0 0 0 0
1 0 0 0 0 1 0
0 1 0 0 0 0 1
Different Data
•  Ratings
•  Graded relevance, positive-only
•  Binary, positive-only
Ratings:
1 · · 5 · · 4
· · 3 3 · · ·
4 · · · · 2 2
5 5 · · · · 1

Graded relevance, positive-only:
· · · 5 · · 4
· · · · · · ·
4 · · · · · ·
5 5 · · · · ·

Binary, positive-only:
· · · X · · X
· · · · · · ·
X · · · · · ·
X X · · · · ·
•  Movies
•  Music
•  …

•  Minutes watched
•  Times clicked
•  Times listened
•  Money spent
•  Visits/week
•  …

•  Seen
•  Bought
•  Watched
•  Clicked
•  …
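A hedged sketch of how graded, positive-only signals like these are commonly reduced to binary, positive-only data; the toy counts and the threshold are assumptions for illustration:

```python
import numpy as np

# Hypothetical graded positive-only signals, e.g. play counts per (user, item).
# A zero means "no observation", not "dislike".
counts = np.array([
    [0, 0, 0, 5, 0, 0, 4],
    [0, 0, 0, 0, 0, 0, 0],
    [4, 0, 0, 0, 0, 0, 0],
    [5, 5, 0, 0, 0, 0, 0],
])

# Any signal at or above the threshold becomes a positive; the rest stay unknown.
threshold = 1
R = (counts >= threshold).astype(int)
```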
Sparse: on the order of 10 known preferences per 10 000 items
Agenda
•  Introduction
•  Algorithms
– Elegant example
– Models
– Deviation functions
– Difference with rating-based algorithms
– Parameter inference
•  Netflix
pLSA: an elegant example [Hofmann 2004]
pLSA = probabilistic Latent Semantic Analysis
pLSA: latent interests
pLSA: generative model
[Graphical model: user u → latent interest d (d = 1, …, D) → item i, with edge probabilities p(d | u) and p(i | d)]
pLSA: probabilistic weights
p(d | u) \ge 0, \quad p(i | d) \ge 0
\sum_{d=1}^{D} p(d | u) = 1
\sum_{i \in I} p(i | d) = 1
pLSA: computing the like-probability
p(i | u) = \sum_{d=1}^{D} p(i | d) \cdot p(d | u)
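In code, the like-probability is a dot product between the user's interest distribution and the items' interest profiles. A minimal sketch with toy factors (the numbers are illustrative assumptions):

```python
import numpy as np

D, n_items = 4, 7
p_d_u = np.array([0.4, 0.1, 0.4, 0.1])        # p(d | u) for one user, sums to 1
p_i_d = np.full((D, n_items), 1.0 / n_items)  # p(i | d); each row sums to 1

# p(i | u) = sum over d of p(i | d) * p(d | u), computed for all items at once:
p_i_u = p_d_u @ p_i_d                         # shape (n_items,)
assert np.isclose(p_i_u.sum(), 1.0)           # a proper distribution over items
```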
pLSA: computing the weights
\max \sum_{R_{ui}=1} \log p(i | u)

solved with (tempered) Expectation-Maximization (EM)
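A compact sketch of what one plain (untempered) EM iteration for this objective could look like, with the E-step posteriors q(d | u, i) folded into multiplicative updates; this is a plausible implementation under the stated model, not the tutorial's reference code:

```python
import numpy as np

def plsa_em(R, D, n_iter=50, eps=1e-12, seed=0):
    """Fit pLSA on a binary matrix R; rows of P1 hold p(d|u), rows of P2 hold p(i|d)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P1 = rng.random((n_users, D)); P1 /= P1.sum(axis=1, keepdims=True)
    P2 = rng.random((D, n_items)); P2 /= P2.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        S = P1 @ P2                  # S[u, i] = p(i|u) under the current parameters
        W = R / np.maximum(S, eps)   # only cells with R[u, i] = 1 contribute
        # E-step posterior q(d|u,i) = p(i|d) p(d|u) / p(i|u), folded into the
        # M-step: p(d|u) ∝ Σ_i R_ui q(d|u,i) and p(i|d) ∝ Σ_u R_ui q(d|u,i).
        P1_new = P1 * (W @ P2.T)
        P2_new = P2 * (P1.T @ W)
        P1 = P1_new / np.maximum(P1_new.sum(axis=1, keepdims=True), eps)
        P2 = P2_new / np.maximum(P2_new.sum(axis=1, keepdims=True), eps)
    return P1, P2
```

Tempered EM would additionally raise the E-step numerator to a power β < 1 to dampen overfitting; it is omitted here for brevity.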
pLSA à General
pLSA recap
p(i | u) = \sum_{d=1}^{D} p(i | d) \cdot p(d | u)
pLSA: matrix factorization notation
R: |U| \times |I|
S^{(1)}: |U| \times D, with S^{(1)}_{ud} = p(d | u)
S^{(2)}: D \times |I|, with S^{(2)}_{di} = p(i | d)

p(i | u) = \sum_{d=1}^{D} p(i | d) \cdot p(d | u)
\max \sum_{R_{ui}=1} \log p(i | u)
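The same computation in matrix form, followed by the top-N step the scores are ultimately used for; the toy shapes and the convention of masking already-known items are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, D = 4, 7, 3

# S1: |U| x D with rows p(d|u); S2: D x |I| with rows p(i|d) -- toy factors.
S1 = rng.random((n_users, D)); S1 /= S1.sum(axis=1, keepdims=True)
S2 = rng.random((D, n_items)); S2 /= S2.sum(axis=1, keepdims=True)

S = S1 @ S2   # S[u, i] = p(i|u): the full |U| x |I| score matrix

# Recommend the N highest-scoring items the user has not interacted with yet.
R = np.zeros((n_users, n_items), dtype=int)
R[0, [0, 3, 6]] = 1
u, N = 0, 3
candidates = np.flatnonzero(R[u] == 0)
top_n = candidates[np.argsort(S[u, candidates])[::-1][:N]]
```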
Scores = Matrix Factorization

$$S_{ui} = S^{(1)}_{u\cdot} \cdot S^{(2)}_{\cdot i}, \qquad S = S^{(1)} S^{(2)}$$
Deviation Function

$$S = \left( S^{(1,1)} \cdots S^{(1,F_1)} \right) + \cdots + \left( S^{(T,1)} \cdots S^{(T,F_T)} \right)$$

$$\max \sum_{R_{ui}=1} \log p(i \mid u)
\;\Leftrightarrow\; \max \sum_{R_{ui}=1} \log S_{ui}
\;\Leftrightarrow\; \min \sum_{R_{ui}=1} -\log S_{ui}
\;\Leftrightarrow\; \min\, \mathcal{D}(S, R), \quad \mathcal{D}(S, R) = -\sum_{R_{ui}=1} \log S_{ui}$$
Summary: 2 Basic Building Blocks
Factorization Model
Deviation Function
Agenda
•  Introduction
•  Algorithms
– Elegant example
– Models
– Deviation functions
– Parameter inference
•  Netflix
Tour of The Models
pLSA soft clustering interpretation
user-item scores
user-cluster affinity
item-cluster affinity
mixed clusters
[Hofmann 2004]
[Hu et al. 2008]
[Pan et al. 2008]
[Sindhwani et al. 2010]
[Yao et al. 2014]
[Pan and Scholz 2009]
[Rendle et al. 2009]
[Shi et al. 2012]
[Takàcs and Tikk 2012]
pLSA soft clustering interpretation
[Figure: the $|U| \times |I|$ score matrix $S$ factorizes into a $|U| \times D$ user-cluster affinity matrix and a $D \times |I|$ item-cluster affinity matrix, here with $D = 4$.]

$$S_{ui} = p(i \mid u) = \sum_{d=1}^{D} p(i \mid d)\; p(d \mid u)$$

with $p(d \mid u) \ge 0$, $p(i \mid d) \ge 0$, $\sum_{d=1}^{D} p(d \mid u) = 1$, and $\sum_{i \in I} p(i \mid d) = 1$.
user-item scores
user-cluster affinity
item-cluster affinity
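A minimal sketch of this scoring rule in Python (toy matrices assumed): the two factor matrices are nonnegative with rows summing to one, and the user-item scores are their product.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, D = 5, 8, 4

    # user-cluster affinities p(d|u): one row per user, rows sum to 1
    P_du = rng.random((n_users, D))
    P_du /= P_du.sum(axis=1, keepdims=True)

    # item-cluster affinities p(i|d): one row per cluster, rows sum to 1
    P_id = rng.random((D, n_items))
    P_id /= P_id.sum(axis=1, keepdims=True)

    # S_ui = p(i|u) = sum_d p(i|d) * p(d|u)
    S = P_du @ P_id
    assert np.allclose(S.sum(axis=1), 1.0)  # each user's scores form a distribution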
Hard Clustering
user-item scores
user-uCluster membership
item-iCluster membership
item probabilities
uCluster-iCluster similarity
[Hofmann 2004]
[Hofmann 1999]
[Ungar and Foster 1998]
Item Similarity dense
user-item scores original rating matrix item-item similarity
[Rendle et al. 2009]
[Aiolli 2013]
Item Similarity sparse
user-item scores original rating matrix item-item similarity
[Deshpande and Karypis 2004]
[Sigurbjörnsson and Van Zwol 2008]
[Ning and Karypis 2011]
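A minimal sketch of this family, assuming a dense binary numpy matrix R, cosine similarity, and top-k sparsification in the spirit of the cited item-based methods:

    import numpy as np

    def itemknn_scores(R, k=5):
        # item-item cosine similarity from the binary rating matrix
        norms = np.linalg.norm(R, axis=0) + 1e-12
        sim = (R.T @ R) / np.outer(norms, norms)
        np.fill_diagonal(sim, 0.0)             # an item is not its own neighbour
        # sparsify: keep only the k most similar items per item
        for j in range(sim.shape[1]):
            keep = np.argsort(sim[:, j])[-k:]
            mask = np.ones(sim.shape[0], dtype=bool)
            mask[keep] = False
            sim[mask, j] = 0.0
        return R @ sim                         # user-item scores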
User Similarity sparse
user-item scores
column normalized
original rating matrix
(row normalized)
user-user similarity
[Sarwar et al. 2000]
User Similarity dense
user-item scores
column normalized
original rating matrix
(row normalized)
user-user similarity
[Aiolli 2014]
[Aiolli 2013]
User+Item Similarity
[Verstrepen and Goethals 2014]
Factored Item Similarity symmetrical
user-item scores original rating matrix Identical item profiles
item clusters
Item-cluster affinity
Similarity by dotproduct
[Weston et al. 2013b]
Factored Item Similarity asymmetrical + bias
user-item scores
original rating matrix
row normalized
Item profile if known preference
Item profile if candidate
item biases
user biases
[Kabbur et al. 2013]
Higher Order Item Similarity inner product
user-item scores extended rating matrix Itemset-item similarity
selected higher order itemsets
[Christakopoulou and Karypis 2014]
[Deshpande and Karypis 2004]
[Menezes et al. 2010]
[van Leeuwen and Puspitaningrum 2012]
[Lin et al. 2002]
Higher Order Item Similarity max product
[Sarwar et al. 2001]
[Mobasher et al. 2001]
Higher Order User Similarity inner product
user-item scores user-userset similarity extended rating matrix
selected higher order usersets
[Lin et al. 2002]
Best of few user models non linearity by max
[Weston et al. 2013a]
~ 3 models/user
Best of all user models efficient max out of 2^|u| models/user
[Verstrepen and Goethals 2015]
Combination item vectors can be shared
[Kabbur and Karypis 2014]
Sigmoid link function for probabilistic frameworks
[Johnson 2014]
Pdf over parameters instead of point estimation
[Koenigstein et al. 2012]
[Paquet and Koenigstein 2013]
Summary: 2 Basic Building Blocks
Factorization Model
Deviation Function
Summary: 2 Basic Building Blocks
Factorization Model
Deviation Function
a.k.a. What do we minimize in order to find the
parameters in the factor matrices?
Agenda
•  Introduction
•  Algorithms
– Elegant example
– Models
– Deviation functions
– Difference with rating-based algorithms
– Parameter inference
•  Netflix
Tour of Deviation Functions
Local Minima depending on initialisation
Max Likelihood high scores for known preferences
$$\max \sum_{R_{ui}=1} \log S_{ui} \;\Leftrightarrow\; \min\, \mathcal{D}(S, R) = -\sum_{R_{ui}=1} \log S_{ui}$$

with $S = S^{(1)} S^{(2)}$, $S^{(1)}_{ud} \ge 0$, $S^{(2)}_{di} \ge 0$, and the rows of $S^{(1)}$ and $S^{(2)}$ normalized to sum to 1.
[Hofmann 2004]
[Hofmann 1999]
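Concretely, the deviation touches only the known preferences; a two-line sketch with toy matrices:

    import numpy as np

    R = np.array([[1, 0, 1], [0, 1, 0]])              # toy binary feedback
    S = np.array([[0.5, 0.2, 0.3], [0.1, 0.8, 0.1]])  # toy model scores
    D = -np.log(S[R == 1]).sum()                      # D(S, R) = -sum log S_ui over R_ui = 1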
Reconstruction
$$\mathcal{D}(S, R) = \sum_{u\in U}\sum_{i\in I} (R_{ui} - S_{ui})^2 \;+\; \sum_{t=1}^{T}\sum_{f=1}^{F_t} \lambda_{tf}\, \|S^{(t,f)}\|_F^2$$
`Ridge’ regularization
[Kabbur et al. 2013]
[Kabbur and Karypis 2014]
Reconstruction
Elastic net regularization
$$\mathcal{D}(S, R) = \sum_{u\in U}\sum_{i\in I} (R_{ui} - S_{ui})^2 \;+\; \sum_{t=1}^{T}\sum_{f=1}^{F_t} \lambda_{tf} \left( \|S^{(t,f)}\|_F^2 + \|S^{(t,f)}\|_1 \right)$$
`Ridge’ regularization
[Ning and Karypis 2011]
[Christakopoulou and Karypis 2014]
[Kabbur et al. 2013]
[Kabbur and Karypis 2014]
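A minimal sketch of learning a sparse item-item model with an elastic net penalty, in the spirit of SLIM [Ning and Karypis 2011]; the dense matrix R and the hyperparameter values are hypothetical:

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def slim_similarities(R, alpha=0.1, l1_ratio=0.5):
        # learn each column of the item-item matrix W by regressing
        # item j's column on all other items, with W_jj = 0
        n_items = R.shape[1]
        W = np.zeros((n_items, n_items))
        for j in range(n_items):
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                               positive=True, fit_intercept=False)
            target = R[:, j].copy()
            Rj = R.copy()
            Rj[:, j] = 0.0                     # exclude the item itself
            model.fit(Rj, target)
            W[:, j] = model.coef_
        return W                               # scores are then R @ W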
Reconstruction between AMAU and AMAN

AMAU: $\;\mathcal{D}(S, R) = \sum_{u\in U}\sum_{i\in I} R_{ui}\, (R_{ui} - S_{ui})^2$

AMAN: $\;\mathcal{D}(S, R) = \sum_{u\in U}\sum_{i\in I} (R_{ui} - S_{ui})^2$

The AMAU assumption is too careful, because the vast majority of the missing values are negatives. The AMAN assumption, on the other hand, is too strict, because we are actually searching for the preferences among the unknowns. Hu et al. [Hu et al. 2008] and Pan et al. [Pan et al. 2008] simultaneously proposed a middle way between AMAU and AMAN:

Middle Way: $\;\mathcal{D}(S, R) = \sum_{u\in U}\sum_{i\in I} W_{ui}\, (R_{ui} - S_{ui})^2$

Reconstruction choosing W

$W \in \mathbb{R}^{n \times m}$ assigns a weight to every value in $R$: the higher $W_{ui}$, the higher the confidence about $R_{ui}$. There is a high confidence about the ones being preferences, but a low confidence about the zeros being dislikes. To formalize this, potential definitions of $W_{ui}$ were given [Hu et al. 2008; Pan et al. 2008], e.g.

$$W_{ui} = 1 \;\text{if}\; R_{ui} = 0, \qquad W_{ui} = \alpha \;\text{if}\; R_{ui} = 1.$$
Reconstruction regularization
Stripped of its statistical meaning, simply minimizing the weighted reconstruction error results in parameters that are overfitted on the training data. Therefore both Hu et al. and Pan et al. minimize a regularized version:

$$\mathcal{D}(S, R) = \sum_{u\in U}\sum_{i\in I} W_{ui}\, (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right),$$

with $\lambda$ a regularization hyperparameter and $\|\cdot\|_F$ the Frobenius norm. Additionally, Pan et al. propose an alternative regularization:

$$\mathcal{D}(S, R) = \sum_{u\in U}\sum_{i\in I} W_{ui} \left( (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}_{u\cdot}\|_F + \|S^{(2)}_{\cdot i}\|_F \right) \right).$$

Squared reconstruction error term
Regularization term
Regularization hyperparameter
[Hu et al. 2008]
[Pan et al. 2008]
[Pan and Scholz 2009]
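This weighted least squares problem has a well-known alternating least squares solution; a compact sketch in the spirit of [Hu et al. 2008], assuming a dense binary numpy matrix R and confidence weights W_ui = 1 + alpha * R_ui:

    import numpy as np

    def weighted_als(R, D=20, alpha=40.0, lam=0.1, n_iters=10, seed=0):
        rng = np.random.default_rng(seed)
        n_users, n_items = R.shape
        U = rng.normal(scale=0.01, size=(n_users, D))
        V = rng.normal(scale=0.01, size=(n_items, D))
        for _ in range(n_iters):
            # alternate: solve users given items, then items given users
            for X, Y, M in ((U, V, R), (V, U, R.T)):
                YtY = Y.T @ Y
                for a in range(X.shape[0]):
                    w = 1.0 + alpha * M[a]                 # confidence weights
                    A = YtY + (Y.T * (w - 1.0)) @ Y + lam * np.eye(D)
                    b = (Y.T * w) @ M[a]
                    X[a] = np.linalg.solve(A, b)
        return U, V                                        # S = U @ V.T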
Reconstruction more complex
Because the deviation function is defined over all user-item pairs, direct application of stochastic gradient descent (SGD), which is frequently used for factorizations in rating prediction problems, seems unfeasible.
Reconstruction rewritten
$$\sum_{u\in U}\sum_{i\in I} R_{ui} W_{ui}\, (1 - S_{ui})^2 \;+\; \sum_{u\in U}\sum_{i\in I} (1 - R_{ui}) W_{ui}\, (0 - S_{ui})^2 \;+\; \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)$$
Reconstruction guess unknown = 0
The second term reconstructs every missing entry as 0:

$$\sum_{u\in U}\sum_{i\in I} (1 - R_{ui})\, W_{ui}\, (0 - S_{ui})^2$$

Reconstruction unknown can also be 1

$$\sum_{u\in U}\sum_{i\in I} R_{ui} W_{ui}\, (1 - S_{ui})^2 \;+\; \sum_{u\in U}\sum_{i\in I} (1 - R_{ui}) W_{ui} \left( p\, (1 - S_{ui})^2 + (1 - p)\, (0 - S_{ui})^2 \right) \;+\; \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)$$
[Yao et al. 2014]
Reconstruction less assumptions, more parameters
$$\sum_{u\in U}\sum_{i\in I} R_{ui} W_{ui}\, (1 - S_{ui})^2 \;+\; \sum_{u\in U}\sum_{i\in I} (1 - R_{ui}) W_{ui} \left( P_{ui}\, (1 - S_{ui})^2 + (1 - P_{ui})\, (0 - S_{ui})^2 \right) \;+\; \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)$$
Reconstruction more regularization
$$\sum_{u\in U}\sum_{i\in I} R_{ui} W_{ui}\, (1 - S_{ui})^2 \;+\; \sum_{u\in U}\sum_{i\in I} (1 - R_{ui}) W_{ui} \left( P_{ui}\, (1 - S_{ui})^2 + (1 - P_{ui})\, (0 - S_{ui})^2 \right) \;+\; \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right) \;+\; \alpha \sum_{u\in U}\sum_{i\in I} (1 - R_{ui})\, H(P_{ui})$$
Reconstruction more (flexible) parameters
[Sindhwani et al. 2010]
Reconstruction conceptual flaw
$R$ is a binary matrix and $S$ is a real-valued matrix, so the interpretation of $S$ as a reconstruction of $R$ is fundamentally flawed. If $R_{ui} = 1$, the square loss is the same for $S_{ui} = 0$ and $S_{ui} = 2$:

$$(1 - 0)^2 = 1 = (1 - 2)^2,$$

yet $S_{ui} = 2$ is a much better prediction than $S_{ui} = 0$. Put differently, the reconstruction based deviation functions (implicitly) assume that all preferences are equally strong, which is an important simplification of reality.
Log likelihood similar idea

$$\max \log p(S \mid R) \;\leftarrow\; \max \log \prod_{u\in U} \prod_{i\in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})$$

$$\log \prod_{u\in U} \prod_{i\in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui}) \;\rightarrow\; \sum_{u\in U} \sum_{i\in I} \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) \;-\; \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right)$$

Zero-mean, spherical Gaussian priors on the factor matrices give rise to the regularization terms.
[C. Johnson 2014]
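A gradient sketch under this objective (hypothetical dense version), where the sigmoid link gives S_ui = sigmoid(U_u · V_i):

    import numpy as np

    def logistic_mf_step(R, U, V, alpha=10.0, lam=0.01, lr=0.01):
        S = 1.0 / (1.0 + np.exp(-(U @ V.T)))   # sigmoid link: S_ui in (0, 1)
        # gradient of sum alpha*R*log(S) + log(1 - S) wrt the logits U V^T
        G = alpha * R * (1.0 - S) - S
        dU = G @ V - lam * U                   # Gaussian priors give the -lam terms
        dV = G.T @ U - lam * V
        U += lr * dU
        V += lr * dV
        return U, V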
Maximum Margin not all preferences equally preferred

$$\tilde{R}_{ui} = 1 \;\text{if}\; R_{ui} = 1, \qquad \tilde{R}_{ui} = -1 \;\text{if}\; R_{ui} = 0,$$

and the deviation function is defined as

$$\mathcal{D}\left(S, \tilde{R}\right) = \sum_{u\in U}\sum_{i\in I} W_{ui}\, h\!\left( \tilde{R}_{ui} \cdot S_{ui} \right) + \lambda \|S\|_\Sigma,$$

with $\|\cdot\|_\Sigma$ the trace norm, $\lambda$ a regularization hyperparameter, and $h(\tilde{R}_{ui} \cdot S_{ui})$ the smooth hinge loss [Rennie and Srebro 2005]. The deviation function incorporates the confidence about the training data by means of $W$ and the missing knowledge about the degree of preference by means of the hinge loss: since the degree of preference is considered unknown, $S_{ui} > 1$ is not penalized.
[Pan and Scholz 2009]
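The smooth hinge of [Rennie and Srebro 2005] is simple to write down; a minimal sketch:

    import numpy as np

    def smooth_hinge(z):
        # 0.5 - z for z <= 0; quadratic for 0 < z < 1; zero for z >= 1,
        # so scores beyond the margin (z >= 1) are not penalized
        return np.where(z >= 1.0, 0.0,
               np.where(z <= 0.0, 0.5 - z, 0.5 * (1.0 - z) ** 2))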
AUC directly optimize the ranking

The scores computed by recommender systems are used to personally rank all items for every user. Therefore, Rendle et al. [Rendle et al. 2009] argued that it is natural to directly optimize the ranking. More specifically, they aim to maximize the area under the ROC curve (AUC):

$$\mathrm{AUC} = \frac{1}{|U|} \sum_{u\in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj}),$$

with $\delta(\text{true}) = 1$ and $\delta(\text{false}) = 0$. The higher the AUC, the more the pairwise rankings induced by the model $S$ are in line with the observed data $R$.
[Rendle et al. 2009]
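Computing the AUC itself is straightforward; a dense sketch with hypothetical inputs:

    import numpy as np

    def mean_auc(R, S):
        per_user = []
        for u in range(R.shape[0]):
            pos, neg = S[u, R[u] == 1], S[u, R[u] == 0]
            if len(pos) == 0 or len(neg) == 0:
                continue                        # AUC undefined for this user
            # fraction of (preferred, non-preferred) pairs ranked correctly
            per_user.append(np.mean(pos[:, None] > neg[None, :]))
        return float(np.mean(per_user))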
AUC non-differentiable

$$\sum_{u\in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})$$

The indicator function $\delta()$ is non-differentiable, so the AUC cannot be maximized directly with gradient-based methods.
[Rendle et al. 2009]
AUC smooth approximation

$$\mathcal{D}\left(S, \tilde{R}\right) = \sum_{u\in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} -\log \sigma(S_{ui} - S_{uj}) \;+\; \lambda_1 \|S^{(1)}\|_F^2 + \lambda_2 \|S^{(2)}\|_F^2,$$

with $\sigma()$ the sigmoid function and $\lambda_1, \lambda_2$ regularization constants, which are hyperparameters of the method. Notice that this deviation function considers all missing feedback equally negative, i.e. it corresponds to the AMAN assumption.
[Rendle et al. 2009]
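A minimal sketch of one SGD epoch on this deviation function, in the spirit of [Rendle et al. 2009]; the sampling scheme and hyperparameters are illustrative:

    import numpy as np

    def bpr_epoch(R, U, V, lr=0.05, lam=0.01, rng=None):
        rng = rng or np.random.default_rng()
        users, items = np.nonzero(R)
        for idx in rng.permutation(len(users)):
            u, i = users[idx], items[idx]
            j = rng.integers(R.shape[1])
            while R[u, j] == 1:                 # rejection-sample a non-preferred item
                j = rng.integers(R.shape[1])
            x = U[u] @ (V[i] - V[j])            # S_ui - S_uj
            g = 1.0 / (1.0 + np.exp(x))         # gradient of log sigmoid(x)
            dU = g * (V[i] - V[j]) - lam * U[u]
            dVi = g * U[u] - lam * V[i]
            dVj = -g * U[u] - lam * V[j]
            U[u] += lr * dU
            V[i] += lr * dVi
            V[j] += lr * dVj
        return U, V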
Pairwise Ranking 2 similar to AUC
$$\mathcal{D}\left(S, \tilde{R}\right) = \sum_{u\in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \left( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \right)^2 \;+\; \sum_{t=1}^{T}\sum_{f=1}^{F_t} \lambda_{tf}\, \|S^{(t,f)}\|_F^2$$
[Kabbur et al. 2013]
Pairwise Ranking 3 no regularization, also 1 to 1
Another ranking based deviation function was proposed by Takács and Tikk [Takàcs and Tikk 2012]:

$$\mathcal{D}\left(S, \tilde{R}\right) = \sum_{u\in U} \sum_{i\in I} R_{ui} \sum_{j\in I} w(j) \left( (S_{ui} - S_{uj}) - (R_{ui} - R_{uj}) \right)^2,$$

with $w()$ a user-defined item weighting function. The simplest choice is $w(j) = 1$; an alternative proposed by Takács and Tikk is $w(j) = \sum_{u\in U} R_{uj}$. This deviation function bears some resemblance to the previous one, but a squared loss is used instead of the log-loss of the sigmoid, and it also penalizes the score-difference between all known preferences. Finally, it is remarkable that Takács and Tikk explicitly do not use a regularization term, whereas most other authors find that the regularization term is essential for their models' performance.
[Takàcs and Tikk 2012]
MRR focus on top of the ranking

Very often, only the $N$ highest ranked items are shown to users. Shi et al. [Shi et al. 2012] therefore propose to maximize the mean reciprocal rank (MRR), defined as

$$\mathrm{MRR} = \frac{1}{|U|} \sum_{u\in U} r_>\!\left( \max_{R_{ui}=1} S_{ui} \;\middle|\; S_{u\cdot} \right)^{-1},$$

where $r_>(a \mid B)$ gives the rank of $a$ among all numbers in $B$ when ordered in descending order.

MRR non-differentiable

The non-smoothness of $r_>()$ and $\max$ makes direct optimization of MRR unfeasible.
[Shi et al. 2012]
MRR differentiable approximation, computationally feasible

Shi et al. derive a smoothed version of MRR. Although this smoothed version is differentiable, it could still be practically intractable to optimize. Therefore, they propose to optimize a lower bound instead. After also adding regularization terms, their final deviation function is given by

$$\mathcal{D}\left(S, \tilde{R}\right) = -\sum_{u\in U} \sum_{i\in I} R_{ui} \left( \log \sigma(S_{ui}) + \sum_{j\in I} \log\left( 1 - R_{uj}\, \sigma(S_{uj} - S_{ui}) \right) \right) + \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right),$$

with $\lambda$ a regularization constant and $\sigma()$ the sigmoid function. Notice that this deviation function de facto ignores all missing feedback, i.e. it corresponds to the AMAU assumption.
[Shi et al. 2012]
MRR known preferences score high
promote
MRR push down other known preferences
promote
scatter
[Shi et al. 2012]
MRR corresponds to AMAU assumption
promote
scatter
AMAU
[Shi et al. 2012]
kth-Order Statistic basis = AUC

$$\mathrm{AUC} = \frac{1}{|U|} \sum_{u\in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})$$
[Weston et al. 2013]
kth-Order Statistic strip normalization

$$\sum_{u\in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})$$
[Weston et al. 2013]
kth-Order Statistic focus on highly ranked negatives

$$\sum_{u\in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{\delta(S_{uj} + 1 > S_{ui})}{r_>(S_{uj} \mid \{S_{uk} \mid R_{uk} = 0\})}$$
[Weston et al. 2013]
kth-Order Statistic weight known preferences by rank

$$\sum_{u\in U} \sum_{R_{ui}=1} w\!\left( \frac{r_>(S_{ui} \mid \{S_{ui} \mid R_{ui} = 1\})}{|u|} \right) \sum_{R_{uj}=0} \frac{\delta(S_{uj} + 1 > S_{ui})}{r_>(S_{uj} \mid \{S_{uk} \mid R_{uk} = 0\})}$$

with $w()$ a user-defined function that weights the importance of a known preference as a function of its predicted rank among all known preferences; it determines the trade-off between minimizing the mean rank of the known preferences and minimizing the maximal rank of the known preferences.
[Weston et al. 2013]
kth-Order Statistic non-differentiable

Because the indicator $\delta()$ and the ranks $r_>()$ are non-differentiable, Weston et al. propose a differentiable approximation.
[Weston et al. 2013]
kth-Order Statistic hinge loss & sampling approximations
$$\mathcal{D}\left(S, \tilde{R}\right) = \sum_{u\in U} \sum_{R_{ui}=1} w\!\left( \frac{r_>(S_{ui} \mid \{S_{ui} \mid R_{ui} = 1\})}{|u|} \right) \sum_{R_{uj}=0} \frac{\max(0,\; 1 + S_{uj} - S_{ui})}{N^{-1}\, \left| \{ j \in I \mid R_{uj} = 0 \} \right|},$$

in which the indicator function is replaced by the hinge loss and the rank is approximated by $N^{-1} |\{j \in I \mid R_{uj} = 0\}|$, with $N$ the number of items $k$ that were randomly sampled until $S_{uk} + 1 > S_{ui}$. Furthermore, Weston et al. use the simple weighting function

$$w\!\left( \tfrac{r_>(S_{ui} \mid \{S_{ui} \mid R_{ui} = 1\})}{|u|} \right) = 1 \;\text{if}\; r_>(S_{ui} \mid S \subseteq \{S_{ui} \mid R_{ui} = 1\},\, |S| = K) = k, \;\text{and}\; 0 \;\text{otherwise},$$

i.e. out of $K$ randomly sampled known preferences, only the one ranked $k$th (the kth order statistic) contributes.

[Figure: out of K sampled known preferences (positions 1 … k … K), only the kth is selected; non-preferred items (1 … N) are sampled until a margin violation is found.]
[Weston et al. 2013]
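A minimal sketch of the sampling-based rank approximation (hypothetical helper; scores_u holds one user's scores and pos_items that user's known preferences):

    import numpy as np

    def sample_violator(scores_u, pos_items, i, rng):
        # sample non-preferred items until one violates the margin,
        # then estimate the rank as |negatives| / N (N = number of samples)
        negatives = np.setdiff1d(np.arange(len(scores_u)), pos_items)
        for n in range(1, len(negatives) + 1):
            j = rng.choice(negatives)
            if scores_u[j] + 1 > scores_u[i]:   # margin violation found
                return j, max(1, len(negatives) // n)
        return None, 0                          # no violation: this term is zero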
KL-divergence approximation of posterior pdf
$$\mathcal{D}(S, R) = D_{KL}\big( Q(S) \,\|\, p(S \mid R) \big)$$

Approximation of the intractable posterior $p(S \mid R)$ by a simpler distribution $Q(S)$; predictions average over the approximate posterior instead of relying on a point estimate.
[Koenigstein et al. 2012]
[Paquet and Koenigstein 2013]
Local Minima converge to local minimum
Convex unique minimum
Convex Optimization Algorithm
Max-Min-Margin: AUC as average margin

Ranking-based deviation functions. The scores computed by recommendation algorithms are used to personally rank all items for every user. Therefore, Rendle et al. [Rendle et al. 2009] argued that it is natural to directly optimize the ranking. Most of these methods aim to maximize the area under the ROC curve (AUC), which is given by

$$AUC = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}>0} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj}),$$

with δ(true) = 1 and δ(false) = 0. If the AUC is higher, the pairwise rankings induced by the model S are more in line with the observed data R. However, because δ is non-differentiable, their deviation function is a differentiable approximation of the negative AUC from which constant factors have been removed and to which a regularization term has been added:

$$D\left(S, \tilde{R}\right) = \sum_{u \in U} \sum_{R_{ui}>0} \sum_{R_{uj}=0} \log \sigma(S_{uj} - S_{ui}) + \lambda_1 \|S^{(1)}\|_F^2 + \lambda_2 \|S^{(2)}\|_F^2,$$

with σ the sigmoid function and λ₁, λ₂ regularization constants. [Aiolli 2014]
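As an illustration of the AUC above, a minimal sketch that computes it on dense toy data; R and S are hypothetical stand-ins, and every user is assumed to have at least one known and one absent item.

```python
import numpy as np

def auc(R, S):
    """AUC over a 0/1 preference matrix R and a score matrix S of the same shape."""
    total = 0.0
    for u in range(R.shape[0]):
        pos, neg = S[u, R[u] == 1], S[u, R[u] == 0]
        # fraction of (known, absent) item pairs that user u's scores rank correctly
        total += np.mean(pos[:, None] > neg[None, :])
    return total / R.shape[0]
```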
Ranking-based deviation functions (continued). Rendle et al. propose to use exactly the same deviation function as in Equation 21 to optimize the AUC [Rendle et al. 2009]. The difference is that for computing S, RS^(2) is used instead of S^(1)S^(2), i.e. the user factor matrix is unknown. Because S^(2) can be interpreted as an item-similarity matrix, they call this method BPR-kNN.

Aiolli [Aiolli 2014], on the other hand, chooses the user-based alternative, with R̄ the column-normalized version of R and S^(1) row-normalized. It holds that −1 ≤ S_ui ≤ 1, since S_ui = S^(1)_{u∗} R̄_{∗i} with ‖S^(1)_{u∗}‖ ≤ 1 and ‖R̄_{∗i}‖ ≤ 1. For every individual user u ∈ U, he starts from AUC_u, the AUC for u:

$$AUC_u = \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj}).$$

Then, he proposes a lower bound on AUC_u:

$$AUC_u \geq \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{S_{ui} - S_{uj}}{2},$$

and interprets it as a weighted sum of margins (S_ui − S_uj)/2 between any known preference and any absent feedback, in which every margin gets the same weight.
Max-Min-Margin: average → min total

$$\max_{S^{(1)}_{u*}} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{S_{ui} - S_{uj}}{2} \quad\longrightarrow\quad \max_{S^{(1)}_{u*}} \min_{\alpha_{u*}} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj})$$

[Aiolli 2014]
Max-Min-Margin: add regularization

A uniform weighting pays no special attention to pairs that are difficult to rank correctly. Therefore, Aiolli proposes to replace the uniform weighting with a weighting scheme that minimizes the total margin. Specifically, he proposes to solve, for every user u, the joint optimization problem

$$S^{(1)}_{u*} = \arg\max_{S^{(1)}_{u}} \min_{\alpha_{u*}} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj}),$$

where for every user u it holds that $\sum_{R_{ui}=1} \alpha_{ui} = 1$ and $\sum_{R_{ui}=0} \alpha_{ui} = 1$. To avoid overfitting of α, he adds two regularization terms:

$$S^{(1)}_{u*} = \arg\max_{S^{(1)}_{u}} \min_{\alpha_{u*}} \left( \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj}) + \lambda_p \sum_{R_{ui}=1} \alpha_{ui}^2 + \lambda_n \sum_{R_{ui}=0} \alpha_{ui}^2 \right),$$

with λ_p, λ_n regularization hyperparameters. S^(1) is regularized by means of the row-normalization constraint. Solving the above maximization for every user is equivalent to minimizing the deviation function

$$D\left(S, \tilde{R}\right) = \sum_{u \in U} \left( \max_{\alpha_{u*}} \left( \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{uj} - S_{ui}) - \lambda_p \sum_{R_{ui}=1} \alpha_{ui}^2 - \lambda_n \sum_{R_{ui}=0} \alpha_{ui}^2 \right) \right). \quad (29)$$

Notice that this approach corresponds to the AMAN assumption.
[Aiolli 2014]
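One way to see what the inner maximization in Equation (29) does: given the two simplex constraints, the bilinear term separates into two independent single-simplex problems, each solvable by a Euclidean projection onto the simplex. This separation is our own reading of the formula, not a step stated on the slide, and Aiolli's actual solver may differ; the sketch below is illustrative only.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def inner_max(S_pos, S_neg, lam_p, lam_n):
    """Optimal alpha weights for one user. S_pos holds the scores S_ui of the
    known preferences, S_neg the scores S_uj of the absent feedback."""
    # max a.c - lam*||a||^2 over the simplex  <=>  project c / (2*lam)
    a_pos = project_to_simplex(-S_pos / (2.0 * lam_p))
    a_neg = project_to_simplex(S_neg / (2.0 * lam_n))
    value = (a_neg @ S_neg - a_pos @ S_pos
             - lam_p * a_pos @ a_pos - lam_n * a_neg @ a_neg)
    return a_pos, a_neg, value
```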
Convex: unique minimum

$$\min_S D(S, R)$$

Analytically computable

$$S^{(2)}_{ji} = \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| \quad \text{for all } i, j \in I$$
Nearest Neighbors user- or item-similarity
[Aiolli 2013]
[Deshpande and Karypis 2004]
[Sigurbjörnsson and Van Zwol 2008]
[Sarwar et al. 2001]
[Mobasher et al. 2001]
[Lin et al. 2002]
[Sarwar et al. 2000]
[Menezes et al. 2010]
[van Leeuwen and Puspitaningrum 2012]
Nearest Neighbors similarity measures

$$\sum_{i \in I} \sum_{j \in I} \left( \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \right)^2$$

$$\sum_{u \in U} \sum_{v \in U} \left( \mathrm{sim}(v, u) \cdot |KNN(v) \cap \{u\}| - S^{(2)}_{vu} \right)^2$$
[Aiolli 2013]
[Deshpande and Karypis 2004]
[Sigurbjörnsson and Van Zwol 2008]
[Sarwar et al. 2001]
[Mobasher et al. 2001]
[Lin et al. 2002]
[Sarwar et al. 2000]
[Menezes et al. 2010]
[van Leeuwen and Puspitaningrum 2012]
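A minimal sketch of the analytically computable item-based kNN model, S^(2)_ji = sim(j, i) · |KNN(j) ∩ {i}|, using cosine similarity on binary data; the dense 0/1 matrix R and the choice k=10 are hypothetical.

```python
import numpy as np

def item_knn_model(R, k=10):
    """R: |U| x |I| binary numpy array. Returns the item-similarity model S2."""
    counts = R.sum(axis=0)                       # how many users know each item
    norms = np.sqrt(np.outer(counts, counts))
    sim = (R.T @ R) / np.maximum(norms, 1e-12)   # cosine similarity between items
    np.fill_diagonal(sim, 0.0)                   # an item is not its own neighbor
    S2 = np.zeros_like(sim)
    for j in range(sim.shape[0]):
        nn = np.argsort(sim[j])[-k:]             # the k nearest neighbors of item j
        S2[j, nn] = sim[j, nn]                   # keep sim(j, i) only if i in KNN(j)
    return S2

# Recommendation scores for all users then follow as S = R @ S2.
```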
Nearest Neighbors unified
$$\sum_{i \in I} \sum_{j \in I} \left( \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \right)^2 + \sum_{u \in U} \sum_{v \in U} \left( \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}| - S^{(3)}_{uv} \right)^2$$

with, for all $i, j \in I$ and all $u, v \in U$:

$$S^{(2)}_{ji} = \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}|, \qquad S^{(3)}_{uv} = \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}|$$

[Verstrepen and Goethals 2014]
Agenda
•  Introduction
•  Algorithms
– Elegant example
– Models
– Deviation functions
– Difference with rating-based algorithms
– Parameter inference
•  Netflix
Netflix Prize rating data
n-star rating scale n=5
n-star rating scale n=10
n-star rating scale n=1
No negative feedback
?
Pearson Correlation not applicable

Technically, our case of binary, positive-only data is just a special case of rating data in which B_l = B_h = 1. However, collaborative filtering algorithms for rating data typically build on the implicit assumption that B_l < B_h, i.e. that both positive and negative feedback is available. Since this negative feedback is not available in our setting, it is not surprising that, in general, algorithms for rating data generate nonsensical results [Hu et al. 2008; Pan et al. 2008].

Algorithms for rating data, for example, often use the Pearson correlation coefficient as a similarity measure. The Pearson correlation coefficient between users u and v is given by

$$pcc(u, v) = \frac{\sum_{R_{uj}, R_{vj} > 0} (R_{uj} - \overline{R}_u)(R_{vj} - \overline{R}_v)}{\sqrt{\sum_{R_{uj}, R_{vj} > 0} (R_{uj} - \overline{R}_u)^2} \cdot \sqrt{\sum_{R_{uj}, R_{vj} > 0} (R_{vj} - \overline{R}_v)^2}},$$

with R̄_u and R̄_v the average rating of u and v respectively. In our setting with binary, positive-only data, however, R_uj and R_vj are by definition always one. Consequently, R̄_u and R̄_v are always one. Therefore, the Pearson correlation is always undefined (zero divided by zero), making it a useless similarity measure for binary, positive-only data. Even if we would hack it by omitting the terms for mean centering R̄_u and R̄_v, it would still be useless, since it would always be equal to either one or undefined.

[Figure: the co-rated entries of any two users are all ones.]
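A toy check of the argument above (all values hypothetical): on binary, positive-only data the co-rated entries of two users are all ones, so mean centering leaves only zeros and the correlation is 0/0.

```python
import numpy as np

r_u = np.array([1.0, 1.0, 1.0, 1.0])   # user u's ratings on the co-rated items
r_v = np.array([1.0, 1.0, 1.0, 1.0])   # user v's ratings on the same items
print(np.corrcoef(r_u, r_v)[0, 1])     # nan, with a divide-by-zero warning
```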
Different Neighborhood: trivial solutions

Neighborhood algorithms for rating data typically find the k users (items) that are most similar to u (i) and that have rated i (have been rated by u) [Desrosiers and Karypis 2011]. On bpo data, this approach results in the same neighborhood for every (u, i)-pair.

Matrix Factorization: # trivial solutions = ∞

Also the matrix factorization methods for rating data cannot be applied as-is to bpo data. Take for example a basic loss function for rating data:

$$\min_{S^{(1)}, S^{(2)}} \sum_{R_{ui}>0} \left( R_{ui} - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i} \right)^2 + \lambda \left( \|S^{(1)}_{u\cdot}\|^2 + \|S^{(2)}_{\cdot i}\|^2 \right),$$

which for bpo data simplifies to

$$\min_{S^{(1)}, S^{(2)}} \sum_{R_{ui}>0} \left( 1 - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i} \right)^2 + \lambda \left( \|S^{(1)}_{u\cdot}\|^2 + \|S^{(2)}_{\cdot i}\|^2 \right).$$

The squared error term of this loss function is minimized if the rows of S^(1) and the columns of S^(2) respectively are all the same unit vector.
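A toy check of the trivial minimizer above: if every row of S^(1) and every column of S^(2) are the same unit vector q, then S^(1)_u· S^(2)_·i = 1 for every pair, so the squared error over the known (all-ones) entries is exactly zero. The sizes below are arbitrary.

```python
import numpy as np

d, n_users, n_items = 5, 4, 6
q = np.ones(d) / np.sqrt(d)              # an arbitrary unit vector
S1 = np.tile(q, (n_users, 1))            # every row the same unit vector
S2 = np.tile(q[:, None], (1, n_items))   # every column the same unit vector
S = S1 @ S2
print(np.allclose(S, 1.0))               # True: zero error on all R_ui = 1
```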
Agenda
•  Introduction
•  Algorithms
– Elegant example
– Models
– Deviation functions
– Difference with rating-based algorithms
– Parameter inference
•  Netflix
SGD mostly prohibitive

$$\left(S^{(1,1)}, \ldots, S^{(T,F)}\right) \leftarrow \left(S^{(1,1)}, \ldots, S^{(T,F)}\right) - \eta \cdot \nabla D(S, R)$$

[Figure: gradient descent trajectory from start to finish, with parameter updates along the way.]

The full gradient decomposes into a sum of per-term gradients, and stochastic gradient descent (SGD) follows one randomly sampled term at a time. The number of terms in the decomposition determines the cost of one full pass:

$$\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} D_{ui}(S, R) = \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \nabla D_{ui}(S, R) \qquad O(|R|) \text{ terms}$$

$$\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{i \in I} D_{ui}(S, R) = \sum_{u \in U} \sum_{i \in I} \nabla D_{ui}(S, R) \qquad O(|U| \times |I|) \text{ terms}$$

$$\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \sum_{j \in I} D_{uij}(S, R) = \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \sum_{j \in I} \nabla D_{uij}(S, R) \qquad O(|R| \times |I|) \text{ terms}$$

Since bpo deviation functions typically sum over all (u, i) pairs or even all (u, i, j) triples instead of over the known preferences only, a full pass is roughly a factor 1000 more expensive than on the known preferences alone, which makes plain SGD mostly prohibitive.

[Shi et al. 2012]
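A back-of-the-envelope check of that factor, with hypothetical sizes chosen to match the earlier sparsity of 10 in 10,000:

```python
# All numbers are made up for illustration only.
n_users, n_items = 100_000, 50_000
n_known = int(n_users * n_items * 0.001)       # sparsity of 10 in 10,000

print(f"O(|R|):       {n_known:.1e} terms")            # 5.0e+06
print(f"O(|U| x |I|): {n_users * n_items:.1e} terms")  # 5.0e+09, i.e. x1000
print(f"O(|R| x |I|): {n_known * n_items:.1e} terms")  # 2.5e+11
```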
ALS if possible

$$O(|U| \times |I|) \quad \longrightarrow \quad O\left(d^3 (|U| + |I|) + d^2 |R|\right)$$

...which makes the algorithm less attractive for bpo data. Therefore, algorithms for bpo data typically use a variant of the alternating least squares (ALS) method if the deviation function allows it [Koren et al. 2009; Hu et al. 2008]. In this respect, the deviation functions 17 and 18 are appealing because they can be minimized with a variant of ALS. Take for example the deviation function from equation 17:

$$D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right) = \sum_{u \in U} \sum_{i \in I} W_{ui} \left( R_{ui} - S^{(1)}_{u*} S^{(2)}_{*i} \right)^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right).$$

Like most deviation functions, this deviation function is non-convex in the parameters contained in S^(1) and S^(2) and therefore has multiple local optima. However, if one temporarily fixes the parameters in S^(1), it becomes convex in S^(2) and we can analytically find updated values for S^(2) that minimize this convex function and are therefore guaranteed to reduce D(S, R). Subsequently, one can temporarily fix S^(2) and in the same way compute updated values for S^(1) that are also guaranteed to reduce D(S, R). One can keep alternating between fixing S^(1) and S^(2) until a convergence criterium of choice is met. Hu et al. [Hu et al. 2008], Pan et al. [Pan et al. 2008] and Pan and Scholz [Pan and Scholz 2009] give detailed descriptions of possible ALS procedures. The description by Hu et al. contains optimizations for the case in which missing preferences are uniformly weighted.

fix S^(1), solve S^(2) → solve S^(1), fix S^(2) → fix, solve → solve, fix → ...
[Hu et al. 2008]
[Pan et al. 2008]
[Pan and Scholz 2009]
[Pilászy et al. 2010]
[Zhou et al. 2008]
[Yao et al. 2014]
[Takács and Tikk 2012]
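A minimal sketch of such an ALS variant for the weighted squared-error deviation above, in the spirit of [Hu et al. 2008] but without their algebraic speedups, so its cost is O(|U| · |I| · d²) per pass rather than the O(d³(|U| + |I|) + d²|R|) quoted above. The uniform weight alpha for known preferences and all hyperparameters are assumptions.

```python
import numpy as np

def als_wmf(R, d=10, alpha=40.0, lam=0.1, n_iters=10, seed=0):
    """R: |U| x |I| binary numpy array. Returns factor matrices S1 and S2."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    S1 = rng.normal(scale=0.01, size=(n_users, d))   # user factors, rows S1[u]
    S2 = rng.normal(scale=0.01, size=(n_items, d))   # item factors, rows S2[i]
    W = 1.0 + alpha * R          # weight 1 for missing, 1 + alpha for known
    I_d = lam * np.eye(d)
    for _ in range(n_iters):
        # Fix S2 and solve the convex weighted least-squares problem per user.
        for u in range(n_users):
            A = (S2 * W[u][:, None]).T @ S2 + I_d
            b = (S2 * W[u][:, None]).T @ R[u]
            S1[u] = np.linalg.solve(A, b)
        # Fix S1 and solve per item.
        for i in range(n_items):
            A = (S1 * W[:, i][:, None]).T @ S1 + I_d
            b = (S1 * W[:, i][:, None]).T @ R[:, i]
            S2[i] = np.linalg.solve(A, b)
    return S1, S2
```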
SGD with Sampling if necessary
•  uniform pdf (see the sketch below)
•  uniform pdf + bagging
•  pdf ~ popularity
•  pdf ~ gradient size
•  discard samples until a large gradient is encountered
[Rendle et al. 2009]
[Pan and Scholz 2009]
[Rendle and Freudenthaler 2014]
[Rendle and Freudenthaler 2014]
[Weston et al. 2013]
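A minimal sketch of one SGD update with uniform sampling in the style of BPR [Rendle et al. 2009]; the matrix-factorization model S_ui = S1[u] @ S2[i], the learning rate eta, and the regularization lam are all assumptions.

```python
import numpy as np

def bpr_sgd_step(R, S1, S2, eta=0.05, lam=0.01, rng=np.random.default_rng()):
    users, items = np.nonzero(R)
    k = rng.integers(len(users))
    u, i = users[k], items[k]          # a uniformly sampled known preference
    j = rng.integers(R.shape[1])
    while R[u, j] == 1:                # resample until absent feedback is hit
        j = rng.integers(R.shape[1])
    s1u, s2i, s2j = S1[u].copy(), S2[i].copy(), S2[j].copy()
    g = 1.0 / (1.0 + np.exp(s1u @ (s2i - s2j)))   # sigmoid(-(S_ui - S_uj))
    S1[u] += eta * (g * (s2i - s2j) - lam * s1u)  # ascend log sigmoid(S_ui - S_uj)
    S2[i] += eta * (g * s1u - lam * s2i)
    S2[j] += eta * (-g * s1u - lam * s2j)
```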
Others
•  expectation maximization
•  cyclic coordinate descent
•  quadratic programming
•  direct computation
•  variational inference
[Hofmann 2004, Hofmann 1999]
[Ning and Karypis 2012]
[Christakopoulou and Karypis 2014]
[Aiolli 2014]
[Aiolli 2013]
[Deshpande and Karypis 2004]
[Sigurbjörnsson and Van Zwol 2008]
[Sarwar et al. 2001]
[Mobasher et al. 2001]
[Lin et al. 2002]
[Sarwar et al. 2000]
[Menezes et al. 2010]
[van Leeuwen and Puspitaningrum 2012]
[Verstrepen and Goethals 2014]
[Verstrepen and Goethals 2015]
[Koenigstein et al. 2012]
[Paquet and Koenigstein 2013]
Agenda
•  Introduction
•  Algorithms
•  Netflix
References
Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd annual interna-
tional ACM SIGIR conference on Research and development in information retrieval. ACM, 50–57.
Thomas Hofmann. 2004. Latent semantic models for collaborative filtering. ACM Transactions on Informa-
tion Systems (TOIS) 22, 1 (2004), 89–115.
Frank Höppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, Oded
Maimon and Lior Rokach (Eds.). Springer, New York, NY.
Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In
Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on. IEEE, 263–272.
Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender sys-
tems: an introduction. Cambridge University Press.
Santosh Kabbur and George Karypis. 2014. NLMF: NonLinear Matrix Factorization Methods for Top-N
Recommender Systems. In Data Mining Workshop (ICDMW), 2014 IEEE International Conference on.
IEEE, 167–174.
Santosh Kabbur, Xia Ning, and George Karypis. 2013. Fism: factored item similarity models for top-n rec-
ommender systems. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge
discovery and data mining. ACM, 659–667.
Noam Koenigstein, Nir Nice, Ulrich Paquet, and Nir Schleyen. 2012. The Xbox recommender system. In
Proceedings of the sixth ACM conference on Recommender systems. ACM, 281–284.
Yehuda Koren and Robert Bell. 2011. Advances in collaborative filtering. In Recommender systems hand-
book. Springer, 145–186.
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender
systems. Computer 8 (2009), 30–37.

(Slide overlay, weighted log-likelihood deviation function:)

$$\max_S \log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui}) \;\Longleftrightarrow\; \min_S -\sum_{u \in U} \sum_{i \in I} \left( \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) \right) + \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right)$$
Fabio Aiolli. 2013. Efficient top-n recommendation for very large scale binary rated datasets. In Proceedings
of the 7th ACM conference on Recommender systems. ACM, 273–280.
Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In Proceed-
ings of the 8th ACM Conference on Recommender systems. ACM, 293–296.
Sarabjot Singh Anand and Bamshad Mobasher. 2007. Contextual recommendation. Springer.
C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.
Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n
recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-
n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems. ACM,
39–46.
Mukund Deshpande and George Karypis. 2004. Item-based top-n recommendation algorithms. ACM Trans-
actions on Information Systems (TOIS) 22, 1 (2004), 143–177.
Christian Desrosiers and George Karypis. 2011. A comprehensive survey of neighborhood-based recommen-
dation methods. In Recommender systems handbook. Springer, 107–144.
Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linear
models via coordinate descent. Journal of statistical software 33, 1 (2010), 1.
Eric Gaussier and Cyril Goutte. 2005. Relation between PLSA and NMF and implications. In Proceedings
of the 28th annual international ACM SIGIR conference on Research and development in information
retrieval. ACM, 601–602.
Weiyang Lin, Sergio A Alvarez, and Carolina Ruiz. 2002. Efficient adaptive-support association rule mining
for recommender systems. Data mining and knowledge discovery 6, 1 (2002), 83–105.
Hao Ma. 2013. An experimental study on implicit social recommendation. In Proceedings of the 36th inter-
national ACM SIGIR conference on Research and development in information retrieval. ACM, 73–82.
Guilherme Vale Menezes, Jussara M Almeida, Fabiano Belém, Marcos André Gonçalves, Anísio Lacerda,
Edleno Silva De Moura, Gisele L Pappa, Adriano Veloso, and Nivio Ziviani. 2010. Demand-driven tag
recommendation. In Machine Learning and Knowledge Discovery in Databases. Springer, 402–417.
Bamshad Mobasher, Honghua Dai, Tao Luo, and Miki Nakagawa. 2001. Effective personalization based on
association rule discovery from web usage data. In Proceedings of the 3rd international workshop on
Web information and data management. ACM, 9–15.
Xia Ning and George Karypis. 2011. Slim: Sparse linear methods for top-n recommender systems. In Data
Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE, 497–506.
Rong Pan and Martin Scholz. 2009. Mind the gaps: weighting the unknown in large-scale one-class col-
laborative filtering. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge
discovery and data mining. ACM, 667–676.
Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-
class collaborative filtering. In Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on.
IEEE, 502–511.
Ulrich Paquet and Noam Koenigstein. 2013. One-class collaborative filtering with random graphs. In Pro-
ceedings of the 22nd international conference on World Wide Web. International World Wide Web Con-
ferences Steering Committee, 999–1008.
István Pilászy, Dávid Zibriczky, and Domonkos Tikk. 2010. Fast ALS-based matrix factorization for explicit
and implicit feedback datasets. In Proceedings of the fourth ACM conference on Recommender systems.
ACM, 71–78.
Steffen Rendle and Christoph Freudenthaler. 2014. Improving pairwise learning for item recommendation
from implicit feedback. In Proceedings of the 7th ACM international conference on Web search and data
mining. ACM, 273–282.
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian
personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncer-
tainty in Artificial Intelligence. AUAI Press, 452–461.
Jasson DM Rennie and Nathan Srebro. 2005. Fast maximum margin matrix factorization for collaborative
prediction. In Proceedings of the 22nd international conference on Machine learning. ACM, 713–719.
Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2000. Analysis of recommendation al-
gorithms for e-commerce. In Proceedings of the 2nd ACM conference on Electronic commerce. ACM,
158–167.
Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering
recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web.
ACM, 285–295.
Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver, and Alan Hanjalic. 2012.
CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering. In Proceedings of
the sixth ACM conference on Recommender systems. ACM, 139–146.
Yue Shi, Martha Larson, and Alan Hanjalic. 2014. Collaborative filtering beyond the user-item matrix: A
survey of the state of the art and future challenges. ACM Computing Surveys (CSUR) 47, 1 (2014), 3.
Börkur Sigurbjörnsson and Roelof Van Zwol. 2008. Flickr tag recommendation based on collective knowl-
edge. In Proceedings of the 17th international conference on World Wide Web. ACM, 327–336.
Vikas Sindhwani, Serhat S Bucak, Jianying Hu, and Aleksandra Mojsilovic. 2010. One-class matrix comple-
tion with low-density factorizations. In Data Mining (ICDM), 2010 IEEE 10th International Conference
on. IEEE, 1055–1060.
Nathan Srebro, Jason Rennie, and Tommi S Jaakkola. 2004. Maximum-margin matrix factorization. In
Advances in neural information processing systems. 1329–1336.
Gábor Takács and Domonkos Tikk. 2012. Alternating least squares for personalized ranking. In Proceedings
of the sixth ACM conference on Recommender systems. ACM, 83–90.
Lyle H Ungar and Dean P Foster. 1998. Clustering methods for collaborative filtering. In AAAI workshop on
recommendation systems, Vol. 1. 114–129.
Matthijs van Leeuwen and Diyah Puspitaningrum. 2012. Improving tag recommendation using few associ-
ations. In Advances in Intelligent Data Analysis XI. Springer, 184–194.
Koen Verstrepen and Bart Goethals. 2014. Unifying nearest neighbors collaborative filtering. In Proceedings
of the 8th ACM Conference on Recommender systems. ACM, 177–184.
Koen Verstrepen and Bart Goethals. 2015. Top-N recommendation for Shared Accounts. In Proceedings of
the 9th ACM Conference on Recommender systems. ACM.
Jason Weston, Samy Bengio, and Nicolas Usunier. 2011. Wsabie: Scaling up to large vocabulary image
annotation. In IJCAI, Vol. 11. 2764–2770.
Jason Weston, Ron J Weiss, and Hector Yee. 2013a. Nonlinear latent factorization by embedding multiple
user interests. In Proceedings of the 7th ACM conference on Recommender systems. ACM, 65–68.
Jason Weston, Hector Yee, and Ron J Weiss. 2013b. Learning to rank recommendations with the k-order
statistic loss. In Proceedings of the 7th ACM conference on Recommender systems. ACM, 245–248.
Yuan Yao, Hanghang Tong, Guo Yan, Feng Xu, Xiang Zhang, Boleslaw K Szymanski, and Jian Lu. 2014.
Dual-regularized one-class collaborative filtering. In Proceedings of the 23rd ACM International Confer-
ence on Conference on Information and Knowledge Management. ACM, 759–768.
Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. 2008. Large-scale parallel collaborative
filtering for the netflix prize. In Algorithmic Aspects in Information and Management. Springer, 337–
348.

More Related Content

PDF
Matrix Factorization Techniques For Recommender Systems
PPTX
Remote worker's gross vs net salary
PPT
Презентация для соискателей
PDF
Programma convengo li psi m 4-5-6 novembre 2016
PPSX
A Day In The Life Of India
PDF
Crossover Turkiye Karsilama
PDF
Crossover Chief Architect Sinav Surecleri
PPTX
Презентация для соискателей о бирже дистанционной работы Crossover
Matrix Factorization Techniques For Recommender Systems
Remote worker's gross vs net salary
Презентация для соискателей
Programma convengo li psi m 4-5-6 novembre 2016
A Day In The Life Of India
Crossover Turkiye Karsilama
Crossover Chief Architect Sinav Surecleri
Презентация для соискателей о бирже дистанционной работы Crossover

Similar to Tutorial bpocf (20)

PDF
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
PDF
Introduction to Recommender Systems
PDF
Multiverse Recommendation: N-dimensional Tensor Factorization for Context-awa...
PPTX
Recommender Systems: Advances in Collaborative Filtering
PDF
Notes on Recommender Systems pdf 2nd module
PDF
Survey of Recommendation Systems
PDF
Tutorial: Context In Recommender Systems
PDF
ESSIR 2013 Recommender Systems tutorial
PPTX
Collaborative Filtering Survey
PDF
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
PPTX
Recommendation system
PPTX
Lessons learnt at building recommendation services at industry scale
PDF
IntroductionRecommenderSystems_Petroni.pdf
PDF
Research on Recommender Systems: Beyond Ratings and Lists
PDF
Advances In Collaborative Filtering
PPTX
Collaborative Filtering Recommendation System
PDF
Sociocast NODE vs. Collaborative Filtering Benchmark
PPT
Collaborative filtering using orthogonal nonnegative matrix
PPT
Chapter 02 collaborative recommendation
PPT
Chapter 02 collaborative recommendation
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Introduction to Recommender Systems
Multiverse Recommendation: N-dimensional Tensor Factorization for Context-awa...
Recommender Systems: Advances in Collaborative Filtering
Notes on Recommender Systems pdf 2nd module
Survey of Recommendation Systems
Tutorial: Context In Recommender Systems
ESSIR 2013 Recommender Systems tutorial
Collaborative Filtering Survey
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Recommendation system
Lessons learnt at building recommendation services at industry scale
IntroductionRecommenderSystems_Petroni.pdf
Research on Recommender Systems: Beyond Ratings and Lists
Advances In Collaborative Filtering
Collaborative Filtering Recommendation System
Sociocast NODE vs. Collaborative Filtering Benchmark
Collaborative filtering using orthogonal nonnegative matrix
Chapter 02 collaborative recommendation
Chapter 02 collaborative recommendation
Ad

Recently uploaded (20)

PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
PPTX
1_Introduction to advance data techniques.pptx
PDF
Lecture1 pattern recognition............
PDF
Foundation of Data Science unit number two notes
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PPTX
Computer network topology notes for revision
PPTX
Business Acumen Training GuidePresentation.pptx
PPT
Reliability_Chapter_ presentation 1221.5784
PDF
annual-report-2024-2025 original latest.
PDF
Fluorescence-microscope_Botany_detailed content
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
PPTX
IB Computer Science - Internal Assessment.pptx
PPT
Quality review (1)_presentation of this 21
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
1_Introduction to advance data techniques.pptx
Lecture1 pattern recognition............
Foundation of Data Science unit number two notes
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
Computer network topology notes for revision
Business Acumen Training GuidePresentation.pptx
Reliability_Chapter_ presentation 1221.5784
annual-report-2024-2025 original latest.
Fluorescence-microscope_Botany_detailed content
Business Ppt On Nestle.pptx huunnnhhgfvu
IB Computer Science - Internal Assessment.pptx
Quality review (1)_presentation of this 21
Qualitative Qantitative and Mixed Methods.pptx
Miokarditis (Inflamasi pada Otot Jantung)
Acceptance and paychological effects of mandatory extra coach I classes.pptx
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
Introduction to Knowledge Engineering Part 1
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
Ad

Tutorial bpocf

  • 1. Collaborative Filtering with Binary, Positive-only Data Tutorial @ ECML PKDD, September 2015, Porto Koen Verstrepen+, Kanishka Bhaduri*, Bart Goethals+ * +
  • 4. Binary, Positive-Only Data — also empirically evaluate the ex 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recomm 273–280. Fabio Aiolli. 2014. Convex AUC optimiza 293–296. 9. SYMBOLS FOR U I R REFERENCES F. Aiolli. 2013. Effic 273–280. Fabio Aiolli. 2014. C 293–296. evaluation measures, — also empirically evalu 9. SYMBOLS FOR PRES U I R REFERENCES F. Aiolli. 2013. Efficient Top 273–280. Fabio Aiolli. 2014. Convex A 293–296. S.S. Anand and B. Mobasher
  • 5. Collaborative Filtering — also empirically evaluate the ex 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recomm 273–280. Fabio Aiolli. 2014. Convex AUC optimiza 293–296. 9. SYMBOLS FOR U I R REFERENCES F. Aiolli. 2013. Effic 273–280. Fabio Aiolli. 2014. C 293–296. evaluation measures, — also empirically evalu 9. SYMBOLS FOR PRES U I R REFERENCES F. Aiolli. 2013. Efficient Top 273–280. Fabio Aiolli. 2014. Convex A 293–296. S.S. Anand and B. Mobasher
  • 6. Movies — also empirically evaluate the ex 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recomm 273–280. Fabio Aiolli. 2014. Convex AUC optimiza 293–296. 9. SYMBOLS FOR U I R REFERENCES F. Aiolli. 2013. Effic 273–280. Fabio Aiolli. 2014. C 293–296.
  • 7. Music — also empirically evaluate the ex 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recomm 273–280. Fabio Aiolli. 2014. Convex AUC optimiza 293–296. 9. SYMBOLS FOR U I R REFERENCES F. Aiolli. 2013. Effic 273–280. Fabio Aiolli. 2014. C 293–296.
  • 8. Social Networks — also empirically evaluate the ex 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recomm 273–280. Fabio Aiolli. 2014. Convex AUC optimiza 293–296. 9. SYMBOLS FOR U I R REFERENCES F. Aiolli. 2013. Effic 273–280. Fabio Aiolli. 2014. C 293–296.
  • 9. Tagging / Annotation — also empirically evaluate the ex 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recomm 273–280. Fabio Aiolli. 2014. Convex AUC optimiza 293–296. 9. SYMBOLS FOR U I R REFERENCES F. Aiolli. 2013. Effic 273–280. Fabio Aiolli. 2014. C 293–296. Paris New York Porto Statue of Liberty Eiffel Tower
  • 10. Also Explicit Feedback — also empirically evaluate the ex 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recomm 273–280. Fabio Aiolli. 2014. Convex AUC optimiza 293–296. 9. SYMBOLS FOR U I R REFERENCES F. Aiolli. 2013. Effic 273–280. Fabio Aiolli. 2014. C 293–296.
  • 11. Matrix Representation 8. EXPERIMENTAL EVALUATION — Who: ? — THE offline comparison of OCCF algorithms. Many datasets, many a evaluation measures, multiple data split methods, sufficiently rand — also empirically evaluate the explanations extracted. 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated 273–280. Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit 293–296. S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–16 C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, N Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse lin recommender systems. In Advances in Knowledge Discovery and Data Mining. Spr 8. EXPERIMENTAL EVALUATION — Who: ? — THE offline comparison of OCCF algorithms. Man evaluation measures, multiple data split methods — also empirically evaluate the explanations extract 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recommendation for Very Larg 273–280. Fabio Aiolli. 2014. Convex AUC optimization for top-N recomme 293–296. S.S. Anand and B. Mobasher. 2006. Contextual Recommendation C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Evangelia Christakopoulou and George Karypis. 2014. Hoslim: H recommender systems. In Advances in Knowledge Discovery Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. P — Convince the reader this is much better than offline, how to 8. EXPERIMENTAL EVALUATION — Who: ? — THE offline comparison of OCCF algorithms. Many datasets evaluation measures, multiple data split methods, sufficien — also empirically evaluate the explanations extracted. 9. SYMBOLS FOR PRESENTATION U I R REFERENCES F. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Bin 273–280. Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation wi 293–296. S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMi C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, Ne Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-orde recommender systems. In Advances in Knowledge Discovery and Data M Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance top-n recommendation tasks. In Proceedings of the fourth ACM confer 1       1       1       1           1           1   1           1   — Convince the re 8. EXPERIMENTAL — Who: ? — THE offline com evaluation meas — also empirically 9. SYMBOLS FOR U I R REFERENCES F. Aiolli. 2013. Efficie 273–280. — THE offli evaluatio — also empi 9. SYMBOL U I R REFERENC F. Aiolli. 201 273–280. Fabio Aiolli. 2 293–296. S.S. Anand an C.M. Bishop. Evangelia Ch R
  • 12. Unknown = 0 — every unknown entry is represented as 0; there is no negative information
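A minimal sketch (Python/NumPy; variable names are illustrative, not from the tutorial) of how slides 11-12 translate to code: known preferences become 1s, everything else stays 0:

    import numpy as np

    # Observed positive feedback as (user, item) index pairs.
    positives = [(0, 1), (0, 3), (1, 0), (2, 3)]

    num_users, num_items = 3, 5
    R = np.zeros((num_users, num_items))
    for u, i in positives:
        R[u, i] = 1  # known preference
    # All remaining entries stay 0: unknown, not negative.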
  • 13. Different Data — three flavours of feedback: Ratings (movies, music, ...); Graded relevance, Positive-Only (minutes watched, times clicked, times listened, money spent, visits/week, ...); Binary, Positive-Only (seen, bought, watched, clicked, ...)
  • 14. Sparse — on the order of 10 known preferences in 10 000 possibilities
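At that sparsity a dense array is wasteful; a sketch (assuming SciPy, with illustrative indices) of the usual sparse representation:

    import numpy as np
    from scipy.sparse import csr_matrix

    rows = np.array([0, 0, 1, 2])      # users of the observed pairs
    cols = np.array([1, 3, 0, 9_999])  # items of the observed pairs
    R = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(3, 10_000))

    density = R.nnz / (R.shape[0] * R.shape[1])  # far below 1%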
  • 15. Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference •  Netflix
  • 16. Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference •  Netflix
  • 17. pLSA — an elegant example [Hofmann 2004]
  • 18. pLSA — probabilistic Latent Semantic Analysis
  • 19. pLSA — introduces D latent interests d = 1, ..., D between the users U and the items I
  • 20. pLSA — generative model: a user u connects to the latent interests d = 1, ..., D via p(d|u), and each interest d connects to the items i via p(i|d)
  • 21. pLSA — probabilistic weights: p(d|u) ≥ 0, p(i|d) ≥ 0, Σ_{d=1}^{D} p(d|u) = 1, Σ_{i∈I} p(i|d) = 1
  • 22. pLSA — compute the like-probability: p(i|u) = Σ_{d=1}^{D} p(i|d) · p(d|u)
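Stored as matrices, this like-probability is a single matrix product; a sketch with NumPy (random toy factors, names illustrative):

    import numpy as np

    num_users, num_items, D = 3, 5, 2
    rng = np.random.default_rng(0)

    p_d_given_u = rng.random((num_users, D))               # |U| x D
    p_d_given_u /= p_d_given_u.sum(axis=1, keepdims=True)  # rows sum to 1
    p_i_given_d = rng.random((D, num_items))               # D x |I|
    p_i_given_d /= p_i_given_d.sum(axis=1, keepdims=True)  # rows sum to 1

    # p(i|u) = sum_d p(i|d) * p(d|u), for all (u, i) at once.
    p_i_given_u = p_d_given_u @ p_i_given_d
    assert np.allclose(p_i_given_u.sum(axis=1), 1.0)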
  • 23. pLSA — computing the weights: (tempered) Expectation-Maximization (EM) on max Σ_{Rui=1} log p(i|u)
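A minimal sketch of one plain (untempered) EM iteration for this objective, assuming the binary matrix R and the factor matrices from the previous sketch; deliberately loopy and unoptimized:

    import numpy as np

    def em_step(R, p_d_given_u, p_i_given_d):
        # One EM iteration for max sum_{Rui=1} log p(i|u).
        new_du = np.zeros_like(p_d_given_u)
        new_id = np.zeros_like(p_i_given_d)
        for u, i in zip(*np.nonzero(R)):
            # E-step: posterior over latent interests for this known pair.
            q = p_d_given_u[u] * p_i_given_d[:, i]
            q /= q.sum()
            # M-step accumulators.
            new_du[u] += q
            new_id[:, i] += q
        # Renormalize; assumes every user and every item has >= 1 positive.
        new_du /= new_du.sum(axis=1, keepdims=True)
        new_id /= new_id.sum(axis=1, keepdims=True)
        return new_du, new_id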
  • 25. pLSA recap — p(i|u) = Σ_{d=1}^{D} p(i|d) · p(d|u)
  • 26. pLSA recap — user profile (p(d1|u), ..., p(dD|u)) and item profile (p(i|d1), ..., p(i|dD))
  • 27. pLSA recap — combining both profiles: p(i|u) = Σ_{d=1}^{D} p(i|d) · p(d|u)
  • 28. pLSA matrix factorization notation — max Σ_{Rui=1} log p(i|u); the |U| × |I| matrix of scores p(i|u) factorizes into a |U| × D matrix with entries p(d|u) times a D × |I| matrix with entries p(i|d)
  • 29. pLSA matrix factorization notation — S = S(1) S(2), with Sui = S(1)_{u·} · S(2)_{·i}
  • 30. Scores = Matrix Factorization — S = S(1) S(2), Sui = S(1)_{u·} · S(2)_{·i}
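In code the factorization view of slides 28-30 is just a matrix product (NumPy sketch; S1 and S2 are illustrative names for S(1) and S(2)):

    import numpy as np

    num_users, num_items, D = 3, 5, 2
    rng = np.random.default_rng(1)
    S1 = rng.random((num_users, D))   # S(1): |U| x D, entries p(d|u)
    S2 = rng.random((D, num_items))   # S(2): D x |I|, entries p(i|d)

    S = S1 @ S2                       # S_ui = S(1)_{u.} . S(2)_{.i}
    u, i = 1, 3
    assert np.isclose(S[u, i], S1[u] @ S2[:, i])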
  • 31. Deviation Function — max Σ_{Rui=1} log p(i|u) ⇔ max Σ_{Rui=1} log Sui ⇔ min −Σ_{Rui=1} log Sui, i.e., minimize the deviation D(S, R) = −Σ_{Rui=1} log Sui
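A sketch of evaluating this deviation function over the known preferences (NumPy; eps is an illustrative numerical guard):

    import numpy as np

    def deviation(S, R, eps=1e-12):
        # D(S, R) = -sum of log S_ui over the known preferences (R_ui = 1).
        return -np.sum(np.log(S[R == 1] + eps))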
  • 32. Summary: 2 Basic Building Blocks — Factorization Model + Deviation Function
  • 33. Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Parameter inference •  Netflix
  • 34. Tour of The Models
  • 35. pLSA soft clustering interpretation — user-item scores from user-cluster affinity and item-cluster affinity (mixed clusters) [Hofmann 2004] [Hu et al. 2008] [Pan et al. 2008] [Sindhwani et al. 2010] [Yao et al. 2014] [Pan and Scholz 2009] [Rendle et al. 2009] [Shi et al. 2012] [Takàcs and Tikk 2012]
  • 36. pLSA soft clustering interpretation — worked example with D = 4 clusters: user-item scores computed from user-cluster affinities and item-cluster affinities
  • 37. Hard Clustering — user-item scores from user-uCluster membership, item-iCluster membership, item probabilities, and uCluster-iCluster similarity [Hofmann 2004] [Hofmann 1999] [Ungar and Foster 1998]
  • 38. Item Similarity, dense — user-item scores = original rating matrix × item-item similarity [Rendle et al. 2009] [Aiolli 2013]
  • 39. Item Similarity, sparse — user-item scores = original rating matrix × (sparsified) item-item similarity [Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008] [Ning and Karypis 2011]
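A sketch covering both item-similarity variants (NumPy; cosine similarity and a top-k cut are common concrete choices, not prescribed by the slides):

    import numpy as np

    def item_scores(R, k=None):
        # S = R @ sim with cosine item-item similarity.
        # k=None keeps the dense similarity (slide 38); an integer k keeps
        # only each item's k most similar neighbours, making it sparse (slide 39).
        norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-12
        Rn = R / norms
        sim = Rn.T @ Rn                   # items x items
        np.fill_diagonal(sim, 0.0)        # an item should not recommend itself
        if k is not None:
            thresh = np.sort(sim, axis=0)[-k]   # k-th largest per column
            sim[sim < thresh] = 0.0
        return R @ sim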
  • 40. User Similarity, sparse — user-item scores (column normalized) = user-user similarity × original rating matrix (row normalized) [Sarwar et al. 2000]
  • 41. User Similarity, dense — user-item scores (column normalized) = user-user similarity × original rating matrix (row normalized) [Aiolli 2014] [Aiolli 2013]
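The user-based mirror image of the previous sketch (again, cosine and top-k are illustrative choices):

    import numpy as np

    def user_scores(R, k=None):
        # S = sim @ R with cosine user-user similarity; dense sim is slide 41,
        # keeping each user's k nearest neighbours gives slide 40's sparse variant.
        norms = np.linalg.norm(R, axis=1, keepdims=True) + 1e-12
        Rn = R / norms
        sim = Rn @ Rn.T                   # users x users
        np.fill_diagonal(sim, 0.0)
        if k is not None:
            thresh = np.sort(sim, axis=1)[:, [-k]]  # k-th largest per row
            sim[sim < thresh] = 0.0
        return sim @ R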
  • 43. Factored Item Similarity, symmetrical — user-item scores = original rating matrix × item-cluster affinity × (item-cluster affinity)ᵀ: identical item profiles, similarity by dot product over item clusters [Weston et al. 2013b]
  • 44. Factored Item Similarity, asymmetrical + bias — user-item scores = row-normalized original rating matrix × (item profiles if known preference) × (item profiles if candidate)ᵀ + user biases + item biases [Kabbur et al. 2013]
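A sketch of the two factored variants under the slides' definitions (NumPy; V, P, Q and the bias vectors are illustrative names, and the normalization is simplified):

    import numpy as np

    def factored_item_scores_symmetric(R, V):
        # Slide 43: identical item profiles V; similarity = V @ V.T (dot product).
        return R @ V @ V.T

    def factored_item_scores_asymmetric(R, P, Q, b_user, b_item):
        # Slide 44: profile P for known preferences, Q for candidates, plus biases.
        counts = R.sum(axis=1, keepdims=True) + 1e-12   # row normalization
        return (R / counts) @ P @ Q.T + b_user[:, None] + b_item[None, :]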
  • 45. Higher Order Item Similarity, inner product — user-item scores = extended rating matrix × itemset-item similarity, over selected higher-order itemsets [Christakopoulou and Karypis 2014] [Deshpande and Karypis 2004] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012] [Lin et al. 2002]
  • 46. Higher Order Item Similarity max product: instead of summing the itemset contributions, the user-item score takes the maximum over them [Sarwar et al. 2001] [Mobasher et al. 2001]
  • 47. Higher Order User Similarity inner product user-item scores user-userset similarity extended rating matrix selected higher order usersets [Lin et al. 2002]
  • 48. Best of few user models, non-linearity by max: for every $(u, i)$, $S_{ui}$ is the maximum over a handful of user models (~3 models/user) [Weston et al. 2013a]
  • 49. Best of all user models: an efficient max out of $2^{|u|}$ models/user [Verstrepen and Goethals 2015]
  • 50. Combination: item vectors can be shared between the two parts of the model [Kabbur and Karypis 2014]
  • 51. Sigmoid link function for probabilistic frameworks [Johnson 2014]
  • 52. Pdf over parameters instead of point estimation [Koeningstein et al. 2012] [Paquet and Koeningstein 2013]
  • 53. Summary: 2 Basic Building Blocks Factorization Model Deviation Function
  • 54. Summary: 2 Basic Building Blocks Factorization Model Deviation Function a.k.a. What do we minimize in order to find the parameters in the factor matrices?
  • 55. Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference •  Netflix
  • 56. Tour of Deviation Functions
  • 57. Local Minima depending on initialisation: $\min_{(S^{(1,1)}, \ldots, S^{(T,F)})} D(S, R)$ generally has multiple local minima, and which one is found depends on the initialisation.
  • 58. Max Likelihood, high scores for known preferences: $\max \sum_{R_{ui}=1} \log S_{ui}$, i.e. $\min D(S, R) = -\sum_{R_{ui}=1} \log S_{ui}$ [Hofmann 2004] [Hofmann 1999]
  • 59. Reconstruction: find factor matrices such that $S$ is an approximate, factorized reconstruction of $R$, minimizing the squared error $\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2$.
  • 60. Reconstruction, 'Ridge' regularization: $D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2$ [Kabbur et al. 2013] [Kabbur and Karypis 2014]
  • 61. Reconstruction, Elastic net regularization: $D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \left( \|S^{(t,f)}\|_F^2 + \|S^{(t,f)}\|_1 \right)$ [Ning and Karypis 2011] [Christakopoulou and Karypis 2014] [Kabbur et al. 2013] [Kabbur and Karypis 2014]
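The elastic-net deviation underlies SLIM-style methods (Ning and Karypis 2011): one regression per target item learns a sparse, non-negative item-item matrix W whose diagonal is forced to zero. A minimal sketch with scikit-learn; the toy matrix and the alpha and l1_ratio values are assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
R = (rng.random((100, 25)) < 0.15).astype(float)     # toy binary matrix
n_items = R.shape[1]
W = np.zeros((n_items, n_items))                     # sparse item-item weights

# one elastic-net regression per target item j, with W_jj forced to 0
model = ElasticNet(alpha=0.1, l1_ratio=0.5, positive=True, fit_intercept=False)
for j in range(n_items):
    X = R.copy()
    X[:, j] = 0.0                                    # exclude item j itself
    model.fit(X, R[:, j])
    W[:, j] = model.coef_

S = R @ W                                            # user-item scores
```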
  • 62. Reconstruction between AMAU and AMAN. AMAN (All Missing Are Negative): every missing value is interpreted as an absence of preference, $D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2$.
  • 63. Reconstruction between AMAU and AMAN. AMAU (All Missing Are Unknown): only the known preferences are reconstructed, $D(S, R) = \sum_{u \in U} \sum_{i \in I} R_{ui} (R_{ui} - S_{ui})^2$; AMAU is too careful, because the vast majority of the missing values are negatives, while AMAN ignores that we are actually searching for the preferences among the unknowns.
  • 64. Reconstruction between AMAU and AMAN, middle way: Hu et al. [2008] and Pan et al. [2008] simultaneously weight between the two, $D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2$, where $W$ assigns a confidence weight to every value in $R$: high confidence that the ones are preferences, lower confidence that the zeros are dislikes.
  • 65. Reconstruction, choosing W: e.g. $W_{ui} = 1$ if $R_{ui} = 0$ and $W_{ui} = \alpha$ if $R_{ui} = 1$; Pan et al. [2008] give two potential definitions of $W_{ui}$.
  • 66. Reconstruction, regularization: $D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)$, a squared reconstruction error term plus a regularization term with regularization hyperparameter $\lambda$ [Hu et al. 2008] [Pan et al. 2008] [Pan and Scholz 2009]
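This weighted objective is typically minimized by alternating least squares, since every user row has a closed-form update. A minimal sketch in the spirit of Hu et al. (2008); the confidence rule W = 1 + 40·R, the dimensions and the toy data are assumptions, and the Frobenius penalty is squared here, as is common in implementations.

```python
import numpy as np

rng = np.random.default_rng(0)
R = (rng.random((30, 15)) < 0.2).astype(float)       # toy binary matrix
W = 1.0 + 40.0 * R                                   # confidence: higher for observed ones
D, lam = 8, 0.1
U = rng.normal(scale=0.1, size=(30, D))
V = rng.normal(scale=0.1, size=(15, D))

def deviation(U, V):
    """Weighted squared reconstruction error plus (squared) Frobenius penalty."""
    S = U @ V.T
    return float((W * (R - S) ** 2).sum() + lam * ((U ** 2).sum() + (V ** 2).sum()))

# one exact alternating-least-squares pass over the users; the item pass is symmetric
for u in range(R.shape[0]):
    Cu = np.diag(W[u])                               # per-entry confidence for user u
    A = V.T @ Cu @ V + lam * np.eye(D)
    U[u] = np.linalg.solve(A, V.T @ Cu @ R[u])       # closed-form ridge solution
```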
  • 67. Reconstruction, more complex: because the deviation function is defined over all user-item pairs, a direct application of stochastic gradient descent, frequently used for factorizations in rating prediction problems, seems unfeasible; additionally, Pan et al. propose an alternate, per-entry regularization, $D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} \left( (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}_{u\ast}\|_F + \|S^{(2)}_{\ast i}\|_F \right) \right)$.
  • 68. Reconstruction rewritten, split into an observed and a missing part: $\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)$.
  • 69. Reconstruction rewritten: the first sum pulls the known preferences ($R_{ui} = 1$) towards 1.
  • 70. Reconstruction, guess unknown = 0: the second sum, $\sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2$, pulls every missing entry towards 0.
  • 71. Reconstruction, unknown can also be 1: replace the hard 0-target for missing entries by a mixture, $\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( p (1 - S_{ui})^2 + (1 - p) (0 - S_{ui})^2 \right) + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)$ [Yao et al. 2014]
  • 72. Reconstruction, fewer assumptions, more parameters: let the mixture weight vary per entry, $\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)$.
  • 73. Reconstruction, more regularization: add an entropy term $- \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui})$ on the imputation probabilities of the missing entries.
  • 74. Reconstruction, more (flexible) parameters: the full objective combines the weighted reconstruction of the ones, the $P_{ui}$-mixed reconstruction of the missing entries, the norm regularization $\lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)$ and the entropy term $- \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui})$ [Sindhwani et al. 2010]
  • 75. Reconstruction, conceptual flaw: $R$ is binary but $S$ is real-valued, so interpreting $S$ as a reconstruction of $R$ punishes overshooting like undershooting; if $R_{ui} = 1$, the square loss gives $(1 - 0)^2 = 1 = (1 - 2)^2$, although $S_{ui} = 2$ is a much better prediction than $S_{ui} = 0$; reconstruction losses thus implicitly assume all preferences are equally strong.
  • 76. Log likelihood, similar idea: $\max \log p(S \mid R) = \max \log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})$ [C. Johnson 2014]
  • 77. Log likelihood, similar idea: equivalently, $\max \sum_{u \in U} \sum_{i \in I} \left( \alpha R_{ui} \log S_{ui} + \log (1 - S_{ui}) \right) - \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right)$, where the regularization terms follow from zero-mean, spherical Gaussian priors on the factor matrices [C. Johnson 2014]
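A minimal sketch evaluating this log-posterior, with the sigmoid link of slide 51 keeping $S_{ui}$ in $(0, 1)$; the function name and the default values of alpha and lam are assumptions, not the tutorial's.

```python
import numpy as np

def neg_log_posterior(U, V, R, alpha=10.0, lam=0.5):
    """Negative of: sum of alpha*R*log(S) + log(1-S), minus Gaussian-prior penalty."""
    S = 1.0 / (1.0 + np.exp(-(U @ V.T)))     # sigmoid link keeps S_ui in (0, 1)
    ll = (alpha * R * np.log(S) + np.log(1.0 - S)).sum()
    return -(ll - lam * ((U ** 2).sum() + (V ** 2).sum()))
```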
  • 78. Maximum Margin, not all preferences equally preferred: construct $\tilde{R}$ with $\tilde{R}_{ui} = 1$ if $R_{ui} = 1$ and $\tilde{R}_{ui} = -1$ if $R_{ui} = 0$ [Pan and Scholz 2009]
  • 79. Maximum Margin, not all preferences equally preferred: $D(S, \tilde{R}) = \sum_{u \in U} \sum_{i \in I} W_{ui} \, h(\tilde{R}_{ui} \cdot S_{ui}) + \lambda \|S\|_{\Sigma}$, with $\|\cdot\|_{\Sigma}$ the trace norm, $\lambda$ a regularization hyperparameter, $h$ the smooth hinge loss [Rennie and Srebro 2005] and $W$ the confidence weights [Pan and Scholz 2009]
  • 80. Maximum Margin, not all preferences equally preferred: $W$ encodes the confidence about the training data and the hinge loss the missing knowledge about the degree of preference; since that degree is unknown, $S_{ui} > 1$ is not penalized for a known preference, applying Maximum Margin Matrix Factorization [Srebro et al. 2004] to binary, positive-only collaborative filtering [Pan and Scholz 2009]
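The smooth hinge referenced above has a simple closed form; a minimal sketch, with the function name and type hints being ours.

```python
def smooth_hinge(z: float) -> float:
    """Smooth hinge of Rennie and Srebro 2005: flat beyond the margin,
    quadratic on (0, 1), linear below 0; no penalty once z >= 1."""
    if z >= 1.0:
        return 0.0
    if z <= 0.0:
        return 0.5 - z
    return 0.5 * (1.0 - z) ** 2
```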
  • 81. AUC, directly optimize the ranking: the computed scores are only used to personally rank all items for every user, so Rendle et al. argue it is natural to directly optimize the ranking, i.e. to maximize the area under the ROC curve [Rendle et al. 2009]
  • 82. AUC, directly optimize the ranking: $\mathrm{AUC} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})$, with $\delta(\mathrm{true}) = 1$ and $\delta(\mathrm{false}) = 0$; a higher AUC means the pairwise rankings of $S$ are more in line with the observed data $R$ [Rendle et al. 2009]
  • 83. AUC, non-differentiable: the indicator $\delta(S_{ui} > S_{uj})$ is non-differentiable, so the AUC cannot be optimized directly [Rendle et al. 2009]
  • 84. AUC, smooth approximation: a differentiable approximation of the negative AUC, with constant factors removed and regularization added, $D(S, \tilde{R}) = -\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \log \sigma(S_{ui} - S_{uj}) + \lambda_1 \|S^{(1)}\|_F^2 + \lambda_2 \|S^{(2)}\|_F^2$, with $\sigma$ the sigmoid function; all missing feedback is considered equally negative, i.e. the AMAN assumption [Rendle et al. 2009]
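A minimal sketch of stochastic gradient ascent on this smoothed-AUC (BPR-style) objective: sample a user, one known preference and one missing item, and nudge the factors; the sampling scheme, learning rate and toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
R = (rng.random((30, 15)) < 0.2).astype(int)         # toy binary matrix
D, lr, lam = 8, 0.05, 0.01
U = rng.normal(scale=0.1, size=(30, D))
V = rng.normal(scale=0.1, size=(15, D))

for _ in range(20000):
    u = rng.integers(R.shape[0])
    pos, neg = np.flatnonzero(R[u] == 1), np.flatnonzero(R[u] == 0)
    if pos.size == 0 or neg.size == 0:
        continue
    i, j = rng.choice(pos), rng.choice(neg)
    uu, vi, vj = U[u].copy(), V[i].copy(), V[j].copy()
    g = 1.0 / (1.0 + np.exp(uu @ (vi - vj)))         # sigma(-(S_ui - S_uj))
    U[u] += lr * (g * (vi - vj) - lam * uu)          # ascend log sigma(S_ui - S_uj)
    V[i] += lr * (g * uu - lam * vi)
    V[j] += lr * (-g * uu - lam * vj)
```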
  • 85. Pairwise Ranking 2, similar to AUC: a squared loss on the pairwise differences, $D(S, \tilde{R}) = \sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \left( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \right)^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2$ [Kabbur et al. 2013]
  • 86. Pairwise Ranking 3, no regularization, also 1 to 1: $D(S, \tilde{R}) = \sum_{u \in U} \sum_{i \in I} R_{ui} \sum_{j \in I} w(j) \left( (S_{ui} - S_{uj}) - (R_{ui} - R_{uj}) \right)^2$, with $w(j)$ a user-defined item weighting (simplest choice $w(j) = 1$; an alternative is $w(j) = \sum_{u \in U} R_{uj}$); a squared loss is used instead of the log-loss of the sigmoid, score differences between pairs of known preferences are also penalized, and, remarkably, Takàcs and Tikk explicitly omit the regularization term that most other authors find beneficial [Takàcs and Tikk 2012]
  • 87. MRR, focus on top of the ranking: very often only the $N$ highest ranked items are shown to users, so Shi et al. propose to directly optimize the mean reciprocal rank, $\mathrm{MRR} = \frac{1}{|U|} \sum_{u \in U} r_>\!\left( \max_{R_{ui}=1} S_{ui} \mid S_{u\ast} \right)^{-1}$, with $r_>(a \mid B)$ the rank of $a$ among all numbers in $B$ in descending order [Shi et al. 2012]
  • 88. MRR, non-differentiable: the non-smoothness of $r_>(\cdot)$ and $\max$ makes direct optimization of the MRR unfeasible [Shi et al. 2012]
  • 89. MRR, differentiable approximation, computationally feasible: Shi et al. derive a smoothed version of the MRR; since even that can be practically intractable, they optimize a lower bound instead, giving (with regularization) $D(S, \tilde{R}) = -\sum_{u \in U} \sum_{i \in I} R_{ui} \left( \log \sigma(S_{ui}) + \sum_{j \in I} \log \left( 1 - R_{uj} \, \sigma(S_{uj} - S_{ui}) \right) \right) + \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right)$ [Shi et al. 2012]
  • 90. MRR, known preferences score high: the term $\log \sigma(S_{ui})$ promotes high scores for the known preferences [Shi et al. 2012]
  • 91. MRR, push down other known preferences: the term $\sum_{j \in I} \log(1 - R_{uj} \, \sigma(S_{uj} - S_{ui}))$ promotes scatter among the known preferences, pushing the others below $S_{ui}$ [Shi et al. 2012]
  • 92. MRR, corresponds to AMAU assumption: the deviation function de facto ignores all missing feedback [Shi et al. 2012]
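A minimal sketch of this smoothed-MRR deviation for one user, showing the two highlighted terms; function names and the exclusion of $j = i$ (a constant otherwise) are our assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smoothed_mrr_loss_user(s, r):
    """s: score vector S_u*, r: binary row R_u* (1 = known preference)."""
    loss = 0.0
    known = np.flatnonzero(r == 1)
    for i in known:
        loss -= np.log(sigmoid(s[i]))                # known preference scores high
        others = known[known != i]
        # push the *other* known preferences below s[i] (promote scatter);
        # missing feedback never enters the sum (AMAU)
        loss -= np.log(1.0 - sigmoid(s[others] - s[i])).sum()
    return loss
```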
  • 93. kth-Order Statistic, basis = AUC: start from $\mathrm{AUC} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})$ [Weston et al. 2013]
  • 94. kth-Order Statistic, strip normalization: drop the constant factors, $\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})$ [Weston et al. 2013]
  • 95. kth-Order Statistic, focus on highly ranked negatives: weight every margin-violating negative by the inverse of its rank, $\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{\delta(S_{uj} + 1 > S_{ui})}{r_>(S_{uj} \mid \{S_{uk} \mid R_{uk} = 0\})}$ [Weston et al. 2013]
  • 96. kth-Order Statistic, weight known preferences by rank: $\sum_{u \in U} \sum_{R_{ui}=1} w\!\left( \frac{r_>(S_{ui} \mid \{S_{uv} \mid R_{uv} = 1\})}{|u|} \right) \sum_{R_{uj}=0} \frac{\delta(S_{uj} + 1 > S_{ui})}{r_>(S_{uj} \mid \{S_{uk} \mid R_{uk} = 0\})}$, where the user-defined $w(\cdot)$ trades off minimizing the mean rank of the known preferences against minimizing their maximal rank [Weston et al. 2013]
  • 97. kth-Order Statistic, non-differentiable: both the indicator and the rank functions are non-differentiable [Weston et al. 2013]
  • 98. kth-Order Statistic, hinge loss & sampling approximations: the differentiable approximation is $D(S, \tilde{R}) = \sum_{u \in U} \sum_{R_{ui}=1} w\!\left( \frac{r_>(S_{ui} \mid \{S_{ui} \mid R_{ui} = 1\})}{|u|} \right) \sum_{R_{uj}=0} \frac{\max(0,\, 1 + S_{uj} - S_{ui})}{N^{-1} \, |\{j \in I \mid R_{uj} = 0\}|}$; the indicator is replaced by the hinge loss and the rank is approximated by $N^{-1} |\{j \in I \mid R_{uj} = 0\}|$, with $N$ the number of items $k$ randomly sampled until $S_{uk} + 1 > S_{ui}$; the weighting function simply selects the $k$-th order statistic of a size-$K$ sample of known preferences, $w = 1$ if $r_>(S_{ui} \mid S \subseteq \{S_{ui} \mid R_{ui} = 1\}, |S| = K) = k$ and $w = 0$ otherwise; the underlying model is $S = R S^{(2)} S^{(2)T}$, a factored, symmetric item-similarity matrix [Weston et al. 2013]
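A minimal sketch of the sampled rank approximation for one known preference of one user: negatives are sampled until one violates the margin, and the rank is estimated from the number of trials; the function name and the margin of 1 follow the formula above, everything else is a toy assumption.

```python
import numpy as np

def sampled_hinge_term(s, r, i, rng):
    """Hinge term for known preference i of one user, weighted by the
    inverse of a sampled rank estimate of the violating negative."""
    negatives = rng.permutation(np.flatnonzero(r == 0))
    for n, j in enumerate(negatives, start=1):
        if s[j] + 1.0 > s[i]:                        # margin-violating negative found
            rank_estimate = max(1, len(negatives) // n)
            return max(0.0, 1.0 + s[j] - s[i]) / rank_estimate
    return 0.0                                       # no violator: term vanishes
```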
  • 99. KL-divergence, approximation of posterior pdf: $D(S, R) = D_{KL}\!\left( Q(S) \,\|\, p(S \mid R) \right)$, fitting a tractable distribution $Q$ to approximate the posterior over the parameters [Koeningstein et al. 2012] [Paquet and Koeningstein 2013]
  • 100. Local Minima, converge to local minimum: minimizing $D(S, R)$ with gradient-based parameter inference converges to a local minimum, which depends on the initialisation.
  • 101. Convex, unique minimum: a convex deviation function $D(S, R)$ has a unique minimum, which a convex optimization algorithm finds independently of the initialisation.
  • 102. Max-Min-Margin, AUC as average margin: for an individual user $u$, $\mathrm{AUC}_u = \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})$ [Aiolli 2014]
  • 103. Max-Min-Margin, AUC as average margin: the model is user-based, $S = S^{(1)} \bar{R}$ with $\bar{R}$ the column-normalized version of $R$ and $S^{(1)}$ row-normalized, so that $-1 \le S_{ui} \le 1$ [Aiolli 2014]
  • 104. Max-Min-Margin, AUC as average margin: Aiolli proposes a lower bound, $\mathrm{AUC}_u \ge \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{S_{ui} - S_{uj}}{2}$ [Aiolli 2014]
  • 105. Max-Min-Margin, AUC as average margin: the bound is a weighted sum of margins $\frac{S_{ui} - S_{uj}}{2}$ between any known preference and any absent feedback, in which every margin gets the same weight [Aiolli 2014]
  • 106. Max-Min-Margin, average → min total: replace the uniform weighting by a weighting scheme that minimizes the total margin, emphasizing the pairs that are difficult to rank correctly: $\max_{S^{(1)}_u} \min_{\alpha_{u\ast}} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj})$ [Aiolli 2014]
• 108. Max-Min-Margin add regularization

Specifically, he proposes to solve, for every user u, the joint optimization problem

S^{(1)}_{u*} = \arg\max_{S^{(1)}_{u*}} \min_{\alpha_{u*}} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui}\alpha_{uj}(S_{ui} - S_{uj}),

where for every user u it holds that \sum_{R_{ui}=1} \alpha_{ui} = 1 and \sum_{R_{uj}=0} \alpha_{uj} = 1. To avoid overfitting of \alpha, he adds two regularization terms:

S^{(1)}_{u*} = \arg\max_{S^{(1)}_{u*}} \min_{\alpha_{u*}} \left( \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui}\alpha_{uj}(S_{ui} - S_{uj}) + \lambda_p \sum_{R_{ui}=1} \alpha_{ui}^2 + \lambda_n \sum_{R_{uj}=0} \alpha_{uj}^2 \right),

with \lambda_p, \lambda_n regularization hyperparameters. S^{(1)} is regularized by means of the row-normalization constraint. Solving the above maximization for every user is equivalent to minimizing the deviation function

D(S, \tilde{R}) = \sum_{u \in U} \max_{\alpha_{u*}} \left( \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui}\alpha_{uj}(S_{uj} - S_{ui}) - \lambda_p \sum_{R_{ui}=1} \alpha_{ui}^2 - \lambda_n \sum_{R_{uj}=0} \alpha_{uj}^2 \right).

Notice that this approach corresponds to the AMAN assumption. [Aiolli 2014]
• 109. Convex unique minimum

Unlike most deviation functions, the max-min-margin deviation function is convex in S^{(1)}: for every fixed \alpha, the inner objective is affine in S^{(1)}, and a maximum of affine functions is convex. It therefore has a unique minimum, which can be computed exactly for every row S^{(1)}_{u*} independently, via quadratic programming. [Aiolli 2014]
  • 110. Nearest Neighbors user- or item-similarity [Aiolli 2013] [Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008] [Sarwar et al. 2001] [Mobasher et al. 2001] [Lin et al. 2002] [Sarwar et al. 2000] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012]
• 111. Nearest Neighbors similarity measures

The nearest-neighbor methods minimize one of the deviation functions

D(S, \tilde{R}) = \sum_{i \in I} \sum_{j \in I} \left( \mathrm{sim}(j,i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \right)^2 \quad (item-based),

D(S, \tilde{R}) = \sum_{u \in U} \sum_{v \in U} \left( \mathrm{sim}(u,v) \cdot |KNN(u) \cap \{v\}| - S^{(1)}_{uv} \right)^2 \quad (user-based),

in which |KNN(j) \cap \{i\}| indicates whether i is among the k nearest neighbors of j (and analogously for users).

[Aiolli 2013] [Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008] [Sarwar et al. 2001] [Mobasher et al. 2001] [Lin et al. 2002] [Sarwar et al. 2000] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012]
• 112. Nearest Neighbors similarity measures

The minimizers of these deviation functions are analytically computable:

S^{(2)}_{ji} = \mathrm{sim}(j,i) \cdot |KNN(j) \cap \{i\}| \quad for all i, j \in I,

S^{(1)}_{uv} = \mathrm{sim}(u,v) \cdot |KNN(u) \cap \{v\}| \quad for all u, v \in U.

[Aiolli 2013] [Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008] [Sarwar et al. 2001] [Mobasher et al. 2001] [Lin et al. 2002] [Sarwar et al. 2000] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012]
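A minimal sketch of the item-based variant on a toy binary matrix, assuming cosine similarity as sim(·,·); the function name and the dense implementation are illustrative, not from the tutorial:

```python
import numpy as np

def itemknn_scores(R: np.ndarray, k: int) -> np.ndarray:
    """Item-based KNN on binary, positive-only data with cosine similarity.

    Builds S2[j, i] = sim(j, i) * |KNN(j) ∩ {i}| and returns S = R @ S2.
    """
    norms = np.maximum(np.linalg.norm(R, axis=0), 1e-12)
    sim = (R.T @ R) / np.outer(norms, norms)  # cosine similarity between items
    np.fill_diagonal(sim, 0.0)                # an item is not its own neighbor
    S2 = np.zeros_like(sim)
    for j in range(sim.shape[0]):
        knn = np.argsort(sim[j])[-k:]         # k most similar items to j
        S2[j, knn] = sim[j, knn]              # keep only the neighbors' similarities
    return R @ S2                             # recommendation scores S

R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]], dtype=float)
print(itemknn_scores(R, k=2))
```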
• 113. Nearest Neighbors unified

Both deviation functions can be combined into one, unifying user-based and item-based nearest neighbors:

D(S, \tilde{R}) = \sum_{i \in I} \sum_{j \in I} \left( \mathrm{sim}(j,i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \right)^2 + \sum_{u \in U} \sum_{v \in U} \left( \mathrm{sim}(u,v) \cdot |KNN(u) \cap \{v\}| - S^{(3)}_{uv} \right)^2.

[Verstrepen and Goethals 2014]
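Read as code, the unified model scores with both an item-similarity factor and a user-similarity factor at once. The sketch below is my own simplified reading, assuming the two factors simply add their evidence; the actual KUNN method of [Verstrepen and Goethals 2014] uses specific rescalings that are omitted here:

```python
import numpy as np

def unified_scores(R: np.ndarray, S2: np.ndarray, S3: np.ndarray) -> np.ndarray:
    """Combine item-based (R @ S2) and user-based (S3 @ R) KNN evidence.

    S2[j, i] = sim(j, i) * |KNN(j) ∩ {i}| over items,
    S3[u, v] = sim(u, v) * |KNN(u) ∩ {v}| over users.
    """
    return R @ S2 + S3 @ R
```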
  • 114. Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference •  Netflix
• 120. Pearson Correlation not applicable

Technically, our case of binary, positive-only data is just a special case of rating data with B_l = B_h = 1. However, collaborative filtering algorithms for rating data build on the implicit assumption that B_l < B_h, i.e. that both positive and negative feedback is available. Since this negative feedback is not available in our setting, it is not surprising that, in general, algorithms for rating data generate nonsensical results on it [Hu et al. 2008; Pan et al. 2008].

Algorithms for rating data, for example, often use the Pearson correlation coefficient as a similarity measure. The Pearson correlation coefficient between two users u and v is given by

\mathrm{pcc}(u,v) = \frac{\sum_{R_{uj},R_{vj}>0} (R_{uj} - \bar{R}_u)(R_{vj} - \bar{R}_v)}{\sqrt{\sum_{R_{uj},R_{vj}>0} (R_{uj} - \bar{R}_u)^2} \cdot \sqrt{\sum_{R_{uj},R_{vj}>0} (R_{vj} - \bar{R}_v)^2}},

with \bar{R}_u and \bar{R}_v the average rating of u and v, respectively. With binary, positive-only data, however, R_{uj} and R_{vj} are by definition always one, so \bar{R}_u and \bar{R}_v are always one as well. Therefore, the Pearson correlation is always undefined (zero divided by zero), making it a useless similarity measure for binary, positive-only data. Even if we hack it by omitting the mean-centering terms \bar{R}_u and \bar{R}_v, it remains useless, since it would then always be equal to either one or zero.
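The degeneracy is easy to see numerically; a throwaway check of my own, using numpy's built-in correlation:

```python
import numpy as np

u = np.ones(5)                  # all of u's feedback on co-rated items is 1
v = np.ones(5)                  # same for v
print(np.corrcoef(u, v)[0, 1])  # nan: zero variance on both sides, i.e. 0/0
```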
• 123. Different Neighborhood trivial solutions

Neighborhood-based algorithms for rating data typically find the k users (items) that are most similar to u (i) and that have rated i (have been rated by u) [Desrosiers and Karypis 2011]. On binary, positive-only data, all the feedback of such neighbors is by definition equal to one, so this approach results in the same trivial score for every (u, i)-pair.
• 124. Matrix Factorization # trivial solutions = ∞

Also the matrix factorization methods for rating data collapse on binary, positive-only data. Take for example a basic loss function for rating data:

\min_{S^{(1)},S^{(2)}} \sum_{R_{ui}>0} \left( R_{ui} - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i} \right)^2 + \lambda \left( \|S^{(1)}_{u\cdot}\|^2 + \|S^{(2)}_{\cdot i}\|^2 \right),

which for binary, positive-only data simplifies to

\min_{S^{(1)},S^{(2)}} \sum_{R_{ui}>0} \left( 1 - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i} \right)^2 + \lambda \left( \|S^{(1)}_{u\cdot}\|^2 + \|S^{(2)}_{\cdot i}\|^2 \right).
• 127. Matrix Factorization # trivial solutions = ∞

The squared error term of this loss function is minimized when the rows of S^{(1)} and the columns of S^{(2)} are all the same unit vector. Since any unit vector will do, there are infinitely many trivial solutions, none of which is useful for recommendation.
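A short numpy check of this degeneracy, my own sketch assuming d = 3 latent factors and an arbitrary unit vector v:

```python
import numpy as np

R = np.array([[1, 0, 1], [0, 1, 1]], dtype=float)  # toy binary, positive-only data
v = np.array([0.6, 0.8, 0.0])                      # any unit vector (||v|| = 1)
S1 = np.tile(v, (R.shape[0], 1))                   # every row of S1 is v
S2 = np.tile(v[:, None], (1, R.shape[1]))          # every column of S2 is v
S = S1 @ S2                                        # S_ui = v . v = 1 everywhere
err = np.sum((R[R > 0] - S[R > 0]) ** 2)           # squared error on observed entries
print(S, err)                                      # err == 0: a perfect, useless fit
```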
  • 128. Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference •  Netflix
• 129. SGD mostly prohibitive

Gradient descent iteratively takes a step in the direction of the negative gradient, from a random start, along the way, to a finish in a local minimum:

(S^{(1,1)}, \ldots, S^{(T,F)}) \leftarrow (S^{(1,1)}, \ldots, S^{(T,F)}) - \eta \cdot \nabla D(S, R),

with \eta the learning rate. Computing the full gradient \nabla D(S, R) in every step, however, is mostly prohibitive because of the sheer number of terms in D(S, R).
• 130. SGD mostly prohibitive

The gradient is a sum of per-term gradients, and the number of terms depends on the deviation function:

\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{i \in I} D_{ui}(S, R) = \sum_{u \in U} \sum_{i \in I} \nabla D_{ui}(S, R) \quad O(|U| \times |I|) terms,

\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{R_{ui}=1} D_{ui}(S, R) = \sum_{u \in U} \sum_{R_{ui}=1} \nabla D_{ui}(S, R) \quad O(|R|) terms,

\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{R_{ui}=1} \sum_{j \in I} D_{uij}(S, R) = \sum_{u \in U} \sum_{R_{ui}=1} \sum_{j \in I} \nabla D_{uij}(S, R) \quad O(|R| \times |I|) terms.

Stochastic gradient descent (SGD) therefore approximates the full gradient by the gradient of a single randomly sampled term, and compensates with many more (x1000) but much cheaper steps. [Shi et al. 2012]
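A minimal sketch of such stochastic steps for the pairwise (BPR-style) deviation function, assuming a plain two-factor model S = S1 @ S2 and uniformly sampled negatives; all hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
R = (rng.random((30, 40)) < 0.15).astype(float)  # toy binary, positive-only data
d, eta, lam = 10, 0.05, 0.01                     # factors, learning rate, regularization
S1 = rng.normal(scale=0.1, size=(30, d))         # user factors
S2 = rng.normal(scale=0.1, size=(d, 40))         # item factors

users, items = np.nonzero(R)
for _ in range(10_000):                          # many small, cheap steps
    t = rng.integers(len(users))
    u, i = users[t], items[t]                    # a known preference (u, i)
    j = rng.integers(R.shape[1])
    while R[u, j] == 1:                          # a uniformly sampled non-preference j
        j = rng.integers(R.shape[1])
    x = S1[u] @ (S2[:, i] - S2[:, j])            # S_ui - S_uj
    g = 1.0 / (1.0 + np.exp(x))                  # -dD/dx for D = -log sigmoid(x)
    S1u = S1[u].copy()
    S1[u]    += eta * (g * (S2[:, i] - S2[:, j]) - lam * S1[u])
    S2[:, i] += eta * (g * S1u - lam * S2[:, i])
    S2[:, j] += eta * (-g * S1u - lam * S2[:, j])
```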
• 131. ALS if possible

Algorithms for binary, positive-only data therefore typically use a variant of alternating least squares (ALS), if the deviation function allows it. Take for example the weighted squared-error deviation function

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} \left( R_{ui} - S^{(1)}_{u*} S^{(2)}_{*i} \right)^2 + \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right).

Like most deviation functions, it is non-convex in the parameters of S^{(1)} and S^{(2)} jointly and therefore has multiple local optima. However, if one temporarily fixes the parameters in S^{(1)}, it becomes convex in S^{(2)}, and one can analytically find updated values for S^{(2)} that minimize this convex function and are therefore guaranteed to reduce D(S, R). Subsequently, one temporarily fixes S^{(2)} and in the same way computes updated values for S^{(1)} that are also guaranteed to reduce D(S, R). One keeps alternating between fixing S^{(1)} and S^{(2)} until a convergence criterion of choice is met:

fix – solve, solve – fix, fix – solve, solve – fix, …

Hu et al. [Hu et al. 2008], Pan et al. [Pan et al. 2008], and Pan and Scholz [Pan and Scholz 2009] give detailed descriptions of possible procedures; the one by Hu et al. contains optimizations for the case in which missing preferences are uniformly weighted (see the sketch below).

[Hu et al. 2008] [Pan et al. 2008] [Pan and Scholz 2009] [Pilászy et al. 2010] [Zhou et al. 2008] [Yao et al. 2014] [Takács and Tikk 2012]
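A compact, deliberately unoptimized sketch of one such alternation for the weighted loss above, assuming weights W_ui = 1 where R_ui = 1 and W_ui = w0 < 1 elsewhere. Hu et al. describe an efficient version with precomputed Gram matrices, which this sketch skips:

```python
import numpy as np

def als_step(R, S1, S2, w0=0.1, lam=0.1):
    """One fix-solve / solve-fix alternation for weighted squared error."""
    d = S1.shape[1]
    W = np.where(R > 0, 1.0, w0)                 # confidence weights
    for u in range(R.shape[0]):                  # fix S2, solve each user row exactly
        A = (S2 * W[u]) @ S2.T + lam * np.eye(d)
        S1[u] = np.linalg.solve(A, S2 @ (W[u] * R[u]))
    for i in range(R.shape[1]):                  # fix S1, solve each item column exactly
        A = (S1.T * W[:, i]) @ S1 + lam * np.eye(d)
        S2[:, i] = np.linalg.solve(A, S1.T @ (W[:, i] * R[:, i]))
    return S1, S2
```

Each inner solve is the closed-form minimizer of a convex least-squares subproblem, so every alternation is guaranteed to reduce D(S, R).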
• 132. SGD with Sampling if necessary

When ALS is not applicable, the sampling distribution of SGD can be tuned (see the sampling sketch below):

•  uniform pdf [Rendle et al. 2009]
•  uniform pdf + bagging [Pan and Scholz 2009]
•  pdf ~ popularity [Rendle and Freudenthaler 2014]
•  pdf ~ gradient size [Rendle and Freudenthaler 2014]
•  discard samples until a large gradient is encountered [Weston et al. 2013]
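A sketch of one of these choices, sampling negatives with probability proportional to item popularity, as it might plug into the SGD loop shown earlier. This is my own illustration under stated assumptions, not the adaptive sampler of [Rendle and Freudenthaler 2014]:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_triple(R, pop):
    """Sample (u, i, j): a known preference (u, i) and a popular negative j."""
    users, items = np.nonzero(R)
    t = rng.integers(len(users))
    u, i = users[t], items[t]
    p = pop * (R[u] == 0)                 # restrict to u's non-preferences
    j = rng.choice(len(p), p=p / p.sum())  # popularity-weighted negative
    return u, i, j

R = (rng.random((30, 40)) < 0.15).astype(float)
pop = R.sum(axis=0) + 1.0                 # smoothed item popularity
print(sample_triple(R, pop))
```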
• 133. Others

•  expectation maximization [Hofmann 2004] [Hofmann 1999]
•  cyclic coordinate descent [Ning and Karypis 2011] [Christakopoulou and Karypis 2014]
•  quadratic programming [Aiolli 2014]
•  direct computation [Aiolli 2013] [Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008] [Sarwar et al. 2001] [Mobasher et al. 2001] [Lin et al. 2002] [Sarwar et al. 2000] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012] [Verstrepen and Goethals 2014] [Verstrepen and Goethals 2015]
•  variational inference [Koenigstein et al. 2012] [Paquet and Koenigstein 2013]
• 135. References

Fabio Aiolli. 2013. Efficient top-N recommendation for very large scale binary rated datasets. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 273–280.
Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 293–296.
Sarabjot Singh Anand and Bamshad Mobasher. 2007. Contextual Recommendation. Springer.
C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.
Evangelia Christakopoulou and George Karypis. 2014. HOSLIM: Higher-order sparse linear method for top-n recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the 4th ACM Conference on Recommender Systems. ACM, 39–46.
Mukund Deshpande and George Karypis. 2004. Item-based top-n recommendation algorithms. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 143–177.
Christian Desrosiers and George Karypis. 2011. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook. Springer, 107–144.
Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1 (2010), 1.
Eric Gaussier and Cyril Goutte. 2005. Relation between PLSA and NMF and implications. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 601–602.
Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 50–57.
Thomas Hofmann. 2004. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 89–115.
• 136. References

Frank Höppner. 2005. Association rules. In The Data Mining and Knowledge Discovery Handbook, Oded Mainmon and Lior Rokach (Eds.). Springer, New York, NY.
Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM). IEEE, 263–272.
Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender Systems: An Introduction. Cambridge University Press.
Santosh Kabbur and George Karypis. 2014. NLMF: Nonlinear matrix factorization methods for top-N recommender systems. In Proceedings of the 2014 IEEE International Conference on Data Mining Workshop (ICDMW). IEEE, 167–174.
Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: Factored item similarity models for top-n recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 659–667.
Noam Koenigstein, Nir Nice, Ulrich Paquet, and Nir Schleyen. 2012. The Xbox recommender system. In Proceedings of the 6th ACM Conference on Recommender Systems. ACM, 281–284.
Yehuda Koren and Robert Bell. 2011. Advances in collaborative filtering. In Recommender Systems Handbook. Springer, 145–186.
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. 2002. Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery 6, 1 (2002), 83–105.
Hao Ma. 2013. An experimental study on implicit social recommendation. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 73–82.
Guilherme Vale Menezes, Jussara M. Almeida, Fabiano Belém, Marcos André Gonçalves, Anísio Lacerda, Edleno Silva De Moura, Gisele L. Pappa, Adriano Veloso, and Nivio Ziviani. 2010. Demand-driven tag recommendation. In Machine Learning and Knowledge Discovery in Databases. Springer, 402–417.
Bamshad Mobasher, Honghua Dai, Tao Luo, and Miki Nakagawa. 2001. Effective personalization based on association rule discovery from web usage data. In Proceedings of the 3rd International Workshop on Web Information and Data Management. ACM, 9–15.
Xia Ning and George Karypis. 2011. SLIM: Sparse linear methods for top-n recommender systems. In Proceedings of the 11th IEEE International Conference on Data Mining (ICDM). IEEE, 497–506.
Rong Pan and Martin Scholz. 2009. Mind the gaps: Weighting the unknown in large-scale one-class collaborative filtering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 667–676.
Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM). IEEE, 502–511.
Ulrich Paquet and Noam Koenigstein. 2013. One-class collaborative filtering with random graphs. In Proceedings of the 22nd International Conference on World Wide Web. 999–1008.
István Pilászy, Dávid Zibriczky, and Domonkos Tikk. 2010. Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In Proceedings of the 4th ACM Conference on Recommender Systems. ACM, 71–78.
Steffen Rendle and Christoph Freudenthaler. 2014. Improving pairwise learning for item recommendation from implicit feedback. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. ACM, 273–282.
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 452–461.
Jasson D. M. Rennie and Nathan Srebro. 2005. Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd International Conference on Machine Learning. ACM, 713–719.
Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2000. Analysis of recommendation algorithms for e-commerce. In Proceedings of the 2nd ACM Conference on Electronic Commerce. ACM, 158–167.
• 137. References

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. ACM, 285–295.
Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver, and Alan Hanjalic. 2012. CLiMF: Learning to maximize reciprocal rank with collaborative less-is-more filtering. In Proceedings of the 6th ACM Conference on Recommender Systems. ACM, 139–146.
Yue Shi, Martha Larson, and Alan Hanjalic. 2014. Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Computing Surveys (CSUR) 47, 1 (2014), 3.
Börkur Sigurbjörnsson and Roelof Van Zwol. 2008. Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th International Conference on World Wide Web. ACM, 327–336.
Vikas Sindhwani, Serhat S. Bucak, Jianying Hu, and Aleksandra Mojsilovic. 2010. One-class matrix completion with low-density factorizations. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM). IEEE, 1055–1060.
Nathan Srebro, Jason Rennie, and Tommi S. Jaakkola. 2004. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems. 1329–1336.
Gábor Takács and Domonkos Tikk. 2012. Alternating least squares for personalized ranking. In Proceedings of the 6th ACM Conference on Recommender Systems. ACM, 83–90.
Lyle H. Ungar and Dean P. Foster. 1998. Clustering methods for collaborative filtering. In AAAI Workshop on Recommendation Systems, Vol. 1. 114–129.
Matthijs van Leeuwen and Diyah Puspitaningrum. 2012. Improving tag recommendation using few associations. In Advances in Intelligent Data Analysis XI. Springer, 184–194.
Koen Verstrepen and Bart Goethals. 2014. Unifying nearest neighbors collaborative filtering. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 177–184.
Koen Verstrepen and Bart Goethals. 2015. Top-N recommendation for shared accounts. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM.
Jason Weston, Samy Bengio, and Nicolas Usunier. 2011. WSABIE: Scaling up to large vocabulary image annotation. In IJCAI, Vol. 11. 2764–2770.
Jason Weston, Ron J. Weiss, and Hector Yee. 2013a. Nonlinear latent factorization by embedding multiple user interests. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 65–68.
Jason Weston, Hector Yee, and Ron J. Weiss. 2013b. Learning to rank recommendations with the k-order statistic loss. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 245–248.
Yuan Yao, Hanghang Tong, Guo Yan, Feng Xu, Xiang Zhang, Boleslaw K. Szymanski, and Jian Lu. 2014. Dual-regularized one-class collaborative filtering. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. ACM, 759–768.
Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. 2008. Large-scale parallel collaborative filtering for the Netflix prize. In Algorithmic Aspects in Information and Management. Springer, 337–348.