Matrix Factorizations for
Recommender Systems on Implicit
Data
Li-Yen Kuo and Ming-Syan Chen
National Taiwan University.
郭立言 Kuo, Li-Yen
NetDB, EE Dept, National Taiwan University
Python, JuliaLang, Matlab
Unsupervised Learning, Recommender Systems,
Bayesian Graphs, GANs, Adversarial Training
Matrix Factorizations for Recommender Systems on Implicit Data
INTRODUCTION
5
The Internet changes business models and user behaviors.
Clayton Christensen, a Harvard professor, said that Blockbuster’s ignorance and
laziness were its own undoing. Netflix began serving a ‘niche market’ and slowly
began to take over Blockbuster’s entire market [1].
THE INTERNET CHANGES OUR LIFE
[1]	https://medium.com/@ScAshwin/the-rise-of-netflix-and-the-fall-of- 3blockbuster-29e5457339b7
Source:	https://www.dailymail.co.uk/sciencetech/article-
5301869/Website-Flixable-makes-easier-browse-Netflix.html
Source:	https://techcrunch.com/2018/07/13/theres-now-just-
one-blockbuster-remaining-in-the-u-s/
6
Recommender systems are widely used in many popular commercial on-line systems.
HOW TO PROVIDE RECOMMENDATION?
Editorial Systems
hand-curated
Global RecSys
simple statistics popularity
Personalized RecSys
tailored to individuals, e.g., Amazon and Netflix
7
Editors have it rough... (sob)
Editorial Systems
hand-curated
Global RecSys
simple statistics popularity
Personalized RecSys
tailored to individuals, e.g., Amazon and Netflix
Recommender systems are widely used in many popular commercial on-line systems.
HOW TO PROVIDE RECOMMENDATION?
8
Editorial Systems
hand-curated
Global RecSys
simple statistics popularity
Personalized RecSys
tailored to individuals, e.g., Amazon and Netflix
Popular / latest items may attract most people.
But NOT EVERYONE! (80/20 rule)
Recommender systems are widely used in many popular commercial on-line systems.
HOW TO PROVIDE RECOMMENDATION?
9
Editorial Systems
hand-curated
Global RecSys
simple statistics popularity
Personalized RecSys
tailored to individuals, e.g., Amazon and Netflix
It can explore the tastes of the remaining 20%. But that is difficult.
Recommender systems are widely used in many popular commercial on-line systems.
HOW TO PROVIDE RECOMMENDATION?
10
Over 75% of what people watch comes from a recommendation.
EVERYTHING IS PERSONALIZED
Ranking
Row:	category
11
Ariana Grande
Lady Gaga
Coldplay
Miles Davis
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
L. Beethoven
Item 5
Item 1 Item 2 Item 3 Item 4 Item 5
User 1
User 2
User 3
User 4
User-item relationship can be represented by a utility matrix.
The meaning of an entry depends on the scenario.
DEFINITION OF UTILITY MATRIX
12TWO TYPES OF UTILITY MATRIX
utility matrix: explicit (rating) vs. implicit (binary, count)
13
In rating systems, such as MovieLens [1] and Allmusic [2], the value of an entry
denotes the rating of the item given by the user.
UTILITY MATRIX ON EXPLICIT RATINGS
Ariana Grande
Lady Gaga
Coldplay
Miles Davis
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
L. Beethoven
Item 5
3
4
5
4
3
4
1
3
5
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 3 4 ? ? ?
User 2 ? 5 4 ? 3
User 3 ? ? 4 1 ?
User 4 ? ? ? 3 5
[1]	https://movielens.org/
[2]	https://www.allmusic.com/
14
Considering each user's personal rating offset is important. Ratings can explicitly
reflect the preference of an individual.
Users tend to rate only items they are familiar with.
UTILITY MATRIX ON EXPLICIT RATINGS
Ariana Grande
Lady Gaga
Coldplay
Miles Davis
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
L. Beethoven
Item 5
3
4
5
4
3
4
1
3
5
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 3 4 ? ? ?
User 2 ? 5 4 ? 3
User 3 ? ? 4 1 ?
User 4 ? ? ? 3 5
[1]	https://movielens.org/
[2]	https://www.allmusic.com/
15
Ariana Grande
Lady Gaga
Coldplay
Miles Davis
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
L. Beethoven
Item 5
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 1 1 ? ? ?
User 2 ? 1 1 ? 1
User 3 ? ? 1 1 ?
User 4 ? ? ? 1 1
For instance, in a music podcast service, the value of an entry may denote whether
the user subscribes to the item.
UTILITY MATRIX ON IMPLICIT BINARY DATA
16UTILITY MATRIX ON IMPLICIT COUNT DATA
Ariana Grande
Lady Gaga
Coldplay
Miles Davis
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
L. Beethoven
Item 5
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 ? ? ?
User 2 ? 33 17 ? 10
User 3 ? ? 5116 72 ?
User 4 ? ? ? 6 3942
Or it may denote the play count.
120
242
33
17
10
5116
72
6
3942
17EXPLICIT OR IMPLICIT?
Explicit Feedback (e.g., User 2's ratings: Lady Gaga 5, Coldplay 4, L. Beethoven 3)
Has negative feedback: values range from undesirable through neutral to desirable.
Noise can largely be neglected.
Reflects preference.
Implicit Feedback (e.g., User 2's play counts: Lady Gaga 33, Coldplay 17, L. Beethoven 10)
No negative feedback: values start at 0 and are positive and incremental.
Comes with noise.
Reflects confidence rather than preference.
18GOAL OF RECSYS
Ariana Grande
Lady Gaga
Coldplay
Miles Davis
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
L. Beethoven
Item 5
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 0 0 0
User 2 0 33 17 0 10
User 3 0 0 5116 72 ?
User 4 0 0 0 6 3942
A principal goal in recommender systems is to retrieve unconsumed items that the
target user would likely consume in the future.
MATRIX
FACTORIZATION
20MATRIX FACTORIZATION FOR RECSYS
The most well-known method for recommendation is matrix factorization (MF).
MF is a class of collaborative filtering (CF) algorithms.
RecSys ⊃ CF ⊃ MF
21
Why do we need MF? Let’s see an example below.
Ariana Grande
Lady Gaga
Ed Sheeran
Coldplay
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
Linkin Park
Item 5
The 1975
Item 6
MATRIX FACTORIZATION FOR RECSYS
Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
User 1
User 2
User 3
User 4
22
Rock
Pop Music
The fact that similar users consume similar items can be represented by a latent
factor.
Ariana Grande
Lady Gaga
Ed Sheeran
Coldplay
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
Linkin Park
Item 5
The 1975
Item 6
Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
User 1
User 2
User 3
User 4
MATRIX FACTORIZATION FOR RECSYS
23
Rock
Pop Music
A user will likely consume an item that involves the same latent factor as her.
Ariana Grande
Lady Gaga
Ed Sheeran
Coldplay
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
Linkin Park
Item 5
The 1975
Item 6
MATRIX FACTORIZATION FOR RECSYS
Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
User 1
User 2
User 3
User 4
24
When using two low-rank matrices 𝜽 and 𝜷 to regenerate utility matrix 𝐗, the
optimizer will preserve the information embedded in 𝐗 as much as possible.
Hence, MF factorizes a utility matrix to acquire latent factors.
≈ ×
𝐗 𝜽
𝜷
MATRIX FACTORIZATION FOR RECSYS
25
When using two low-rank matrices 𝜽 and 𝜷 to regenerate utility matrix 𝐗, the
optimizer will preserve the information embedded in 𝐗 as much as possible.
Hence, MF factorizes a utility matrix to acquire latent factors.
≈ ×
𝐗 𝜽
𝜷
MATRIX FACTORIZATION FOR RECSYS
x_11 ≈ θ_1ᵀ β_1 ⇒ p(x_11 | θ_1ᵀ β_1)
26
When using two low-rank matrices 𝜽 and 𝜷 to regenerate utility matrix 𝐗, the
optimizer will preserve the information embedded in 𝐗 as much as possible.
Hence, MF factorizes a utility matrix to acquire latent factors.
≈ ×
𝐗 𝜽
𝜷
MATRIX FACTORIZATION FOR RECSYS
x_12 ≈ θ_1ᵀ β_2 ⇒ p(x_12 | θ_1ᵀ β_2)
27
When using two low-rank matrices 𝜽 and 𝜷 to regenerate utility matrix 𝐗, the
optimizer will preserve the information embedded in 𝐗 as much as possible.
Hence, MF factorizes a utility matrix to acquire latent factors.
≈ ×
𝐗 𝜽
𝜷
MATRIX FACTORIZATION FOR RECSYS
x_21 ≈ θ_2ᵀ β_1 ⇒ p(x_21 | θ_2ᵀ β_1)
28
When using two low-rank matrices 𝜽 and 𝜷 to regenerate utility matrix 𝐗, the
optimizer will preserve the information embedded in 𝐗 as much as possible.
Hence, MF factorizes a utility matrix to acquire latent factors.
≈ ×
𝐗 𝜽
𝜷
MATRIX FACTORIZATION FOR RECSYS
∀ (u, i), x_ui ≈ θ_uᵀ β_i ⇒ ∏_{u,i} p(x_ui | θ_uᵀ β_i) ⇒ Σ_{u,i} log p(x_ui | θ_uᵀ β_i)
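As a sketch of this idea (not the Poisson model introduced later), the factorization X ≈ θβᵀ can be fit by plain gradient descent on the squared reconstruction error. All names, hyperparameters, and the toy matrix below are illustrative:

```python
import numpy as np

def factorize(X, k=2, lr=0.02, epochs=2000, seed=0):
    """Fit X (n_users x n_items) ~= theta @ beta.T by gradient descent
    on 0.5 * ||X - theta @ beta.T||^2."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    theta = 0.1 * rng.standard_normal((n, k))
    beta = 0.1 * rng.standard_normal((m, k))
    for _ in range(epochs):
        err = X - theta @ beta.T      # reconstruction residual
        theta += lr * err @ beta      # gradient step for the user factors
        beta += lr * err.T @ theta    # gradient step for the item factors
    return theta, beta

# Toy utility matrix: two pop fans and one jazz fan.
X = np.array([[5., 4., 0.],
              [4., 5., 0.],
              [0., 0., 3.]])
theta, beta = factorize(X)
print(np.round(theta @ beta.T, 1))  # close to X, up to the rank-k truncation
```

With k = 2 the reconstruction keeps the two dominant latent factors and discards the rest, which is exactly the low-rank compression the slide describes.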
29MATRIX FACTORIZATION FOR RECSYS
User 1
User 2
User 3
User 4
Item
1
Item
2
Item
3
Item
4
Item
5
Item
6
× =
Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
User 1
User 2
User 3
User 4
Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
User 1
User 2
User 3
User 4
represent
approximate
Rock
Pop Music
Ariana Grande
Lady Gaga
Ed Sheeran
Coldplay
User 1
User 2
User 3
User 4
Item 1
Item 2
Item 3
Item 4
Linkin Park
Item 5
The 1975
Item 6
MF ON IMPLICIT
COUNT DATA
32
MODELING IMPLICIT COUNT DATA
A machine learning method can be roughly represented by a 3-tier structure:

                     Supervised     Unsupervised   MF             LTR
Objective function   p(y | f(x))    p(x | f)       p(X | θβᵀ)     p(i ≻ j | f(i, j))
Model                f(·)           f              θβᵀ            f(·)
Data                 (x, y)         x              X              i ≻ j
33MODELING IMPLICIT COUNT DATA
Unlike ratings, user preference is extracted implicitly from these count data.
Accordingly, we make 5 assumptions as prior knowledge.
Based on some of these assumptions, various objectives are proposed.
1 Most unconsumed items are undesirable for a target user.
2 Items consumed frequently are more desirable than those consumed occasionally.
3 The value of an entry is assumed to follow a Poisson distribution.
4 The value of an entry is assumed to follow a negative binomial distribution.
5 Items are exposed to the target user before she/he consumes them.
34MODELING IMPLICIT COUNT DATA
Assumption 1.1
Assuming that most of unconsumed items are undesirable for a target user, we set
these unobserved entries to zero, namely, consuming 0 times.
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 0 0 0
User 2 0 33 17 0 10
User 3 0 0 5116 72 0
User 4 0 0 0 6 3942
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 ? ? ?
User 2 ? 33 17 ? 10
User 3 ? ? 5116 72 ?
User 4 ? ? ? 6 3942
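Assumption 1.1 is straightforward to realize in code. Below is a hypothetical sketch where the observed play counts from the slide are stored sparsely (as a dict) and every missing (user, item) pair defaults to 0:

```python
import numpy as np

# Observed play counts from the slide, 0-indexed; missing pairs are unobserved.
observed = {(0, 0): 120, (0, 1): 242, (1, 1): 33, (1, 2): 17, (1, 4): 10,
            (2, 2): 5116, (2, 3): 72, (3, 3): 6, (3, 4): 3942}

# Assumption 1.1: treat every unobserved (user, item) pair as 0 plays.
X = np.zeros((4, 5))
for (u, i), count in observed.items():
    X[u, i] = count

print(X[0])   # row of User 1; the unobserved entries are filled with 0
```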
35MODELING IMPLICIT COUNT DATA
Assumption 1.2
Assuming that most of unconsumed items are undesirable for a target user, we rank
consumed items higher than unconsumed ones.
User 1
{Item 1, Item 2} ≻ {Item 3, Item 4, Item 5}
User 2
{Item 2, Item 3, Item 5} ≻ {Item 1, Item 4}
User 3
{Item 3, Item 4} ≻ {Item 1, Item 2, Item 5}
User 4
{Item 4, Item 5} ≻ {Item 1, Item 2, Item 3}
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 ? ? ?
User 2 ? 33 17 ? 10
User 3 ? ? 5116 72 ?
User 4 ? ? ? 6 3942
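The pairwise preferences implied by assumption 1.2 can be enumerated directly. A minimal sketch (the function name and item indexing are illustrative):

```python
def consumed_vs_unconsumed(consumed_by_user, n_items):
    """Assumption 1.2: every consumed item outranks every unconsumed one."""
    pairs = []
    for user, consumed in consumed_by_user.items():
        unconsumed = set(range(n_items)) - consumed
        for i in sorted(consumed):
            for j in sorted(unconsumed):
                pairs.append((user, i, j))   # reads: i >_user j
    return pairs

# Item indices (0-based) consumed by Users 1 and 2 in the slide's matrix.
consumed_by_user = {"User 1": {0, 1}, "User 2": {1, 2, 4}}
pairs = consumed_vs_unconsumed(consumed_by_user, n_items=5)
print(len(pairs))   # 2*3 pairs for User 1 plus 3*2 pairs for User 2
```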
36BAYESIAN PERSONALIZED RANKING (BPR)
By using assumption 1.2, in BPR (Rendle et al., 2009), the consumed-or-not problem
is treated as pairwise learning to rank.
[Rendle et al., 2009] Rendle, Steffen, et al. BPR: Bayesian Personalized Ranking from Implicit Feedback. AUAI Press, 2009.
Objective function
Model: MF or K-NN
Data: i ≻_u j
37
Model MF: x̂_ui = θ_uᵀ β_i
Objective function: BPR-Opt = Σ ln σ(x̂_ui − x̂_uj) − λ‖Θ‖², summed over triples (u, i, j) with i ≻_u j
BAYESIAN PERSONALIZED RANKING (BPR)
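A hedged sketch of the BPR stochastic gradient step, maximizing ln σ(x̂_ui − x̂_uj) with L2 regularization. The learning rate, dimensionality, and the single training triple are illustrative, not values from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_step(theta, beta, u, i, j, lr=0.05, reg=0.01):
    """One SGD step on a triple (u, i, j) with i >_u j."""
    x_uij = theta[u] @ (beta[i] - beta[j])   # score difference x_ui - x_uj
    g = sigmoid(-x_uij)                      # gradient weight of ln sigmoid(x)
    tu = theta[u].copy()
    theta[u] += lr * (g * (beta[i] - beta[j]) - reg * theta[u])
    beta[i]  += lr * (g * tu - reg * beta[i])
    beta[j]  += lr * (-g * tu - reg * beta[j])

rng = np.random.default_rng(0)
theta = 0.1 * rng.standard_normal((4, 8))   # user factors
beta = 0.1 * rng.standard_normal((5, 8))    # item factors
for _ in range(200):
    bpr_step(theta, beta, u=0, i=1, j=3)    # e.g., User 1 prefers Item 2 over Item 4
assert theta[0] @ beta[1] > theta[0] @ beta[3]   # learned scores respect the pair
```

Note that only the relative order of the two scores is optimized, which is exactly what assumption 1.2 supplies.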
38MODELING IMPLICIT COUNT DATA
Assumption 2
Items consumed frequently are more desirable than those consumed occasionally.
This can lead to regression-based and LTR objectives.
User 1
{Item 2} ≻ {Item 1}
User 2
{Item 2} ≻ {Item 3} ≻ {Item 5}
User 3
{Item 3} ≻ {Item 4}
User 4
{Item 5} ≻ {Item 4}
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 ? ? ?
User 2 ? 33 17 ? 10
User 3 ? ? 5116 122 ?
User 4 ? ? ? 6 3942
39
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 0 0 0
User 2 0 33 17 0 10
User 3 0 0 5116 122 0
User 4 0 0 0 6 3942
MODELING IMPLICIT COUNT DATA
Assumption 3
Exploiting the nature of count data, one can assume the value of each entry to follow
a Poisson distribution independently.
40HIERARCHICAL POISSON FACTORIZATION (HPF)
HPF (Gopalan et al., 2015), where each entry is assumed to be a Poisson, is a
widely-used MF method for recommendation on implicit count data.
Objective function: ∏_{u,i} p(x_ui | θ_uᵀ β_i), where p(x_ui | θ_uᵀ β_i) = Poi(x_ui; θ_uᵀ β_i)
Model: MF with Gamma latent factors
Data: X
41HIERARCHICAL POISSON FACTORIZATION (HPF)
HPF (Gopalan et al., 2015), where each entry is assumed to be a Poisson, is a
widely-used MF method for recommendation on implicit count data.
42
Since PF can down-weight the effect of zero entries and its update cost is linear in
the number of nonzero entries, PF is widely used for large sparse utility matrices.
HIERARCHICAL POISSON FACTORIZATION (HPF)
log 0! = log 1 = 0
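The linear-cost property can be sketched numerically: the −λ term summed over the full matrix collapses to a product of column sums, and both x·log λ and log x! vanish at x = 0, so the full Poisson log-likelihood is computed touching only nonzero entries. The factor shapes and values below are illustrative:

```python
import numpy as np
from scipy.special import gammaln

def poisson_ll(rows, cols, vals, theta, beta):
    """Poisson log-likelihood over ALL entries, touching only nonzeros:
    sum_ui [ x log(theta_u . beta_i) - theta_u . beta_i - log x! ]."""
    rates = np.einsum("nk,nk->n", theta[rows], beta[cols])  # rates at nonzeros
    ll = np.sum(vals * np.log(rates) - gammaln(vals + 1.0))
    # The -rate term over every (u, i) collapses to a dot product of sums.
    ll -= theta.sum(axis=0) @ beta.sum(axis=0)
    return ll

rng = np.random.default_rng(0)
theta = rng.gamma(1.0, 1.0, (4, 3))            # Gamma latent user factors
beta = rng.gamma(1.0, 1.0, (5, 3))             # Gamma latent item factors
rows, cols = np.array([0, 1, 2]), np.array([0, 2, 4])
vals = np.array([120., 17., 10.])              # nonzero play counts (illustrative)
print(poisson_ll(rows, cols, vals, theta, beta))
```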
43PERSONALIZED RANKING ON PF (PRPF)
In PRPF (Kuo et al., 2018), a pairwise LTR model is proposed to permute consumed
items for each user according to assumption 2.
[Kuo	et	al.,	2018]	Kuo,	Li-Yen,	et	al.	Personalized	Ranking	on	Poisson	Factorization.	In	SDM,	2018.
Objective function
Model: MF with Gamma latent factors
Data: i ≻_u j
User 1
{Item 2} ≻ {Item 1}
User 2
{Item 2} ≻ {Item 3} ≻ {Item 5}
User 3
{Item 3} ≻ {Item 4}
User 4
{Item 5} ≻ {Item 4}
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 ? ? ?
User 2 ? 33 17 ? 10
User 3 ? ? 5116 72 ?
User 4 ? ? ? 6 3942
44CHALLENGES OF IMPLICIT COUNT DATA
Challenge 1: data overdispersion
Real-life user consumption behaviors approximately follow power-law distributions,
so the data are often overdispersed.
45CHALLENGES OF IMPLICIT COUNT DATA
Challenge 1: data overdispersion
Some songs are played quite frequently, since a user may repeatedly listen to the songs
of artists she likes.
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 0 0 0
User 2 0 33 17 0 10
User 3 0 0 5116 122 0
User 4 0 0 0 6 3942
46CHALLENGES OF IMPLICIT COUNT DATA
Challenge 1: data overdispersion
In fact, the problem can be attributed to the Poisson distribution's limited variance:
it cannot model such data because its variance is tied to its mean. As long as the
variance the model assumes is larger than the variance of the data, the effect of
outliers is reduced.
[Figure: count distributions with increasing variance over the range 0–5116]
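A small numerical illustration of overdispersion, using made-up counts shaped like the slide's matrix. The method-of-moments dispersion estimate below is an illustrative calculation, not the paper's inference procedure:

```python
import numpy as np

# Play counts for one item across users: a few heavy listeners dominate.
counts = np.array([0, 0, 0, 1, 2, 3, 6, 10, 72, 5116])

mean, var = counts.mean(), counts.var()
print(f"mean={mean:.1f}  variance={var:.1f}")   # variance is far above the mean

# A Poisson fit is forced to set variance = mean; the NB adds a free
# dispersion r so that variance = mean * (1 + mean / r) can match the data.
r = mean**2 / (var - mean)     # method-of-moments dispersion estimate
print(f"implied NB dispersion r={r:.4f}")
```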
47CHALLENGES OF IMPLICIT COUNT DATA
Challenge 2: failure exposure estimation
An outlier entry in a utility matrix can be explained by a user who always plays the songs
on her/his own initiative, which means that almost no failure exposures exist.
Accordingly, the number of failure exposures varies with users and items, since user
behaviors differ.
Nevertheless, most previous works only consider successful events and omit failure exposure.
Failure exposure is hard to model and can merely be regarded as implicit information
estimated from the relationship between observations and the corresponding model inference.
Ariana Grande
User 1 Item 1
120 Ariana Grande
User 1 Item 1
120
???
Success exposure count
Failure exposure count
48SCENARIO OF USER CONSUMPTION
Scenario                                            Success exposure   Failure exposure
Observed event (nonzero entry)
  User u plays i on her/his own initiative.         High               Very Low
  User u is exposed to i and u plays i.             Medium/Low         Medium/Low
Unobserved event (zero entry)
  User u is exposed to i and u does not play i.     0                  Medium/Low
  User u has not been exposed to i yet.             0                  ?
49
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 120 242 0 0 0
User 2 0 33 17 0 10
User 3 0 0 5116 122 0
User 4 0 0 0 6 3942
MODELING IMPLICIT COUNT DATA
Assumption 4
To alleviate the data overdispersion problem, the value of each entry is assumed to
follow a negative binomial distribution independently.
50NEGATIVE BINOMIAL OR POISSON?

                      Negative Binomial                               Poisson
Notation              x_ui ~ NB(r_ui, θ_uᵀβ_i / (r_ui + θ_uᵀβ_i))     x_ui ~ Poi(θ_uᵀβ_i)
Success Event Count   θ_uᵀβ_i                                         θ_uᵀβ_i
Failure Event Count   r_ui                                            ∞
Expectation           θ_uᵀβ_i                                         θ_uᵀβ_i
Variance              θ_uᵀβ_i (1 + θ_uᵀβ_i / r_ui)                    θ_uᵀβ_i
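The moment formulas in the table can be checked numerically with SciPy. The parameter mapping below (SciPy's `nbinom(n, p)` counts failures before the n-th success, so n = r_ui and p = r_ui / (r_ui + θ_uᵀβ_i)) is an assumption chosen to be consistent with the table:

```python
from scipy.stats import nbinom

mu, r = 6.0, 2.0                       # illustrative theta_u.beta_i and r_ui
# With n = r and p = r / (r + mu), SciPy's nbinom matches the table's NB.
dist = nbinom(r, r / (r + mu))
print(dist.mean())                      # 6.0  = mu
print(dist.var())                       # 24.0 = mu * (1 + mu / r)
```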
52MODELING IMPLICIT COUNT DATA
Assumption 5
Items are exposed to the target user before she/he consumes them.
[Diagram: User 1 plays Item 1 (Ariana Grande) 120 times, the success exposure count;
the failure exposure count is unknown (???). Under assumption 5, an exposure event
mediates between User 1 and Item 1.]
53EXPOMF
In ExpoMF (Liang et al., 2016), a probabilistic approach directly incorporates user
exposure to items into collaborative filtering.
[Liang	et	al.,	2016]	Liang,	Dawen,	et	al.	Modeling	User	Exposure	in	Recommendation.	In	WWW,	2016.
Objective function
Model: MF incorporating Bernoulli exposure
Data: Y
HIERARCHICAL NEGATIVE
BINOMIAL FACTORIZATION
56
PRELIMINARY:
NEGATIVE BINOMIAL DISTRIBUTION
Compared with the Poisson distribution, where the mean equals the variance, the NB is
better suited to data with larger variance. As such, in NBF, entry x_ui can be sampled
from the generative process

x_ui ~ NB(r_ui, θ_uᵀβ_i / (r_ui + θ_uᵀβ_i)),

where θ_uᵀβ_i is the success exposure count, r_ui is the failure exposure count, and
the success probability is θ_uᵀβ_i / (r_ui + θ_uᵀβ_i). The mean and the variance of
the NB are θ_uᵀβ_i and θ_uᵀβ_i (1 + θ_uᵀβ_i / r_ui), respectively.
57
PRELIMINARY:
POISSON-GAMMA MIXTURE
The NB can also be viewed as a Poisson-gamma mixture (Lawless, 1987; Gardner et
al., 1995), which is defined as

x_ui ~ Poi(d_ui θ_uᵀβ_i), d_ui ~ Gamma(r_ui, r_ui),

where the latent variable d_ui denotes the dispersion of variable x_ui.
[Graphical model: the Poisson-gamma form is equivalent to the NB form.]
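The equivalence can be checked by simulation: sampling from the Poisson-gamma mixture and comparing the empirical moments with the NB formulas from the table. Parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, r, n = 6.0, 2.0, 200_000

# Poisson-gamma mixture: d ~ Gamma(shape=r, rate=r) has mean 1, variance 1/r.
d = rng.gamma(shape=r, scale=1.0 / r, size=n)
x = rng.poisson(d * mu)

print(x.mean())   # ~ mu
print(x.var())    # ~ mu * (1 + mu / r), i.e. well above the Poisson variance mu
```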
58
PRELIMINARY:
POISSON-GAMMA MIXTURE
Poisson-gamma mixture: x_ui ~ Poi(d_ui θ_uᵀβ_i), d_ui ~ Gamma(r_ui, r_ui)
Since the variance of d_ui is r_ui⁻¹, the smaller r_ui is, the larger the variance of d_ui.
When r_ui is fixed as a constant, the variety of d_ui is limited.
59
PROPOSED: HIERARCHICAL
POISSON-GAMMA MIXTURE
Poisson-gamma mixture: x_ui ~ Poi(d_ui θ_uᵀβ_i), d_ui ~ Gamma(r_ui, r_ui)
Since the variance of d_ui is r_ui⁻¹, the smaller r_ui is, the larger the variance of d_ui.
When r_ui is fixed as a constant, the variety of d_ui is limited.
To tackle this, we let r_ui be a gamma variable, r_ui ~ Gamma(g, h), according to the
conjugate prior relationship.
Hierarchical Negative Binomial Factorization (HNBF)
[Plate diagram of the HNBF graphical model: x_ui ~ Poi(d_ui θ_uᵀβ_i),
d_ui ~ Gamma(r_ui, r_ui), r_ui ~ Gamma(g, h), with Gamma priors on θ_u and β_i;
plates over users u ∈ U and items i ∈ I]
Updating a per-entry dispersion variable for every (u, i): too slow!
FastHNBF
[Plate diagram of the FastHNBF graphical model]
66UPDATING: VARIATIONAL INFERENCE
Since the variables are entangled, we use variational inference to approximate the posterior.
Variational inference (VI) approximates posterior distributions with a family of distributions
indexed by free variational parameters, by maximizing the evidence lower bound (ELBO), a lower
bound on the logarithm of the marginal probability of the observations, log p(x) (Jordan et
al., 1999; Hoffman et al., 2013).
67UPDATING: VARIATIONAL INFERENCE
The simplest variational family of distributions is the mean field family, where each latent
variable is independent and governed by its own variational parameter. Thus, the traditional
coordinate ascent algorithm for fitting the variational parameters is used.
The variational family of the proposed framework is a fully factorized (mean-field)
distribution over the latent variables.
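As a toy illustration of why Gamma variables pair well with Poisson likelihoods in coordinate-ascent updates: for a single Poisson rate, the Gamma update is available in closed form because the prior is conjugate. The values below are illustrative:

```python
import numpy as np

# Conjugate Gamma-Poisson update:
#   prior      lambda ~ Gamma(a, b)        (shape a, rate b)
#   data       x_1..x_n ~ Poisson(lambda)
#   posterior  lambda ~ Gamma(a + sum(x), b + n)
a, b = 1.0, 1.0
x = np.array([3, 5, 4, 6, 2])

a_post = a + x.sum()          # 1 + 20 = 21
b_post = b + len(x)           # 1 + 5  = 6
print(a_post / b_post)        # 3.5, the posterior mean of lambda
```

HNBF's coordinate-ascent VI applies this kind of closed-form Gamma update to each latent variable in turn, holding the others fixed.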
EXPERIMENTS
69DATASETS AND COMPETING METHODS
HPF Hierarchical Poisson Factorization
(Gopalan et al., 2015)
PRPF Personalized Ranking on Poisson Factorization
(Kuo et al., 2018)
CCPF Coupled Compound Poisson Factorization
(Basbug & Engelhardt, 2017)
NBF Negative Binomial Factorization
(Gouvert et al., 2018)
BPR Bayesian Personalized Ranking
(Rendle et al., 2009)
ExpoMF Exposure Matrix Factorization
(Liang et al., 2016)
70CONVERGENCE
HPF converges fast since it does not consider data dispersion.
NBF converges slightly slower than HPF because of the augmented variables for data
dispersion.
Since HNBF comprises a Bayesian structure for data dispersion, the latent variables for
dispersion vary freely. Thus, HNBF converges more slowly than NBF. Even so, HNBF still
converges efficiently owing to the low updating cost per epoch.
71COMPUTING TIME
Since HPF only considers nonzero entries during the updating of latent variables, the updating
cost is the least. HNBF has augmented variables for estimating data dispersion, so that the
updating is slightly slower. Notice that HNBF runs much faster than NBF even though HNBF is
more sophisticated than NBF.
72RESULTS ON IMPLICIT COUNT DATA
Last.fm1K    Prec@5           Prec@10          Rec@5            Rec@10
HPF          34.8% ± 1.16%    32.2% ± 0.74%    1.41% ± 0.04%    2.54% ± 0.07%
PRPF         44.1% ± 1.2%     39.6% ± 0.9%     1.79% ± 0.1%     3.11% ± 0.1%
CCPF-HPF     33.54% ± 1.38%   30.91% ± 0.84%   1.35% ± 0.06%    2.41% ± 0.05%
NBF          37.29% ± 1.55%   34.08% ± 0.93%   1.51% ± 0.08%    2.71% ± 0.08%
FastHNBF     47.9% ± 0.50%    42.9% ± 0.43%    1.81% ± 0.03%    3.12% ± 0.04%
BPR          31.3% ± 1.42%    28.6% ± 1.17%    1.21% ± 0.04%    2.12% ± 0.05%
ExpoMF       55.1% ± 0.28%    50.3% ± 0.18%    2.20% ± 0.03%    3.87% ± 0.04%
73RESULTS ON IMPLICIT COUNT DATA
Last.fm2K    Prec@5           Prec@10          Rec@5            Rec@10
HPF          14.8% ± 0.65%    11.7% ± 0.41%    6.85% ± 0.30%    10.72% ± 0.37%
PRPF         19.9% ± 0.45%    15.0% ± 0.29%    9.24% ± 0.20%    13.76% ± 0.26%
CCPF-HPF     14.5% ± 0.69%    11.5% ± 0.45%    6.70% ± 0.33%    10.60% ± 0.44%
NBF          16.06% ± 0.51%   12.42% ± 0.23%   7.43% ± 0.24%    11.45% ± 0.21%
FastHNBF     20.8% ± 0.34%    15.5% ± 0.18%    9.63% ± 0.17%    14.25% ± 0.17%
BPR          19.5% ± 1.42%    14.2% ± 1.17%    9.04% ± 0.04%    13.11% ± 0.05%
ExpoMF       24.0% ± 0.28%    17.7% ± 0.18%    11.14% ± 0.03%   16.20% ± 0.04%
74RESULTS ON IMPLICIT COUNT DATA
Last.fm360K  Prec@5           Prec@10          Rec@5            Rec@10
HPF          9.20% ± 0.24%    7.58% ± 0.17%    4.29% ± 0.11%    7.05% ± 0.16%
PRPF         9.77% ± 0.17%    7.93% ± 0.15%    4.52% ± 0.11%    7.33% ± 0.15%
CCPF-HPF     9.25% ± 0.14%    7.64% ± 0.10%    4.31% ± 0.07%    7.11% ± 0.09%
NBF          n/a              n/a              n/a              n/a
FastHNBF     10.08% ± 0.15%   8.21% ± 0.10%    4.70% ± 0.07%    7.64% ± 0.10%
BPR          n/a              n/a              n/a              n/a
ExpoMF       n/a              n/a              n/a              n/a
75RESULTS ON IMPLICIT COUNT DATA
MovieLens100K  Prec@5           Prec@10          Rec@5            Rec@10
HPF            40.14% ± 0.78%   34.48% ± 0.43%   12.28% ± 0.39%   20.27% ± 0.33%
PRPF           39.18% ± 1.15%   33.24% ± 1.26%   12.26% ± 0.34%   19.67% ± 0.64%
CCPF-HPF       39.91% ± 0.58%   34.23% ± 0.50%   12.00% ± 0.30%   19.91% ± 0.47%
NBF            40.31% ± 0.51%   34.58% ± 0.37%   12.24% ± 0.21%   20.45% ± 0.37%
FastHNBF       40.93% ± 0.34%   34.84% ± 0.26%   12.46% ± 0.20%   20.41% ± 0.28%
BPR            34.28% ± 0.82%   29.45% ± 0.67%   11.47% ± 0.28%   18.81% ± 0.46%
ExpoMF         44.05% ± 0.15%   37.13% ± 0.14%   14.02% ± 0.05%   22.67% ± 0.18%
76RESULTS ON IMPLICIT COUNT DATA
MovieLens1M  Prec@5           Prec@10          Rec@5            Rec@10
HPF          36.14% ± 0.57%   31.43% ± 0.42%   7.56% ± 0.26%    12.64% ± 0.38%
PRPF         35.56% ± 0.43%   30.93% ± 0.36%   7.35% ± 0.17%    12.28% ± 0.27%
CCPF-HPF     35.48% ± 0.37%   30.98% ± 0.34%   7.33% ± 0.20%    12.35% ± 0.33%
NBF          35.72% ± 0.40%   31.04% ± 0.35%   7.53% ± 0.17%    12.60% ± 0.24%
FastHNBF     36.19% ± 0.20%   31.52% ± 0.20%   7.58% ± 0.08%    12.67% ± 0.18%
BPR          26.85% ± 0.47%   23.02% ± 0.41%   5.86% ± 0.13%    9.78% ± 0.22%
ExpoMF       40.50% ± 0.05%   35.13% ± 0.05%   9.06% ± 0.03%    14.90% ± 0.03%
77EFFICIENT OR EFFECTIVE?
Methods for recommendation fall into two types: efficient models and effective models.
The two types should be considered separately.
Time (seconds): HPF 0.21, HNBF 0.34, NBF 0.97, PRPF 5.54, CCPF 0.30, ExpoMF 44.04
78THE ARCHITECTURE OF RECSYS
ExpoMF
Source:	https://medium.com/netflix-techblog/system-architectures-for-
personalization-and-recommendation-e081aa94b5d8
FastHNBF
THANK YOU
https://github.com/iankuoli/Test_Julia

More Related Content

PDF
Introduction to R Short course Fall 2016
PDF
VSSML17 L3. Clusters and Anomaly Detection
PDF
BSSML16 L4. Association Discovery and Topic Modeling
PDF
Building Data Pipelines for Music Recommendations at Spotify
PDF
Music Recommendations at Scale with Spark
PDF
CF Models for Music Recommendations At Spotify
PDF
Product Recommendation System​ By Using Collaborative Filtering and Network B...
PDF
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
Introduction to R Short course Fall 2016
VSSML17 L3. Clusters and Anomaly Detection
BSSML16 L4. Association Discovery and Topic Modeling
Building Data Pipelines for Music Recommendations at Spotify
Music Recommendations at Scale with Spark
CF Models for Music Recommendations At Spotify
Product Recommendation System​ By Using Collaborative Filtering and Network B...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...

Similar to Matrix Factorizations for Recommender Systems on Implicit Data (20)

PDF
Matrix Factorization Techniques For Recommender Systems
PPTX
Lessons learnt at building recommendation services at industry scale
PDF
Introduction to behavior based recommendation system
PDF
Scala Data Pipelines for Music Recommendations
PPTX
Recommender Systems: Advances in Collaborative Filtering
PPTX
Recommendation system
PDF
Algorithmic Music Recommendations at Spotify
PDF
Recsys matrix-factorizations
PDF
Recommendation System --Theory and Practice
PPTX
Utilizing Marginal Net Utility for Recommendation in E-commerce
PDF
Netflix Recommendations - Beyond the 5 Stars
PDF
Advances In Collaborative Filtering
PPTX
Rokach-GomaxSlides (1).pptx
PPTX
Rokach-GomaxSlides.pptx
PDF
IntroductionRecommenderSystems_Petroni.pdf
PDF
Factorization Machines and Applications in Recommender Systems
PDF
ML+Hadoop at NYC Predictive Analytics
PDF
Recommender Systems
PDF
Machine learning @ Spotify - Madison Big Data Meetup
PDF
The Factorization Machines algorithm for building recommendation system - Paw...
Matrix Factorization Techniques For Recommender Systems
Lessons learnt at building recommendation services at industry scale
Introduction to behavior based recommendation system
Scala Data Pipelines for Music Recommendations
Recommender Systems: Advances in Collaborative Filtering
Recommendation system
Algorithmic Music Recommendations at Spotify
Recsys matrix-factorizations
Recommendation System --Theory and Practice
Utilizing Marginal Net Utility for Recommendation in E-commerce
Netflix Recommendations - Beyond the 5 Stars
Advances In Collaborative Filtering
Rokach-GomaxSlides (1).pptx
Rokach-GomaxSlides.pptx
IntroductionRecommenderSystems_Petroni.pdf
Factorization Machines and Applications in Recommender Systems
ML+Hadoop at NYC Predictive Analytics
Recommender Systems
Machine learning @ Spotify - Madison Big Data Meetup
The Factorization Machines algorithm for building recommendation system - Paw...
Ad

Recently uploaded (20)

PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPTX
oil_refinery_comprehensive_20250804084928 (1).pptx
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
PDF
annual-report-2024-2025 original latest.
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PPTX
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPT
Quality review (1)_presentation of this 21
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PPTX
Computer network topology notes for revision
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PDF
Introduction to the R Programming Language
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
Database Infoormation System (DBIS).pptx
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
oil_refinery_comprehensive_20250804084928 (1).pptx
Business Ppt On Nestle.pptx huunnnhhgfvu
annual-report-2024-2025 original latest.
Introduction to Knowledge Engineering Part 1
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
IBA_Chapter_11_Slides_Final_Accessible.pptx
Miokarditis (Inflamasi pada Otot Jantung)
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
Quality review (1)_presentation of this 21
Introduction-to-Cloud-ComputingFinal.pptx
Computer network topology notes for revision
Qualitative Qantitative and Mixed Methods.pptx
Introduction to the R Programming Language
Acceptance and paychological effects of mandatory extra coach I classes.pptx
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
IB Computer Science - Internal Assessment.pptx
Database Infoormation System (DBIS).pptx
Ad

Matrix Factorizations for Recommender Systems on Implicit Data

  • 1. Matrix Factorizations for Recommender Systems on Implicit Data Li-Yen Kuo and Ming-Syan Chen National Taiwan University.
  • 2. 郭立言 Kuo, Li-Yen NetDB, EE Dept, National Taiwan University Python, JuliaLang, Matlab Unsupervised Learning, Recommender Systems, Bayesian Graphs, GANs, Adversarial Training
  • 5. 5 The Internet changes business model and user behaviors. Clayton Christensen, a Harvard professor, said that Blockbuster’s ignorance and laziness was its own undoing. Netflix began serving a ‘niche market’ and slowly began to take over Blockbuster’s entire market [1]. THE INTERNET CHANGES OUR LIFE [1] https://medium.com/@ScAshwin/the-rise-of-netflix-and-the-fall-of- 3blockbuster-29e5457339b7 Source: https://www.dailymail.co.uk/sciencetech/article- 5301869/Website-Flixable-makes-easier-browse-Netflix.html Source: https://techcrunch.com/2018/07/13/theres-now-just- one-blockbuster-remaining-in-the-u-s/
  • 6. 6 Recommender systems are widely used in many popular commercial on-line systems. HOW TO PROVIDE RECOMMENDATION? Editorial Systems hand-curated Global RecSys simple statistics popularity Personalized RecSys tailored to Individuals, e.g., Amazon and Netflix
  • 7. 7 小編很辛苦...QQ Editorial Systems hand-curated Global RecSys simple statistics popularity Personalized RecSys tailored to Individuals, e.g., Amazon and Netflix Recommender systems are widely used in many popular commercial on-line systems. HOW TO PROVIDE RECOMMENDATION?
  • 8. 8 Editorial Systems hand-curated Global RecSys simple statistics popularity Personalized RecSys tailored to Individuals, e.g., Amazon and Netflix Popular / latest items may attract most of people. But NOT EVERYONE! (80/20 rule) Recommender systems are widely used in many popular commercial on-line systems. HOW TO PROVIDE RECOMMENDATION?
  • 9. 9 Editorial Systems hand-curated Global RecSys simple statistics popularity Personalized RecSys tailored to Individuals, e.g., Amazon and Netflix It can explore the tastes of the rest 20%. But it difficult. Recommender systems are widely used in many popular commercial on-line systems. HOW TO PROVIDE RECOMMENDATION?
  • 10. 10 Over 75% of what people watch comes from a recommendation. EVERYTHING IS PERSONALIZED Ranking Row: category
  • 11. 11 Ariana Grande Lady Gaga Coldplay Miles Davis User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 L. Beethoven Item 5 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 User 2 User 3 User 4 User-item relationship can be represented by a utility matrix. The meaning of an entry depends on the scenario. DEFINITION OF UTILITY MATRIX
  • 12. 12TWO TYPES OF UTILITY MATRIX utility matrix explicit implicit binary count rating
  • 13. 13 In rating systems, such as MovieLens [1] and Allmusic [2], the value of an entry denotes the rating of the item given by the user. UTILITY MATRIX ON EXPLICIT RATINGS Ariana Grande Lady Gaga Coldplay Miles Davis User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 L. Beethoven Item 5 3 4 5 4 3 4 1 3 5 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 3 4 ? ? ? User 2 ? 5 4 ? 3 User 3 ? ? 4 1 ? User 4 ? ? ? 3 5 [1] https://movielens.org/ [2] https://www.allmusic.com/
  • 14. 14 Considering personal offset is important. Ratings can explicitly reflect the preference of an individual. Users always give ratings on what they are familiar with. UTILITY MATRIX ON EXPLICIT RATINGS Ariana Grande Lady Gaga Coldplay Miles Davis User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 L. Beethoven Item 5 3 4 5 4 3 4 1 3 5 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 3 4 ? ? ? User 2 ? 5 4 ? 3 User 3 ? ? 4 1 ? User 4 ? ? ? 3 5 [1] https://movielens.org/ [2] https://www.allmusic.com/
  • 15. 15 Ariana Grande Lady Gaga Coldplay Miles Davis User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 L. Beethoven Item 5 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 1 1 ? ? ? User 2 ? 1 1 ? 1 User 3 ? ? 1 1 ? User 4 ? ? ? 1 1 For instance, in a music podcast service, the value of an entry may denote the subscription. UTILITY MATRIX ON IMPLICIT BINARY DATA
  • 16. 16UTILITY MATRIX ON IMPLICIT COUNT DATA Ariana Grande Lady Gaga Coldplay Miles Davis User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 L. Beethoven Item 5 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 ? ? ? User 2 ? 33 17 ? 10 User 3 ? ? 5116 72 ? User 4 ? ? ? 6 3942 Or it may denote the play count. 120 242 33 17 10 5116 72 6 3942
  • 17. 17EXPLICIT OR IMPLICIT? Lady Gaga Coldplay L. Beethoven, User 2, Items 2/3/5: explicit ratings (5, 4, 3) vs. implicit counts (33, 17, 10). Explicit Feedback: has negative feedback (the scale runs undesirable – neutral – desirable); little noise; reflects preference. Implicit Feedback: no negative feedback (values start at 0 and are positive and incremental); comes with noise; reflects confidence rather than preference.
  • 18. 18GOAL OF RECSYS Ariana Grande Lady Gaga Coldplay Miles Davis User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 L. Beethoven Item 5 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 0 0 0 User 2 0 33 17 0 10 User 3 0 0 5116 72 ? User 4 0 0 0 6 3942 A principal goal in recommender systems is to retrieve unconsumed items that the target user would likely consume in the future.
  • 20. 20MATRIX FACTORIZATION FOR RECSYS The most well-known method for recommendation is matrix factorization (MF). MF is a class of collaborative filtering (CF) algorithms. Rec Sys CF MF
  • 21. 21 Why do we need MF? Let’s see an example below. Ariana Grande Lady Gaga Sheeran Ed Coldplay User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 Linkin Park Item 5 The 1975 Item 6 MATRIX FACTORIZATION FOR RECSYS Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 User 1 User 2 User 3 User 4
  • 22. 22 Rock Pop Music The fact that similar users consume similar items can be represented by a latent factor. Ariana Grande Lady Gaga Sheeran Ed Coldplay User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 Linkin Park Item 5 The 1975 Item 6 Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 User 1 User 2 User 3 User 4 MATRIX FACTORIZATION FOR RECSYS
  • 23. 23 Rock Pop Music A user would likely consume an item involving the same latent factor with her. Ariana Grande Lady Gaga Sheeran Ed Coldplay User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 Linkin Park Item 5 The 1975 Item 6 MATRIX FACTORIZATION FOR RECSYS Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 User 1 User 2 User 3 User 4
  • 24. 24 When using two low-rank matrices 𝜽 and 𝜷 to regenerate utility matrix 𝐗, the optimizer will preserve the information embedded in 𝐗 as much as possible. Hence, MF factorizes a utility matrix to acquire latent factors. ≈ × 𝐗 𝜽 𝜷 MATRIX FACTORIZATION FOR RECSYS
  • 25. 25 When using two low-rank matrices θ and β to regenerate utility matrix X, the optimizer will preserve the information embedded in X as much as possible. Hence, MF factorizes a utility matrix to acquire latent factors. X ≈ θ × β MATRIX FACTORIZATION FOR RECSYS x_11 ≈ θ_1ᵀβ_1 ⇒ p(x_11 | θ_1ᵀβ_1)
  • 26. 26 MATRIX FACTORIZATION FOR RECSYS x_12 ≈ θ_1ᵀβ_2 ⇒ p(x_12 | θ_1ᵀβ_2)
  • 27. 27 MATRIX FACTORIZATION FOR RECSYS x_21 ≈ θ_2ᵀβ_1 ⇒ p(x_21 | θ_2ᵀβ_1)
  • 28. 28 MATRIX FACTORIZATION FOR RECSYS ∀(u, i), x_ui ≈ θ_uᵀβ_i ⇒ ∏_{u,i} p(x_ui | θ_uᵀβ_i) ⇒ ∑_{u,i} log p(x_ui | θ_uᵀβ_i)
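The maximum-likelihood view on this slide can be sketched in a few lines of numpy (a toy illustration, not the deck's method): assuming a Gaussian p(x_ui | θ_uᵀβ_i), maximizing ∑ log p reduces to minimizing squared error on observed entries, here by full-batch gradient descent. The matrix, rank, and hyperparameters are made up.

```python
import numpy as np

# Toy MF sketch (illustrative only): with a Gaussian likelihood
# p(x_ui | theta_u . beta_i), maximizing sum_{u,i} log p is equivalent to
# minimizing squared error on the observed entries of X.
rng = np.random.default_rng(0)
X = np.array([[5., 4., 0., 0.],
              [4., 5., 0., 0.],
              [0., 0., 4., 5.]])
mask = X > 0                                   # observed entries only
K, lr, lam = 2, 0.01, 0.01                     # rank, step size, L2 weight
theta = 0.1 * rng.standard_normal((X.shape[0], K))
beta = 0.1 * rng.standard_normal((X.shape[1], K))

for _ in range(2000):
    err = mask * (X - theta @ beta.T)          # residual on observed entries
    theta += lr * (err @ beta - lam * theta)   # gradient ascent on log-lik
    beta += lr * (err.T @ theta - lam * beta)

recon = theta @ beta.T                         # predictions for all entries
```

The unobserved entries of `recon` are the model's guesses; with a rating-style likelihood they would be ranked to produce recommendations.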
  • 29. 29MATRIX FACTORIZATION FOR RECSYS User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 × = Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 User 1 User 2 User 3 User 4 represent approximate Rock Pop Music Ariana Grande Lady Gaga Sheeran Ed Coldplay User 1 User 2 User 3 User 4 Item 1 Item 2 Item 3 Item 4 Linkin Park Item 5 The 1975 Item 6
  • 32. 32 MODELING IMPLICIT COUNT DATA A machine learning method can be roughly represented by a 3-tier structure: objective function, model, and data. Supervised: p(y | f(x)); model f(·); data (x, y). Unsupervised: p(x | f); model f(·); data x. MF: p(X | θβᵀ); model θβᵀ; data X. LTR: p(≻ | f(i, j)); model f(·); data i ≻ j.
  • 33. 33MODELING IMPLICIT COUNT DATA Unlike ratings, user preference is extracted implicitly from these count data. Accordingly, we make 5 assumptions as prior knowledge; based on some of them, various objectives have been proposed. 1 Most unconsumed items are undesirable for a target user. 2 Items consumed frequently are more desirable than those consumed occasionally. 3 The value of an entry is assumed to follow a Poisson distribution. 4 The value of an entry is assumed to follow a negative binomial distribution. 5 Items are exposed to the target user before she/he consumes them.
  • 34. 34MODELING IMPLICIT COUNT DATA Assumption 1.1 Assuming that most unconsumed items are undesirable for a target user, we set the unobserved entries to zero, namely, consumed 0 times. Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 0 0 0 User 2 0 33 17 0 10 User 3 0 0 5116 72 0 User 4 0 0 0 6 3942 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 ? ? ? User 2 ? 33 17 ? 10 User 3 ? ? 5116 72 ? User 4 ? ? ? 6 3942
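Assumption 1.1 amounts to a one-line preprocessing step. A sketch with NaN marking the unobserved entries (the counts are the slide's; the NaN encoding is an assumption about how the raw matrix is stored):

```python
import numpy as np

# Assumption 1.1 sketch: treat unobserved entries (NaN here) as zero counts.
X = np.array([[120., 242., np.nan, np.nan, np.nan],
              [np.nan, 33., 17., np.nan, 10.],
              [np.nan, np.nan, 5116., 72., np.nan],
              [np.nan, np.nan, np.nan, 6., 3942.]])
X_filled = np.nan_to_num(X, nan=0.0)   # "consumed 0 times"
print(X_filled)
```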
  • 35. 35MODELING IMPLICIT COUNT DATA Assumption 1.2 Assuming that most unconsumed items are undesirable for a target user, we rank consumed items higher than unconsumed ones. User 1 {Item 1, Item 2} ≻ {Item 3, Item 4, Item 5} User 2 {Item 2, Item 3, Item 5} ≻ {Item 1, Item 4} User 3 {Item 3, Item 4} ≻ {Item 1, Item 2, Item 5} User 4 {Item 4, Item 5} ≻ {Item 1, Item 2, Item 3} Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 ? ? ? User 2 ? 33 17 ? 10 User 3 ? ? 5116 72 ? User 4 ? ? ? 6 3942
  • 36. 36BAYESIAN PERSONALIZED RANKING (BPR) By using assumption 1.2, BPR (Rendle et al., 2009) treats the consumed-or-not problem as pairwise learning to rank. [Rendle et al.] Rendle, Steffen, et al. BPR: Bayesian personalized ranking from implicit feedback. AUAI Press, 2009. Objective function Model Data MF or K-NN i ≻_u j
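A hedged sketch of BPR's SGD loop over sampled triples (u, i, j) with MF as the underlying model: the update ascends the gradient of log σ(x_uij), where x_uij = θ_u·(β_i − β_j) and i is consumed while j is not. The tiny interaction data and hyperparameters below are invented for illustration.

```python
import numpy as np

# BPR-style SGD sketch (toy data): for each sampled (u, i, j) with i
# consumed and j not, ascend grad of log sigmoid(theta_u . (beta_i - beta_j)).
rng = np.random.default_rng(0)
n_users, n_items, K = 4, 6, 2
consumed = {0: {0, 1}, 1: {1, 2, 4}, 2: {2, 3}, 3: {3, 4}}
theta = 0.1 * rng.standard_normal((n_users, K))
beta = 0.1 * rng.standard_normal((n_items, K))
lr, lam = 0.05, 0.01

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    u = int(rng.integers(n_users))
    i = int(rng.choice(list(consumed[u])))                        # positive
    j = int(rng.choice([m for m in range(n_items) if m not in consumed[u]]))
    x_uij = theta[u] @ (beta[i] - beta[j])
    g = 1.0 - sigmoid(x_uij)              # d log sigmoid(x) / dx
    t_u = theta[u].copy()                 # use pre-update value below
    theta[u] += lr * (g * (beta[i] - beta[j]) - lam * theta[u])
    beta[i] += lr * (g * t_u - lam * beta[i])
    beta[j] += lr * (-g * t_u - lam * beta[j])

scores = theta @ beta.T   # after training, consumed items should rank higher
```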
  • 38. 38MODELING IMPLICIT COUNT DATA Assumption 2 Items consumed frequently are more desirable than those consumed occasionally. This can lead to regression-based and LTR objectives. User 1 {Item 2} ≻ {Item 1} User 2 {Item 2} ≻ {Item 3} ≻ {Item 5} User 3 {Item 3} ≻ {Item 4} User 4 {Item 5} ≻ {Item 4} Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 ? ? ? User 2 ? 33 17 ? 10 User 3 ? ? 5116 122 ? User 4 ? ? ? 6 3942
  • 39. 39 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 0 0 0 User 2 0 33 17 0 10 User 3 0 0 5116 122 0 User 4 0 0 0 6 3942 MODELING IMPLICIT COUNT DATA Assumption 3 Exploiting the nature of count data, one can assume the value of each entry to follow a Poisson distribution independently.
  • 40. 40HIERARCHICAL POISSON FACTORIZATION (HPF) HPF (Gopalan et al., 2015), where each entry is assumed to be a Poisson, is a widely-used MF method for recommendation on implicit count data. Objective function Model Data ∏_{u,i} p(x_ui | θ_uᵀβ_i), where p(x_ui | θ_uᵀβ_i) = Poi(x_ui; θ_uᵀβ_i) X MF with Gamma latent factors
  • 42. 42 Since PF down-weights the effect of zero entries and updates with computational cost linear in the number of nonzero entries, PF is widely used for large sparse utility matrices. HIERARCHICAL POISSON FACTORIZATION (HPF) log 0! = log 1 = 0
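The identity on this slide is exactly what makes PF scale: log Poi(0; λ) = −λ since log 0! = 0, so the full log-likelihood needs only a pass over the nonzero entries plus the term ∑_{u,i} λ_ui, which factorizes as (∑_u θ_u)·(∑_i β_i). A sketch verifying the dense and sparse computations agree, on random factors (all shapes and values assumed):

```python
import numpy as np
from scipy.special import gammaln

# PF scaling trick: log Poi(0; lambda) = -lambda, so the full Poisson
# log-likelihood = sum over NONZERO entries of [x log(lam) - log x!]
# minus sum_{u,i} lam_ui, and the latter factorizes without a dense pass.
rng = np.random.default_rng(0)
U, I, K = 50, 80, 5
theta = rng.gamma(1.0, 1.0, (U, K))
beta = rng.gamma(1.0, 1.0, (I, K))
X = rng.poisson(theta @ beta.T * 0.01)         # sparse count matrix

lam = theta @ beta.T
dense_ll = np.sum(X * np.log(lam) - lam - gammaln(X + 1))

u, i = np.nonzero(X)                            # nonzero entries only
x = X[u, i]
sparse_ll = (np.sum(x * np.log(np.sum(theta[u] * beta[i], axis=1))
                    - gammaln(x + 1))
             - theta.sum(0) @ beta.sum(0))      # factorized sum of rates
```

The sparse form touches O(nnz·K + (U+I)·K) numbers instead of O(U·I·K).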
  • 43. 43PERSONALIZED RANKING ON PF (PRPF) In PRPF (Kuo et al., 2018), a pairwise LTR model is proposed to permute consumed items for each user according to assumption 2. [Kuo et al., 2018] Kuo, Li-Yen, et al. Personalized Ranking on Poisson Factorization. In SDM, 2018. Objective function Model Data i ≻_u j MF with Gamma latent factors User 1 {Item 2} ≻ {Item 1} User 2 {Item 2} ≻ {Item 3} ≻ {Item 5} User 3 {Item 3} ≻ {Item 4} User 4 {Item 5} ≻ {Item 4} Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 ? ? ? User 2 ? 33 17 ? 10 User 3 ? ? 5116 72 ? User 4 ? ? ? 6 3942
  • 44. 44CHALLENGES OF IMPLICIT COUNT DATA Challenge 1: data overdispersion Real-life user consuming behaviors follow power-law distributions approximately. Data are often overdispersed.
  • 45. 45CHALLENGES OF IMPLICIT COUNT DATA Challenge 1: data overdispersion Some songs are played quite frequently since one may always listen to the songs by the artists who she likes. Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 0 0 0 User 2 0 33 17 0 10 User 3 0 0 5116 122 0 User 4 0 0 0 6 3942
  • 46. 46CHALLENGES OF IMPLICIT COUNT DATA Challenge 1: data overdispersion In fact, the problem is that the Poisson distribution cannot model such data owing to its limited variance. As long as the variance the model assumes is larger than the variance of the data, the effect of outliers is reduced.
  • 47. 47CHALLENGES OF IMPLICIT COUNT DATA Challenge 2: failure exposure estimation An outlier entry in a utility matrix can be explained by a user who always plays the songs on her/his own initiative, which means that almost no failure exposures exist. Accordingly, the number of failure exposures varies with users and items since user behaviors differ. Nevertheless, most previous works only consider successful events and omit failure exposures. Failure exposure is hard to model and can merely be regarded as implicit information estimated from the relationship between the observations and the corresponding model inference. Ariana Grande User 1 Item 1 120 ??? Success exposure count Failure exposure count
  • 48. 48SCENARIO OF USER CONSUMPTION Scenario / Success exposure / Failure exposure — Observed event (nonzero entry): user u plays i on her/his own initiative: High / Very Low; user u is exposed to i and plays i: Medium-Low / Medium-Low. Unobserved event (zero entry): user u is exposed to i and does not play i: 0 / Medium-Low; user u has not been exposed to i yet: 0 / ?
  • 49. 49 Item 1 Item 2 Item 3 Item 4 Item 5 User 1 120 242 0 0 0 User 2 0 33 17 0 10 User 3 0 0 5116 122 0 User 4 0 0 0 6 3942 MODELING IMPLICIT COUNT DATA Assumption 4 To alleviate the data dispersion problem, the value of each entry is assumed to follow a negative binomial distribution independently.
  • 50. 50NEGATIVE BINOMIAL OR POISSON? Negative Binomial: x_ui ~ NB(r_ui, θ_uᵀβ_i / (r_ui + θ_uᵀβ_i)); success event count θ_uᵀβ_i; failure event count r_ui; expectation θ_uᵀβ_i; variance θ_uᵀβ_i (1 + θ_uᵀβ_i / r_ui). Poisson: x_ui ~ Poi(θ_uᵀβ_i); success event count θ_uᵀβ_i; failure event count ∞; expectation θ_uᵀβ_i; variance θ_uᵀβ_i.
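A numeric check of this comparison, hedged on scipy's NB parameterization (scipy's `nbinom(n, p)` counts n failures with success probability p, so n = r_ui and p = r / (r + μ) matches the slide's form with μ playing the role of θ_uᵀβ_i):

```python
import numpy as np
from scipy import stats

# NB vs Poisson with the same mean mu: the NB's variance exceeds mu by the
# overdispersion factor (1 + mu / r); the Poisson is stuck at variance mu.
mu, r = 3.0, 2.0                           # mu ~ theta_u . beta_i (made up)
nb = stats.nbinom(n=r, p=r / (r + mu))     # failure count r, success prob p
poi = stats.poisson(mu)

print(nb.mean(), nb.var())     # mean mu, variance mu * (1 + mu / r)
print(poi.mean(), poi.var())   # mean mu, variance mu
```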
  • 52. 52MODELING IMPLICIT COUNT DATA Assumption 5 Items are exposed to the target user before she/he consumes them. Ariana Grande User 1 Item 1 120 ??? Success exposure count Failure exposure count Ariana Grande User 1 Item 1Exposure
  • 53. 53EXPOMF In ExpoMF (Liang et al., 2016), a probabilistic approach directly incorporates user exposure to items into collaborative filtering. [Liang et al., 2016] Liang, Dawen, et al. Modeling User Exposure in Recommendation. In WWW, 2016. Objective function Model Data Y MF incorporating Bernoulli exposure
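A sketch of the kind of exposure E-step ExpoMF uses, with invented numbers: exposure a_ui ~ Bernoulli(μ_ui), a Gaussian likelihood on the (zero-filled) entry given exposure, and a closed-form posterior exposure probability for zero entries. The prior μ, precision λ, and predicted score here are assumptions for illustration, not the paper's values.

```python
import numpy as np
from scipy import stats

# Exposure E-step sketch: for an unclicked entry y_ui = 0,
#   p(a_ui = 1 | y_ui = 0) = mu * N(0 | theta.beta, 1/lam)
#                            / (mu * N(0 | theta.beta, 1/lam) + 1 - mu).
mu, lam = 0.3, 10.0          # prior exposure prob, Gaussian precision (assumed)
pred = 0.8                   # theta_u . beta_i for this (u, i) pair (assumed)
lik0 = stats.norm(pred, 1.0 / np.sqrt(lam)).pdf(0.0)
p_exposed = mu * lik0 / (mu * lik0 + (1.0 - mu))
print(p_exposed)   # small: a high predicted score makes an unclicked
                   # item look unexposed, down-weighting that zero entry
```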
  • 56. 56 PRELIMINARY: NEGATIVE BINOMIAL DISTRIBUTION Compared with the Poisson distribution, where the mean equals the variance, NB is better suited to data with larger variance. In NBF, entry x_ui is sampled from a generative process with success exposure count θ_uᵀβ_i, failure exposure count r_ui, and success probability θ_uᵀβ_i / (r_ui + θ_uᵀβ_i); the mean and the variance of NB in Eq. (1) follow accordingly. Ariana Grande User 1 Item 1
  • 57. 57 PRELIMINARY: POISSON-GAMMA MIXTURE The NB can also be viewed as a Poisson-gamma mixture (Lawless, 1987; Gardner et al., 1995): x_ui | d_ui ~ Poisson(d_ui · θ_uᵀβ_i) with d_ui ~ Gamma(r_ui, r_ui), where the latent variable d_ui denotes the dispersion of x_ui; marginalizing d_ui recovers the equivalent NB.
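The equivalence can be checked by Monte Carlo: drawing d_ui ~ Gamma(r, r) (mean 1, variance 1/r) and then x_ui ~ Poisson(d_ui · μ), with μ standing in for θ_uᵀβ_i, should reproduce the NB mean μ and variance μ(1 + μ/r) from the earlier table. The values of μ and r are arbitrary.

```python
import numpy as np

# Poisson-gamma mixture vs its marginal NB, checked by sampling.
rng = np.random.default_rng(0)
mu, r, n = 3.0, 2.0, 200_000
d = rng.gamma(shape=r, scale=1.0 / r, size=n)   # dispersion: mean 1, var 1/r
x = rng.poisson(d * mu)                          # mixed Poisson draws

print(x.mean(), x.var())   # around mu = 3.0 and mu * (1 + mu / r) = 7.5
```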
  • 58. 58 PRELIMINARY: POISSON-GAMMA MIXTURE Poisson-gamma mixture Since the variance of d_ui is r_ui⁻¹, the smaller r_ui is, the larger the variance of d_ui. When r_ui is fixed as a constant, the variety of d_ui is limited. x_ui ~ Poisson; d_ui ~ Gamma(r_ui, r_ui); factors θ_u, β_i.
  • 59. 59 PROPOSED: HIERARCHICAL POISSON-GAMMA MIXTURE Since the variance of d_ui is r_ui⁻¹, the smaller r_ui is, the larger the variance of d_ui. When r_ui is fixed as a constant, the variety of d_ui is limited. To tackle this, we let r_ui be a gamma variable according to the conjugate-prior relationship: x_ui | d_ui ~ Poisson, d_ui ~ Gamma(r_ui, r_ui), r_ui ~ Gamma(g, h).
  • 60. 060 Hierarchical Negative Binomial Factorization (HNBF) [plate-notation graphical model: user factors θ_u, item factors β_i, entry-wise dispersion d_ui with gamma hyperprior r_ui ~ Gamma(g, h)]
  • 63. 063 Hierarchical Negative Binomial Factorization (HNBF): updating an entry-wise dispersion variable for every (u, i) is too slow!
  • 64. 064 FastHNBF [plate-notation graphical model: the entry-wise dispersion of HNBF is restructured, with separate plates over the nonzero and zero entries]
  • 66. 66UPDATING: VARIATIONAL INFERENCE Since the variables are tangled, we use variational inference to approximate the posterior. Variational inference (VI) approximates posterior distributions, indexed by free variational parameters, by maximizing the evidence lower bound (ELBO), a lower bound on the logarithm of the marginal probability of the observations log p(x) (Jordan et al., 1999; Hoffman et al., 2013).
  • 67. 67UPDATING: VARIATIONAL INFERENCE The simplest variational family of distributions is the mean-field family, where each latent variable is independent and governed by its own variational parameter. Thus, the traditional coordinate ascent algorithm is used for fitting the variational parameters. The variational family of the proposed framework is the mean-field product of per-variable distributions.
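A hedged sketch of mean-field coordinate ascent (CAVI) for plain Poisson factorization with Gamma priors, the building block of the models here (HNBF adds the dispersion hierarchy on top; all shapes and hyperparameters below are made up). Each latent factor gets its own independent Gamma variational distribution, updated in turn given the others.

```python
import numpy as np
from scipy.special import digamma

# Mean-field CAVI sketch for Poisson MF: q = prod_u,k q(theta_uk) *
# prod_i,k q(beta_ik), each a Gamma with its own shape/rate parameters.
rng = np.random.default_rng(0)
U, I, K, a, b = 20, 30, 3, 0.3, 1.0
X = rng.poisson(2.0, (U, I)) * (rng.random((U, I)) < 0.2)   # sparse counts

t_shp = a + rng.random((U, K)); t_rte = np.full((U, K), b)  # q(theta_uk)
b_shp = a + rng.random((I, K)); b_rte = np.full((I, K), b)  # q(beta_ik)

for _ in range(50):
    # auxiliary (multinomial) step: split each count x_ui across K factors
    log_t = digamma(t_shp) - np.log(t_rte)
    log_b = digamma(b_shp) - np.log(b_rte)
    phi = np.exp(log_t[:, None, :] + log_b[None, :, :])     # (U, I, K)
    phi /= phi.sum(-1, keepdims=True)
    xphi = X[:, :, None] * phi
    # Gamma coordinate updates: each q refitted given the current others
    t_shp = a + xphi.sum(1)
    t_rte = b + (b_shp / b_rte).sum(0)      # + sum_i E[beta_ik]
    b_shp = a + xphi.sum(0)
    b_rte = b + (t_shp / t_rte).sum(0)      # + sum_u E[theta_uk]

E_lambda = (t_shp / t_rte) @ (b_shp / b_rte).T   # posterior-mean rates
```

This dense toy loops over all entries; the scalable variants in the talk restrict the multinomial step to nonzero entries, as in the PF scaling trick earlier.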
  • 69. 69DATASETS AND COMPETING METHODS HPF Hierarchical Poisson Factorization (Gopalan et al., 2015) PRPF Personalized Ranking on Poisson Factorization (Kuo et al., 2018) CCPF Coupled Compound Poisson Factorization (Basbug & Engelhardt, 2017) NBF Negative Binomial Factorization (Gouvert et al., 2018) BPR Bayesian Personalized Ranking (Rendle et al., 2009) ExpoMF Exposure Matrix Factorization (Liang et al., 2016)
  • 70. 70CONVERGENCE HPF converges fast since it does not consider data dispersion. NBF converges slightly more slowly than HPF because of the augmented variables for data dispersion. Since HNBF comprises a Bayesian structure for data dispersion, the latent variables for dispersion vary freely; thus, HNBF converges more slowly than NBF. Even so, HNBF still converges efficiently owing to its low updating cost per epoch.
  • 71. 71COMPUTING TIME Since HPF only considers nonzero entries when updating latent variables, its updating cost is the least. HNBF has augmented variables for estimating data dispersion, so its updating is slightly slower. Notice that HNBF runs much faster than NBF even though HNBF is more sophisticated than NBF.
  • 72. 72RESULTS ON IMPLICIT COUNT DATA Last.fm1K Prec@5 Prec@10 Rec@5 Rec@10 HPF 34.8% 32.2% 1.41% 2.54% 1.16% 0.74% 0.04% 0.07% PRPF 44.1% 39.6% 1.79% 3.11% 1.2% 0.9% 0.1% 0.1% CCPF-HPF 33.54% 30.91% 1.35% 2.41% 1.38% 0.84% 0.06% 0.05% NBF 37.29% 34.08% 1.51% 2.71% 1.55% 0.93% 0.08% 0.08% FastHNBF 47.9% 42.9% 1.81% 3.12% 0.50% 0.43% 0.03% 0.04% BPR 31.3% 28.6% 1.21% 2.12% 1.42% 1.17% 0.04% 0.05% ExpoMF 55.1% 50.3% 2.20% 3.87% 0.28% 0.18% 0.03% 0.04%
  • 73. 73RESULTS ON IMPLICIT COUNT DATA Last.fm2K Prec@5 Prec@10 Rec@5 Rec@10 HPF 14.8% 11.7% 6.85% 10.72% 0.65% 0.41% 0.30% 0.37% PRPF 19.9% 15.0% 9.24% 13.76% 0.45% 0.29% 0.20% 0.26% CCPF-HPF 14.5% 11.5% 6.70% 10.60% 0.69% 0.45% 0.33% 0.44% NBF 16.06% 12.42% 7.43% 11.45% 0.51% 0.23% 0.24% 0.21% FastHNBF 20.8% 15.5% 9.63% 14.25% 0.34% 0.18% 0.17% 0.17% BPR 19.5% 14.2% 9.04% 13.11% 1.42% 1.17% 0.04% 0.05% ExpoMF 24.0% 17.7% 11.14% 16.20% 0.28% 0.18% 0.03% 0.04%
  • 74. 74RESULTS ON IMPLICIT COUNT DATA Last.fm360K Prec@5 Prec@10 Rec@5 Rec@10 HPF 9.20% 7.58% 4.29% 7.05% 0.24% 0.17% 0.11% 0.16% PRPF 9.77% 7.93% 4.52% 7.33% 0.17% 0.15% 0.11% 0.15% CCPF-HPF 9.25% 7.64% 4.31% 7.11% 0.14% 0.10% 0.07% 0.09% NBF n/a n/a n/a n/a n/a n/a n/a n/a FastHNBF 10.08% 8.21% 4.70% 7.64% 0.15% 0.10% 0.07% 0.10% BPR n/a n/a n/a n/a n/a n/a n/a n/a ExpoMF n/a n/a n/a n/a n/a n/a n/a n/a
  • 75. 75RESULTS ON IMPLICIT COUNT DATA MovieLens100K Prec@5 Prec@10 Rec@5 Rec@10 HPF 40.14% 34.48% 12.28% 20.27% 0.78% 0.43% 0.39% 0.33% PRPF 39.18% 33.24% 12.26% 19.67% 1.15% 1.26% 0.34% 0.64% CCPF-HPF 39.91% 34.23% 12.00% 19.91% 0.58% 0.50% 0.30% 0.47% NBF 40.31% 34.58% 12.24% 20.45% 0.51% 0.37% 0.21% 0.37% FastHNBF 40.93% 34.84% 12.46% 20.41% 0.34% 0.26% 0.20% 0.28% BPR 34.28% 29.45% 11.47% 18.81% 0.82% 0.67% 0.28% 0.46% ExpoMF 44.05% 37.13% 14.02% 22.67% 0.15% 0.14% 0.05% 0.18%
  • 76. 76RESULTS ON IMPLICIT COUNT DATA MovieLens1M Prec@5 Prec@10 Rec@5 Rec@10 HPF 36.14% 31.43% 7.56% 12.64% 0.57% 0.42% 0.26% 0.38% PRPF 35.56% 30.93% 7.35% 12.28% 0.43% 0.36% 0.17% 0.27% CCPF-HPF 35.48% 30.98% 7.33% 12.35% 0.37% 0.34% 0.20% 0.33% NBF 35.72% 31.04% 7.53% 12.60% 0.40% 0.35% 0.17% 0.24% FastHNBF 36.19% 31.52% 7.58% 12.67% 0.20% 0.20% 0.08% 0.18% BPR 26.85% 23.02% 5.86% 9.78% 0.47% 0.41% 0.13% 0.22% ExpoMF 40.50% 35.13% 9.06% 14.90% 0.05% 0.05% 0.03% 0.03%
  • 77. 77EFFICIENT OR EFFECTIVE? Methods for recommendation fall into two types, efficient models and effective models, and the two types should be compared separately. Time per epoch (seconds, log scale): HPF 0.21, HNBF 0.34, NBF 0.97, PRPF 5.54, CCPF 0.30, ExpoMF 44.04.
  • 78. 78THE ARCHITECTURE OF RECSYS ExpoMF Source: https://medium.com/netflix-techblog/system-architectures-for- personalization-and-recommendation-e081aa94b5d8 FastHNBF