Local Collaborative Autoencoders
Minjin Choi1, Yoonki Jeong1, Joonseok Lee2, Jongwuk Lee1
Sungkyunkwan University (SKKU), South Korea1
Google Research, United States2
Motivation
Global Low-rank Assumption
➢Existing models are based on the global low-rank assumption.
• All users and items share the same latent features.
➢Limitation: some users/items may have different latent features.
[Figure: a binary user-item rating matrix approximated by the product of a user matrix and an item matrix that share the same k latent features.]
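To make the assumption concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code; the toy matrix, the choice of k, and the SVD-based factorization are our assumptions) of a global rank-k approximation, where a single pair of factor matrices is shared by all users and items:

```python
import numpy as np

# Global low-rank assumption: one shared pair of rank-k factors
# approximates the whole user-item matrix.
rng = np.random.default_rng(0)
R = (rng.random((5, 5)) > 0.5).astype(float)   # toy binary user-item matrix

k = 2                                          # shared latent features
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]         # best rank-k approximation

print(np.round(R_hat, 2))                      # dense score predictions
```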
Local Low-rank Assumption
➢Under the local low-rank assumption, a user-item matrix can be
divided into several sub-matrices.
• Each sub-matrix represents different communities.
• Local models represent various communities with different characteristics.
[Figure: overlapping sub-matrices of the user-item matrix, each corresponding to a different community.]
Limitation of Existing Local Models
➢If the local model is too large, it is close to the global model.
• Because LLORMA uses large local models, each local model may fail to
capture the unique characteristics of its community.
• The performance gain may come from an ensemble effect.
[Figure: two large, heavily overlapping sub-matrices that each cover most of the user-item matrix.]
Lee et al., "LLORMA: Local Low-Rank Matrix Approximation," JMLR 2016
Limitation of Existing Local Models
➢If the local model is too small, its accuracy suffers.
• Because sGLSVD uses small local models, some local models may have
insufficient training data.
[Figure: two small, disjoint sub-matrices, each covering only a fragment of the user-item matrix.]
Evangelia Christakopoulou and George Karypis, "Local Latent Space Models for Top-N Recommendation," KDD 2018
Research Question
How to build coherent and accurate local models?
Our Key Contributions
➢We keep each local model small and coherent, but train it with a
relatively large amount of training data.
[Figure (ML10M): accuracy comparison between local models whose training matrix equals the test matrix and local models whose training matrix is larger than the test matrix.]
Our Key Contributions
➢Autoencoder-based models are used as the base model
to train the local model.
• They are useful for capturing non-linear and complicated patterns.
[Figure: a binary user-item rating matrix fed row by row into an autoencoder.]
Sedhain et al., "AutoRec: Autoencoders Meet Collaborative Filtering," WWW 2015
[Figure: an autoencoder that encodes the input rating row r into a hidden representation h(r) with parameters W, b, and decodes it into the reconstruction r̂ with parameters W′, b′.]
Autoencoder-based models take
each row as input.
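As a rough sketch of such a base model, the following is a minimal AutoRec-style autoencoder in PyTorch (illustrative only; the hidden size, activation, and MSE loss are placeholder choices, not the paper's settings):

```python
import torch
import torch.nn as nn

# AutoRec-style autoencoder: takes one user's rating row r as input,
# encodes it as h(r) = sigmoid(W r + b), reconstructs r_hat = W' h(r) + b'.
class AutoRec(nn.Module):
    def __init__(self, n_items: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Linear(n_items, hidden)   # W, b
        self.decoder = nn.Linear(hidden, n_items)   # W', b'

    def forward(self, r: torch.Tensor) -> torch.Tensor:
        return self.decoder(torch.sigmoid(self.encoder(r)))

model = AutoRec(n_items=1000)
r = torch.rand(32, 1000)                        # a batch of user rows
loss = nn.functional.mse_loss(model(r), r)      # reconstruction loss
loss.backward()
```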
Proposed Method
Local Collaborative Autoencoders (LOCA)
➢Overall architecture of LOCA
• Step 1: Discovering two local communities for an anchor user
• Step 2: Training a local model with an expanded community
• Step 3: Combining multiple local models
[Figure: LOCA architecture. For each anchor user, a local community is discovered and expanded; each expanded community trains one of local models 1, ..., q, and the local models are combined into the final model.]
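The three steps compose as in the following high-level sketch (hypothetical glue code, not the authors' implementation; `communities` and `combine` are sketched under Steps 1 and 3 below, and `train_local_model` is an assumed helper that fits one weighted autoencoder and returns its prediction matrix):

```python
# Hypothetical LOCA pipeline glue: Step 1 finds communities per anchor,
# Step 2 trains one weighted local model each, Step 3 combines them.
def loca(R, anchors, global_pred, alpha=0.5):
    local_preds, infer_weights = [], []
    for a in anchors:
        sims, core, expanded = communities(R, a)                  # Step 1
        local_preds.append(train_local_model(R, sims, expanded))  # Step 2
        infer_weights.append(sims)           # per-user inference weights
    return combine(global_pred, local_preds, infer_weights, alpha)  # Step 3
```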
Step 1: Discovering Local Communities
➢For an anchor user, determine a local community
and expand the local community for training.
[Figure: similarities between the anchor user and all other users are computed (e.g., 1.0, 0.7, 0.6, ...); the most similar users form the neighbors for the local community, and a larger set of expanded neighbors is used to train the local model.]
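A minimal sketch of this step (assuming cosine similarity over rating rows; the paper's similarity measure and the community sizes may differ):

```python
import numpy as np

# Step 1 sketch: rank users by similarity to the anchor, then take a small
# core community and a larger expanded community for training.
def communities(R, anchor, core_size=50, expand_size=200):
    norms = np.linalg.norm(R, axis=1) + 1e-12
    sims = (R @ R[anchor]) / (norms * norms[anchor])  # cosine similarity
    order = np.argsort(-sims)                         # most similar first
    core = order[:core_size]        # neighbors defining the local community
    expanded = order[:expand_size]  # expanded neighbors used for training
    return sims, core, expanded
```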
Step 2: Training a Local Model
➢Train the local model with the expanded community.
• Use the autoencoder-based model for training the local model.
• Note: It is possible to utilize any base models, e.g., MF, AE and EASER.
• The similarities with the anchor are used for the user weights for training.
[Figure: the expanded community's rating matrix, with each user row weighted by its similarity to the anchor (1.0, 0.6, 0.3, ...) during training.]
Harald Steck, "Embarrassingly Shallow Autoencoders for Sparse Data," WWW 2019
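A sketch of Step 2's per-user weighting (our reading of the slide: each user's reconstruction loss is scaled by that user's similarity to the anchor; `model` can be any row-wise autoencoder, such as the AutoRec sketch above):

```python
import torch

# Step 2 sketch: anchor similarities act as per-user weights t_u on the
# reconstruction loss of the local autoencoder.
def weighted_loss(model, R_batch, user_weights):
    R_hat = model(R_batch)                          # (batch, n_items)
    per_user = ((R_hat - R_batch) ** 2).sum(dim=1)  # squared error per user
    return (user_weights * per_user).sum()          # similarity-weighted sum
```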
Step 3: Combining Multiple Local Models
➢Aggregate multiple local models, each weighted by its
per-user weight.
[Figure: each local model produces a predicted score matrix over its community; the predictions of local models 1, ..., q are aggregated into a single prediction matrix.]
Training Local Models in Detail
➢The loss function for the $j$-th local model:

$$\operatorname*{argmin}_{\theta^{(j)}} \; \sum_{\mathbf{r}_u \in \mathbf{R}} t_u^{(j)} \, \mathcal{L}\!\left(\mathbf{r}_u,\, M_{\mathrm{local}}(\mathbf{r}_u; \theta^{(j)})\right) + \lambda\, \Omega(\theta^{(j)})$$

➢Aggregating all local models and a global model:

$$\hat{\mathbf{R}} = \alpha\, M_{\mathrm{global}}(\mathbf{R}; \theta^{(g)}) + (1 - \alpha) \sum_{j=1}^{q} \mathbf{w}^{(j)} \odot M_{\mathrm{local}}(\mathbf{R}; \theta^{(j)}) \oslash \mathbf{w}$$

Here, $\theta^{(j)}$ denotes the parameters of the $j$-th local model $M_{\mathrm{local}}$, $t_u^{(j)}$ is the user weight for training the $j$-th local model, and $\mathbf{w}^{(j)}$ is the user weight for inferring the $j$-th local model. The global model is used to handle the users who are not covered by local models.
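A NumPy sketch of the aggregation formula (assuming, from the $\oslash\, \mathbf{w}$ term, that $\mathbf{w}$ accumulates the per-user inference weights across local models; this normalization is our interpretation, not stated explicitly on the slide):

```python
import numpy as np

# Combine a global prediction with q local predictions: each local
# prediction is scaled row-wise by its user weight w^(j) (the ⊙), and the
# weighted sum is divided elementwise by the total weight w (the ⊘).
def combine(global_pred, local_preds, user_weights, alpha=0.5):
    w_total = np.sum(user_weights, axis=0) + 1e-12        # w, per user
    local = sum(w[:, None] * P for w, P in zip(user_weights, local_preds))
    return alpha * global_pred + (1 - alpha) * local / w_total[:, None]
```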
How to Choose Anchor Users
➢Random selection: Choose 𝑘 anchor users at random.
➢Need to maximize the coverage of users by local models.
• Finding the optimal maximum coverage is NP-hard.
➢Use the greedy method to maximize the coverage.
• Select the anchor user who has the most uncovered users iteratively.
[Figure: three user groups (Group 1, Group 2, Group 3), each covered by a greedily selected anchor user.]
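A sketch of the greedy selection (the standard greedy maximum-coverage heuristic; `cover_sets[u]` is assumed to be the set of users covered when u is chosen as an anchor):

```python
# Greedy maximum coverage: iteratively pick the anchor whose community adds
# the most users that no previously chosen anchor covers.
def select_anchors(cover_sets, q):
    covered, anchors = set(), []
    for _ in range(q):
        best = max(range(len(cover_sets)),
                   key=lambda u: len(cover_sets[u] - covered))
        anchors.append(best)
        covered |= cover_sets[best]
    return anchors
```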
Experiments
Experimental Setup: Dataset
➢We evaluate our model over five public datasets
with various characteristics (e.g., domain, sparsity).
Dataset                  # of users   # of items   # of ratings   Sparsity
MovieLens 10M (ML10M)        69,878       10,677     10,000,054     98.66%
MovieLens 20M (ML20M)       138,493       26,744     20,000,263     99.46%
Amazon Music (AMusic)         4,964       11,797         97,439     99.83%
Amazon Game (AGame)          13,063       17,408        236,415     99.90%
Yelp                         25,677       25,815        731,671     99.89%
Evaluation Protocol and Metrics
➢Evaluation protocol: leave-5-out
• Hold out the last 5 interactions of each user as the test data.
➢Evaluation metrics
• Recall@100
• Measures how many of the test items are included in the top-N list.
• NDCG@100
• Measures the ranking of test items in the top-N list.
[Figure: per-user interaction timeline; earlier interactions are used as training data and the last 5 interactions as test data.]
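For reference, a minimal sketch of the two metrics for a single user (`ranked` is the model's top-N list, `test_items` the 5 held-out interactions; the exact normalization used in the paper may differ):

```python
import numpy as np

def recall_at_k(ranked, test_items, k=100):
    # fraction of held-out items that appear in the top-k list
    hits = len(set(ranked[:k]) & set(test_items))
    return hits / min(k, len(test_items))

def ndcg_at_k(ranked, test_items, k=100):
    # discounted gain for each held-out item found, normalized by the ideal
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in test_items)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(test_items))))
    return dcg / idcg
```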
Competitive Global/Local Models
➢Four autoencoder-based global models
• CDAE: a denoising autoencoder-based model with a latent user vector
• MultVAE: a VAE-based model
• RecVAE: a VAE-based model by improving MultVAE
• EASER: an item-to-item latent factor model
➢Two local models
• LLORMA: local model using MF as the base model
• sGLSVD: local model using SVD as the base model
Yao Wu et al., "Collaborative Denoising Auto-Encoders for Top-N Recommender Systems," WSDM 2016
Dawen Liang et al., "Variational Autoencoders for Collaborative Filtering," WWW 2018
Ilya Shenbin et al., "RecVAE: A New Variational Autoencoder for Top-N Recommendations with Implicit Feedback," WSDM 2020
Harald Steck, "Embarrassingly Shallow Autoencoders for Sparse Data," WWW 2019
Lee et al., "LLORMA: Local Low-Rank Matrix Approximation," JMLR 2016
Evangelia Christakopoulou and George Karypis, "Local Latent Space Models for Top-N Recommendation," KDD 2018
Accuracy: LOCA vs. Competing Models
Dataset   Metric       CDAE    MultVAE  EASER   RecVAE  LLORMA  sGLSVD  LOCA_VAE  LOCA_EASE
ML10M     Recall@100   0.4685  0.4653   0.4648  0.4705  0.4692  0.4468  0.4865    0.4798
          NDCG@100     0.1982  0.1945   0.2000  0.1996  0.2042  0.1953  0.2073    0.2049
ML20M     Recall@100   0.4324  0.4397   0.4468  0.4417  0.3355  0.4342  0.4419    0.4654
          NDCG@100     0.1844  0.1860   0.1948  0.1857  0.1446  0.1919  0.1884    0.2024
AMusic    Recall@100   0.0588  0.0681   0.0717  0.0582  0.0517  0.0515  0.0748    0.0717
          NDCG@100     0.0712  0.0822   0.0821  0.0810  0.0638  0.0613  0.0893    0.0826
AGame     Recall@100   0.1825  0.2081   0.1913  0.1920  0.1223  0.1669  0.2147    0.1947
          NDCG@100     0.0808  0.0920   0.0915  0.0849  0.0539  0.0777  0.0966    0.0922
Yelp      Recall@100   0.2094  0.2276   0.2187  0.2262  0.1013  0.1965  0.2354    0.2205
          NDCG@100     0.0920  0.0982   0.0972  0.0975  0.0429  0.0857  0.1103    0.0981
(CDAE, MultVAE, EASER, RecVAE: global models; LLORMA, sGLSVD: local models; LOCA_VAE, LOCA_EASE: ours)
➢LOCA consistently outperforms competitive
global/local models over five benchmark datasets.
Effect of the Number of Local Models
➢The accuracy of LOCA improves consistently as the number of
local models increases.
[Figure: NDCG@100 vs. the number of local models (0-300) on ML10M and AMusic, comparing MultVAE, Ensemble_VAE, LLORMA_VAE, and LOCA_VAE.]
Effect of Anchor Selection Method
➢Our coverage-based anchor selection outperforms
the other methods in terms of both accuracy and coverage.
[Figure: NDCG@100 vs. the number of local models (50-300) on ML10M and AMusic, comparing Random, K-means, and our coverage-based anchor selection.]
Illustration of LOCA
➢When a user has multiple tastes, LOCA captures the user's
preferences by combining different local patterns.
• For a user (66005 in ML10M) who likes Sci-Fi and Horror movies, LOCA
achieves higher accuracy.
Recommendation   Local 70           Local 179       Global            Ground Truth
Top-1            Sci-Fi, Adventure  Horror, Action  Thriller, Action  Sci-Fi, Action
Top-2            Sci-Fi, Horror     Horror, Drama   Drama             Horror, Thriller
Top-3            Sci-Fi, Action     Horror, Drama   Drama, Mystery    Horror, Action
Conclusion
Conclusion
➢We propose a new local recommender framework,
namely local collaborative autoencoders (LOCA).
➢LOCA can handle a large number of local models effectively.
• Adopts different training and inference strategies for each local model.
• Utilizes autoencoder-based models as the base model.
• Makes use of a greedy maximum coverage method to build diverse
local models.
➢LOCA outperforms the state-of-the-art global and local models
over various benchmark datasets.
Q&A
Code: https://github.com/jin530/LOCA
Email: zxcvxd@skku.edu