On Popularity Bias of Multimodal-aware Recommender Systems:
a Modalities-driven Analysis
Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, Tommaso Di Noia
Politecnico di Bari, Bari (Italy)
email: firstname.lastname@poliba.it
The 1st International Workshop on Deep Multimodal Learning for Information Retrieval
Ottawa, ON, Canada, November 2, 2023
Co-located with ACM Multimedia 2023
● Introduction and motivations
● Background
● Proposed analysis
● Results and discussion
● Conclusion and future work
Outline
2
Introduction and motivations
3
Multimodal-aware recommender systems [Malitesta et al. (2023a)] exploit multimodal (e.g., audio, visual, textual) content data to augment the representation of items, thus tackling known issues such as dataset sparsity and the difficulty of interpreting users' actions (e.g., views, clicks) on online platforms.
4
Recommendation systems leveraging multimodal data
[Figure: overview of a multimodal recommendation pipeline for user u and item i. (1) Input: the item modalities m1, m2, m3, ...; (2) a multimodal feature extractor φ_m(·) produces (a) joint μ(·) or (b) coordinate μ_m(·) multimodal representations; (3) multimodal fusion is performed either (a) early, γ_e(·), or (b) late, γ_l(·); (4) inference ρ(·) yields the recommendation score r. Open design questions: which modalities, how to represent them, and when to fuse them?]
[Malitesta et al. (2023a)] 2023. Formalizing Multimedia Recommendation through Multimodal Deep Learning. Under review at TORS. Available online at: arXiv:2309.05273.
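As a rough illustration of the pipeline above, the sketch below contrasts early fusion (combining modality representations before computing the user-item score) with late fusion (combining per-modality scores). It is a minimal NumPy sketch under assumed toy dimensions; the names (visual_feat, textual_feat, project, ...) are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy collaborative embedding and (assumed) multimodal item features.
d = 64                                  # latent size
e_u = rng.normal(size=d)                # user embedding
visual_feat = rng.normal(size=4096)     # e.g., a CNN visual feature
textual_feat = rng.normal(size=1024)    # e.g., a sentence-embedding feature

def project(x, out_dim=d, seed=1):
    """Random linear projection standing in for a learned one."""
    w = np.random.default_rng(seed).normal(size=(out_dim, x.shape[0]))
    return w @ x / np.sqrt(x.shape[0])

# (a) Early fusion: merge modality representations, then score once.
item_multimodal = np.concatenate([project(visual_feat, seed=1),
                                  project(textual_feat, seed=2)])
score_early = e_u @ project(item_multimodal, seed=3)

# (b) Late fusion: score each modality separately, then merge the scores.
score_visual = e_u @ project(visual_feat, seed=1)
score_textual = e_u @ project(textual_feat, seed=2)
score_late = 0.5 * (score_visual + score_textual)

print(score_early, score_late)
```

The "which/how/when" questions on the slide correspond, respectively, to choosing the modalities, their representation, and the fusion point.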
Most multimodal-aware recommender systems are built upon factorization models for recommendation, such as matrix factorization with Bayesian personalized ranking (MFBPR [Rendle et al.]). Given its simple implementation and its efficacy, MFBPR has long constituted the backbone of collaborative filtering algorithms [He et al. (2020), Mao et al.], not only in multimodal recommendation.
5
Multimodal-aware recommendation and factorization models
[Rendle et al.] 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI.
[He et al. (2020)] 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In SIGIR. ACM, 639–648.
[Mao et al.] 2021. SimpleX: A Simple and Strong Baseline for Collaborative Filtering. In CIKM. ACM, 1243–1252.
[Figure: MFBPR predicts the score ŷ_ui between user u and item i; modalities m1, m2, m3 act as item side information.]
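To make the MFBPR backbone concrete, here is a minimal sketch of plain matrix factorization trained with the BPR pairwise objective of Rendle et al. Hyperparameters and the sampling of (user, positive item, negative item) triples are simplified placeholders, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, d, lr, reg = 100, 50, 16, 0.05, 1e-4

# Collaborative user/item embeddings (e_u, e_i).
E_u = 0.1 * rng.normal(size=(n_users, d))
E_i = 0.1 * rng.normal(size=(n_items, d))

def score(u, i):
    return E_u[u] @ E_i[i]

def bpr_step(u, i_pos, i_neg):
    """One SGD step on the BPR loss -log sigmoid(score(u, i+) - score(u, i-))."""
    x_uij = score(u, i_pos) - score(u, i_neg)
    sig = 1.0 / (1.0 + np.exp(x_uij))      # sigmoid(-x_uij), the gradient weight
    grad_u = sig * (E_i[i_pos] - E_i[i_neg]) - reg * E_u[u]
    grad_pos = sig * E_u[u] - reg * E_i[i_pos]
    grad_neg = -sig * E_u[u] - reg * E_i[i_neg]
    E_u[u] += lr * grad_u
    E_i[i_pos] += lr * grad_pos
    E_i[i_neg] += lr * grad_neg

# Toy training loop over random triples; in practice i_pos is an observed
# interaction of u and i_neg an unobserved one.
for _ in range(1000):
    u, i_pos, i_neg = rng.integers(n_users), rng.integers(n_items), rng.integers(n_items)
    bpr_step(u, i_pos, i_neg)
```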
Nevertheless, the literature has shown that MFBPR-like models may be affected by popularity bias [Abdollahpouri et al., Ricardo Baeza-Yates, Boratto et al., Jannach et al.]: such recommender systems tend to boost the recommendation of items from the short-head to the detriment of items from the long-tail.
6
Popularity bias in matrix factorization
[Abdollahpouri et al.] 2017. Controlling Popularity Bias in Learning-to-Rank Recommendation. In RecSys. ACM, 42–46.
[Ricardo Baeza-Yates] 2020. Bias in Search and Recommender Systems. In RecSys. ACM, 2.
[Boratto et al.] 2021. Connecting user and item perspectives in popularity debiasing for collaborative recommendation. Inf. Process. Manag. 58, 1 (2021), 102387.
[Jannach et al.] 2015. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model. User Adapt. Interact. 25, 5 (2015), 427–491.
[Embedded: first page of the paper]
ABSTRACT. Multimodal-aware recommender systems (MRSs) exploit multimodal content (e.g., product images or descriptions) as items' side information to improve recommendation accuracy. While most of such methods rely on factorization models (e.g., MFBPR) as base architecture, it has been shown that MFBPR may be affected by popularity bias, meaning that it inherently tends to boost the recommendation of popular (i.e., short-head) items to the detriment of niche (i.e., long-tail) items from the catalog. Motivated by this assumption, in this work, we provide one of the first analyses on how multimodality in recommendation could further amplify popularity bias. Concretely, we evaluate the performance of four state-of-the-art MRS algorithms (i.e., VBPR, MMGCN, GRCN, LATTICE) on three datasets from Amazon by assessing, along with recommendation accuracy metrics, performance measures accounting for the diversity of recommended items and the portion of retrieved niche items. To better investigate this aspect, we study the separate influence of each modality (i.e., visual and textual) on popularity bias across different evaluation dimensions. Results, which demonstrate how a single modality may amplify the negative effect of popularity bias, shed light on the importance of a more rigorous analysis of the performance of such models.
[Figure 1: Short-head and long-tail items from the Office dataset in the Amazon catalog.]
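Figure 1 relies on splitting the catalog into short-head and long-tail items by interaction counts. A minimal sketch of such a split is given below; the 20% threshold for the short-head is a common convention in the popularity-bias literature and is an assumption here, not a value stated on the slides.

```python
import numpy as np

def split_short_head_long_tail(interactions, head_fraction=0.2):
    """Split item ids into short-head / long-tail sets by popularity.

    interactions: list of (user_id, item_id) pairs.
    head_fraction: assumed share of most popular items forming the short-head.
    """
    items, counts = np.unique([i for _, i in interactions], return_counts=True)
    order = np.argsort(-counts)                      # most popular first
    n_head = max(1, int(head_fraction * len(items)))
    short_head = set(items[order[:n_head]].tolist())
    long_tail = set(items[order[n_head:]].tolist())
    return short_head, long_tail

# Toy example: item 7 is the most interacted with, so it falls in the short-head.
toy = [(0, 7), (1, 7), (2, 7), (0, 3), (1, 5), (2, 9), (3, 11)]
head, tail = split_short_head_long_tail(toy)
print(head, tail)
```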
Some recent works [Liu et al., Kowald and Lacic, Malitesta et al. (2023b)] address bias in multimodal-aware recommendation, but with definitions and settings that differ from the notion of popularity bias presented earlier.
7
Popularity bias in multimodal-aware recommendation
[Liu et al.] 2022. EliMRec: Eliminating Single-modal Bias in Multimedia Recommendation. In ACM Multimedia. ACM, 687–695.
[Kowald and Lacic] 2022. Popularity Bias in Collaborative Filtering-Based Multimedia Recommender Systems. In BIAS (Communications in Computer and Information Science, Vol. 1610). Springer, 1–11.
[Malitesta et al. (2023b)] 2023. Disentangling the Performance Puzzle of Multimodal-aware Recommender Systems. In EvalRS@KDD (CEUR Workshop Proceedings, Vol. 3450). CEUR-WS.org.
✓ Propose one of the first analyses on how multimodal-aware recommender systems may amplify popularity bias
✓ Select four state-of-the-art multimodal-aware recommender systems (i.e., VBPR, MMGCN, GRCN, and LATTICE)
✓ Train them on three categories of the Amazon Catalogue (i.e., Office, Toys, and Clothing)
✓ Evaluate the performance on recommendation accuracy and popularity bias (i.e., diversity and percentage of retrieved items from the long-tail)
✓ Assess the separate impact of each multimodal side information on single and paired recommendation metrics
8
Our contributions
Research questions
RQ1) How do multimodal-aware recommendation models behave in terms of accuracy, diversity, and popularity bias?
RQ2) What is the influence of each modality (i.e., visual, textual, multimodal) on such performance measures?
Background
9
10
Preliminaries
[Figure: a toy scenario with users U = {u1, ..., u5}, items I = {i1, i2, i3}, and the binary user-item interaction matrix X (rows [1 1 1], [0 1 0], [1 1 0], [0 0 1], [1 1 0]). On the collaborative side, each user u and item i is associated with an embedding e_u or e_i; on the multimodal side, with a feature f_u or f_i.]
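A minimal sketch of these preliminaries, assuming PyTorch: the collaborative embeddings e_u, e_i are trainable ID embeddings, while the multimodal features f_i come from frozen, pre-extracted tensors. Dimensions, the random placeholders, and the way f_u is derived are illustrative assumptions.

```python
import torch

n_users, n_items, d = 5, 3, 64

# Binary user-item interaction matrix X (toy values as in the slide figure).
X = torch.tensor([[1, 1, 1],
                  [0, 1, 0],
                  [1, 1, 0],
                  [0, 0, 1],
                  [1, 1, 0]], dtype=torch.float32)

# Collaborative embeddings e_u, e_i (trainable).
e_u = torch.nn.Embedding(n_users, d)
e_i = torch.nn.Embedding(n_items, d)

# Multimodal item features f_i (pre-extracted, frozen); random placeholders here.
f_i_visual = torch.randn(n_items, 4096)   # e.g., image embeddings
f_i_textual = torch.randn(n_items, 1024)  # e.g., description embeddings

# A user-side multimodal feature f_u can be derived, e.g., by averaging the
# features of the items the user interacted with (one possible choice).
f_u_visual = X @ f_i_visual / X.sum(1, keepdim=True).clamp(min=1)
```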
• Visual Bayesian personalized ranking (VBPR [He et al. (2016)])
• Multimodal graph convolutional network for recommendation (MMGCN [Wei et al. (2019)])
• Graph-refined convolutional network (GRCN [Wei et al. (2020)])
• Latent structure mining method for multimodal recommendation (LATTICE [Zhang et al.])
11
Multimodal-aware recommender systems
[He et al. (2016)] 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In AAAI. AAAI Press, 144–150.
[Wei et al. (2019)] 2019. MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video. In ACM Multimedia. ACM, 1437–1445.
[Wei et al. (2020)] 2020. Graph-Refined Convolutional Network for Multimedia Recommendation with Implicit Feedback. In ACM Multimedia. ACM, 3541–3549.
[Zhang et al.] 2021. Mining Latent Structures for Multimedia Recommendation. In ACM Multimedia. ACM, 3872–3880.
[Table, embedded from the paper] The analyzed models and their prediction rules:

VBPR (2016, AAAI): $\hat{y}_{ui} = \mathbf{e}_u^\top \mathbf{e}_i + \mathbf{f}_u^\top t(\mathbf{f}_i)$, with $\mathbf{f}_i = \big\Vert_{m \in \mathcal{M}} \mathbf{f}_i^m$

MMGCN (2019, MM): $\hat{y}_{ui} = \mathbf{f}_u^\top \mathbf{f}_i$, with $\mathbf{f}_u = \sum_{m \in \mathcal{M}} c\big(\mathbf{e}_u,\, g(\mathbf{f}_u^m),\, t(\mathbf{f}_u^m, \mathbf{e}_u)\big)$

GRCN (2020, MM): $\hat{y}_{ui} = \mathbf{f}_u^\top \mathbf{f}_i$, with $\mathbf{f}_u = g(\mathbf{e}_u, \mathbf{f}_u^m, \forall m \in \mathcal{M}) \,\big\Vert\, \big(\Vert_{m \in \mathcal{M}}\, t(\mathbf{f}_u^m)\big)$

LATTICE (2021, MM): $\hat{y}_{ui} = \mathbf{e}_u^\top \mathbf{f}_i$, with $\mathbf{f}_i = \mathbf{e}_i + \frac{g(\mathbf{e}_i, \mathbf{f}_i^m, \forall m \in \mathcal{M})}{\Vert g(\mathbf{e}_i, \mathbf{f}_i^m, \forall m \in \mathcal{M}) \Vert_2}$

Here, $\mathbf{e}$ denotes collaborative embeddings and $\mathbf{f}^m$ the features from modality $m$; in LATTICE, $g(\cdot)$ is a LightGCN architecture performing graph structure learning.
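As an illustration of the first entry above (VBPR), the following sketch computes a VBPR-style score: the collaborative dot product plus a multimodal term in which the concatenated modality features pass through a projection t(·), read here as a trainable linear map. This is a schematic reading of the formula with assumed dimensions, not the authors' implementation.

```python
import torch

d, d_m = 64, 64                      # latent sizes (assumed)
n_items = 10

e_u = torch.randn(d)                 # collaborative user embedding
e_i = torch.randn(n_items, d)        # collaborative item embeddings
f_u = torch.randn(d_m)               # multimodal user embedding
f_i_visual = torch.randn(n_items, 4096)
f_i_textual = torch.randn(n_items, 1024)

# f_i: concatenation of the modality features over m in M.
f_i = torch.cat([f_i_visual, f_i_textual], dim=1)

# t(.): trainable projection of the multimodal features into the latent space.
t = torch.nn.Linear(f_i.shape[1], d_m, bias=False)

# VBPR-style prediction: y_ui = e_u^T e_i + f_u^T t(f_i), scored for all items.
y_u = e_i @ e_u + t(f_i) @ f_u
top_k = torch.topk(y_u, k=5).indices
```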
Proposed analysis
12
13
Datasets and multimodal features
[McAuley et al.] 2015. Image-Based Recommendations on Styles and Substitutes. In SIGIR. ACM, 43–52.
[Deldjoo et al.] 2021. A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems. In CVPR Workshops. Computer Vision Foundation / IEEE, 3961–3967.
[Zhang et al.] 2021. Mining Latent Structures for Multimedia Recommendation. In ACM Multimedia. ACM, 3872–3880.
Table 1: Statistics of the tested datasets.

Datasets   |U|      |I|      |R|       Sparsity (%)
Office     4,905    2,420    53,258    99.5513
Toys       19,412   11,924   167,597   99.9276
Clothing   39,387   23,033   278,677   99.9693
Amazon Catalogue [McAuley et al.]: pre-extracted visual features publicly available at https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html
Multimodal features:
• Visual features: 4,096-dimensional embeddings [Deldjoo et al.]
• Textual features: 1,024-dimensional embeddings [Zhang et al.]
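For the textual modality, a sketch of how item descriptions can be encoded with sentence transformers is shown below. The specific checkpoint name is an assumption for illustration (the slide only states that the resulting embeddings are 1,024-dimensional); the visual features are instead loaded from the pre-extracted files linked above.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# Hypothetical item texts: title + description + categories + brand, as in the slide.
item_texts = [
    "Stapler. Heavy-duty desktop stapler. Office Products. AcmeBrand",
    "Ink cartridge. Black ink, high yield. Office Products. PrintCo",
]

# Assumed checkpoint; any sentence-transformers model producing 1,024-d vectors fits.
model = SentenceTransformer("all-roberta-large-v1")
textual_embeddings = model.encode(item_texts)   # shape: (n_items, 1024)
print(textual_embeddings.shape)
```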
14
Evaluation metrics

In the proposed study, we refer to metrics that may bring out additional insights not yet investigated in multimodal recommendation: we do not solely rely on accuracy metrics (i.e., Recall and nDCG), but also on diversity (i.e., item coverage) and popularity bias (i.e., APLT) metrics. All metrics are calculated on top-$k$ recommendation lists.

Accuracy

Recall assesses the system's capacity to retrieve relevant items in the recommendation list:
$$\mathrm{Recall@}k = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{|\mathrm{Rel}_u@k|}{|\mathrm{Rel}_u|},$$
where $\mathrm{Rel}_u$ is the set of relevant items for user $u$, and $\mathrm{Rel}_u@k$ is the set of relevant recommended items in the top-$k$ list.

Normalized discounted cumulative gain (nDCG) considers both the relevance and the ranking position of recommended items, accounting for varied degrees of relevance:
$$\mathrm{nDCG@}k = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{\mathrm{DCG}_u@k}{\mathrm{IDCG}_u@k}, \qquad \mathrm{DCG}_u@k = \sum_{i=1}^{k} \frac{2^{rel_{u,i}} - 1}{\log_2(i + 1)},$$
with $rel_{u,i} \in \mathrm{Rel}_u$, and where $\mathrm{IDCG}_u@k$ is the cumulative gain of a perfect (ideal) recommender system.

Popularity Bias

Item coverage (iCov) measures the fraction of the catalog appearing in the recommendation lists; a higher item coverage means a larger portion of the item space is recommended to users, implying a more comprehensive coverage of user preferences:
$$\mathrm{iCov@}k = \frac{\left| \bigcup_{u} \hat{I}_u@k \right|}{|\mathcal{I}_{\mathrm{train}}|},$$
where $\hat{I}_u@k$ is the list of top-$k$ recommended items for user $u$.

Average percentage of long-tail items (APLT) [Abdollahpouri et al.] assesses the presence of popularity bias, i.e., the tendency of recommendation algorithms to prioritize popular or mainstream items over niche ones. It measures the percentage of recommended items belonging to the medium/long-tail distribution, averaged over all users:
$$\mathrm{APLT@}k = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{|\{i \mid i \in (\hat{I}_u@k \cap \sim\Phi)\}|}{k},$$
where $\Phi$ is the set of short-head items and $\sim\Phi$ is the set of medium/long-tail items. APLT is evaluated together with iCov because, following their definitions, the two metrics are conceptually related and iCov helps interpret APLT.

Metrics value interpretation. An ideal recommender system should increase all the metrics listed above according to the principle "higher is better", boosting accuracy and diversity while reducing the popularity bias of the produced recommendations. Nevertheless, since this work tries to unveil whether and why multimodal-aware recommender systems are affected by popularity bias, the focus is on those settings in which accuracy is high while diversity and popularity bias are low (according to the metrics definitions).

Modality settings. The models' behavior is investigated in three settings: (i) visual, employing only visual features; (ii) textual, employing only textual features; and (iii) multimodal, where both modalities are combined (the original setting of each tested model). Further reproducibility details (dataset splitting and filtering, hyperparameter search strategy) are reported in the paper.

[Abdollahpouri et al.] 2017. Controlling Popularity Bias in Learning-to-Rank Recommendation. In RecSys. ACM, 42–46.
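The four metrics can be computed directly from their definitions. Below is a minimal NumPy sketch, assuming binary relevance and precomputed top-k lists; variable names and the toy inputs are illustrative, not taken from the paper's codebase.

```python
import numpy as np

def recall_at_k(topk, relevant):
    """topk: {user: list of recommended item ids}; relevant: {user: set of item ids}."""
    vals = [len(set(topk[u]) & relevant[u]) / len(relevant[u])
            for u in relevant if relevant[u]]
    return float(np.mean(vals))

def ndcg_at_k(topk, relevant):
    """Binary relevance: an item gains 1 if it belongs to the user's relevant set."""
    vals = []
    for u in relevant:
        gains = [1.0 / np.log2(rank + 2)
                 for rank, i in enumerate(topk[u]) if i in relevant[u]]
        ideal = sum(1.0 / np.log2(r + 2)
                    for r in range(min(len(relevant[u]), len(topk[u]))))
        vals.append(sum(gains) / ideal if ideal > 0 else 0.0)
    return float(np.mean(vals))

def icov_at_k(topk, n_train_items):
    """Fraction of the training catalog appearing in any recommendation list."""
    covered = set().union(*[set(items) for items in topk.values()])
    return len(covered) / n_train_items

def aplt_at_k(topk, long_tail_items):
    """Average share of medium/long-tail items in the top-k lists."""
    vals = [len([i for i in items if i in long_tail_items]) / len(items)
            for items in topk.values()]
    return float(np.mean(vals))

# Toy usage: two users, k = 3.
topk = {0: [5, 1, 9], 1: [2, 5, 7]}
relevant = {0: {1, 4}, 1: {7}}
print(recall_at_k(topk, relevant), ndcg_at_k(topk, relevant))
print(icov_at_k(topk, n_train_items=10), aplt_at_k(topk, long_tail_items={9, 4, 2}))
```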
Results and discussion
15
16
Recommendation accuracy, diversity, and popularity bias (RQ1)
Table 2: Results in terms of recommendation accuracy (Recall, nDCG), diversity (iCov) and popularity bias (APLT). For accuracy metrics, ↑ means better performance, while ↓ means less diversity and more popularity bias. We remind that, while the iCov and APLT metrics would generally adhere to the principle of "higher is better" (↑) for an ideal recommender system, in this work we consider the opposite, as we want to emphasize which models are performing worst in terms of diversity and popularity bias.

Columns per cut-off (top@10 | top@20 | top@50): Recall↑, nDCG↑, iCov↓, APLT↓

Office
Random   0.0034 0.0020 2,414 0.5950 | 0.0079 0.0034 2,414 0.5948 | 0.0220 0.0068 2,414 0.5924
MostPop  0.0302 0.0208 20 0.0000 | 0.0533 0.0282 32 0.0000 | 0.1143 0.0439 66 0.0000
MFBPR    0.0602 0.0389 2,268 0.2294 | 0.0955 0.0500 2,357 0.2379 | 0.1657 0.0677 2,398 0.2513
VBPR     0.0652 0.0419 2,265 0.2321 | 0.1025 0.0533 2,354 0.2375 | 0.1774 0.0721 2,404 0.2469
MMGCN    0.0455 0.0300 74 0.0016 | 0.0798 0.0405 112 0.0078 | 0.1575 0.0598 247 0.0205
GRCN     0.0393 0.0253 2,390 0.3438 | 0.0667 0.0339 2,409 0.3469 | 0.1250 0.0488 2,414 0.3548
LATTICE  0.0664 0.0449 2,121 0.1752 | 0.1029 0.0566 2,315 0.2039 | 0.1780 0.0751 2,397 0.2413

Toys
Random   0.0011 0.0006 11,879 0.4894 | 0.0021 0.0008 11,879 0.4896 | 0.0051 0.0015 11,879 0.4902
MostPop  0.0130 0.0075 13 0.0000 | 0.0229 0.0104 24 0.0000 | 0.0451 0.0156 56 0.0000
MFBPR    0.0641 0.0403 10,016 0.1167 | 0.0903 0.0481 10,944 0.1268 | 0.1394 0.0596 11,544 0.1460
VBPR     0.0710 0.0458 10,085 0.1064 | 0.1006 0.0545 11,026 0.1180 | 0.1523 0.0667 11,624 0.1400
MMGCN    0.0256 0.0150 4,499 0.0961 | 0.0426 0.0200 6,238 0.1058 | 0.0785 0.0285 8,657 0.1263
GRCN     0.0554 0.0354 11,007 0.2368 | 0.0831 0.0436 11,609 0.2482 | 0.1355 0.0559 11,847 0.2679
LATTICE  0.0805 0.0512 8,767 0.0546 | 0.1165 0.0617 10,285 0.0684 | 0.1771 0.0759 11,397 0.0950

Clothing
Random   0.0004 0.0002 23,016 0.4487 | 0.0010 0.0003 23,016 0.4478 | 0.0024 0.0006 23,016 0.4482
MostPop  0.0089 0.0046 13 0.0000 | 0.0157 0.0063 24 0.0000 | 0.0322 0.0095 56 0.0000
MFBPR    0.0303 0.0156 18,414 0.0729 | 0.0459 0.0195 20,582 0.0824 | 0.0734 0.0249 22,171 0.1017
VBPR     0.0339 0.0181 19,195 0.0809 | 0.0529 0.0229 21,251 0.0915 | 0.0847 0.0292 22,555 0.1112
MMGCN    0.0227 0.0119 1,744 0.0044 | 0.0348 0.0150 2,864 0.0066 | 0.0609 0.0201 5,373 0.0121
GRCN     0.0319 0.0164 21,490 0.2358 | 0.0496 0.0209 22,503 0.2459 | 0.0858 0.0281 22,954 0.2631
LATTICE  0.0502 0.0275 13,463 0.0134 | 0.0744 0.0336 17,538 0.0207 | 0.1186 0.0425 21,458 0.0385

From the paper (excerpts):
"… popularity bias phenomenon as much as MMGCN does. Indeed, even if LATTICE's iCov is the second-worst across all the datasets, the metric is always close to the best models in terms of diversity. Finally, VBPR and GRCN confirm their ability (already observed on the diversity measure) to tackle also popularity bias in all ex…"
"… discuss the influence of each single modality on the performance. We consider two evaluation dimensions where modalities influence is assessed (i) on accuracy, diversity, and popularity bias separately, and (ii) on pairs of metrics to investigate their joint variations."
LATTICE stands out for its accuracy performance… 😃
…but amplifies popularity bias 🥲
17
Recommendation accuracy, diversity, and popularity bias (RQ1)
MMGCN struggles with diversity… 🤒
…exhibits strong popularity bias… 😱
…and sacrifices accuracy in certain scenarios ☠
18
Recommendation accuracy, diversity, and popularity bias (RQ1)
VBPR and GRCN better manage all the metrics by finding the right compromise among them 😎
19
Modalities influence on recommendation performance (RQ2)
[Figure 2: Percentage variation on the (a) Recall, (b) iCov, and (c) APLT when training the multimodal recommender systems (VBPR, MMGCN, GRCN, LATTICE) on Office, Toys, and Clothing with either visual or textual modalities. The 0% line stands for the reference performance provided by the multimodal version of the model. All results refer to the top@20 recommendation lists.]
…showing consistent trends. Indeed, the visual modality reduces the Recall while the textual increases it (with the only exception of VBPR, whose percentage variation is negligible).

Differently from the accuracy analysis, we recognize a quasi-stable trend in the performance variation measured for the diversity metric (Figure 2b). Considering the Office dataset, each modality's contribution is generally irrelevant except for MMGCN, for which the visual modality slightly improves the coverage across the whole recommendation list, while the textual one worsens the performance by a large margin. Assessing the trend on Toys, both modalities decrease the coverage performance of the model when injected separately into the recommendation pipeline; remarkably, MMGCN is once again the model most affected by the presence of a single modality, but this time the coverage performance widely deteriorates because of both the visual and textual modalities. Finally, on Clothing, both modalities lower the model's item coverage, with specific reference to the visual modality.

As the last part of our analysis, we take into account each modality's contribution to the popularity bias dimension (Figure 2c). Starting from Office, we notice how both modalities are prone to enforce popularity bias if injected singularly, with the only exception of LATTICE, whose textual modality limits the popularity bias (the…
The textual modality improves the accuracy… 💪
…while both modalities negatively affect the diversity and reinforce the popularity bias 😭
Single metric setting
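The percentage variations plotted in Figure 2 are relative to the multimodal configuration of each model (the 0% line). A one-line sketch of that computation, as an assumption consistent with the figure caption rather than the paper's exact script, is given below.

```python
def pct_variation(metric_single_modality, metric_multimodal):
    """Percentage variation of a metric w.r.t. the multimodal reference (the 0% line)."""
    return 100.0 * (metric_single_modality - metric_multimodal) / metric_multimodal

# Hypothetical example: Recall@20 of a model trained with textual-only vs. multimodal features.
print(pct_variation(0.105, 0.100))   # +5.0% in this made-up case
```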
20
Modalities influence on recommendation performance (RQ2)
The textual modality has a significant influence on accuracy… 😣
…but minimal effects on diversity and popularity bias 😇
Pair-wise metric setting
[Figure 3: Performance analysis on Clothing when comparing (a) Recall vs. APLT, (b) Recall vs. iCov, and (c) iCov vs. APLT for VBPR, MMGCN, GRCN, and LATTICE under the multimodal, visual, and textual modality settings. Metrics are on top@20.]
…APLT increases); this is interesting as we remind that LATTICE is the second-worst model in terms of popularity bias, but using only the textual modality reduces its accuracy performance and the influence of popular items in the recommendation list. When it comes to the Toys dataset, every single modality enforces the popularity bias of MMGCN and GRCN; for VBPR, the visual and textual modalities…
21
Modalities influence on recommendation performance (RQ2)
The visual modality reduces accuracy… 😨
…and jointly worsens the popularity bias and diversity 😵
Pair-wise metric setting (cont’d)
Conclusion and future work
22
Conclusion
● Analysis on influence of multimodality on popularity bias
● Four SOTA multimodal recommendation approaches on three datasets
● Three evaluation dimensions and three modality settings
● [RQ1] VBPR and GRCN strike a better compromise among all metrics
● [RQ2 single] Separate injection of modalities improves accuracy but negatively impacts diversity and popularity bias
● [RQ2 pairs, textual] Strongly impacts accuracy but has little effect on diversity and popularity bias
● [RQ2 pairs, visual] Reduces accuracy while exacerbating popularity bias and limiting diversity
Future work
● More complete study on the performance of these models
● Assessing the performance of more recent multimodal approaches [Malitesta et al. (2023a)]
23
Reach out to us!
24
The authors:
• Daniele Malitesta (daniele.malitesta@poliba.it)
• Giandomenico Cornacchia (giandomenico.cornacchia@poliba.it)
• Claudio Pomo (claudio.pomo@poliba.it)
• Tommaso Di Noia (tommaso.dinoia@poliba.it)
Don’t forget to check out our theoretical/experimental survey
25
More Related Content

• Recommender systems: a novel approach based on singular value decomposition (PDF)
• An Unsupervised Approach For Reputation Generation (PDF)
• FHCC: A SOFT HIERARCHICAL CLUSTERING APPROACH FOR COLLABORATIVE FILTERING REC... (PDF)
• Effective Cross-Domain Collaborative Filtering using Temporal Domain – A Brie... (PDF)
• A Novel Approach for Travel Package Recommendation Using Probabilistic Matrix... (PDF)
• PhD defense (PPTX)
• Survey on Location Based Recommendation System Using POI (PDF)
• Multidirectional Product Support System for Decision Making In Textile Indust... (PDF)
Similar to [MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A Modalities-driven Analysis (20)

• Advances In Collaborative Filtering (PDF)
• Recommenders, Topics, and Text (PPTX)
• factorization methods (PDF)
• A Proposal on Social Tagging Systems Using Tensor Reduction and Controlling R... (PDF)
• Poster Abstracts (DOC)
• An improvised model for identifying influential nodes in multi parameter soci... (PDF)
• Data mining java titles adrit solutions (PDF)
• An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A... (PDF)
• Current trends of opinion mining and sentiment analysis in social networks (PDF)
• Recommender Systems (PDF)
• Bx044461467 (PDF)
• Improving-Movie-Recommendation-Systems-Filtering-by-Exploiting-UserBased-Revi... (PDF)
• Literature Review on Social Networking in Supply chain (PPTX)
• Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf (PDF)
• Detailed structure applicable to hybrid recommendation technique (PDF)
• On the benefit of logic-based machine learning to learn pairwise comparisons (PDF)
• DBLP-SSE: A DBLP Search Support Engine (PPT)
• Extending canonical action research model to implement social media in microb... (PDF)
• A Novel Latent Factor Model For Recommender System (PDF)
Ad

Recently uploaded (20)

PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPT
Mechanical Engineering MATERIALS Selection
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
Safety Seminar civil to be ensured for safe working.
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
Well-logging-methods_new................
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPT
introduction to datamining and warehousing
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
Construction Project Organization Group 2.pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
PPT on Performance Review to get promotions
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
Digital Logic Computer Design lecture notes
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Mechanical Engineering MATERIALS Selection
Model Code of Practice - Construction Work - 21102022 .pdf
Safety Seminar civil to be ensured for safe working.
R24 SURVEYING LAB MANUAL for civil enggi
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Well-logging-methods_new................
bas. eng. economics group 4 presentation 1.pptx
introduction to datamining and warehousing
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Construction Project Organization Group 2.pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPT on Performance Review to get promotions
Operating System & Kernel Study Guide-1 - converted.pdf
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Digital Logic Computer Design lecture notes
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Ad

[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A Modalities-driven Analysis

  • 1. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, Tommaso Di Noia Politecnico di Bari, Bari (Italy) email: firstname.lastname@poliba.it The 1st International Workshop on Deep Multimodal Learning for Information Retrieval Ottawa, ON, Canada, 11-02-2023 Co-located with ACM Multimedia 2023
  • 2. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) ● Introduction and motivations ● Background ● Proposed analysis ● Results and discussion ● Conclusion and future work Outline 2
  • 4. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) Multimodal-aware recommender systems [Malitesta et al. (2023a)] exploit multimodal (i.e., audio, visual, textual) content data to augment the representation of items, thus tackling known issues such as dataset sparsity and the inexplicable nature of users’ actions (i.e., views, clicks) on online platforms. 4 Recommendation systems leveraging multimodal data ࢛ ࢏ MODALITIES ࢓૚ ࢓૛ ࢓૜ . . . . . . MULTIMODAL FEATURE EXTRACTOR ࣐࢓ሺ‫ڄ‬ሻ MULTIMODAL REPRESENTATION JOINT ࣆሺ‫ڄ‬ሻ COORDINATE ࣆ࢓ ‫ڄ‬ . . . INFERENCE ࣋ሺ‫ڄ‬ሻ EARLY FUSION ࢽࢋሺ‫ڄ‬ሻ LATE FUSION ࢽ࢒ሺ‫ڄ‬ሻ (1) (2) (a) (b) MULTIMODAL FUSION (3) (a) (b) (4) ࢘ Which? How? When? INPUT [Malitesta et al. (2023a)] 2023. Formalizing Multimedia Recommendation through Multimodal Deep Learning. Under review at TORS. Available online at: arXiv:2309.05273.
  • 5. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) Most of multimodal-aware recommender systems are based upon factorization models for recommendation, such as the matrix factorization with Bayesian personalized ranking architecture (MFBPR [Rendle et al.]). Given its simple implementation and efficacy, MFBPR has long constituted the backbone of recommendation algorithms in collaborative filtering [He et al. (2020), Mao et al.], not only in multimodal recommendation. 5 Multimodal-aware recommendation and factorization models [Rendle et al.] 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. [He et al. (2020)] 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In SIGIR. ACM, 639–648. [Mao et al.] 2021. SimpleX: A Simple and Strong Baseline for Collaborative Filtering. In CIKM. ACM, 1243–1252. 𝑢 𝑖 # 𝑦!" 𝑚# 𝑚$ 𝑚%
  • 6. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) Nevertheless, the literature has shown that MFBPR-like models may be affected by popularity bias [Abdollahpouri et al., Ricardo Baeza-Yates, Boratto et al., Jannach et al.]. Such recommender systems tend to boost the performance of items from the short-head at the detriment of the items from the long-tail. 6 Popularity bias in matrix factorization [Abdollahpouri et al.] 2017. Controlling Popularity Bias in Learning-to-Rank Recommendation. In RecSys. ACM, 42–46. [Ricardo Baeza-Yates] 2020. Bias in Search and Recommender Systems. In RecSys. ACM, 2. [Boratto et al.] 2021. Connecting user and item perspectives in popularity debiasing for collaborative recommendation. Inf. Process. Manag. 58, 1 (2021), 102387. [Jannach et al.] 2015. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model. User Adapt. Interact. 25, 5 (2015), 427–491. Daniele Malitesta∗ Politecnico di Bari, Italy daniele.malitesta@poliba.it Giandomenico Cornacchia∗ Politecnico di Bari, Italy giandomenico.cornacchia@poliba.it Claudio Pomo Politecnico di Bari, Italy claudio.pomo@poliba.it Tommaso Di Noia Politecnico di Bari, Italy tommaso.dinoia@poliba.it ABSTRACT Multimodal-aware recommender systems (MRSs) exploit multi- modal content (e.g., product images or descriptions) as items’ side information to improve recommendation accuracy. While most of such methods rely on factorization models (e.g., MFBPR) as base architecture, it has been shown that MFBPR may be a�ected by popularity bias, meaning that it inherently tends to boost the rec- ommendation of popular (i.e., short-head) items at the detriment of niche (i.e., long-tail) items from the catalog. Motivated by this as- sumption, in this work, we provide one of the �rst analyses on how multimodality in recommendation could further amplify popularity bias. Concretely, we evaluate the performance of four state-of-the- art MRSs algorithms (i.e., VBPR, MMGCN, GRCN, LATTICE) on three datasets from Amazon by assessing, along with recommen- dation accuracy metrics, performance measures accounting for the diversity of recommended items and the portion of retrieved niche items. To better investigate this aspect, we decide to study the separate in�uence of each modality (i.e., visual and textual) on popularity bias in di�erent evaluation dimensions. Results, which demonstrate how the single modality may augment the negative e�ect of popularity bias, shed light on the importance to provide a more rigorous analysis of the performance of such models. 0 500 1,000 1,500 2,000 2,500 0 20 40 60 80 100 120 items popularity short-head long-tail Figure 1: Short-head and long-tail items from the O�ce dataset in the Amazon catalog. Systems: a Modalities-driven Analysis. In Proceedings of Make sure to en- ter the correct conference title from your rights con�rmation emai (Confer- ence acronym ’XX). ACM, New York, NY, USA, 10 pages. https://doi.org/ .12911v1 [cs.IR] 24 Aug 2023
  • 7. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) Some recent works [Liu et al., Kowald and Lacic, Malitesta et al. (2023b)] address bias in multimodal-aware recommendation, but under definitions and settings that differ from the notion of popularity bias presented earlier. 7 Popularity bias in multimodal-aware recommendation [Liu et al.] 2022. EliMRec: Eliminating Single-modal Bias in Multimedia Recommendation. In ACM Multimedia. ACM, 687–695. [Kowald and Lacic] 2022. Popularity Bias in Collaborative Filtering-Based Multimedia Recommender Systems. In BIAS (Communications in Computer and Information Science, Vol. 1610). Springer, 1–11. [Malitesta et al. (2023b)] 2023. Disentangling the Performance Puzzle of Multimodal-aware Recommender Systems. In EvalRS@KDD (CEUR Workshop Proceedings, Vol. 3450). CEUR-WS.org. [Slide again shows the first page of the companion paper.]
  • 8. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 8 Our contributions
  ✓ Propose one of the first analyses on how multimodal-aware recommender systems may amplify popularity bias
  ✓ Select four state-of-the-art multimodal-aware recommender systems (i.e., VBPR, MMGCN, GRCN, and LATTICE)
  ✓ Train them on three categories of the Amazon catalog (i.e., Office, Toys, and Clothing)
  ✓ Evaluate the performance in terms of recommendation accuracy, diversity (i.e., item coverage), and popularity bias (i.e., percentage of retrieved long-tail items)
  ✓ Assess the separate impact of each modality's side information on single and paired recommendation metrics
  Research questions
  RQ1) How do multimodal-aware recommendation models behave in terms of accuracy, diversity, and popularity bias?
  RQ2) What is the influence of each modality (i.e., visual, textual, multimodal) on such performance measures?
  • 10. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 10 Preliminaries [Diagram: a binary user-item interaction matrix X over users U = {u1, ..., u5} and items I = {i1, i2, i3}; each user u and item i is associated with a collaborative embedding (e_u, e_i) and a multimodal feature vector (f_u, f_i).]
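The preliminaries above can be summarized with a small sketch; shapes and the toy interactions are illustrative assumptions. It builds the binary interaction matrix X and sets up collaborative embeddings alongside pre-extracted multimodal item features.

```python
import numpy as np

# Preliminaries sketch (illustrative shapes): a binary user-item interaction
# matrix X, collaborative embeddings e_u / e_i learned from X, and multimodal
# item features extracted from item content (e.g., images, descriptions).

n_users, n_items = 5, 3
interactions = [(0, 0), (0, 1), (1, 2), (2, 0), (3, 1), (4, 2)]  # observed (u, i) pairs

X = np.zeros((n_users, n_items), dtype=np.int8)
for u, i in interactions:
    X[u, i] = 1  # implicit feedback: 1 = interaction observed, 0 = unobserved

d, d_v, d_t = 64, 4096, 1024           # latent / visual / textual dimensions
rng = np.random.default_rng(0)
e_u = rng.normal(size=(n_users, d))    # collaborative user embeddings (learned)
e_i = rng.normal(size=(n_items, d))    # collaborative item embeddings (learned)
f_i_visual = rng.normal(size=(n_items, d_v))   # pre-extracted visual features
f_i_textual = rng.normal(size=(n_items, d_t))  # pre-extracted textual features

print(X)
```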
  • 11. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 11 Multimodal-aware recommender systems
  • Visual Bayesian personalized ranking (VBPR [He et al. (2016)])
  • Multimodal graph convolutional network for recommendation (MMGCN [Wei et al. (2019)])
  • Graph-refined convolutional network (GRCN [Wei et al. (2020)])
  • Latent structure mining method for multimodal recommendation (LATTICE [Zhang et al.])
  [He et al. (2016)] 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In AAAI. AAAI Press, 144–150.
  [Wei et al. (2019)] 2019. MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video. In ACM Multimedia. ACM, 1437–1445.
  [Wei et al. (2020)] 2020. Graph-Refined Convolutional Network for Multimedia Recommendation with Implicit Feedback. In ACM Multimedia. ACM, 3541–3549.
  [Zhang et al.] 2021. Mining Latent Structures for Multimedia Recommendation. In ACM Multimedia. ACM, 3872–3880.
  Prediction formulas as reported in the companion paper (e: collaborative embeddings, f^m: modality-m features, ∥: concatenation, M: set of modalities):
  • VBPR (2016, AAAI): ŷ_ui = e_u^T e_i + f_u^T φ(f_i), with f_i = ∥_{m∈M} f_i^m
  • MMGCN (2019, MM): ŷ_ui = f_u^T f_i, with f_u = Σ_{m∈M} c(e_u, g(f_u^m), φ(f_u^m, e_u))
  • GRCN (2020, MM): ŷ_ui = f_u^T f_i, with f_u = g(e_u, f_u^m, ∀m ∈ M) ∥ (∥_{m∈M} φ(f_u^m))
  • LATTICE (2021, MM): ŷ_ui = e_u^T f_i, with f_i = e_i + g(e_i, f_i^m, ∀m ∈ M) / ‖g(e_i, f_i^m, ∀m ∈ M)‖_2
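As an example of how such prediction rules look in code, below is a hedged sketch of the VBPR-style score ŷ_ui = e_u^T e_i + f_u^T φ(f_i) from the list above, where φ is rendered as a single linear projection of the concatenated visual and textual item features. This simplification, and the omission of bias terms, is an assumption made only for illustration.

```python
import numpy as np

# VBPR-style scoring sketch: a collaborative term plus a multimodal term in
# which the concatenated item features are projected into the space of a
# separate "multimodal preference" user embedding. Rendering phi as one linear
# projection is an assumption kept minimal for illustration.

rng = np.random.default_rng(1)
d, d_v, d_t = 64, 4096, 1024

e_u, e_i = rng.normal(size=d), rng.normal(size=d)   # collaborative embeddings
f_u = rng.normal(size=d)                            # user multimodal preference vector
f_i = np.concatenate([rng.normal(size=d_v), rng.normal(size=d_t)])  # ||_m f_i^m

W_phi = rng.normal(scale=0.01, size=(d, d_v + d_t))  # phi(.): learned projection matrix

def vbpr_score(e_u, e_i, f_u, f_i):
    """Collaborative dot product + projected multimodal dot product."""
    return e_u @ e_i + f_u @ (W_phi @ f_i)

print(vbpr_score(e_u, e_i, f_u, f_i))
```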
  • 13. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 13 Datasets and multimodal features
  Amazon catalog [McAuley et al.]: Office Products (Office), Toys & Games (Toys), and Clothing, Shoes & Jewelry (Clothing); each dataset provides both images and descriptions for the available items.
  Table 1: Statistics of the tested datasets.
  Datasets    |U|      |I|      |R|       Sparsity (%)
  Office      4,905    2,420    53,258    99.5513
  Toys        19,412   11,924   167,597   99.9276
  Clothing    39,387   23,033   278,677   99.9693
  Multimodal features
  • Visual features: pre-extracted 4,096-dimensional embeddings [Deldjoo et al.]
  • Textual features: 1,024-dimensional embeddings obtained by aggregating each item's title, description, categories, and brand and encoding them with sentence transformers [Zhang et al.]
  [McAuley et al.] 2015. Image-Based Recommendations on Styles and Substitutes. In SIGIR. ACM, 43–52.
  [Deldjoo et al.] 2021. A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems. In CVPR Workshops. Computer Vision Foundation / IEEE, 3961–3967.
  [Zhang et al.] 2021. Mining Latent Structures for Multimedia Recommendation. In ACM Multimedia. ACM, 3872–3880.
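For the textual side, a hedged sketch of the extraction recipe described above (concatenate title, description, categories, and brand, then encode with a sentence transformer) might look as follows. The specific checkpoint name is an assumption, chosen only because it outputs 1,024-dimensional embeddings; it is not necessarily the model used in the paper.

```python
from sentence_transformers import SentenceTransformer

# Hedged sketch of the textual feature extraction: item metadata fields are
# concatenated into one string per item and encoded with a sentence-transformer.
# The checkpoint below is an assumption (any 1,024-dim sentence encoder matches
# the stated feature size).

items = [
    {"title": "Stapler", "description": "Heavy-duty office stapler",
     "categories": "Office Products > Staplers", "brand": "ACME"},
]
texts = [" ".join([it["title"], it["description"], it["categories"], it["brand"]])
         for it in items]

model = SentenceTransformer("all-roberta-large-v1")  # assumption: a 1,024-dim model
textual_features = model.encode(texts)               # shape: (n_items, 1024)
print(textual_features.shape)
```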
  • 14. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 14 Evaluation metrics
  Along with accuracy metrics (Recall, nDCG), the study also considers diversity (item coverage) and popularity bias (APLT) metrics; all metrics are computed on top-k recommendation lists.
  Accuracy
  • Recall: Recall@k = (1/|U|) Σ_{u∈U} |Rel_u@k| / |Rel_u|, where Rel_u is the set of relevant items for user u and Rel_u@k the set of relevant items in the top-k list.
  • Normalized discounted cumulative gain: nDCG@k = (1/|U|) Σ_u DCG_u@k / IDCG_u@k, where DCG@k = Σ_{i=1}^{k} (2^{rel_{u,i}} − 1) / log_2(i+1), with rel_{u,i} ∈ Rel_u, and IDCG is the cumulative gain of a perfect (ideal) recommender.
  Diversity
  • Item coverage: iCov@k = |⋃_u Î_u@k| / |I_train|, where Î_u@k is the top-k recommendation list of user u. Higher coverage means a larger fraction of the item space is actually recommended, implying a more comprehensive coverage of user preferences.
  Popularity bias [Abdollahpouri et al.]
  • Average percentage of long-tail items: APLT@k = (1/|U|) Σ_{u∈U} |{i | i ∈ (Î_u@k ∩ Φ̃)}| / k, where Φ is the set of short-head items and Φ̃ the set of medium/long-tail items.
  APLT is evaluated together with iCov because, following their definitions, the two metrics are conceptually related, and the latter helps interpret the former.
  Metric interpretation: an ideal recommender should increase all the metrics above ("higher is better") to boost accuracy and diversity while reducing popularity bias; since this work investigates whether multimodal-aware recommenders are affected by popularity bias, the focus is on settings where accuracy is high while diversity and APLT are low.
  [Abdollahpouri et al.] 2017. Controlling Popularity Bias in Learning-to-Rank Recommendation. In RecSys. ACM, 42–46.
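A minimal sketch of how these four metrics can be computed on top-k lists is shown below; `recs` maps each user to its ranked top-k items and `rel` to the held-out relevant items. The 80/20 popularity split used to define short-head items is a common convention assumed here, not necessarily the exact threshold adopted in the paper.

```python
import numpy as np

# Minimal sketch of the four evaluation metrics on top-k recommendation lists.

def split_short_head(train_interactions, ratio=0.2):
    """Short-head items: the most popular `ratio` fraction of the catalog (assumption)."""
    items, counts = np.unique([i for _, i in train_interactions], return_counts=True)
    order = np.argsort(-counts)
    cut = max(1, int(len(items) * ratio))
    return {int(x) for x in items[order[:cut]]}

def recall_at_k(recs, rel):
    return np.mean([len(set(recs[u]) & rel[u]) / len(rel[u]) for u in recs if rel[u]])

def ndcg_at_k(recs, rel, k):
    scores = []
    for u in recs:
        dcg = sum(1.0 / np.log2(pos + 2) for pos, i in enumerate(recs[u][:k]) if i in rel[u])
        idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(k, len(rel[u]))))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return np.mean(scores)

def item_coverage(recs, n_train_items):
    return len(set().union(*recs.values())) / n_train_items

def aplt_at_k(recs, short_head, k):
    return np.mean([len([i for i in recs[u][:k] if i not in short_head]) / k for u in recs])
```

For instance, with recs = {0: [5, 2, 9]} and rel = {0: {2, 7}}, recall_at_k gives 0.5 and ndcg_at_k(recs, rel, 3) is roughly 0.39.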
  • 16. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 16 Recommendation accuracy, diversity, and popularity bias (RQ1)
  Table 2: Results in terms of recommendation accuracy (Recall, nDCG), diversity (iCov), and popularity bias (APLT). For accuracy metrics, ↑ means better performance, while ↓ means less diversity and more popularity bias. While iCov and APLT would generally adhere to the principle of "higher is better" (↑) for an ideal recommender, the opposite is considered here to emphasize which models perform worst in terms of diversity and popularity bias. Each cell reports Recall↑ / nDCG↑ / iCov↓ / APLT↓.
  Office
  Models    top@10                              top@20                              top@50
  Random    0.0034 / 0.0020 / 2,414 / 0.5950    0.0079 / 0.0034 / 2,414 / 0.5948    0.0220 / 0.0068 / 2,414 / 0.5924
  MostPop   0.0302 / 0.0208 / 20 / 0.0000       0.0533 / 0.0282 / 32 / 0.0000       0.1143 / 0.0439 / 66 / 0.0000
  MFBPR     0.0602 / 0.0389 / 2,268 / 0.2294    0.0955 / 0.0500 / 2,357 / 0.2379    0.1657 / 0.0677 / 2,398 / 0.2513
  VBPR      0.0652 / 0.0419 / 2,265 / 0.2321    0.1025 / 0.0533 / 2,354 / 0.2375    0.1774 / 0.0721 / 2,404 / 0.2469
  MMGCN     0.0455 / 0.0300 / 74 / 0.0016       0.0798 / 0.0405 / 112 / 0.0078      0.1575 / 0.0598 / 247 / 0.0205
  GRCN      0.0393 / 0.0253 / 2,390 / 0.3438    0.0667 / 0.0339 / 2,409 / 0.3469    0.1250 / 0.0488 / 2,414 / 0.3548
  LATTICE   0.0664 / 0.0449 / 2,121 / 0.1752    0.1029 / 0.0566 / 2,315 / 0.2039    0.1780 / 0.0751 / 2,397 / 0.2413
  Toys
  Random    0.0011 / 0.0006 / 11,879 / 0.4894   0.0021 / 0.0008 / 11,879 / 0.4896   0.0051 / 0.0015 / 11,879 / 0.4902
  MostPop   0.0130 / 0.0075 / 13 / 0.0000       0.0229 / 0.0104 / 24 / 0.0000       0.0451 / 0.0156 / 56 / 0.0000
  MFBPR     0.0641 / 0.0403 / 10,016 / 0.1167   0.0903 / 0.0481 / 10,944 / 0.1268   0.1394 / 0.0596 / 11,544 / 0.1460
  VBPR      0.0710 / 0.0458 / 10,085 / 0.1064   0.1006 / 0.0545 / 11,026 / 0.1180   0.1523 / 0.0667 / 11,624 / 0.1400
  MMGCN     0.0256 / 0.0150 / 4,499 / 0.0961    0.0426 / 0.0200 / 6,238 / 0.1058    0.0785 / 0.0285 / 8,657 / 0.1263
  GRCN      0.0554 / 0.0354 / 11,007 / 0.2368   0.0831 / 0.0436 / 11,609 / 0.2482   0.1355 / 0.0559 / 11,847 / 0.2679
  LATTICE   0.0805 / 0.0512 / 8,767 / 0.0546    0.1165 / 0.0617 / 10,285 / 0.0684   0.1771 / 0.0759 / 11,397 / 0.0950
  Clothing
  Random    0.0004 / 0.0002 / 23,016 / 0.4487   0.0010 / 0.0003 / 23,016 / 0.4478   0.0024 / 0.0006 / 23,016 / 0.4482
  MostPop   0.0089 / 0.0046 / 13 / 0.0000       0.0157 / 0.0063 / 24 / 0.0000       0.0322 / 0.0095 / 56 / 0.0000
  MFBPR     0.0303 / 0.0156 / 18,414 / 0.0729   0.0459 / 0.0195 / 20,582 / 0.0824   0.0734 / 0.0249 / 22,171 / 0.1017
  VBPR      0.0339 / 0.0181 / 19,195 / 0.0809   0.0529 / 0.0229 / 21,251 / 0.0915   0.0847 / 0.0292 / 22,555 / 0.1112
  MMGCN     0.0227 / 0.0119 / 1,744 / 0.0044    0.0348 / 0.0150 / 2,864 / 0.0066    0.0609 / 0.0201 / 5,373 / 0.0121
  GRCN      0.0319 / 0.0164 / 21,490 / 0.2358   0.0496 / 0.0209 / 22,503 / 0.2459   0.0858 / 0.0281 / 22,954 / 0.2631
  LATTICE   0.0502 / 0.0275 / 13,463 / 0.0134   0.0744 / 0.0336 / 17,538 / 0.0207   0.1186 / 0.0425 / 21,458 / 0.0385
  Even if LATTICE's iCov is the second-worst across all the datasets, the metric is always close to the best models in terms of diversity.
  Takeaway: LATTICE stands out for its accuracy performance… 😃 …but amplifies popularity bias 🥲
  • 17. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 17 Recommendation accuracy, diversity, and popularity bias (RQ1)
  (Table 2 from the previous slide, repeated.)
  Takeaway: MMGCN struggles with diversity… 🤒 ...exhibits strong popularity bias… 😱 …and sacrifices accuracy in certain scenarios ☠
  • 18. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 18 Recommendation accuracy, diversity, and popularity bias (RQ1)
  (Table 2 from the previous slides, repeated.)
  VBPR and GRCN confirm their ability, already observed for diversity, to also contain popularity bias.
  Takeaway: VBPR and GRCN better manage all the metrics by finding the right compromise among them 😎
  • 19. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 19 Modalities influence on recommendation performance (RQ2): single-metric setting
  [Figure 2: Percentage variation on the (a) Recall, (b) iCov, and (c) APLT when training the multimodal recommender systems with either visual or textual modalities. The 0% line stands for the reference performance provided by the multimodal version of each model. All results refer to the top@20 recommendation lists; panels cover the Office, Toys, and Clothing datasets for VBPR, MMGCN, GRCN, and LATTICE.]
  Accuracy (Figure 2a): the trends are consistent. The visual modality reduces Recall, while the textual modality increases it (with the only exception of VBPR, whose percentage variation is negligible).
  Diversity (Figure 2b): the variation is quasi-stable. On Office, each modality's contribution is generally irrelevant except for MMGCN, where the visual modality slightly improves coverage across the recommendation list while the textual one worsens it by a large margin. On Toys, both modalities decrease coverage when injected separately; MMGCN is again the model most affected, with coverage deteriorating widely under either modality. On Clothing, both modalities lower item coverage, the visual one in particular.
  Popularity bias (Figure 2c): on Office, both modalities tend to enforce popularity bias when injected singularly, with the only exception of LATTICE, whose textual modality limits it (the APLT increases). On Toys, every single modality enforces the popularity bias of MMGCN and GRCN.
  Takeaway (single-metric setting): The textual modality improves the accuracy… 💪 …while both modalities negatively affect the diversity and reinforce the popularity bias 😭
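The percentage variations plotted in Figure 2 can be reproduced with a one-line helper: each single-modality run is compared against the multimodal configuration of the same model, which defines the 0% reference. The numbers in the usage line are placeholders, not values from the paper.

```python
# Sketch of the percentage-variation computation behind Figure 2: a metric from
# a single-modality run is compared to the multimodal configuration of the same
# model, which defines the 0% reference line. Example values are illustrative.

def pct_variation(single_modality_value, multimodal_value):
    """Relative change (in %) of a metric w.r.t. the multimodal reference."""
    return 100.0 * (single_modality_value - multimodal_value) / multimodal_value

# e.g., Recall@20 of a model trained with textual features only vs. its
# multimodal counterpart (hypothetical numbers):
print(f"{pct_variation(0.0540, 0.0500):+.1f}%")  # prints "+8.0%"
```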
  • 20. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 20 Modalities influence on recommendation performance (RQ2): pair-wise metric setting
  [Figure 3: Performance analysis on Clothing when comparing (a) Recall vs. APLT, (b) Recall vs. iCov, and (c) iCov vs. APLT for different modality settings involving the multimodal, visual, and textual modalities. Metrics are on top@20, for VBPR, MMGCN, GRCN, and LATTICE.]
  Takeaway (pair-wise metric setting): The textual modality has a significant influence on accuracy… 😣 but minimal effects on diversity and popularity bias 😇
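A pair-wise view like Figure 3 can be sketched as a scatter plot with one point per (model, modality setting); the coordinates below are placeholders meant only to show the plotting pattern, not the reported results.

```python
import matplotlib.pyplot as plt

# Sketch of a pair-wise metric view in the spirit of Figure 3a (Recall vs. APLT):
# one point per (model, modality setting). All numbers are placeholders.

points = {
    ("VBPR", "multimodal"): (0.050, 0.090),
    ("VBPR", "visual"):     (0.048, 0.095),
    ("VBPR", "textual"):    (0.053, 0.088),
}

for (model, setting), (recall, aplt) in points.items():
    plt.scatter(recall, aplt, label=f"{model} ({setting})")

plt.xlabel("Recall@20")
plt.ylabel("APLT@20")
plt.legend()
plt.show()
```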
  • 21. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 21 Modalities influence on recommendation performance (RQ2): pair-wise metric setting (cont'd)
  (Figure 3 from the previous slide, repeated.)
  Takeaway: The visual modality reduces accuracy… 😨 …and jointly worsens the popularity bias and diversity 😵
  • 23. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) 23 Conclusion
  ● Analysis of the influence of multimodality on popularity bias
  ● Four state-of-the-art multimodal recommendation approaches on three datasets
  ● Three evaluation dimensions and three modality settings
  ● [RQ1] VBPR and GRCN strike a better compromise among all metrics
  ● [RQ2, single metric] Separate injection of modalities improves accuracy but negatively impacts diversity and popularity bias
  ● [RQ2, metric pairs, textual] Strong impact on accuracy but little effect on diversity and popularity bias
  ● [RQ2, metric pairs, visual] Reduces accuracy while exacerbating popularity bias and limiting diversity
  Future work
  ● More complete study of the performance of these models
  ● Assessing the performance of more recent multimodal approaches [Malitesta et al. (2023a)]
  • 24. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) Reach out to us! 24 The authors: • Daniele Malitesta (daniele.malitesta@poliba.it) • Giandomenico Cornacchia (giandomenico.cornacchia@poliba.it) • Claudio Pomo (claudio.pomo@poliba.it) • Tommaso Di Noia (tommaso.dinoia@poliba.it)
  • 25. On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis The 1st International Workshop on Deep Multimodal Learning for Information Retrieval (Ottawa, November 02, 2023) Don’t forget to check out our theoretical/experimental survey 25