
On A Quest for Combating
Filter Bubbles and
Misinformation
Laks V.S. Lakshmanan
University of British Columbia
Vancouver, BC, Canada
SIGMOD 2022, Philadelphia, PA
6/15/22

Prolegomenon
• What this talk is not about and will not do for you.
• Classify different kinds of “fake news”: e.g., mis/disinformation ...
• Computational Fact Checking or Claim Verification
• Offer a comprehensive solution to the filter bubble/echo chambers
or “fake news” problems.
• The scope of both stretches beyond just tech (e.g., models and algorithms).
• We won’t completely solve even the “tech-restricted” versions today (in this talk).

Prolegomenon
• Instead, we will examine some (necessarily restricted)
models and formulations of problems.
• Offer a view of how research done in some different
contexts may inspire techniques for solving restricted
versions of the filter bubbles / echo chambers and the
misinformation problems.
• In case I missed your work, …

Not long ago, or maybe long ago …

And then came …
but arguably also these …
Which led to many great things

• Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions

Filter Bubbles and Echo Chambers exacerbate polarization
["Political polarization 1994-2017." Pew Research Center, Washington, DC, October 2017].


Political Echo Chambers
● Members of densely connected groups are
likely to have the same opinions and
attitudes.
● The study focuses on opposing political echo chambers (~250K each) on Twitter in Japan.
● Political echo chambers have denser, more core-periphery information-spreading structures than most other communities.
[Asatani et al. Dense and influential core promotion of daily viral information spread in political echo chambers. Scientific
Reports 2021].

The Price of Filter Bubbles
• Filter bubbles and echo chambers can impede natural
opinion formation
[Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW 2018].
• Can lead to one-sided policy decisions
[Perrone and Wieder. Pro-painkiller echo chamber shaped policy amid drug epidemic. The Center for
Public Integrity, 2016].
• And erosion of societal trust
[Nguyen. Echo chambers and epistemic bubbles. Episteme, 2020].

• Filter Bubbles and Echo Chambers
• Misinformation

Misinformation is Not a New Problem

Economic Cost of Misinformation

Economic Impact of Misinformation
FAKE NEWS: ELECTIONS
THE U.S. TO SPEND $200 MILLION ALONE ADVANCING FAKE NEWS
$400 MILLION SPENT GLOBALLY ON FAKE POLITICAL NEWS
COVID-19 Vaccine Misinformation and Disinformation Costs an Estimated $50 to $300 Million Each Day
[Bruns, Hosangadi, Trotochaud, and Sell. Johns Hopkins Center for Health Security. 2021].
[U. of Baltimore and CHEQ. The economic cost of bad actors on the internet. Fake News 2019].

Misinformation Propagation
● The connections between misinformation spreaders are denser than
connections between fact-checkers.
● Increasing the value of k takes us from the periphery to the denser inner core.
k-Core decomposition of the pre-Election retweet network. Orange = fact-checks and purple = claims.
[Shao, Hui, Wang et al. Anatomy of an online misinformation network. PLoS ONE 2018].

Misinformation Propagation + Bubbles
● Echo chambers with misinformed sub-communities are much denser than those with informed sub-communities.
[Memon and Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset. CEUR Workshop 2020].
Panels: (a) Retweet, (b) Mention, (c) Reply, (d) Retweet+Mention+Reply.

• Misinformation
• Detecting Densest Subgraphs – Undirected

Densest Subgraphs: Undirected
• What is a good notion of density?
• Classical: edge density ρ(G) = |E|/|V| (i.e., half the average degree).
• Average #motifs/vertex: ρ(G, Ψ) = μ(G, Ψ)/|V|, where μ(G, Ψ) = #instances of Ψ (motif) in G; ρ_opt denotes the optimal density.
• E.g., Δ-density.
• More generally, Ψ-density for a pattern Ψ (e.g., an h-clique).
• Intuition: densest subgraphs may indicate echo chambers.
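As a small illustration of the two density notions, the Python sketch below computes ρ(G) = |E|/|V| and the triangle density ρ(G, Δ) = μ(G, Δ)/|V| for an undirected graph given as vertex and edge lists. The function names and the brute-force triangle count are our own illustrative choices, not part of any cited system. The cubic enumeration is only meant for tiny examples.

```python
from itertools import combinations


def edge_density(vertices, edges):
    """Classical density rho(G) = |E| / |V| (half the average degree)."""
    return len(edges) / len(vertices)


def triangle_count(vertices, edges):
    """mu(G, Delta): number of triangles, by brute force over vertex triples."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(
        1
        for u, v, w in combinations(vertices, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def triangle_density(vertices, edges):
    """Delta-density rho(G, Delta) = mu(G, Delta) / |V|."""
    return triangle_count(vertices, edges) / len(vertices)


# K4 on {1, 2, 3, 4} plus a pendant vertex 5.
V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(edge_density(V, E), triangle_density(V, E))  # 1.4 0.8
```

Dropping the pendant vertex leaves the K4 with 4 triangles on 4 vertices, i.e., triangle density 1. The densest subgraph can therefore be strictly denser than the whole graph.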

Different notions of density.
-densest subgraph.
density = 11/7.
-densest subgraph.
density = 2/4.
• Clique-density.
• Pattern-density.

k-cores and k-clique-cores
Figure: the same example graph decomposed into k-cores and into (k, 4)-cores, with core levels 0 to 3 marked.
(k, Ψ)-core of G: the maximal subgraph in which each vertex participates in ≥ k instances of Ψ.
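Here is a minimal sketch of the (k, Ψ)-core definition for Ψ = Δ (triangles). It repeatedly deletes any vertex that participates in fewer than k triangles among the surviving vertices, until nothing changes. The helper name `k_triangle_core` is hypothetical, and the fixpoint loop is chosen for clarity rather than speed.

```python
from itertools import combinations


def k_triangle_core(vertices, edges, k):
    """(k, Delta)-core: maximal vertex set in which every vertex lies in >= k
    triangles of the induced subgraph (empty set if no such vertex exists)."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(vertices)

    def tri_deg(v):
        # Triangles through v whose other two corners are still alive.
        nbrs = adj[v] & alive
        return sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

    changed = True
    while changed:  # delete violators until a fixpoint is reached
        changed = False
        for v in list(alive):
            if tri_deg(v) < k:
                alive.remove(v)
                changed = True
    return alive


V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(sorted(k_triangle_core(V, E, 3)))  # [1, 2, 3, 4]
```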

Densest Subgraph Discovery
Problem: Given a graph G(V, E) and an h-clique Ψ "#, %# ,
find the subgraph D with the highest h-clique density
& ', Ψ .
Ψ can be any pattern: e.g., a 3-star, Δ, etc.
Focus of this talk: h-cliques.
24

SOTA¹: Densest Subgraph Discovery: Exact
• Binary search to guess the density.
• Construct the flow network based on the guessed density and the original graph.
• Use a max-flow algorithm to check feasibility.
• Example: l = 0, r = 1 (max. triangle degree, i.e., the largest #instances of Ψ containing a single vertex).
• α = (l + r)/2 = 0.5.
• Run time: O(n · C(d−1, h−1) + n|Λ| + min(n, |Λ|)²).
¹As of 2017.
[Mitzenmacher, Pachocki, Peng, Tsourakakis, and Xu. Scalable large near-clique detection in large-scale networks via sampling. KDD 2015].

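DS Discovery – A Triangle Example
Figure: flow network on source s, sink t, vertices A, B, C, D, and nodes Ψ1–Ψ4. Arc capacities: s → vertex: 0, 1, 1, 1; vertex → t: 3α each; Ψ-node → vertex: +∞ (eight arcs); vertex → Ψ-node: 1 (three arcs).
Cases shown: α = 0.5 and α = 1/3.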
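The sketch below instantiates the binary-search-plus-max-flow framework for triangles (h = 3), using the flow network of the triangle example. The arcs are s → v with capacity equal to v's triangle degree and v → t with capacity 3α. In addition, there is an arc of capacity +∞ from each (h − 1)-clique (here, each edge) to its endpoints, and an arc of capacity 1 from v to each edge that forms a triangle with v. If the min-cut source side contains any vertex, a subgraph with triangle density > α exists.

The Edmonds-Karp max-flow routine, the node labels, and the stopping gap 1/(n(n − 1)) are our assumptions for a self-contained toy. The cited algorithms use faster max-flow solvers.

```python
from collections import defaultdict, deque

INF = float("inf")


def _min_cut_source_side(cap, s, t):
    """Edmonds-Karp max-flow; returns the nodes reachable from s in the final
    residual graph, i.e. the source side of a minimum s-t cut."""
    flow = defaultdict(float)  # flow[(u, v)], kept skew-symmetric

    def reachable():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if v not in parent and c - flow[(u, v)] > 1e-9:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if t not in parent:
            return set(parent)
        path, v = [], t
        while parent[v] is not None:  # walk the BFS tree back from t to s
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] - flow[(u, w)] for u, w in path)
        for u, w in path:
            flow[(u, w)] += bottleneck
            flow[(w, u)] -= bottleneck


def triangle_densest_exact(vertices, edges):
    """Binary search on alpha; each probe is one min-cut on the h = 3 network."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    lam = [frozenset(e) for e in edges]  # (h-1)-cliques = edges when h = 3
    tri_deg = {v: 0 for v in vertices}
    for e in lam:
        a, b = tuple(e)
        for v in adj[a] & adj[b]:  # each triangle at v is seen once, via its opposite edge
            tri_deg[v] += 1
    n = max(len(vertices), 2)
    lo, hi = 0.0, float(max(tri_deg.values(), default=0))
    best = set(vertices)
    while hi - lo >= 1.0 / (n * (n - 1)):  # distinct densities differ by >= 1/(n(n-1))
        alpha = (lo + hi) / 2
        cap = defaultdict(dict)

        def add(u, v, c):
            cap[u][v] = cap[u].get(v, 0.0) + c
            cap[v].setdefault(u, 0.0)  # reverse arc for the residual graph

        for v in vertices:
            add("s", ("v", v), float(tri_deg[v]))
            add(("v", v), "t", 3 * alpha)
        for e in lam:
            a, b = tuple(e)
            add(("e", e), ("v", a), INF)
            add(("e", e), ("v", b), INF)
            for v in adj[a] & adj[b]:
                add(("v", v), ("e", e), 1.0)
        side = _min_cut_source_side(cap, "s", "t")
        found = {x[1] for x in side if isinstance(x, tuple) and x[0] == "v"}
        if found:  # some subgraph has triangle density > alpha
            lo, best = alpha, found
        else:
            hi = alpha
    return best


V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(sorted(triangle_densest_exact(V, E)))  # [1, 2, 3, 4]
```

On K4 plus a pendant vertex, the search isolates the K4 (triangle density 1). Every probe rebuilds the whole network from G; that is exactly the redundancy the core-based optimizations on the next slides target.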

SOTA¹: Densest Subgraph Discovery: Approximation
• Approximation algorithm: PeelApp
• Iteratively peel the vertex with the smallest h-clique-degree.
• Let R1, R2, … be the list of residual subgraphs generated.
• Return the Ri with the highest density.
• Approximation:
• The density of the returned subgraph S is at least (1/|V_Ψ|) · ρ_opt = (1/h) · ρ_opt.
• Running time: O(n · C(d−1, h−1)).
[Tsourakakis. The k-clique densest subgraph problem. WWW 2015].
¹As of 2017.
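PeelApp can be sketched for triangles as follows. Keep the current triangle degree of every surviving vertex, peel a vertex of minimum triangle degree, and update the degrees of its neighbors. Throughout, remember the residual subgraph with the highest triangle density. This is our illustrative Python rendering: it uses linear scans instead of a bucket queue, so it does not match the stated running time. By the guarantee above, the returned subgraph has density at least ρ_opt/3.

```python
from itertools import combinations


def peel_app(vertices, edges):
    """Greedy peeling for triangle density; returns (best vertex set, its density)."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    # Current triangle degree of every vertex in the residual graph.
    deg = {v: sum(1 for a, b in combinations(adj[v], 2) if b in adj[a]) for v in vertices}
    mu = sum(deg.values()) // 3  # each triangle is counted at its 3 corners
    alive = set(vertices)
    best, best_density = set(alive), mu / len(alive)
    while len(alive) > 1:
        v = min(alive, key=deg.get)  # vertex with the smallest triangle degree
        for a, b in combinations(adj[v] & alive, 2):
            if b in adj[a]:  # triangle (v, a, b) disappears with v
                deg[a] -= 1
                deg[b] -= 1
        mu -= deg[v]
        alive.remove(v)
        density = mu / len(alive)
        if density > best_density:
            best, best_density = set(alive), density
    return best, best_density


V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
best, best_density = peel_app(V, E)
print(sorted(best), best_density)  # [1, 2, 3, 4] 1.0
```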

DSD: SOTA Limitations
• Initial bounds on ρ_opt are not tight.
• The flow network can be large: e.g., a large G with many instances of Ψ.
• The flow network is built from the original G each time.
• Even PeelApp does redundant work.
(k, Ψ)-cores to the rescue!
Can we “bound” the densest subgraph?

Bounding Densest Subgraphs with Cores
• Theorem: Let G, k, Ψ be as before and let H be a (k, Ψ)-core of G. Then:
k/|V_Ψ| ≤ ρ(H, Ψ) ≤ k_max.
Special case: the (k_max, Ψ)-core has density in [k_max/h, k_max].
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
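The lower bound is a double-counting argument. Write deg_H(v, Ψ) (our notation) for the number of instances of Ψ in H that contain v. Every instance has |V_Ψ| vertices, so it is counted |V_Ψ| times in the sum of these degrees, while the core property makes every degree at least k:

```latex
\sum_{v \in V_H} \deg_H(v,\Psi) \;=\; |V_\Psi| \cdot \mu(H,\Psi)
\quad\text{and}\quad \deg_H(v,\Psi) \ge k \;\; \forall v \in V_H
\;\;\Longrightarrow\;\;
\rho(H,\Psi) \;=\; \frac{\mu(H,\Psi)}{|V_H|} \;\ge\; \frac{k\,|V_H|}{|V_\Psi|\,|V_H|} \;=\; \frac{k}{|V_\Psi|}.
```

For h-cliques, |V_Ψ| = h, which gives the special case k_max/h.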

Bounding DSG with cores: An Example
For edge density (h = 2), k_max = 2 and a 2-core: LB = k_max/h = 1 and UB = k_max = 2.
Example 2-cores attain ρ = 1, and ρ = 5/4, 9/6, 13/8, ⋯ → 2.

Bounding Densest Subgraphs with Cores
• Lemma: The DSG of G must be contained in its (⌈ρ_opt⌉, Ψ)-core.
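The lemma follows from a one-vertex deletion argument. The sketch uses our own notation: D is a densest subgraph with n_D ≥ 2 vertices, and deg_D(v, Ψ) is the number of instances of Ψ in D containing v. If some vertex had fewer than ρ_opt instances, deleting it would raise the density:

```latex
\rho(D \setminus v,\Psi)
= \frac{\mu(D,\Psi) - \deg_D(v,\Psi)}{n_D - 1}
> \frac{\rho_{opt}\, n_D - \rho_{opt}}{n_D - 1}
= \rho_{opt}
\qquad \text{if } \deg_D(v,\Psi) < \rho_{opt}.
```

This contradicts the optimality of D. So every vertex of D lies in at least ρ_opt instances of Ψ, and hence in at least ⌈ρ_opt⌉ of them because the counts are integers. D is therefore inside the (⌈ρ_opt⌉, Ψ)-core, and in particular ⌈ρ_opt⌉ ≤ k_max.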

Exact algorithm: CoreExact
• Our algorithm: CoreExact.
• Follows the same framework as the existing exact algorithm:
• Binary search to guess the density.
• Construct the flow network based on the guessed density and the original graph.
• Use a max-flow algorithm to check feasibility.
• Adds three core-based optimization techniques:
1. Tighter bounds derived from cores: [k_max/|V_Ψ|, k_max].
2. Build the flow network on cores.
3. Locate the clique-densest subgraph in even smaller cores after each check.

Approximation Algorithms
• IncApp:
• Do a (k, Ψ)-core decomposition of G in O(n · C(d−1, h−1)) time.
• Return the (k_max, Ψ)-core.
• This gives a 1/|V_Ψ| = 1/h-approximation.
• Repeatedly computing clique-degrees can be expensive for large cliques.
• CoreApp: a heuristic to directly find the (k_max, Ψ)-core.
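IncApp can be illustrated with the same peeling loop. For each vertex, record the largest minimum triangle degree seen up to its removal, which is its (k, Δ)-core number. Then return k_max and the vertices whose core number equals k_max. The function name `inc_app` and the list-scan implementation are assumptions for a small example, not the authors' code.

```python
from itertools import combinations


def inc_app(vertices, edges):
    """(k, Delta)-core decomposition by peeling; returns (k_max, (k_max, Delta)-core)."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    deg = {v: sum(1 for a, b in combinations(adj[v], 2) if b in adj[a]) for v in vertices}
    alive, core, k = set(vertices), {}, 0
    while alive:
        v = min(alive, key=deg.get)  # peel a vertex of minimum triangle degree
        k = max(k, deg[v])  # running maximum = core number of v
        core[v] = k
        for a, b in combinations(adj[v] & alive, 2):
            if b in adj[a]:
                deg[a] -= 1
                deg[b] -= 1
        alive.remove(v)
    k_max = max(core.values(), default=0)
    return k_max, {v for v, c in core.items() if c == k_max}


V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
k_max, H = inc_app(V, E)
print(k_max, sorted(H))  # 3 [1, 2, 3, 4]
```

The returned core is guaranteed triangle density ≥ k_max/3 by the bound above. Here it is the K4, with density 1.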

Approximation Algorithms
CoreApp:
1. Sort the vertices of G in decreasing order of their h-clique-based core number, estimated with a cheaper proxy.
2. Obtain the max core and its core number k from the top-t vertices.
3. If the max degree of the remaining vertices is larger than k, set t = 2 × t and repeat step 2; otherwise, output the max core.
Same worst-case time complexity as IncApp and PeelApp (SOTA), but much faster in practice.

Sample Experiment Results
As-Caida (n = 26K, m = 106K). Friendster (n = 20M, m = 106M).

Mini Case Study: Covid-19
• Covid-19 retweets: 1,025,937 retweets involving 660,730 users ⇒ (660,730 nodes, 835,193 edges).
• Largest connected component: (399,962 nodes, 663,506 edges).
Courtesy: Thirumuruganathan, QCRI.

Densest subgraph: 86 vertices, 18-core, density: 12.5407.
Top-2 densest subgraph: 1134 vertices, 13-core, density: 10.0150.
Cross edges: 296.
Topics: side effects of vaccine; modes of transmission of virus; case counts in different states and countries.

Mini Case Study II: Voter Fraud 2020
Tweets on US Presidential Election 2020.
Number of nodes: 1,385,225
Number of edges: 6,631,720
Number of tweets: 8,085,323
Size of the largest connected component:
Number of nodes: 1,356,657
Number of edges: 6,611,465

1962 vertices, 91-core, density: 83.7665.
2206 vertices, 54-core, density: 50.9231.
Cross edges: 1385.

Repeated allegations of voter fraud. Retweeting Sidney Powell’s tweet warning states against certifying the election. Quoting Trump: “dirty rolls ==> dirty polls”. Big tech is colluding with Dems to defeat Trump. Vote in person to fight against mail-in voter fraud. FBI said many military mail-in votes, all for Trump, were thrown away in a ditch in PA. Biggest voter fraud in American history. Voting machines known to be insecure. Need proof of citizenship and photo ID to prevent fraud.
Fact-checkers from AP, Politifact, & Reuters confirm: no evidence of widespread election fraud. Experts confirm elections are secure; most of the interference comes from misinformation campaigns. GOP and Trump team are sowing disinfo. and panic. Need to protect democracy. Trump’s narrower margin wins in 2016 vs Biden’s wider ones in 2020. Debunk “Deborah Jean Christiansen’s vote is fraud” by quoting her. More former Trump aides getting infected than voter fraud cases!
Quotes of Sidney Powell’s tweet; replies that there is no evidence of widespread fraud; Biden brags about having “the most extensive and inclusive VOTER FRAUD organization in the history of American politics”; (CNN) dishonesty taxonomy of Trump rally; Philly Mayor hiding info. from people. Anyone caught cheating with Voter Fraud games should be federally charged; State officials from both parties stated the election went well. Losing side refusing to recognize clear winner; weaving conspiracy theories and strangling faith and belief.

Mini Case Study III: Nepal Earthquake
• Graph constructed from cascades of tweets collected following the Nepal earthquake, April 2015.
• 265,383 nodes.
• 3,898,972 edges.
• Largest connected component:
• 258,756 nodes.
• 3,771,999 edges.
https://zenodo.org/record/2587475

1463 vertices, 129-core, density: 105.328.
370 vertices, 115-core, density: 71.9378.
129 edges.
Topics: requests for help; info on earthquake (magnitude, distance of affected cities from the capital); reports on damage and ruin.

Recent Progress on DSGs
• WWW 2020: near-optimal solutions via multiple peeling passes; a (1 + ε)-approximation within O(Δ(G)/ρ* · 1/ε²) iterations, as proved in [SODA 2022].
• STOC 2020: a (1 + ε)-approximation on dynamic graphs, with O(log⁴ n · ε⁻⁶) time per edge insertion/deletion.
• WWW 2020: defines and finds the minimal DSG (minimal: no proper subgraph is a DSG).
• SODA 2022: a flow-based (1 + ε)-approximation algorithm with Õ(m/ε) time.
[Boob, Gao, Peng et al. Flowless: Extracting densest subgraphs without flow computations. WWW 2020].
[Sawlani and Wang. Near-optimal fully dynamic densest subgraph. STOC 2020].
[Chang and Qiao. Deconstruct Densest Subgraphs. WWW 2020].
[Chekuri, Quanrud, and Torres. Densest Subgraph: Supermodularity, Iterative Peeling, and Flow. SODA 2022].

• Misinformation
• Detecting Densest Subgraphs – Directed

Directed Densest Subgraphs
Figure: an example digraph on vertices a–e with the densest pair (S*, T*) marked.
A directed densest subgraph (DDS) of a digraph is a pair of vertex sets (S, T). Its density is
ρ(S, T) = |E(S, T)| / √(|S| · |T|),
where E(S, T) is the set of edges from S to T. This generalizes edge density from undirected graphs.
Problem: Find S*, T* with max. ρ.
[Kannan and Vinay. Analyzing the structure of large graphs. Tech Report 1999].
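To make the DDS objective concrete, the sketch below evaluates ρ(S, T) on a digraph given as an arc list. It finds the densest pair by exhaustive search over all non-empty S and T. The names are hypothetical. The (2ⁿ − 1)² enumeration only serves as a reference answer on tiny inputs, which is why the flow-based and core-based methods that follow are needed.

```python
from itertools import combinations
from math import sqrt


def dds_density(S, T, arcs):
    """rho(S, T) = |E(S, T)| / sqrt(|S| * |T|), E(S, T) = arcs from S into T."""
    e_st = sum(1 for u, v in arcs if u in S and v in T)
    return e_st / sqrt(len(S) * len(T))


def dds_brute_force(vertices, arcs):
    """Exhaustive search over all non-empty (S, T): a reference for tiny digraphs."""
    vs = list(vertices)
    subsets = [set(c) for r in range(1, len(vs) + 1) for c in combinations(vs, r)]
    S, T = max(
        ((S, T) for S in subsets for T in subsets),
        key=lambda pair: dds_density(pair[0], pair[1], arcs),
    )
    return S, T, dds_density(S, T, arcs)


arcs = [(1, 3), (1, 4), (2, 3), (2, 4), (3, 5)]
S, T, rho = dds_brute_force([1, 2, 3, 4, 5], arcs)
print(sorted(S), sorted(T), rho)  # [1, 2] [3, 4] 2.0
```

Note that S and T may overlap, as in the definition. Here the best pair happens to be disjoint.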

SOTA¹ DDS Discovery: Exact
• Repeatedly solve max-flow, similarly to the undirected case:
• For each value of c = |S|/|T|, where 0 < |S|, |T| ≤ n:
• Find the max density by binary search.
• Build the flow network and solve max-flow.
• Overall time: O(n² · t_flow).
• > 2 days on ~1,200 vertices and ~2,600 edges.
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
¹As of 2019.

SOTA DDS Discovery: Approximation
Greedy Peeling Algorithm:
• Build a bipartite graph (L, R, E) where L = R = V.
• The edges all go from the L copy to the R copy.
• Each time, remove a node with the least degree.
• Report the densest subgraph among those obtained.
Figure: example digraph G on vertices a–e.
O(n + m) time.
Approximation?
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].

SOTA DDS Discovery: Approximation
• Fix [personal communication with authors].
• 2-approximation algorithm.
• O(n(n + m)) time.
Figure: counterexample digraph (annotations: 18 vertices, 36 vertices).
KS-Approx density: 2.75; ground-truth density: 6.
Approximation ratio: 6/2.75 ≈ 2.18.
Enlarge the graph: one a node and growing numbers of b and c nodes; the ground-truth density, the KS-Approx density, and their ratio then become functions of the graph size.
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].

Densest Directed Subgraph: An Exact Algorithm
• (x, y)-core: an (S, T)-induced subgraph in which:
• Every node in S has outdegree ≥ x (into T).
• Every node in T has indegree ≥ y (from S).
• S and T are not necessarily disjoint.
• H = ({a,b}, {c,d}) is a (2, 2)-core.
Figure: example digraph on vertices a–e.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020].
[Ma et al. On Densest Subgraph Discovery. TODS 2021].

Densest Directed Subgraph: Core-Exact
Theorem: The DDS of G is contained in the (ρ*/(2√a), √a · ρ*/2)-core.
• a = |S*|/|T*|: unknown; search through all |S|/|T|, 0 < |S|, |T| ≤ n.
• ρ*: unknown; start with good bounds and use binary search.
• E.g., lower bound = density of any 2-approx. solution, and upper bound = 2 × lower bound.
• Still O(n² · t_flow), but much faster in practice: smaller flow graphs.

Densest Directed Subgraph: DC-Exact
• Uses a “divide and conquer” approach.
• For a given ratio |S|/|T|, the result of the binary search for the “best” (S, T) pair gives enough information about subranges of ratios that can be skipped.
• Algorithm DC-Exact: O(k · t_flow), e.g., …
• k ≪ n² in practice.

Densest Directed Subgraph: Core-Approx
• G[S, T]: an (x, y)-core of G. Then ρ(S, T) ≥ √(xy).
• Let [x*, y*] be the max core-number pair, i.e., it maximizes x·y among all (x, y)-cores.
• ρ* ≤ 2√(x* y*).
• ⇒ The (x*, y*)-core is a 2-approx. solution to DDS.
• Naïve implementation: for each x, compute all (x, y)-cores, 0 < y < n, and return the (x*, y*)-core ⇒ O(n(n + m)) time.
• Can we do better?

Figure: candidate core-number pairs [x*, y*] on an x–y grid.
Main idea:
• for each x ≤ √m, search for the largest y;
• for each y ≤ √m, search for the largest x;
• O(√m · (n + m)) time.
Max equal pair: (√m, √m).
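The √m restriction is safe. A non-empty (x, y)-core has |E(S, T)| ≥ x|S|, and |S| ≥ y because every node of T has y in-neighbors in S. Hence xy ≤ m and min(x, y) ≤ √m. The sketch below uses this to search only x ≤ √m (for the largest y) and y ≤ √m (for the largest x), keeping the pair with the largest product.

It is a simplified illustration: it finds each largest value by binary search over freshly recomputed cores, which is slower than the incremental procedure the slide refers to. The helper names are our own.

```python
from math import isqrt


def xy_core(vertices, arcs, x, y):
    """Largest (S, T) with out-degree into T >= x on S and in-degree from S >= y on T."""
    out_nb = {v: set() for v in vertices}
    in_nb = {v: set() for v in vertices}
    for u, v in arcs:
        out_nb[u].add(v)
        in_nb[v].add(u)
    S, T = set(vertices), set(vertices)
    changed = True
    while changed:
        changed = False
        for u in list(S):
            if len(out_nb[u] & T) < x:
                S.remove(u)
                changed = True
        for v in list(T):
            if len(in_nb[v] & S) < y:
                T.remove(v)
                changed = True
    return S, T


def core_approx(vertices, arcs):
    """2-approximate DDS: the (x, y)-core maximizing x*y, searching only
    x <= sqrt(m) or y <= sqrt(m) (valid because x*y <= m for non-empty cores)."""
    n = len(vertices)

    def largest(fixed, vary_y):
        # Cores shrink as the varied threshold grows, so binary search applies.
        lo, hi, found = 1, n, None
        while lo <= hi:
            mid = (lo + hi) // 2
            x, y = (fixed, mid) if vary_y else (mid, fixed)
            S, T = xy_core(vertices, arcs, x, y)
            if S and T:
                found, lo = (x, y, S, T), mid + 1
            else:
                hi = mid - 1
        return found

    best = (0, 0, set(), set())
    for c in range(1, isqrt(len(arcs)) + 1):
        for cand in (largest(c, True), largest(c, False)):
            if cand and cand[0] * cand[1] > best[0] * best[1]:
                best = cand
    return best


V = ["a", "b", "c", "d", "e"]
arcs = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "e")]
x, y, S, T = core_approx(V, arcs)
print(x, y, sorted(S), sorted(T))  # 2 2 ['a', 'b'] ['c', 'd']
```

Here ρ(S, T) = 4/√4 = 2 ≥ √(xy) = 2, matching the lower bound on the previous slide.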

Sample Experiment Results: Exact Algorithms
Up to 6 orders of magnitude faster.
Datasets (#vertices, #edges):
MO: (~200, ~2.6K)
TC: (~1.2K, ~2.7K)
OF: (~3K, ~30K)
AD: (~6.4K, ~57K)
AM: (~400K, ~3.4M)

Sample Experiment Results: Approx Algorithms
Up to 6 orders of magnitude faster.
Datasets (#vertices, #edges):
MO: (~200, ~2.6K)
TC: (~1.2K, ~2.7K)
OF: (~3K, ~30K)
AD: (~6.4K, ~57K)
AM: (~400K, ~3.4M)
AR: (~3.4M, ~5.8M)
BA: (~2.1M, ~17.8M)
TW: (~52.6M, ~1.96B)
[Bahmani, Kumar, Vassilvitskii. Densest Subgraph in Streaming and MapReduce. VLDB 2012].

Better Approximation Ratio?
• Proposes a new LP formulation for the DDS problem.
• A divide-and-conquer algorithmic framework.
• An efficient (1 + ε)-approximation algorithm, for any real ε > 0.
• An efficient exact algorithm.
• Up to 3 orders of magnitude faster than the state-of-the-art exact and approximation algorithms.
[Ma, Fang, Cheng, L., and Han. A Convex-Programming Approach for Efficient Directed Densest Subgraph Discovery. SIGMOD 2022].
For more details, go to Chenhao’s talk Wednesday at 2 pm in Rm 202B.

Recent Progress on DDS
• A concurrent work from SODA 2022:
• Gives a (1 + ε)-approximation in Õ(m/ε) time via network flow for undirected graphs.
• Can also be extended to directed graphs, with extra time cost.
• It would be interesting to compare the two algorithms empirically.
[Chekuri, Quanrud, and Torres. “Densest Subgraph: Supermodularity, Iterative Peeling, and Flow.” SODA 2022].

Mini Case Study: Covid-19
• Covid-19 retweets: 1,025,937 retweets involving 660,730 users ⇒ (660,730 nodes, 835,193 edges).
• Largest connected component: (399,962 nodes, 663,506 edges).

Directed Densest Subgraph from Covid-19
Source nodes = 777
Target nodes = 15
Common nodes = 2
(5, 70)-core.
Density: 55.8826
777 nodes “influenced” by 15 “initiators”.
Topics: vaccine side effects; modes of transmission.

Mini Case Study II: Nepal Earthquake
• Graph constructed from cascades of tweets collected following the Nepal earthquake, April 2015.
• 265,383 nodes.
• 3,898,972 edges.
• Largest connected component:
• 258,756 nodes.
• 3,771,999 edges.
https://zenodo.org/record/2587475

Directed Densest Subgraph from Nepal
Source nodes: 122,637
Target nodes: 25,233
Common nodes: 20,713
(1, 51)-core
Density: 34.309
Tens of thousands of “initiators” and more than a hundred thousand “influenced”.
Topics: info on damage and requests for help.

• Misinformation
• Combating via Mitigation: A Refresher on Influence Maximization

Propagation/Diffusion Models
• How does influence/information
travel in networks?
• Example Phenomena: infection,
product adoption, information,
opinion, rumor, etc.
• Stochastic diffusion models –
discrete/continuous time.
• How can we launch campaigns
to optimize design objectives?
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
[W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan-Claypool 2013].

Influence Maximization
• Core optimization problem in IM: Given a diffusion model M, a network G = (V, E), model parameters (e.g., edge propagation probabilities), and problem parameters (e.g., budget), find a seed set S ⊆ V under budget that maximizes σ_M(S), the expected number of adopters given initial adopters S (spread).
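Because σ_M(S) is an expectation, it is usually estimated by simulation. The Monte Carlo sketch below assumes the Independent Cascade model, one of the standard models in Kempe et al.: each newly activated node u gets one chance to activate each out-neighbor v with probability p_uv. The graph format (u → list of (v, p_uv)), the function name, and the fixed random seed are our assumptions.

```python
import random


def ic_spread(graph, seeds, num_sims=10000, seed=0):
    """Monte Carlo estimate of sigma(S) under Independent Cascade.
    graph maps u -> list of (v, p_uv)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_sims):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:  # each newly active node tries each out-edge once
            nxt = []
            for u in frontier:
                for v, p in graph.get(u, ()):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / num_sims


chain = {1: [(2, 1.0)], 2: [(3, 1.0)]}
print(ic_spread(chain, {1}))  # 3.0
```

With probabilities strictly between 0 and 1 the estimate is random, so many simulations are needed per candidate seed set. That cost is what makes the greedy baseline slow.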

Complexity of IM
• Theorem: The IM problem is NP-hard for several major diffusion models
under both discrete time and continuous time.
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].

Complexity of Spread Computation
• Theorem: It is #P-hard to compute the expected spread of a node set
under major diffusion models (cf. counting simple paths in a digraph).
[Chen, Wang, and Yang. Efficient influence maximization in social networks. KDD 2009].
[Chen, Yuan, and Zhang. Scalable influence maximization in social networks under the linear threshold model.
ICDM 2010].
[W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan-Claypool 2013].

Properties of Spread Function
• $\sigma(S)$ is monotone: $S \subseteq S' \implies \sigma(S) \le \sigma(S')$.

Properties of Spread Function
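• $\sigma(S)$ is submodular: for all $S \subseteq S' \subseteq V$ and $x \in V \setminus S'$, $\sigma(x \mid S') \le \sigma(x \mid S)$, where $\sigma(x \mid S) := \sigma(S \cup \{x\}) - \sigma(S)$ is the marginal gain of $x$ w.r.t. $S$.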

Approximation of Submodular Function Maximization
• Theorem: Let $f : 2^V \to \mathbb{R}_{\ge 0}$ be a monotone submodular function with $f(\emptyset) = 0$. Let $S^{Grd}$ and $S^*$ (OPT) resp. be the greedy and optimal solutions subject to $|S| \le k$. Then
$$f(S^{Grd}) \ge \left(1 - \frac{1}{e}\right) f(S^*).$$
[Nemhauser, Wolsey, and Fisher. An analysis of approximations for maximizing submodular set functions. Math. Prog. 1978].
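Below is a minimal sketch of the greedy algorithm the theorem refers to, under a cardinality constraint $|S| \le k$. Max coverage stands in for a monotone submodular $f$; the function names and data are hypothetical. At each step the algorithm adds the element with the largest marginal gain $f(x \mid S)$.

```python
def greedy_max(f, ground_set, k):
    """Plain greedy for max f(S) s.t. |S| <= k (f assumed monotone submodular)."""
    chosen = []
    for _ in range(k):
        base = f(chosen)
        best, best_gain = None, float("-inf")
        for x in ground_set:
            if x in chosen:
                continue
            gain = f(chosen + [x]) - base  # marginal gain f(x | S)
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:
            break  # ground set exhausted
        chosen.append(best)
    return chosen


def coverage(sets):
    """Coverage function of a dict of sets: a classic monotone submodular f."""
    return lambda S: len(set().union(*(sets[x] for x in S)))
```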

Approximation of Submodular Function Maximization

• Theorem: The spread function $\sigma(\cdot)$ is monotone and submodular under various major diffusion models, for both discrete and continuous time.

Baseline Approximation Algorithm
• Monte Carlo simulations for estimating expected spread.
• Lazy Forward (CELF) optimization to skip unnecessary marginal-gain re-evaluations.
→ Greedy is still extremely slow on large networks.
[Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, and Glance. Cost-effective outbreak detection in networks. KDD 2007].
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
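The Lazy Forward idea can be sketched as a CELF-style lazy greedy over a generic set function f. This is an illustrative version, not the cited implementation.

By submodularity, a marginal gain computed for an earlier (smaller) S is an upper bound on the current one. So only the top of a max-heap needs re-evaluation.

In the IM setting, f would be a Monte Carlo spread estimate. Here a deterministic coverage function keeps the example checkable.

```python
import heapq


def lazy_greedy(f, ground_set, k):
    """CELF-style lazy greedy for monotone submodular f under |S| <= k.

    A stale gain is an upper bound on the current gain, so a heap entry whose
    gain is fresh w.r.t. the current S and sits on top is the true argmax.
    """
    chosen, current = [], f([])
    # heap entries: (-gain, tie-breaker, element, |S| when the gain was computed)
    heap = [(-(f([x]) - current), i, x, 0) for i, x in enumerate(ground_set)]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, i, x, computed_at = heapq.heappop(heap)
        if computed_at == len(chosen):
            chosen.append(x)
            current += -neg_gain
        else:
            # re-evaluate only this element and push it back
            gain = f(chosen + [x]) - current
            heapq.heappush(heap, (-gain, i, x, len(chosen)))
    return chosen
```

On submodular inputs this returns the same picks as plain greedy, up to tie-breaking, while evaluating f far less often.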

Reverse Influence Sampling
• A series of algorithms that guarantee a $(1 - \frac{1}{e} - \epsilon)$-approximation to the optimal expected spread.
• Key idea: use random reverse reachable sets (rr-sets) to gauge the quality of (candidate) seeds.
[Borgs, Brautbar, Chayes, and Lucier. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].

Reverse Reachable Sets (RR-Sets)
[Figure: example graph on nodes A–E with edge propagation probabilities between 0.2 and 0.6.]

• An rr-set is a subgraph sample of $G$!
• Generation of rr-sets under the IC model:
  – start from a random node (here A): RR-set = {A};
  – sample the incoming edges of the node(s) just added, each with its propagation probability;
  – add the sampled in-neighbors and repeat; in the example, RR-set = {A, C, B, E}.
• Intuition: an rr-set is a sample set of nodes that can influence node A.
[Borgs, Brautbar, Chayes, and Lucier. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
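The generation steps above can be sketched in Python. The sketch assumes the graph is given as `in_edges`, a hypothetical dict mapping each node v to the (u, p) pairs of its incoming edges u → v. Each incoming edge of a newly added node is kept independently with probability p. The result is therefore the set of nodes that reach the root in one random live-edge sample.

```python
import random
from collections import deque


def random_rr_set(in_edges, nodes, rng, root=None):
    """Generate one reverse-reachable (rr) set under the IC model.

    in_edges: dict mapping node v -> list of (u, p) for each edge u -> v.
    The root is drawn uniformly from `nodes` unless given explicitly.
    """
    if root is None:
        root = rng.choice(nodes)
    rr = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, p in in_edges.get(v, []):
            # keep edge u -> v with probability p (reverse BFS over live edges)
            if u not in rr and rng.random() < p:
                rr.add(u)
                queue.append(u)
    return rr
```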

Influence Estimation with RR-Sets
• Theorem: For a seed set $S$, $\Pr[S \text{ overlaps a random rr-set}] = \frac{1}{n} \times$ expected spread of $S$, where $n = |V|$.
• Family of approx. algorithms: TIM, IMM, Stop-and-Stare, …
[Tang et al. Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency. SIGMOD 2014].
[Tang et al. Influence Maximization in Near-Linear Time: A Martingale Approach. SIGMOD 2015].
[Chen et al. An issue in the Martingale Analysis of the Influence Maximization Algorithm IMM. arXiv 2018].
[Nguyen et al. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks. SIGMOD 2016] → arXiv.
[Huang, Wang, Bevilacqua, Xiao, and L. Revisiting the Stop-and-Stare Algorithms for Influence Maximization. PVLDB 2017].
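Given a collection of rr-sets, the theorem yields an estimator: $\sigma(S) \approx n \cdot$ (fraction of rr-sets that $S$ overlaps). Seed selection then becomes greedy maximum coverage over the rr-sets.

The sketch below shows both steps under that reading; the function names are hypothetical. It does not model how many rr-sets to sample, which is exactly where TIM, IMM, and Stop-and-Stare differ.

```python
def estimate_spread_rr(rr_sets, n, seeds):
    """sigma(S) ~= n * (fraction of rr-sets that S overlaps)."""
    seeds = set(seeds)
    hit = sum(1 for rr in rr_sets if rr & seeds)
    return n * hit / len(rr_sets)


def select_seeds_rr(rr_sets, k):
    """Greedy max coverage over rr-sets: pick the node in most uncovered rr-sets."""
    covered = [False] * len(rr_sets)
    seeds = []
    for _ in range(k):
        counts = {}
        for idx, rr in enumerate(rr_sets):
            if not covered[idx]:
                for v in rr:
                    counts[v] = counts.get(v, 0) + 1
        if not counts:
            break  # all rr-sets already covered
        best = max(counts, key=counts.get)
        seeds.append(best)
        for idx, rr in enumerate(rr_sets):
            if best in rr:
                covered[idx] = True
    return seeds
```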

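What if the objective is not submodular?
• Maximize a non-decreasing, non-submodular function $f$ with greedy:
$$f(S^{Grd}) \ge \frac{1}{\alpha}\left(1 - e^{-\alpha\gamma}\right) \mathrm{OPT},$$
where $\gamma$ is the submodularity ratio and $\alpha$ the generalized curvature of $f$.
[Bian, Buhmann, Krause, and Tschiatschek. Guarantees … Applications. PMLR 2017].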

What if the objective is not submodular?
Sandwich approximation to the rescue!
[Lu, Chen, and L. From Competition to Complementarity: … Maximization. PVLDB 2016].
• $f$ – monotone but not submodular.
• $\mu$, $\nu$ – monotone and submodular, with $\mu$ (resp. $\nu$) a lower (resp. upper) bound of $f$.
• Let $S_\nu$ ($S_\mu$, $S_f$) be the Greedy solution to $\max_{S \subseteq V, |S| \le k} \nu(S)$ (resp. …), and let $S_{sand} \in \{S_\nu, S_\mu, S_f\}$ be the best w.r.t. $f(\cdot)$.
Then

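$$f(S_{sand}) \ge \max\left\{\frac{f(S_\nu)}{\nu(S_\nu)}, \frac{\mu(S^*)}{f(S^*)}\right\} \cdot \left(1 - \frac{1}{e}\right) \cdot f(S^*),$$
where $S^*$ is the optimal solution (OPT).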
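Below is a toy illustration of the sandwich approach; every name and value in it is hypothetical.

- Items cover elements, and picking both "a" and "b" adds a synergy bonus. So `f` is monotone but not submodular.
- `mu` drops the bonus and `nu` pays it once either item is present. This gives monotone submodular bounds $\mu \le f \le \nu$.

The sketch runs greedy on all three functions and keeps the best result under `f`. It also computes the data-dependent factor $f(S_\nu)/\nu(S_\nu)$. The other term needs $S^*$, which is available here only because the toy instance is brute-forced.

```python
import math
from itertools import combinations

# Hypothetical toy instance: items cover elements; choosing both "a" and "b"
# earns a synergy BONUS, which makes f non-submodular.
COVER = {"a": {1}, "b": {2, 5}, "c": {3, 4, 6}}
BONUS = 3
GROUND = ["a", "b", "c"]


def cov(S):
    return len(set().union(*(COVER[x] for x in S)))


def f(S):   # monotone, NOT submodular
    return cov(S) + (BONUS if {"a", "b"} <= set(S) else 0)


def mu(S):  # monotone submodular lower bound: ignore the bonus
    return cov(S)


def nu(S):  # monotone submodular upper bound: pay the bonus if "a" or "b" is present
    return cov(S) + (BONUS if {"a", "b"} & set(S) else 0)


def greedy(g, k):
    S = []
    for _ in range(k):
        S.append(max((y for y in GROUND if y not in S), key=lambda y: g(S + [y]) - g(S)))
    return S


def sandwich(k):
    candidates = [greedy(mu, k), greedy(f, k), greedy(nu, k)]
    s_nu = candidates[2]
    best = max(candidates, key=f)   # best of the three w.r.t. f
    factor = f(s_nu) / nu(s_nu)     # computable part of the data-dependent ratio
    return best, factor


k = 2
best, factor = sandwich(k)
opt = max(f(list(S)) for S in combinations(GROUND, k))  # brute force: toy only
print(best, f(best), opt, factor * (1 - 1 / math.e) * opt)
```

In this instance the sandwich solution has f = 5, while OPT = 6. That is comfortably above the guaranteed 0.625 · (1 − 1/e) · 6 ≈ 2.37.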

•Mitigating Filter Bubbles

Filter Bubbles, Echo Chambers, and Polarization
• Selective exposure to viewpoints/issues can engender/worsen
polarization.
[Pariser. The filter bubble: What the Internet is hiding from you. Penguin, 2011].
[Bakshy, Messing, and Adamic. Exposure to ideologically diverse news and opinion on Facebook. Science 2015].
• Aggravated by echo chambers in social media.
[Garrett. Echo chambers online?: Politically motivated selective exposure among internet news users. JCMC 2009].
[Akoglu. Quantifying political polarity based on bipartite opinion networks. ICWSM 2014].
[Amelkin, Singh, and Bogdanov. A Distance Measure for the Analysis of Polar Opinion Dynamics in Social Networks. TKDD 2019].
[Chen, Lijffijt, and De Bie. Quantifying and Minimizing Risk of Conflict in Social Media. KDD 2018].
[Garimella, de Morales, Gionis, and Mathioudakis. Quantifying Controversy over Social Media. TOCS 2018].

Balancing Exposure by Connections
• Link Recommendation
[Amelkin and Singh. Fighting opinion control in social networks via link recommendation. KDD 2019].
[Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW 2018].
[Zhu, Bao, and Zhang. Minimizing Polarization and Disagreement in Social Networks via Link Recommendation. NeurIPS 2021].

Interdisciplinary Approach
• A comprehensive solution goes beyond CS, e.g., the Polarization Lab: https://www.polarizationlab.com
• Interdisciplinary (CS, stats, sociology) approach.
• Real-life experiment recruiting Democrat and Republican volunteers, incentivized to follow bots tweeting posts initially aligned with their ideology but gradually shifting to the other side of the aisle.
• Complemented with offline tracking and study.
[Bail. Breaking the Social Media Prism. Princeton Univ. Press. 2021].

Balancing via Information Campaigns
• Smart Algorithm Bursts Social Networks’ “Filter Bubbles”
• “Instead of building echo chambers, Facebook, Twitter and
company can tweak their code to broaden exposure to wider
ranges of views.”
• “… results suggest that targeting a strategic group of social
media users and feeding them the right content is more
effective for propagating diverse views through a social media
network …”
[IEEE Spectrum Jan 2021. Featuring research of Aslay, Matakos, Galbrun, and Gionis. TKDE 2020].

Balancing via Information Campaigns
• Information Campaign Approach
[Garimella, Gionis, Parotsidis, and Tatti. Balancing information exposure in social networks. NeurIPS 2018].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020].
[Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020].
• Common assumptions:
• awareness = adoption.
• Adoption of opposing views is independent.

Opinions can have complex interactions
Adopted and propagated independently?!
The Liberals claim that … they can
cut Canada’s greenhouse gas
emissions by 40 to 45% below
2005 levels by 2030. They passed
a climate plan, C-12, to set legally
binding emissions targets to reach
net-zero emissions in 2050.
New Democrats supported the
Liberals’ net-zero legislation and
have set an emissions reduction
target of 50 per cent below 2005
levels by 2030.
The Conservatives opposed the
Liberals’ net-zero emissions
legislation and say their climate
plan will meet Paris climate
commitments of 30 per cent below
2005 levels by 2030.
The People’s Party platform argues
that there is “no scientific
consensus” that human activity is
driving climate change and has said
warnings of looming
environmental catastrophe are
exaggerated.
Source: https://newsinteractives.cbc.ca/elections/federal/2021/party-platforms/#section-climate-change

Pure competition.

Partial competition.

Complementation/reinforcement.

•A User Utility Perspective
A useful digression.

Awareness vs adoption
Higher utility!!
Awareness spreads like an epidemic, but adoption depends on UTILITY.
[Kalish. A new product adoption model with price advertising and uncertainty, Management Science 1985].

Complementary (aka Reinforcing) Campaigns

Welfare Maximization: complementary setting
• Problem: Given a social network $G = (V, E)$, a propagation model, an item utility model, and a budget vector, find an allocation of seed nodes to items that maximizes the expected social welfare, i.e., the expected sum of utilities of itemsets adopted by users.
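The sketch below makes "sum of utilities of adopted itemsets" concrete in a deliberately simplified, deterministic form. Each user adopts the utility-maximizing bundle among the items they are aware of. Bundle utilities come from a hypothetical lookup table.

The cited utility-driven model is richer; for example, it includes random noise in utilities and the diffusion process itself. This sketch only shows how complementary items, with U(AB) > U(A) + U(B), raise welfare within one sampled awareness outcome. Expected welfare would average over many such samples.

```python
from itertools import combinations


def best_bundle(aware, utility):
    """Return the utility-maximizing subset of the items a user is aware of.

    utility: dict mapping frozenset(bundle) -> utility (hypothetical values);
    bundles missing from the dict are treated as unavailable. The empty bundle
    has utility 0, so a user adopts nothing if every bundle is negative.
    """
    best, best_u = frozenset(), 0.0
    items = sorted(aware)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            b = frozenset(combo)
            u = utility.get(b)
            if u is not None and u > best_u:
                best, best_u = b, u
    return best, best_u


def social_welfare(awareness, utility):
    """Sum of utilities of the bundles adopted by all users (one sampled outcome)."""
    return sum(best_bundle(aware, utility)[1] for aware in awareness.values())
```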

What does the theory say?
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].

A simple greedy still works
GREEDY ALGORITHM
• Does not require specific utility parameters as input
• $(1 - \frac{1}{e})$-approximation

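Prefix-preserving seed selection: PRIMA
• For each budget $b_i$ in the budget vector, the first $b_i$ selected seeds achieve a $(1 - \frac{1}{e} - \epsilon)$-approximation for budget $b_i$; $b_{max} = \max_i b_i$ seeds are selected in total.
• Select enough samples corresponding to every budget of the budget vector.
○ Challenge: The number of samples required is not monotone in the budget.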
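Prefix preservation is natural for greedy. A single greedy max-coverage run over one fixed rr-set sample produces an ordered seed list. For every b ≤ b_max, the first b seeds of that list are exactly the greedy choice for budget b.

The sketch below (hypothetical names) shows this property. It deliberately ignores the challenge noted above, namely choosing enough samples so that every budget in the vector gets its guarantee.

```python
def greedy_order(rr_sets, b_max):
    """Greedy max-coverage ordering of seeds over a fixed rr-set sample.

    Greedy never revisits earlier picks, so for every b <= b_max the first b
    seeds are exactly the greedy solution for budget b on this sample.
    """
    uncovered = set(range(len(rr_sets)))
    order = []
    for _ in range(b_max):
        counts = {}
        for i in uncovered:
            for v in rr_sets[i]:
                counts[v] = counts.get(v, 0) + 1
        if not counts:
            break  # every rr-set is already covered
        best = max(counts, key=counts.get)
        order.append(best)
        uncovered = {i for i in uncovered if best not in rr_sets[i]}
    return order


# Hypothetical usage: one ordering serves a whole budget vector.
rr_sets = [{"a"}, {"a"}, {"a", "b"}, {"c"}, {"c"}, {"d"}]
budgets = [1, 3]
order = greedy_order(rr_sets, max(budgets))
print({b: order[:b] for b in budgets})  # {1: ['a'], 3: ['a', 'c', 'd']}
```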

Competing Campaigns

Welfare Maximization: competing setting
• Problem: Given a social network $G = (V, E)$, a propagation model, an item utility model, a budget vector, and a fixed (partial) allocation of seed nodes to items, find an allocation of seed nodes to items that maximizes the expected social welfare, i.e., the expected sum of utilities of itemsets adopted by users.

How hard is (the) competition?
[Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].

[Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
General case algorithm: SeqGRD
• Sort the items based on their utilities, $U_1 > U_2 > \cdots > U_m$, with seed sets $S_1, S_2, \ldots, S_m$ selected in that order (via PRIMA+).
• Instance-dependent approximation, with a ratio in terms of $\frac{U_{min}}{U_{max}}$ and $(1 - \frac{1}{e})$, where $U_{max}$ = max expected utility of any bundle and $U_{min}$ = expected min utility of any item.

•A Network Host Utility Perspective

Filter bubble problem
[Figure: a network in which most users say “YAY!” and one user says “NAY!”.]
• Items (opinions) are complementary objective-wise.
• Items (opinions) are competing propagation-wise.
[Garrett. Echo chambers online?: Politically motivated selective exposure among Internet news users. JCMC 2009].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020].

Problem: Key Ingredients
• Competition parameter
  – After being influenced, adopt the second item w.p. $q$, $0 \le q < 1$.
• (Host’s) reward of adoption is supermodular, which models complementarity:
  – $r$ for the first item;
  – $r + \Delta$ for the second item, $r < \Delta$.
• Expected (host) utility for a user adopting both: $r + q\Delta$.
• Goal: maximize the sum of utilities under a competition-driven diffusion.
[Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].

Filter bubble mitigation
• There is an existing bubble.
• A more general setting.
Problem FB Mitigation (FBM): Given graph $G = (V, E, p)$, competition parameter $q$, $0 < q < 1$, fixed A seeds $S_A$, and budget $k$, find B seeds $S_B$ such that $|S_B| \le k$ and the expected welfare is maximized.

Inherent Challenges – Strike One
• FBM is neither monotone nor submodular.
• Restricted (sequential) setting: propagation of follower
doesn’t start before that of leader ends. FBM in the
sequential setting is monotone and submodular! J
• But wait! FBM can be arbitrarily worse than FBM$%& and
vice versa! L

Another Attempt
[Figure: item A, item B, and First Level Competition (FLC) nodes.]
• Expected reward at each FLC node = $r + q\Delta$.
• Surrogate objective: expected # FLC nodes × $(r + q\Delta)$.
• Clearly a lower bound for FBM.
• But the FLC objective is neither monotone nor submodular.

Algorithm 1 – SpreadGRD
• Greedily selects B seeds that maximize the marginal spread.
• Ignores the welfare objective.
• PRIMA+ is used to do the seed selection.
• Given fixed $S_A$, PRIMA selects $S_B$ such that
$$\sigma(S_B \cup S_A) \ge \left(1 - \frac{1}{e} - \epsilon\right) \sigma(S_B^* \cup S_A).$$

Analyzing SpreadGRD
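• For any seed set $S$, the welfare function $W$ satisfies $r\,\sigma(S) \le W(S) \le (r + q\Delta)\,\sigma(S)$.
• SpreadGRD therefore has the following bound:
$$W(S_A \cup S_B) \ge r \cdot \sigma(S_A \cup S_B) \ge r \cdot \left(1 - \frac{1}{e} - \epsilon\right) \sigma(S_A \cup S_B^*) \ge \frac{r}{q\Delta + r}\left(1 - \frac{1}{e} - \epsilon\right) W(S_A \cup S_B^*).$$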
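The last inequality can be evaluated numerically. The small helper below (hypothetical name) computes the guarantee factor $\frac{r}{r + q\Delta}\left(1 - \frac{1}{e} - \epsilon\right)$. It shows how the factor shrinks as the complementarity bonus $q\Delta$ grows relative to the base reward $r$.

```python
import math


def spreadgrd_factor(r, q, delta, eps=0.0):
    """Guarantee factor from r*sigma(S) <= W(S) <= (r + q*delta)*sigma(S):
    W(S_A u S_B) >= r / (r + q*delta) * (1 - 1/e - eps) * W(S_A u S_B*).
    """
    if not (r > 0 and 0 <= q < 1 and delta >= 0):
        raise ValueError("expects r > 0, 0 <= q < 1, delta >= 0")
    return r / (r + q * delta) * (1 - 1 / math.e - eps)
```

For example, r = 1, q = 0.5, Δ = 2 gives 0.5 · (1 − 1/e) ≈ 0.316, while q = 0 recovers 1 − 1/e.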

Algorithm 2 – Sandwich
• Upper bound: assume a tattler diffusion model, where a node influences its neighbors with every item in its awareness set.
  – $W_U(S) \ge W(S)$.
  – $W_U(\cdot)$ is monotone and submodular.
• Lower bound: assume a diffusion model with $q = 0$.
  – $W_L(S) \le W(S)$.
  – $W_L(\cdot)$ is monotone and submodular.
• Using sandwich: let $S_{sand} = \arg\max_{S' \in \{S_U, S_W, S_L\}} W(S')$, where $S_U$, $S_W$, $S_L$ are the greedy solutions w.r.t. $W_U$, $W$, $W_L$. Then
$$W(S_{sand}) \ge \max\left\{\frac{W(S_U)}{W_U(S_U)}, \frac{W_L(S^*)}{W(S^*)}\right\} \left(1 - \frac{1}{e}\right) W(S^*).$$

Algorithm 3 – NetRewGRD
[Figure: item A, item B, and First Level Competition nodes.]
• Extends state-of-the-art sampling to the welfare objective.
• Reverse reachable trees.
• Recursive weight update using a linear pass.
• Scales to large networks.
[Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].

Experiments
• Baselines considered:
• COEX: Maximizes co-adoptions of both items
• TDEM: Maximizes welfare based on leaning scores
[Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the diversity of exposure in a social network. TKDE 2020].

Sample of Results – Quality

Sample of Results – Running Time

•Mitigating Misinformation

Misinformation Mitigation – Prior Art
• Influence Blocking
• Temporal aspects ignored or not differentiated
• Focus on scalability
[Budak, Agrawal, and El Abbadi. Limiting the spread of misinformation in social networks. WWW 2011].
[He, Song, Chen, and Jiang. Influence blocking maximization in social networks under the competitive linear threshold model. SDM 2012].
[Song, Hsu, and Lee. Temporal influence blocking: Minimizing the effect of misinformation in social networks. ICDE 2017].
[Tong, Wu, Guo, et al. An efficient randomized algorithm for rumor blocking in online social networks. IEEE TNSE 2017].
[Tong, Du, and Wu. On misinformation containment in online social networks. NeurIPS 2018].
[Simpson, Srinivasan, and Thomo. Reverse Prevention Sampling for Misinformation Mitigation in Social Networks. ICDT 2020].

Temporal Aspects of Propagation
[Vosoughi, Roy, and Aral. The spread of true and false news online. Science 2018]
Together, these have important consequences for effective seed set selection.
[Mitchell, Stocking, and Matsa. Long-form reading shows signs of life in our mobile news world. Pew Research Center 2016]
Misinformation spreads faster, farther, and wider than truth! Adoption decisions take varying lengths of time.

Temporal Aspects of Propagation
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
• Associate meeting probabilities with each edge.
• User reaction times are sampled from a data-driven distribution.
[Figure: propagation example at t = 0, t = 2, t = 3, and t = 6.]
Adoption decisions of $u_1, u_2, u_3, u_4, u_5$ are uncontested.
$u_6$ faces a tie, broken with a random permutation, e.g., $(u_5, u_1)$.
For $u_6$: F → 3; DW: [3, 6]; M → 4; tie!
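The sketch below encodes one reading of the example, under stated assumptions:

- A node's decision window opens when the first campaign reaches it and lasts its sampled reaction time.
- Every campaign arriving within the window competes.
- A random permutation breaks the tie.

With F arriving at t = 3, a reaction time of 3, and M arriving at t = 4, both fall in [3, 6], hence the tie. The function `decide` is hypothetical. The cited paper's exact semantics may differ, including how meeting probabilities gate arrivals.

```python
import random


def decide(arrivals, reaction_time, rng):
    """Which campaign a node adopts, under a decision-window reading of the example.

    arrivals: dict campaign -> time it first reaches the node, e.g. {"F": 3, "M": 4}.
    The window [first arrival, first arrival + reaction_time] is closed; all
    campaigns arriving inside it compete, and a random permutation breaks ties.
    """
    if not arrivals:
        return None
    start = min(arrivals.values())
    window = [c for c, t in arrivals.items() if t <= start + reaction_time]
    if len(window) == 1:
        return window[0]       # uncontested adoption
    order = sorted(window)     # deterministic base order before shuffling
    rng.shuffle(order)         # random permutation breaks the tie
    return order[0]
```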

Misinformation Mitigation Problem
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Reward function $\rho(\cdot)$ measures the effectiveness of mitigation.
P1: Given fake seeds $S_F$ and reward function $\rho(\cdot)$, find a seed set that maximizes the expected reward.
P1 is not submodular!
[Figure annotations: “Truth reaches well before misinfo.” vs. “Truth arrives too late!”]

Sandwiching the Mitigation Objective
Observe: Supermodular behavior arises due to the joint effect of mitigation seeds, i.e., acting alone they would not achieve the same reward.
LB: Maximum reward over singleton seed sets from $S_M$ (tight).
Example: $S_M = \{m_1, m_2\}$, LB $= \max_{v \in \{m_1, m_2\}} \rho(S_F, \{v\})$.

Sandwiching the Mitigation Objective
Simple UB candidate: drop meeting events and enforce dominant tie-breaking.
Tighter UB: remove meeting events on edges that can be traversed by both sides.
Example: $S_M = \{m_1, m_2\}$.

Importance Sampling
Observe: only nodes reached by the misinformation are eligible for reward.
Idea: only sample roots from nodes that the misinformation campaign reaches → tighter bounds!
RDR sets: weighted analog of RR sets for reward probabilities.
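The minimal sketch below shows why restricting roots helps, simplified to a fixed eligible set. Suppose reward is zero outside the eligible nodes. Then sampling roots uniformly from the eligible nodes and scaling by their count is still unbiased for the total reward. Its variance is far lower than sampling from all n nodes.

In the actual setting the eligible set is random, since it depends on the misinformation cascade. The reward values and function names here are hypothetical.

```python
import random


def uniform_estimate(reward, nodes, samples, rng):
    """Baseline: roots drawn uniformly from all n nodes, each draw scaled by n."""
    n = len(nodes)
    return sum(n * reward(rng.choice(nodes)) for _ in range(samples)) / samples


def restricted_estimate(reward, eligible, samples, rng):
    """Importance sampling: roots drawn only from eligible nodes, scaled by |eligible|.

    Unbiased for sum(reward(v) for all v) whenever reward(v) == 0 outside
    `eligible`, since the skipped nodes contribute nothing.
    """
    m = len(eligible)
    return sum(m * reward(rng.choice(eligible)) for _ in range(samples)) / samples


# Hypothetical instance: 1000 nodes, only 10 reached by misinformation,
# each with reward probability 0.5, so the true total is 5.0.
rng = random.Random(0)
nodes = list(range(1000))
eligible = list(range(10))
reward = lambda v: 0.5 if v < 10 else 0.0
print(restricted_estimate(reward, eligible, 1000, rng))  # 5.0, no variance here
print(uniform_estimate(reward, nodes, 1000, rng))        # noisy around 5.0
```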

Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Two settings for selecting misinformation seeds: (1) from the top-k influential users, modeling a small number of popular instigators, and (2) uniformly at random, modeling several bots or newly created puppet accounts.

Experiments
Reward distribution dominated by uncontested mitigation adoption

Experiments
Mitigation seeds remain effective under simultaneous perturbation of model parameters.

•Misinformation Intervention

Intervention Challenges
Detectors are fallible
Hard vs Soft intervention

Misinformation Intervention – Prior Art
• Disadvantaging posts with misleading info, deleting edges, removing nodes, … → too hard?
• No correction for wrong intervention!
[Farajtabar et al. Fake news mitigation via point process based intervention. ICML 2017].
[Tong et al. Gelling, and melting, large graphs by edge manipulation. CIKM 2012].
[Khalil, Dilkina, and Song. Scalable diffusion-aware optimization of network topology. KDD 2014].
[Chen et al. Node immunization on large graphs: Theory and algorithms. TKDE 2015].
[Medya, Silva, and Singh. Approximate Algorithms for Data-driven Influence Limitation. TKDE 2020].
[Caraban et al. 23 ways to nudge: A review of technology-mediated nudging in human-computer interaction. SIGCHI 2019].
[Caraban, Konstantinou, and Karapanos. The Nudge Deck: A design support tool for technology-mediated nudging. ACM Designing Interactive Systems Conference 2020].
[Bhuiyan et al. NudgeCred: Supporting News Credibility Assessment on Social Media Through Nudges. CSCW2 2021].

Cost Aware Intervention
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]

Reward Function
• $R_i^{int}$: reach of item $I_i$ after intervention.
• $R_i^{no\text{-}int}$: reach of item $I_i$ with no intervention.

[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Cost Aware Intervention

Experiments
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
NCB-TS: Neural Contextual Bandits w/ Thompson Sampling
CB-TS: Contextual Bandits w/ Thompson Sampling
RB: (Learned) Rule based
CSC: Cost Sensitive Classification
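The bandit framing behind CB-TS can be illustrated with a tabular Beta-Bernoulli Thompson sampling sketch:

- Contexts are discrete buckets, such as a hypothetical binned detector score.
- The two actions are "do not intervene" (0) and "intervene" (1).
- Rewards are binary.

This is a simplified, hypothetical stand-in, not the cited system, which uses richer contexts and the reward function defined earlier. The simulated environment and its probabilities are likewise made up for illustration.

```python
import random


class TabularThompson:
    """Beta-Bernoulli Thompson sampling per (context, action) pair.

    Hypothetical simplification of a contextual bandit: contexts are discrete,
    actions are 0 = do not intervene and 1 = intervene, rewards are binary.
    """

    def __init__(self, actions=(0, 1), seed=0):
        self.actions = actions
        self.rng = random.Random(seed)
        self.alpha = {}  # (context, action) -> 1 + observed successes
        self.beta = {}   # (context, action) -> 1 + observed failures

    def choose(self, context):
        # sample a plausible success rate per action, then act greedily on the samples
        def draw(a):
            key = (context, a)
            return self.rng.betavariate(self.alpha.get(key, 1), self.beta.get(key, 1))
        return max(self.actions, key=draw)

    def update(self, context, action, reward):
        key = (context, action)
        if reward:
            self.alpha[key] = self.alpha.get(key, 1) + 1
        else:
            self.beta[key] = self.beta.get(key, 1) + 1


# Hypothetical environment: intervening mostly pays off for high-risk posts.
env = random.Random(1)
P = {("high", 1): 0.8, ("high", 0): 0.3, ("low", 1): 0.2, ("low", 0): 0.7}
agent = TabularThompson(seed=2)
for t in range(4000):
    ctx = "high" if t % 2 else "low"
    a = agent.choose(ctx)
    agent.update(ctx, a, env.random() < P[(ctx, a)])
```

After training, the sampler intervenes almost always in the high-risk context and almost never in the low-risk one.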

Experiments
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Real-time evaluation on Twitter’s stream from 10-Oct-2020 to 10-Nov-2020.
• 5 million tweets w/ 1800 distinct English news articles
• Topics include Politics (32%), Healthcare (26%), Entertainment (30%), Misc. (12%)
Manual Evaluation
• Random sample of 750 viral and non-viral
tweets
• 3 volunteers evaluated intervention
• Accuracy of 92.1%
Automated Evaluation
• Google FactCheck Claim Search API
• TiKL: That is a Known Lie
• Accuracy of 96.6%

•Summary & Open Questions

Summary
• Efficient detection of dense subgraphs in undirected and
directed graphs is useful for finding filter bubbles and groups
of actors engaged in spreading misinformation.
• In mitigating filter bubbles via information campaigns,
competition between viewpoints/opinions cannot be ignored.
• In mitigating misinformation, it’s critical to incorporate
temporal aspects.
• In misinformation intervention, it’s important to watch your
step and correct your gait in the face of mistakes.

Open Questions – Detection
• Integrating content analysis in going after the “right”
densest subgraphs.
• Can we detect filter bubbles and groups promoting
misinformation as they form?
• Longitudinal: (how) do these groups transform over time?

Open Questions – Countering
• Multiple campaigns of items involving partial/pure
competition, complementation?
• How can we learn propagation probabilities, competition
parameters, utilities from available propagation traces?
• Go beyond expected outcome? E.g., as filter bubbles or
misinformation spreading occur, can we counter them?

Open Questions
• Case studies reflecting the effect of mitigation campaigns on
filter bubbles and misinformation diffusion.
• Integrating with claim verification and (computational) fact
checking efforts.
• Incentivizing balance of adoption (in case of filter bubbles)
and adoption of truth (in case of misinformation).

Acknowledgments
• Chenhao Ma (HKU)
• Farnoosh Hashemi (UBC)
• Glenn Bevilacqua (UBC → Oracle)
• Michael Simpson (UBC)
• Prithu Banerjee (UBC → Oracle)
• Reynold Cheng (HKU)
• Saravanan Thirumuruganathan (QCRI, HBKU)
• Xiaolin Han (HKU)
• Xuemin Lin (UNSW)
• Wenjie Zhang (UNSW)
• Yixiang Fang (CUHK)
• Wei Chen (MSRA)
• Wei Lu (UBC → LinkedIn)

Thank you! (நன்றி!)
