On A Quest for Combating
Filter Bubbles and
Misinformation
Laks V.S. Lakshmanan
University of British Columbia
Vancouver, BC, Canada
SIGMOD 2022, Philadelphia, PA
6/15/22
Prolegomenon
• What this talk is not about and will not do for you.
• Classify different kinds of “fake news”: e.g., mis/disinformation ...
• Computational Fact Checking or Claim Verification
• Offer a comprehensive solution to the filter bubble/echo chambers
or “fake news” problems.
• The scope of both stretch beyond just tech (e.g., models and
algorithms).
• Even the “tech-restricted” versions we won’t get to completely solve
today (in this talk).
Prolegomenon
• Instead, we will examine some (necessarily restricted)
models and formulations of problems.
• Offer a view of how research done in some different
contexts may inspire techniques for solving restricted
versions of the filter bubbles / echo chambers and the
misinformation problems.
• In case I missed your work, …
Not long ago, or maybe long ago …
And then came …
but arguably also these …
Which led to many great things
•Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
["Political polarization 1994-2017." Pew Research Center., Washington, DC October 2017].
Filter Bubble and Echo Chambers exacerbate polarization
Political Echo Chambers
● Members of densely connected groups are
likely to have the same opinions and
attitudes.
● The study focuses on opposing political echo
chambers (~250K each) on Twitter in Japan.
● Political echo chambers have denser and
more core-periphery information spreading
structures than those of most other
communities.
[Asatani et al. Dense and influential core promotion of daily viral information spread in political echo chambers. Scientific
Reports 2021].
The Price of Filter Bubbles
• Filter bubbles and echo chambers can impede natural
opinion formation
[Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW 2018].
• Can lead to one-sided policy decisions
[Perrone and Wieder. Pro-painkiller echo chamber shaped policy amid drug epidemic. The Center for
Public Integrity, 2016].
• And erosion of societal trust
[Nguyen. Echo chambers and epistemic bubbles. Episteme, 2020].
• Filter Bubbles and Echo Chambers
•Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
Misinformation is Not a New Problem
Economic Cost of Misinformation
Economic Impact of Misinformation
FAKE NEWS: ELECTIONS
THE U.S. TO SPEND $200 MILLION ALONE ADVANCING FAKE
NEWS
$400 MILLION SPENT GLOBALLY ON FAKE POLITICAL NEWS
COVID-19 Vaccine Misinformation and
Disinformation Costs an Estimated $50 to
$300 Million Each Day
[Bruns, Hosangadi, Trotochaud, and Sell. Johns Hopkins
Center for Health Security. 2021].
[U. of Baltimore and CHEQ. The economic
cost of bad actors on the internet. Fake
News 2019].
Misinformation Propagation
● The connections between misinformation spreaders are denser than
connections between fact-checkers.
● Increasing the value of k takes us from the periphery to the denser inner
core structure.
k-Core decomposition of the pre-Election retweet network. Orange = fact-
checks and purple = claims.
[Shao, Hui, Wang et al. Anatomy of an online misinformation network. PLoS ONE 2018].
Misinformation Propagation + Bubbles
● Echo-chambers with misinformed sub-communities are much denser than
those with informed sub-communities.
[Memon and Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset. CEUR Workshop 2020].
[Figure panels: (a) Retweet, (b) Mention, (c) Reply, (d) Retweet+Mention+Reply.]
• Filter Bubbles and Echo Chambers
• Misinformation
•Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
Densest Subgraphs: Undirected
• What is a good notion of density?
• Classical: average degree (up to a factor of 2): $\rho(G) = \frac{m}{n}$.
• Average #motifs/vertex: $\rho(G, \Psi) = \frac{\mu(G, \Psi)}{n}$, where $\mu(G, \Psi)$ is the #instances of Ψ (motif) in G. $\rho_{opt}$: optimal density.
• E.g., Δ-density.
• More generally, Ψ-density for pattern Ψ (e.g., h-clique).
• Intuition: densest subgraphs may indicate echo chambers.
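The two density notions above can be checked on a toy graph. The sketch below is only an illustration: the function names and the `diamond` example graph are assumptions made here, not part of the talk, and the triangle count is a brute-force enumeration suitable for tiny inputs only.

```python
from itertools import combinations

def edge_density(nodes, edges):
    """Number of edges induced by `nodes`, divided by the number of nodes."""
    nodes = set(nodes)
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside / len(nodes)

def triangle_density(nodes, edges):
    """Number of triangles induced by `nodes`, divided by the number of nodes."""
    nodes = set(nodes)
    present = {frozenset(e) for e in edges}
    triangles = sum(
        1
        for a, b, c in combinations(sorted(nodes), 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= present
    )
    return triangles / len(nodes)

# Hypothetical toy graph: a "diamond" (two triangles sharing the edge 2-3).
diamond = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
```

On the diamond, the whole graph has 5 edges and 2 triangles on 4 vertices, so the same vertex set scores differently under the two notions.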
Different notions of density.
[Figure: two example subgraphs, each densest under a different motif (the motif symbols were lost in extraction); their densities are 11/7 and 2/4.]
• Clique-density.
• Pattern-density.
k-cores and k-clique-cores
[Figure: an example graph decomposed into k-cores and into (k, 4)-cores, with regions labelled 0 to 3.]
$(k, \Psi)$-core of G: the maximal subgraph in which each vertex participates in ≥ $k$ instances of Ψ.
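The core definition can be made concrete for the triangle motif. This is a minimal sketch under stated assumptions: `triangle_core` is a hypothetical helper name, the motif is fixed to a triangle, and triangle degrees are recomputed naively rather than maintained incrementally.

```python
from itertools import combinations

def triangle_core(edges, k):
    """Vertex set of the (k, triangle)-core: repeatedly drop vertices in < k triangles."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    alive = set(adj)

    def tri_degree(v):
        # Triangles through v whose three vertices are all still alive.
        nbrs = adj[v] & alive
        return sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if tri_degree(v) < k:
                alive.discard(v)
                changed = True
    return alive

# Hypothetical example: K4 on {1, 2, 3, 4} plus a triangle {4, 5, 6}.
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (4, 6)]
```

Raising `k` shrinks the result from the whole graph (k = 1) to the K4 (k = 2 or 3) and then to the empty set (k = 4).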
Densest Subgraph Discovery
Problem: Given a graph $G(V, E)$ and an h-clique $\Psi(V_\Psi, E_\Psi)$, find the subgraph $D$ with the highest h-clique density $\rho(D, \Psi)$.
Ψ can be any pattern: e.g., a 3-star, Δ, etc.
Focus of this talk: h-cliques.
SOTA (as of 2017) Densest Subgraph Discovery: Exact
• Binary search to guess the density.
• Construct the flow network
• based on the guessed density and the original graph.
• Use a max-flow algorithm to check feasibility.
• Example: $l = 0$, $r = 1$ (max triangle degree).
• $\alpha = (l + r)/2 = 0.5$.
• Run time: $O\left(n \binom{d-1}{h-1} + n|\Lambda| + \min(n, |\Lambda|)^2\right)$.
[Mitzenmacher, Pachocki, Peng, Tsourakakis, and Xu. Scalable large near-clique detection in large-scale networks via sampling. KDD 2015].
#instances of Ψ.
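The binary-search-plus-max-flow framework can be sketched for the simplest motif, a single edge (h = 2). Note the substitution: the network below is Goldberg's classical construction for edge density, not the h-clique network from the slides, and all names (`max_flow_source_side`, `densest_subgraph_exact`, the sentinel nodes) are assumptions made for this illustration. Exact rational arithmetic avoids floating-point issues; the input is assumed to be a simple graph with at least one edge.

```python
from collections import deque
from fractions import Fraction

SRC, SNK = ("source",), ("sink",)  # sentinel nodes, assumed distinct from vertex names

def max_flow_source_side(cap):
    """Edmonds-Karp on a dict-of-dicts; returns nodes reachable from SRC at the end."""
    for u in list(cap):
        for v in list(cap[u]):
            cap.setdefault(v, {}).setdefault(u, 0)  # make sure reverse arcs exist
    while True:
        parent = {SRC: None}
        queue = deque([SRC])
        while queue and SNK not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if SNK not in parent:
            return set(parent)  # source side of a minimum cut
        path, v = [], SNK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck
            cap[w][u] += bottleneck

def densest_subgraph_exact(edges):
    """Vertex set maximizing (#induced edges / #vertices) via binary search on the density."""
    nodes = sorted({x for e in edges for x in e})
    n, m = len(nodes), len(edges)
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    lo, hi, best = Fraction(0), Fraction(m), set(nodes)
    # Two distinct densities differ by at least 1/(n(n-1)).
    while hi - lo >= Fraction(1, n * (n - 1)):
        guess = (lo + hi) / 2
        cap = {SRC: {v: Fraction(m) for v in nodes}, SNK: {}}
        for v in nodes:
            cap[v] = {SNK: m + 2 * guess - deg[v]}
        for u, v in edges:
            cap[u][v] = Fraction(1)
            cap[v][u] = Fraction(1)
        side = max_flow_source_side(cap) - {SRC}
        if side:          # some subgraph is denser than the guess
            lo, best = guess, side
        else:             # no subgraph beats the guess
            hi = guess
    return best

# Hypothetical example: K4 on {1, 2, 3, 4} with a pendant path 4-5-6.
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
```

Each guess costs one max-flow computation on a freshly built network, which is the redundancy that the core-based optimizations later in the deck aim to reduce.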
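DS Discovery – A Triangle Example
[Figure: a graph on vertices A, B, C, D is turned (⇒) into a flow network with source s, sink t, and motif nodes $\Psi_1, \ldots, \Psi_4$; the arc capacities shown are 0 or 1, $3\alpha$, and $+\infty$. The cases $\alpha = 0.5$ and $\alpha = 1/3$ are contrasted.]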
SOTA (as of 2017) Densest Subgraph Discovery: Approximation
• Approximation algorithm: PeelApp
• Iteratively peel the vertex w/ smallest h-clique-degree.
• Let $S_1, S_2, \ldots$ be the list of residual subgraphs generated.
• Return the $S_i$ with the highest density.
• Approximation:
• The density of the returned $S_i$ is at least $\frac{1}{|V_\Psi|} \cdot \rho_{opt} = \frac{1}{h} \cdot \rho_{opt}$.
• Running time: $O\left(n \cdot \binom{d-1}{h-1}\right)$ time.
[Tsourakakis. The k-clique densest subgraph problem. WWW 2015].
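The peeling idea can be sketched for the triangle motif (h = 3). This is an illustration under assumptions, not the published implementation: `peel_app` is a hypothetical name, triangle degrees are recomputed from scratch at every step (so it is far slower than the stated bound), and the input is assumed to contain at least one edge.

```python
from itertools import combinations

def peel_app(edges):
    """Greedy peeling by triangle degree; returns (best vertex set, its triangle density)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    alive = set(adj)

    def tri_degree(v):
        nbrs = adj[v] & alive
        return sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

    def density():
        # Every triangle is counted once per corner, hence the division by 3.
        return sum(tri_degree(v) for v in alive) / 3 / len(alive)

    best_nodes, best_density = set(alive), density()
    while len(alive) > 1:
        victim = min(alive, key=tri_degree)  # vertex in the fewest triangles
        alive.discard(victim)
        current = density()
        if current > best_density:
            best_nodes, best_density = set(alive), current
    return best_nodes, best_density

# Hypothetical example: K4 on {1, 2, 3, 4} plus a triangle {4, 5, 6}.
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (4, 6)]
```

On this example the peel removes 5 and 6 first and then reports the K4, whose triangle density is 4/4.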
DSD: SOTA Limitations
• Initial bounds on the guessed density are not tight.
• Size of flow network can be large: e.g., large G with many instances of Ψ.
• Flow network is rebuilt from the original G each time.
• Even PeelApp does redundant work.
Can we “bound” the densest subgraph?
$(k, \Psi)$-core to the rescue!
Bounding Densest Subgraphs with Cores
• Theorem: G, k, Ψ as before. Let H be a $(k, \Psi)$-core of G. Then: $\frac{k}{|V_\Psi|} \le \rho(H, \Psi) \le k_{\max}$.
Special case: the $(k_{\max}, \Psi)$-core has density in $\left[\frac{k_{\max}}{h}, k_{\max}\right]$ for an h-clique Ψ ($|V_\Psi| = h$).
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Bounding DSG with cores: An Example
For $k_{\max} = 2$ and a 2-core, LB = 1 and UB = 2.
[Figure: example 2-cores with density $\rho = 1$ and with $\rho = \frac{5}{4}, \frac{9}{6}, \frac{13}{8}, \cdots \to 2$.]
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Bounding Densest Subgraphs with Cores
• Lemma: The DSG of G must be contained in its $(\lceil \rho_{opt} \rceil, \Psi)$-core.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Exact algorithm: CoreExact
• Our algorithm: CoreExact
• Follows the same framework as the existing exact algorithm:
• Binary search to guess the density.
• Construct the flow network
• based on the guessed density and the original graph.
• Use a max-flow algorithm to check feasibility.
• Three core-based optimization techniques:
1. Tighter bounds derived from cores: $\left[\frac{k_{\max}}{|V_\Psi|}, k_{\max}\right]$.
2. Build the flow network on cores.
3. Locate the clique-densest subgraph in even smaller cores after each feasibility check.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Approximation Algorithms
• IncApp:
• Do a $(k, \Psi)$-core decomposition of G: $O\left(n \cdot \binom{d-1}{h-1}\right)$ time.
• Return the $(k_{\max}, \Psi)$-core.
• $\frac{1}{|V_\Psi|} = \frac{1}{h}$-approximation.
• Finding (repeatedly) clique-degree can be expensive for large cliques.
• CoreApp: Heuristic to directly find the $(k_{\max}, \Psi)$-core.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
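For the edge motif (h = 2) the IncApp idea reduces to an ordinary k-core decomposition followed by returning the innermost core. The sketch below assumes that special case; `core_numbers` and `inc_app` are hypothetical names, and the quadratic-time minimum search is chosen for brevity rather than speed.

```python
def core_numbers(edges):
    """Core number of every vertex, by repeatedly removing a minimum-degree vertex."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {v: len(adj[v]) for v in adj}
    alive, core, k = set(adj), {}, 0
    while alive:
        v = min(alive, key=degree.get)
        k = max(k, degree[v])      # core numbers never decrease along the peel order
        core[v] = k
        alive.discard(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
    return core

def inc_app(edges):
    """Return (k_max-core vertex set, k_max, edge density of that core)."""
    core = core_numbers(edges)
    k_max = max(core.values())
    nodes = {v for v, c in core.items() if c == k_max}
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return nodes, k_max, inside / len(nodes)

# Hypothetical example: K4 on {1, 2, 3, 4} with a pendant path 4-5-6.
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
```

The returned density can be compared against the bound from the theorem: with h = 2 it must lie between k_max / 2 and k_max.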
Approximation Algorithms
CoreApp:
1. Sort vertices of G in descending order of their h-clique-based core number, using a cheaper proxy.
2. Obtain the max core and its core number from the top vertices of this order.
3. If the max degree of the remaining vertices is larger than that core number:
• double the number of top vertices considered and repeat step 2.
• Otherwise, output the max core.
Same worst-case time complexity as IncApp and PeelApp (SOTA), but much faster in practice.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Sample Experiment Results
As-Caida (n = 26K, m = 106K). Friendster (n = 20M, m = 106M).
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Mini Case Study: Covid-19
• Covid-19 Retweets.
1,025,937 retweets involving 660,730 users
⇒ (660,730 nodes, 835,193 edges).
• Largest connected component:
(399,962 nodes, 663,506 edges)
Courtesy: Thirumuruganathan, QCRI.
[Figure: the two densest subgraphs of the retweet network.]
• Densest subgraph: 86 vertices, 18-core, density 12.5407.
• Top-2 densest subgraph: 1134 vertices, 13-core, density 10.0150.
• Cross edges: 296.
• Topic annotations: side effects of vaccine; modes of transmission of virus; case counts in different states and countries.
Mini Case Study II: Voter Fraud 2020
Tweets on US Presidential Election 2020.
Number of nodes : 1,385,225
Number of edges : 6,631,720
Number of Tweets: 8,085,323
Size of the largest connected component:
Number of nodes: 1,356,657
Number of edges : 6,611,465
Courtesy: Thirumuruganathan, QCRI.
[Figure: the two densest subgraphs of the election tweet network.]
• 1962 vertices, 91-core, density 83.7665.
• 2206 vertices, 54-core, density 50.9231.
• Cross edges: 1385.
Annotations (themes of the tweets):
• Repeated allegations of voter fraud. Retweeting Sidney Powell’s tweet warning states against certifying the election. Quoting Trump: “dirty rolls ==> dirty polls”. Big tech is colluding with Dems to defeat Trump. Vote in person to fight against mail-in voter fraud. FBI said many military mail-in votes, all for Trump, were thrown away in a ditch in PA. Biggest voter fraud in American history. Voting machines known to be insecure. Need proof of citizenship and photo ID to prevent fraud.
• Fact-checkers from AP, Politifact, & Reuters confirm -- no evidence of widespread election fraud. Experts confirm elections are secure; most of the interference comes from misinformation campaigns. GOP and Trump team are sowing disinfo. and panic. Need to protect democracy. Trump’s narrower margin wins in 2016 vs Biden’s wider ones in 2020. Debunk “Deborah Jean Christiansen’s vote is fraud” by quoting her. More former Trump aides getting infected than voter fraud cases!
• Quotes of Sidney Powell’s tweet; replies that there is no evidence of widespread fraud; Biden brags about having “the most extensive and inclusive VOTER FRAUD organization in the history of American politics”; (CNN) dishonesty taxonomy of Trump rally; Philly mayor hiding info. from people. Anyone caught cheating with Voter Fraud games should be federally charged; state officials from both parties stated the election went well. Losing side refusing to recognize clear winner; weaving conspiracy theories and strangling faith and belief.
Mini Case Study III: Nepal Earthquake
• Graph constructed from cascades of tweets collected following the Nepal
earthquake, April 2015.
• 265383 nodes.
• 3898972 edges.
• largest connected component:
• 258756 nodes.
• 3771999 edges.
Courtesy: Thirumuruganathan, QCRI.
https://zenodo.org/record/2587475#.Ypkxmi-caFg.
[Figure: the two densest subgraphs of the earthquake tweet network.]
• 1463 vertices, 129-core, density 105.328.
• 370 vertices, 115-core, density 71.9378.
• 129 edges.
• Topic annotations: requests for help; info on earthquake (magnitude, distance to cities affected from capital); reports on damage and ruin.
Recent Progress on DSGs
• WWW 2020 [Flowless]: provides near-optimal solutions via multiple peeling; $(1 + \epsilon)$-approx within $O\left(\frac{\Delta \log n}{\rho^* \cdot \epsilon^2}\right)$, proved by [SODA 2022].
• STOC 2020: $(1 + \epsilon)$-approximation on dynamic graphs with $O(\log^4 n \cdot \epsilon^{-6})$ per edge insertion/deletion.
• WWW 2020: defines and finds the minimal DSG. Minimal: no proper subgraph is a DSG.
• SODA 2022: a flow-based $(1 + \epsilon)$-approx algo with $\tilde{O}\left(\frac{m}{\epsilon}\right)$.
[Boob, Gao, Peng et al. Flowless: Extracting densest subgraphs without flow computations. WWW 2020].
[Sawlani and Wang. Near-optimal fully dynamic densest subgraph. STOC 2020].
[Chang and Qiao. Deconstruct Densest Subgraphs. WWW 2020].
[Chekuri, Quanrud, and Torres. Densest Subgraph: Supermodularity, Iterative Peeling, and Flow. SODA 2022].
• Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
•Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
Directed Densest Subgraphs
[Figure: a digraph on vertices a, b, c, d, e with the vertex sets $S^*$ and $T^*$ marked.]
A directed densest subgraph (DDS) of a digraph is a pair of vertex sets (S, T). Its density is
$\rho(S, T) = \frac{|E(S, T)|}{\sqrt{|S| \cdot |T|}}$
-- generalizes edge density from undirected graphs.
Problem: Find $S^*, T^*$ with max. $\rho$.
[Kannan and Vinay. Analyzing the structure of large graphs. Tech Report 1999].
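The directed density is easy to evaluate for a given pair of vertex sets. In this small sketch the function name and the example digraph are assumptions made for illustration; the digraph is not the one drawn on the slide.

```python
from math import sqrt

def directed_density(S, T, edges):
    """|E(S, T)| / sqrt(|S| * |T|), counting edges that start in S and end in T."""
    S, T = set(S), set(T)
    crossing = sum(1 for u, v in edges if u in S and v in T)
    return crossing / sqrt(len(S) * len(T))

# Hypothetical digraph: a and b both point to c and d; e points to a.
digraph = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("e", "a")]
```

Taking S = T = V recovers edges per vertex, which is how the definition generalizes the undirected edge density.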
SOTA (as of 2019) DDS Discovery: Exact
• Repeatedly solve Max-flow, similarly to the undirected case.
• For each value of $a = \frac{|S|}{|T|}$, $0 < |S|, |T| \le n$:
• Find the max density by binary search.
• Build flow network and solve Max-flow.
• Overall time: $O(n^2 \cdot \text{MaxFlow})$.
• > 2 days on ~1,200 vertices and ~2,600 edges.
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
SOTA DDS Discovery: Approximation
Greedy Peeling Algorithm:
• Build a bipartite graph (L, R, E) where $L = R = V$.
• The edges are all from the L copy to the R copy.
• Each time remove a node with least degree.
• Report the densest subgraph among those obtained.
• $O(n + m)$ time.
• Approximation?
[Figure: example digraph G on vertices a, b, c, d, e.]
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
SOTA DDS Discovery: Approximation
• Fix [personal communication with authors]:
• 2-approximation algorithm
• $O(n(n + m))$
[Figure: an example (labels: 18 vertices, 36 vertices) on which KS-Approx finds density 2.75 while the ground-truth density is 6, an approximation ratio of $\frac{6}{2.75} = 2.18$. The example is then enlarged into a parameterized family with one a node and growing numbers of b and c nodes; the formulas for the node counts, the ground-truth density, the KS-Approx density, and the resulting ratio did not survive extraction.]
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs . SIGMOD
2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: An Exact
Algorithm
• $(x, y)$-core: an (S, T)-induced subgraph in which:
• Every node in S has outdegree ≥ $x$.
• Every node in T has indegree ≥ $y$.
• S and T not necessarily disjoint.
• H = ({a,b}, {c,d}) is a (2, 2)-core.
[Figure: example digraph on vertices a, b, c, d, e.]
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020].
[Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: Core-Exact
Theorem: The DDS of G is contained in the $\left(\frac{\rho^*}{2\sqrt{a}}, \frac{\sqrt{a} \cdot \rho^*}{2}\right)$-core.
• $a = \frac{|S^*|}{|T^*|}$ -- unknown; search through all candidate ratios (numerator and denominator each between 1 and $n$).
• $\rho^*$ -- unknown: start with good bounds and use binary search.
• E.g., lower bound = any 2-approx. solution and upper bound = 2 × lower bound.
• Still $O(n^2 \cdot \text{MaxFlow})$ but much faster in practice – smaller flow graphs.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: DC-Exact
• Uses a “divide and conquer” approach.
• For a given ratio, the result of the binary search for the “best” (S, T) pair gives enough info. about subranges of ratios that can be skipped.
• Algorithm DC-Exact: $O(k \cdot \text{MaxFlow})$, e.g., …
• $k \ll n^2$ in practice.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: Core-Approx
• G[S,T] – (x,y)-core of G. Then $\rho(S, T) \ge \sqrt{xy}$.
• Let $[x^*, y^*]$ be the max core-number pair, i.e., it maximizes $xy$ among all $(x, y)$-cores.
• $\rho^* \le 2\sqrt{x^* y^*}$.
• ⇒ The $(x^*, y^*)$-core is a 2-approx. solution to DDS.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: Core-Approx
• Naïve implementation: for each $x$, compute all $(x, y)$-cores, $0 < y < n$, and return the $(x^*, y^*)$-core
⇒ $O(n(n + m))$ time.
• Can we do better?
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
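The naïve search over core-number pairs can be sketched directly from the definition of an (x, y)-core. This is an illustration of the naïve baseline only, not the optimized Core-Approx; `xy_core` and `naive_core_approx` are hypothetical names, and degrees are recomputed on every pass for clarity.

```python
from math import sqrt

def xy_core(edges, x, y):
    """(S, T) of the (x, y)-core: drop S nodes with outdegree < x and T nodes with indegree < y."""
    S = {u for u, _ in edges}
    T = {v for _, v in edges}
    while True:
        out_deg = {u: 0 for u in S}
        in_deg = {v: 0 for v in T}
        for u, v in edges:
            if u in S and v in T:
                out_deg[u] += 1
                in_deg[v] += 1
        new_S = {u for u in S if out_deg[u] >= x}
        new_T = {v for v in T if in_deg[v] >= y}
        if new_S == S and new_T == T:
            return S, T
        S, T = new_S, new_T

def naive_core_approx(edges):
    """Try every (x, y), keep the non-empty core maximizing x * y, and report its density."""
    max_out = max(sum(1 for u, _ in edges if u == s) for s, _ in edges)
    max_in = max(sum(1 for _, v in edges if v == t) for _, t in edges)
    best = None
    for x in range(1, max_out + 1):
        for y in range(1, max_in + 1):
            S, T = xy_core(edges, x, y)
            if not S or not T:
                break  # a larger y can only shrink the core further
            if best is None or x * y > best[0]:
                best = (x * y, (x, y), S, T)
    _, pair, S, T = best
    crossing = sum(1 for u, v in edges if u in S and v in T)
    return pair, S, T, crossing / sqrt(len(S) * len(T))

# Hypothetical digraph: a and b both point to c and d; e points to a.
digraph = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("e", "a")]
```

The density of the returned core is at least the square root of x* times y*, which is the fact behind the 2-approximation claim.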
Densest Directed Subgraph: Core-Approx
[Figure: example digraph on vertices 1 to 8 and a plot of candidate core-number pairs in the (x, y) plane, with the candidates for $[x^*, y^*]$ marked “?”.]
Main idea:
for each $x \le \gamma$, search for the largest $y$;
for each $y \le \gamma$, search for the largest $x$;
$O(\sqrt{m} \cdot (n + m))$ time.
Max equal pair: $(\gamma, \gamma)$.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
Sample Experiment Results: Exact Algorithms
Up to 6 orders of magnitude faster
Datasets
MO: (~200, ~2.6K)
TC: (~1.2K, ~2.7K)
OF: (~3K, ~30K)
AD: (~6.4K, ~57K)
AM: (~400K, ~3.4M)
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020].
Sample Experiment Results: Approx Algorithms
Up to 6 orders of magnitude faster
Datasets
MO: (~200, ~2.6K)
TC: (~1.2K, ~2.7K)
OF: (~3K, ~30K)
AD: (~6.4K, ~57K)
AM: (~400K, ~3.4M)
AR: (~3.4M, ~5.8M)
BA: (~2.1M, ~17.8M)
TW: (~52.6M, ~1.96B)
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020].
[Bahmani, Kumar, Vassilvitskii. Densest Subgraph in Streaming and MapReduce. VLDB 2012].
Better Approximation Ratio?
• Propose a new LP formulation for the DDS problem
• A divide-and-conquer algorithmic framework
• An efficient $(1 + \epsilon)$-approximation algorithm ($\epsilon$: any real positive number)
• An efficient exact algorithm
• Up to 3 orders of magnitude faster than the state-of-the-art exact and approximation algorithms
[Ma, Fang, Cheng, L., and Han. A Convex-Programming Approach for Efficient Directed Densest Subgraph Discovery.
SIGMOD 2022].
For more details, go to Chenhao’s talk Wednesday at 2 pm in Rm 202B.
Recent Progress on DDS
• A concurrent work from SODA 2022
• Gives a $(1 + \epsilon)$-approximation via network flow for undirected graphs (see the cited paper for the running time)
• Can also be extended to directed graphs with extra time cost
• It would be interesting to compare the two algorithms empirically
[Chekuri, Quanrud, and Torres. “Densest Subgraph: Supermodularity, Iterative Peeling, and Flow.” SODA 2022].
Mini Case Study: Covid-19
•Covid-19 Retweets.
1,025,937 retweets involving 660,730 users.
→ (660,730 nodes, 835,193 edges).
•Largest connected component:
(399,962 nodes, 663,506 edges)
Courtesy: Thirumuruganathan, QCRI.
Directed Densest Subgraph from Covid-19
Source Nodes = 777
Target Nodes = 15
Common Nodes = 2
(5, 70)-core.
Density: 55.8826
777 nodes “influenced” by 15 “initiators”.
Vaccine side effects, modes of transmission.
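As a quick sanity check on the numbers above, the reported density can be turned back into an approximate edge count. This sketch assumes the standard directed-density definition $\rho(S, T) = |E(S, T)| / \sqrt{|S| \cdot |T|}$ from the cited DDS papers; the helper names are hypothetical.

```python
import math


def directed_density(num_edges, num_sources, num_targets):
    """Density of an (S, T)-induced subgraph: |E(S,T)| / sqrt(|S| * |T|)."""
    return num_edges / math.sqrt(num_sources * num_targets)


def implied_edges(density, num_sources, num_targets):
    """Invert the density formula to recover the (approximate) edge count."""
    return density * math.sqrt(num_sources * num_targets)


if __name__ == "__main__":
    # Values read off the Covid-19 slide: 777 sources, 15 targets, density 55.8826.
    print(round(implied_edges(55.8826, 777, 15)))
```

Under that assumed definition, the slide's figures correspond to roughly six thousand edges from the 777 source nodes to the 15 target nodes.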
Mini Case Study II: Nepal Earthquake
• Graph constructed from cascades of tweets collected following the Nepal earthquake,
April 2015.
• 265383 nodes.
• 3898972 edges.
• largest connected component:
• 258756 nodes.
• 3771999 edges.
https://zenodo.org/record/2587475#.Ypkxmi-caFg.
Courtesy: Thirumuruganathan, QCRI.
Directed Densest Subgraph from Nepal
Source Nodes: 122637
Target Nodes: 25233
Common nodes: 20713
(1,51)-core
density: 34.309
Tens of thousands of “initiators” and more than a hundred thousand “influenced”.
Info on damage and requests for help.
Combating via Mitigation: A Refresher on Influence Maximization
Propagation/Diffusion Models
• How does influence/information
travel in networks?
• Example Phenomena: infection,
product adoption, information,
opinion, rumor, etc.
• Stochastic diffusion models –
discrete/continuous time.
• How can we launch campaigns
to optimize design objectives?
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
[W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan & Claypool 2013].
Influence Maximization
• Core optimization problem in IM: Given a diffusion model M, a network G = (V, E), model parameters (e.g., edge propagation probabilities), and problem parameters (e.g., budget), find a seed set $S \subset V$ under budget that maximizes $\sigma_M(S)$, the expected number of adopters given initial adopters S (spread).
Complexity of IM
• Theorem: The IM problem is NP-hard for several major diffusion models
under both discrete time and continuous time.
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
Complexity of Spread Computation
• Theorem: It is #P-hard to compute the expected spread of a node set under major diffusion models (cf. counting simple paths in a digraph).
[Chen, Wang, and Yang. Efficient influence maximization in social networks. KDD 2009].
[Chen, Yuan, and Zhang. Scalable influence maximization in social networks under the linear threshold model.
ICDM 2010].
[W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan & Claypool 2013].
Properties of Spread Function
$\sigma(S)$ is monotone: $S \subseteq S' \implies \sigma(S) \le \sigma(S')$.
Properties of Spread Function
$\sigma(S)$ is submodular: $S \subset S' \subset V,\ x \in V \setminus S' \implies \sigma(x \mid S') \le \sigma(x \mid S)$, where $\sigma(x \mid S) := \sigma(S \cup \{x\}) - \sigma(S)$ is the marginal gain.
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
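The two properties can be checked by brute force on a tiny ground set. The sketch below is only an illustration: it uses a set-coverage function as a stand-in for the spread function (an assumption made for simplicity, not the talk's diffusion model), and all names are hypothetical.

```python
from itertools import combinations


def coverage(sets_by_node, S):
    """Number of elements covered by the nodes in S (a classic submodular function)."""
    covered = set()
    for v in S:
        covered |= sets_by_node[v]
    return len(covered)


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone_submodular(f, ground):
    """Exhaustively test monotonicity and diminishing marginal gains."""
    ground = frozenset(ground)
    for S in subsets(ground):
        for S_big in subsets(ground):
            if not S <= S_big:
                continue
            if f(S) > f(S_big):  # violates monotonicity
                return False
            for x in ground - S_big:
                gain_big = f(S_big | {x}) - f(S_big)
                gain_small = f(S | {x}) - f(S)
                if gain_big > gain_small:  # violates submodularity
                    return False
    return True


if __name__ == "__main__":
    toy = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
    print(is_monotone_submodular(lambda S: coverage(toy, S), toy))
    print(is_monotone_submodular(lambda S: len(S) ** 2, toy))
```

The exhaustive check is exponential in the ground-set size, so it is only useful for building intuition on toy instances.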
Approximation of Submodular Function
Maximization
• Theorem: Let $f : 2^V \to \mathbb{R}_{\ge 0}$ be a monotone submodular function, with $f(\emptyset) = 0$. Let $S^{Grd}$ and $S^*$ resp. be the greedy and optimal solutions. Then $f(S^{Grd}) \ge (1 - \frac{1}{e}) f(S^*)$, where $f(S^*)$ is OPT.
[Nemhauser, Wolsey, and Fisher. An analysis of the approximations for maximizing submodular set functions. Math. Prog. 1978].
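To make the guarantee concrete, here is a small sketch comparing greedy against a brute-force optimum under a size budget k. It assumes a toy coverage objective (hypothetical data) on which greedy is known to be suboptimal; the function names are illustrative only.

```python
import math
from itertools import combinations


def greedy_max(f, ground, k):
    """Pick k elements, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        rest = [v for v in ground if v not in chosen]
        if not rest:
            break
        chosen.append(max(rest, key=lambda v: f(chosen + [v]) - f(chosen)))
    return chosen


def brute_force_opt(f, ground, k):
    """Exact optimum value over all size-k subsets (tiny instances only)."""
    return max(f(list(combo)) for combo in combinations(ground, k))


if __name__ == "__main__":
    sets_by_node = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {3, 4, 6}}

    def f(S):
        covered = set()
        for v in S:
            covered |= sets_by_node[v]
        return len(covered)

    nodes = list(sets_by_node)
    greedy_value = f(greedy_max(f, nodes, 2))
    opt_value = brute_force_opt(f, nodes, 2)
    # Greedy takes A first and ends below the optimum {B, C},
    # but stays within the (1 - 1/e) factor.
    print(greedy_value, opt_value, greedy_value >= (1 - 1 / math.e) * opt_value)
```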
Approximation of Submodular Function
Maximization
• Theorem: The spread function $\sigma(\cdot)$ is monotone and submodular under various major diffusion models, for both discrete and continuous time.
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
Baseline Approximation Algorithm
Monte Carlo simulations for estimating expected spread.
Lazy Forward optimization to avoid useless updates.
→ Greedy is still extremely slow on large networks.
[Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, and Glance. Cost-effective outbreak detection in networks. KDD 2007].
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
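The baseline can be sketched as greedy seed selection with Monte Carlo spread estimation under the independent cascade (IC) model. This is a minimal sketch under stated assumptions: the graph is a dict mapping each node to a list of (neighbor, probability) pairs, the function names are hypothetical, and the Lazy Forward optimization is deliberately left out for brevity.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one IC cascade from seeds; return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active u gets one chance to activate each out-neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, runs=200, seed=0):
    """Average cascade size over `runs` simulations (fixed RNG seed for repeatability)."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_im(graph, nodes, k, runs=200):
    """Greedy: repeatedly add the node with the largest estimated spread."""
    chosen = []
    for _ in range(k):
        rest = [v for v in nodes if v not in chosen]
        if not rest:
            break
        chosen.append(max(rest, key=lambda v: estimate_spread(graph, chosen + [v], runs)))
    return chosen


if __name__ == "__main__":
    # Hypothetical toy graph with deterministic edges (probability 1.0).
    toy = {"a": [("b", 1.0), ("c", 1.0)], "d": [("e", 1.0)]}
    print(greedy_im(toy, ["a", "b", "c", "d", "e"], k=2, runs=5))
```

Every greedy step re-simulates the cascade for every remaining candidate, which is exactly why this baseline becomes slow on large networks.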
Reverse Influence Sampling
• A series of algorithms that guarantee a $(1 - \frac{1}{e} - \epsilon)$-approximation to the optimal expected spread.
• Key idea: use random reverse reachable sets (rr-sets) to gauge the quality of (candidate) seeds.
[Borgs, Brautbar, and Chayes, Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
Reverse Reachable Sets (RR-Sets)
[Figure: example graph on nodes A–E with edge probabilities 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.4.]
Start from a random node: RR-set = {A}.
• rr-set = sample subgraph of G.
• Example of rr-set generation under the IC model.
[Borgs, Brautbar, and Chayes, Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
Reverse Reachable Sets (RR-Sets)
• An RR-set is a subgraph sample of G.
• Generation of RR-sets under the IC model:
– start from a random node;
– sample its/their incoming edges;
– add the sampled neighbors.
[Figure: the same example graph on nodes A–E; the sampled RR-set = {A, C, B, E}.]
• Intuition:
– An rr-set is a sample set of nodes that can influence node A
[Borgs, Brautbar, and Chayes, Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
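The generation steps above can be sketched as a reverse traversal that flips a coin on each incoming edge. This is a minimal sketch assuming the IC model; `in_edges` maps a node v to a list of (u, p) pairs for edges u → v, and both the function name and the toy probabilities are hypothetical (the slide's figure is not reproduced here).

```python
import random


def random_rr_set(nodes, in_edges, rng):
    """Sample one reverse reachable set under the IC model."""
    root = rng.choice(nodes)  # start from a random node
    rr_set = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        # Sample each incoming edge (u -> v) independently with its probability.
        for u, p in in_edges.get(v, []):
            if u not in rr_set and rng.random() < p:
                rr_set.add(u)  # add the sampled in-neighbor
                frontier.append(u)
    return rr_set


if __name__ == "__main__":
    toy_in_edges = {
        "A": [("B", 0.4), ("C", 0.3)],
        "B": [("E", 0.5)],
        "C": [("D", 0.2)],
    }
    rng = random.Random(7)
    print([sorted(random_rr_set(list("ABCDE"), toy_in_edges, rng)) for _ in range(3)])
```

Every node in the returned set has a sampled path to the root, which matches the intuition that an rr-set collects nodes that could have influenced the root.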
Influence Estimation with RR-Sets
• Theorem: Pr[S overlaps a random rr-set] = $\frac{1}{n}$ × expected spread of S.
• Family of approx. algorithms: TIM, IMM, Stop-and-Stare, …
[Tang et al., “Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency”, SIGMOD 2014]
[Tang et al., “Influence Maximization in Near-Linear Time: A Martingale Approach”, SIGMOD 2015]
[Chen et al. An issue in the Martingale Analysis of the Influence Maximization Algorithm IMM. arXiv 2018].
[Nguyen et al., “Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks”,
SIGMOD 2016] → arXiv
[K. Huang, S. Wang, G. Bevilacqua, X. Xiao, and L. Revisiting the Stop-and-Stare Algorithms for Influence
Maximization, PVLDB 2017]
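The theorem suggests a simple estimator: the spread of S is about n times the fraction of sampled rr-sets that S hits, so seed selection reduces to greedy maximum coverage over the samples. The sketch below assumes the rr-sets have already been sampled and takes them as plain Python sets; it does not implement the sample-size rules that TIM, IMM, or Stop-and-Stare use, and the function names are hypothetical.

```python
def estimate_spread_from_rr(rr_sets, seeds, n):
    """Estimate spread as n * (fraction of rr-sets intersecting the seed set)."""
    seeds = set(seeds)
    hits = sum(1 for rr in rr_sets if rr & seeds)
    return n * hits / len(rr_sets)


def select_seeds(rr_sets, k):
    """Greedy max coverage: repeatedly pick the node hitting most uncovered rr-sets."""
    remaining = [set(rr) for rr in rr_sets]
    seeds = []
    for _ in range(k):
        counts = {}
        for rr in remaining:
            for v in rr:
                counts[v] = counts.get(v, 0) + 1
        if not counts:
            break
        best = max(counts, key=counts.get)
        seeds.append(best)
        remaining = [rr for rr in remaining if best not in rr]
    return seeds


if __name__ == "__main__":
    # Hypothetical samples over a 4-node graph.
    samples = [{"A", "B"}, {"B"}, {"C"}, {"B", "C"}]
    chosen = select_seeds(samples, 2)
    print(chosen, estimate_spread_from_rr(samples, chosen, n=4))
```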
What if objective is not submodular?
• Max non-decreasing non-submodular function.
$f(S^{Grd}) \ge \frac{1}{\alpha} (1 - e^{-\alpha\gamma}) \cdot \mathrm{OPT}$, where $\gamma$ is the submodularity ratio and $\alpha$ the curvature.
[Bian, Buhmann, Krause, and Tschiatschek. Guarantees … Applications. PMLR 2017].
What if the objective is not submodular?
Sandwich approximation to the rescue!
• f – monotone but not submodular.
• $\mu$, $\nu$ – monotone and submodular; $\mu$ ($\nu$) lower (resp. upper) bounds f.
• Let $S_f$ ($S_\mu$, $S_\nu$) be the Greedy solution to $\max_{S \subseteq V, |S| \le k} f(S)$ (resp. …) and $S_{sand} \in \{S_f, S_\mu, S_\nu\}$ be the best w.r.t. f(.).
Then
$f(S_{sand}) \ge \max\left\{ \frac{f(S_\nu)}{\nu(S_\nu)}, \frac{\mu(S_f^{OPT})}{f(S_f^{OPT})} \right\} \cdot \left(1 - \frac{1}{e}\right) \cdot f(S_f^{OPT})$, where $f(S_f^{OPT})$ is OPT.
[Lu, Chen, and L. From Competition to complementarity: … Maximization. PVLDB 2016].
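The sandwich recipe is short enough to sketch directly: run greedy on the lower bound, on the objective itself, and on the upper bound, then keep whichever solution scores best under the true objective. The code below is an illustrative sketch only; the toy functions stand in for the influence objectives of the cited work, and all names are hypothetical.

```python
def greedy(fn, ground, k):
    """Plain greedy: add the element that maximizes fn of the grown set."""
    chosen = set()
    for _ in range(k):
        rest = [v for v in ground if v not in chosen]
        if not rest:
            break
        chosen.add(max(rest, key=lambda v: fn(chosen | {v})))
    return chosen


def sandwich(f, lower, upper, ground, k):
    """Return the best (w.r.t. f) of the greedy solutions for lower, f, and upper."""
    candidates = [greedy(g, ground, k) for g in (lower, f, upper)]
    return max(candidates, key=f)


if __name__ == "__main__":
    weight = {1: 1.0, 2: 1.0, 3: 1.5}

    def lower(S):  # modular lower bound
        return sum(weight[v] for v in S)

    def f(S):  # non-submodular: bonus only when 1 and 2 appear together
        return lower(S) + (2.0 if {1, 2} <= set(S) else 0.0)

    def upper(S):  # modular upper bound: spreads the bonus over 1 and 2
        return lower(S) + sum(1.0 for v in S if v in (1, 2))

    print(sandwich(f, lower, upper, [1, 2, 3], k=2))
```

In this toy instance greedy on f alone misses the complementary pair, while greedy on the upper bound finds it, which is the situation the sandwich bound is designed to exploit.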
Mitigating Filter Bubbles
Filter Bubbles, Echo Chambers, and Polarization
• Selective exposure to viewpoints/issues can engender/worsen
polarization.
[Pariser. The filter bubble: What the Internet is hiding from you. Penguin, 2011].
[Bakshy, Messing, and Adamic. Exposure to ideologically diverse news and opinion on Facebook. Science 2015].
• Aggravated by echo chambers in social media.
[Garrett. Echo chambers online?: Politically motivated selective exposure among internet news users. JCMC 2009].
[Akoglu. Quantifying political polarity based on bipartite opinion networks. ICWSM 2014].
[Amelkin, Singh, and Bogdanov. A Distance Measure for the Analysis of Polar Opinion Dynamics in Social Networks.
TKDD 2019].
[Chen, Lijffijt, and De Bie. Quantifying and Minimizing Risk of Conflict in Social Media. KDD 2018].
[Garimella, de Morales, Gionis, and Mathioudakis. Quantifying Controversy over Social Media. TOCS 2018].
Balancing Exposure by Connections
• Link Recommendation
[Amelkin and A. K. Singh. Fighting opinion control in social networks via link recommendation. KDD
2019].
[Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW 2018].
[Zhu, Bao, and Zhang. Minimizing Polarization and Disagreement in Social Networks via Link Recommendation. NeurIPS 2021].
Interdisciplinary Approach
• Comprehensive solution goes beyond CS: e.g., Polarization Lab https://www.polarizationlab.com
• Interdisciplinary (CS, stats, sociology) approach.
• Real-life experiment by recruiting Democrat and Republican volunteers incentivized to follow bots tweeting posts initially aligned with their ideology but gradually from the other side of the aisle.
• Complemented with offline tracking and study.
[Bail. Breaking the Social Media Prism. Princeton Univ. Press. 2021].
Balancing via Information Campaigns
• Smart Algorithm Bursts Social Networks' "Filter
Bubbles"
• “Instead of building echo chambers, Facebook, Twitter and
company can tweak their code to broaden exposure to wider
ranges of views.”
• “… results suggest that targeting a strategic group of social
media users and feeding them the right content is more
effective for propagating diverse views through a social media
network …”
[IEEE Spectrum Jan 2021. Featuring research of Aslay, Matakos, Galbrun, and Gionis. TKDE 2020].
Balancing via Information Campaigns
• Information Campaign Approach
[Garimella, Gionis, Parotsidis, and Tatti. Balancing information exposure in social networks. NeurIPS
2018].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE
2020].
[Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020].
• Common assumptions:
• Awareness = adoption.
• Adoption of opposing views is independent.
Opinions can have complex interaction
Adopted and propagated independently?!
The Liberals claim that … they can
cut Canada’s greenhouse gas
emissions by 40 to 45% below
2005 levels by 2030. They passed
a climate plan, C-12, to set legally
binding emissions targets to reach
net-zero emissions in 2050.
New Democrats supported the
Liberals’ net-zero legislation and
have set an emissions reduction
target of 50 per cent below 2005
levels by 2030.
The Conservatives opposed the
Liberals’ net-zero emissions
legislation and say their climate
plan will meet Paris climate
commitments of 30 per cent below
2005 levels by 2030.
The People’s Party platform argues
that there is “no scientific
consensus” that human activity is
driving climate change and has said
warnings of looming
environmental catastrophe are
exaggerated.
Source: https://newsinteractives.cbc.ca/elections/federal/2021/party-platforms/#section-climate-change
Opinions can have complex interaction
Pure competition.
Opinions can have complex interaction
Partial competition.
Opinions can have complex interaction
Complementation/reinforcement.
Mitigating Filter Bubbles: A User Utility Perspective
A useful digression.
Awareness vs adoption
Higher utility!!
Awareness spreads like an epidemic, but adoption depends on UTILITY
[Kalish. A new product adoption model with price advertising and uncertainty, Management Science 1985].
Complementary (aka Reinforcing) Campaigns
Welfare Maximization: complementary
setting
• Problem: Given social network G = (V, E), propagation model, item utility model, and budget vector, find an allocation of seed nodes to items that maximizes the expected social welfare (the expected sum of utilities of itemsets adopted by users).
What does the theory say?
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
A simple greedy still works
GREEDY ALGORITHM
Does not require specific utility parameters as input
$(1 - \frac{1}{e})$-approximation
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
Prefix-preserving seed selection - PRIMA
[Figure: one seed ordering whose prefixes of sizes $b_1$, $b_2$, …, $b_{max}$ achieve $(1 - \frac{1}{e}) OPT_{b_1}$, $(1 - \frac{1}{e}) OPT_{b_2}$, …, $(1 - \frac{1}{e}) OPT_{b_{max}}$, where $b_{max} = \max_i b_i$.]
Select enough samples corresponding to every budget of the budget vector
○ Challenge: The number of samples required is not monotone in budget
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
Competing Campaigns
Welfare Maximization: competing setting
• Problem: Given social network G = (V, E), propagation model, item utility model, budget vector, and a fixed (partial) allocation of seed nodes to items, find an allocation of seed nodes to items that maximizes the expected social welfare (the expected sum of utilities of itemsets adopted by users).
How hard is (the) competition?
[Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
General case algorithm - SeqGRD
• Sort the items based on their utilities – $\{u_1 > u_2 > \dots > u_m\}$
[Figure: items $u_1, u_2, \dots, u_m$ with budgets $b_1, b_2, \dots, b_m$ (total $\sum_i b_i$), seeded using PRIMA+.]
• Instance-dependent approximation: $\frac{u_{min}}{u_{max}} (1 - \frac{1}{e}) \cdot OPT$
– $u_{max}$ = max exp. utility of any bundle.
– $u_{min}$ = exp. min utility of any item.
[Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
Mitigating Filter Bubbles: A Network Host Utility Perspective
Filter bubble problem
[Figure: a network in which many users say “YAY!” and a single user says “NAY!”]
• Items (opinions) are complementary objective-wise
• Items (opinions) are competing propagation-wise
[Garrett. Echo chambers online?: Politically motivated selective exposure among Internet news users. Journal of computer-mediated communication 2009].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020].
Problem: Key Ingredients
§ Competition parameter
§ After being influenced, adopt the second item w.p. $q$, $0 \le q < 1$
§ (Host’s) reward of adoption is supermodular, models complementarity
§ $r$, for the first item
§ $r + \Delta$, for the second item, $r < \Delta$
§ Expected (host) utility for a user adopting both: $r + q\Delta$
§ Goal is to maximize the sum of utilities under a competition-driven diffusion
[Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
Filter bubble mitigation
• There is an existing bubble
• A more general setting
Problem FB Mitigation (FBM): Given graph $G = (V, E, p)$, competition parameter $q$, $0 < q < 1$, fixed A seeds $S_A$, and budget $k$, find B seeds $S_B$ such that $|S_B| \le k$ and the expected welfare is maximized.
Inherent Challenges – Strike One
• FBM is neither monotone nor submodular.
• Restricted (sequential) setting: propagation of the follower doesn’t start before that of the leader ends. FBM in the sequential setting ($FBM_{seq}$) is monotone and submodular!
• But wait! FBM can be arbitrarily worse than $FBM_{seq}$ and vice versa!
Another Attempt
[Figure: Item A and Item B seeds meeting at First Level Competition (FLC) nodes.]
• Expected reward at each FLC node = $r + q\Delta$. Surrogate objective: Expected # FLC nodes × $(r + q\Delta)$.
• Clearly a lower bound for FBM.
• But the FLC objective is neither monotone nor submodular.
Algorithm 1 – SPReadGRD
• Greedily selects B seeds that maximize the marginal spread
• Ignores the welfare objective
• PRIMA+ is used to do the seed selection
• Given fixed $S_A$, PRIMA+ selects $S_B$ such that
• $\sigma(S_B \cup S_A) \ge (1 - \frac{1}{e} - \epsilon)\, \sigma(S_B^* \cup S_A)$
Analyzing SPReadGRD
• Given $S$, for the welfare function $\rho$ the following holds:
• $r\,\sigma(S) \le \rho(S) \le (r + q\Delta)\,\sigma(S)$
• SPReadGRD therefore has the following bound:
$\rho(S_A \cup S_B) \ge r \cdot \sigma(S_A \cup S_B) \ge r \cdot (1 - \frac{1}{e} - \epsilon) \cdot \sigma(S_A \cup S^*) \ge \frac{r}{q\Delta + r} (1 - \frac{1}{e} - \epsilon)\, \rho(S_A \cup S^*)$
Algorithm 2 – Sandwich
• Assume a tattler diffusion model
• A node influences its neighbors with every item in its awareness set
• $\rho_U(S) \ge \rho(S)$
• $\rho_U(\cdot)$ is monotone and submodular
• Assume a diffusion model with $q = 0$
• $\rho_L(S) \le \rho(S)$
• $\rho_L(\cdot)$ is monotone and submodular
• Using sandwich
• Let $S_{sand} = \arg\max_{S' \in \{S_L, S, S_U\}} \rho(S')$
• $\rho(S_{sand}) \ge \max\left\{ \frac{\rho(S_U)}{\rho_U(S_U)}, \frac{\rho_L(S^*)}{\rho(S^*)} \right\} (1 - \frac{1}{e})\, \rho(S^*)$
Algorithm 3 - NetRewGRD
[Figure: Item A and Item B seeds meeting at First Level Competition nodes.]
• Extends state-of-the-art sampling to the welfare objective
• Reverse reachable trees
• Recursive weight update using a linear pass
• Scales to large networks
[Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
Experiments
• Baselines considered:
• COEX: Maximizes co-adoptions of both items
• TDEM: Maximizes welfare based on leaning scores
[Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the diversity of exposure in a social network. TKDE 2020].
Sample of Results - Quality
Sample of Results – Running Time
Mitigating Misinformation
Misinformation Mitigation – Prior Art
• Influence Blocking
• Temporal aspects ignored or not differentiated
• Focus on scalability
[Budak, Agrawal, and El Abbadi. Limiting the spread of misinformation in social networks. WWW 2011],
[He, Song, Chen, and Jiang. Influence blocking maximization in social networks under the competitive linear threshold model. SDM 2012],
[Song, Hsu, and Lee. Temporal influence blocking: Minimizing the effect of misinformation in social networks. ICDE 2017],
[Tong, Wu, Guo et al. An efficient randomized algorithm for rumor blocking in online social networks. IEEE TNSE 2017],
[Tong, Du, and Wu. On misinformation containment in online social networks. NeurIPS 2018],
[Simpson, Srinivasan, and Thomo. Reverse Prevention Sampling for Misinformation Mitigation in Social Networks. ICDT 2020].
Temporal Aspects of Propagation
Misinformation spreads faster, farther, and wider than truth!
[Vosoughi, Roy, and Aral. The spread of true and false news online. Science 2018]
Adoption decisions have varying lengths.
[Mitchell, Stocking, and Matsa. Long-form reading shows signs of life in our mobile news world. Pew Research Center 2016]
Together these have important consequences for effective seed set selection.
Temporal Aspects of Propagation
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
• Associate meeting probabilities with each edge
• User reaction times sampled from a data-driven distribution
[Figure: propagation snapshots at t = 0, t = 2, t = 3, and t = 6, annotated “F->3”, “DW: [3,6]”, “M->4”, and “Tie!”.]
Adoption decisions of $v_1, v_2, v_3, v_4, v_5$ are uncontested.
$v_6$ faces a tie, broken with a random permutation, e.g., $(v_5, v_1)$.
Misinformation Mitigation Problem
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
P1: Given fake seeds $S_F$ and reward function $\rho(\cdot)$, find a seed set that maximizes the expected reward.
Reward function $\rho(\cdot)$ measures effectiveness of mitigation.
P1 is not submodular!
[Figure: two cases, “Truth reaches well before misinfo.” and “Truth arrives too late!”]
Sandwiching the Mitigation Objective
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Observe: Supermodular behavior arises due to the joint effect of mitigation seeds, i.e., acting alone they would not achieve the same reward.
LB: Maximum reward over singleton seed sets from $S_M$ (tight).
$S_M = \{v_1, v_2\}$
$LB = \max_{v \in \{v_1, v_2\}} \rho(S_F, \{v\})$
Sandwiching the Mitigation Objective
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Simple Candidate: drop meeting events and enforce dominant tie-breaking.
Tighter UB: remove meeting events on edges that can be traversed by both sides.
$S_M = \{v_1, v_2\}$
Importance Sampling
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Observe: only nodes reached by the misinformation are eligible for reward.
Idea: only sample roots from nodes that the misinfo campaign reaches → tighter bounds!
RDR sets: weighted analog to RR sets for reward probabilities
Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Two settings for selecting misinformation seeds: (1) from top-k influential users (small # popular instigators) and (2) uniformly at random (several bots or newly created puppet accounts).
Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Reward distribution dominated by uncontested mitigation adoption
Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Mitigation seeds remain effective under simultaneous perturbation of model parameters.
Misinformation Intervention
Intervention Challenges
Detectors are fallible
Hard vs Soft intervention
Misinformation Intervention – Prior Art
• Disadvantaging posts with misleading info, deleting edges, removing nodes, … → too hard?
• No correction for wrong intervention!
[Farajtabar, Mehrdad, et al. Fake news mitigation via point process based intervention. ICML 2017],
[Tong et al. Gelling, and melting, large graphs by edge manipulation. CIKM 2012],
[Khalil, Boutros, Dilkina, and Song. Scalable diffusion-aware optimization of network topology. KDD 2014],
[Chen, Chen, et al. Node immunization on large graphs: Theory and algorithms. TKDE 2015],
[Medya, Silva, and Singh. Approximate Algorithms for Data-driven Influence Limitation. TKDE 2020],
[Caraban et al. 23 ways to nudge: A review of technology-mediated nudging in human-computer interaction. SIGCHI 2019],
[Caraban, Konstantinou, and Karapanos. The Nudge Deck: A design support tool for technology-mediated nudging. ACM Designing Interactive Systems Conference. 2020],
[Bhuiyan et al. NudgeCred: Supporting News Credibility Assessment on Social Media Through Nudges. CSCW2 2021].
Cost Aware Intervention
[Thirumuruganathan, Simpson, L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Reward Function
$r_i^{int}$ − reach of item $i$ after intervention.
$r_i^{no\text{-}int}$ − reach of item $i$ w/ no intervention.
[Thirumuruganathan, Simpson, L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Cost Aware Intervention
Experiments
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
NCB-TS: Neural Contextual Bandits w/ Thompson Sampling
CB-TS: Contextual Bandits w/ Thompson Sampling
RB: (Learned) Rule based
CSC: Cost Sensitive Classification
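To make the bandit baselines concrete, here is a minimal sketch of contextual Thompson sampling in its simplest tabular form: one Beta posterior per (context, action) pair. It is not the NCB-TS or CB-TS implementation from the paper (NCB-TS uses a neural model over rich features). The two contexts, the two actions, and the simulated success probabilities in `TRUE_SUCCESS` are all assumptions chosen for illustration.

```python
import random


class TabularThompsonBandit:
    """Beta-Bernoulli Thompson sampling with one posterior per (context, arm)."""

    def __init__(self, contexts, arms, seed=0):
        self.rng = random.Random(seed)
        self.arms = tuple(arms)
        # Beta(1, 1) prior stored as [alpha, beta] for every (context, arm) pair.
        self.params = {(c, a): [1.0, 1.0] for c in contexts for a in self.arms}

    def choose(self, context):
        # Sample a plausible success rate per arm, then act greedily on the samples.
        draws = {a: self.rng.betavariate(*self.params[(context, a)])
                 for a in self.arms}
        return max(draws, key=draws.get)

    def update(self, context, arm, success):
        # A wrong decision lowers that arm's posterior, so later choices self-correct.
        self.params[(context, arm)][0 if success else 1] += 1.0

    def best_arm(self, context):
        def mean(arm):
            alpha, beta = self.params[(context, arm)]
            return alpha / (alpha + beta)
        return max(self.arms, key=mean)


# Hypothetical simulator: chance that each action is the right call in each context.
TRUE_SUCCESS = {
    ("flagged", "intervene"): 0.9, ("flagged", "no_op"): 0.1,
    ("unflagged", "intervene"): 0.2, ("unflagged", "no_op"): 0.9,
}


def simulate(rounds=4000, seed=7):
    env = random.Random(seed)
    bandit = TabularThompsonBandit(["flagged", "unflagged"],
                                   ["no_op", "intervene"], seed=seed)
    for _ in range(rounds):
        context = env.choice(["flagged", "unflagged"])
        arm = bandit.choose(context)
        success = env.random() < TRUE_SUCCESS[(context, arm)]
        bandit.update(context, arm, success)
    return bandit


bandit = simulate()
print(bandit.best_arm("flagged"), bandit.best_arm("unflagged"))
```

Sampling from the posterior rather than always taking the current best estimate is what lets the policy keep exploring an action that early, unlucky feedback made look bad.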
Experiments
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Real-time evaluation on Twitter’s stream from 10-Oct-2020 to 10-Nov-2020.
• 5 million tweets w/ 1800 distinct English news articles
• Topics include Politics (32%), Healthcare (26%), Entertainment (30%), Misc. (12%)
Manual Evaluation
• Random sample of 750 viral and non-viral
tweets
• 3 volunteers evaluated intervention
• Accuracy of 92.1%
Automated Evaluation
• Google FactCheck Claim Search API
• TiKL: That is a Known Lie
• Accuracy of 96.6%
• Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
Summary
• Efficient detection of dense subgraphs in undirected and
directed graphs is useful for finding filter bubbles and groups
of actors engaged in spreading misinformation.
• In mitigating filter bubbles via information campaigns,
competition between viewpoints/opinions cannot be ignored.
• In mitigating misinformation, it’s critical to incorporate
temporal aspects.
• In misinformation intervention, it’s important to watch your
step and correct your gait in the face of mistakes.
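As a reminder of what the simplest form of dense-subgraph detection looks like, the following is a minimal sketch of greedy peeling for the classical edge density |E|/|V| on an undirected graph. It is the well-known 1/2-approximation baseline, not the core-based CoreExact or CoreApp algorithms discussed in the talk. The function name `greedy_peel` and the toy graph are assumptions made for this example.

```python
def greedy_peel(edges):
    """Return (vertex set, density) of the densest subgraph seen while peeling."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    if not adj:
        return set(), 0.0
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes, best_density = set(adj), num_edges / len(adj)
    while len(adj) > 1:
        # Remove a vertex of minimum degree from the residual graph.
        u = min(adj, key=lambda x: len(adj[x]))
        for w in adj.pop(u):
            adj[w].discard(u)
            num_edges -= 1
        density = num_edges / len(adj)
        if density > best_density:
            best_nodes, best_density = set(adj), density
    return best_nodes, best_density


# Toy graph: a 4-clique on {0, 1, 2, 3} with a path 3-4-5 hanging off it.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
nodes, density = greedy_peel(toy_edges)
print(sorted(nodes), density)
```

The linear scan for the minimum-degree vertex keeps the sketch short; a bucket queue over degrees would bring the running time down to O(n + m).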
Open Questions – Detection
• Integrating content analysis in going after the “right”
densest subgraphs.
• Can we detect filter bubbles and groups promoting
misinformation as they form?
• Longitudinal: (how) do these groups transform over time?
Open Questions – Countering
• Multiple campaigns of items involving partial/pure
competition, complementation?
• How can we learn propagation probabilities, competition
parameters, utilities from available propagation traces?
• Go beyond expected outcome? E.g., as filter bubbles or
misinformation spreading occur, can we counter them?
Open Questions --
• Case studies reflecting the effect of mitigation campaigns on
filter bubbles and misinformation diffusion.
• Integrating with claim verification and (computational) fact
checking efforts.
• Incentivizing balance of adoption (in case of filter bubbles)
and adoption of truth (in case of misinformation).
Acknowledgments
Chenhao Ma (HKU), Farnoosh Hashemi (UBC), Glenn Bevilacqua (UBC → Oracle), Michael Simpson (UBC)
Prithu Banerjee (UBC → Oracle), Reynold Cheng (HKU), Saravanan Thirumuruganathan (QCRI, HBKU), Xiaolin Han (HKU)
Xuemin Lin (UNSW), Wenjie Zhang (UNSW), Yixiang Fang (CUHK), Wei Chen (MSRA), Wei Lu (UBC → LinkedIn)
நன்றி! (Thank you!)

  • 47. Densest Directed Subgraph: An Exact Algorithm • $(x, y)$-core: an $(S, T)$-induced subgraph in which: • Every node in $S$ has outdegree $\ge x$. • Every node in $T$ has indegree $\ge y$. • $S$ and $T$ are not necessarily disjoint. • Example: $H = (\{a, b\}, \{c, d\})$ is a $(2, 2)$-core. [Figure: example digraph on nodes a, b, c, d, e.] [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
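The $(x, y)$-core definition above can be illustrated by a simple peeling procedure. The following is a minimal sketch, not the implementation from the cited papers: the function name `xy_core`, the edge-list representation, and the five-edge example graph are all assumptions made for illustration (the slide's figure does not survive in the transcript), and the sketch assumes $x, y \ge 1$.

```python
from collections import defaultdict

def xy_core(edges, x, y):
    """Return (S, T) for the (x, y)-core of a digraph given as (u, v) pairs.

    Assumes x >= 1 and y >= 1. Repeatedly drops edges whose tail has fewer
    than x surviving out-edges or whose head has fewer than y surviving
    in-edges, until nothing changes.
    """
    live = set(edges)
    while True:
        out_deg = defaultdict(int)
        in_deg = defaultdict(int)
        for u, v in live:
            out_deg[u] += 1
            in_deg[v] += 1
        # An edge survives only if its tail can stay in S and its head in T.
        kept = {(u, v) for u, v in live
                if out_deg[u] >= x and in_deg[v] >= y}
        if kept == live:
            break
        live = kept
    S = {u for u, _ in live}
    T = {v for _, v in live}
    return S, T

# Hypothetical example graph (not the one from the slide's figure).
example_edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("e", "c")]
S, T = xy_core(example_edges, 2, 2)
```

On this hypothetical graph, node e has outdegree 1 and is peeled away, leaving S = {a, b} and T = {c, d}. The loop recomputes all degrees in every round, which is simple but slower than a queue-based peeling.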
  • 48. Densest Directed Subgraph: Core-Exact • Theorem: The DDS of $G$ is contained in the $\left(\frac{\rho^*}{2\sqrt{a}}, \frac{\sqrt{a}\,\rho^*}{2}\right)$-core. • $a = \frac{|S^*|}{|T^*|}$ is unknown: search through all ratios $\frac{i}{j}$ with $0 < i, j \le n$. • $\rho^*$ is unknown: start with good bounds and use binary search. • E.g., lower bound = any 2-approx. solution and upper bound = 2 × lower bound. • Still $O(n^2 \cdot \ldots)$ in the worst case, but much faster in practice because the flow graphs are smaller. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 49. Densest Directed Subgraph: DC-Exact • Uses a “divide and conquer” approach. • For a given candidate ratio, the result of the binary search for the “best” $(S, T)$ pair gives enough information about subranges of ratios that can be skipped. • Algorithm DC-Exact: $O(k \cdot \ldots)$ time, where $k$ is the number of ratios examined, e.g., … • $k \ll n^2$ in practice. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 50. Densest Directed Subgraph: Core-Approx • Let $G[S, T]$ be an $(x, y)$-core of $G$. Then $\rho(S, T) \ge \sqrt{xy}$. • Let $[x^*, y^*]$ be the max core-number pair, i.e., it maximizes $xy$ among all $(x, y)$-cores. • $\rho^* \le 2\sqrt{x^* y^*}$. • → The $(x^*, y^*)$-core is a 2-approx. solution to DDS. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
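The first bullet can be checked in two lines. The derivation below assumes the usual directed density $\rho(S, T) = |E(S, T)| / \sqrt{|S|\,|T|}$ from the cited papers; that definition is not restated on this slide, so treat it as an assumption of the sketch.

```latex
% Every node of S has at least x out-edges into T, and
% every node of T has at least y in-edges from S:
|E(S,T)| \ge x\,|S| , \qquad |E(S,T)| \ge y\,|T| .
% Multiply the two inequalities and take square roots:
|E(S,T)|^2 \ge xy\,|S|\,|T|
\;\Longrightarrow\;
\rho(S,T) = \frac{|E(S,T)|}{\sqrt{|S|\,|T|}} \ge \sqrt{xy} .
```

Combined with $\rho^* \le 2\sqrt{x^* y^*}$, this gives $\rho(S, T) \ge \sqrt{x^* y^*} \ge \rho^*/2$ for the $(x^*, y^*)$-core, which is the stated 2-approximation.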
  • 51. Densest Directed Subgraph: Core-Approx • Naïve implementation: for each $x$, compute all $(x, y)$-cores, $0 < y < n$, and return the $(x^*, y^*)$-core → $O(n(n + m))$ time. • Can we do better? [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
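The naïve search described above can be sketched as a double loop over $x$ and $y$. This is an illustrative sketch only: the helper names, the edge-set representation, and the example graph are assumptions, and the inner core computation recomputes degrees from scratch each round, so it does not attain the running time quoted on the slide.

```python
from math import sqrt

def xy_core_edges(edges, x, y):
    """Edges of the (x, y)-core, found by repeated peeling (x, y >= 1)."""
    live = set(edges)
    while True:
        out_deg, in_deg = {}, {}
        for u, v in live:
            out_deg[u] = out_deg.get(u, 0) + 1
            in_deg[v] = in_deg.get(v, 0) + 1
        kept = {(u, v) for u, v in live
                if out_deg[u] >= x and in_deg[v] >= y}
        if kept == live:
            return live
        live = kept

def naive_core_approx(edges):
    """Scan all (x, y) pairs and return (x*, y*, core edges) maximizing x*y."""
    edges = set(edges)
    best = (0, 0, set())
    x = 1
    while xy_core_edges(edges, x, 1):      # stop once even y = 1 is empty
        y = 1
        core = xy_core_edges(edges, x, y)
        while core:
            if x * y > best[0] * best[1]:
                best = (x, y, core)
            y += 1
            core = xy_core_edges(edges, x, y)
        x += 1
    return best

def density(core_edges):
    """Directed density |E(S,T)| / sqrt(|S||T|) of an edge-induced (S, T)."""
    S = {u for u, _ in core_edges}
    T = {v for _, v in core_edges}
    return len(core_edges) / sqrt(len(S) * len(T))

# Hypothetical example graph.
example_edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("e", "c")]
x_star, y_star, core = naive_core_approx(example_edges)
```

On the hypothetical graph the scan settles on the (2, 2)-core, whose density of 2 meets the $\sqrt{x^* y^*}$ lower bound with equality.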
  • 52. Densest Directed Subgraph: Core-Approx • [Figure: grid of $(x, y)$ core-number pairs with the candidates for $[x^*, y^*]$ marked.] • Max equal pair: $(\gamma, \gamma)$. • Main idea: for each $x \le \gamma$, search for the largest $y$; for each $y \le \gamma$, search for the largest $x$. • $O(\sqrt{m} \cdot (n + m))$ time. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 53. Sample Experiment Results: Exact Algorithms • Up to 6 orders of magnitude faster. • Datasets: MO (~200, ~2.6K); TC (~1.2K, ~2.7K); OF (~3K, ~30K); AD (~6.4K, ~57K); AM (~400K, ~3.4M). [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020].
  • 54. Sample Experiment Results: Approx Algorithms • Up to 6 orders of magnitude faster. • Datasets: MO (~200, ~2.6K); TC (~1.2K, ~2.7K); OF (~3K, ~30K); AD (~6.4K, ~57K); AM (~400K, ~3.4M); AR (~3.4M, ~5.8M); BA (~2.1M, ~17.8M); TW (~52.6M, ~1.96B). [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Bahmani, Kumar, Vassilvitskii. Densest Subgraph in Streaming and MapReduce. VLDB 2012].
  • 55. Better Approximation Ratio? • Propose a new LP formulation for the DDS problem. • A divide-and-conquer algorithmic framework. • An efficient $(1 + \epsilon)$-approximation algorithm, for any real $\epsilon > 0$. • An efficient exact algorithm. • Up to 3 orders of magnitude faster than the state-of-the-art exact and approximation algorithms. [Ma, Fang, Cheng, L., and Han. A Convex-Programming Approach for Efficient Directed Densest Subgraph Discovery. SIGMOD 2022]. For more details, go to Chenhao’s talk Wednesday at 2 pm in Rm 202B.
  • 56. Recent Progress on DDS • A concurrent work from SODA 2022. • Gives a $(1 + \epsilon)$-approximation in $O(\ldots)$ time via network flow for undirected graphs. • Can also be extended to directed graphs at extra time cost. • It would be interesting to compare the two algorithms empirically. [Chekuri, Quanrud, and Torres. “Densest Subgraph: Supermodularity, Iterative Peeling, and Flow.” SODA 2022].
  • 57. Mini Case Study: Covid-19 • Covid-19 retweets: 1,025,937 retweets involving 660,730 users → (660,730 nodes, 835,193 edges). • Largest connected component: (399,962 nodes, 663,506 edges). Courtesy: Thirumuruganathan, QCRI.
  • 58. Directed Densest Subgraph from Covid-19 • Source nodes = 777. • Target nodes = 15. • Common nodes = 2. • $(5, 70)$-core. Density: 55.8826. • 777 nodes “influenced” by 15 “initiators”. • Vaccine side effects, modes of transmission.
  • 59. Mini Case Study II: Nepal Earthquake • Graph constructed from cascades of tweets collected following the Nepal earthquake, April 2015. • 265,383 nodes. • 3,898,972 edges. • Largest connected component: • 258,756 nodes. • 3,771,999 edges. https://zenodo.org/record/2587475#.Ypkxmi-caFg. Courtesy: Thirumuruganathan, QCRI.
  • 60. Directed Densest Subgraph from Nepal • Source nodes: 122,637. • Target nodes: 25,233. • Common nodes: 20,713. • $(1, 51)$-core. Density: 34.309. • Tens of thousands of “initiators” and more than a hundred thousand “influenced”. • Info on damage and requests for help.
  • 61. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed •Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions
  • 62. Propagation/Diffusion Models • How does influence/information travel in networks? • Example phenomena: infection, product adoption, information, opinion, rumor, etc. • Stochastic diffusion models: discrete/continuous time. • How can we launch campaigns to optimize design objectives? [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003]. [W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan & Claypool 2013].
  • 63. Influence Maximization • Core optimization problem in IM: Given a diffusion model $M$, a network $G = (V, E)$, model parameters (e.g., edge propagation probabilities), and problem parameters (e.g., budget), find a seed set $S \subset V$ under budget that maximizes $\sigma_M(S)$, the expected number of adopters given initial adopters $S$ (the spread).
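The spread $\sigma_M(S)$ in the problem statement is usually estimated by simulation. Below is a minimal sketch assuming the independent cascade (IC) model as the diffusion model $M$; the adjacency-dict representation and the function names `simulate_ic` and `estimate_spread` are illustrative choices, not taken from the talk.

```python
import random

def simulate_ic(graph, seeds, rng):
    """Run one IC cascade and return the set of activated nodes.

    graph maps u -> list of (v, p_uv); each newly activated u gets a single
    chance to activate each inactive out-neighbor v, with probability p_uv.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_spread(graph, seeds, runs=10000, seed=0):
    """Monte Carlo estimate of the expected number of adopters."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs

# Hypothetical two-edge network: a -> b with p = 0.5, b -> c with p = 0.5.
toy_graph = {"a": [("b", 0.5)], "b": [("c", 0.5)]}
toy_estimate = estimate_spread(toy_graph, {"a"})
```

For the toy network the exact spread of {a} is 1 + 0.5 + 0.25 = 1.75, so the estimate should land close to that value; the number of runs controls the variance.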
  • 64. Complexity of IM • Theorem: The IM problem is NP-hard for several major diffusion models under both discrete time and continuous time. [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
  • 65. Complexity of Spread Computation • Theorem: It is #P-hard to compute the expected spread of a node set under major diffusion models (cf. counting simple paths in a digraph). [Chen, Wang, and Yang. Efficient influence maximization in social networks. KDD 2009]. [Chen, Yuan, and Zhang. Scalable influence maximization in social networks under the linear threshold model. ICDM 2010]. [W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan & Claypool 2013].
  • 66. Properties of Spread Function • $\sigma(S)$ is monotone: $S \subseteq S' \implies \sigma(S) \le \sigma(S')$.
  • 67. Properties of Spread Function • $\sigma(S)$ is submodular: $S \subset S' \subset V,\ x \in V \setminus S' \implies \sigma(x \mid S') \le \sigma(x \mid S)$, where $\sigma(x \mid S) := \sigma(S \cup \{x\}) - \sigma(S)$ is the marginal gain. [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
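Both properties can be checked by brute force on a small ground set. The sketch below is illustrative only: the helper names and the two test functions (a set-coverage function, which is monotone and submodular, and $|S|^2$, which is monotone but not submodular) are assumptions chosen for the example and stand in for a real spread function.

```python
from itertools import combinations

def all_subsets(ground):
    """Yield every subset of ground as a frozenset."""
    items = sorted(ground)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_monotone(ground, f):
    """Check f(S) <= f(S + x) for every S and every x outside S."""
    return all(f(S) <= f(S | {x})
               for S in all_subsets(ground) for x in ground - S)

def is_submodular(ground, f):
    """Check that marginal gains never grow as the base set grows."""
    for big in all_subsets(ground):
        for small in all_subsets(big):
            for x in ground - big:
                if f(big | {x}) - f(big) > f(small | {x}) - f(small):
                    return False
    return True

# Hypothetical coverage instance: each node "covers" a few items.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
ground = set(covers)

def coverage(S):
    return len(set().union(*(covers[v] for v in S)))

def squared_size(S):
    return len(S) ** 2
```

The check is exponential in the size of the ground set, so it is only useful as a sanity test on toy instances.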
  • 68. Approximation of Submodular Function Maximization • Theorem: Let $f : 2^V \to \mathbb{R}_{\ge 0}$ be a monotone submodular function with $f(\emptyset) = 0$. Let $S^{Grd}$ and $S^*$ be the greedy and optimal solutions, respectively. Then $f(S^{Grd}) \ge \left(1 - \frac{1}{e}\right) f(S^*)$, where $f(S^*) = \mathrm{OPT}$. [Nemhauser, Wolsey, and Fisher. An analysis of approximations for maximizing submodular set functions. Math. Prog. 1978].
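The greedy algorithm referred to in the theorem picks, in each of $k$ rounds, the element with the largest marginal gain. The sketch below assumes a cardinality budget $k$ and uses a small set-coverage function as a stand-in objective; the instance is hypothetical and chosen so that greedy is visibly suboptimal yet within the $(1 - 1/e)$ bound.

```python
import math

def greedy(ground, f, k):
    """Pick up to k elements, each time adding the largest marginal gain."""
    S = set()
    for _ in range(k):
        best, best_gain = None, 0
        for v in sorted(ground - S):          # sorted for deterministic ties
            gain = f(S | {v}) - f(S)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:                      # no element adds value
            break
        S.add(best)
    return S

# Hypothetical coverage instance.
covers = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {3, 4, 6}}

def coverage(S):
    return len(set().union(*(covers[v] for v in S)))

chosen = greedy(set(covers), coverage, 2)
greedy_value = coverage(chosen)               # greedy takes "a" first
optimal_value = coverage({"b", "c"})          # the best pair covers all 6
bound_holds = greedy_value >= (1 - 1 / math.e) * optimal_value
```

Greedy commits to "a" because it has the largest initial gain, then can add only one new item, while the pair {b, c} covers everything. The gap stays within the guarantee of the theorem.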
  • 69. Approximation of Submodular Function Maximization • Theorem: The spread function $\sigma(\cdot)$ is monotone and submodular under various major diffusion models, for both discrete and continuous time. [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
  • 70. Baseline Approximation Algorithm • Monte Carlo simulations for estimating expected spread. • Lazy Forward optimization to avoid useless updates. • → Greedy is still extremely slow on large networks. [Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, and Glance. Cost-effective outbreak detection in networks. KDD 2007]. [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
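The Lazy Forward idea can be sketched with a priority queue of possibly stale marginal gains: by submodularity a stale gain is an upper bound on the current gain, so an element whose freshly recomputed gain still tops the queue can be selected without re-evaluating the rest. The code below is a minimal sketch under that assumption; the function name and the coverage objective are illustrative, and in influence maximization `f` would be a Monte Carlo spread estimate instead.

```python
import heapq

def lazy_greedy(ground, f, k):
    """Greedy selection with lazy re-evaluation of marginal gains.

    Correctness relies on f being submodular, so that stale gains are
    upper bounds on current gains.
    """
    S = set()
    f_S = f(S)
    # Heap entries: (negated gain, element, round in which gain was computed).
    heap = [(-(f({v}) - f_S), v, 0) for v in ground]
    heapq.heapify(heap)
    for rnd in range(k):
        while heap:
            neg_gain, v, stamp = heapq.heappop(heap)
            if stamp == rnd:                  # gain is fresh w.r.t. current S
                S.add(v)
                f_S += -neg_gain
                break
            gain = f(S | {v}) - f_S           # stale: recompute and requeue
            heapq.heappush(heap, (-gain, v, rnd))
    return S

# Hypothetical coverage instance.
covers = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {3, 4, 6}}

def coverage(S):
    return len(set().union(*(covers[v] for v in S)))

lazy_choice = lazy_greedy(set(covers), coverage, 2)
```

On large instances most elements are never re-evaluated after the first round, which is where the savings come from; the selected set matches plain greedy up to tie-breaking.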
  • 71. Reverse Influence Sampling • A series of algorithms that guarantee a $\left(1 - \frac{1}{e} - \epsilon\right)$-approximation to the optimal expected spread. • Key idea: use random reverse reachable sets (RR-sets) to gauge the quality of (candidate) seeds. [Borgs, Brautbar, and Chayes. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
  • 72. Reverse Reachable Sets (RR-Sets) • An RR-set is a sample subgraph of $G$. • Generation of an RR-set under the IC model: (1) start from a random node, e.g., RR-set = {A}; (2) sample its incoming edges and add the sampled neighbors; (3) repeat for the newly added nodes, e.g., RR-set = {A, C, B, E}. • Intuition: an RR-set is a sample set of nodes that can influence node A. [Figure: example digraph on nodes A, B, C, D, E with edge probabilities between 0.2 and 0.6.] [Borgs, Brautbar, and Chayes. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
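The generation steps above can be sketched as a reverse traversal in which each incoming edge is kept with its propagation probability. The representation (`in_edges` mapping each node to its in-neighbors and probabilities) and the function name are assumptions for illustration; the example graph is hypothetical, since the edges of the slide's figure are not recoverable from the transcript.

```python
import random

def random_rr_set(nodes, in_edges, rng):
    """Sample one RR-set under the IC model.

    nodes is a list of all nodes; in_edges maps v -> list of (u, p_uv)
    for every edge u -> v. Starts from a uniformly random root and walks
    edges backwards, keeping each edge with probability p_uv.
    """
    root = rng.choice(nodes)
    rr = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr

# Hypothetical graph: B -> A (0.4), C -> A (0.3), E -> C (0.6), D -> E (0.5).
example_nodes = ["A", "B", "C", "D", "E"]
example_in_edges = {
    "A": [("B", 0.4), ("C", 0.3)],
    "C": [("E", 0.6)],
    "E": [("D", 0.5)],
}
sample = random_rr_set(example_nodes, example_in_edges, random.Random(0))
```

Every node in the returned set can reach the root along sampled edges, which is why seeds that hit many RR-sets are good candidates.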
  • 75. Influence Estimation with RR-Sets • Theorem: $\Pr[S \text{ overlaps a random RR-set}] = \frac{1}{n} \times$ expected spread of $S$. • Family of approx. algorithms: TIM, IMM, Stop-and-Stare, … [Tang et al., “Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency”, SIGMOD 2014]. [Tang et al., “Influence Maximization in Near-Linear Time: A Martingale Approach”, SIGMOD 2015]. [Chen et al. An issue in the Martingale Analysis of the Influence Maximization Algorithm IMM. arXiv 2018]. [Nguyen et al., “Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks”, SIGMOD 2016] → arXiv. [K. Huang, S. Wang, G. Bevilacqua, X. Xiao, and L. Revisiting the Stop-and-Stare Algorithms for Influence Maximization, PVLDB 2017].
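The theorem turns spread estimation into counting: $n$ times the fraction of sampled RR-sets hit by $S$ estimates the spread of $S$, and seed selection becomes greedy maximum coverage over the samples. The sketch below assumes the IC model and a fixed sample count; it is a simplified illustration and omits the sample-size analysis that TIM, IMM, and Stop-and-Stare contribute. All function names are hypothetical.

```python
import random

def sample_rr_sets(nodes, in_edges, count, seed=0):
    """Draw `count` RR-sets under IC; in_edges maps v -> [(u, p_uv), ...]."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        root = rng.choice(nodes)
        rr, stack = {root}, [root]
        while stack:
            v = stack.pop()
            for u, p in in_edges.get(v, []):
                if u not in rr and rng.random() < p:
                    rr.add(u)
                    stack.append(u)
        samples.append(rr)
    return samples

def estimate_spread_rr(n, rr_sets, S):
    """n * (fraction of RR-sets that intersect S)."""
    hits = sum(1 for rr in rr_sets if rr & S)
    return n * hits / len(rr_sets)

def greedy_seeds(rr_sets, k):
    """Greedy max coverage: repeatedly pick the node in most uncovered RR-sets."""
    uncovered = list(rr_sets)
    seeds = []
    for _ in range(k):
        counts = {}
        for rr in uncovered:
            for u in rr:
                counts[u] = counts.get(u, 0) + 1
        if not counts:
            break
        best = max(sorted(counts), key=counts.get)   # sorted: stable ties
        seeds.append(best)
        uncovered = [rr for rr in uncovered if best not in rr]
    return seeds

# Hypothetical deterministic chain a -> b -> c (all probabilities 1).
chain_nodes = ["a", "b", "c"]
chain_in_edges = {"b": [("a", 1.0)], "c": [("b", 1.0)]}
chain_samples = sample_rr_sets(chain_nodes, chain_in_edges, 3000)
```

In the chain example node a reaches every node, so it appears in every RR-set and its estimated spread equals 3; node c appears only when it is the root, giving an estimate near 1.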
  • 76. What if the objective is not submodular? • Maximizing a non-decreasing, non-submodular function: $f(S^{Grd}) \ge \frac{1}{\alpha}\left(1 - e^{-\alpha\gamma}\right) \mathrm{OPT}$, where $\alpha$ is the generalized curvature and $\gamma$ the submodularity ratio of $f$. [Bian, Buhmann, Krause, and Tschiatschek. Guarantees … Applications. PMLR 2017].
  • 79. What if the objective is not submodular? • Sandwich approximation to the rescue! • $f$: monotone but not submodular. • $\mu, \nu$: monotone and submodular, and $\mu$ ($\nu$) lower (resp. upper) bounds $f$. • Let $S_f$ ($S_\mu$, $S_\nu$) be the Greedy solution to $\max_{S \subseteq V,\ |S| \le k} f(S)$ (resp. …), and let $S_{\mathrm{sand}} \in \{S_f, S_\mu, S_\nu\}$ be the best w.r.t. $f(\cdot)$. Then $f(S_{\mathrm{sand}}) \ge \max\left\{ \frac{f(S_\nu)}{\nu(S_\nu)}, \frac{\mu(S_f^*)}{f(S_f^*)} \right\} \cdot \left(1 - \frac{1}{e}\right) \cdot f(S_f^*)$, where $f(S_f^*) = \mathrm{OPT}$. [Lu, Chen, and L. From Competition to complementarity: … Maximization. PVLDB 2016].
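The sandwich strategy can be sketched as three greedy runs followed by a comparison under the true objective. Everything in the example is an assumption made for illustration: the objective `f` is a coverage function plus a bonus of 2 when both "b" and "c" are chosen (which breaks submodularity), `lower` is plain coverage, and `upper` grants the bonus as soon as either of the two is chosen.

```python
def greedy(ground, g, k):
    """Standard greedy under a cardinality budget k."""
    S = set()
    for _ in range(k):
        best, best_gain = None, 0
        for v in sorted(ground - S):
            gain = g(S | {v}) - g(S)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            break
        S.add(best)
    return S

def sandwich(ground, f, lower, upper, k):
    """Run greedy on f, lower, and upper; keep the candidate best under f."""
    candidates = [greedy(ground, g, k) for g in (f, lower, upper)]
    return max(candidates, key=f)

# Hypothetical instance.
covers = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {3, 4, 6}}

def lower(S):                       # submodular lower bound: plain coverage
    return len(set().union(*(covers[v] for v in S)))

def f(S):                           # true objective: bonus needs both b and c
    return lower(S) + (2 if {"b", "c"} <= set(S) else 0)

def upper(S):                       # submodular upper bound: bonus for either
    return lower(S) + (2 if set(S) & {"b", "c"} else 0)

best_set = sandwich(set(covers), f, lower, upper, 2)
```

Here greedy on `f` and on `lower` both commit to "a" first and end with value 5 under `f`, while greedy on `upper` selects {b, c}, which scores 8 under `f`; the sandwich step returns that set.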
  • 81. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization •Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions
  • 82. Filter Bubbles, Echo Chambers, and Polarization • Selective exposure to viewpoints/issues can engender/worsen polarization. [Pariser. The filter bubble: What the Internet is hiding from you. Penguin, 2011]. [Bakshy, Messing, and Adamic. Exposure to ideologically diverse news and opinion on Facebook. Science 2015]. • Aggravated by echo chambers in social media. [Garrett. Echo chambers online?: Politically motivated selective exposure among internet news users. JCMC 2009]. [Akoglu. Quantifying political polarity based on bipartite opinion networks. ICWSM 2014]. [Amelkin, Singh, and Bogdanov. A Distance Measure for the Analysis of Polar Opinion Dynamics in Social Networks. TKDD 2019]. [Chen, Lijffijt, and De Bie. Quantifying and Minimizing Risk of Conflict in Social Media. KDD 2018]. [Garimella, De Francisci Morales, Gionis, and Mathioudakis. Quantifying Controversy on Social Media. TSC 2018].
  • 83. Balancing Exposure by Connections • Link Recommendation [Amelkin and A. K. Singh. Fighting opinion control in social networks via link recommendation. KDD 2019]. [Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW 2018]. [Zhu, Bao, and Zhang. Minimizing Polarization and Disagreement in Social Networks via Link Recommendation. NeurIPS 2021].
  • 84. Interdisciplinary Approach • A comprehensive solution goes beyond CS: e.g., Polarization Lab https://www.polarizationlab.com • Interdisciplinary (CS, stats, sociology) approach. • Real-life experiment: Democrat and Republican volunteers were recruited and incentivized to follow bots tweeting posts that were initially aligned with their ideology but gradually came from the other side of the aisle. • Complemented with offline tracking and study. [Bail. Breaking the Social Media Prism. Princeton Univ. Press. 2021].
  • 85. Balancing via Information Campaigns • Smart Algorithm Bursts Social Networks' "Filter Bubbles" • “Instead of building echo chambers, Facebook, Twitter and company can tweak their code to broaden exposure to wider ranges of views.” • “… results suggest that targeting a strategic group of social media users and feeding them the right content is more effective for propagating diverse views through a social media network …” [IEEE Spectrum Jan 2021. Featuring research of Aslay, Matakos, Galbrun, and Gionis. TKDE 2020].
  • 86. Balancing via Information Campaigns • Information Campaign Approach [Garimella, Gionis, Parotsidis, and Tatti. Balancing information exposure in social networks. NeurIPS 2018]. [Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020]. [Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020]. • Common assumptions: • Awareness = adoption. • Adoption of opposing views is independent.
  • 87. Opinions can have complex interaction • Adopted and propagated independently?! • The Liberals claim that … they can cut Canada’s greenhouse gas emissions by 40 to 45% below 2005 levels by 2030. They passed a climate plan, C-12, to set legally binding emissions targets to reach net-zero emissions in 2050. • New Democrats supported the Liberals’ net-zero legislation and have set an emissions reduction target of 50 per cent below 2005 levels by 2030. • The Conservatives opposed the Liberals’ net-zero emissions legislation and say their climate plan will meet Paris climate commitments of 30 per cent below 2005 levels by 2030. • The People’s Party platform argues that there is “no scientific consensus” that human activity is driving climate change and has said warnings of looming environmental catastrophe are exaggerated. • Interactions illustrated on these four platforms over the next slides: pure competition; partial competition; complementation/reinforcement. Source: https://newsinteractives.cbc.ca/elections/federal/2021/party-platforms/#section-climate-change
  • 91. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization •Mitigating Filter Bubbles •A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions • A useful digression.
  • 92. Awareness vs. adoption • Awareness spreads like an epidemic, but adoption depends on utility: the option with higher utility is adopted. [Kalish. A new product adoption model with price advertising and uncertainty, Management Science 1985].
  • 93. Complementary (aka Reinforcing) Campaigns
  • 94. Welfare Maximization: complementary setting • Problem: Given a social network $G = (V, E)$, a propagation model, an item utility model, and a budget vector, find an allocation of seed nodes to items that maximizes the expected social welfare, i.e., the expected sum of utilities of itemsets adopted by users.
  • 95. What does the theory say? [Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
  • 96. A simple greedy still works • Greedy algorithm. • Does not require specific utility parameters as input. • $\left(1 - \frac{1}{e}\right)$ approximation. [Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
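  • 97. Prefix-preserving seed selection - PRIMA • [Figure: a single seed ordering whose prefixes of sizes $b_1$, $b_2$, and $b_{\max}$ achieve $\left(1 - \frac{1}{e}\right)\mathrm{OPT}_{b_1}$, $\left(1 - \frac{1}{e}\right)\mathrm{OPT}_{b_2}$, and $\left(1 - \frac{1}{e}\right)\mathrm{OPT}_{b_{\max}}$, where $b_{\max} = \max_i b_i$.] • Select enough samples corresponding to every budget of the budget vector. • Challenge: the number of samples required is not monotone in the budget. [Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].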
  • 98. Competing Campaigns
  • 99. Welfare Maximization: competing setting • Problem: Given a social network $G = (V, E)$, a propagation model, an item utility model, a budget vector, and a fixed (partial) allocation of seed nodes to items, find an allocation of seed nodes to items that maximizes the expected social welfare, i.e., the expected sum of utilities of itemsets adopted by users.
  • 100. How hard is (the) competition? [Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
  • 101. General case algorithm - SeqGRD • Instance-dependent approximation: $\frac{u_{\min}}{u_{\max}} \left(1 - \frac{1}{e}\right) \mathrm{OPT}$, where $u_{\max}$ = max expected utility of any bundle and $u_{\min}$ = min expected utility of any item. • Sort the items based on their utilities: $\{i_1 > i_2 > \dots > i_m\}$. • [Figure: items $i_1, i_2, \dots, i_m$ with budgets $b_1, b_2, \dots, b_m$; seeds for the total budget $\sum_i b_i$ are selected with PRIMA+.] [Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
  • 102. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization •Mitigating Filter Bubbles • A User Utility Perspective •A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions
  • 103. Filter bubble problem • [Figure: a network in which nearly every node says “YAY!” and a single node says “NAY!”.] • Items (opinions) are complementary objective-wise. • Items (opinions) are competing propagation-wise. [Garrett. Echo chambers online?: Politically motivated selective exposure among Internet news users. JCMC 2009]. [Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020].
  • 104. Problem: Key Ingredients • Competition parameter $q$: after being influenced, a user adopts the second item w.p. $q$, $0 \le q < 1$. • The (host’s) reward of adoption is supermodular, which models complementarity: • $r$ for the first item; • $r + \Delta$ for the second item, with $r < \Delta$. • Expected (host) utility for a user adopting both: $r + q\Delta$. • Goal: maximize the sum of utilities under a competition-driven diffusion. [Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
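The expected utility on this slide follows from a one-line expectation. The derivation below is a sketch under one reading of the slide, which is an assumption: a user reached by both items always adopts the first (reward $r$) and additionally adopts the second with probability $q$, bringing the total reward to $r + \Delta$. The symbols $r$, $\Delta$, and $q$ stand for the reward, the complementarity bonus, and the competition parameter.

```latex
\mathbb{E}[\text{host utility}]
  = (1 - q)\cdot r + q\cdot(r + \Delta)
  = r + q\Delta .
% Worked instance (illustrative numbers): r = 1, \Delta = 2, q = 0.5
% gives 0.5 \cdot 1 + 0.5 \cdot 3 = 2 = 1 + 0.5 \cdot 2 .
```

With $q = 0$ the host earns only $r$ per influenced user (pure competition), and the expected utility grows linearly in $q$ towards $r + \Delta$.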
  • 105. Filter bubble mitigation • There is an existing bubble (item A). • A more general setting. • Problem FB Mitigation (FBM): Given a graph $G = (V, E, p)$, a competition parameter $q$, $0 < q < 1$, fixed A-seeds $S_A$, and a budget $b$, find B-seeds $S_B$ such that $|S_B| \le b$ and the expected welfare is maximized.
  • 106. Inherent Challenges – Strike One • FBM is neither monotone nor submodular. • Restricted (sequential) setting: the propagation of the follower does not start before that of the leader ends. FBM in the sequential setting is monotone and submodular! :) • But wait! FBM can be arbitrarily worse than $\mathrm{FBM}_{\mathrm{seq}}$ and vice versa! :(
  • 107. Another Attempt • [Figure: items A and B meeting at first-level competition (FLC) nodes.] • Expected reward at each FLC node = $r + q\Delta$. Surrogate objective: expected number of FLC nodes × $(r + q\Delta)$. • Clearly a lower bound for FBM. • But the FLC objective is neither monotone nor submodular.
  • 108. Algorithm 1 – SpreadGRD • Greedily selects B-seeds that maximize the marginal spread. • Ignores the welfare objective. • PRIMA+ is used to do the seed selection. • Given fixed $S_A$, PRIMA+ selects $S_B$ such that $\sigma(S_B \cup S_A) \ge \left(1 - \frac{1}{e} - \epsilon\right) \sigma(S_B^* \cup S_A)$.
  • 109. Analyzing SpreadGRD • Given $S$, for the welfare function $\rho$ the following holds: $r\,\sigma(S) \le \rho(S) \le (r + q\Delta)\,\sigma(S)$. • SpreadGRD therefore has the following bound: $\rho(S_A \cup S_B) \ge r \cdot \sigma(S_A \cup S_B) \ge r \cdot \left(1 - \frac{1}{e} - \epsilon\right) \cdot \sigma(S_A \cup S^*) \ge \frac{r}{q\Delta + r} \left(1 - \frac{1}{e} - \epsilon\right) \rho(S_A \cup S^*)$.
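To get a feel for the guarantee, the factor $\frac{r}{q\Delta + r}\left(1 - \frac{1}{e} - \epsilon\right)$ can be evaluated for sample parameters. The helper below is a small sketch; the function name and the numbers in the example are illustrative assumptions, not values from the talk.

```python
import math

def spread_grd_factor(r, delta, q, eps):
    """Approximation factor r / (q*delta + r) * (1 - 1/e - eps)."""
    return (r / (q * delta + r)) * (1.0 - 1.0 / math.e - eps)

# Illustrative parameters: reward 1, bonus 2, competition 0.5, eps 0.1.
example_factor = spread_grd_factor(r=1.0, delta=2.0, q=0.5, eps=0.1)

# With q = 0 the welfare equals r times the spread, so only the
# (1 - 1/e - eps) loss from spread maximization remains.
no_second_adoption = spread_grd_factor(r=1.0, delta=2.0, q=0.0, eps=0.1)
```

The factor degrades as $q\Delta$ grows relative to $r$, which matches the intuition that ignoring the welfare objective costs more when the complementarity bonus matters more.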
  • 110. Algorithm 2 – Sandwich • Assume a tattler diffusion model • A node influences its neighbors with every item in its awareness set • $\rho_U(S) \ge \rho(S)$ • $\rho_U(\cdot)$ is monotone and submodular
  • 111. Algorithm 2 – Sandwich • Assume a tattler diffusion model • $\rho_U(S) \ge \rho(S)$ • Assume a diffusion model with $q = 0$ • $\rho_L(S) \le \rho(S)$ • $\rho_L(\cdot)$ is monotone and submodular
  • 112. Algorithm 2 – Sandwich • Assume a tattler diffusion model • $\rho_U(S) \ge \rho(S)$ • Assume a diffusion model with $q = 0$ • $\rho_L(S) \le \rho(S)$ • Using sandwich • Let $S_{sand} = \arg\max_{S' \in \{S_L, S, S_U\}} \rho(S')$ • $\rho(S_{sand}) \ge \max\left\{ \frac{\rho(S_U)}{\rho_U(S_U)}, \frac{\rho_L(S^*)}{\rho(S^*)} \right\} \left(1 - \frac{1}{e}\right) \rho(S^*)$
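The sandwich strategy can be sketched in a few lines of Python: run greedy on the upper bound, on the original objective, and on the lower bound, then keep whichever seed set scores best under the original objective. The set functions below are toy stand-ins (a coverage function plus a supermodular bonus), not the welfare functions from the talk, and every name is hypothetical.

```python
def greedy(fn, ground, budget):
    """Plain greedy: repeatedly add the element with the largest marginal gain."""
    chosen = []
    for _ in range(budget):
        best, best_gain = None, float("-inf")
        for v in ground:
            if v in chosen:
                continue
            gain = fn(chosen + [v]) - fn(chosen)
            if gain > best_gain:  # ties go to the earliest element
                best, best_gain = v, gain
        if best is None:
            break
        chosen.append(best)
    return chosen


def sandwich(objective, upper, lower, ground, budget):
    """Run greedy on all three functions; return the best set under `objective`."""
    candidates = [greedy(fn, ground, budget) for fn in (upper, objective, lower)]
    return max(candidates, key=objective)


# Toy instance: coverage is monotone submodular; the bonus for picking both
# "x" and "z" makes the true objective non-submodular.
COVER = {"x": {1, 2}, "y": {2, 3}, "z": {4}}


def coverage(seeds):
    covered = set()
    for s in seeds:
        covered |= COVER[s]
    return len(covered)


def lower(seeds):
    return coverage(seeds)


def objective(seeds):
    return coverage(seeds) + (2 if "x" in seeds and "z" in seeds else 0)


def upper(seeds):
    return coverage(seeds) + (2 if "x" in seeds or "z" in seeds else 0)


best = sandwich(objective, upper, lower, ["x", "y", "z"], budget=2)
print(sorted(best), objective(best))
```

Here the bound-based greedy runs both return `x` and `y`, while greedy on the objective itself finds `x` and `z`; taking the maximum over all three candidates is what protects the final answer.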
  • 113. Algorithm 3 – NetRewGRD • Extends state-of-the-art sampling to the welfare objective • Reverse reachable trees • Recursive weight update using a linear pass • Scales to large networks [Figure labels: Item A, Item B, First Level Competition] [Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
  • 114. Experiments • Baselines considered: • COEX: maximizes co-adoptions of both items • TDEM: maximizes welfare based on leaning scores [Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020], [Aslay, Matakos, Galbrun, and Gionis. Maximizing the diversity of exposure in a social network. TKDE 2020].
  • 115. Sample of Results – Quality
  • 116. Sample of Results – Running Time
  • 117. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions
  • 118. Misinformation Mitigation – Prior Art • Influence Blocking • Temporal aspects ignored or not differentiated • Focus on scalability [Budak, Agrawal, and El Abbadi. Limiting the spread of misinformation in social networks. WWW 2011], [He, Song, Chen, and Jiang. Influence blocking maximization in social networks under the competitive linear threshold model. SDM 2012], [Song, Hsu, and Lee. Temporal influence blocking: Minimizing the effect of misinformation in social networks. ICDE 2017], [Tong, Wu, Guo et al. An efficient randomized algorithm for rumor blocking in online social networks. IEEE TNSE 2017], [Tong, Du, and Wu. On misinformation containment in online social networks. NeurIPS 2018], [Simpson, Srinivasan, and Thomo. Reverse Prevention Sampling for Misinformation Mitigation in Social Networks. ICDT 2020].
  • 119. Temporal Aspects of Propagation • Misinformation spreads faster, farther, and wider than truth! [Vosoughi, Roy, and Aral. The spread of true and false news online. Science 2018] • Adoption decisions have varying lengths. [Mitchell, Stocking, and Matsa. Long-form reading shows signs of life in our mobile news world. Pew Research Center 2016] • Together these have important consequences for effective seed set selection.
  • 120. Temporal Aspects of Propagation • Associate meeting probabilities with each edge • User reaction times sampled from a data-driven distribution [Figure: example propagation shown at t = 0, t = 2, t = 3, t = 6. Adoption decisions of five of the nodes are uncontested; the sixth faces a tie, broken with a random permutation. Annotations: F→3. DW: [3,6]. M→4. Tie!] [M. Simpson, F. Hashemi, and Lakshmanan. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
  • 121. Misinformation Mitigation Problem • Reward function $R(\cdot)$ measures effectiveness of mitigation. • P1: Given fake seeds $S_F$ and reward function $R(\cdot)$, find a seed set that maximizes the expected reward. • P1 is not submodular! [Figure annotations: Truth reaches well before misinfo. Truth arrives too late!] [M. Simpson, F. Hashemi, and Lakshmanan. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
  • 122. Sandwiching the Mitigation Objective • Observe: supermodular behavior arises due to the joint effect of mitigation seeds, i.e., acting alone they would not achieve the same reward. • LB: maximum reward over singleton seed sets from $S_M$ (tight). Example: $S_M = \{v_1, v_2\}$, LB $= \max_{v \in \{v_1, v_2\}} R(S_F, \{v\})$ [M. Simpson, F. Hashemi, and Lakshmanan. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
  • 123. Sandwiching the Mitigation Objective • Simple UB candidate: drop meeting events and enforce dominant tie-breaking. • Tighter UB: remove meeting events on edges that can be traversed by both sides. Example: $S_M = \{v_1, v_2\}$ [M. Simpson, F. Hashemi, and Lakshmanan. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
  • 124. Importance Sampling • Observe: only nodes reached by the misinformation are eligible for reward. • Idea: only sample roots from nodes that the misinfo campaign reaches → tighter bounds! • RDR sets: weighted analog to RR sets for reward probabilities [M. Simpson, F. Hashemi, and Lakshmanan. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
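The root-restriction idea can be illustrated with ordinary reverse reachable (RR) sets. The Python sketch below draws roots only from a given set of nodes that the misinformation reaches, then grows each RR set by a reverse traversal that keeps an incoming edge with its probability. It assumes an independent-cascade style edge model and plain unweighted RR sets; it is not the RDR construction from the paper, and all names are hypothetical.

```python
import random


def build_reverse_adjacency(edges):
    """edges maps (u, v) to a probability; returns v -> list of (u, p)."""
    rev = {}
    for (u, v), p in edges.items():
        rev.setdefault(v, []).append((u, p))
    return rev


def sample_rr_set(rev, root, rng):
    """Nodes that reach `root` in one randomly sampled live-edge graph."""
    rr = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u, p in rev.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def sample_rr_sets_from_reached(edges, reached_nodes, num_sets, seed=0):
    """Sample RR sets whose roots are drawn only from misinformation-reached nodes."""
    rng = random.Random(seed)
    rev = build_reverse_adjacency(edges)
    roots = sorted(reached_nodes)  # fixed order keeps sampling reproducible
    return [sample_rr_set(rev, rng.choice(roots), rng) for _ in range(num_sets)]


# Toy graph: only "c" is reached by misinformation, so "d" and "e" never
# appear in any sampled set.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("d", "e"): 1.0}
rr_sets = sample_rr_sets_from_reached(edges, reached_nodes={"c"}, num_sets=3)
print([sorted(s) for s in rr_sets])
```

Restricting the roots means no samples are spent on nodes that could never yield reward, which is the intuition behind the tighter bounds claimed on the slide.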
  • 125. Experiments • Two settings for selecting misinformation seeds: (1) from top-k influential users (a small number of popular instigators) and (2) uniformly at random (several bots or newly created puppet accounts). [M. Simpson, F. Hashemi, and Lakshmanan. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
  • 126. Experiments • Reward distribution dominated by uncontested mitigation adoption. [M. Simpson, F. Hashemi, and Lakshmanan. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
  • 127. Experiments • Mitigation seeds remain effective under simultaneous perturbation of model parameters. [M. Simpson, F. Hashemi, and Lakshmanan. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
  • 128. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions
  • 129. Intervention Challenges • Detectors are fallible • Hard vs. soft intervention
  • 130. Misinformation Intervention – Prior Art • Disadvantaging posts with misleading info, deleting edges, removing nodes, … → too hard? • No correction for wrong intervention! [Farajtabar et al. Fake news mitigation via point process based intervention. ICML 2017], [Tong et al. Gelling, and melting, large graphs by edge manipulation. CIKM 2012], [Khalil, Dilkina, and Song. Scalable diffusion-aware optimization of network topology. KDD 2014], [Chen et al. Node immunization on large graphs: Theory and algorithms. TKDE 2015], [Medya, Silva, and Singh. Approximate Algorithms for Data-driven Influence Limitation. TKDE 2020], [Caraban et al. 23 ways to nudge: A review of technology-mediated nudging in human-computer interaction. SIGCHI 2019], [Caraban, Konstantinou, and Karapanos. The Nudge Deck: A design support tool for technology-mediated nudging. ACM Designing Interactive Systems Conference 2020], [Bhuiyan et al. NudgeCred: Supporting News Credibility Assessment on Social Media Through Nudges. CSCW2 2021].
  • 131. Cost Aware Intervention [Thirumuruganathan, Simpson, and Lakshmanan. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
  • 132. Reward Function • $R_i^{\text{int}}$: reach of item $a_i$ after intervention. • $R_i^{\text{no-int}}$: reach of item $a_i$ w/ no intervention.
  • 133. Cost Aware Intervention [Thirumuruganathan, Simpson, and Lakshmanan. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
  • 134. Experiments • NCB-TS: Neural Contextual Bandits w/ Thompson Sampling • CB-TS: Contextual Bandits w/ Thompson Sampling • RB: (Learned) Rule based • CSC: Cost Sensitive Classification [Thirumuruganathan, Simpson, and Lakshmanan. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
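For readers unfamiliar with Thompson sampling, which two of the methods above build on, here is a deliberately simplified Python sketch. It is a context-free Beta-Bernoulli bandit, so it drops both the context features and the neural model used by CB-TS and NCB-TS; the arms, reward rates, and function name are hypothetical and not taken from the paper.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was played."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [1] * n_arms  # Beta(1, 1) prior: alpha parameter per arm
    failures = [1] * n_arms   # Beta(1, 1) prior: beta parameter per arm
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible reward rate per arm from its posterior.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulated environment: Bernoulli reward with the arm's true rate.
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Two hypothetical actions whose simulated success rates differ widely.
pulls = thompson_sampling(true_rates=[0.2, 0.8], rounds=2000)
print(pulls)
```

Because each decision is driven by a posterior sample rather than a point estimate, early mistakes get revisited as evidence accumulates, which matches the talk's emphasis on correcting wrong interventions.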
  • 135. Experiments • Real-time evaluation on Twitter's stream from 10-Oct-2020 to 10-Nov-2020: 5 million tweets w/ 1800 distinct English news articles; topics include Politics (32%), Healthcare (26%), Entertainment (30%), Misc. (12%) • Manual evaluation: random sample of 750 viral and non-viral tweets; 3 volunteers evaluated intervention; accuracy of 92.1% • Automated evaluation: Google FactCheck Claim Search API; TiKL: That is a Known Lie; accuracy of 96.6% [Thirumuruganathan, Simpson, and Lakshmanan. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
  • 136. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions
  • 137. Summary • Efficient detection of dense subgraphs in undirected and directed graphs is useful for finding filter bubbles and groups of actors engaged in spreading misinformation. • In mitigating filter bubbles via information campaigns, competition between viewpoints/opinions cannot be ignored. • In mitigating misinformation, it’s critical to incorporate temporal aspects. • In misinformation intervention, it’s important to watch your step and correct your gait in the face of mistakes.
  • 138. Open Questions – Detection • Integrating content analysis in going after the “right” densest subgraphs. • Can we detect filter bubbles and groups promoting misinformation as they form? • Longitudinal: (how) do these groups transform over time?
  • 139. Open Questions – Countering • Multiple campaigns of items involving partial/pure competition, complementation? • How can we learn propagation probabilities, competition parameters, utilities from available propagation traces? • Go beyond expected outcome? E.g., as filter bubbles form or misinformation spreads, can we counter them?
  • 140. Open Questions -- • Case studies reflecting the effect of mitigation campaigns on filter bubbles and misinformation diffusion. • Integrating with claim verification and (computational) fact checking efforts. • Incentivizing balance of adoption (in case of filter bubbles) and adoption of truth (in case of misinformation).
  • 141. Acknowledgments • Chenhao Ma (HKU) • Farnoosh Hashemi (UBC) • Glenn Bevilacqua (UBC→Oracle) • Michael Simpson (UBC) • Prithu Banerjee (UBC→Oracle) • Reynold Cheng (HKU) • Saravanan Thirumuruganathan (QCRI, HBKU) • Xiaolin Han (HKU) • Xuemin Lin (UNSW) • Wenjie Zhang (UNSW) • Yixiang Fang (CUHK) • Wei Chen (MSRA) • Wei Lu (UBC→LinkedIn)
  • 142. நன்றி! (Thank you!)