Absorbing Random Walk Centrality
Theory and Algorithms

Harry Mavroforakis, Boston University
Michael Mathioudakis, Aalto University
Aristides Gionis, Aalto University

Helsinki - September 15th 2015
  
submit query to twitter
e.g. ‘#ferguson’

want to see messages from few, central users

many and different groups of people might be posting about it
  
one approach...
represent activity with graph

users in the results (query nodes)
other users
connections
  
select k central query nodes

what is ‘central’?
many measures have been studied

here, we use random walks for robustness:
absorbing random walk centrality
  
absorbing random walk centrality

model

start
  from query node q w.p. s(q)
re-start
  with probability α at each step
transitions
  follow edges at random from one node to another
absorption
  we can designate set of absorbing nodes
  no escape from absorbing nodes

centrality of nodes S
  expected time (number of steps) until absorbed by S
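
The model above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `absorbing_centrality` is hypothetical, the graph is given as a dense adjacency matrix, and a re-start is assumed to jump to a query node drawn from the start distribution s (the slide does not say where the walk re-starts). The expected absorption time is obtained by solving the standard linear system over the non-absorbing (transient) nodes.

```python
import numpy as np

def absorbing_centrality(adj, s, alpha, absorbing):
    """Expected number of steps until a walk is absorbed by the set `absorbing`."""
    adj = np.asarray(adj, dtype=float)
    s = np.asarray(s, dtype=float)
    absorbing = set(absorbing)
    # Assumption: with probability alpha the walk re-starts from a node drawn
    # from s; otherwise it follows an edge chosen uniformly at random.
    P = (1 - alpha) * adj / adj.sum(axis=1, keepdims=True) + alpha * s
    transient = [u for u in range(len(s)) if u not in absorbing]
    P_tt = P[np.ix_(transient, transient)]
    # Expected steps h from each transient node satisfy (I - P_tt) h = 1.
    h = np.linalg.solve(np.eye(len(transient)) - P_tt, np.ones(len(transient)))
    # A walk that starts on an absorbing node is absorbed after 0 steps.
    return float(s[transient] @ h)

# Path graph 0 - 1 - 2, walk always starts at node 0.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(absorbing_centrality(path, [1, 0, 0], 0.0, [2]))
```

A lower value means the set S is reached sooner, so smaller scores correspond to more central sets.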
  
problem

input
  graph, query nodes, α
  & candidate nodes (e.g. query nodes or all nodes)

select
  k candidate nodes with minimum absorption time
  
select nodes that are central w.r.t. the query nodes
[figures: selected nodes for k = 1, k = 3, and k ≥ |Q|]
  
outline
•  complexity
•  greedy algorithm
   – naive greedy
   – speed-up
•  heuristics & baselines
•  empirical evaluation
  
complexity

the problem is NP-hard
reduction from vertex cover
  
approximation

centrality measure

monotonicity
  absorption time decreases with more absorbing nodes

supermodularity
  diminishing returns
  
approximation

centrality gain
  mc: min centrality for k=1
  gain = mc - centrality, k>1
  non-negative, non-decreasing, submodular

[figure: sets S, S ∪ {u}, S ∪ {u,v}]

greedy algorithm
  (1-1/e)-approximation guarantee for gain
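
The properties named on the two approximation slides can be written out in symbols. The block below is a sketch of the standard definitions: the notation ac_Q(S) for the centrality of a set S follows the appendix, while the symbol g for the gain, S_greedy for the greedy solution, and S* for an optimal set of size k are names introduced here for illustration.

```latex
% Monotonicity: adding absorbing nodes never increases the absorption time.
\mathrm{ac}_Q(Y) \le \mathrm{ac}_Q(X)
  \qquad \text{for } X \subseteq Y \subseteq V

% Supermodularity (diminishing returns): a new node u helps a small set
% at least as much as it helps a larger one.
\mathrm{ac}_Q(X) - \mathrm{ac}_Q(X \cup \{u\})
  \;\ge\; \mathrm{ac}_Q(Y) - \mathrm{ac}_Q(Y \cup \{u\})
  \qquad \text{for } X \subseteq Y,\; u \notin Y

% Gain, as defined on the slide.
mc = \min_{u} \mathrm{ac}_Q(\{u\}), \qquad g(S) = mc - \mathrm{ac}_Q(S)

% Greedy guarantee stated on the slide, for the gain.
g(S_{\mathrm{greedy}}) \;\ge\; \left(1 - \tfrac{1}{e}\right) g(S^{*})
```

Note that the guarantee is for the gain g, not for the absorption time itself.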
  
greedy

S = empty
for i = 1..k
    for u in V - S
        calculate centrality of S ∪ {u}   (*)
    update S := S ∪ {best u}

bottleneck is in line (*)
one matrix inversion (super-quadratic) for step (*)

use Sherman-Morrison to perform (*) in O(n^2)
with one (1) inversion for first node

however... still O(kn^3): each of the k rounds evaluates (*) for up to n candidates
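
The speed-up can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: all function names are hypothetical, the graph is a dense adjacency matrix, re-starts are assumed to draw from s, and absorbing nodes are modelled by zeroing their rows in the transition matrix. Making node u absorbing then adds the rank-one term e_u p_u^T to A = I - P, so Sherman-Morrison applies with a = e_u and b = p_u. For brevity the first node is chosen with one direct inversion per candidate, which is a simplification of the single inversion mentioned on the slide.

```python
import numpy as np

def transition_matrix(adj, s, alpha):
    """Follow a random edge w.p. 1 - alpha, re-start from s w.p. alpha (assumed)."""
    adj = np.asarray(adj, dtype=float)
    return (1 - alpha) * adj / adj.sum(axis=1, keepdims=True) + alpha * np.asarray(s, dtype=float)

def fundamental(P, absorbing):
    """F = (I - P_C)^-1, where P_C is P with the rows of absorbing nodes zeroed."""
    P_c = P.copy()
    P_c[list(absorbing), :] = 0.0
    return np.linalg.inv(np.eye(len(P)) - P_c)

def add_absorbing(F, P, u):
    """O(n^2) Sherman-Morrison update of F after node u becomes absorbing."""
    Fa = F[:, u]      # F @ e_u
    bF = P[u] @ F     # p_u^T @ F
    return F - np.outer(Fa, bF) / (1.0 + bF[u])

def centrality(F, s, absorbing):
    """Expected steps to absorption: count visits to non-absorbing nodes only."""
    transient = np.ones(len(s))
    transient[list(absorbing)] = 0.0
    return float(np.asarray(s, dtype=float) @ F @ transient)

def greedy(adj, s, alpha, k):
    P = transition_matrix(adj, s, alpha)
    n = len(P)
    # Simplification: direct inversion for every first-node candidate.
    first = min(range(n), key=lambda u: centrality(fundamental(P, [u]), s, [u]))
    chosen, F = [first], fundamental(P, [first])
    while len(chosen) < k:
        rest = [u for u in range(n) if u not in chosen]
        best = min(rest, key=lambda u: centrality(add_absorbing(F, P, u), s, chosen + [u]))
        F = add_absorbing(F, P, best)
        chosen.append(best)
    return chosen

# Path graph 0 - 1 - 2 - 3 - 4 with the two end points as query nodes.
path = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(greedy(path, [0.5, 0, 0, 0, 0.5], 0.0, 1))
```

Each call to `add_absorbing` costs O(n^2), but it is still made for every remaining candidate in every round, which is where the O(kn^3) total comes from.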
  
heuristics & baselines
•  personalized pagerank with same α
•  spectralQ & spectralD
   – spectral embedding of nodes
   – k-means on embedding of query nodes
   – select candidates close to centers
   – spectralQ selects more nodes from larger clusters
•  spectralC
   – similar to spectralQ but clustering on candidates
•  degree & distance centrality
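
As an illustration of the first baseline, personalized PageRank can be computed by power iteration and the k highest-scoring candidates returned. This is a minimal sketch, assuming a dense adjacency matrix and a re-start distribution s concentrated on the query nodes; the names `personalized_pagerank` and `top_k` and the fixed iteration count are choices made here, not taken from the slides.

```python
import numpy as np

def personalized_pagerank(adj, s, alpha, iters=200):
    """Scores of the walk that re-starts from s with probability alpha."""
    adj = np.asarray(adj, dtype=float)
    s = np.asarray(s, dtype=float)
    W = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic edge-following step
    pi = s.copy()
    for _ in range(iters):
        # One step: re-start w.p. alpha, otherwise move along a random edge.
        pi = alpha * s + (1 - alpha) * (pi @ W)
    return pi

def top_k(scores, candidates, k):
    """The k candidates with the highest scores."""
    return sorted(candidates, key=lambda u: -scores[u])[:k]

# Star graph: node 0 is the hub, nodes 1 and 2 are the query nodes.
star = np.zeros((5, 5))
star[0, 1:] = 1
star[1:, 0] = 1
scores = personalized_pagerank(star, [0, 0.5, 0.5, 0, 0], 0.15)
print(top_k(scores, range(5), 1))
```

Unlike greedy, this needs no matrix inversion, only repeated matrix-vector products.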
  
evaluation

data

TABLE I: Dataset statistics

small
Dataset          |V|      |E|
karate            34       78
dolphins          62      159
lesmis            77      254
adjnoun          112      425
football         115      613

large (cannot run greedy on these)
Dataset          |V|      |E|
kddCoauthors    2891     2891
livejournal     3645     4141
ca-GrQc         5242    14496
ca-HepTh        9877    25998
roadnet        10199    13932
oregon-1       11174    23409

Degree and distance centrality. Finally, we consider the standard degree and distance centrality measures. Degree returns the k highest-degree nodes. Note that this baseline is oblivious to query nodes Q.
  
input

graphs from previous datasets

query nodes: planted spheres
  k spheres (k = 1)
  radius ρ (ρ = 2 or 3)
  s special nodes inside spheres (s = 10 or 20)

α = 0.15
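
One plausible reading of the planted-sphere setup is sketched below: pick k centers at random, take every node within ρ hops of a center, and sample s query nodes inside each sphere. The slide does not spell out the procedure, so the sampling details, the function names `ball` and `planted_query_nodes`, and the adjacency-list input format are all assumptions.

```python
import random
from collections import deque

def ball(adj_list, center, radius):
    """Nodes within `radius` hops of `center`, found by breadth-first search."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the sphere boundary
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def planted_query_nodes(adj_list, k, rho, s, seed=0):
    """Sample up to s query nodes inside each of k random spheres of radius rho."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(adj_list), k)
    query = set()
    for c in centers:
        sphere = sorted(ball(adj_list, c, rho))
        query.update(rng.sample(sphere, min(s, len(sphere))))
    return query

# Path graph on 10 nodes as an adjacency list.
path = {u: [v for v in (u - 1, u + 1) if 0 <= v < 10] for u in range(10)}
print(sorted(ball(path, 5, 2)))
```

With k = 1 the query nodes all lie close together, so a good method should select nodes near the planted center.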
  
small graphs
[figures: results on dolphins, adjnoun, karate]

large graphs
[figures: results on oregon, livejournal, roadnet]
  
conclusions
  complex problem, greedy algorithm is expensive
  personalized pagerank is good alternative

future work
  expansion strategies
  comparison with more alternatives
  parallel implementation
  
The End
where the inequality comes from the fact that a path in $G_X$ passing from $Z$ and being absorbed by $X$ corresponds to a shorter path in $G_Y$ being absorbed by $Y$.

B. Proposition 5

Proposition. Let $C_{i-1}$ be a set of $i-1$ absorbing nodes, $P_{i-1}$ the corresponding transition matrix, and $F_{i-1} = (I - P_{i-1})^{-1}$. Let $C_i = C_{i-1} \cup \{u\}$. Given $F_{i-1}$, the centrality score $\mathrm{ac}_Q(C_i)$ can be computed in time $O(n^2)$.

The proof makes use of the following lemma.

Lemma 1 (Sherman-Morrison Formula [7]). Let $M$ be a square $n \times n$ invertible matrix and $M^{-1}$ its inverse. Moreover, let $a$ and $b$ be any two column vectors of size $n$. Then, the following equation holds:

$$(M + ab^T)^{-1} = M^{-1} - \frac{M^{-1} a b^T M^{-1}}{1 + b^T M^{-1} a}.$$

Proof (Proposition 5). Without loss of generality, let the set of absorbing nodes be $C_{i-1} = \{1, 2, \ldots, i-1\}$. For simplicity, assume no self-loops for non-absorbing nodes, i.e., $P_{u,u} = 0$ for $u \in V \setminus C_{i-1}$. As in Section VI, the expected number of steps before absorption is given by the formulas

$$\mathrm{ac}_Q(C_{i-1}) = s_Q^T F_{i-1} \mathbf{1}, \quad \text{with } F_{i-1} = A_{i-1}^{-1} \text{ and } A_{i-1} = I - P_{i-1}.$$

We proceed to show how to increase the set of absorbing nodes by one and calculate the new absorption time by updating $F_{i-1}$ in $O(n^2)$. Without loss of generality, suppose we add node $i$ to the absorbing nodes $C_{i-1}$, so that

$$C_i = C_{i-1} \cup \{i\} = \{1, 2, \ldots, i-1, i\}.$$

Let $P_i$ be the transition matrix over $G$ with absorbing nodes [...]

By a direct application of Lemma 1, it is easy to see that we can compute $F_i$ from $F_{i-1}$ with the following formula, at a cost of $O(n^2)$ operations.

$$F_i = F_{i-1} - \frac{(F_{i-1} a)(b^T F_{i-1})}{1 + b^T (F_{i-1} a)}$$

We have thus shown that, given $F_{i-1}$, we can compute $F_i$, and therefore $\mathrm{ac}_Q(C_i)$ as well, in $O(n^2)$.

C. Proposition 6

Proposition. Let $C$ be a set of absorbing nodes, $P$ the corresponding transition matrix, and $F = (I - P)^{-1}$. Let $C' = C - \{v\} \cup \{u\}$, for $u, v \in C$. Given $F$, the centrality score $\mathrm{ac}_Q(C')$ can be computed in time $O(n^2)$.

Proof. The proof is similar to the proof of Proposition 5. Again we assume no self-loops [...] $P_{uu} = 0$ for $u \in V$ [...] two sets of absorbing nodes $C = \{$ [...] and $C' = \{$ [...]. Let $P'$ be the transition [...] absorbing centrality for [...] $C'$ is expressed as a function [...] $F = A^{-1}$, $F' = (A')^{-1}$. Notice that $A' - A$ is a matrix whose visible nonzero rows are rows $i$ and $i+1$, with zero blocks above and below [...], where $p_{i,j}$ denotes the [...] node $j$ in a transition ma[...]
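
Since most of the proof of Proposition 6 is cut off, the following is a hedged sketch of how the swap can be carried out with two successive applications of Lemma 1. It assumes that $v$ is currently absorbing and $u$ is not, that absorbing nodes have zero rows in the transition matrix, that $p_u^T$ denotes the walk's transition row for node $u$, and that $e_u$ is the $u$-th unit vector. The intermediate matrix $F_1$ is a name introduced here, and this is not necessarily the authors' derivation.

```latex
\begin{align*}
A' - A &= e_u p_u^T - e_v p_v^T
  && \text{($u$ becomes absorbing, $v$ is released)} \\
F_1 &= F - \frac{(F e_u)(p_u^T F)}{1 + p_u^T F e_u}
  && \text{(Lemma 1 with $a = e_u$, $b = p_u$)} \\
F' &= F_1 + \frac{(F_1 e_v)(p_v^T F_1)}{1 - p_v^T F_1 e_v}
  && \text{(Lemma 1 with $a = e_v$, $b = -p_v$)}
\end{align*}
```

Adding $u$ before releasing $v$ keeps the intermediate matrix invertible, and each step needs only matrix-vector products and one outer product, so the total cost stays $O(n^2)$.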
