Random walks on graphs - link prediction by Rouhollah Nabati

1
Random Walks on Graphs:
An Overview
Rouhollah Nabati (modified and presented)
IAUSDJ.ac.ir
Fall, 2016

2
Motivation: Link prediction in social
networks

3
Motivation: Basis for recommendation

4
Motivation: Personalized search

5
Why graphs?
 The underlying data is naturally a graph
 Papers linked by citation
 Authors linked by co-authorship
 Bipartite graph of customers and products
 Web-graph
 Friendship networks: who knows whom

6
What are we looking for
 Rank nodes for a particular query
 Top k matches for “Random Walks” from Citeseer
 Who are the most likely co-authors of “Manuel
Blum”.
 Top k book recommendations for Purna from
Amazon
 Top k websites matching “Sound of Music”
 Top k friend recommendations for Purna when she
joins “Facebook”

7
Talk Outline
 Basic definitions
 Random walks
 Stationary distributions
 Properties
 Perron-Frobenius theorem
 Electrical networks, hitting and commute times
 Euclidean Embedding
 Applications
 Pagerank
 Power iteration
 Convergence
 Personalized pagerank
 Rank stability

8
Definitions
 nxn Adjacency matrix A.
 A(i,j) = weight on edge from i to j
 If the graph is undirected A(i,j)=A(j,i), i.e. A is symmetric
 nxn Transition matrix P.
 P is row stochastic
 P(i,j) = probability of stepping to node j from node i
= $A(i,j) / \sum_k A(i,k)$
 nxn Laplacian Matrix L.
 L = D - A, where D is the diagonal degree matrix with $D(i,i) = \sum_k A(i,k)$, i.e. $L(i,j) = D(i,j) - A(i,j)$
 Symmetric positive semi-definite for undirected graphs
 Singular
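
The three matrices defined above can be sketched in a few lines. This is a minimal illustration, assuming an undirected graph with no isolated nodes; the function name `build_matrices` and the example edge list are hypothetical and not part of the original slides.

```python
def build_matrices(n, edges):
    """Return (A, P, L) for an undirected graph given as (i, j, weight) triples."""
    A = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        A[i][j] += w
        A[j][i] += w  # undirected graph: A is symmetric
    degree = [sum(row) for row in A]  # assumes every node has at least one edge
    # Transition matrix: each row of A divided by its row sum.
    P = [[A[i][j] / degree[i] for j in range(n)] for i in range(n)]
    # Laplacian: L = D - A, with the degrees on the diagonal.
    L = [[(degree[i] if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    return A, P, L


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to node 2.
EDGES = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
A, P, L = build_matrices(4, EDGES)
```

Every row of P sums to 1 (row stochastic) and every row of L sums to 0, which is why L is singular.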

10
Definitions
[Figure: an example graph shown with its adjacency matrix A (unit edge weights) and its transition matrix P (entries 1 and 1/2)]

11
What is a random walk
[Figure: position of the random walk on the example graph at t=0, t=1, t=2 and t=3; edges are labeled with transition probabilities 1 and 1/2]

15
Probability Distributions
 $x_t(i)$ = probability that the surfer is at node i at time t
 $x_{t+1}(i) = \sum_j (\text{probability of being at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j)\, P(j,i)$
 $x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$
 What happens when the surfer keeps walking for a
long time?
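
The update rule can be tried out directly. The sketch below is only an illustration, assuming a hypothetical 3-node chain; `step` and `distribution_after` are names made up for this example.

```python
def step(x, P):
    """One step of the walk: returns the row vector x P."""
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]


def distribution_after(x0, P, t):
    """Distribution after t steps, i.e. x0 P^t."""
    x = list(x0)
    for _ in range(t):
        x = step(x, P)
    return x


# Hypothetical chain 0 - 1 - 2: the end nodes always step to the middle node.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
x0 = [1.0, 0.0, 0.0]             # the surfer starts at node 0
x1 = distribution_after(x0, P, 1)
x2 = distribution_after(x0, P, 2)
```

On this particular chain the surfer is at the middle node at every odd time step, so the distribution keeps alternating instead of settling down.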

16
Stationary Distribution
 When the surfer keeps walking for a long time
 When the distribution does not change anymore
 i.e. $x_{T+1} = x_T$
 For “well-behaved” graphs this does not depend on
the start distribution!!

17
What is a stationary distribution?
Intuitively and Mathematically

18
 The stationary distribution at a node is related to the
amount of time a random walker spends visiting that
node.

19
 Remember that we can write the update of the probability
distribution over the nodes as
 $x_{t+1} = x_t P$
 For the stationary distribution $v_0$ we have
 $v_0 = v_0 P$
 Whoa! That is just the left eigenvector of the
transition matrix with eigenvalue 1!

22
Talk Outline
 Random walks
 Properties
 Applications
 Pagerank
 Power iteration
 Convergence
 Rank stability

23
Interesting questions
 Does a stationary distribution always exist? Is it
unique?
 Yes, if the graph is “well-behaved”.
 What is “well-behaved”?
 We shall talk about this soon.
 How fast will the random surfer approach this
stationary distribution?
 Mixing Time!

24
Well behaved graphs
 Irreducible: There is a path from every node to every
other node.
[Figure: two example graphs, one irreducible and one not irreducible]

25
Well behaved graphs
 Aperiodic: The GCD of all cycle lengths is 1. The GCD
is also called the period.
[Figure: two example graphs, one aperiodic and one with period 3]
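
One way to check aperiodicity in practice is sketched below. It assumes the graph is irreducible (strongly connected) and given as a dictionary of adjacency lists; it uses the known fact that the period equals the GCD of `dist[u] + 1 - dist[v]` over all edges u→v, where `dist` holds BFS distances from any root. The function name `period` and the two example graphs are hypothetical.

```python
from collections import deque
from math import gcd


def period(adj, root=0):
    """Period of a strongly connected directed graph {node: [successors]}."""
    dist = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search from the root
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    g = 0
    for u in adj:
        for v in adj[u]:
            # Tree edges contribute 0; every other edge closes a cycle.
            g = gcd(g, dist[u] + 1 - dist[v])
    return g


cycle = {0: [1], 1: [2], 2: [0]}                 # a directed 3-cycle
cycle_with_chord = {0: [1, 2], 1: [2], 2: [0]}   # adds a cycle of length 2
period_cycle = period(cycle)
period_chord = period(cycle_with_chord)
```

The plain 3-cycle has period 3, while adding the edge 0→2 creates cycles of lengths 2 and 3, so the period drops to 1 and the graph becomes aperiodic.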

26
Implications of the Perron-Frobenius
Theorem
 If a Markov chain is irreducible and aperiodic then
the largest eigenvalue of the transition matrix will be
equal to 1 and all the other eigenvalues will be strictly
less than 1 in magnitude.
 Let the eigenvalues of P be $\{\sigma_i \mid i = 0, \dots, n-1\}$ in non-increasing
order of $|\sigma_i|$.
 $\sigma_0 = 1 > |\sigma_1| \ge |\sigma_2| \ge \dots \ge |\sigma_{n-1}|$
 These results imply that for a well-behaved graph
there exists a unique stationary distribution.
 More details when we discuss pagerank.

28
Some fun stuff about undirected
graphs
 A connected undirected graph is irreducible
 A connected non-bipartite undirected graph has a
stationary distribution proportional to the node
degrees: $\pi(i) = d(i) / 2m$, where m is the number of edges.
 Makes sense, since the larger the degree of a node, the
more likely a random walk is to come back to it.
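
The degree claim is easy to verify numerically. The sketch below assumes a small hypothetical graph (a triangle with one pendant node) and checks that the degree-proportional vector is unchanged by one step of the walk.

```python
# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
n = len(A)
degree = [sum(row) for row in A]
two_m = sum(degree)                       # sum of degrees = 2m

P = [[A[i][j] / degree[i] for j in range(n)] for i in range(n)]
pi = [d / two_m for d in degree]          # candidate stationary distribution

# One step of the walk starting from pi; a stationary pi must not change.
pi_next = [sum(pi[j] * P[j][i] for j in range(n)) for i in range(n)]
max_change = max(abs(a - b) for a, b in zip(pi, pi_next))
```

Here `pi` is (2/8, 2/8, 3/8, 1/8), and `max_change` is zero up to floating-point rounding.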

29
Talk Outline
 Random walks
 Properties
 Applications
 Pagerank
 Power iteration
 Convergence
 Rank stability

30
Proximity measures from random walks
 How long does it take to hit node b in a random walk
starting at node a ? Hitting time.
 How long does it take to hit node b and come back to
node a ? Commute time.
[Figure: example graph with nodes a and b marked]

31
Hitting and Commute times
 Hitting time from node i to node j
 Expected number of hops to hit node j starting at node i.
 Is not symmetric: in general h(a,b) ≠ h(b,a)
 $h(i,j) = 1 + \sum_{k \in NBS(i)} P(i,k)\, h(k,j)$ for i ≠ j, with h(j,j) = 0 and NBS(i) the set of neighbors of i
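
The recurrence for h(i,j) can be solved by simply iterating it until the values stop changing. This is a sketch under assumptions, not the method of the original talk: it assumes the target is reachable from every node, and the name `hitting_times` and the example graph are hypothetical.

```python
def hitting_times(P, target, tol=1e-12, max_sweeps=100000):
    """Expected number of hops from every node to `target`.

    Repeatedly applies h(i) = 1 + sum_k P(i,k) h(k), keeping h(target) = 0.
    """
    n = len(P)
    h = [0.0] * n
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(n):
            if i == target:
                continue  # h(target, target) stays 0
            new = 1.0 + sum(P[i][k] * h[k] for k in range(n))
            change = max(change, abs(new - h[i]))
            h[i] = new
        if change < tol:
            break
    return h


# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
P = [[0.0, 0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [1 / 3, 1 / 3, 0.0, 1 / 3],
     [0.0, 0.0, 1.0, 0.0]]
h_to_3 = hitting_times(P, target=3)
h_to_2 = hitting_times(P, target=2)
```

On this graph h(3,2) is 1, because node 3 has a single neighbor, while h(2,3) works out to 7, which illustrates the asymmetry.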

32
Hitting and Commute times
 Commute time between node i and j
 Is expected time to hit node j and come back to i
 c(i,j) = h(i,j) + h(j,i)
 Is symmetric. c(a,b) = c(b,a)

33
Relationship with Electrical
networks [1, 2]
 Consider the graph as a n-node
resistive network.
 Each edge is a resistor of 1 Ohm.
 Degree of a node is number of
neighbors
 Sum of degrees = 2*m
 m being the number of edges
1. Random Walks and Electric Networks , Doyle and Snell, 1984
2. The Electrical Resistance Of A Graph Captures Its Commute And Cover Times, Ashok K. Chandra, Prabhakar Raghavan,
Walter L. Ruzzo, Roman Smolensky, Prasoon Tiwari, 1989

34
Relationship with Electrical networks
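 Inject d(k) amps of current into
each node k
 Extract 2m amps of current from
node j.
 Now what is the voltage
difference between i and j ?
[Figure: example network with nodes i and j marked; currents of 3, 3, 2, 2, 2 and 4 amps (the node degrees) are injected and 2m = 16 amps are extracted at j]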

35
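 Whoa!! The hitting time from i to
j is exactly the voltage drop from i to j
when you inject into every node a current
equal to its degree and take out 2*m
from j!
[Figure: the same example network, with injected currents 3, 3, 2, 2, 2 and 4 amps and 16 amps extracted at j]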

36
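 Consider the neighbors of i, i.e. NBS(i)
 Using Kirchhoff's current law at node i (every edge is a 1Ω resistor, so the current along an edge equals the voltage difference across it):
$d(i) = \sum_{k \in NBS(i)} \big( \Phi(i,j) - \Phi(k,j) \big)$
where Φ(k,j) is the voltage at node k relative to node j. Dividing by d(i) and rearranging:
$\Phi(i,j) = 1 + \frac{1}{d(i)} \sum_{k \in NBS(i)} \Phi(k,j)$
 Oh wait, that’s also the definition of the
hitting time from i to j, since P(i,k) = 1/d(i):
$h(i,j) = 1 + \sum_{k \in NBS(i)} P(i,k)\, h(k,j)$
[Figure: the example network with a 1Ω resistor on every edge]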

37
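Hitting times and Laplacians
With $\phi_k$ the voltage at node k, $d_k$ its degree, and L the graph Laplacian (degrees on the diagonal, -1 for every edge), the injected currents satisfy
$$L \begin{pmatrix} \phi_0 \\ \vdots \\ \phi_i \\ \vdots \\ \phi_j \\ \vdots \\ \phi_{n-1} \end{pmatrix} = \begin{pmatrix} d_0 \\ \vdots \\ d_i \\ \vdots \\ d_j - 2m \\ \vdots \\ d_{n-1} \end{pmatrix}$$
$h(i,j) = \phi_i - \phi_j$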
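
The linear system can be solved directly once node j is grounded, i.e. its voltage is fixed to 0, which removes row and column j from the singular matrix L. The following is a sketch under assumptions: a connected undirected graph with unit resistors, a small hand-written Gaussian elimination, and the hypothetical names `solve` and `hitting_times_via_laplacian`.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def hitting_times_via_laplacian(A, j):
    """Voltages phi with phi[j] = 0, so that phi[i] equals h(i, j)."""
    n = len(A)
    d = [sum(row) for row in A]
    keep = [k for k in range(n) if k != j]
    # Reduced Laplacian: L = D - A without row and column j.
    L_red = [[(d[r] if r == c else 0.0) - A[r][c] for c in keep] for r in keep]
    # Inject d(k) amps at every remaining node k.
    phi_red = solve(L_red, [float(d[k]) for k in keep])
    phi = [0.0] * n
    for k, value in zip(keep, phi_red):
        phi[k] = value
    return phi


# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
phi = hitting_times_via_laplacian(A, j=3)
```

For this graph the voltages are 9, 9, 7 and 0, matching the hitting times to node 3 obtained from the recurrence h(i,j) = 1 + Σ P(i,k) h(k,j).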

38
[Figure: the example network with 16 = 2m amps injected at i and 16 amps extracted at j]
$c(i,j) = h(i,j) + h(j,i) = 2m \cdot R_{eff}(i,j)$ [1]
1. The Electrical Resistance Of A Graph Captures Its Commute And Cover Times, Ashok K. Chandra, Prabhakar Raghavan,
Walter L. Ruzzo, Roman Smolensky, Prasoon Tiwari, 1989

39
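Commute times and Laplacians
Injecting 2m amps at node i and extracting 2m amps at node j gives the voltages
$$L \begin{pmatrix} \phi_0 \\ \vdots \\ \phi_i \\ \vdots \\ \phi_j \\ \vdots \\ \phi_{n-1} \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 2m \\ \vdots \\ -2m \\ \vdots \\ 0 \end{pmatrix}$$
$C(i,j) = \phi_i - \phi_j = 2m\,(e_i - e_j)^T L^+ (e_i - e_j) = 2m\,(x_i - x_j)^T (x_i - x_j)$
$x_i = (L^+)^{1/2} e_i$
Here $L^+$ is the pseudo-inverse of L and $e_i$ is the i-th standard basis vector.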
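
The pseudo-inverse formula can be checked on a small graph. This sketch relies on the identity L+ = (L + J/n)^(-1) - J/n, which holds for a connected undirected graph (J is the all-ones matrix); the helper names `invert`, `laplacian_pinv` and `commute_time` are hypothetical, and the inversion is a plain Gauss-Jordan elimination meant for tiny examples only.

```python
def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[r]) + [1.0 if r == c else 0.0 for c in range(n)]
           for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def laplacian_pinv(A):
    """Pseudo-inverse of L = D - A, assuming a connected undirected graph."""
    n = len(A)
    d = [sum(row) for row in A]
    shifted = [[(d[r] if r == c else 0.0) - A[r][c] + 1.0 / n
                for c in range(n)] for r in range(n)]
    inv = invert(shifted)
    return [[inv[r][c] - 1.0 / n for c in range(n)] for r in range(n)]


def commute_time(A, Lp, i, j):
    """c(i,j) = 2m (e_i - e_j)^T L+ (e_i - e_j)."""
    two_m = sum(sum(row) for row in A)  # sum of degrees = 2m
    return two_m * (Lp[i][i] + Lp[j][j] - 2.0 * Lp[i][j])


# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
Lp = laplacian_pinv(A)
c_23 = commute_time(A, Lp, 2, 3)
c_03 = commute_time(A, Lp, 0, 3)
```

For the pendant edge between nodes 2 and 3 the effective resistance is 1Ω and m = 4, so the commute time comes out as 2m = 8; between nodes 0 and 3 the effective resistance is 5/3Ω, giving 40/3.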

40
Commute times and Laplacians
 Why is this interesting ?
 Because this gives a very intuitive way of
embedding the nodes in a Euclidean space, s.t. the
commute times are (up to the factor 2m) the squared Euclidean distances in
the transformed space. [1]
1. The Principal Components Analysis of a Graph, and its Relationships to Spectral Clustering. M. Saerens, et al, ECML ‘04

41
$L^+$: some other interesting
measures of similarity [1]
 $L^+_{ij} = x_i^T x_j$ = inner product of the position vectors
 $L^+_{ii} = x_i^T x_i$ = square of length of position vector of i
 Cosine similarity: $\dfrac{L^+_{ij}}{\sqrt{L^+_{ii}\, L^+_{jj}}}$
1. A random walks perspective on maximising satisfaction and profit. Matthew Brand, SIAM ‘05

42
Talk Outline
 Random walks
 Properties
 Applications
 Recommender Networks
 Pagerank
 Power iteration
 Convergence
 Rank stability

43
Recommender Networks [1]
1. A random walks perspective on maximising satisfaction and profit. Matthew Brand, SIAM ‘05

44
Recommender Networks
 For a customer node i define similarity as
 H(i,j)
 C(i,j)
 Or the cosine similarity $\dfrac{L^+_{ij}}{\sqrt{L^+_{ii}\, L^+_{jj}}}$
 Now the question is how to compute these quantities
quickly for very large graphs.
 Fast iterative techniques (Brand 2005)
 Fast Random Walk with Restart (Tong, Faloutsos 2006)
 Finding nearest neighbors in graphs (Sarkar, Moore 2007)

45
Ranking algorithms on the web
 HITS (Kleinberg, 1998) & Pagerank (Page & Brin,
1998)
 We will focus on Pagerank for this talk.
 A webpage is important if other important pages point to it.
 Intuitively: $v(i) = \sum_{j \to i} \dfrac{v(j)}{\deg_{out}(j)}$
 v works out to be the stationary distribution of the Markov
chain corresponding to the web.

46
Pagerank & Perron-Frobenius
 Perron-Frobenius only holds if the graph is
irreducible and aperiodic.
 But how can we guarantee that for the web graph?
 Do it with a small restart probability c.
 At any time-step the random surfer
 jumps (teleports) to a node chosen uniformly at random with probability c
 jumps to its direct neighbors with total probability 1-c.
$\tilde{P} = (1-c)\, P + c\, U$, where $U_{ij} = \frac{1}{n} \;\; \forall i,j$
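
The teleport-adjusted matrix is a one-liner to build. The sketch below assumes P is already row stochastic (no dangling nodes); the function name `add_teleport`, the value c = 0.15 and the example cycle are illustrative choices, not taken from the slides.

```python
def add_teleport(P, c):
    """Return (1 - c) P + c U, where U is the n x n matrix with all entries 1/n."""
    n = len(P)
    return [[(1.0 - c) * P[i][j] + c / n for j in range(n)] for i in range(n)]


# Hypothetical directed 3-cycle: irreducible but periodic (period 3).
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
P_tilde = add_teleport(P, c=0.15)
```

Every entry of `P_tilde` is strictly positive, so the adjusted chain is both irreducible and aperiodic even though the original cycle is periodic.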

47
Power iteration
 Power Iteration is an algorithm for computing the
stationary distribution.
 Start with any distribution $x_0$
 Keep computing $x_{t+1} = x_t P$
 Stop when $x_{t+1}$ and $x_t$ are almost the same.
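
The three steps above can be sketched as follows. This is a minimal version assuming a row-stochastic P whose chain is irreducible and aperiodic; the name `power_iteration`, the L1 stopping rule and the tolerance are assumptions made for the example.

```python
def power_iteration(P, x0=None, tol=1e-12, max_iter=10000):
    """Iterate x <- x P until the L1 change between iterates drops below tol."""
    n = len(P)
    x = list(x0) if x0 is not None else [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2
# (connected and non-bipartite, so the walk converges).
P = [[0.0, 0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [1 / 3, 1 / 3, 0.0, 1 / 3],
     [0.0, 0.0, 1.0, 0.0]]
v_from_node0 = power_iteration(P, x0=[1.0, 0.0, 0.0, 0.0])
v_from_uniform = power_iteration(P)
```

Both starting points end up at the same vector, (2/8, 2/8, 3/8, 1/8), which is the degree-proportional distribution of this undirected graph.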

48
Power iteration
 Why should this work?
 Write $x_0$ as a linear combination of the left
eigenvectors $\{v_0, v_1, \dots, v_{n-1}\}$ of P
 Remember that $v_0$ is the stationary distribution.
 $x_0 = c_0 v_0 + c_1 v_1 + c_2 v_2 + \dots + c_{n-1} v_{n-1}$
$c_0 = 1$. Why? See the “Convergence Issues” slide at the end of the deck.

50
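Power iteration
Track the coefficient of each left eigenvector $v_0, v_1, \dots, v_{n-1}$ as the walk proceeds:
$x_0 = v_0 + c_1 v_1 + c_2 v_2 + \dots + c_{n-1} v_{n-1}$
$x_1 = x_0 \tilde{P} = \sigma_0 v_0 + \sigma_1 c_1 v_1 + \sigma_2 c_2 v_2 + \dots + \sigma_{n-1} c_{n-1} v_{n-1}$
$x_2 = x_1 \tilde{P} = x_0 \tilde{P}^2 = \sigma_0^2 v_0 + \sigma_1^2 c_1 v_1 + \sigma_2^2 c_2 v_2 + \dots + \sigma_{n-1}^2 c_{n-1} v_{n-1}$
$x_t = x_0 \tilde{P}^t = \sigma_0^t v_0 + \sigma_1^t c_1 v_1 + \sigma_2^t c_2 v_2 + \dots + \sigma_{n-1}^t c_{n-1} v_{n-1}$
Since $\sigma_0 = 1 > |\sigma_1| \ge \dots \ge |\sigma_{n-1}|$, this is
$x_t = v_0 + \sigma_1^t c_1 v_1 + \sigma_2^t c_2 v_2 + \dots + \sigma_{n-1}^t c_{n-1} v_{n-1}$
and every coefficient except the first goes to 0 as t grows:
$x_\infty = v_0$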

56
Convergence Issues
 Formally $\|x_0 P^t - v_0\| \le C\,|\lambda|^t$ for some constant C independent of t (assuming P is diagonalizable)
 λ is the eigenvalue with second largest magnitude
 The smaller the second largest eigenvalue (in
magnitude), the faster the mixing.
 For |λ| < 1 there exists a unique stationary distribution,
namely the first left eigenvector of the transition
matrix.

57
Pagerank and convergence
 The transition matrix pagerank really uses is $\tilde{P} = (1-c)\, P + c\, U$
 The second largest eigenvalue of $\tilde{P}$ can be proven [1] to
be ≤ (1-c)
 Nice! This means pagerank computation will converge
fast: the error shrinks roughly like $(1-c)^t$.
1. The Second Eigenvalue of the Google Matrix, Taher H. Haveliwala and Sepandar D. Kamvar, Stanford University Technical Report,
2003.

58
Pagerank
 We are looking for the vector v s.t. $v = (1-c)\, v P + c\, r$
 r is a distribution over web-pages.
 If r is the uniform distribution we get pagerank.
 What happens if r is non-uniform? Personalization

60
Personalized Pagerank [1, 2, 3]
 The only difference is that we use a non-uniform
teleportation distribution, i.e. at any time step
teleport to a set of webpages.
 In other words we are looking for the vector v s.t. $v = (1-c)\, v P + c\, r$
 r is a non-uniform preference vector specific to a
user.
 v gives “personalized views” of the web.
1. Scaling Personalized Web Search, Jeh, Widom. 2003
2. Topic-sensitive PageRank, Haveliwala, 2001
3. Towards scaling fully personalized pagerank, D. Fogaras and B. Racz, 2004

61
Personalized Pagerank
 Pre-computation: r is not known beforehand
 Computing during query time takes too long
 A crucial observation [1] is that the personalized
pagerank vector is linear w.r.t. r:
$$r = \begin{pmatrix} \alpha \\ 0 \\ 1-\alpha \end{pmatrix},\quad r_0 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\quad r_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \;\Rightarrow\; v(r) = \alpha\, v(r_0) + (1-\alpha)\, v(r_2)$$
1. Scaling Personalized Web Search, Jeh, Widom. 2003
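
The linearity property can be confirmed numerically. The sketch below assumes a hypothetical 3-node directed graph, c = 0.15 and α = 0.3; `personalized_pagerank` is an illustrative fixed-point iteration of v = (1-c) v P + c r, not the precomputation scheme of Jeh and Widom.

```python
def personalized_pagerank(P, r, c, tol=1e-12, max_iter=10000):
    """Iterate v <- (1 - c) v P + c r until v stops changing."""
    n = len(P)
    v = list(r)
    for _ in range(max_iter):
        nxt = [(1.0 - c) * sum(v[j] * P[j][i] for j in range(n)) + c * r[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


# Hypothetical 3-node directed graph with a row-stochastic transition matrix.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
c, alpha = 0.15, 0.3

v_r0 = personalized_pagerank(P, [1.0, 0.0, 0.0], c)   # always teleport to node 0
v_r2 = personalized_pagerank(P, [0.0, 0.0, 1.0], c)   # always teleport to node 2
v_mix = personalized_pagerank(P, [alpha, 0.0, 1.0 - alpha], c)

# Linear combination of the two precomputed vectors.
combo = [alpha * a + (1.0 - alpha) * b for a, b in zip(v_r0, v_r2)]
max_gap = max(abs(a - b) for a, b in zip(v_mix, combo))
```

`max_gap` is zero up to the iteration tolerance, so the vector for a mixed preference can be assembled from vectors computed offline for the basis preferences.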

62
Topic-sensitive pagerank (Haveliwala’01)
 Divide the webpages into 16 broad categories
 For each category compute the biased personalized
pagerank vector by uniformly teleporting to websites
under that category.
 At query time the probability of the query being from
any of the above classes is computed, and the final
page-rank vector is computed by a linear combination
of the biased pagerank vectors computed offline.

63
Personalized Pagerank: Other
Approaches
 Scaling Personalized Web Search (Jeh & Widom ’03)
 Towards scaling fully personalized pagerank:
algorithms, lower bounds and experiments (Fogaras et
al, 2004)
 Dynamic personalized pagerank in entity-relation
graphs. (Soumen Chakrabarti, 2007)

64
Personalized Pagerank (Purna’s Take)
 But what's the guarantee that the new transition matrix will still
be irreducible?
 Check out
 The Second Eigenvalue of the Google Matrix, Taher H. Haveliwala
and Sepandar D. Kamvar, Stanford University Technical Report,
2003.
 Deeper Inside PageRank, Amy N. Langville and Carl D. Meyer.
Internet Mathematics, 2004.
 As long as you are adding any rank one matrix of the form $\mathbf{1}^T r$
(a matrix in which every row equals r) to your
transition matrix as shown before,
 the second largest eigenvalue satisfies |λ| ≤ 1-c

65
Talk Outline
 Random walks
 Properties
 Applications
 Recommender Networks
 Pagerank
 Power iteration
 Convergence
 Rank stability

66
Rank stability
 How does the ranking change when the link structure
changes?
 The web-graph is changing continuously.
 How does that affect page-rank?

67
Rank stability [1]
(On the Machine Learning papers
from the CORA [2]
database)
[Figure: paper rankings compared. Panels: rank on 5 perturbed datasets obtained by deleting 30% of the papers; rank on the entire database.]
1. Link analysis, eigenvectors, and stability, Andrew Y. Ng, Alice X. Zheng and Michael Jordan, IJCAI-01
2. Automating the construction of Internet portals with machine learning, A. McCallum, K. Nigam, J. Rennie, K. Seymore, in
Information Retrieval Journal, 2000

68
Rank stability
 Ng et al 2001:
 Theorem: let v be the left eigenvector of $\tilde{P} = (1-c)\, P + c\, U$. Let the
pages $i_1, i_2, \dots, i_k$ be changed in any way, and let v’ be
the new pagerank. Then
$$\|v - v'\|_1 \le \frac{\sum_{j=1}^{k} v(i_j)}{c}$$
 So if c is not too close to 0, the system would be rank
stable and also converge fast!

69
Conclusion
 Random walks
 Properties
 Applications
 Pagerank
 Power iteration
 Convergence
 Rank stability

70
Thanks!
Please visit my page at www.rnabati.com

71
Acknowledgements
 Andrew Moore
 Gary Miller
 Check out Gary’s Fall 2007 class on “Spectral Graph Theory,
Scientific Computing, and Biomedical Applications”
 http://www.cs.cmu.edu/afs/cs/user/glmiller/public/Scientific-Computing/F-
 Fan Chung Graham’s course on
 Random Walks on Directed and Undirected Graphs
 http://www.math.ucsd.edu/~phorn/math261/
 Random Walks on Graphs: A Survey, László Lovász
 Reversible Markov Chains and Random Walks on Graphs, D
Aldous, J Fill
 Random Walks and Electric Networks, Doyle & Snell

72
Convergence Issues [1]
 Let's look at the vectors $x_t$ for t=1,2,…
 Write $x_0$ as a linear combination of the left eigenvectors of
P
 $x_0 = c_0 v_0 + c_1 v_1 + c_2 v_2 + \dots + c_{n-1} v_{n-1}$
$c_0 = 1$. WHY?
Remember that $\mathbf{1}^T$ (the all-ones column vector) is the right eigenvector of P with
eigenvalue 1, since P is stochastic, i.e. $P\,\mathbf{1}^T = \mathbf{1}^T$. Hence
$v_i \mathbf{1}^T = 0$ if i ≠ 0, because $v_i \mathbf{1}^T = v_i P\,\mathbf{1}^T = \sigma_i\, v_i \mathbf{1}^T$ and $\sigma_i \ne 1$.
$1 = x_0 \mathbf{1}^T = c_0\, v_0 \mathbf{1}^T = c_0$, since $v_0$ and $x_0$ are both
distributions.
1. We are assuming that P is diagonalizable. The non-diagonalizable case is trickier; you can take
a look at Fan Chung Graham’s class notes (the link is in the acknowledgements section).
