randomwalk.ppt

1
Random Walks on Graphs:
An Overview
Purnamrita Sarkar

2
Motivation: Link prediction in social
networks

3
Motivation: Basis for recommendation

4
Motivation: Personalized search

5
Why graphs?
 The underlying data is naturally a graph
 Papers linked by citation
 Authors linked by co-authorship
 Bipartite graph of customers and products
 Web-graph
 Friendship networks: who knows whom

6
What are we looking for
 Rank nodes for a particular query
 Top k matches for “Random Walks” from Citeseer
 Who are the most likely co-authors of “Manuel
Blum”?
 Top k book recommendations for Purna from
Amazon
 Top k websites matching “Sound of Music”
 Top k friend recommendations for Purna when she
joins “Facebook”

7
Talk Outline
 Basic definitions
 Random walks
 Stationary distributions
 Properties
 Perron-Frobenius theorem
 Electrical networks, hitting and commute times
 Euclidean Embedding
 Applications
 Pagerank
 Power iteration
 Convergence
 Personalized pagerank
 Rank stability

8
Definitions
 nxn Adjacency matrix A.
 A(i,j) = weight on edge from i to j
 If the graph is undirected A(i,j)=A(j,i), i.e. A is symmetric
 nxn Transition matrix P.
 P is row stochastic
 P(i,j) = probability of stepping to node j from node i
= A(i,j) / ∑_k A(i,k)
 nxn Laplacian matrix L.
 L = D − A, where D is diagonal with D(i,i) = ∑_k A(i,k)
 Symmetric positive semi-definite for undirected graphs
 Singular
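A minimal sketch of these three matrices, assuming NumPy and a small hypothetical undirected graph (a triangle 0-1-2 with an extra node 3 attached to node 2); the graph is an illustration, not the one drawn on the slides.

```python
import numpy as np

# Hypothetical 4-node undirected graph: triangle 0-1-2 plus node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = A.sum(axis=1)        # degree of each node (row sums of A)
P = A / d[:, None]       # P(i,j) = A(i,j) / sum_k A(i,k), so P is row stochastic
L = np.diag(d) - A       # Laplacian L = D - A

print(P.sum(axis=1))                     # every row of P sums to 1
print(np.linalg.eigvalsh(L).round(6))    # smallest eigenvalue is 0, so L is singular
```

The all-ones vector is in the null space of L, which is why L is singular for every graph.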

9
Definitions
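[Figure: the same example graph drawn twice. Left: adjacency matrix A, every edge has weight 1. Right: transition matrix P, with edge probabilities 1, 1/2, 1/2, 1.]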

10
What is a random walk
[Figure: a random walk on the example graph (edge probabilities 1, 1/2, 1/2, 1), shown step by step at t=0, t=1, t=2 and t=3.]

14
Probability Distributions
 $x_t(i)$ = probability that the surfer is at node i at time t
 $x_{t+1}(i) = \sum_j$ (probability of being at node j) * Pr(j -> i)
$= \sum_j x_t(j)\, P(j,i)$
 $x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$
 What happens when the surfer keeps walking for a
long time?
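The update rule can be tried out directly. This is a small sketch assuming NumPy and a hypothetical 4-node graph; the start node and the number of steps are arbitrary choices.

```python
import numpy as np

# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)

x = np.array([1.0, 0.0, 0.0, 0.0])   # x_0: the surfer starts at node 0
history = [x]
for t in range(50):
    x = x @ P                        # x_{t+1} = x_t P (x is a row vector)
    history.append(x)

# Unrolling the recursion: x_t = x_0 P^t.
x_direct = history[0] @ np.linalg.matrix_power(P, 50)

print(history[1])             # after one step: half on node 1, half on node 2
print(history[50].round(4))   # after many steps the vector barely changes
```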

15
Stationary Distribution
 When the surfer keeps walking for a long time
 When the distribution does not change anymore
 i.e. xT+1 = xT
 For “well-behaved” graphs this does not depend on
the start distribution!!

16
What is a stationary distribution?
Intuitively and Mathematically

17
 The stationary distribution at a node is related to the
amount of time a random walker spends visiting that
node.

18
 Remember that we can write the probability
distribution over the nodes as
 $x_{t+1} = x_t P$

20
 $x_{t+1} = x_t P$
 For the stationary distribution $v_0$ we have
 $v_0 = v_0 P$
 Whoa! That’s just the left eigenvector of the
transition matrix with eigenvalue 1!
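A short sketch of reading the stationary distribution off an eigen-decomposition, assuming NumPy and a hypothetical 4-node graph. Left eigenvectors of P are obtained as right eigenvectors of the transpose of P.

```python
import numpy as np

# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)

# v P = v  is the same as  P^T v^T = v^T.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))   # index of the eigenvalue closest to 1
v0 = np.real(eigvecs[:, k])
v0 = v0 / v0.sum()                     # rescale so the entries sum to 1

print(v0)            # the stationary distribution
print(v0 @ P - v0)   # approximately zero, i.e. v0 = v0 P
```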

21
Talk Outline
 Random walks
 Properties
 Applications
 Pagerank
 Power iteration
 Convergence
 Rank stability

22
Interesting questions
 Does a stationary distribution always exist? Is it
unique?
 Yes, if the graph is “well-behaved”.
 What is “well-behaved”?
 We shall talk about this soon.
 How fast will the random surfer approach this
stationary distribution?
 Mixing Time!

23
Well behaved graphs
 Irreducible: There is a path from every node to every
other node.
[Figure: left, an irreducible graph; right, a graph that is not irreducible.]

24
Well behaved graphs
 Aperiodic: The GCD of all cycle lengths is 1. The GCD
is also called the period.
[Figure: left, an aperiodic graph; right, a graph with period 3.]

26
Implications of the Perron Frobenius
Theorem
 If a Markov chain is irreducible and aperiodic then
the largest eigenvalue of the transition matrix will be
equal to 1 and all the other eigenvalues will be strictly
less than 1 in magnitude.
 Let the eigenvalues of P be $\{\sigma_i \mid i = 0, \dots, n-1\}$ in non-increasing
order of $|\sigma_i|$.
 $\sigma_0 = 1 > |\sigma_1| \ge |\sigma_2| \ge \dots \ge |\sigma_{n-1}|$
 These results imply that for a well-behaved graph
there exists a unique stationary distribution.
 More details when we discuss pagerank.

27
Some fun stuff about undirected
graphs
 A connected undirected graph is irreducible
 A connected non-bipartite undirected graph has a
stationary distribution proportional to the node
degrees: $v_0(i) = d(i) / 2m$, where m is the number of edges!
 Makes sense, since the larger the degree of a node, the
more likely a random walk is to come back to it.
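A small check of the degree claim, and of why "non-bipartite" matters, assuming NumPy and two hypothetical graphs. The helper `walk` is an illustrative name, not something from the talk.

```python
import numpy as np

def walk(P, x0, steps):
    """Apply x_{t+1} = x_t P for the given number of steps."""
    x = x0
    for _ in range(steps):
        x = x @ P
    return x

start = np.array([1.0, 0.0, 0.0, 0.0])

# Hypothetical connected, non-bipartite graph (it contains the triangle 0-1-2).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
P = A / d[:, None]
pi = d / d.sum()                      # degrees divided by 2m
print(pi, walk(P, start, 200))        # the walk converges to pi

# Hypothetical bipartite graph: the 4-cycle 0-1-2-3-0.
B = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
PB = B / B.sum(axis=1, keepdims=True)
piB = B.sum(axis=1) / B.sum()
# piB still satisfies piB = piB PB, but the walk keeps alternating between the two sides.
print(walk(PB, start, 200), walk(PB, start, 201))
```

In the bipartite case the chain has period 2, so the degree vector is a fixed point but a walk started at a single node never settles on it.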

28
Talk Outline
 Random walks
 Properties
 Applications
 Pagerank
 Power iteration
 Convergence
 Rank stability

29
Proximity measures from random walks
 How long does it take to hit node b in a random walk
starting at node a ? Hitting time.
 How long does it take to hit node b and come back to
node a ? Commute time.
[Figure: example graph with two marked nodes a and b.]

30
Hitting and Commute times
 Hitting time from node i to node j
 Expected number of hops to hit node j starting at node i.
 Is not symmetric: in general h(a,b) ≠ h(b,a)
 $h(i,j) = 1 + \sum_{k \in NBS(i)} P(i,k)\, h(k,j)$ for i ≠ j, with h(j,j) = 0

31
Hitting and Commute times
 Commute time between node i and j
 Is expected time to hit node j and come back to i
 c(i,j) = h(i,j) + h(j,i)
 Is symmetric. c(a,b) = c(b,a)
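The hitting-time recursion is a linear system, so for a small graph it can be solved exactly. A sketch assuming NumPy and a hypothetical 4-node graph; `hitting_times` is an illustrative helper name.

```python
import numpy as np

def hitting_times(P):
    """Return H with H[i, j] = expected number of steps to first reach j from i."""
    n = P.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        others = [i for i in range(n) if i != j]
        # For i != j: h(i,j) = 1 + sum_{k != j} P(i,k) h(k,j), because h(j,j) = 0.
        Q = P[np.ix_(others, others)]
        H[others, j] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return H

# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)

H = hitting_times(P)
C = H + H.T                 # commute times c(i,j) = h(i,j) + h(j,i)

print(H[3, 2], H[2, 3])     # 1 step from 3 to 2, but 7 steps on average from 2 to 3
print(C[2, 3])              # commute time between 2 and 3
```

Node 3 has a single neighbor, so the walk reaches node 2 in exactly one step, while the way back can wander around the triangle first; this is the asymmetry of hitting times.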

32
Relationship with Electrical
networks [1,2]
 Consider the graph as an n-node
resistive network.
 Each edge is a resistor of 1 Ohm.
 Degree of a node is number of
neighbors
 Sum of degrees = 2*m
 m being the number of edges
1. Random Walks and Electric Networks, Doyle and Snell, 1984
2. The Electrical Resistance of a Graph Captures its Commute and Cover Times, Ashok K. Chandra, Prabhakar Raghavan,
Walter L. Ruzzo, Roman Smolensky, Prasoon Tiwari, 1989

33
Relationship with Electrical networks
 Inject d(i) amp current in
each node
 Extract 2m amp current from
node j.
 Now what is the voltage
difference between i and j ?
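[Figure: example network with nodes i and j and 2m = 16. The injected currents equal the node degrees (3, 3, 2, 2, 2, 4), and 16 amps are extracted at j.]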

34
 Whoa!! The hitting time from i to
j is exactly the voltage drop from i to j
when you inject d(k) amps of current into
every node k and take out 2*m
from j!
[Figure: the same network, with currents 3, 3, 2, 2, 2, 4 injected and 16 extracted at j.]

35
 Consider neighbors of i, i.e. NBS(i)
 Using Kirchhoff's law
$d(i) = \sum_{k \in NBS(i)} \left( \Phi(i,j) - \Phi(k,j) \right)$
$\Phi(i,j) = 1 + \frac{1}{d(i)} \sum_{k \in NBS(i)} \Phi(k,j)$
 Oh wait, that’s also the definition of
hitting time from i to j!
$h(i,j) = 1 + \sum_{k \in NBS(i)} P(i,k)\, h(k,j)$
[Figure: the same network with 1Ω resistors on the edges, currents 3, 3, 2, 2, 2, 4 injected and 16 extracted at j.]

36
Hitting times and Laplacians
$L \Phi = b$, where
$\Phi = (\Phi_0, \dots, \Phi_i, \dots, \Phi_j, \dots, \Phi_{n-1})^T$
$b = (d_0, \dots, d_i, \dots, d_j - 2m, \dots, d_{n-1})^T$
Row i of L has $d_i$ on the diagonal and -1 in the column of each neighbor of i (likewise for row j).
$h(i,j) = \Phi_i - \Phi_j$
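A sketch of the electrical recipe, assuming NumPy and a hypothetical 4-node graph. Because L is singular, the system is solved with the pseudo-inverse (an implementation choice made here, not something stated on the slide); voltages are only defined up to an additive constant, which cancels in the difference. `hitting_times_to` is an illustrative helper name.

```python
import numpy as np

# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.diag(d) - A
two_m = d.sum()                      # sum of degrees = 2m

def hitting_times_to(j):
    """Hitting times h(i, j) for every i, from the voltages of the current-injection problem."""
    b = d.copy()                     # inject d(i) amps at every node i
    b[j] -= two_m                    # extract 2m amps at node j
    phi = np.linalg.pinv(L) @ b      # one solution of L phi = b
    return phi - phi[j]              # h(i, j) = phi_i - phi_j

h_to_3 = hitting_times_to(3)
print(h_to_3.round(6))               # approximately [9, 9, 7, 0]
```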

37
[Figure: the example network with nodes i and j and currents of 16 = 2m, illustrating h(i,j) + h(j,i).]
$c(i,j) = h(i,j) + h(j,i) = 2m \cdot R_{eff}(i,j)$ [1]
1. The Electrical Resistance of a Graph Captures its Commute and Cover Times, Ashok K. Chandra, Prabhakar Raghavan,
Walter L. Ruzzo, Roman Smolensky, Prasoon Tiwari, 1989

38
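Commute times and Laplacians
$C(i,j) = \Phi_i - \Phi_j$
$= 2m\, (e_i - e_j)^T L^+ (e_i - e_j)$
$= 2m\, (x_i - x_j)^T (x_i - x_j)$
$x_i = (L^+)^{1/2} e_i$
Here $L^+$ is the pseudo-inverse of L, and Φ solves $L \Phi = b$ with
$\Phi = (\Phi_0, \dots, \Phi_i, \dots, \Phi_j, \dots, \Phi_{n-1})^T$
$b = (0, \dots, 2m, \dots, -2m, \dots, 0)^T$ (2m in position i, -2m in position j)
Row i of L has $d_i$ on the diagonal and -1 in the column of each neighbor of i (likewise for row j).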
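A sketch of the commute-time formula and the embedding it induces, assuming NumPy and a hypothetical 4-node graph. The symmetric square root of the pseudo-inverse is built from its eigen-decomposition; the function names are illustrative.

```python
import numpy as np

# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.diag(d) - A
two_m = d.sum()
Lp = np.linalg.pinv(L)                     # pseudo-inverse L+

def commute_time(i, j):
    """c(i,j) = 2m (e_i - e_j)^T L+ (e_i - e_j)."""
    e = np.zeros(len(d))
    e[i], e[j] = 1.0, -1.0
    return two_m * (e @ Lp @ e)

# Embedding x_i = (L+)^{1/2} e_i: column i of the symmetric square root of L+.
w, U = np.linalg.eigh(Lp)
w = np.clip(w, 0.0, None)                  # remove tiny negative round-off
X = (U * np.sqrt(w)) @ U.T

def squared_distance(i, j):
    diff = X[:, i] - X[:, j]
    return diff @ diff

def cosine_similarity(i, j):
    return Lp[i, j] / np.sqrt(Lp[i, i] * Lp[j, j])

print(commute_time(2, 3), two_m * squared_distance(2, 3))   # both about 8
print(cosine_similarity(0, 1))
```

The edge 2-3 is a bridge with resistance 1, so the commute time is 2m * 1 = 8 in this example.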

39
Commute times and Laplacians
 Why is this interesting?
 Because this gives a very intuitive way of
embedding the nodes in a Euclidean space, s.t. the
commute times are (up to the factor 2m) the squared
Euclidean distances in the transformed space. [1]
1. The Principal Components Analysis of a Graph, and its Relationships to Spectral Clustering. M. Saerens, et al, ECML ‘04

40
L+ : some other interesting
measures of similarity [1]
 $L^+_{ij} = x_i^T x_j$ = inner product of the position vectors
 $L^+_{ii} = x_i^T x_i$ = square of length of position vector of i
 Cosine similarity: $\frac{l^+_{ij}}{\sqrt{l^+_{ii}\, l^+_{jj}}}$
1. A random walks perspective on maximising satisfaction and profit. Matthew Brand, SIAM ‘05

41
Talk Outline
 Random walks
 Properties
 Applications
 Recommender Networks
 Pagerank
 Power iteration
 Convergence
 Rank stability

42
Recommender Networks [1]
1. A random walks perspective on maximising satisfaction and profit. Matthew Brand, SIAM ‘05

43
Recommender Networks
 For a customer node i define similarity as
 H(i,j)
 C(i,j)
 Or the cosine similarity $\frac{L^+_{ij}}{\sqrt{L^+_{ii}\, L^+_{jj}}}$
 Now the question is how to compute these quantities
quickly for very large graphs.
 Fast iterative techniques (Brand 2005)
 Fast Random Walk with Restart (Tong, Faloutsos 2006)
 Finding nearest neighbors in graphs (Sarkar, Moore 2007)

44
Ranking algorithms on the web
 HITS (Kleinberg, 1998) & Pagerank (Page & Brin,
1998)
 We will focus on Pagerank for this talk.
 A webpage is important if other important pages point to it.
 Intuitively
$v(i) = \sum_{j \to i} \frac{v(j)}{\deg^{out}(j)}$
 v works out to be the stationary distribution of the Markov
chain corresponding to the web.

45
Pagerank & Perron-Frobenius
 Perron-Frobenius only holds if the graph is
irreducible and aperiodic.
 But how can we guarantee that for the web graph?
 Do it with a small restart probability c.
 At any time-step the random surfer
 jumps (teleports) to any other node with probability c
 jumps to its direct neighbors with total probability 1-c.
$\tilde{P} = (1-c)\, P + c\, U$
$U_{ij} = \frac{1}{n} \;\; \forall i, j$

46
Power iteration
 Power Iteration is an algorithm for computing the
stationary distribution.
 Start with any distribution x0
 Keep computing $x_{t+1} = x_t P$
 Stop when xt+1 and xt are almost the same.
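A minimal sketch of power iteration on the teleporting chain (1-c) P + c U, assuming NumPy, a hypothetical 4-page web graph, c = 0.15, and an L1 stopping test; the tolerance and the function name `pagerank` are illustrative choices.

```python
import numpy as np

def pagerank(P, c=0.15, tol=1e-10, max_iter=1000):
    """Power iteration on P_tilde = (1 - c) P + c U, starting from the uniform distribution."""
    n = P.shape[0]
    P_tilde = (1 - c) * P + c * np.full((n, n), 1.0 / n)
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_next = x @ P_tilde                   # x_{t+1} = x_t P_tilde
        if np.abs(x_next - x).sum() < tol:     # stop when x_{t+1} and x_t almost agree
            return x_next
        x = x_next
    return x

# Hypothetical web graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2 (nobody links to page 3).
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

v = pagerank(P)
print(v.round(4))   # page 3 only receives teleport mass, so v[3] is about c / n
```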

48
Power iteration
 Why should this work?
 Write $x_0$ as a linear combination of the left
eigenvectors $\{v_0, v_1, \dots, v_{n-1}\}$ of P
 Remember that $v_0$ is the stationary distribution.
 $x_0 = c_0 v_0 + c_1 v_1 + c_2 v_2 + \dots + c_{n-1} v_{n-1}$
$c_0 = 1$. WHY? (slide 71)

49
Power iteration
$x_0 = 1 \cdot v_0 + c_1 v_1 + c_2 v_2 + \dots + c_{n-1} v_{n-1}$

50
Power iteration
$x_1 = x_0 \tilde{P} = \sigma_0 v_0 + \sigma_1 c_1 v_1 + \sigma_2 c_2 v_2 + \dots + \sigma_{n-1} c_{n-1} v_{n-1}$

51
Power iteration
$x_2 = x_1 \tilde{P} = x_0 \tilde{P}^2 = \sigma_0^2 v_0 + \sigma_1^2 c_1 v_1 + \sigma_2^2 c_2 v_2 + \dots + \sigma_{n-1}^2 c_{n-1} v_{n-1}$


52
Power iteration
$x_t = x_0 \tilde{P}^t = \sigma_0^t v_0 + \sigma_1^t c_1 v_1 + \sigma_2^t c_2 v_2 + \dots + \sigma_{n-1}^t c_{n-1} v_{n-1}$

53
Power iteration
$x_t = x_0 \tilde{P}^t = 1 \cdot v_0 + \sigma_1^t c_1 v_1 + \sigma_2^t c_2 v_2 + \dots + \sigma_{n-1}^t c_{n-1} v_{n-1}$
$\sigma_0 = 1 > |\sigma_1| \ge \dots \ge |\sigma_{n-1}|$

54
Power iteration
$x_\infty = 1 \cdot v_0 + 0 \cdot v_1 + 0 \cdot v_2 + \dots + 0 \cdot v_{n-1} = v_0$
$\sigma_0 = 1 > |\sigma_1| \ge \dots \ge |\sigma_{n-1}|$

55
Convergence Issues
 Formally $\|x_0 P^t - v_0\| \le C\, |\lambda|^t$ for some constant C
 λ is the eigenvalue with second largest magnitude
 The smaller the second largest eigenvalue (in
magnitude), the faster the mixing.
 For |λ| < 1 there exists a unique stationary distribution,
namely the first left eigenvector of the transition
matrix.

56
Pagerank and convergence
 The transition matrix pagerank really uses is
$\tilde{P} = (1-c)\, P + c\, U$
 The second largest eigenvalue of $\tilde{P}$ can be proven [1]
to be ≤ (1-c) in magnitude
 Nice! This means pagerank computation will converge
fast.
1. The Second Eigenvalue of the Google Matrix, Taher H. Haveliwala and Sepandar D. Kamvar, Stanford University Technical
Report, 2003.
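The eigenvalue bound can be checked numerically. This sketch assumes NumPy, c = 0.15, and a hypothetical directed 3-cycle, which is periodic, so without teleportation all of its eigenvalues have magnitude 1.

```python
import numpy as np

# Hypothetical directed 3-cycle 0 -> 1 -> 2 -> 0 (irreducible but periodic).
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
c = 0.15
n = P.shape[0]
P_tilde = (1 - c) * P + c * np.full((n, n), 1.0 / n)

# Eigenvalue magnitudes, largest first.
mags_P = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
mags_P_tilde = np.sort(np.abs(np.linalg.eigvals(P_tilde)))[::-1]

print(mags_P)         # all equal to 1: plain power iteration would not converge
print(mags_P_tilde)   # the largest is 1, the others are about 1 - c
```

In this example the bound is attained, so 1 - c is the best guarantee one can state without knowing more about the graph.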



58
Pagerank
 We are looking for the vector v s.t.
$v = (1-c)\, v P + c\, r$
 r is a distribution over web-pages.
 If r is the uniform distribution we get pagerank.
 What happens if r is non-uniform?
Personalization

59
Personalized Pagerank [1,2,3]
 The only difference is that we use a non-uniform
teleportation distribution, i.e. at any time step
teleport to a set of webpages.
 In other words we are looking for the vector v s.t.
$v = (1-c)\, v P + c\, r$
 r is a non-uniform preference vector specific to a
user.
 v gives “personalized views” of the web.
1. Scaling Personalized Web Search, Jeh, Widom. 2003
2. Topic-sensitive PageRank, Haveliwala, 2001
3. Towards scaling fully personalized pagerank, D. Fogaras and B. Racz, 2004

60
Personalized Pagerank
 Pre-computation: r is not known from before
 Computing during query time takes too long
 A crucial observation [1] is that the personalized
pagerank vector is linear w.r.t. r
$r_0 = (1, 0, 0), \quad r_2 = (0, 0, 1)$
$v(\alpha r_0 + (1-\alpha) r_2) = \alpha\, v(r_0) + (1-\alpha)\, v(r_2)$
1. Scaling Personalized Web Search, Jeh, Widom. 2003
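A sketch of the linearity property, assuming NumPy, c = 0.15, a hypothetical 4-page web graph, and a mixing weight of 0.3 chosen only for illustration. The personalized vector is obtained by solving v = (1-c) v P + c r exactly as a linear system rather than by iteration; `personalized_pagerank` is an illustrative name.

```python
import numpy as np

def personalized_pagerank(P, r, c=0.15):
    """Solve v = (1 - c) v P + c r for the row vector v."""
    n = P.shape[0]
    # v (I - (1 - c) P) = c r   is the same as   (I - (1 - c) P)^T v^T = c r^T
    return np.linalg.solve((np.eye(n) - (1 - c) * P).T, c * r)

# Hypothetical web graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

r0 = np.array([1.0, 0.0, 0.0, 0.0])   # always teleport to page 0
r2 = np.array([0.0, 0.0, 1.0, 0.0])   # always teleport to page 2
alpha = 0.3

mixed = personalized_pagerank(P, alpha * r0 + (1 - alpha) * r2)
combined = alpha * personalized_pagerank(P, r0) + (1 - alpha) * personalized_pagerank(P, r2)
print(mixed.round(4), combined.round(4))   # the two vectors agree
```

This is what makes pre-computation possible: vectors for a few basis preferences can be computed offline and combined at query time.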




61
Topic-sensitive pagerank (Haveliwala’01)
 Divide the webpages into 16 broad categories
 For each category compute the biased personalized
pagerank vector by uniformly teleporting to websites
under that category.
 At query time the probability of the query being from
any of the above classes is computed, and the final
page-rank vector is computed by a linear combination
of the biased pagerank vectors computed offline.

62
Personalized Pagerank: Other
Approaches
 Scaling Personalized Web Search (Jeh & Widom ’03)
 Towards scaling fully personalized pagerank:
algorithms, lower bounds and experiments (Fogaras et
al, 2004)
 Dynamic personalized pagerank in entity-relation
graphs. (Soumen Chakrabarti, 2007)

63
Personalized Pagerank (Purna’s Take)
 But what’s the guarantee that the new transition matrix will still
be irreducible?
 Check out
 The Second Eigenvalue of the Google Matrix, Taher H. Haveliwala
and Sepandar D. Kamvar, Stanford University Technical Report,
2003.
 Deeper Inside PageRank, Amy N. Langville and Carl D. Meyer.
Internet Mathematics, 2004.
 As long as you are adding any rank-one matrix of the form $\mathbf{1}^T r$ (a matrix
whose rows all equal the same row r) to your
transition matrix as shown before,
 |λ| ≤ 1-c

64
Talk Outline
 Random walks
 Properties
 Applications
 Recommender Networks
 Pagerank
 Power iteration
 Convergence
 Rank stability

65
Rank stability
 How does the ranking change when the link structure
changes?
 The web-graph is changing continuously.
 How does that affect page-rank?

66
Rank stability [1] (on the Machine Learning papers
from the CORA [2] database)
[Figure: two ranked lists. One shows the rank on the
entire database; the other shows the rank on 5 perturbed
datasets made by deleting 30% of the papers.]
1. Link analysis, eigenvectors, and stability, Andrew Y. Ng, Alice X. Zheng and Michael Jordan, IJCAI-01
2. Automating the construction of Internet portals with machine learning, A. McCallum, K. Nigam, J. Rennie, K. Seymore, In
Information Retrieval Journal, 2000

67
Rank stability
 Ng et al 2001:
$\tilde{P} = (1-c)\, P + c\, U$
 Theorem: let v be the principal left eigenvector of $\tilde{P}$. Let the
pages $i_1, i_2, \dots, i_k$ be changed in any way, and let v’ be
the new pagerank. Then
$\|v - v'\|_1 \le \frac{2 \sum_{j=1}^{k} v(i_j)}{c}$
 So if c is not too close to 0, the system would be rank
stable and also converge fast!
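A numerical illustration of the stability bound in the form given by Ng et al., namely that the L1 change in pagerank is at most 2 times the total pagerank of the changed pages, divided by c. The sketch assumes NumPy, c = 0.15, uniform teleportation, and a hypothetical 4-page graph in which the out-links of page 1 are rewired.

```python
import numpy as np

def pagerank(P, c):
    """Stationary vector of (1 - c) P + c U, from v = (1 - c) v P + c u with u uniform."""
    n = P.shape[0]
    u = np.full(n, 1.0 / n)
    return np.linalg.solve((np.eye(n) - (1 - c) * P).T, c * u)

c = 0.15
# Hypothetical web graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

# Change page 1: it now links to pages 0 and 3 instead of page 2.
changed = [1]
P_new = P.copy()
P_new[1] = [0.5, 0.0, 0.0, 0.5]

v = pagerank(P, c)
v_new = pagerank(P_new, c)

lhs = np.abs(v - v_new).sum()        # ||v - v'||_1
rhs = 2 * v[changed].sum() / c       # 2 * sum_j v(i_j) / c
print(lhs, rhs)                      # lhs does not exceed rhs
```

The bound depends on the old pagerank of the changed pages only, so rewiring low-ranked pages can move the scores very little.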

68
Conclusion
 Random walks
 Properties
 Applications
 Pagerank
 Power iteration
 Convergence
 Rank stability

69
Thanks!
Please send email to Purna at
psarkar@cs.cmu.edu with questions,
suggestions, corrections

70
Acknowledgements
 Andrew Moore
 Gary Miller
 Check out Gary’s Fall 2007 class on “Spectral Graph Theory,
Scientific Computing, and Biomedical Applications”
 http://www.cs.cmu.edu/afs/cs/user/glmiller/public/Scientific-Computing/F-07/index.html
 Fan Chung Graham’s course on
 Random Walks on Directed and Undirected Graphs
 http://www.math.ucsd.edu/~phorn/math261/
 Random Walks on Graphs: A Survey, László Lovász
 Reversible Markov Chains and Random Walks on Graphs, D.
Aldous, J. Fill
 Random Walks and Electric Networks, Doyle & Snell

71
Convergence Issues [1]
 Let’s look at the vectors $x_t$ for t=1,2,…
 Write $x_0$ as a linear combination of the left eigenvectors of
P
 $x_0 = c_0 v_0 + c_1 v_1 + c_2 v_2 + \dots + c_{n-1} v_{n-1}$
$c_0 = 1$. WHY?
Remember that 1 is the right eigenvector of P with
eigenvalue 1, since P is stochastic, i.e. $P \mathbf{1}^T = \mathbf{1}^T$. For i ≠ 0,
$v_i \mathbf{1}^T = v_i P \mathbf{1}^T = \sigma_i v_i \mathbf{1}^T$ with $\sigma_i \ne 1$. Hence
$v_i \mathbf{1}^T = 0$ if i ≠ 0.
$1 = x_0 \mathbf{1}^T = c_0 v_0 \mathbf{1}^T = c_0$, since $v_0$ and $x_0$ are both
distributions.
1. We are assuming that P is diagonalizable. The non-diagonalizable case is trickier, you can take a
look at Fan Chung Graham’s class notes (the link is in the acknowledgements section).
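The argument can be checked numerically on a small diagonalizable example. This sketch assumes NumPy and a hypothetical undirected 4-node graph, whose transition matrix has real, distinct eigenvalues; the variable names are illustrative.

```python
import numpy as np

# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)

# Column k of W is a left eigenvector v_k of P (a right eigenvector of P^T).
sigma, W = np.linalg.eig(P.T)
sigma, W = np.real(sigma), np.real(W)
order = np.argsort(-sigma)             # eigenvalue 1 comes first
sigma, W = sigma[order], W[:, order]
W[:, 0] /= W[:, 0].sum()               # scale v_0 so that it is a distribution

x0 = np.array([1.0, 0.0, 0.0, 0.0])
coef = np.linalg.solve(W, x0)          # x0 = sum_k coef[k] * v_k

t = 10
x_t = W @ (coef * sigma**t)            # sum_k coef[k] * sigma_k^t * v_k

print(coef[0])                         # the coefficient of v_0 is 1
print(W[:, 1:].sum(axis=0).round(12))  # the other eigenvectors sum to 0
```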
