ch10-graphs2.pdf

Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University
http://www.mmds.org
Note to other teachers and users of these slides: We would be delighted if you found our
material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://www.mmds.org

Nodes: FootballTeams
Edges: Games played
Can we identify
node groups?
(communities,
modules, clusters)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

NCAA conferences
Nodes: FootballTeams
Edges: Games played

Can we identify
functional modules?
Nodes: Proteins
Edges: Physical interactions

Functional modules
Nodes: Proteins
Edges: Physical interactions

Can we identify
social communities?
Nodes: Facebook Users
Edges: Friendships

Example communities: High school, Summer internship, Stanford (Squash), Stanford (Basketball)
Social communities
Nodes: Facebook Users
Edges: Friendships

Non-overlapping vs. overlapping communities

[Figure: a network and its adjacency matrix; rows and columns are both nodes]

What is the structure of community overlaps?
Edge density in the overlaps is higher!
Communities as “tiles”

This is what we want!
Communities
in a network

1) Given a model, we generate the network:
2) Given a network, find the “best” model
[Figure: generative model for networks, shown next to a network on nodes A to H]

Goal: Define a model that can generate
networks
The model will have a set of “parameters” that we
will later want to estimate (and detect communities)
Q: Given a set of nodes, how do communities
“generate” edges of the network?
[Figure: generative model for networks, shown next to a network on nodes A to H]

Generative model B(V, C, M, {pc}) for graphs:
Nodes V, Communities C, Memberships M
Each community c has a single probability pc
Later we fit the model to networks to detect
communities
[Figure: the model links communities C (with probabilities p_A, p_B) to nodes V through memberships M; the model generates the network]

AGM generates the links:
For each pair of nodes in community A,
we connect them with prob. $p_A$
The overall edge probability is:
$$P(u,v) = 1 - \prod_{c \in M_u \cap M_v} (1 - p_c)$$
$M_u$ … set of communities node u belongs to
If u, v share no communities: $P(u,v) = \varepsilon$, a small background probability (the product formula alone would give 0)
Think of this as an “OR” function: If at least 1 community says “YES” we create an edge
[Figure: community affiliation model (communities C, nodes V, memberships M, probabilities p_A, p_B) and the network it generates]
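A minimal sketch of the AGM edge probability, assuming memberships are stored as Python sets and community probabilities as a dict. The function name `agm_edge_prob`, the parameter `eps` (the value used for pairs with no shared community, defaulting to 0) and the example values are illustrative assumptions, not part of the slides.

```python
def agm_edge_prob(memberships_u, memberships_v, p, eps=0.0):
    """P(u,v) = 1 - prod over shared communities c of (1 - p_c)."""
    shared = set(memberships_u) & set(memberships_v)
    if not shared:
        # Assumption: pairs with no shared community get probability eps.
        return eps
    no_edge = 1.0
    for c in shared:
        # Each shared community independently fails to create the edge.
        no_edge *= 1.0 - p[c]
    return 1.0 - no_edge


# Hypothetical example: two communities, four nodes.
p = {"A": 0.6, "B": 0.5}
M = {"u": {"A", "B"}, "v": {"A", "B"}, "w": {"B"}, "x": set()}

p_uv = agm_edge_prob(M["u"], M["v"], p)  # shares A and B
p_uw = agm_edge_prob(M["u"], M["w"], p)  # shares only B
p_ux = agm_edge_prob(M["u"], M["x"], p)  # shares nothing
```

Sharing more communities can only raise the probability, which is the “OR” behaviour described above.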


AGM can express a
variety of community
structures:
Non-overlapping,
Overlapping, Nested

Detecting communities with AGM:
Given a graph G(V, E), find the model:
1) Affiliation graph M
2) Number of communities C
3) Parameters pc

Maximum Likelihood Principle (MLE):
Given: Data $X$
Assumption: Data is generated by some model $f(\Theta)$
$f$ … model
$\Theta$ … model parameters
Want to estimate $P_f(X \mid \Theta)$:
The probability that our model $f$ (with parameters $\Theta$)
generated the data
Now let’s find the most likely model that could have
generated the data: $\arg\max_{\Theta} P_f(X \mid \Theta)$


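Imagine we are given a set of coin flips
Task: Figure out the bias of a coin!
Data: Sequence of coin flips: $X = [1, 0, 0, 0, 1, 0, 0, 1]$
Model: $f(\Theta)$ = return 1 with prob. Θ, else return 0
What is $P_f(X \mid \Theta)$? Assuming coin flips are independent
So, $P_f(X \mid \Theta) = P_f(1 \mid \Theta) * P_f(0 \mid \Theta) * \ldots * P_f(1 \mid \Theta)$
What is $P_f(1 \mid \Theta)$? Simple, $P_f(1 \mid \Theta) = \Theta$
Then, $P_f(X \mid \Theta) = \Theta^3 (1 - \Theta)^5$

For example:
$P_f(X \mid \Theta = 0.5) = 0.003906$
$P_f(X \mid \Theta = 3/8) = 0.005029$
What did we learn? Our data was most
likely generated by coin with bias $\Theta = 3/8$
[Figure: $P_f(X \mid \Theta)$ plotted against Θ, with its maximum at $\Theta^* = 3/8$]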
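The coin example can be checked numerically. The sketch below assumes an illustrative sequence with 3 ones among 8 flips and finds the maximizing bias by a simple grid search; `coin_likelihood`, `flips` and `grid` are names chosen here for illustration.

```python
def coin_likelihood(flips, theta):
    """P(X | theta) for independent flips: theta^(#ones) * (1 - theta)^(#zeros)."""
    ones = sum(flips)
    zeros = len(flips) - ones
    return theta ** ones * (1.0 - theta) ** zeros


# Illustrative data (an assumption): 3 ones in 8 flips.
flips = [1, 0, 0, 0, 1, 0, 0, 1]

# Grid search over candidate biases 0.000, 0.001, ..., 1.000.
grid = [i / 1000 for i in range(1001)]
theta_hat = max(grid, key=lambda t: coin_likelihood(flips, t))

lik_half = coin_likelihood(flips, 0.5)
lik_best = coin_likelihood(flips, theta_hat)
```

For this data the grid search lands on the fraction of ones in the sequence, which is the closed-form maximum likelihood estimate for a coin.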

How do we do MLE for graphs?
Model generates a probabilistic adjacency matrix
We then flip all the entries of the probabilistic
matrix to obtain the binary adjacency matrix
The likelihood of AGM generating graph G:
$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u,v) \prod_{(u,v) \notin E} (1 - P(u,v))$$
For every pair of nodes u, v AGM gives the prob. $P(u,v)$ of them being linked:
  0     0.10  0.10  0.04
  0.10  0     0.02  0.06
  0.10  0.02  0     0.06
  0.04  0.06  0.06  0
Flip biased coins to obtain the binary adjacency matrix:
  0  1  0  0
  1  0  1  1
  0  1  0  1
  0  1  1  0
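A short sketch of the graph likelihood, using the probabilistic and binary adjacency matrices from this slide. It assumes the graph is undirected, so each unordered pair of nodes is counted once; `graph_likelihood` is a hypothetical helper name.

```python
def graph_likelihood(P, A):
    """P(G | Theta): product of P(u,v) over edges and 1 - P(u,v) over non-edges."""
    n = len(A)
    lik = 1.0
    for u in range(n):
        for v in range(u + 1, n):  # each unordered pair once (undirected graph)
            lik *= P[u][v] if A[u][v] else 1.0 - P[u][v]
    return lik


# Probabilistic adjacency matrix from the slide.
P = [
    [0.00, 0.10, 0.10, 0.04],
    [0.10, 0.00, 0.02, 0.06],
    [0.10, 0.02, 0.00, 0.06],
    [0.04, 0.06, 0.06, 0.00],
]
# Binary adjacency matrix from the slide.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

lik = graph_likelihood(P, A)
```

The result is tiny because the observed edges are unlikely under these probabilities; in practice the logarithm of this product is used to avoid numerical underflow on larger graphs.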

Given graph G(V,E) and Θ, we calculate
likelihood that Θ generated G: P(G|Θ)
Θ = B(V, C, M, {p_c}) gives the edge probabilities $P(u,v)$:
  0    0.9  0.9  0
  0.9  0    0.9  0
  0.9  0.9  0    0.9
  0    0    0.9  0
Adjacency matrix of G:
  0  1  1  0
  1  0  1  0
  1  1  0  1
  0  0  1  0
$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u,v) \prod_{(u,v) \notin E} (1 - P(u,v))$$
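As a worked check of this example, assuming nodes are numbered 1 to 4 in matrix order and each unordered pair is counted once: the four edges (1,2), (1,3), (2,3), (3,4) each have probability 0.9, and the two non-edges (1,4), (2,4) have probability 0.

```latex
P(G \mid \Theta)
  = \underbrace{0.9 \cdot 0.9 \cdot 0.9 \cdot 0.9}_{(u,v) \in E}
    \cdot \underbrace{(1 - 0)(1 - 0)}_{(u,v) \notin E}
  = 0.9^4 = 0.6561
```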

Our goal: Find $\Theta = B(V, C, M, \{p_c\})$ such that:
$$\arg\max_{\Theta} P(G \mid \Theta)$$
How do we find $B(V, C, M, \{p_c\})$ that
maximizes the likelihood?

Our goal is to find $B(V, C, M, \{p_c\})$ such that:
$$\arg\max_{B(V, C, M, \{p_c\})} \prod_{(u,v) \in E} P(u,v) \prod_{(u,v) \notin E} (1 - P(u,v))$$
Problem: Finding B means finding the
bipartite affiliation network.
There is no nice way to do this.
Fitting $B(V, C, M, \{p_c\})$ is too hard,
let’s change the model (so it is easier to fit)!

Relaxation: Memberships have strengths
$F_{uA}$: The membership strength of node u
to community A ($F_{uA} = 0$: no membership)
Each community A links nodes independently:
$$P_A(u,v) = 1 - \exp(-F_{uA} \cdot F_{vA})$$

Community membership strength matrix F
(rows: nodes, columns: communities)
$F_{vA}$ … strength of v’s membership to A
$F_u$ … vector of community membership strengths of u
$$P_A(u,v) = 1 - \exp(-F_{uA} \cdot F_{vA})$$
Probability of connection grows with the product of
strengths (approximately proportional when the product is small)
Notice: If one node doesn’t belong to the
community ($F_{uC} = 0$) then $P_C(u,v) = 0$
Prob. that at least one common
community C links the nodes:
$$P(u,v) = 1 - \prod_{C} (1 - P_C(u,v))$$

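Community A links nodes u, v independently:
$$P_A(u,v) = 1 - \exp(-F_{uA} \cdot F_{vA})$$
Then prob. at least one common C links them:
$$P(u,v) = 1 - \prod_{C} (1 - P_C(u,v)) = 1 - \exp\Big(-\sum_{C} F_{uC} \cdot F_{vC}\Big) = 1 - \exp(-F_u \cdot F_v^T)$$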

Example F matrix (node community membership strengths):
$F_u$:  0    1.2  0  0.2
$F_v$:  0.5  0    0  0.8
$F_w$:  0    1.8  1  0
Then: $F_u \cdot F_v^T = 0.16$
And: $P(u,v) = 1 - \exp(-0.16) \approx 0.15$
But: $P(u,w) \approx 0.88$
$P(v,w) = 0$
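The example can be reproduced with a few lines of Python. The sketch computes the edge probability both ways, per community and via the dot product, to illustrate that the two forms agree; the function names are assumptions made for this example.

```python
import math


def edge_prob(Fu, Fv):
    """P(u,v) = 1 - exp(-F_u . F_v^T)."""
    dot = sum(a * b for a, b in zip(Fu, Fv))
    return 1.0 - math.exp(-dot)


def edge_prob_per_community(Fu, Fv):
    """Same probability as 1 - prod_C (1 - P_C(u,v)), one factor per community."""
    no_edge = 1.0
    for a, b in zip(Fu, Fv):
        p_c = 1.0 - math.exp(-a * b)  # P_C(u,v) for this community
        no_edge *= 1.0 - p_c
    return 1.0 - no_edge


# Membership strengths from the example matrix (rows F_u, F_v, F_w).
F = {
    "u": [0, 1.2, 0, 0.2],
    "v": [0.5, 0, 0, 0.8],
    "w": [0, 1.8, 1, 0],
}

p_uv = edge_prob(F["u"], F["v"])  # dot product 0.16
p_uw = edge_prob(F["u"], F["w"])  # dot product 2.16
p_vw = edge_prob(F["v"], F["w"])  # no shared community, dot product 0
```

Nodes u and w share a community in which both have high strength, so their edge probability is much larger than that of u and v.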

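Task: Given a network G(V, E), estimate F
Find F that maximizes the likelihood:
$$\arg\max_{F} \prod_{(u,v) \in E} P(u,v) \prod_{(u,v) \notin E} (1 - P(u,v))$$
where: $P(u,v) = 1 - \exp(-F_u \cdot F_v^T)$
Many times we take the logarithm of the likelihood,
and call it log-likelihood: $l(F) = \log P(G \mid F)$
Goal: Find F that maximizes $l(F)$:
$$l(F) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u F_v^T)\big) - \sum_{(u,v) \notin E} F_u F_v^T$$
(the second sum follows because $\log(1 - P(u,v)) = -F_u F_v^T$)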
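A minimal sketch of the log-likelihood for a small undirected graph, assuming edges contribute log(1 - exp(-F_u . F_v^T)) and non-edges contribute -F_u . F_v^T. The clamp `tiny` that avoids log(0) is an assumption added for numerical safety, and the matrix `F` and edge lists are made-up examples.

```python
import math


def log_likelihood(F, edges, tiny=1e-12):
    """l(F) = sum over edges of log(1 - exp(-d)) minus sum over non-edges of d,
    where d = F_u . F_v^T."""
    edge_set = {frozenset(e) for e in edges}
    n = len(F)
    total = 0.0
    for u in range(n):
        for v in range(u + 1, n):  # each unordered pair once
            dot = sum(a * b for a, b in zip(F[u], F[v]))
            if frozenset((u, v)) in edge_set:
                # Clamp so an edge with zero overlap gives a large penalty, not an error.
                total += math.log(max(1.0 - math.exp(-dot), tiny))
            else:
                total -= dot
    return total


# Hypothetical strengths: 3 nodes, 2 communities.
F = [[1.0, 0.0], [1.0, 0.5], [0.0, 2.0]]

ll_good = log_likelihood(F, [(0, 1), (1, 2)])  # edges where memberships overlap
ll_bad = log_likelihood(F, [(0, 2)])           # edge between nodes with no overlap
```

The same F explains the first edge list far better than the second, which is the signal that fitting F exploits.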

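Compute gradient of a single row $F_u$ of F:
$$\nabla l(F_u) = \sum_{v \in N(u)} F_v \frac{\exp(-F_u F_v^T)}{1 - \exp(-F_u F_v^T)} - \sum_{v \notin N(u)} F_v$$
$N(u)$ … set of outgoing neighbors of u
Coordinate gradient ascent:
Iterate over the rows of F:
Compute gradient $\nabla l(F_u)$ of row u (while keeping others fixed)
Update the row $F_u$: $F_u \leftarrow F_u + \eta \nabla l(F_u)$
Project $F_u$ back to a non-negative vector: If $F_{uC} < 0$: $F_{uC} = 0$
This is slow! Computing $\nabla l(F_u)$ takes time linear in the number of nodes!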
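One step of the coordinate ascent can be sketched as follows. It assumes the row gradient is the sum over neighbors v of F_v * exp(-d) / (1 - exp(-d)) minus the sum of F_v over non-neighbors, with d = F_u . F_v^T, which is what differentiating the log-likelihood with respect to F_u gives. The names `row_gradient` and `update_row`, the step size and the clamp `tiny` are illustrative choices.

```python
import math


def row_gradient(F, neighbors, u, tiny=1e-12):
    """Naive gradient of l(F) with respect to row F_u: visits every other node."""
    k = len(F[u])
    grad = [0.0] * k
    for v in range(len(F)):
        if v == u:
            continue
        if v in neighbors[u]:
            dot = sum(a * b for a, b in zip(F[u], F[v]))
            e = math.exp(-dot)
            w = e / max(1.0 - e, tiny)  # clamp avoids division by zero
            for c in range(k):
                grad[c] += F[v][c] * w
        else:
            for c in range(k):
                grad[c] -= F[v][c]
    return grad


def update_row(F, neighbors, u, eta=0.1):
    """Gradient step on row u, then projection back to non-negative values."""
    grad = row_gradient(F, neighbors, u)
    F[u] = [max(0.0, f + eta * g) for f, g in zip(F[u], grad)]


# Hypothetical graph: path 0 - 1 - 2 with 2 communities.
F = [[1.0, 0.0], [1.0, 0.5], [0.0, 2.0]]
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}

grad0 = row_gradient(F, neighbors, 0)
update_row(F, neighbors, 0, eta=0.1)
```

In this example the second coordinate of the gradient is negative, so the projection keeps that membership strength at zero after the step.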

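However, we notice:
$$\sum_{v \notin N(u)} F_v = \sum_{v} F_v - F_u - \sum_{v \in N(u)} F_v$$
We cache $\sum_{v} F_v$
So, computing $\sum_{v \notin N(u)} F_v$ now takes linear time
in the degree $|N(u)|$ of u
In networks the degree of a node is much smaller than the total
number of nodes in the network, so this is a significant
speedup!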
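The caching trick can be sketched by computing the non-neighbor sum both ways, assuming it rests on the identity: sum of F_v over non-neighbors of u equals the sum over all nodes, minus F_u, minus the sum over neighbors of u. Function names and the example graph are made up for illustration.

```python
def column_sums(F):
    """Cache sum_v F_v once: one entry per community."""
    return [sum(col) for col in zip(*F)]


def non_neighbor_sum_naive(F, neighbors, u):
    """Sum F_v over all v that are neither u nor a neighbor of u: O(number of nodes)."""
    k = len(F[u])
    total = [0.0] * k
    for v in range(len(F)):
        if v != u and v not in neighbors[u]:
            for c in range(k):
                total[c] += F[v][c]
    return total


def non_neighbor_sum_cached(F, neighbors, u, cached):
    """Same quantity from the cached total: O(degree of u)."""
    total = [s - f for s, f in zip(cached, F[u])]
    for v in neighbors[u]:
        for c in range(len(total)):
            total[c] -= F[v][c]
    return total


# Hypothetical graph: path 0 - 1 - 2 - 3 with 2 communities.
F = [[1.0, 0.0], [1.0, 0.5], [0.0, 2.0], [0.3, 0.7]]
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

cached = column_sums(F)
naive = non_neighbor_sum_naive(F, neighbors, 0)
fast = non_neighbor_sum_cached(F, neighbors, 0, cached)
```

When a row F_u changes during the ascent, the cached total has to be adjusted by the difference between the new and old row so that it stays valid.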

BigCLAM takes 5 minutes for 300k node nets
Other methods take 10 days
Can process networks with 100M edges!
[Figure: running time (sec.) versus number of nodes (× 10^3) for Link Clustering, Clique Percolation, MMSB, BigCLAM, and Parallel BigCLAM]


Extension:
Make community membership edges directed!
Outgoing membership: Node “sends” edges
Incoming membership: Node “receives” edges

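Everything is almost the same except now
we have 2 matrices: F and H
F … out-going community memberships
H … in-coming community memberships
Edge prob.: $P(u,v) = 1 - \exp(-F_u H_v^T)$
Network log-likelihood:
$$l(F, H) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u H_v^T)\big) - \sum_{(u,v) \notin E} F_u H_v^T$$
which we optimize the same way as before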
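A small sketch of the directed edge probability, assuming P(u,v) = 1 - exp(-F_u . H_v^T) with F holding outgoing and H holding incoming membership strengths. The example values are made up to show that the probability is no longer symmetric in u and v.

```python
import math


def directed_edge_prob(Fu, Hv):
    """P(u -> v) = 1 - exp(-F_u . H_v^T)."""
    dot = sum(a * b for a, b in zip(Fu, Hv))
    return 1.0 - math.exp(-dot)


# Hypothetical strengths: node 0 only sends, node 1 only receives, in community 0.
F = [[1.0, 0.0], [0.0, 0.0]]  # outgoing memberships
H = [[0.0, 0.0], [2.0, 0.0]]  # incoming memberships

p_01 = directed_edge_prob(F[0], H[1])  # edge from node 0 to node 1
p_10 = directed_edge_prob(F[1], H[0])  # edge from node 1 to node 0
```

Here the edge from node 0 to node 1 is likely while the reverse edge has probability zero.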


Overlapping Community Detection at Scale: A Nonnegative Matrix
Factorization Approach by J. Yang, J. Leskovec. ACM International
Conference on Web Search and Data Mining (WSDM), 2013.
Detecting Cohesive and 2-mode Communities in Directed and
Undirected Networks by J. Yang, J. McAuley, J. Leskovec. ACM
International Conference on Web Search and Data Mining (WSDM),
2014.
Community Detection in Networks with Node Attributes by J. Yang,
J. McAuley, J. Leskovec. IEEE International Conference On Data
Mining (ICDM), 2013.
