DEEPWALK VS NODE2VEC
SIDDHANT VERMA
BIT MESRA
Networks
 A flexible and general data structure.
 Many types of data can be formulated as networks.
 Universal language for describing complex data.
 Networks from science, nature, and technology are more similar than one would expect.
Classical ML tasks in networks
 Node classification - Predict the type of a given node
 Link prediction - Predict whether two nodes are linked
 Community detection - Identify densely linked clusters of nodes
 Network similarity - Measure how similar two (sub)networks are
Graph Embedding
Graph embedding is an approach used to transform nodes, edges, and their features into a vector space (of lower dimension) while maximally preserving properties such as graph structure and information.
DeepWalk
DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences.
DeepWalk learns social representations of a graph’s vertices by modeling a stream of short random walks. Social representations are latent features of the vertices that capture neighborhood similarity and community membership.
DeepWalk takes a graph as input and produces a latent representation as output. The result of applying the method to the well-studied Zachary’s Karate network is shown in the paper’s accompanying figure.
Connection: Power laws
The frequency with which vertices appear in the short random walks also follows a power-law distribution. Word frequency in natural language follows a similar distribution, and techniques from language modeling already account for this distributional behavior.
Pseudo Algorithm
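The algorithm figure from this slide is not reproduced here. As a substitute, here is a minimal Python sketch of DeepWalk’s walk-generation phase, assuming a networkx-style graph; the function and parameter names (deepwalk_walks, num_walks, walk_length) are illustrative, not the paper’s:

```python
import random
import networkx as nx

def deepwalk_walks(graph, num_walks=10, walk_length=40):
    """Generate truncated uniform random walks; each walk is later
    treated as a 'sentence' of node ids for the language model."""
    walks = []
    for _ in range(num_walks):
        nodes = list(graph.nodes())
        random.shuffle(nodes)          # one pass over all vertices per iteration
        for u in nodes:
            walk = [u]
            while len(walk) < walk_length:
                nbrs = list(graph.neighbors(walk[-1]))
                if not nbrs:
                    break              # dead end: truncate the walk
                walk.append(random.choice(nbrs))  # uniform next step
            walks.append([str(n) for n in walk])
    return walks

walks = deepwalk_walks(nx.karate_club_graph())  # the Karate network mentioned above
```

Each walk in `walks` plays the role of a sentence; the SkipGram model described next is then trained on this corpus.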
SkipGram
SkipGram is an algorithm used to create word embeddings, i.e., dense vector representations of words. These embeddings are meant to encode the semantic meaning of words such that semantically similar words lie close to each other in the vector space.
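As a concrete illustration, DeepWalk-style training can be run with gensim’s Word2Vec in SkipGram mode on the walks generated above; the hyperparameter values below are placeholders, not the paper’s settings:

```python
from gensim.models import Word2Vec

# `walks` is the list of node-id 'sentences' from the sketch above
model = Word2Vec(
    sentences=walks,
    vector_size=128,  # embedding dimension d
    window=10,        # context window size w
    sg=1,             # SkipGram rather than CBOW
    hs=1,             # hierarchical softmax, as DeepWalk uses
    min_count=0,      # keep every node, however rare
    workers=4,
)
vector = model.wv["5"]  # learned embedding for node "5"
```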
Hierarchical Softmax
If we assign the vertices to the leaves of a binary tree, the prediction problem turns into maximizing the probability of a specific path in the tree. If the path to vertex $u_k$ is identified by a sequence of tree nodes $(b_0, b_1, \dots, b_{\lceil \log |V| \rceil})$, with $b_0$ the root and $b_{\lceil \log |V| \rceil} = u_k$, then

$$\Pr(u_k \mid \Phi(v_j)) = \prod_{l=1}^{\lceil \log |V| \rceil} \Pr(b_l \mid \Phi(v_j)).$$

Now, $\Pr(b_l \mid \Phi(v_j))$ can be modeled by a binary classifier assigned to the parent of node $b_l$. This reduces the computational complexity of calculating $\Pr(u_k \mid \Phi(v_j))$ from $O(|V|)$ to $O(\log |V|)$.
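A toy sketch of the path-probability computation, assuming each internal tree node carries a learned vector (the names phi_v, psi_b, and path are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def path_probability(phi_v, path):
    """Pr(u_k | Phi(v_j)) as a product of binary decisions, one per
    internal node on the root-to-leaf path. `path` is a list of
    (psi_b, direction) pairs, where direction is +1 for a left child
    and -1 for a right child."""
    prob = 1.0
    for psi_b, direction in path:
        prob *= sigmoid(direction * np.dot(phi_v, psi_b))  # one binary classifier per node
    return prob
```

Because the path contains only $\lceil \log |V| \rceil$ internal nodes, each prediction costs $O(\log |V|)$ classifier evaluations instead of a full $|V|$-way softmax.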
Node2vec
Node2vec is an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node’s network neighborhood and design a biased random walk procedure which efficiently explores diverse neighborhoods. The algorithm generalizes prior work, which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations.
BFS
The neighborhood is restricted to nodes which are immediate neighbors of the source. For a neighborhood of size k = 3, BFS samples nodes s1, s2, s3.
BFS <-> Structural Equivalence
Nodes that have similar structural roles in networks should be embedded closely together, e.g., nodes u and s6 in the figure. By restricting the search to nearby nodes, BFS gives a microscopic view of the neighborhood. Network roles such as bridges and hubs can be inferred using BFS.
DFS
The neighborhood consists of nodes sequentially sampled at increasing distances from the source node. For a neighborhood of size k = 3, DFS samples nodes s4, s5, s6.
DFS <-> Homophily
Nodes that are highly interconnected and belong to similar network communities should be embedded closely together, e.g., nodes u and s1 in the figure. DFS-sampled nodes reflect a macro view of a node’s neighborhood.
Drawbacks of DeepWalk
DeepWalk learns d-dimensional feature representations by simulating uniform random walks; it can be viewed as a special case of node2vec with parameters p = 1 and q = 1. Real networks exhibit a mixture of homophily and structural equivalence, which uniform random walks do not capture effectively.
Feature Learning Framework
Let G = (V, E) be a given network; the framework applies to any (un)directed, (un)weighted network. Let $f : V \to \mathbb{R}^d$ be the mapping function from nodes to feature representations, where d is a parameter specifying the number of dimensions of the feature representation. Equivalently, f is a matrix of $|V| \times d$ parameters. For every source node $u \in V$, we define $N_S(u) \subset V$ as a network neighborhood of node u generated through a neighborhood sampling strategy S. The objective is

$$\max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u)). \quad (1)$$

Conditional independence. We factorize the likelihood by assuming that the likelihood of observing a neighborhood node is independent of observing any other neighborhood node given the feature representation of the source:

$$\Pr(N_S(u) \mid f(u)) = \prod_{n_i \in N_S(u)} \Pr(n_i \mid f(u)).$$

We model the conditional likelihood of every source-neighborhood node pair as a softmax unit parametrized by a dot product of their features:

$$\Pr(n_i \mid f(u)) = \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}.$$

With the above assumptions, the objective in Eq. 1 simplifies to

$$\max_f \sum_{u \in V} \Big[ -\log Z_u + \sum_{n_i \in N_S(u)} f(n_i) \cdot f(u) \Big],$$

where $Z_u = \sum_{v \in V} \exp(f(u) \cdot f(v))$ is the per-node partition function.
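A toy numpy illustration of this factorized softmax, with f stored as a $|V| \times d$ embedding matrix; in practice, computing $Z_u$ exactly is expensive for large networks and is approximated with negative sampling:

```python
import numpy as np

def softmax_likelihood(f, u, n_i):
    """Pr(n_i | f(u)) under the dot-product softmax; f is a |V| x d
    matrix whose rows are node embeddings, u and n_i are node indices."""
    scores = f @ f[u]             # f(v) . f(u) for every v in V
    Z_u = np.exp(scores).sum()    # per-node partition function
    return np.exp(f[n_i] @ f[u]) / Z_u

rng = np.random.default_rng(0)
f = rng.normal(size=(34, 16))     # e.g. 34 Karate-club nodes, d = 16
print(softmax_likelihood(f, u=0, n_i=1))
```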
Given the linear nature of text, the notion of a
neighborhood can be naturally defined using a
sliding window over consecutive words.
Networks, however, are not linear, and thus a
richer notion of a neighborhood is needed. To
resolve this issue, we propose a randomized
procedure that samples many different
neighborhoods of a given source node u. The
neighborhoods NS(u) are not restricted to just
immediate neighbors but can have vastly different
structures depending on the sampling strategy S.
Search bias α
We define a 2nd-order random walk with two parameters p and q which guide the walk. Consider a random walk that just traversed edge (t, v) and now resides at node v (Figure 2). The walk now needs to decide on the next step, so it evaluates the transition probabilities $\pi_{vx}$ on edges (v, x) leading from v. We set the unnormalized transition probability to

$$\pi_{vx} = \alpha_{pq}(t, x) \cdot w_{vx},$$

where $w_{vx}$ is the weight of edge (v, x) and

$$\alpha_{pq}(t, x) = \begin{cases} 1/p & \text{if } d_{tx} = 0 \\ 1 & \text{if } d_{tx} = 1 \\ 1/q & \text{if } d_{tx} = 2, \end{cases}$$

with $d_{tx}$ denoting the shortest-path distance between nodes t and x. Note that $d_{tx}$ must be one of {0, 1, 2}, and hence the two parameters are necessary and sufficient to guide the walk.
Intuitively, parameters p and q control how fast
the walk explores and leaves the neighborhood of
starting node u. In particular, the parameters
allow our search procedure to (approximately)
interpolate between BFS and DFS and thereby
reflect an affinity for different notions of node
equivalences.
Formally, given a source node u, we simulate a random walk of fixed length l. Let $c_i$ denote the i-th node in the walk, starting with $c_0 = u$. Nodes $c_i$ are generated by the distribution

$$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \pi_{vx}/Z & \text{if } (v, x) \in E \\ 0 & \text{otherwise,} \end{cases}$$

where $\pi_{vx}$ is the unnormalized transition probability between nodes v and x, and Z is the normalizing constant.
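A literal translation of this transition rule into Python, assuming a networkx-style graph and, for simplicity, computing the biases on the fly rather than precomputing them with alias tables as the paper does:

```python
def transition_probs(graph, t, v, p, q):
    """P(c_i = x | c_{i-1} = v) for a walk that just traversed edge (t, v)."""
    pi = {}
    for x in graph.neighbors(v):
        if x == t:                   # d_tx = 0: step back to t
            bias = 1.0 / p
        elif graph.has_edge(t, x):   # d_tx = 1: x is also a neighbor of t
            bias = 1.0
        else:                        # d_tx = 2: x moves away from t
            bias = 1.0 / q
        pi[x] = bias * graph[v][x].get("weight", 1.0)  # unnormalized pi_vx
    Z = sum(pi.values())             # normalizing constant
    return {x: w / Z for x, w in pi.items()}
```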
Return parameter, p. Parameter p controls the likelihood of immediately revisiting a node in the walk. Setting it to a high value (> max(q, 1)) ensures that we are less likely to sample an already-visited node in the following two steps (unless the next node in the walk has no other neighbor). This strategy encourages moderate exploration and avoids 2-hop redundancy in sampling. On the other hand, if p is low (< min(q, 1)), the walk tends to backtrack a step (Figure 2), which keeps the walk “local”, close to the starting node u.

In-out parameter, q. Parameter q allows the search to differentiate between “inward” and “outward” nodes. Going back to Figure 2, if q > 1, the random walk is biased towards nodes close to node t. Such walks obtain a local view of the underlying graph with respect to the start node of the walk and approximate BFS behavior in the sense that the samples comprise nodes within a small locality. In contrast, if q < 1, the walk is more inclined to visit nodes which are further away from node t. Such behavior is reflective of DFS, which encourages outward exploration. However, an essential difference here is that we achieve DFS-like exploration within the random-walk framework. Hence, the sampled nodes are not at strictly increasing distances from a given source node u, but in turn we benefit from the tractable preprocessing and superior sampling efficiency of random walks. Note that by setting $\pi_{vx}$ to be a function of the preceding node t in the walk, the random walks are 2nd-order Markovian.
Pseudo algorithm
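The node2vec algorithm figure is likewise not reproduced. Below is a compact, self-contained sketch of the walk-simulation phase under the same networkx-style assumptions (names are illustrative); the resulting walks are fed to the same SkipGram optimizer used by DeepWalk:

```python
import random

def node2vec_walks(graph, p=1.0, q=1.0, walk_length=80, num_walks=10):
    """Simulate fixed-length 2nd-order biased random walks from every node."""
    def step(t, v):
        nbrs = list(graph.neighbors(v))
        weights = []
        for x in nbrs:
            if x == t:                   # d_tx = 0
                bias = 1.0 / p
            elif graph.has_edge(t, x):   # d_tx = 1
                bias = 1.0
            else:                        # d_tx = 2
                bias = 1.0 / q
            weights.append(bias * graph[v][x].get("weight", 1.0))
        return random.choices(nbrs, weights=weights)[0]  # weights need not be normalized

    walks = []
    for _ in range(num_walks):
        for u in graph.nodes():
            nbrs = list(graph.neighbors(u))
            if not nbrs:
                continue
            walk = [u, random.choice(nbrs)]   # the first step is uniform
            while len(walk) < walk_length:
                walk.append(step(walk[-2], walk[-1]))
            walks.append([str(n) for n in walk])
    return walks
```

Setting p = 1 and q = 1 makes every bias factor equal to 1, recovering DeepWalk’s uniform walks, which is exactly the special-case relationship noted earlier.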