1. Lecture 3:
Mathematics of Networks
CS 765: Complex Networks
Slides are modified from Networks: Theory and Application by Lada Adamic
2. What are networks?
Networks are collections of points joined by lines.
“Network” ≡ “Graph”
points | lines | domain
vertices | edges, arcs | math
nodes | links | computer science
sites | bonds | physics
actors | ties, relations | sociology
[Figure: a small network with one node and one edge labeled]
3. Network elements: edges
Directed (also called arcs)
A -> B (EBA)
A likes B, A gave a gift to B, A is B’s child
Undirected
A <-> B or A – B
A and B like each other
A and B are siblings
A and B are co-authors
Edge attributes
weight (e.g. frequency of communication)
ranking (best friend, second best friend…)
type (friend, relative, co-worker)
properties depending on the structure of the rest of the graph: e.g. betweenness
Multiedge: multiple edges between the same pair of nodes
Self-edge: from a node to itself
5. Edge weights can have positive or negative values
One gene activates/inhibits another
One person trusting/distrusting another
Research challenge: How does one ‘propagate’ negative feelings in a social network? Is my enemy’s enemy my friend?
[Figure: transcription regulatory network in baker’s yeast]
6. Adjacency matrices
Representing edges (who is adjacent to whom) as a
matrix
Aij = 1 if node i has an edge to node j
= 0 if node i does not have an edge to j
Aii = 0 unless the network has self-loops
If self-loop, Aii=1
Aij = Aji if the network is undirected,
or if i and j share a reciprocated edge
Example:
[Figure: directed network on nodes 1-5]
A =
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 0 0 0 1
1 1 0 0 0
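The rule "Aij = 1 if node i has an edge to node j" can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the edge list is read off the example matrix above, and the helper names adjacency_matrix and is_symmetric are made up for this sketch.

```python
# Directed edges (source, target) read off the example matrix; nodes are numbered from 1.
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
n = 5

def adjacency_matrix(n, edges):
    """Return an n x n list of lists A with A[i-1][j-1] = 1 if node i has an edge to node j."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1  # shift 1-based node labels to 0-based indices
    return A

def is_symmetric(A):
    """True when Aij == Aji for all i, j, i.e. every edge is reciprocated."""
    size = len(A)
    return all(A[i][j] == A[j][i] for i in range(size) for j in range(size))

A = adjacency_matrix(n, edges)
for row in A:
    print(*row)
print("symmetric:", is_symmetric(A))
```

The example network is directed and not every edge is reciprocated (4 links to 5, but 5 does not link back), so the symmetry check fails here.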
7. Adjacency lists
Edge list
2 3
2 4
3 2
3 4
4 5
5 2
5 1
Adjacency list
is easier to work with if the network is large and sparse
lets you quickly retrieve all neighbors of a node
1:
2: 3 4
3: 2 4
4: 5
5: 1 2
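Turning the edge list into the adjacency list takes one pass over the edges. The sketch below uses a plain dictionary; the function name adjacency_list is an assumption made for illustration.

```python
# Edge list from the slide: each pair is (source, target).
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]

def adjacency_list(nodes, edges):
    """Map every node to the sorted list of nodes it links to."""
    adj = {v: [] for v in nodes}  # nodes without outgoing edges keep an empty list
    for source, target in edges:
        adj[source].append(target)
    return {v: sorted(neighbors) for v, neighbors in adj.items()}

adj = adjacency_list(range(1, 6), edges)
for v, neighbors in adj.items():
    print(f"{v}:", *neighbors)
```

Looking up the neighbors of a node is then a single dictionary access, and memory grows with the number of edges rather than with n*n, which is why this form suits large sparse networks.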
8. Nodes
Node network properties
from immediate connections
indegree
how many directed edges (arcs) are incident on a node
outdegree
how many directed edges (arcs) originate at a node
degree (in or out)
number of edges incident on a node
[Figure: example node with outdegree = 2, indegree = 3, degree = 5]
9. HyperGraphs
Edges join more than two nodes at a time (hyperedges)
Affiliation networks
Examples
Families
Subnetworks
Can be transformed to a bipartite network
[Figure: nodes A, B, C, D drawn as a hypergraph and as a bipartite network]
10. Bipartite (two-mode) networks
edges occur only between two groups of nodes, not
within those groups
for example, we may have individuals and events
directors and boards of directors
customers and the items they purchase
metabolites and the reactions they participate in
11. in matrix notation
Bij = 1 if node i from the first group links to node j from the second group
= 0 otherwise
B is usually not a square matrix!
for example: we have n customers and m products
B =
1 0 0 0
1 0 0 0
1 1 0 0
1 1 1 1
0 0 0 1
12. going from a bipartite to a one-mode graph
One mode projection
two nodes from the first group are connected if they link to the same node in the second group
naturally high occurrence of cliques
some loss of information
Can use weighted edges to preserve group occurrences
[Figure: two-mode network linking group 1 and group 2]
13. Collapsing to a one-mode network
i and j are linked if they both link to k
$P_{ij} = \sum_k B_{ik} B_{jk}$
$P' = B B^T$
the transpose of a matrix swaps $B_{xy}$ and $B_{yx}$
if B is an n×m matrix, $B^T$ is an m×n matrix
[Figure: nodes i and j of the first group and nodes k=1, k=2 of the second group]
B =
1 0 0 0
1 0 0 0
1 1 0 0
1 1 1 1
0 0 0 1
$B^T$ =
1 1 1 1 0
0 0 1 1 0
0 0 0 1 0
0 0 0 1 1
15. Collapsing a two-mode network to a one mode-network
Assume the nodes in group 1 are people and the nodes
in group 2 are movies
P’ is symmetric
The diagonal entries of P’ give the number of movies
each person has seen
The off-diagonal elements of P’ give the number of
movies that both people have seen
P’ =
1 1 1 1 0
1 1 1 1 0
1 1 2 2 0
1 1 2 4 1
0 0 0 1 1
[Figure: the resulting one-mode network with weighted edges]
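The projection can be checked numerically. The sketch below multiplies the 5×4 example matrix B by its transpose using plain lists; the helper names transpose and matmul are illustrative assumptions, not part of any tool used in the course.

```python
# Example bipartite matrix: rows are people (group 1), columns are movies (group 2).
B = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

def transpose(M):
    """Swap rows and columns: entry (x, y) moves to (y, x)."""
    return [list(column) for column in zip(*M)]

def matmul(X, Y):
    """Plain matrix product; entry (i, j) is the sum over k of X[i][k] * Y[k][j]."""
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*Y)]
            for row in X]

P = matmul(B, transpose(B))
for row in P:
    print(*row)
```

The diagonal of the result counts how many movies each person has seen, and each off-diagonal entry counts the movies a pair has in common.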
16. Trees
Trees are connected undirected graphs that contain no cycles
For n nodes, the number of edges is m = n-1
Any node can be designated as the root
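The edge count gives a cheap test: a connected undirected graph with exactly n-1 edges has no cycles. The sketch below combines that count with a depth-first search for connectivity. The function name is_tree and the 0-based node labels are assumptions made for this example.

```python
def is_tree(n, edges):
    """Check whether an undirected graph on nodes 0..n-1 is a tree.

    Uses the fact that a connected graph with n-1 edges cannot contain a cycle.
    """
    if n == 0 or len(edges) != n - 1:
        return False
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first search from node 0 to see whether every node is reachable.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))  # a star-like tree
print(is_tree(4, [(0, 1), (1, 2), (2, 0)]))  # a triangle plus an isolated node
```

The second graph also has n-1 = 3 edges, which shows why the count alone is not enough and the connectivity check is needed.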
17. examples of trees
In nature
trees
river networks
arteries (or veins, but not both)
Man made
sewer system
Computer science
binary search trees
decision trees (AI)
Network analysis
minimum spanning trees
from one node – how to reach all other nodes most quickly
may not be unique, because shortest paths are not always unique
depends on weight of edges
18. Planar graphs
A graph is planar if it can be drawn on a plane without
any edges crossing
19. Cliques and complete graphs
Kn is the complete graph (clique) with n vertices
each vertex is connected to every other vertex
there are n*(n-1)/2 undirected edges
[Figure: K3, K5 and K8]
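The edge count can be confirmed by enumerating every unordered pair of vertices. This is a small sketch using the standard library; complete_graph_edges is a name chosen for illustration.

```python
from itertools import combinations

def complete_graph_edges(n):
    """List the undirected edges of Kn as pairs (u, v) with u < v, nodes numbered 1..n."""
    return list(combinations(range(1, n + 1), 2))

for n in (3, 5, 8):
    edges = complete_graph_edges(n)
    # Every vertex pairs with the n-1 others, and each edge is counted from both ends.
    assert len(edges) == n * (n - 1) // 2
    print(f"K{n}: {len(edges)} edges")
```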
20. Kuratowski’s theorem
Every non-planar network contains at least one subgraph that is an expansion of K5 or K3,3.
[Figure: K5 and K3,3]
Expansion: Addition of new node in the middle of edges.
Research challenge: Degree of planarity?
21. #s of planar graphs of different sizes
number of nodes : number of planar graphs
1 : 1
2 : 2
3 : 4
4 : 11
Every planar graph has a straight line embedding
22. Edge contractions defined
A finite graph G is planar if and only if it has no subgraph that is homeomorphic or edge-contractible to the complete graph on five vertices (K5) or the complete bipartite graph K3,3 (Kuratowski's theorem).
24. Bi-cliques (cliques in bipartite graphs)
Km,n is the complete bipartite graph with m and n vertices of the two different types
K3,3 maps to the utility graph
Is there a way to connect three utilities, e.g. gas, water, electricity, to three houses without having any of the pipes cross?
[Figure: K3,3, the utility graph]
25. Node degree
Outdegree of node i = $\sum_{j=1}^{n} A_{ij}$
example: the outdegree for node 3 is 2, which we obtain by summing the non-zero entries in the 3rd row: $\sum_{j=1}^{n} A_{3j} = 2$
Indegree of node j = $\sum_{i=1}^{n} A_{ij}$
example: the indegree for node 3 is 1, which we obtain by summing the non-zero entries in the 3rd column: $\sum_{i=1}^{n} A_{i3} = 1$
A =
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 0 0 0 1
1 1 0 0 0
[Figure: directed network on nodes 1-5]
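Row sums and column sums of the adjacency matrix are straightforward to compute. The sketch below uses the example matrix from this slide; the variable names are chosen for illustration.

```python
# Example adjacency matrix: A[i][j] = 1 if node i+1 has an edge to node j+1.
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]

out_degree = [sum(row) for row in A]           # row sums
in_degree = [sum(column) for column in zip(*A)]  # column sums

for node, (k_out, k_in) in enumerate(zip(out_degree, in_degree), start=1):
    print(f"node {node}: outdegree={k_out}, indegree={k_in}")

# Every directed edge leaves one node and enters one node, so the totals agree.
assert sum(out_degree) == sum(in_degree)
```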
26. Degree sequence and Degree distribution
Degree sequence: An ordered list of the (in,out) degree of each node
In-degree sequence:
[2, 2, 2, 1, 1, 1, 1, 0]
Out-degree sequence:
[2, 2, 2, 2, 1, 1, 1, 0]
(undirected) degree sequence:
[3, 3, 3, 2, 2, 1, 1, 1]
Degree distribution: A frequency count of the occurrence of each degree
In-degree distribution:
[(2,3) (1,4) (0,1)]
Out-degree distribution:
[(2,4) (1,3) (0,1)]
(undirected) distribution:
[(3,3) (2,2) (1,3)]
[Figure: histogram of the in-degree distribution; x-axis: indegree, y-axis: frequency]
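Going from a degree sequence to a degree distribution is a frequency count. A minimal sketch with the standard library, applied to the in-degree and undirected sequences listed on this slide (degree_distribution is an illustrative name):

```python
from collections import Counter

def degree_distribution(degree_sequence):
    """Return (degree, frequency) pairs, highest degree first."""
    counts = Counter(degree_sequence)
    return sorted(counts.items(), reverse=True)

in_degrees = [2, 2, 2, 1, 1, 1, 1, 0]
undirected_degrees = [3, 3, 3, 2, 2, 1, 1, 1]

print(degree_distribution(in_degrees))
print(degree_distribution(undirected_degrees))
```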
29. network metrics: graph density
Of the connections that may exist between n nodes
directed graph
emax = n*(n-1)
undirected graph
emax = n*(n-1)/2
What fraction are present?
density = e/ emax
For example, out of 12 possible connections,
this graph has 7, giving it a density of 7/12 = 0.583
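The density formula is a one-liner once emax is fixed. In the sketch below the function name density is an assumption, and the example assumes the pictured graph is directed with n = 4 nodes, since that is what gives 12 possible connections.

```python
def density(n, e, directed=True):
    """Fraction of possible edges that are present (self-loops not counted)."""
    if n < 2:
        raise ValueError("density needs at least two nodes")
    e_max = n * (n - 1) if directed else n * (n - 1) // 2
    return e / e_max

# Assumed example: a directed graph on 4 nodes with 7 edges.
print(round(density(4, 7), 3))
```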
30. Graph density
Would this measure be useful for comparing networks of
different sizes (different numbers of nodes)?
As n → ∞, a graph whose density tends to
0 is a sparse graph
a nonzero constant is a dense graph
32. Network metrics: paths
A path is any sequence of vertices such that every
consecutive pair of vertices in the sequence is
connected by an edge in the network.
For directed: traversed in the correct direction for the edges.
A path can visit a vertex or an edge more than once
Self-avoiding paths do not intersect themselves.
Path length r is the number of edges on the path
Called hops
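These definitions translate directly into checks on a sequence of vertices. The sketch below reuses the directed 5-node example network from the adjacency-matrix slides; the function names are assumptions made for illustration.

```python
# Directed edges of the 5-node example network.
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]

def is_path(vertices, edges, directed=True):
    """True if every consecutive pair of vertices is joined by an edge."""
    edge_set = set(edges)
    if not directed:
        edge_set |= {(v, u) for u, v in edges}  # allow traversal in both directions
    return all((u, v) in edge_set for u, v in zip(vertices, vertices[1:]))

def path_length(vertices):
    """Number of edges (hops) on the path."""
    return len(vertices) - 1

def is_self_avoiding(vertices):
    """True if no vertex is visited more than once."""
    return len(set(vertices)) == len(vertices)

walk = [2, 3, 4, 5, 1]
print(is_path(walk, edges), path_length(walk), is_self_avoiding(walk))

revisit = [2, 3, 2, 4]
print(is_path(revisit, edges), path_length(revisit), is_self_avoiding(revisit))
```

The second sequence is still a path under the definition above, but it is not self-avoiding because it returns to node 2.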
36. Eulerian Path
Euler’s Seven Bridges of Königsberg
one of the first problems in graph theory
Is there a route that crosses each bridge only once and returns to
the starting point?
Source: http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
Image 1 – GNU v1.2: Bogdan, Wikipedia; http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License
Image 2 – GNU v1.2: Booyabazooka, Wikipedia; http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License
Image 3 – GNU v1.2: Riojajar, Wikipedia; http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License
37. Eulerian and Hamiltonian paths
Eulerian path: traverses each edge exactly once
If the starting point and end point are the same:
only possible if no nodes have an odd degree, as each path must visit and leave each shore
If we don’t need to return to the starting point:
can have 0 or 2 nodes with an odd degree
Hamiltonian path: visits each vertex exactly once
A Hamiltonian path is self-avoiding
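The odd-degree rule can be applied to the Königsberg bridges directly. In the sketch below the land masses are given hypothetical labels (A for the central island, B and C for the two river banks, D for the eastern land mass) and the seven bridges are listed as a multigraph. The function names are illustrative, and the check assumes the graph is connected.

```python
from collections import Counter

# Seven bridges as undirected edges between land masses (labels are hypothetical).
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def odd_degree_nodes(edges):
    """Return the nodes with an odd number of incident edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def euler_kind(edges):
    """Classify a connected undirected graph by its number of odd-degree nodes."""
    odd = len(odd_degree_nodes(edges))
    if odd == 0:
        return "circuit"  # can start and end at the same node
    if odd == 2:
        return "path"     # must start and end at the two odd-degree nodes
    return "none"

print(odd_degree_nodes(bridges), euler_kind(bridges))
```

All four land masses have odd degree, so no route crosses every bridge exactly once.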
39. Network metrics: components
If there is a path from every vertex in a network to every
other, the network is connected
otherwise, it is disconnected
Component: a subset of vertices such that there exists at least one path from each member of the subset to every other member, and no other vertex in the network is connected to any vertex in the subset
Maximal subset
A singleton vertex that is not connected to any other forms a component of size one
Every vertex belongs to exactly one component
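Components of an undirected network can be found with a breadth-first search started from each unvisited vertex. This is a minimal sketch; the function name connected_components and the small example graph are assumptions made for illustration.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the components of an undirected graph as sorted lists of nodes."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            u = queue.popleft()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return components

# Hypothetical example: a chain of three nodes, a pair, and a singleton.
components = connected_components(range(1, 7), [(1, 2), (2, 3), (4, 5)])
print(components)
largest = max(components, key=len)
print("fraction of nodes in the largest component:", len(largest) / 6)
```

Each vertex ends up in exactly one list, and the singleton vertex forms a component of size one.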
40. network metrics: size of giant component
if the largest component encompasses a significant fraction of the graph,
it is called the giant component
41. components in directed networks
[Figure: example directed network on nodes A-H]
Strongly connected components
Each node within the component can be reached from every other node in the component by following directed links
B C D E
A
G H
F
Weakly connected components
every node can be reached from every other node by following links in either direction
A B C D E
G H F
42. components in directed networks
Every strongly connected component of more than one
vertex has at least one cycle
Out-component: set of all vertices that are reachable
via directed paths starting at a specific vertex v
Out-components of all members of a strongly
connected component are identical
In-component: set of all vertices from which there is a
directed path to a vertex v
In-components of all members of a strongly connected
component are identical
[Figure: example directed network on nodes A-H]
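Out-components and in-components are both reachability searches, one on the original edges and one on the reversed edges, and their intersection is the strongly connected component of v. The edges of the pictured network are not recoverable from the slide, so the edge list below is a hypothetical one chosen to reproduce the components listed for nodes A-H. The function names are also assumptions, and each component is taken to include v itself.

```python
# Hypothetical directed edges chosen to match the components listed for A-H.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "B"),
         ("G", "H"), ("H", "G"), ("F", "G")]

def reachable(start, edges):
    """Set of vertices reachable from `start` by directed paths (including start)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def out_component(v, edges):
    return reachable(v, edges)

def in_component(v, edges):
    # Reverse every edge, then search outward from v.
    return reachable(v, [(b, a) for a, b in edges])

def strongly_connected_component(v, edges):
    # Vertices that v can reach and that can reach v.
    return out_component(v, edges) & in_component(v, edges)

for v in "ABGF":
    print(v, sorted(strongly_connected_component(v, edges)))
```

Running the out-component search from B, C, D or E returns the same set each time, which illustrates why members of one strongly connected component share their out-component.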
43. bowtie model of the web
The Web is a directed graph:
webpages link to other webpages
The connected components tell us what set of pages can
be reached from any other just by surfing
no ‘jumping’ around by typing in a URL or using a search engine
Broder et al. 1999 – crawl of over 200 million pages and
1.5 billion links.
SCC – 27.5%
IN and OUT – 21.5% each
Tendrils and tubes – 21.5%
Disconnected – 8%
44. degree distribution
indegree: power-law exponent ~ 2.1
outdegree: power-law exponent ~ 2.4
source: Pennock et al.: Winners don't take all: Characterizing the competition for links on the web
PNAS April 16, 2002 vol. 99 no. 8 5207-5211
45. clustering & motifs
clustering coefficient ~ 0.11 (at the site level)
Source: Milo et al., “Superfamilies of evolved and designed networks”, Science 303 (5663), p. 1538-1542, 2004.
46. shortest paths
<d> = 0.35 + 2.06 log(N)
prediction: <d> = 17.5 for 200 million nodes
actual: <d> = 16 for reachable pairs
[Figure: average shortest path vs. number of webpages (x-axis in units of 10^4)]
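The prediction can be reproduced from the fitted formula. The slide does not state the base of the logarithm; the sketch below assumes base 10, which is the choice that recovers the quoted value of about 17.5 for 200 million pages. The function name is illustrative.

```python
import math

def predicted_avg_shortest_path(n_pages):
    """Evaluate <d> = 0.35 + 2.06 log(N), assuming a base-10 logarithm."""
    return 0.35 + 2.06 * math.log10(n_pages)

d = predicted_avg_shortest_path(200e6)
print(f"{d:.2f}")
```

With a natural logarithm the same formula would give a value near 40, far from the slide's figure, which supports the base-10 reading.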
47. Network Analysis
What is a network?
a bunch of nodes and edges
How do you characterize it?
with some basic network metrics
How did network analysis get started?
it was the mathematicians
How do you analyze networks today?
with Pajek or other software
48. overview of network analysis tools
Pajek
network analysis and visualization,
menu driven, suitable for large networks
platforms: Windows (on Linux via Wine)
Netlogo
agent based modeling
recently added network modeling capabilities
platforms: any (Java)
GUESS
network analysis and visualization,
extensible, script-driven (jython)
platforms: any (Java)
Other software tools that we will not be using but that you may find useful:
visualization and analysis:
UCInet - user friendly social network visualization and analysis software (suitable for smaller networks)
iGraph - if you are familiar with R, you can use iGraph as a module to analyze or create large networks, or you can directly use the C functions
Jung - comprehensive Java library of network analysis, creation and visualization routines
Graph package for Matlab (untested?) - if Matlab is the environment you are most comfortable in, here are some basic routines
SIENA - for p* models and longitudinal analysis
SNA package for R - all sorts of analysis + heavy duty stats to boot
NetworkX - python based free package for analysis of large graphs
InfoVis Cyberinfrastructure - large agglomeration of network analysis tools/routines, partly menu driven
visualization only:
GraphViz - open source network visualization software (can handle large/specialized networks)
TouchGraph - need to quickly create an interactive visualization for the web?
yEd - free, graph visualization and editing software
specialized:
fast community finding algorithm
motif profiles
CLAIR library - NLP and IR library (Perl Based) includes network analysis routines
finally: INSNA long list of SNA packages
49. tools we’ll use
Pajek: extensive menu-driven functionality, including many,
many network metrics and manipulations
but… not extensible
GUESS: extensible, scriptable tool for exploratory data analysis,
but with a more limited selection of built-in methods compared to
Pajek
NetLogo: general agent based simulation platform with
excellent network modeling support
many of the demos in this course were built with NetLogo
iGraph: libraries can be accessed through R or python.
Routines scale to millions of nodes.
50. other tools: visualization tool: gephi
http://gephi.org
primarily for visualization, has some nice touches
http://player.vimeo.com/video/9726202
51. visualization tool: GraphViz
Takes descriptions of graphs in simple text languages
Outputs images in useful formats
Options for shapes and colors
Standalone or use as a library
dot: hierarchical or layered drawings of directed graphs,
by avoiding edge crossings and reducing edge length
neato (Kamada-Kawai) and fdp (Fruchterman-Reingold
with heuristics to handle larger graphs)
twopi – radial layout
circo – circular layout
http://www.graphviz.org
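As an illustration of the "simple text languages" GraphViz reads, the sketch below writes a directed edge list in the DOT language from Python. The function name to_dot and the file name example.dot are assumptions for this example, and the edge list is the 5-node network used earlier in the lecture.

```python
def to_dot(edges, name="G"):
    """Render directed edges as a DOT digraph description."""
    lines = [f"digraph {name} {{"]
    lines += [f"    {source} -> {target};" for source, target in edges]
    lines.append("}")
    return "\n".join(lines)

edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
dot_text = to_dot(edges)
print(dot_text)

# Save the description so a layout program can read it.
with open("example.dot", "w") as handle:
    handle.write(dot_text + "\n")
```

With GraphViz installed, a command such as `dot -Tpng example.dot -o example.png` should then produce a layered drawing, and swapping `dot` for `neato` or `circo` selects a different layout.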
57. Other visualization tools: Walrus
developed at CAIDA, available under the GNU GPL
“…best suited to visualizing moderately sized graphs that are
nearly trees. A graph with a few hundred thousand nodes and
only a slightly greater number of links is likely to be
comfortable to work with.”
Java-based
Implemented Features
rendering at a guaranteed frame rate regardless of graph size
coloring nodes and links with a fixed color, or by RGB values
stored in attributes
labeling nodes
picking nodes to examine attribute values
displaying a subset of nodes or links based on a user-supplied
boolean attribute
interactive pruning of the graph to temporarily reduce clutter and
occlusion
zooming in and out
Source: CAIDA, http://www.caida.org/tools/visualization/walrus/
60. visualization tools: Prefuse
user interface toolkit for interactive information visualization
built in Java using Java2D graphics library
data structures and algorithms
pipeline architecture featuring reusable, composable modules
animation and rendering support
architectural techniques for scalability
requires knowledge of Java programming
website: http://prefuse.sourceforge.net
62. Examples of prefuse applications: flow maps
A flow map of migration from California from
1995-2000, generated automatically by our
system using edge routing but no layout
adjustment.
http://graphics.stanford.edu/papers/flow_map_layout/
63. Examples of prefuse applications: vizster
http://jheer.org/vizster
64. Outline
Network metrics can help us characterize networks
This has its roots in graph theory
Today there are many network analysis tools to choose
from
though most of them are in beta!