Symbolic Regression on
Network Properties
Marcus Märtens, Fernando Kuipers and Piet Van Mieghem
Follow this presentation on your device:
Network Architectures and Services
https://goo.gl/SLr7Gx
1
Network representations
Number of nodes: N
Number of links: L

Adjacency matrix:

$$A = \begin{bmatrix}
0 & 1 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}$$

Laplacian matrix:

$$Q = \begin{bmatrix}
4 & -1 & -1 & -1 & -1 & 0 \\
-1 & 3 & 0 & 0 & -1 & -1 \\
-1 & 0 & 3 & -1 & -1 & 0 \\
-1 & 0 & -1 & 2 & 0 & 0 \\
-1 & -1 & -1 & 0 & 4 & -1 \\
0 & -1 & 0 & 0 & -1 & 2
\end{bmatrix}$$

In this example: N = 6 and L = 9.
2
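As a sanity check, the two matrices above can be reproduced with a few lines of NumPy (a sketch; the entries are copied from the slide):

```python
import numpy as np

# Adjacency matrix of the 6-node example graph from the slide
A = np.array([
    [0, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
])

N = A.shape[0]                      # number of nodes
L = int(A.sum()) // 2               # each link appears twice in a symmetric A
Q = np.diag(A.sum(axis=1)) - A      # Laplacian: degree matrix minus adjacency

print(N, L)                         # 6 9
print((Q.sum(axis=1) == 0).all())   # True: every Laplacian row sums to zero
```

The zero row sums are a quick structural test: by construction Q = Δ − A, so each diagonal degree cancels its row of −1 entries.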
Triangles in networks
How to count the number of node-disjoint
triangles?
Naive algorithms require exhaustive enumeration.
from itertools import product

def triangles(G, A):
    count = 0
    # check all ordered node triples; each triangle is counted 6 times
    for n1, n2, n3 in product(G.nodes(), repeat=3):
        if A[n1, n2] and A[n2, n3] and A[n3, n1]:
            count += 1
    return count // 6
In this example we have 4 triangles.
Why are we interested in triangles anyway?
Important metric to classify networks.
Clustering coefficient:

$$C = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets of nodes}}$$
3
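The clustering coefficient can be evaluated directly from the adjacency matrix: $\mathrm{trace}(A^3)/6$ counts the triangles, and the degrees give the connected triplets. A sketch for the example graph of slide 2:

```python
import numpy as np

A = np.array([
    [0, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
])

d = A.sum(axis=1)                                        # node degrees
triangles = np.trace(np.linalg.matrix_power(A, 3)) / 6   # closed 3-walks / 6
triplets = (d * (d - 1) // 2).sum()                      # connected triplets of nodes
C = 3 * triangles / triplets

print(triangles, C)                                      # 4.0 0.6
```

This reproduces the 4 triangles stated on the slide; with degrees (4, 3, 3, 2, 4, 2) there are 20 connected triplets, so C = 12/20 = 0.6.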
Triangles in networks
Spectral decomposition of the adjacency matrix:

$$A = X \begin{bmatrix}
\lambda_1 & & 0 \\
 & \ddots & \\
0 & & \lambda_N
\end{bmatrix} X^T$$

$\lambda_1 \ge \ldots \ge \lambda_N$ are the eigenvalues of the network.
$\mu_1 \ge \ldots \ge \mu_N$ are the Laplacian eigenvalues. (We will need them later.)

In this example the eigenvalues are:
3.182, 1.247, −0.445, −0.594, −1.588, −1.802

Computing the number of triangles ▲ becomes simple, once you know the eigenvalues:

$$\blacktriangle = \frac{1}{6} \sum_{i=1}^{N} \lambda_i^3$$
4
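The eigenvalue formula is easy to verify numerically for the example graph (a sketch; `eigvalsh` is NumPy's solver for symmetric matrices):

```python
import numpy as np

A = np.array([
    [0, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
])

lam = np.linalg.eigvalsh(A)          # eigenvalues of the symmetric adjacency matrix
triangles = (lam ** 3).sum() / 6     # triangle count = (1/6) * sum of lambda_i^3

print(round(triangles))              # 4, matching the direct enumeration
```

The identity holds because $\mathrm{trace}(A^3) = \sum_i \lambda_i^3$ counts closed walks of length 3, and each triangle contributes six of them.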
Network spectra
Random network: Barabási–Albert (N = 5000)
Network of AS: tech-WHOIS, Mahadevan, P., Krioukov, D. et al. (N = 7476)
The adjacency and the Laplacian spectra might not be as intuitive for humans as the graph
representation, but both contain the complete information about the topology of the
network.
5
Can we extract the hidden relations
between the properties of networks
and their corresponding spectra?
6
Proof of concept: counting triangles
Idea: use Cartesian Genetic Programming (CGP) to evolve a symbolic expression!

1. generate all networks with N = 6 nodes
2. compute ▲ as target
3. compute $\lambda_1, \ldots, \lambda_6$ as features

Does it work?

1. allow operations $\{+, \cdot^3\}$, do not bother with the constant $\frac{1}{6}$:
   $6\blacktriangle = \sum_{i=1}^{6} \lambda_i^3$ in 99%! Okay, that was too easy...
2. allow operations $\{+, \times, \cdot^3\}$, give the constant $\frac{1}{6}$ as input node:
   $\blacktriangle = \frac{1}{6} \sum_{i=1}^{6} \lambda_i^3$ in 15%! Probably still expected?
3. allow operations $\{+, \times, \div, \cdot^3\}$, only constants $\{1, 2, 3\}$ as input nodes:
   $\blacktriangle = \frac{1}{6} \sum_{i=1}^{6} \lambda_i^3$ in 2%! Maybe this could really work!?
7
Rediscover a known formula for a
network property
8
Discover an unknown formula for a
network property
9
Network diameter
Length of the longest of all shortest paths in a network:

$$\rho = \max_{src,\,dest} \operatorname{dist}(src, dest)$$

where dist(src, dest) is the number of hops on a shortest path between src and dest.

Computing the exact network diameter implies this exhaustive enumeration:

from itertools import combinations

def diameter(G):
    diam = 0
    # every unordered node pair (zip would only pair each node with itself)
    for src, dest in combinations(G.nodes(), 2):
        path = shortest_path(G, src, dest)   # assumed helper, e.g. BFS
        diam = max(len(path) - 1, diam)      # hops = nodes on the path minus one
    return diam

In this example the diameter is 3.

Why are we interested in the diameter anyway?
Diameter = worst-case lower bound on number of hops,
i.e. it limits the speed of spreading/flooding in a network.
10
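A fully runnable version of the enumeration uses a plain breadth-first search over an edge-list encoding of the example graph (the 0-indexed labels are an assumption):

```python
from collections import deque

# edges of the example graph (0-indexed version of the slide's adjacency matrix)
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 4), (1, 5), (2, 3), (2, 4), (4, 5)]
adj = {i: [] for i in range(6)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def eccentricity(src):
    """Largest hop distance from src, computed via BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

# the diameter is the largest eccentricity over all sources
diameter = max(eccentricity(s) for s in adj)
print(diameter)   # 3, as stated on the slide
```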
Symbolic regression for the diameter
Idea: try (systematically) different configurations with CGP!

1. generate all networks with N = 6 nodes
2. compute ρ as the target
3. compute $\lambda_1, \ldots, \lambda_6$ as features
4. include N, L and some basic constants {1, 2, 3} as features as well
5. use different sets of operators like $\{+, -, \times, \cdot^2\}$ or $\{+, -, \times, \div, \sqrt{\cdot}\}$

Pick the expression that minimizes the sum of absolute errors:

$$f(\hat{\rho}) = \sum_{G} \left| \rho(G) - \hat{\rho}(G) \right|$$

Does it work?

$$\hat{\rho} = 2$$

What happened?
11
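Why a constant? Under a sum-of-absolute-errors fitness, a constant predictor at the median of the targets is hard to beat when most small graphs share the same small diameter. A toy illustration (the target list below is hypothetical, not the actual training data):

```python
# Hypothetical diameters of a training set dominated by small values
targets = [1, 2, 2, 2, 2, 3, 2, 2, 3, 2]

def fitness(rho_hat):
    """Sum of absolute errors for a constant predictor rho_hat."""
    return sum(abs(rho - rho_hat) for rho in targets)

best = min(range(1, 5), key=fitness)
print(best, fitness(best))   # 2 3: the constant 2 (the median) wins
```

This is the classic property that the median minimizes the L1 error, which explains why evolution can collapse to a constant expression on a skewed target distribution.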
Target distributions
Triangles ▲ Diameter ρ
Exhaustive learning on all networks is no longer feasible.
Networks need to be sampled to create a good (diverse) learning set.
12
Sampling networks
Augmented path graphs (ρ = 11, ρ = 10, ρ = 8):
sparse networks; ρ varies while N is kept constant

Barbell graphs (ρ = 6, ρ = 8, ρ = 8):
dense networks; N varies while ρ is kept constant

Mixed graphs:
50% augmented path + 50% barbell
13
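The barbell family is easy to generate; the sketch below (the `barbell` helper is illustrative: two m-cliques joined through k intermediate path nodes) shows the diameter staying fixed while N grows:

```python
from collections import deque

def barbell(m, k):
    """Two m-node cliques joined by a path through k intermediate nodes (illustrative helper)."""
    adj = {}
    def add(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for base in (0, m + k):                      # the two cliques
        for i in range(base, base + m):
            for j in range(i + 1, base + m):
                add(i, j)
    chain = [m - 1] + list(range(m, m + k)) + [m + k]   # connecting path
    for u, v in zip(chain, chain[1:]):
        add(u, v)
    return adj

def diameter(adj):
    def ecc(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        return max(dist.values())
    return max(ecc(s) for s in adj)

# N grows with m, but the diameter is pinned by the path length: rho = k + 3
print([diameter(barbell(m, 3)) for m in (3, 4, 5)])   # [6, 6, 6]
```

Growing the path (k) instead of the cliques gives the complementary augmented-path behavior: ρ changes while the dense part stays put.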
Computation pipeline
14
CGP parametrization
parameter                 value
fitness function          sum of absolute errors
evolutionary strategy     1+4
mutation type and rate    probabilistic (0.1)
node layout               1 row, 200 columns
levels-back               unrestricted
number of generations     200000
operators                 $+,\ -,\ \times,\ \div,\ \cdot^2,\ \cdot^3,\ \sqrt{\cdot},\ \log$
constants                 1, 2, 3, 4, 5, 6, 7, 8, 9
feature sets              A) $N, L, \lambda_1, \lambda_2, \lambda_3, \lambda_N$
                          B) $N, L, \mu_1, \mu_{N-1}, \mu_{N-2}, \mu_{N-3}$
15
Automatically generated formulas
$$\hat{\rho} = \frac{\log(2L + 6\mu_{N-3} + 6)}{\log(L + \mu_{N-3})} + \sqrt{\frac{5}{\mu_{N-1}}} + \sqrt{\frac{82}{729L - 5\mu_{N-2}\,\mu_{N-3}}} + 3$$

$$\hat{\rho} = N - \frac{1 - \frac{1}{(L-N)^{3/2}} + \frac{4}{\sqrt{L-N}}}{6 - \frac{6}{(L-N)^{3/2}}} - 2\sqrt{L-N} - \frac{1}{\sqrt{L-N}}$$

$$\hat{\rho} = \sqrt{N} + \frac{45\,\mu_{N-3}}{(\mu_{N-1} + \mu_{N-3})^2} + \log\!\left(\frac{216}{(\mu_{N-1} + \mu_{N-3})^2}\right) - \frac{16}{9\,\mu_{N-3}} + \frac{\sqrt[4]{L\,\mu_{N-1}\,\mu_{N-2}}}{8\sqrt{\mu_{N-3}}}$$
16
Bounds on the diameter
$$\hat{\rho} = \frac{\log(2L + 6\mu_{N-3} + 6)}{\log(L + \mu_{N-3})} + \sqrt{\frac{5}{\mu_{N-1}}} + \sqrt{\frac{82}{729L - 5\mu_{N-2}\,\mu_{N-3}}} + 3$$
Comparison to known analytical results from graph theory:
Upper bound by Chung, F.R., Faber, V. and Manteuffel, T.A.: "An Upper bound on
the diameter of a graph from eigenvalues associated with its Laplacian."
(See also Van Dam, E.R., Haemers, W.H.: "Eigenvalues and the diameter of graphs.")
$$\rho \le \left\lfloor \frac{\cosh^{-1}(N-1)}{\cosh^{-1}\!\left(\dfrac{\mu_1 + \mu_{N-1}}{\mu_1 - \mu_{N-1}}\right)} \right\rfloor + 1$$
17
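The bound above can be evaluated straight from the Laplacian spectrum; a NumPy sketch for the 6-node example graph of slide 2 (true diameter 3):

```python
import numpy as np

A = np.array([
    [0, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
])
Q = np.diag(A.sum(axis=1)) - A

mu = np.sort(np.linalg.eigvalsh(Q))[::-1]   # mu_1 >= ... >= mu_N = 0
N = len(mu)
mu_1, mu_N1 = mu[0], mu[-2]                 # largest and smallest nonzero eigenvalue

# Chung-Faber-Manteuffel spectral upper bound on the diameter
bound = int(np.floor(np.arccosh(N - 1) /
                     np.arccosh((mu_1 + mu_N1) / (mu_1 - mu_N1)))) + 1
print(bound)   # an integer >= the true diameter (here rho = 3)
```

For a connected graph $\mu_{N-1} > 0$, so the arccosh argument exceeds 1 and the bound is always finite.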
Diameter on real networks
parameter                  value
N                          379
L                          914
true diameter ρ            17
approximated diameter ρ̂    21.48
upper bound                160.09
ca-netscience
source: http://www.networkrepository.com
18
Diameter on real networks
parameter                  value
N                          4941
L                          6594
true diameter ρ            46
approximated diameter ρ̂    97.52
upper bound                749.49
inf-power
source: http://www.networkrepository.com
19
Diameter on real networks
parameter                  value
N                          1458
L                          1948
true diameter ρ            19
approximated diameter ρ̂    18.59
upper bound                207.52
bio-yeast
source: http://www.networkrepository.com
20
Diameter on real networks
21
Can we extract the hidden relations
between the properties of networks
and their corresponding spectra?
22
Closing thoughts
Automatically evolving approximate equations for network properties is possible.
Equations are unbiased by human preconceptions.
Unexpected results may aid and stimulate research in spectral graph theory and
network science in general.
Challenges and possible (?) improvements
Sampling network space for stronger diversity.
Make complexity of formula part of the objective function.
Automatically evolve constants rather than using them as inputs.
Apply more advanced feature-selection methods to increase quality of formulas.
23
