Embeddings: The geometry of
Relational Algebra
Nikolaos Vasiloglou
1
Outline
1. Geometry in Greece
2. The birth of Real numbers
3. Geometry in France
4. From Geometry to Data Science
5. Lifting dimensions
6. Manifold Learning
7. Embedding and Relational data
8. Beyond Euclidean Geometry
9. Embedding and deep learning
2
Geometry in Greece
3
Geometry in the days of Euclid
edge(A,B)
edge(B,C)
edge(C,A)
A
B C
4
Being more specific
edge(A,B)
edge(B,C)
edge(A,C)
angle[A] = x
length[A,B] = y
…...
5
A
B C
Is everything symbolic?
● len[A,B] = x
● What is x?
● x can only be 1!
● angle(A) = y
● What is y?
● y can only be 90°
6
Constructing quantities
● Natural numbers by repetitions
● A right isosceles triangle constructs √2
● An equilateral triangle constructs 60° angles
● Congruent triangles construct rational numbers
7
The existence of Reals
● It took several decades
● and a lot of logicians
● before we knew that they exist
● We cannot construct all of them with geometry or with relational algebra
8
What is great about reals?
● Reals can represent the output of
complicated programs, or any sequences
Example
⅓ = 0.333333333…
√2 = 1.414…
sin(√3) = 0.98702664499
9
Expressions are standardized
● And we can compute distances very easily
● Subtraction computes distance in O(ε)
● ε is the number of digits
● We can also compare distances very fast
And don't forget: reals = the arithmetization of logic
(not all of logic)
10
Geometry in France
11
Geometry in the days of Descartes
A = (0, 1)
B = (-0.5, 0)
C = (0.5, 0)
12
A
B C
Key point in Cartesian Geometry
● Geometrical shapes are collections of vectors
● Vectors consist of REAL numbers
● Dimensions are independent
13
Why Cartesian Geometry?
14
Euclidean vs Cartesian
● Computing distances and angles: Euclidean geometry requires constructing a program; Cartesian geometry reduces it to distance computations* and angle computations.
● Proving equalities and inequalities for abstract shapes: Euclidean geometry needs A* search or other complicated methods; Cartesian geometry uses calculus and the theorems of the Reals*.
15
(figure: Euclidean → Cartesian, where the arrow is the Embedding)
From Geometry to data
(science)
16
Absolute (Cartesian) (Dense) Data
● The value of every dimension is determined
● A (Euclidean) Metric is assumed (Banach)
● An inner product (Hilbert) is also assumed
● Small distances are meaningful
● Big distances are meaningless
17
Relational Data
● Sparse representation
● Only some relations between atoms are
defined
● There is no straightforward way to compute an arbitrary relationship
● We usually call them graphs
18
Absolute vs Relational Representation
(figure: the same small set of points stored as a dense N x d coordinate matrix versus as a sparse N x N relational/adjacency matrix)
● Storage Requirements
○ N x d
○ N x N
19
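As a minimal sketch of this storage trade-off (the points and edges below are made up for illustration), the same six points can be held as a dense N x d array or as a sparse N x N relation matrix:

import numpy as np
from scipy.sparse import csr_matrix

# Absolute (Cartesian) representation: N points with d coordinates each -> dense N x d
N, d = 6, 2
X = np.array([[0, 1], [0, -1], [1, 0], [-1, 0], [1, -1], [-1, 1]], dtype=float)

# Relational representation: only the observed relations are stored;
# conceptually an N x N matrix, but kept sparse so storage is O(number of edges)
edges = [(0, 2), (1, 3), (4, 5)]  # hypothetical edge list
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(N, N))

print(X.shape)         # (6, 2): N x d
print(A.shape, A.nnz)  # (6, 6), but only 3 stored entries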
Why do we want to convert
relational data to cartesian?
20
Relational -> Absolute (Embedding)
● Assign a k-dimensional vector to every entity
(graph node)
● Every edge (relationship) is computed as an
inner product
● Easy to store, good locality, visualization
● Not every graph is embeddable in a Hilbert
space
21
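A minimal sketch of the idea, assuming we already have one k-dimensional vector per node (here random, purely for illustration); every edge score is then just an inner product:

import numpy as np

rng = np.random.default_rng(0)
num_nodes, k = 5, 3
V = rng.normal(size=(num_nodes, k))  # one k-dimensional vector per graph node

def edge_score(i, j):
    # the relationship between nodes i and j is computed as an inner product
    return float(V[i] @ V[j])

# reconstructing the whole (approximate) relation matrix is just the Gram matrix
A_hat = V @ V.T
print(edge_score(0, 1), A_hat[0, 1])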
Does it make sense to convert
cartesian data to relational?
22
Absolute -> Relational -> Cartesian
Manifold Learning
● Why would someone want to do that?
● Practically speaking it means increasing the
dimensionality of the data
● Reminds you of the kernel trick?
● How do you do that?
23
From Absolute to Relational
24
All Nearest Neighbor graph
● Project data on itself
○ based on a metric
○ or a similarity
25
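A minimal sketch with scikit-learn's kneighbors_graph, assuming a Euclidean metric and random data:

import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.default_rng(1).normal(size=(100, 5))  # N x d absolute data
# project the data onto itself: relate every point to its k nearest neighbors
G = kneighbors_graph(X, n_neighbors=5, mode="distance", metric="euclidean")
print(G.shape, G.nnz)  # a sparse 100 x 100 relational representation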
Bipartite Projection (1)
● K-means with big K
(figure: k-means maps the N x d data matrix to an N x k point-to-cluster matrix)
26
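A minimal sketch, assuming we run scikit-learn's KMeans with a large K and one-hot encode the assignments, so the N x d matrix becomes an N x k indicator matrix:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(2).normal(size=(500, 10))  # N x d
k = 100  # "big K"
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# bipartite projection: each point is related to the cluster it falls into
Z = np.zeros((X.shape[0], k))
Z[np.arange(X.shape[0]), labels] = 1.0  # N x k indicator matrix
print(Z.shape, int(Z.sum()))  # (500, 100), one nonzero per row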
Bipartite Projection (2)
● Random trees
○ Kd-trees, decision trees
(figure: each point recurses to a leaf, so the N x d data matrix becomes an N x k point-to-leaf matrix)
27
Bipartite Projection (3)
● Random Projections
(figure: random projections map the N x d data matrix to an N x k matrix)
28
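A minimal sketch with plain numpy, relating each point to the random half-spaces it falls into (this is one reading of the slide; a plain linear projection X @ R works just as well):

import numpy as np

rng = np.random.default_rng(3)
N, d, k = 200, 50, 16
X = rng.normal(size=(N, d))    # N x d
R = rng.normal(size=(d, k))    # k random directions

Z = (X @ R > 0).astype(float)  # N x k: which random half-spaces each point lies in
print(Z.shape)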
Why do we want to do this?
● In low dimensions linear models do not work well
● In high dimensions things become (close to) linearly separable
29
Relations
● Relations express local information
○ Friend(Nick, TJ)
○ Friend(TJ, Molham)
○ Friend(x,y) <- Friend(x,z), Friend(y,z)
● But we don’t know if Friend(Nick, Molham)
● Can we find a geometric structure that
encodes information locally and allows
inference?
30
Manifolds = information is local
AKA Friend(Nick, TJ)
31
Riemannian Manifold (1)
● It has to be smooth, no corners
32
Riemannian Manifold (2)
● There is something called the tangent space that locally looks like a Euclidean plane.
33
Riemannian Manifold (3)
● The distances that matter are the geodesic ones
34
Riemannian Manifold (4)
● Euclidean Distances are meaningless
35
Manifold Learning Stages (1)
○ A mapping that forms relationships from the data
36
Embedding as an optimization problem
min  Agg f(x_i, x_j)
subject to:
g(x_i, x_j) = 0
where x_i ∈ R^d, and f, g are geometric functions
37
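A minimal numerical sketch of this kind of problem, replacing the hard constraint g(x_i, x_j) = 0 by a soft penalty and using plain gradient descent (the objective, pairs, and dimensions are all made up):

import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
X = rng.normal(size=(n, d))  # the x_i in R^d, the variables we optimize

# example relational constraints: pairs that should sit at unit distance
pairs = [(0, 1), (1, 2), (2, 0), (3, 4)]

def loss(X):
    # f: aggregate (sum) of squared norms; g: ||x_i - x_j|| - 1 = 0 as a penalty
    f = np.sum(X ** 2)
    g = sum((np.linalg.norm(X[i] - X[j]) - 1.0) ** 2 for i, j in pairs)
    return f + 10.0 * g

def grad(X, eps=1e-5):
    # numerical gradient, just to keep the sketch short
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (loss(X + E) - loss(X - E)) / (2 * eps)
    return G

for _ in range(500):
    X -= 0.01 * grad(X)
print(loss(X))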
Manifold Learning on Unlabeled data
● MVU (Maximum Variance Unfolding)
○ Construct the relational data based on the nearest neighbor distances
38
Manifold Learning on Labeled data
● Laplacian SVM
39
Determining the optimal Manifold
Learning
40
Lift Dimension for Linear Separation
41
Embeddings for relational
data
42
Relations as geometric quantities
atoms -> vectors
relations -> dot products between vectors
relations -> distances between vectors
43
Back to the optimization problem
44
min  Agg f(x_i, x_j)
subject to:
g_k(x_i, x_j) = 0
where x_i ∈ R^d

f -> ||x_i - x_j||
x_i represents an atom
g_k(x_i, x_j) = 0 if R_k(atom_i, atom_j), i.e. <x_i, x_j> = 1
Relational Data as a graph
45
What is the
meaning of
zeros?
Is embedding always possible?
● The relation (adjacency) matrix A is a symmetric square matrix
● It does not always admit a factorization of the form A = V Vᵀ
● So the embedding cannot reconstruct the graph exactly
● It can reconstruct properties of the graph
46
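A minimal sketch of why an exact factorization can fail: the adjacency matrix of a 4-cycle is symmetric but has a negative eigenvalue, so A = V Vᵀ has no exact solution; clipping the spectrum gives the usual approximate embedding (toy graph, numpy only):

import numpy as np

# adjacency of a 4-cycle: symmetric, but it has a negative eigenvalue
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

w, U = np.linalg.eigh(A)
print(w)  # includes -2, so no exact A = V V^T exists

# best PSD approximation: clip negative eigenvalues, then factor
w_clipped = np.clip(w, 0, None)
V = U * np.sqrt(w_clipped)   # scales each eigenvector column
A_hat = V @ V.T
print(np.round(A_hat, 2))    # reconstructs properties of the graph, not the exact graph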
Let’s dive into this
Friend(Molham,Garry)
Friend(Molham,TJ)
Friend(Molham, Todd)
Friend(Nick,Garry)
Friend(Nick,Todd)
Friend(Nick, TJ)
47
There is no connection
between Nick and Molham
but there is a lot of flow
between them
How do we formulate that?
max  ||x_i - x_j||
subject to:
||x_i - x_j|| = d_ij
where d_ij is the flow between i and j
48
Friend[A,B]=v <- agg<<v=total(z)>>
Friend[A,C]=_,
Friend[B,C]=_,
Friend[C,D]=_.
There is a faster way to do this without
grounding all the facts
● Spectral Embedding of Graphs
● Graph Laplacian
49
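A minimal numpy sketch of a spectral (Laplacian) embedding, using a small friendship graph like the one a couple of slides back (names and edges written out by hand):

import numpy as np

names = ["Molham", "Nick", "Garry", "TJ", "Todd"]
edges = [(0, 2), (0, 3), (0, 4), (1, 2), (1, 4), (1, 3)]  # the Friend(...) facts

n = len(names)
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))
L = D - A                      # graph Laplacian

w, U = np.linalg.eigh(L)       # eigenvectors sorted by eigenvalue
X = U[:, 1:3]                  # skip the trivial constant eigenvector
for name, x in zip(names, X):
    print(name, np.round(x, 3))  # Nick and Molham land on the same point despite no direct edge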
Embeddings that approximate the shortest
paths
Convex Formulation
Non-convex Formulation
50
Again, here we don't have to explicitly compute all the shortest paths
Eureka moment
Distances or dot products can compute complicated functions,
the same way that reals can compare complicated symbolic expressions
51
What if we only observe aggregations of relations?
52
(figure: a data cube whose cells are aggregations over dimensions such as product = coke, store = ATL, day = Monday, promotion)
Embeddings of richer Relational Data
(figure: the relation expressed as a sum of k rank-1 components)
53
Embeddings of richer Relational Data
● The factorization machine as a nonlinear
regression
54
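A minimal sketch of a degree-2 factorization machine prediction, using the standard O(nk) identity for the pairwise term (all weights here are made up):

import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine:
    y = w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j,
    computed via 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)."""
    linear = w0 + w @ x
    s = V.T @ x  # length-k vector
    pairwise = 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise

rng = np.random.default_rng(0)
n_features, k = 8, 3
x = rng.random(n_features)
y = fm_predict(x, w0=0.1, w=rng.normal(size=n_features), V=rng.normal(size=(n_features, k)))
print(y)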
Embeddings of richer Relational Data
● Generalization to Tensors
55
Relations with ordering
Sometimes it is hard to quantify!
It is easier to rank…
Love(Molham,Tim)>Love(Nick,Emir)
Love(TJ,Hung) < Love(Hung, Long)
…
Love(Nick,TJ) ? Love(Emir, Molham)
56
Optimization problem
max  ||x_i - x_j||
subject to:
||x_i - x_j|| < ||x_i - x_k|| + c
57
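A minimal sketch of turning such ranking constraints into a differentiable hinge loss over triplets (this is one common relaxation, not necessarily the exact formulation on the slide):

import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
X = rng.normal(size=(n, d))

# ranking constraints as triplets (i, j, k): pair (i, j) should be closer than (i, k)
triplets = [(0, 1, 2), (3, 4, 5), (0, 3, 5)]
margin = 1.0

def triplet_loss(X):
    loss = 0.0
    for i, j, k in triplets:
        d_ij = np.linalg.norm(X[i] - X[j])
        d_ik = np.linalg.norm(X[i] - X[k])
        loss += max(0.0, d_ij - d_ik + margin)  # hinge: want d_ij <= d_ik - margin
    return loss

print(triplet_loss(X))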
Embedding on ranked data (1)
● A convex method
58
What is this?
59
Manifold Learning and Embedding of
ranked data (2)
● OASIS, a semi-parametric approach
S(p,q) =
60
OASIS (2)
61
Embeddings on Count data
62
Embedding of time sequences
● Latent Markov Embedding
63
Beyond Euclidean Geometry
64
Euclidean space is a boring place to live
● Euclidean space is flat
● Practically, this means no information is stored in the space itself
● All points in space are equivalent
65
A sphere is a little bit more interesting
66
Back to manifolds
67
Geodesic distances on a manifold are expensive.
The metric changes from point to point.
In Euclidean space the metric is the same everywhere.
Hierarchical clustering, Poincaré disks
68
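A minimal sketch of the Poincaré-disk distance, the quantity that hyperbolic embeddings of hierarchies optimize:

import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit disk/ball."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    nu, nv = np.sum(u ** 2), np.sum(v ** 2)
    diff = np.sum((u - v) ** 2)
    x = 1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv) + eps)
    return np.arccosh(x)

# the second pair is 10x closer in Euclidean distance, yet much less so hyperbolically:
# the metric blows up near the boundary of the disk
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.9, 0.0], [0.95, 0.0]))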
Astonishing visualization
69
Astonishing results
70
Word2Vec
71
The “imbedding” problem
● Plot a graph so that edges do not cross
● A more relaxed version of embedding
● Imbedding actually solves the embedding problem
● Imbedding requires fewer dimensions
72
Embedding and deep learning
73
Deep learning Review
74
How is DL related to Embedding/Manifold
learning?
75
n_output < n_input
2^N possible outputs in the first layer
Embedding
76
text → LSTM → vector
images → convnet → vector
click data → MLP → vector
logic rules → LSTM → vector
Deep Learning is unifying Machine
Learning
77
78
The importance of embedding for categorical
data
79
Reviewing one hot encoding:
hair_color: (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)
How would you transform hair_color into a tensor?
(Local representation)
A simple regression example
y = a1·x_black + a2·x_brown + a3·x_yellow + a4·x_red + a5·x_price
80
(One number per category)
A different approach
y = <w_color, x_color> + a_price·x_price
x_color is a vector
w_color is a vector
< , > is the dot product
81
(figure: the color embedding x_color and the weight vector w)
What is so great about this approach?
● First of all we can learn relationships
between categories
● Adversarial debugging
error = (<w_color, x_color> + a_price·x_price − y)²
● argmin_{w,a}(error) is the best model given the data
● argmax_{x,w}(error) is the datapoint that breaks our model
82
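A minimal PyTorch sketch of this "adversarial debugging" loop on one made-up example: first argmin the squared error over (w, a), then hold them fixed and argmax the error over the input embedding (all names and numbers here are illustrative, not from the slides):

import torch

torch.manual_seed(0)
k = 4
w_color = torch.randn(k, requires_grad=True)  # weight vector for the color embedding
a_price = torch.randn(1, requires_grad=True)  # coefficient for price

# one made-up training example: a color embedding, a price, and a target y
x_color = torch.randn(k)
x_price = torch.tensor([2.0])
y = torch.tensor([1.5])

def error(xc, xp):
    pred = torch.dot(w_color, xc) + a_price * xp
    return ((pred - y) ** 2).sum()

# 1) argmin over (w, a): fit the model to the data
opt = torch.optim.SGD([w_color, a_price], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    error(x_color, x_price).backward()
    opt.step()

# 2) argmax over x: the datapoint that breaks the fitted model
x_adv = x_color.clone().detach().requires_grad_(True)
opt_x = torch.optim.SGD([x_adv], lr=0.05)
for _ in range(200):
    opt_x.zero_grad()
    (-error(x_adv, x_price)).backward()  # gradient ascent on the error
    opt_x.step()

print(error(x_color, x_price).item(), error(x_adv, x_price).item())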
Finding the category
83
(figure: the learned vector w and the error-maximizing vector w*)
● Do a nearest neighbor search with the w* that was found by maximizing the error.
● The learned w nearest to w* identifies the corresponding category.
Why can’t we do this with 1-hot encoding?
argmax_x (y − a1·x_black − a2·x_brown − a3·x_yellow − a4·x_red − a5·x_price)²
What if the solution is
[x_black, x_brown, x_yellow, x_red, x_price] = [1.3, -0.88, 6.1, 3.4, 0.22]
The one-hot encoding structure is violated
84
Embedding with generative
models
85
What is a Generative Model
86
(figure: vector → generative deep learning network → image / text / video / music)
Here is an Example
87
GANs and VAEs
Adversarial Training
89
● Train D to maximize the
probability of the correct
label.
● Train D on both real
images and samples from
G
● Train G to min log(1 − D(G(z)))
● or max log D(G(z))
(figure: z → G(z) → D(x), where D also sees real images)
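A minimal PyTorch sketch of those two updates (the networks, sizes, and data are placeholders, not the architecture in the slide):

import torch
import torch.nn as nn

torch.manual_seed(0)
z_dim, x_dim = 8, 16
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)

real = torch.randn(64, x_dim)  # stand-in for real images
z = torch.randn(64, z_dim)

# Train D on both real samples and samples from G, maximizing the correct labels
opt_d.zero_grad()
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
d_loss.backward()
opt_d.step()

# Train G: the "max log D(G(z))" (non-saturating) form
opt_g.zero_grad()
g_loss = bce(D(G(z)), torch.ones(64, 1))
g_loss.backward()
opt_g.step()
print(d_loss.item(), g_loss.item())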
Embeddings by inverting GANs
90
(figure: the vector → generative network → image / text / video / music diagram, now inverted to recover the vector)
Reference
91
When Euclidean dimensions acquire
meaning (InfoGAN)
92
Applications: Putting all the embeddings together
93
(figure: text → DNN → embedding vector → generative network → image)
Connecting images and natural language
94
Connecting images with text
95
I give you text and you create an
image
96
I give you a sketch and ask you to draw a house
97
https://affinelayer.com/pixsrv/
Given a question, generate an answer
98
Arithmetization of logic
● Gödel embedded logic into numbers using primes
● When programs take as input numbers that represent logical statements, we get into trouble
● A neural network can embed data
● A neural network can embed another neural net
● What are the implications?
● Is there a universal neural network that can get a task
as an input and build a neural network that can do it?
● Do we have another incompleteness theorem?
99
Summary
● Embedding is the new normalization of data
● Embedding summarizes any information in a fixed-size vector
● Complicated processing can be done with dot products
● Different types of data can pass information to each other
● Embedding is the basic ingredient of deep learning
100