Embeddings: The geometry of
Relational Algebra
Nikolaos Vasiloglou
1
Outline
1. Geometry in Greece
2. The birth of Real numbers
3. Geometry in France
4. From Geometry to Data Science
5. Lifting dimensions
6. Manifold Learning
7. Embedding and Relational data
8. Beyond Euclidean Geometry
9. Embedding and deep learning
2
Geometry in Greece
3
Geometry in the days of Euclid
edge(A,B)
edge(B,C)
edge(C,A)
A
B C
4
Being more specific
edge(A,B)
edge(B,C)
edge(A,C)
angle[A] = x
length[A,B] = y
…...
5
A
B C
Is everything symbolic?
● len[A,B] = x
● What is x?
● x can only be 1!
● angle(A) = y
● What is y?
● y can only be 90°
6
Constructing quantities
● Natural numbers by repetitions
● A right isosceles triangle constructs √2
● An equilateral triangle constructs 60° angles
● Congruent triangles construct rational numbers
7
The existence of Reals
● It took several decades
● and a lot of logicians
● before we knew that they exist
● We cannot construct all of them with geometry or with relational algebra
8
What is great about reals?
● Reals can represent the output of
complicated programs, or any sequences
Example
⅓ = 0.333333333…
√2 = 1.414…
sin(√3) = 0.98702664499
9
Expressions are standardized
● And we can compute distances very easily
● Subtraction computes distance in O(ε)
● ε is the number of digits
● We can also compare distances very fast
And don't forget: reals = the arithmetization of logic
(not all of logic)
10
Geometry in France
11
Geometry in the days of Descartes
A = (0, 1)
B = (-0.5, 0)
C = (0.5, 0)
12
A
B C
Key point in Cartesian Geometry
● Geometrical shapes are collections of vectors
● Vectors consist of REAL numbers
● Dimensions are independent
13
Why Cartesian Geometry?
14
Euclidean vs Cartesian
● Computing distances and angles: Euclidean geometry requires constructing a program; Cartesian geometry reduces it to distance computations* and angle computations.
● Proving equalities and inequalities for abstract shapes: Euclidean geometry needs A* search or other complicated methods; Cartesian geometry uses calculus and the theorems of the Reals*.
15
(figure: Euclidean → Cartesian, where the arrow is the Embedding)
From Geometry to data
(science)
16
Absolute (Cartesian) (Dense) Data
● The value of every dimension is determined
● A (Euclidean) Metric is assumed (Banach)
● An inner product (Hilbert) is also assumed
● Small distances are meaningful
● Big distances are meaningless
17
Relational Data
● Sparse representation
● Only some relations between atoms are
defined
● There is no straightforward way to compute an arbitrary relationship
● We usually call them graphs
18
Absolute vs Relational Representation
(figure: the same small set of points stored as a dense N x d coordinate matrix versus as a sparse N x N relational/adjacency matrix)
● Storage Requirements
○ N x d
○ N x N
19
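As a minimal sketch of this storage trade-off (the points and edges below are made up for illustration), the same six points can be held as a dense N x d array or as a sparse N x N relation matrix:

import numpy as np
from scipy.sparse import csr_matrix

# Absolute (Cartesian) representation: N points with d coordinates each -> dense N x d
N, d = 6, 2
X = np.array([[0, 1], [0, -1], [1, 0], [-1, 0], [1, -1], [-1, 1]], dtype=float)

# Relational representation: only the observed relations are stored;
# conceptually an N x N matrix, but kept sparse so storage is O(number of edges)
edges = [(0, 2), (1, 3), (4, 5)]  # hypothetical edge list
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(N, N))

print(X.shape)         # (6, 2): N x d
print(A.shape, A.nnz)  # (6, 6), but only 3 stored entries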
Why do we want to convert
relational data to cartesian?
20
Relational -> Absolute (Embedding)
● Assign a k-dimensional vector to every entity
(graph node)
● Every edge (relationship) is computed as an
inner product
● Easy to store, good locality, visualization
● Not every graph is embeddable in a Hilbert
space
21
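A minimal sketch of the idea, assuming we already have one k-dimensional vector per node (here random, purely for illustration); every edge score is then just an inner product:

import numpy as np

rng = np.random.default_rng(0)
num_nodes, k = 5, 3
V = rng.normal(size=(num_nodes, k))  # one k-dimensional vector per graph node

def edge_score(i, j):
    # the relationship between nodes i and j is computed as an inner product
    return float(V[i] @ V[j])

# reconstructing the whole (approximate) relation matrix is just the Gram matrix
A_hat = V @ V.T
print(edge_score(0, 1), A_hat[0, 1])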
Does it make sense to convert
cartesian data to relational?
22
Absolute -> Relational -> Cartesian
Manifold Learning
● Why would someone want to do that?
● Practically speaking it means increasing the
dimensionality of the data
● Reminds you of the kernel trick?
● How do you do that?
23
From Absolute to Relational
24
All Nearest Neighbor graph
● Project data on itself
○ based on a metric
○ or a similarity
25
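A minimal sketch with scikit-learn's kneighbors_graph, assuming a Euclidean metric and random data:

import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.default_rng(1).normal(size=(100, 5))  # N x d absolute data
# project the data onto itself: relate every point to its k nearest neighbors
G = kneighbors_graph(X, n_neighbors=5, mode="distance", metric="euclidean")
print(G.shape, G.nnz)  # a sparse 100 x 100 relational representation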
Bipartite Projection (1)
● K-means with big K
(figure: k-means maps the N x d data matrix to an N x k point-to-cluster matrix)
26
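A minimal sketch, assuming we run scikit-learn's KMeans with a large K and one-hot encode the assignments, so the N x d matrix becomes an N x k indicator matrix:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(2).normal(size=(500, 10))  # N x d
k = 100  # "big K"
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# bipartite projection: each point is related to the cluster it falls into
Z = np.zeros((X.shape[0], k))
Z[np.arange(X.shape[0]), labels] = 1.0  # N x k indicator matrix
print(Z.shape, int(Z.sum()))  # (500, 100), one nonzero per row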
Bipartite Projection (2)
● Random trees
○ Kd-trees, decision trees
(figure: each point recurses to a leaf, so the N x d data matrix becomes an N x k point-to-leaf matrix)
27
Bipartite Projection (3)
● Random Projections
(figure: random projections map the N x d data matrix to an N x k matrix)
28
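A minimal sketch with plain numpy, relating each point to the random half-spaces it falls into (this is one reading of the slide; a plain linear projection X @ R works just as well):

import numpy as np

rng = np.random.default_rng(3)
N, d, k = 200, 50, 16
X = rng.normal(size=(N, d))    # N x d
R = rng.normal(size=(d, k))    # k random directions

Z = (X @ R > 0).astype(float)  # N x k: which random half-spaces each point lies in
print(Z.shape)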
Why do we want to do this?
● In low dimensions linear models do not work well
● In high dimensions things become (close to) linearly separable
29
Relations
● Relations express local information
○ Friend(Nick, TJ)
○ Friend(TJ, Molham)
○ Friend(x,y) <- Friend(x,z), Friend(y,z)
● But we don’t know if Friend(Nick, Molham)
● Can we find a geometric structure that
encodes information locally and allows
inference?
30
Manifolds = information is local
AKA Friend(Nick, TJ)
31
Riemannian Manifold (1)
● It has to be smooth, no corners
32
Riemannian Manifold (2)
● There is something called the tangent space that locally looks like a Euclidean plane.
33
Riemannian Manifold (3)
● The distances that matter are the geodesic ones
34
Riemannian Manifold (4)
● Euclidean Distances are meaningless
35
Manifold Learning Stages (1)
○ A mapping that forms relationships from the data
36
Embedding as an optimization problem
min  Agg f(x_i, x_j)
subject to:
g(x_i, x_j) = 0
where x_i ∈ R^d, and f, g are geometric functions
37
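A minimal numerical sketch of this kind of problem, replacing the hard constraint g(x_i, x_j) = 0 by a soft penalty and using plain gradient descent (the objective, pairs, and dimensions are all made up):

import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
X = rng.normal(size=(n, d))  # the x_i in R^d, the variables we optimize

# example relational constraints: pairs that should sit at unit distance
pairs = [(0, 1), (1, 2), (2, 0), (3, 4)]

def loss(X):
    # f: aggregate (sum) of squared norms; g: ||x_i - x_j|| - 1 = 0 as a penalty
    f = np.sum(X ** 2)
    g = sum((np.linalg.norm(X[i] - X[j]) - 1.0) ** 2 for i, j in pairs)
    return f + 10.0 * g

def grad(X, eps=1e-5):
    # numerical gradient, just to keep the sketch short
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (loss(X + E) - loss(X - E)) / (2 * eps)
    return G

for _ in range(500):
    X -= 0.01 * grad(X)
print(loss(X))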
Manifold Learning on Unlabeled data
● MVU (Maximum Variance Unfolding)
○ Construct the relational data based on the nearest neighbor distances
38
Manifold Learning on Labeled data
● Laplacian SVM
39
Determining the optimal Manifold
Learning
40
Lift Dimension for Linear Separation
41
Embeddings for relational
data
42
Relations as geometric quantities
atoms -> vectors
relations -> dot products between vectors
relations -> distances between vectors
43
Back to the optimization problem
44
min  Agg f(x_i, x_j)
subject to:
g_k(x_i, x_j) = 0
where x_i ∈ R^d

f -> ||x_i - x_j||
x_i represents an atom
g_k(x_i, x_j) = 0 if R_k(atom_i, atom_j), i.e. <x_i, x_j> = 1
Relational Data as a graph
45
What is the
meaning of
zeros?
Is embedding always possible?
● The relation (adjacency) matrix A is a symmetric square matrix
● It does not always admit a factorization of the form A = V Vᵀ
● So the embedding cannot reconstruct the graph exactly
● It can reconstruct properties of the graph
46
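A minimal sketch of why an exact factorization can fail: the adjacency matrix of a 4-cycle is symmetric but has a negative eigenvalue, so A = V Vᵀ has no exact solution; clipping the spectrum gives the usual approximate embedding (toy graph, numpy only):

import numpy as np

# adjacency of a 4-cycle: symmetric, but it has a negative eigenvalue
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

w, U = np.linalg.eigh(A)
print(w)  # includes -2, so no exact A = V V^T exists

# best PSD approximation: clip negative eigenvalues, then factor
w_clipped = np.clip(w, 0, None)
V = U * np.sqrt(w_clipped)   # scales each eigenvector column
A_hat = V @ V.T
print(np.round(A_hat, 2))    # reconstructs properties of the graph, not the exact graph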
Let’s dive into this
Friend(Molham,Garry)
Friend(Molham,TJ)
Friend(Molham, Todd)
Friend(Nick,Garry)
Friend(Nick,Todd)
Friend(Nick, TJ)
47
There is no connection
between Nick and Molham
but there is a lot of flow
between them
How do we formulate that?
max  ||x_i - x_j||
subject to:
||x_i - x_j|| = d_ij
where d_ij is the flow between i and j
48
Friend[A,B]=v <- agg<<v=total(z)>>
Friend[A,C]=_,
Friend[B,C]=_,
Friend[C,D]=_.
There is a faster way to do this without
grounding all the facts
● Spectral Embedding of Graphs
● Graph Laplacian
49
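A minimal numpy sketch of a spectral (Laplacian) embedding, using a small friendship graph like the one a couple of slides back (names and edges written out by hand):

import numpy as np

names = ["Molham", "Nick", "Garry", "TJ", "Todd"]
edges = [(0, 2), (0, 3), (0, 4), (1, 2), (1, 4), (1, 3)]  # the Friend(...) facts

n = len(names)
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))
L = D - A                      # graph Laplacian

w, U = np.linalg.eigh(L)       # eigenvectors sorted by eigenvalue
X = U[:, 1:3]                  # skip the trivial constant eigenvector
for name, x in zip(names, X):
    print(name, np.round(x, 3))  # Nick and Molham land on the same point despite no direct edge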
Embeddings that approximate the shortest
paths
Convex Formulation
Non-convex Formulation
50
Again, here we don't have to explicitly compute all the shortest paths
Eureka moment
Distances or dot products can compute complicated functions,
the same way that reals can compare complicated symbolic expressions
51
What if we only observe aggregations of relations?
52
(figure: a data cube whose cells are aggregations over dimensions such as product = coke, store = ATL, day = Monday, promotion)
Embeddings of richer Relational Data
(figure: the relation expressed as a sum of k rank-1 components)
53
Embeddings of richer Relational Data
● The factorization machine as a nonlinear
regression
54
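A minimal sketch of a degree-2 factorization machine prediction, using the standard O(nk) identity for the pairwise term (all weights here are made up):

import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine:
    y = w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j,
    computed via 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)."""
    linear = w0 + w @ x
    s = V.T @ x  # length-k vector
    pairwise = 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise

rng = np.random.default_rng(0)
n_features, k = 8, 3
x = rng.random(n_features)
y = fm_predict(x, w0=0.1, w=rng.normal(size=n_features), V=rng.normal(size=(n_features, k)))
print(y)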
Embeddings of richer Relational Data
● Generalization to Tensors
55
Relations with ordering
Sometimes it is hard to quantify!
It is easier to rank…
Love(Molham,Tim)>Love(Nick,Emir)
Love(TJ,Hung) < Love(Hung, Long)
…
Love(Nick,TJ) ? Love(Emir, Molham)
56
Optimization problem
max  ||x_i - x_j||
subject to:
||x_i - x_j|| < ||x_i - x_k|| + c
57
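A minimal sketch of turning such ranking constraints into a differentiable hinge loss over triplets (this is one common relaxation, not necessarily the exact formulation on the slide):

import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
X = rng.normal(size=(n, d))

# ranking constraints as triplets (i, j, k): pair (i, j) should be closer than (i, k)
triplets = [(0, 1, 2), (3, 4, 5), (0, 3, 5)]
margin = 1.0

def triplet_loss(X):
    loss = 0.0
    for i, j, k in triplets:
        d_ij = np.linalg.norm(X[i] - X[j])
        d_ik = np.linalg.norm(X[i] - X[k])
        loss += max(0.0, d_ij - d_ik + margin)  # hinge: want d_ij <= d_ik - margin
    return loss

print(triplet_loss(X))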
Embedding on ranked data (1)
● A convex method
58
What is this?
59
Manifold Learning and Embedding of
ranked data (2)
● OASIS, a semi-parametric approach
S(p,q) =
60
OASIS (2)
61
Embeddings on Count data
62
Embedding of time sequences
● Latent Markov Embedding
63
Beyond Euclidean Geometry
64
Euclidean space is a boring place to live
● Euclidean space is flat
● Practically, this means no information is stored in the space itself
● All points in space are equivalent
65
A sphere is a little bit more interesting
66
Back to manifolds
67
Geodesic distances on a manifold are expensive.
The metric changes from point to point.
In Euclidean space the metric is the same everywhere.
Hierarchical clustering, Poincaré disks
68
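A minimal sketch of the Poincaré-disk distance, the quantity that hyperbolic embeddings of hierarchies optimize:

import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit disk/ball."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    nu, nv = np.sum(u ** 2), np.sum(v ** 2)
    diff = np.sum((u - v) ** 2)
    x = 1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv) + eps)
    return np.arccosh(x)

# the second pair is 10x closer in Euclidean distance, yet much less so hyperbolically:
# the metric blows up near the boundary of the disk
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.9, 0.0], [0.95, 0.0]))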
Astonishing visualization
69
Astonishing results
70
Word2Vec
71
The “imbedding” problem
● Plot a graph so that edges do not cross
● A more relaxed version of embedding
● Imbedding actually solves the embedding problem
● Imbedding requires fewer dimensions
72
Embedding and deep learning
73
Deep learning Review
74
How is DL related to Embedding/Manifold
learning?
75
n_output < n_input
2^N possible outputs in the first layer
Embedding
76
text → LSTM → vector
images → convnet → vector
click data → MLP → vector
logic rules → LSTM → vector
Deep Learning is unifying Machine
Learning
77
78
The importance of embedding for categorical
data
79
Reviewing one hot encoding:
hair_color: (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)
How would you transform hair_color into a tensor?
(Local representation)
A simple regression example
y = a1·x_black + a2·x_brown + a3·x_yellow + a4·x_red + a5·x_price
80
(One number per category)
A different approach
y = <w_color, x_color> + a_price·x_price
x_color is a vector
w_color is a vector
< , > is the dot product
81
(figure: the color embedding x_color and the weight vector w)
What is so great about this approach?
● First of all we can learn relationships
between categories
● Adversarial debugging
error = (<w_color, x_color> + a_price·x_price − y)²
● argmin_{w,a}(error) is the best model given the data
● argmax_{x,w}(error) is the datapoint that breaks our model
82
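A minimal PyTorch sketch of this "adversarial debugging" loop on one made-up example: first argmin the squared error over (w, a), then hold them fixed and argmax the error over the input embedding (all names and numbers here are illustrative, not from the slides):

import torch

torch.manual_seed(0)
k = 4
w_color = torch.randn(k, requires_grad=True)  # weight vector for the color embedding
a_price = torch.randn(1, requires_grad=True)  # coefficient for price

# one made-up training example: a color embedding, a price, and a target y
x_color = torch.randn(k)
x_price = torch.tensor([2.0])
y = torch.tensor([1.5])

def error(xc, xp):
    pred = torch.dot(w_color, xc) + a_price * xp
    return ((pred - y) ** 2).sum()

# 1) argmin over (w, a): fit the model to the data
opt = torch.optim.SGD([w_color, a_price], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    error(x_color, x_price).backward()
    opt.step()

# 2) argmax over x: the datapoint that breaks the fitted model
x_adv = x_color.clone().detach().requires_grad_(True)
opt_x = torch.optim.SGD([x_adv], lr=0.05)
for _ in range(200):
    opt_x.zero_grad()
    (-error(x_adv, x_price)).backward()  # gradient ascent on the error
    opt_x.step()

print(error(x_color, x_price).item(), error(x_adv, x_price).item())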
Finding the category
83
(figure: the learned vector w and the error-maximizing vector w*)
● Do a nearest neighbor search with the w* that was found by maximizing the error.
● The learned w nearest to w* identifies the corresponding category.
Why can’t we do this with 1-hot encoding?
argmax_x (y − a1·x_black − a2·x_brown − a3·x_yellow − a4·x_red − a5·x_price)²
What if the solution is
[x_black, x_brown, x_yellow, x_red, x_price] = [1.3, -0.88, 6.1, 3.4, 0.22]
The one-hot encoding structure is violated
84
Embedding with generative
models
85
What is a Generative Model
86
(figure: vector → generative deep learning network → image / text / video / music)
Here is an Example
87
GANs and VAEs
Adversarial Training
89
● Train D to maximize the
probability of the correct
label.
● Train D on both real
images and samples from
G
● Train G to min log(1 − D(G(z)))
● or max log D(G(z))
(figure: z → G(z) → D(x), where D also sees real images)
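A minimal PyTorch sketch of those two updates (the networks, sizes, and data are placeholders, not the architecture in the slide):

import torch
import torch.nn as nn

torch.manual_seed(0)
z_dim, x_dim = 8, 16
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)

real = torch.randn(64, x_dim)  # stand-in for real images
z = torch.randn(64, z_dim)

# Train D on both real samples and samples from G, maximizing the correct labels
opt_d.zero_grad()
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
d_loss.backward()
opt_d.step()

# Train G: the "max log D(G(z))" (non-saturating) form
opt_g.zero_grad()
g_loss = bce(D(G(z)), torch.ones(64, 1))
g_loss.backward()
opt_g.step()
print(d_loss.item(), g_loss.item())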
Embeddings by inverting GANs
90
(figure: the vector → generative network → image / text / video / music diagram, now inverted to recover the vector)
Reference
91
When Euclidean dimensions acquire
meaning (InfoGAN)
92
Applications: Putting all the embeddings together
93
(figure: text → DNN → embedding vector → generative network → image)
Connecting images and natural language
94
Connecting images with text
95
I give you text and you create an
image
96
I give you a sketch and ask you to draw a house
97
https://affinelayer.com/pixsrv/
Given a question, generate an answer
98
Arithmetization of logic
● Gödel embedded logic into numbers using primes
● When programs take as input numbers that represent logical statements, we get into trouble
● A neural network can embed data
● A neural network can embed another neural net
● What are the implications?
● Is there a universal neural network that can get a task
as an input and build a neural network that can do it?
● Do we have another incompleteness theorem?
99
Summary
● Embedding is the new normalization of data
● Embedding summarizes any information in a fixed-size vector
● Complicated processing can be done with dot products
● Different types of data can pass information to each other
● Embedding is the basic ingredient of deep learning
100