Hierarchical representation with
hyperbolic geometry
2016-20873 Segwang Kim 1
Overview
2
① Embedding Symbolic and Hierarchical Data
② Introduction to Hyperbolic Space
③ Optimization over Hyperbolic Space
④ Toy Experiments
3
Embedding Symbolic and Hierarchical Data
Symbolic and Hierarchical Data
4
Symbolic data with implicit hierarchy.
Downstream tasks
link prediction, node classification, community detection, visualization
[Figure: two examples, WordNet (link prediction, "?LINK") and a Twitter social graph (community detection).]
Good Hierarchical Embedding
5
For downstream tasks, symbolic and hierarchical data needs to be embedded into a space.
Good Embedding?
Embeddings of similar symbols should aggregate in some sense.
Symbolic arithmetic exists: v(King) − v(man) + v(woman) = v(Queen)
Hierarchy can be restored from embedded data.
The space should have low dimension.
6
Introduction to Hyperbolic Space
Limitation of Euclidean Embedding
7
Embed graph structure while preserving distances
Thm) Trees cannot be embedded into Euclidean space with
arbitrarily low distortion for any number of dimensions
           Graph   Euclidean   ??
D(a,b)     2       0.1         1.889
D(a,c)     2       1           1.902
D(a,d)     2       1.8         1.962
[Figure: a small graph on nodes a, b, c, d and its embeddings into the Euclidean plane and into the mystery space "??".]
Representation Tradeoffs for Hyperbolic Embeddings (ICML 2018)
Euclidean Space vs Hyperbolic space
8
𝑀 = đ· 𝑛 = {đ‘„ ∈ ℝ 𝑛 ∶ đ‘„1
2
+ ⋯ + đ‘„ 𝑛
2 < 1}
(đ· 𝑛
,
2
1−||đ‘„||2
2
𝑔)𝑔 = đ‘‘đ‘„1 2
+ ⋯ + đ‘‘đ‘„ 𝑛 2
Euclidean Hyperbolic
(ℝ 𝑛, 𝑔)
𝑀 = ℝ 𝑛
Metric tensor : inner product on tangent space
= đ‘‘đ‘„1 𝑱 đ‘‘đ‘„1 𝑣 + ⋯ + đ‘‘đ‘„ 𝑛 𝑱 đ‘‘đ‘„ 𝑛(𝑣)
= 𝑱1 𝑣1 + ⋯ + 𝑱 𝑛 𝑣 𝑛
∀ 𝑱, 𝑣 ∈ 𝑇𝑝ℝ 𝑛
where 𝑝 ∈ ℝ 𝑛
𝑱, 𝑣 𝑝 = 𝑱 𝑡 𝑔𝑣
=
2
1 − ||𝑝||2
2
(𝑱1 𝑣1 + ⋯ + 𝑱 𝑛 𝑣 𝑛)
∀ 𝑱, 𝑣 ∈ 𝑇𝑝 đ· 𝑛
where 𝑝 ∈ đ· 𝑛
𝑱, 𝑣 𝑝 = 𝑱 𝑡
(
2
1 − ||𝑝||2
2
𝑔)𝑣
Give Riemannian Metric
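To make the difference between the two metric tensors concrete, here is a minimal numpy sketch (my own illustration, not part of the slides) that evaluates both inner products at a base point p; the only difference is the conformal factor (2 / (1 − ||p||^2))^2, which depends on p.

```python
import numpy as np

def euclidean_inner(u, v, p=None):
    """Euclidean inner product; the base point p plays no role."""
    return float(np.dot(u, v))

def poincare_inner(u, v, p):
    """Inner product of tangent vectors u, v at p in the Poincare ball:
    <u, v>_p = (2 / (1 - ||p||^2))^2 * <u, v>_Euclidean."""
    lam = 2.0 / (1.0 - np.dot(p, p))  # conformal factor, blows up as ||p|| -> 1
    return float(lam ** 2 * np.dot(u, v))

u = np.array([1.0, 0.0])
v = np.array([0.5, 0.5])
print(euclidean_inner(u, v))                       # 0.5, independent of the base point
print(poincare_inner(u, v, np.array([0.0, 0.0])))  # 2.0 at the origin (factor 4)
print(poincare_inner(u, v, np.array([0.9, 0.0])))  # much larger near the boundary
```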
Euclidean Space vs Hyperbolic space
9
The inner product ⟹ · , · ⟩_p on T_p D^n defines:

Length of a curve γ : [0,1] → D^n:
L(γ) = ∫_0^1 ⟹γ'(t), γ'(t)⟩_{γ(t)}^{1/2} dt

Angle between w_1, w_2 ∈ T_p D^n:
⟹w_1, w_2⟩_p / (⟹w_1, w_1⟩_p · ⟹w_2, w_2⟩_p)^{1/2}

A "line" between p, q ∈ M is the shortest path between them:
γ* = argmin ∫_0^1 ⟹γ'(t), γ'(t)⟩_{γ(t)}^{1/2} dt  subject to  γ(0) = p, γ(1) = q

[Figure: the shortest path between p and q, Euclidean vs. hyperbolic.]

Note that (2 / (1 − ||x||^2))^2 g → ∞ as ||x|| → 1.
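On the Poincaré ball the geodesic length above has a well-known closed form, d(p, q) = arcosh(1 + 2||p − q||^2 / ((1 − ||p||^2)(1 − ||q||^2))). The following sketch (my addition, not on the slide) shows how quickly this distance grows as a point approaches the boundary.

```python
import numpy as np

def poincare_distance(p, q):
    """Geodesic distance between p and q in the Poincare ball (||p||, ||q|| < 1)."""
    sq = np.dot(p - q, p - q)
    denom = (1.0 - np.dot(p, p)) * (1.0 - np.dot(q, q))
    return float(np.arccosh(1.0 + 2.0 * sq / denom))

origin = np.array([0.0, 0.0])
print(poincare_distance(origin, np.array([0.5, 0.0])))   # ~1.10
print(poincare_distance(origin, np.array([0.99, 0.0])))  # ~5.29, explodes near the boundary
```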
Equivalent Hyperbolic Models
10
We can choose one of the hyperbolic models depending on the purpose; the models are ISOMETRIC to one another.

Poincaré model (for visualization):
D^n = {x ∈ ℝ^n : x_1^2 + ⋯ + x_n^2 < 1}
(D^n, (2 / (1 − ||x||^2))^2 (dx_1^2 + ⋯ + dx_n^2))

Lorentz model (for optimization):
(ℒ^n, −dx_0^2 + dx_1^2 + ⋯ + dx_n^2)

The isometry maps (x_0, 
 , x_n) ∈ ℒ^n to (x_1 / (1 + x_0), 
 , x_n / (1 + x_0)) ∈ D^n.
Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry (ICML 2018)
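To make the isometry concrete, here is a small numpy sketch (my own, not the slide's) of the map from the Lorentz model to the Poincaré ball shown above, together with its inverse; the inverse formula is standard but not written on the slide.

```python
import numpy as np

def lorentz_to_poincare(x):
    """Map (x0, x1, ..., xn) on the hyperboloid to the Poincare ball:
    (x1, ..., xn) / (1 + x0)."""
    return x[1:] / (1.0 + x[0])

def poincare_to_lorentz(y):
    """Inverse map: y in the open unit ball -> point on the hyperboloid."""
    sq = np.dot(y, y)
    x0 = (1.0 + sq) / (1.0 - sq)
    rest = 2.0 * y / (1.0 - sq)
    return np.concatenate(([x0], rest))

y = np.array([0.3, -0.2])
x = poincare_to_lorentz(y)
print(x)                               # a point of the Lorentz model
print(-x[0] ** 2 + np.sum(x[1:] ** 2)) # Minkowski norm: -1 (up to rounding)
print(lorentz_to_poincare(x))          # recovers [0.3, -0.2]
```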
11
Optimization Techniques
Suggested loss function
12
An example of a loss function over hyperbolic space.
[Equation shown on the slide: the loss from the paper cited below, built from hyperbolic distance terms.]
Fundamentally, the gradient of the loss tells in which direction the points should move.
Poincaré Embeddings for Learning Hierarchical Representations (ICML 2017)
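The slide shows the loss from the cited Poincaré-embeddings paper only as an image, so the following is a hedged sketch of the general shape of that kind of loss: a softmax over negative hyperbolic distances that pulls linked symbols together and pushes sampled negatives apart. The function names and the tiny negative-sampling setup are illustrative, not the paper's exact implementation.

```python
import numpy as np

def poincare_distance(u, v):
    sq = np.dot(u - v, u - v)
    denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return np.arccosh(1.0 + 2.0 * sq / denom)

def pair_loss(u, v, negatives):
    """Softmax-style loss for one observed link (u, v) against sampled negatives:
    -log( exp(-d(u, v)) / sum_{v'} exp(-d(u, v')) ), v' ranging over {v} + negatives."""
    d_pos = poincare_distance(u, v)
    d_all = np.array([d_pos] + [poincare_distance(u, n) for n in negatives])
    return d_pos + np.log(np.sum(np.exp(-d_all)))

u = np.array([0.1, 0.2])
v = np.array([0.15, 0.25])
negatives = [np.array([-0.5, 0.4]), np.array([0.6, -0.3])]
print(pair_loss(u, v, negatives))  # small when u is much closer to v than to the negatives
```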
Gradient Descent Algorithm
13
Input: f : L^2 → ℝ, p_0 ∈ L^2, k = 0
repeat
    choose a descent direction v_k ∈ T_{p_k} L^2
    choose a retraction R_{p_k} : T_{p_k} L^2 → L^2
    choose a step length α_k ∈ ℝ
    set p_{k+1} = R_{p_k}(α_k v_k)
    k ← k + 1
until p_{k+1} sufficiently minimizes f
Nothing differs from the usual gradient descent except for the gradient direction and the retraction.
Optimization methods on Riemannian manifolds and their application to shape space (SIAM 2012)
Gradient Descent Algorithm
14
Input: f : L^2 → ℝ, p_0 ∈ L^2, k = 0
repeat
    choose a descent direction v_k ∈ T_{p_k} L^2
    choose a retraction R_{p_k} : T_{p_k} L^2 → L^2
    choose a step length α_k ∈ ℝ
    set p_{k+1} = R_{p_k}(α_k v_k)
    k ← k + 1
until p_{k+1} sufficiently minimizes f
What is the gradient on Hyperbolic space?
f : (ℒ^2, −dx_0^2 + dx_1^2 + dx_2^2) → ℝ
∇f ?
Hyperboloid model
15
First, find đ›»â„2:1 𝑓| 𝑝 ∈ ℝ3
𝑠. 𝑡. đ›»â„2:1 𝑓| 𝑝, 𝑣
ℒ
= 𝑑𝑓 𝑣 | 𝑝.
Second, project đ›»â„2:1 𝑓| 𝑝 into 𝑇𝑝 𝐿2.
đ›»đż2 𝑓| 𝑝 = đ›»â„2:1 𝑓| 𝑝 + đ›»â„2:1 𝑓| 𝑝, 𝑝
ℒ
𝑝
𝑇𝑝 𝐿2
= {𝑣 ∈ ℝ3
∶ 𝑣, 𝑝 ℒ = 0}.
𝐿2 = {𝑝 ∈ ℝ3: 𝑝, 𝑝 ℒ = −1, 𝑝 𝑧 > 0}.
𝑓 ∶ (ℒ2, âˆ’đ‘‘đ‘„0 2 + đ‘‘đ‘„1 2 + đ‘‘đ‘„2 2) → ℝ
đ›»â„2:1 𝑓| 𝑝 = (âˆ’đ‘‘đ‘„0 2 + đ‘‘đ‘„1 2 + đ‘‘đ‘„ 𝑛 2)−1 ⋅ Usual derivative
(from tensorflow)
−𝑣 𝑘
Gradient descent in hyperbolic space (Arxiv 2018)
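A minimal numpy sketch of these two steps (my own code, not the presenter's tensorflow version): flip the sign of the time-like component of the ordinary gradient, then project onto the tangent space at p using the Minkowski inner product. The toy loss f(p) = ⟹p, x⟩_ℒ is only there to have something to differentiate.

```python
import numpy as np

def minkowski_inner(u, v):
    """<u, v>_L = -u0*v0 + u1*v1 + u2*v2."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def riemannian_gradient(euclid_grad, p):
    """Turn the ordinary gradient of f at p into the Riemannian gradient on L^2."""
    h = euclid_grad.copy()
    h[0] = -h[0]                          # step 1: inverse Minkowski metric (sign flip)
    return h + minkowski_inner(h, p) * p  # step 2: project onto T_p L^2

# Toy loss f(p) = <p, x>_L for a fixed x on L^2 (just something to differentiate).
x = np.array([np.cosh(1.0), np.sinh(1.0), 0.0])
p = np.array([1.0, 0.0, 0.0])            # the "origin" of L^2
eg = np.array([-x[0], x[1], x[2]])       # ordinary partials of -p0*x0 + p1*x1 + p2*x2
rgrad = riemannian_gradient(eg, p)
print(rgrad)
print(minkowski_inner(rgrad, p))         # ~0: the result lies in the tangent space at p
```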
Gradient Descent Algorithm
16
Input: f : L^2 → ℝ, p_0 ∈ L^2, k = 0
repeat
    choose a descent direction v_k ∈ T_{p_k} L^2
    choose a retraction R_{p_k} : T_{p_k} L^2 → L^2
    choose a step length α_k ∈ ℝ
    set p_{k+1} = R_{p_k}(α_k v_k)
    k ← k + 1
until p_{k+1} sufficiently minimizes f
What is the retraction on Hyperbolic space?
Hyperboloid model
17
A retraction tells how the end point of a tangent vector corresponds to a point on the manifold.
[Figure: stepping straight along the tangent vector gives q' ∉ L^2, while its retraction R(q') ∈ L^2.]

We choose the affine geodesic as the retraction: at p ∈ L^2 with direction v ∈ T_p L^2,
γ(t) = cosh(||v||_ℒ t) p + sinh(||v||_ℒ t) v / ||v||_ℒ.
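A sketch of this retraction in numpy (my own code); note that ||v||_ℒ is real because tangent vectors at points of L^2 have non-negative Minkowski norm.

```python
import numpy as np

def minkowski_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def retract(p, v, t=1.0):
    """Move from p in L^2 along the geodesic with initial velocity v in T_p L^2:
    gamma(t) = cosh(||v|| t) p + sinh(||v|| t) v / ||v||, with ||v|| = <v, v>_L^{1/2}."""
    norm_v = np.sqrt(minkowski_inner(v, v))
    if norm_v < 1e-12:
        return p
    return np.cosh(norm_v * t) * p + np.sinh(norm_v * t) * v / norm_v

p = np.array([1.0, 0.0, 0.0])    # a point on L^2: <p, p>_L = -1
v = np.array([0.0, 0.3, -0.4])   # tangent at p: <v, p>_L = 0
q = retract(p, v)
print(q)
print(minkowski_inner(q, q))     # ~-1: the new point stays on the hyperboloid
```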
Gradient Descent Algorithm
18
Input: f : L^2 → ℝ, p_0 ∈ L^2, k = 0
repeat
    choose a descent direction v_k ∈ T_{p_k} L^2
    choose a retraction R_{p_k} : T_{p_k} L^2 → L^2
    choose a step length α_k ∈ ℝ
    set p_{k+1} = R_{p_k}(α_k v_k)
    k ← k + 1
until p_{k+1} sufficiently minimizes f
The next point becomes
p_{k+1} = R_{p_k}(α_k v_k) = cosh(||v_k||_ℒ α_k) p_k + sinh(||v_k||_ℒ α_k) v_k / ||v_k||_ℒ.
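Putting the pieces together, here is a hedged end-to-end sketch of the algorithm above on a toy problem: minimizing f(p) = −⟹p, x⟩_ℒ, which equals cosh(d(p, x)) on L^2 and is minimized exactly at p = x. The loss, step size, and iteration count are illustrative choices, not the presenter's experiment.

```python
import numpy as np

def minkowski_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def project_to_tangent(h, p):
    """Project h onto T_p L^2 = {v : <v, p>_L = 0}."""
    return h + minkowski_inner(h, p) * p

def retract(p, v, t=1.0):
    """gamma(t) = cosh(||v|| t) p + sinh(||v|| t) v / ||v||, ||v|| = <v, v>_L^{1/2}."""
    n = np.sqrt(minkowski_inner(v, v))
    return p if n < 1e-12 else np.cosh(n * t) * p + np.sinh(n * t) * v / n

x = np.array([np.cosh(0.8), np.sinh(0.8), 0.0])  # target point on L^2

def euclid_grad(p):
    """Ordinary partial derivatives of f(p) = -<p, x>_L = p0*x0 - p1*x1 - p2*x2."""
    return np.array([x[0], -x[1], -x[2]])

p = np.array([1.0, 0.0, 0.0])   # start at the "origin" of L^2
alpha = 0.3
for k in range(100):
    h = euclid_grad(p)
    h[0] = -h[0]                      # apply the inverse Minkowski metric
    rgrad = project_to_tangent(h, p)  # Riemannian gradient in T_p L^2
    p = retract(p, -rgrad, alpha)     # p_{k+1} = R_{p_k}(alpha * v_k), v_k = -rgrad
print(p)                               # converges toward x
print(x)
print(minkowski_inner(p, p))           # stays at ~-1 throughout
```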
Simple Optimization Task1
19
GD with gradients:      p_t = p_{t−1} − α ⋅ ∇_E L(p_{t−1})
GD with R-gradients:    p_t = p_{t−1} − α ⋅ ∇_R L(p_{t−1})
R-GD with R-gradients:  p_t = γ(α), where γ(0) = p_{t−1}, γ'(0) = ∇_R L(p_{t−1})

Recorded values for each method:
GD with gradients:      3.3024998, 4.7424998, 4.7859879, 4.8213577, 4.851644, 4.8784704, 4.9028177, 4.9253302
GD with R-gradients:    3.3024998, 3.3081245, 3.3175893, 3.3334663, 3.3599658, 3.403821, 3.4753809, 3.5894651
R-GD with R-gradients:  3.3024998, 3.3025002, 3.3025002, 3.3025002, 3.3025005, 3.3025, 3.3025002, 3.3025005
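To see why the three variants behave so differently, the following toy sketch (my own setup, so the numbers will not match the table above) tracks the Minkowski norm ⟹p, p⟩_ℒ after each kind of update; only the fully Riemannian update keeps it at −1, i.e. keeps the iterate on L^2.

```python
import numpy as np

def minkowski_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def retract(p, v, t):
    n = np.sqrt(minkowski_inner(v, v))
    return p if n < 1e-12 else np.cosh(n * t) * p + np.sinh(n * t) * v / n

x = np.array([np.cosh(1.0), np.sinh(1.0), 0.0])   # target on L^2

def grad_E(p):
    """Ordinary (Euclidean) gradient of the toy loss L(p) = -<p, x>_L."""
    return np.array([x[0], -x[1], -x[2]])

def grad_R(p):
    """Riemannian gradient: metric flip, then projection onto T_p L^2."""
    h = grad_E(p)
    h[0] = -h[0]
    return h + minkowski_inner(h, p) * p

alpha = 0.1
p0 = np.array([1.0, 0.0, 0.0])
p_gd, p_flat, p_rgd = p0.copy(), p0.copy(), p0.copy()
for _ in range(20):
    p_gd = p_gd - alpha * grad_E(p_gd)              # GD with gradients
    p_flat = p_flat - alpha * grad_R(p_flat)        # GD with R-gradients (no retraction)
    p_rgd = retract(p_rgd, -grad_R(p_rgd), alpha)   # R-GD with R-gradients
for q in (p_gd, p_flat, p_rgd):
    print(minkowski_inner(q, q))  # only the last stays at ~-1, i.e. on L^2
```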
Simple Optimization Task2
20
The "barycenter" can be found by minimizing
L(p) = ÎŁ_i d_{L^2}(p, x_i)^2
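A hedged sketch of this barycenter objective on the hyperboloid, using d_{L^2}(p, x) = arcosh(−⟹p, x⟩_ℒ) together with the gradient and retraction from the previous slides; the sample points and the step size are made up for illustration.

```python
import numpy as np

def minkowski_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def dist(p, q):
    """Geodesic distance on L^2: arccosh(-<p, q>_L)."""
    return np.arccosh(np.clip(-minkowski_inner(p, q), 1.0, None))

def retract(p, v, t):
    n = np.sqrt(minkowski_inner(v, v))
    return p if n < 1e-12 else np.cosh(n * t) * p + np.sinh(n * t) * v / n

def lift(y):
    """Embed a 2-D Euclidean point onto L^2 (a convenient parametrization)."""
    return np.array([np.sqrt(1.0 + y @ y), y[0], y[1]])

xs = [lift(np.array(a)) for a in ([0.5, 0.0], [-0.3, 0.4], [0.1, -0.6])]

def riemannian_grad_loss(p):
    """Riemannian gradient of L(p) = sum_i d(p, x_i)^2 on L^2."""
    g = np.zeros(3)
    for xi in xs:
        s = -minkowski_inner(p, xi)             # = cosh(d)
        d = np.arccosh(np.clip(s, 1.0, None))
        if d < 1e-9:
            continue
        e = np.array([xi[0], -xi[1], -xi[2]])   # ordinary partials of -<p, xi>_L
        e[0] = -e[0]                            # metric flip
        h = e + minkowski_inner(e, p) * p       # project onto T_p L^2
        g += 2.0 * d / np.sqrt(s * s - 1.0) * h # chain rule through arccosh
    return g

p = lift(np.array([0.0, 0.0]))
for _ in range(200):
    p = retract(p, -riemannian_grad_loss(p), 0.05)
print(p)                                 # approximate hyperbolic barycenter of xs
print(sum(dist(p, xi) ** 2 for xi in xs))
```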
Simple Optimization Task2
21
Simple Optimization Task2
22
The "barycenter" can be found by minimizing
L(p) = ÎŁ_i d_{L^2}(p, x_i)^2
Takeaways
23
Hyperbolic space is promising for representing symbolic and hierarchical datasets.

Geometry determines the path toward the optimal points.
Regardless of the optimization technique, the optimal point depends only on the loss function.
Interpretation: can the path entail semantics?

A loss function over hyperbolic space should be chosen carefully.
Is it suitable for the given geometry? Is it differentiable? Which operations are available?
Unfortunately, we lose simple arithmetic.
Editor's Notes
  • #2: Good evening. I am Segwang Kim from the Machine Intelligence Lab. My topic is hierarchical representation with hyperbolic geometry. This is the topic I am currently working on, but I have not obtained anything meaningful yet. I find it intriguing in that it suggests alternative ways to represent symbolic and hierarchical datasets, which in turn helps with downstream tasks in natural language processing and social network analysis.
  • #3: This is an overview. The main goal of this talk is to familiarize you with hyperbolic representations. First, I will introduce the data of interest to be represented and the conventional way to embed those datasets. Second, I will go over the shortcomings of conventional embeddings and introduce the gist of hyperbolic space. Third, I will show an optimization technique over hyperbolic space. Finally, toy experiments follow. Recent papers are cited throughout this presentation.
  • #5: The datasets I am dealing with, such as WordNet or social networks, are symbolic and hierarchical. They are symbolic because words or users have no meaningful numeric values; they are just symbols. On top of that, they are hierarchical since there exist partial orderings between data points: dogs belong to mammals, and mammals belong to animals. Or, when a Twitter user follows another, we have an ordering between them. Typical machine learning problems on those datasets are link prediction, node classification, community detection, and visualization. To be specific, someone might ask: are "sprinkler" and "birdcage" linked? Or, what community does a particular user belong to?
  • #6: To tackle those problems, we need to parametrize symbolic and hierarchical datasets into numeric form. We call this process embedding. Once data points are embedded into some space, we can apply a machine learning model that works on that space. Even if symbolic data points are represented in numerical form, it is natural to expect that the embedding should agree with our intuition. For instance, two words with similar meanings should be represented as two points that are close to each other. This two-dimensional figure seems to capture semantic relations. Like this, we expect certain properties from a good embedding. Traditionally, we have embedded symbolic data into the most familiar space, Euclidean space.
  • #8: However, there are some limitations to Euclidean embedding. To illustrate, assume that we want to solve a machine learning problem on this bushy-structured dataset. An edge between two nodes means they have something in common. Therefore, we want to find an embedding that preserves the distances among nodes measured in the graph. Unfortunately, the second you embed the data points into a two-dimensional Euclidean space, you realize that huge distortions have been made. While the graph distance between nodes a and b is 2, the Euclidean distance between the corresponding points is far less than 2. To remedy this problem, researchers have increased the dimensionality of the Euclidean space. However, by doing that, we lose the opportunity to analyze the data in low dimensions. On top of that, trying to embed trees into Euclidean space is wrong from the beginning. More formally, there is a theorem that trees cannot be embedded into Euclidean space with arbitrarily low distortion. So, the main question is: what if we have a space that preserves the graph structure well, like this one? What is this mysterious space? Now, it is time to introduce hyperbolic space.
  • #9: Time for a series of math slides. The best analogy I can use for introducing hyperbolic space is Euclidean space. We can define the geometry of a given space, or manifold, by looking into its domain and the inner product structure on its tangent spaces. Before elaborating on why the inner product structure matters, let's formally define hyperbolic space. Hyperbolic space is a manifold with constant sectional curvature −1, and five different models are used to describe it. They are really the same because there exist isometries among them. Anyhow, I pick one of them: the Poincaré disk model. The domain of the n-dimensional Poincaré disk model is the open n-dimensional unit ball. The inner product on a tangent space is defined like this. Unlike Euclidean space, which has the same inner product rule for all tangent spaces, hyperbolic space has a different inner product structure depending on the point at which the given tangent space is attached. In mathematical terms, this is called a Riemannian metric. To compare the two spaces, let's do an inner product. First you attach a tangent plane to a given point p in Euclidean or hyperbolic space, and then you pick two arbitrary tangent vectors from the tangent plane. In the Euclidean case, you take the component-wise product and sum. Note that the point p has nothing to do with computing the inner product. However, in the hyperbolic case, the highlighted term is multiplied after the usual inner product, and it depends on the point p. Because of this term, strange things happen.
  • #10: As I said, the inner product on the tangent spaces governs the geometry of the space, because it defines the length, angle, and "line" of the given space. From calculus, we know that the length of a given path is defined as the line integral of the norm of the instantaneous velocity, which is a tangent vector. Since the norm is defined once an inner product is given, the Riemannian metric comes into play. Also, the angle between two tangent vectors is governed by the inner product structure, because inner products need to be computed. Finally, if we keep in mind that a line is defined not as a straight path but as the shortest path connecting the start and end points, the shape of a line in hyperbolic space must be different. The shortest path is the optimal solution of this functional equation, which seems almost impossible to solve. But mathematicians conclude that a line in hyperbolic space is either an ordinary arc that intersects the boundary of the n-dimensional ball perpendicularly, or a straight segment through the center. Considering that the norm of a tangent vector increases as the base point goes toward the boundary, the shortest path is inclined to pass through the region around the center rather than near the boundary. So it must be tilted toward the center.
  • #11: One interesting fact about hyperbolic space is that we can choose one model among the five depending on the situation. Fundamentally, they are all the same because of the existence of isometries. The paper cited on this slide suggests that the Poincaré ball model is more adequate for visualization than the Lorentz model, defined like this. This is because the Lorentz model is defined in an ambient space with constraints. But the Lorentz model guarantees better computational stability of the gradients than the Poincaré ball model. In the following optimization section, I will explain the optimization technique on the Lorentz model, not the Poincaré model.
  • #13: This is one example of a loss function over hyperbolic space. As you can see, this loss function has hyperbolic distance terms. Details are omitted, but basically, it disperses irrelevant data points and aggregates relevant ones. Because the gradient of the loss tells in which direction the data points should move, we need to know how to compute the derivative of a given loss function.
  • #14: This is the Riemannian gradient descent algorithm. There are only two parts you need to focus on: first, choosing a descent direction; second, choosing a retraction.
  • #15: Choosing a descent direction needs a little more effort than the usual gradient. Let's assume that we want to minimize a loss function over the two-dimensional Lorentz model. Basically, we want to find the gradient of f.
  • #16: It takes two steps. Basically, we need to map naïve gradients to a tangent vector. First, once we get a gradient from tensorflow or any other API, as shown in the blue box, this value is unique no matter which metric tensor you have chosen. If we interpret the gradient as a linear map from the tangent space to the real numbers, the Riesz representation theorem implies that there is a corresponding vector such that the inner product with that vector is the gradient map. To find the vector, the inverse of the metric tensor needs to be multiplied with the usual derivatives, in order to compensate for the extra terms in the hyperbolic inner product. It sounds complicated, but the bottom line is: just flip the sign of the first element of the usual gradient. The second step is projection. Because the Lorentz model is defined in an ambient space, we need to project the vector resulting from the first step onto the tangent plane of the model. It only takes some multiplications and additions. Finally, we get the Riemannian descent direction by flipping the signs of all components of the hyperbolic gradient of the loss.
  • #17: A retraction tells how a point can be moved in a given direction. When the point is moved to the tip of the direction vector, it escapes the manifold. This is sad.
  • #18: However, if the point is moved to the tip of the geodesic, it stays on the manifold and we are happy. The geodesic is the hyperbolic version of a line, and this simple formula is all you need.
  • #19: The last step is trivial. We just need to iterate the previous steps until we get sufficiently small errors.