Hierarchical representation with
hyperbolic geometry
2016-20873 Segwang Kim 1
Overview
2
① Embedding Symbolic and Hierarchical Data
② Introduction to Hyperbolic Space
③ Optimization over Hyperbolic Space
④ Toy Experiments
3
Embedding Symbolic and Hierarchical Data
Symbolic and Hierarchical Data
4
Symbolic data with implicit hierarchy.
Downstream tasks
link prediction, node classification, community detection, visualization
[Figure: two examples, WordNet (link prediction, "?LINK") and a Twitter social graph (community detection).]
Good Hierarchical Embedding
5
For downstream tasks, symbolic and hierarchical data needs to be embedded into a space.
Good Embedding?
Embeddings of similar symbols should aggregate in some sense.
Symbolic arithmetic exists: v(King) − v(man) + v(woman) = v(Queen)
Hierarchy can be restored from embedded data.
The space should have low dimension.
6
Introduction to Hyperbolic Space
Limitation of Euclidean Embedding
7
Embed graph structure while preserving distances
Thm) Trees cannot be embedded into Euclidean space with
arbitrarily low distortion for any number of dimensions
           Graph   Euclidean   ??
D(a,b)     2       0.1         1.889
D(a,c)     2       1           1.902
D(a,d)     2       1.8         1.962
[Figure: a small graph on nodes a, b, c, d and its embeddings into the Euclidean plane and into the mystery space "??".]
Representation Tradeoffs for Hyperbolic Embeddings (ICML 2018)
Euclidean Space vs Hyperbolic space
8
𝑀 = đ· 𝑛 = {đ‘„ ∈ ℝ 𝑛 ∶ đ‘„1
2
+ ⋯ + đ‘„ 𝑛
2 < 1}
(đ· 𝑛
,
2
1−||đ‘„||2
2
𝑔)𝑔 = đ‘‘đ‘„1 2
+ ⋯ + đ‘‘đ‘„ 𝑛 2
Euclidean Hyperbolic
(ℝ 𝑛, 𝑔)
𝑀 = ℝ 𝑛
Metric tensor : inner product on tangent space
= đ‘‘đ‘„1 𝑱 đ‘‘đ‘„1 𝑣 + ⋯ + đ‘‘đ‘„ 𝑛 𝑱 đ‘‘đ‘„ 𝑛(𝑣)
= 𝑱1 𝑣1 + ⋯ + 𝑱 𝑛 𝑣 𝑛
∀ 𝑱, 𝑣 ∈ 𝑇𝑝ℝ 𝑛
where 𝑝 ∈ ℝ 𝑛
𝑱, 𝑣 𝑝 = 𝑱 𝑡 𝑔𝑣
=
2
1 − ||𝑝||2
2
(𝑱1 𝑣1 + ⋯ + 𝑱 𝑛 𝑣 𝑛)
∀ 𝑱, 𝑣 ∈ 𝑇𝑝 đ· 𝑛
where 𝑝 ∈ đ· 𝑛
𝑱, 𝑣 𝑝 = 𝑱 𝑡
(
2
1 − ||𝑝||2
2
𝑔)𝑣
Give Riemannian Metric
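To make the difference between the two metric tensors concrete, here is a minimal numpy sketch (my own illustration, not part of the slides) that evaluates both inner products at a base point p; the only difference is the conformal factor (2 / (1 − ||p||^2))^2, which depends on p.

```python
import numpy as np

def euclidean_inner(u, v, p=None):
    """Euclidean inner product; the base point p plays no role."""
    return float(np.dot(u, v))

def poincare_inner(u, v, p):
    """Inner product of tangent vectors u, v at p in the Poincare ball:
    <u, v>_p = (2 / (1 - ||p||^2))^2 * <u, v>_Euclidean."""
    lam = 2.0 / (1.0 - np.dot(p, p))  # conformal factor, blows up as ||p|| -> 1
    return float(lam ** 2 * np.dot(u, v))

u = np.array([1.0, 0.0])
v = np.array([0.5, 0.5])
print(euclidean_inner(u, v))                       # 0.5, independent of the base point
print(poincare_inner(u, v, np.array([0.0, 0.0])))  # 2.0 at the origin (factor 4)
print(poincare_inner(u, v, np.array([0.9, 0.0])))  # much larger near the boundary
```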
Euclidean Space vs Hyperbolic space
9
The inner product ⟹ · , · ⟩_p on T_p D^n defines:

Length of a curve γ : [0,1] → D^n:
L(γ) = ∫_0^1 ⟹γ'(t), γ'(t)⟩_{γ(t)}^{1/2} dt

Angle between w_1, w_2 ∈ T_p D^n:
⟹w_1, w_2⟩_p / (⟹w_1, w_1⟩_p · ⟹w_2, w_2⟩_p)^{1/2}

A "line" between p, q ∈ M is the shortest path between them:
γ* = argmin ∫_0^1 ⟹γ'(t), γ'(t)⟩_{γ(t)}^{1/2} dt  subject to  γ(0) = p, γ(1) = q

[Figure: the shortest path between p and q, Euclidean vs. hyperbolic.]

Note that (2 / (1 − ||x||^2))^2 g → ∞ as ||x|| → 1.
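On the Poincaré ball the geodesic length above has a well-known closed form, d(p, q) = arcosh(1 + 2||p − q||^2 / ((1 − ||p||^2)(1 − ||q||^2))). The following sketch (my addition, not on the slide) shows how quickly this distance grows as a point approaches the boundary.

```python
import numpy as np

def poincare_distance(p, q):
    """Geodesic distance between p and q in the Poincare ball (||p||, ||q|| < 1)."""
    sq = np.dot(p - q, p - q)
    denom = (1.0 - np.dot(p, p)) * (1.0 - np.dot(q, q))
    return float(np.arccosh(1.0 + 2.0 * sq / denom))

origin = np.array([0.0, 0.0])
print(poincare_distance(origin, np.array([0.5, 0.0])))   # ~1.10
print(poincare_distance(origin, np.array([0.99, 0.0])))  # ~5.29, explodes near the boundary
```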
Equivalent Hyperbolic Models
10
We can choose one of the hyperbolic models depending on the purpose; the models are ISOMETRIC to one another.

Poincaré model (for visualization):
D^n = {x ∈ ℝ^n : x_1^2 + ⋯ + x_n^2 < 1}
(D^n, (2 / (1 − ||x||^2))^2 (dx_1^2 + ⋯ + dx_n^2))

Lorentz model (for optimization):
(ℒ^n, −dx_0^2 + dx_1^2 + ⋯ + dx_n^2)

The isometry maps (x_0, 
 , x_n) ∈ ℒ^n to (x_1 / (1 + x_0), 
 , x_n / (1 + x_0)) ∈ D^n.
Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry (ICML 2018)
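To make the isometry concrete, here is a small numpy sketch (my own, not the slide's) of the map from the Lorentz model to the Poincaré ball shown above, together with its inverse; the inverse formula is standard but not written on the slide.

```python
import numpy as np

def lorentz_to_poincare(x):
    """Map (x0, x1, ..., xn) on the hyperboloid to the Poincare ball:
    (x1, ..., xn) / (1 + x0)."""
    return x[1:] / (1.0 + x[0])

def poincare_to_lorentz(y):
    """Inverse map: y in the open unit ball -> point on the hyperboloid."""
    sq = np.dot(y, y)
    x0 = (1.0 + sq) / (1.0 - sq)
    rest = 2.0 * y / (1.0 - sq)
    return np.concatenate(([x0], rest))

y = np.array([0.3, -0.2])
x = poincare_to_lorentz(y)
print(x)                               # a point of the Lorentz model
print(-x[0] ** 2 + np.sum(x[1:] ** 2)) # Minkowski norm: -1 (up to rounding)
print(lorentz_to_poincare(x))          # recovers [0.3, -0.2]
```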
11
Optimization Techniques
Suggested loss function
12
An example of a loss function over hyperbolic space.
[Equation shown on the slide: the loss from the paper cited below, built from hyperbolic distance terms.]
Fundamentally, the gradient of the loss tells in which direction the points should move.
Poincaré Embeddings for Learning Hierarchical Representations (ICML 2017)
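The slide shows the loss from the cited Poincaré-embeddings paper only as an image, so the following is a hedged sketch of the general shape of that kind of loss: a softmax over negative hyperbolic distances that pulls linked symbols together and pushes sampled negatives apart. The function names and the tiny negative-sampling setup are illustrative, not the paper's exact implementation.

```python
import numpy as np

def poincare_distance(u, v):
    sq = np.dot(u - v, u - v)
    denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return np.arccosh(1.0 + 2.0 * sq / denom)

def pair_loss(u, v, negatives):
    """Softmax-style loss for one observed link (u, v) against sampled negatives:
    -log( exp(-d(u, v)) / sum_{v'} exp(-d(u, v')) ), v' ranging over {v} + negatives."""
    d_pos = poincare_distance(u, v)
    d_all = np.array([d_pos] + [poincare_distance(u, n) for n in negatives])
    return d_pos + np.log(np.sum(np.exp(-d_all)))

u = np.array([0.1, 0.2])
v = np.array([0.15, 0.25])
negatives = [np.array([-0.5, 0.4]), np.array([0.6, -0.3])]
print(pair_loss(u, v, negatives))  # small when u is much closer to v than to the negatives
```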
Gradient Descent Algorithm
13
Input: f : L^2 → ℝ, p_0 ∈ L^2, k = 0
repeat
    choose a descent direction v_k ∈ T_{p_k} L^2
    choose a retraction R_{p_k} : T_{p_k} L^2 → L^2
    choose a step length α_k ∈ ℝ
    set p_{k+1} = R_{p_k}(α_k v_k)
    k ← k + 1
until p_{k+1} sufficiently minimizes f
Nothing differs from the usual gradient descent except for the gradient direction and the retraction.
Optimization methods on Riemannian manifolds and their application to shape space (SIAM 2012)
Gradient Descent Algorithm
14
Input: f : L^2 → ℝ, p_0 ∈ L^2, k = 0
repeat
    choose a descent direction v_k ∈ T_{p_k} L^2
    choose a retraction R_{p_k} : T_{p_k} L^2 → L^2
    choose a step length α_k ∈ ℝ
    set p_{k+1} = R_{p_k}(α_k v_k)
    k ← k + 1
until p_{k+1} sufficiently minimizes f
What is the gradient on Hyperbolic space?
f : (ℒ^2, −dx_0^2 + dx_1^2 + dx_2^2) → ℝ
∇f ?
Hyperboloid model
15
First, find đ›»â„2:1 𝑓| 𝑝 ∈ ℝ3
𝑠. 𝑡. đ›»â„2:1 𝑓| 𝑝, 𝑣
ℒ
= 𝑑𝑓 𝑣 | 𝑝.
Second, project đ›»â„2:1 𝑓| 𝑝 into 𝑇𝑝 𝐿2.
đ›»đż2 𝑓| 𝑝 = đ›»â„2:1 𝑓| 𝑝 + đ›»â„2:1 𝑓| 𝑝, 𝑝
ℒ
𝑝
𝑇𝑝 𝐿2
= {𝑣 ∈ ℝ3
∶ 𝑣, 𝑝 ℒ = 0}.
𝐿2 = {𝑝 ∈ ℝ3: 𝑝, 𝑝 ℒ = −1, 𝑝 𝑧 > 0}.
𝑓 ∶ (ℒ2, âˆ’đ‘‘đ‘„0 2 + đ‘‘đ‘„1 2 + đ‘‘đ‘„2 2) → ℝ
đ›»â„2:1 𝑓| 𝑝 = (âˆ’đ‘‘đ‘„0 2 + đ‘‘đ‘„1 2 + đ‘‘đ‘„ 𝑛 2)−1 ⋅ Usual derivative
(from tensorflow)
−𝑣 𝑘
Gradient descent in hyperbolic space (Arxiv 2018)
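A minimal numpy sketch of these two steps (my own code, not the presenter's tensorflow version): flip the sign of the time-like component of the ordinary gradient, then project onto the tangent space at p using the Minkowski inner product. The toy loss f(p) = ⟹p, x⟩_ℒ is only there to have something to differentiate.

```python
import numpy as np

def minkowski_inner(u, v):
    """<u, v>_L = -u0*v0 + u1*v1 + u2*v2."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def riemannian_gradient(euclid_grad, p):
    """Turn the ordinary gradient of f at p into the Riemannian gradient on L^2."""
    h = euclid_grad.copy()
    h[0] = -h[0]                          # step 1: inverse Minkowski metric (sign flip)
    return h + minkowski_inner(h, p) * p  # step 2: project onto T_p L^2

# Toy loss f(p) = <p, x>_L for a fixed x on L^2 (just something to differentiate).
x = np.array([np.cosh(1.0), np.sinh(1.0), 0.0])
p = np.array([1.0, 0.0, 0.0])            # the "origin" of L^2
eg = np.array([-x[0], x[1], x[2]])       # ordinary partials of -p0*x0 + p1*x1 + p2*x2
rgrad = riemannian_gradient(eg, p)
print(rgrad)
print(minkowski_inner(rgrad, p))         # ~0: the result lies in the tangent space at p
```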
Gradient Descent Algorithm
16
Input: f : L^2 → ℝ, p_0 ∈ L^2, k = 0
repeat
    choose a descent direction v_k ∈ T_{p_k} L^2
    choose a retraction R_{p_k} : T_{p_k} L^2 → L^2
    choose a step length α_k ∈ ℝ
    set p_{k+1} = R_{p_k}(α_k v_k)
    k ← k + 1
until p_{k+1} sufficiently minimizes f
What is the retraction on Hyperbolic space?
Hyperboloid model
17
A retraction tells how the end point of a tangent vector corresponds to a point on the manifold.
[Figure: stepping straight along the tangent vector gives q' ∉ L^2, while its retraction R(q') ∈ L^2.]

We choose the affine geodesic as the retraction: at p ∈ L^2 with direction v ∈ T_p L^2,
γ(t) = cosh(||v||_ℒ t) p + sinh(||v||_ℒ t) v / ||v||_ℒ.
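A sketch of this retraction in numpy (my own code); note that ||v||_ℒ is real because tangent vectors at points of L^2 have non-negative Minkowski norm.

```python
import numpy as np

def minkowski_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def retract(p, v, t=1.0):
    """Move from p in L^2 along the geodesic with initial velocity v in T_p L^2:
    gamma(t) = cosh(||v|| t) p + sinh(||v|| t) v / ||v||, with ||v|| = <v, v>_L^{1/2}."""
    norm_v = np.sqrt(minkowski_inner(v, v))
    if norm_v < 1e-12:
        return p
    return np.cosh(norm_v * t) * p + np.sinh(norm_v * t) * v / norm_v

p = np.array([1.0, 0.0, 0.0])    # a point on L^2: <p, p>_L = -1
v = np.array([0.0, 0.3, -0.4])   # tangent at p: <v, p>_L = 0
q = retract(p, v)
print(q)
print(minkowski_inner(q, q))     # ~-1: the new point stays on the hyperboloid
```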
Gradient Descent Algorithm
18
Input: f : L^2 → ℝ, p_0 ∈ L^2, k = 0
repeat
    choose a descent direction v_k ∈ T_{p_k} L^2
    choose a retraction R_{p_k} : T_{p_k} L^2 → L^2
    choose a step length α_k ∈ ℝ
    set p_{k+1} = R_{p_k}(α_k v_k)
    k ← k + 1
until p_{k+1} sufficiently minimizes f
The next point becomes
p_{k+1} = R_{p_k}(α_k v_k) = cosh(||v_k||_ℒ α_k) p_k + sinh(||v_k||_ℒ α_k) v_k / ||v_k||_ℒ.
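Putting the pieces together, here is a hedged end-to-end sketch of the algorithm above on a toy problem: minimizing f(p) = −⟹p, x⟩_ℒ, which equals cosh(d(p, x)) on L^2 and is minimized exactly at p = x. The loss, step size, and iteration count are illustrative choices, not the presenter's experiment.

```python
import numpy as np

def minkowski_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def project_to_tangent(h, p):
    """Project h onto T_p L^2 = {v : <v, p>_L = 0}."""
    return h + minkowski_inner(h, p) * p

def retract(p, v, t=1.0):
    """gamma(t) = cosh(||v|| t) p + sinh(||v|| t) v / ||v||, ||v|| = <v, v>_L^{1/2}."""
    n = np.sqrt(minkowski_inner(v, v))
    return p if n < 1e-12 else np.cosh(n * t) * p + np.sinh(n * t) * v / n

x = np.array([np.cosh(0.8), np.sinh(0.8), 0.0])  # target point on L^2

def euclid_grad(p):
    """Ordinary partial derivatives of f(p) = -<p, x>_L = p0*x0 - p1*x1 - p2*x2."""
    return np.array([x[0], -x[1], -x[2]])

p = np.array([1.0, 0.0, 0.0])   # start at the "origin" of L^2
alpha = 0.3
for k in range(100):
    h = euclid_grad(p)
    h[0] = -h[0]                      # apply the inverse Minkowski metric
    rgrad = project_to_tangent(h, p)  # Riemannian gradient in T_p L^2
    p = retract(p, -rgrad, alpha)     # p_{k+1} = R_{p_k}(alpha * v_k), v_k = -rgrad
print(p)                               # converges toward x
print(x)
print(minkowski_inner(p, p))           # stays at ~-1 throughout
```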
Simple Optimization Task1
19
GD with gradients:      p_t = p_{t−1} − α ⋅ ∇_E L(p_{t−1})
GD with R-gradients:    p_t = p_{t−1} − α ⋅ ∇_R L(p_{t−1})
R-GD with R-gradients:  p_t = γ(α), where γ(0) = p_{t−1}, γ'(0) = ∇_R L(p_{t−1})

Recorded values for each method:
GD with gradients:      3.3024998, 4.7424998, 4.7859879, 4.8213577, 4.851644, 4.8784704, 4.9028177, 4.9253302
GD with R-gradients:    3.3024998, 3.3081245, 3.3175893, 3.3334663, 3.3599658, 3.403821, 3.4753809, 3.5894651
R-GD with R-gradients:  3.3024998, 3.3025002, 3.3025002, 3.3025002, 3.3025005, 3.3025, 3.3025002, 3.3025005
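To see why the three variants behave so differently, the following toy sketch (my own setup, so the numbers will not match the table above) tracks the Minkowski norm ⟹p, p⟩_ℒ after each kind of update; only the fully Riemannian update keeps it at −1, i.e. keeps the iterate on L^2.

```python
import numpy as np

def minkowski_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def retract(p, v, t):
    n = np.sqrt(minkowski_inner(v, v))
    return p if n < 1e-12 else np.cosh(n * t) * p + np.sinh(n * t) * v / n

x = np.array([np.cosh(1.0), np.sinh(1.0), 0.0])   # target on L^2

def grad_E(p):
    """Ordinary (Euclidean) gradient of the toy loss L(p) = -<p, x>_L."""
    return np.array([x[0], -x[1], -x[2]])

def grad_R(p):
    """Riemannian gradient: metric flip, then projection onto T_p L^2."""
    h = grad_E(p)
    h[0] = -h[0]
    return h + minkowski_inner(h, p) * p

alpha = 0.1
p0 = np.array([1.0, 0.0, 0.0])
p_gd, p_flat, p_rgd = p0.copy(), p0.copy(), p0.copy()
for _ in range(20):
    p_gd = p_gd - alpha * grad_E(p_gd)              # GD with gradients
    p_flat = p_flat - alpha * grad_R(p_flat)        # GD with R-gradients (no retraction)
    p_rgd = retract(p_rgd, -grad_R(p_rgd), alpha)   # R-GD with R-gradients
for q in (p_gd, p_flat, p_rgd):
    print(minkowski_inner(q, q))  # only the last stays at ~-1, i.e. on L^2
```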
Simple Optimization Task2
20
The "barycenter" can be found by minimizing
L(p) = ÎŁ_i d_{L^2}(p, x_i)^2
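A hedged sketch of this barycenter objective on the hyperboloid, using d_{L^2}(p, x) = arcosh(−⟹p, x⟩_ℒ) together with the gradient and retraction from the previous slides; the sample points and the step size are made up for illustration.

```python
import numpy as np

def minkowski_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def dist(p, q):
    """Geodesic distance on L^2: arccosh(-<p, q>_L)."""
    return np.arccosh(np.clip(-minkowski_inner(p, q), 1.0, None))

def retract(p, v, t):
    n = np.sqrt(minkowski_inner(v, v))
    return p if n < 1e-12 else np.cosh(n * t) * p + np.sinh(n * t) * v / n

def lift(y):
    """Embed a 2-D Euclidean point onto L^2 (a convenient parametrization)."""
    return np.array([np.sqrt(1.0 + y @ y), y[0], y[1]])

xs = [lift(np.array(a)) for a in ([0.5, 0.0], [-0.3, 0.4], [0.1, -0.6])]

def riemannian_grad_loss(p):
    """Riemannian gradient of L(p) = sum_i d(p, x_i)^2 on L^2."""
    g = np.zeros(3)
    for xi in xs:
        s = -minkowski_inner(p, xi)             # = cosh(d)
        d = np.arccosh(np.clip(s, 1.0, None))
        if d < 1e-9:
            continue
        e = np.array([xi[0], -xi[1], -xi[2]])   # ordinary partials of -<p, xi>_L
        e[0] = -e[0]                            # metric flip
        h = e + minkowski_inner(e, p) * p       # project onto T_p L^2
        g += 2.0 * d / np.sqrt(s * s - 1.0) * h # chain rule through arccosh
    return g

p = lift(np.array([0.0, 0.0]))
for _ in range(200):
    p = retract(p, -riemannian_grad_loss(p), 0.05)
print(p)                                 # approximate hyperbolic barycenter of xs
print(sum(dist(p, xi) ** 2 for xi in xs))
```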
Simple Optimization Task2
21
Simple Optimization Task2
22
The "barycenter" can be found by minimizing
L(p) = ÎŁ_i d_{L^2}(p, x_i)^2
Takeaways
23
Hyperbolic space is promising for representing symbolic and hierarchical datasets.

Geometry determines the path toward the optimal points.
Regardless of the optimization technique, the optimal point depends only on the loss function.
Interpretation: can the path entail semantics?

A loss function over hyperbolic space should be chosen carefully.
Is it suitable for the given geometry? Is it differentiable? Which operations are available?
Unfortunately, we lose simple arithmetic.
Editor's Notes
  • #2: Good evening. I am Segwang Kim from the Machine Intelligence Lab. My topic is hierarchical representation with hyperbolic geometry. This is the topic I am currently working on, but I have not obtained anything meaningful yet. I find it intriguing in that it suggests alternative ways to represent symbolic and hierarchical datasets, which in turn helps with downstream tasks in natural language processing and social network analysis.
  • #3: This is an overview. The main goal of this talk is to familiarize you with hyperbolic representations. First, I will introduce the data of interest to be represented and the conventional way to embed those datasets. Second, I will go over the shortcomings of conventional embeddings and introduce the gist of hyperbolic space. Third, I will show an optimization technique over hyperbolic space. Finally, toy experiments follow. Recent papers are cited throughout this presentation.
  • #5: The datasets I am dealing with, such as WordNet or social networks, are symbolic and hierarchical. They are symbolic because words or users have no meaningful numeric values; they are just symbols. On top of that, they are hierarchical since there exist partial orderings between data points: dogs belong to mammals, and mammals belong to animals. Or, when a Twitter user follows another, we have an ordering between them. Typical machine learning problems on those datasets are link prediction, node classification, community detection, and visualization. To be specific, someone might ask: are "sprinkler" and "birdcage" linked? Or, what community does a particular user belong to?
  • #6: To tackle those problems, we need to parametrize symbolic and hierarchical datasets into numeric form. We call this process embedding. Once data points are embedded into some space, we can apply a machine learning model that works on that space. Even if symbolic data points are represented in numerical form, it is natural to expect that the embedding should agree with our intuition. For instance, two words with similar meanings should be represented as two points that are close to each other. This two-dimensional figure seems to capture semantic relations. Like this, we expect certain properties from a good embedding. Traditionally, we have embedded symbolic data into the most familiar space, Euclidean space.
  • #8: However, there are some limitations to Euclidean embedding. To illustrate, assume that we want to solve a machine learning problem on this bushy-structured dataset. An edge between two nodes means they have something in common. Therefore, we want to find an embedding that preserves the distances among nodes measured in the graph. Unfortunately, the second you embed the data points into a two-dimensional Euclidean space, you realize that huge distortions have been made. While the graph distance between nodes a and b is 2, the Euclidean distance between the corresponding points is far less than 2. To remedy this problem, researchers have increased the dimensionality of the Euclidean space. However, by doing that, we lose the opportunity to analyze the data in low dimensions. On top of that, trying to embed trees into Euclidean space is wrong from the beginning. More formally, there is a theorem that trees cannot be embedded into Euclidean space with arbitrarily low distortion. So, the main question is: what if we have a space that preserves the graph structure well, like this one? What is this mysterious space? Now, it is time to introduce hyperbolic space.
  • #9: Time for a series of math slides. The best analogy I can use for introducing hyperbolic space is Euclidean space. We can define the geometry of a given space, or manifold, by looking into its domain and the inner product structure on its tangent spaces. Before elaborating on why the inner product structure matters, let's formally define hyperbolic space. Hyperbolic space is a manifold with constant sectional curvature −1, and five different models are used to describe it. They are really the same because there exist isometries among them. Anyhow, I pick one of them: the Poincaré disk model. The domain of the n-dimensional Poincaré disk model is the open n-dimensional unit ball. The inner product on a tangent space is defined like this. Unlike Euclidean space, which has the same inner product rule for all tangent spaces, hyperbolic space has a different inner product structure depending on the point at which the given tangent space is attached. In mathematical terms, this is called a Riemannian metric. To compare the two spaces, let's do an inner product. First you attach a tangent plane to a given point p in Euclidean or hyperbolic space, and then you pick two arbitrary tangent vectors from the tangent plane. In the Euclidean case, you take the component-wise product and sum. Note that the point p has nothing to do with computing the inner product. However, in the hyperbolic case, the highlighted term is multiplied after the usual inner product, and it depends on the point p. Because of this term, strange things happen.
  • #10: As I said, the inner product on the tangent spaces governs the geometry of the space, because it defines the length, angle, and "line" of the given space. From calculus, we know that the length of a given path is defined as the line integral of the norm of the instantaneous velocity, which is a tangent vector. Since the norm is defined once an inner product is given, the Riemannian metric comes into play. Also, the angle between two tangent vectors is governed by the inner product structure, because inner products need to be computed. Finally, if we keep in mind that a line is defined not as a straight path but as the shortest path connecting the start and end points, the shape of a line in hyperbolic space must be different. The shortest path is the optimal solution of this functional equation, which seems almost impossible to solve. But mathematicians conclude that a line in hyperbolic space is either an ordinary arc that intersects the boundary of the n-dimensional ball perpendicularly, or a straight segment through the center. Considering that the norm of a tangent vector increases as the base point goes toward the boundary, the shortest path is inclined to pass through the region around the center rather than near the boundary. So it must be tilted toward the center.
  • #11: One interesting fact about hyperbolic space is that we can choose one model among the five depending on the situation. Fundamentally, they are all the same because of the existence of isometries. The paper cited on this slide suggests that the Poincaré ball model is more adequate for visualization than the Lorentz model, defined like this. This is because the Lorentz model is defined in an ambient space with constraints. But the Lorentz model guarantees better computational stability of the gradients than the Poincaré ball model. In the following optimization section, I will explain the optimization technique on the Lorentz model, not the Poincaré model.
  • #13: This is one example of a loss function over hyperbolic space. As you can see, this loss function has hyperbolic distance terms. Details are omitted, but basically, it disperses irrelevant data points and aggregates relevant ones. Because the gradient of the loss tells in which direction the data points should move, we need to know how to compute the derivative of a given loss function.
  • #14: This is the Riemannian gradient descent algorithm. There are only two parts you need to focus on: first, choosing a descent direction; second, choosing a retraction.
  • #15: Choosing a descent direction needs a little more effort than the usual gradient. Let's assume that we want to minimize a loss function over the two-dimensional Lorentz model. Basically, we want to find the gradient of f.
  • #16: It takes two steps. Basically, we need to map naïve gradients to a tangent vector. First, once we get a gradient from tensorflow or any other API, as shown in the blue box, this value is unique no matter which metric tensor you have chosen. If we interpret the gradient as a linear map from the tangent space to the real numbers, the Riesz representation theorem implies that there is a corresponding vector such that the inner product with that vector is the gradient map. To find the vector, the inverse of the metric tensor needs to be multiplied with the usual derivatives, in order to compensate for the extra terms in the hyperbolic inner product. It sounds complicated, but the bottom line is: just flip the sign of the first element of the usual gradient. The second step is projection. Because the Lorentz model is defined in an ambient space, we need to project the vector resulting from the first step onto the tangent plane of the model. It only takes some multiplications and additions. Finally, we get the Riemannian descent direction by flipping the signs of all components of the hyperbolic gradient of the loss.
  • #17: A retraction tells how a point can be moved in a given direction. When the point is moved to the tip of the direction vector, it escapes the manifold. This is sad.
  • #18: However, if the point is moved to the tip of the geodesic, it stays on the manifold and we are happy. The geodesic is the hyperbolic version of a line, and this simple formula is all you need.
  • #19: The last step is trivial. We just need to iterate the previous steps until we get sufficiently small errors.