A Wrapped Normal Distribution on Hyperbolic Space
for Gradient Based Learning
ICML’19, Jun 12th, 2019
Yoshihiro Nagano1), Shoichiro Yamaguchi2), Yasuhiro Fujita2), Masanori Koyama2)
1) Department of Complexity Science, The University of Tokyo, Japan
2) Preferred Networks, Inc., Japan
Paper: proceedings.mlr.press/v97/nagano19a.html
Code: github.com/pfnet-research/hyperbolic_wrapped_distribution
ICLR/ICML2019 Reading Group, Jul 21st, 2019
Yoshihiro Nagano
2017-Current Ph.D. student @ UTokyo
Advisor: Masato Okada
Jul.-Sep. 2018 Summer Internship @ PFN
Mar. 2017 MSc. (Science) @ UTokyo
Mar. 2015 B.S. @ Keio Univ.
Interests
Generative Models, Neural Networks, Computational Neuroscience,
Unsupervised Learning
SNS
Web: ganow.me / GitHub: ganow / Twitter: @ny_ganow
Motivation
[Figure: Monte Carlo tree search in AlphaGo. Each simulation traverses the tree by selecting the edge with maximum action value Q plus a bonus u(P) that depends on a stored prior probability P (panels: Selection, Expansion, Evaluation) [Silver+2016]]
[Figure: an example hierarchy: Mammal → Primate → Human, Monkey; Mammal → Rodent]
Hierarchical Datasets Hyperbolic Space
[Image: wikipedia.org]
[Nickel & Kiela, 2017]
In hyperbolic space, volume increases exponentially with the radius, just as the number of nodes in a tree increases exponentially with its depth
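As a rough numerical illustration (assuming the hyperbolic plane has curvature −1, which the slide does not specify), the sketch below compares the circumference of a circle of radius r in the Euclidean plane (2πr) and in the hyperbolic plane (2π sinh r) with the number of nodes at depth r of a binary tree. All function names are hypothetical.

```python
import math


def euclidean_circumference(r: float) -> float:
    """Circumference of a circle of radius r in the Euclidean plane."""
    return 2.0 * math.pi * r


def hyperbolic_circumference(r: float) -> float:
    """Circumference of a circle of radius r in the hyperbolic plane (curvature -1)."""
    return 2.0 * math.pi * math.sinh(r)


def binary_tree_level_size(depth: int) -> int:
    """Number of nodes at a given depth of a complete binary tree."""
    return 2 ** depth


for r in (1, 2, 4, 8):
    print(
        f"r={r}: Euclidean {euclidean_circumference(r):9.1f} | "
        f"hyperbolic {hyperbolic_circumference(r):9.1f} | "
        f"binary-tree level size {binary_tree_level_size(r)}"
    )
```

Because 2π sinh r ≈ π e^r for large r, the room available on a hyperbolic circle grows exponentially, like the 2^d nodes at depth d of a binary tree, while the Euclidean circumference only grows linearly.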
[Nickel+2017]
How can we extend these works to
probabilistic inference?
Difficulty: Probability Distributions on Curved Space
[Figure: a curved manifold M, annotated 1., 2., 3., …] [Image: wikipedia.org]
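Hyperbolic Geometry
Hyperbolic space can be represented by several equivalent models (e.g. Poincaré disk, Lorentz model, …). [Image: ja.wikipedia.org]
Lorentz Model: $\mathbb{H}^n = \{ z \in \mathbb{R}^{n+1} : \langle z, z \rangle_{\mathcal{L}} = -1,\ z_0 > 0 \}$
Lorentzian product: $\langle z, z' \rangle_{\mathcal{L}} = -z_0 z'_0 + \sum_{i=1}^{n} z_i z'_i$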
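A minimal NumPy sketch of the Lorentz model as defined above. `lorentz_inner` implements the Lorentzian product over the last array axis; `lift_to_hyperboloid` and `is_on_hyperboloid` are hypothetical helpers (not from the slides or the authors' code) that place a point of ℝ^n on the upper sheet by solving ⟨z, z⟩_L = −1 for z_0, and check membership.

```python
import numpy as np


def lorentz_inner(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Lorentzian product <x, y>_L = -x_0 y_0 + sum_{i=1}^n x_i y_i over the last axis."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)


def lift_to_hyperboloid(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map x in R^n to (sqrt(1 + |x|^2), x), a point on H^n."""
    x0 = np.sqrt(1.0 + np.sum(x * x, axis=-1, keepdims=True))
    return np.concatenate([x0, x], axis=-1)


def is_on_hyperboloid(z: np.ndarray, atol: float = 1e-9) -> bool:
    """Check <z, z>_L = -1 and z_0 > 0 (the upper sheet) for every point in z."""
    return bool(np.all(np.abs(lorentz_inner(z, z) + 1.0) < atol) and np.all(z[..., 0] > 0))


mu0 = lift_to_hyperboloid(np.zeros(2))          # origin of H^2: [1, 0, 0]
z = lift_to_hyperboloid(np.array([0.3, -1.2]))  # an arbitrary point on H^2
print(mu0, lorentz_inner(z, z), is_on_hyperboloid(z))
```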
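Hyperbolic Geometry
Tangent space: at the origin O, $\mu_0 = (1, 0, \dots, 0)$, a tangent vector $v \in T_{\mu_0}\mathbb{H}^n$ satisfies $\langle \mu_0, v \rangle_{\mathcal{L}} = 0$
Parallel Transport: $\mathrm{PT}_{\mu_0 \to \mu}$ moves $v \in T_{\mu_0}\mathbb{H}^n$ to $u \in T_{\mu}\mathbb{H}^n$ along the geodesic from $\mu_0$ to $\mu$
Exponential Map: $\exp_{\mu}$ maps $u \in T_{\mu}\mathbb{H}^n$ onto $\mathbb{H}^n$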
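On the Lorentz model both operations have closed forms. The sketch below writes them in NumPy, using the formulas as we understand them from the paper: PT_{ν→μ}(v) = v + ⟨μ − αν, v⟩_L / (α + 1) · (ν + μ) with α = −⟨ν, μ⟩_L, and exp_μ(u) = cosh(‖u‖_L) μ + sinh(‖u‖_L) u / ‖u‖_L. It is an unofficial illustration, not the implementation in pfnet-research/hyperbolic_wrapped_distribution; the `eps` clip is our own numerical safeguard.

```python
import numpy as np


def lorentz_inner(x, y):
    """Lorentzian product <x, y>_L = -x_0 y_0 + sum_{i>=1} x_i y_i (last axis)."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)


def parallel_transport(v, nu, mu):
    """PT_{nu -> mu}(v) = v + <mu - alpha nu, v>_L / (alpha + 1) * (nu + mu),
    with alpha = -<nu, mu>_L; moves a tangent vector at nu to the tangent space at mu."""
    alpha = -lorentz_inner(nu, mu)
    coef = lorentz_inner(mu - np.expand_dims(alpha, -1) * nu, v) / (alpha + 1.0)
    return v + np.expand_dims(coef, -1) * (nu + mu)


def exp_map(u, mu, eps=1e-12):
    """exp_mu(u) = cosh(|u|_L) mu + sinh(|u|_L) u / |u|_L, with |u|_L = sqrt(<u, u>_L).
    The clip keeps the division finite when u is (numerically) zero."""
    norm = np.expand_dims(np.sqrt(np.clip(lorentz_inner(u, u), eps, None)), -1)
    return np.cosh(norm) * mu + np.sinh(norm) * u / norm


mu0 = np.array([1.0, 0.0, 0.0])                # origin O of H^2
mu = exp_map(np.array([0.0, 0.5, -0.3]), mu0)  # some mean mu on H^2
v = np.array([0.0, 0.7, 0.2])                  # v in T_{mu0} H^2 (first coordinate 0)
u = parallel_transport(v, mu0, mu)             # u in T_mu H^2
z = exp_map(u, mu)                             # back on H^2
```

The expected properties hold: u is tangent at μ (⟨μ, u⟩_L = 0), parallel transport preserves the Lorentzian norm, and exp_μ(u) lands on ℍ^n.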
Construction of Hyperbolic Wrapped Distribution
[Figure: construction diagram starting from a distribution on $\mathbb{R}^n$]
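The construction can be summarized in four steps: (1) sample ṽ ~ N(0, Σ) in ℝ^n, (2) embed it as v = [0, ṽ] ∈ T_{μ0}ℍ^n, (3) move it to u = PT_{μ0→μ}(v) ∈ T_μℍ^n, and (4) project z = exp_μ(u) ∈ ℍ^n. A self-contained NumPy sketch follows; `sample_wrapped_normal` is a hypothetical name, and the helpers implement the closed-form parallel transport and exponential map of the Lorentz model as we understand them from the paper.

```python
import numpy as np


def lorentz_inner(x, y):
    # <x, y>_L = -x_0 y_0 + sum_{i>=1} x_i y_i over the last axis
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)


def parallel_transport(v, nu, mu):
    # PT_{nu -> mu}(v) on the Lorentz model, alpha = -<nu, mu>_L
    alpha = -lorentz_inner(nu, mu)
    coef = lorentz_inner(mu - np.expand_dims(alpha, -1) * nu, v) / (alpha + 1.0)
    return v + np.expand_dims(coef, -1) * (nu + mu)


def exp_map(u, mu, eps=1e-12):
    # exp_mu(u) = cosh(|u|_L) mu + sinh(|u|_L) u / |u|_L
    norm = np.expand_dims(np.sqrt(np.clip(lorentz_inner(u, u), eps, None)), -1)
    return np.cosh(norm) * mu + np.sinh(norm) * u / norm


def sample_wrapped_normal(mu, cov, num_samples, rng):
    """Hypothetical helper: draw num_samples points on H^n.

    mu: mean, a point on H^n with shape (n + 1,); cov: (n, n) covariance on R^n.
    """
    n = cov.shape[0]
    mu0 = np.zeros(n + 1)
    mu0[0] = 1.0                                                           # origin of H^n
    v_tilde = rng.multivariate_normal(np.zeros(n), cov, size=num_samples)  # 1. v~ ~ N(0, cov)
    v = np.concatenate([np.zeros((num_samples, 1)), v_tilde], axis=1)     # 2. v = [0, v~]
    u = parallel_transport(v, mu0, mu)                                     # 3. u = PT_{mu0 -> mu}(v)
    return exp_map(u, mu)                                                  # 4. z = exp_mu(u)


rng = np.random.default_rng(0)
mu = exp_map(np.array([0.0, 1.0, 0.5]), np.array([1.0, 0.0, 0.0]))
cov = np.array([[0.3, 0.1], [0.1, 0.2]])
z = sample_wrapped_normal(mu, cov, num_samples=1000, rng=rng)
```

NumPy's sampler is used here only for brevity. For gradient-based learning one would instead write ṽ = Lε with ε ~ N(0, I) and a Cholesky factor L of Σ (the reparameterization trick), so that every step from ε to z is differentiable in μ and Σ.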
Hyperbolic Wrapped Distribution
Figure 3: Heatmaps of the log-likelihood of pseudo-hyperbolic Gaussians with various µ and Σ. The × mark designates the origin of hyperbolic space. See Appendix B for further details.
Since the metric on the tangent space at the origin coincides with the Euclidean metric, we can produce various types of hyperbolic distributions by applying our construction strategy to other distributions defined on Euclidean space, such as the Laplace and Cauchy distributions.
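The observation about the tangent metric can be checked numerically: at μ_0 = (1, 0, …, 0), a tangent vector v = [0, ṽ] has ⟨v, v⟩_L = ‖ṽ‖². The sketch below uses this to wrap an isotropic Laplace base distribution (an arbitrary illustrative choice) centered at the origin; for another mean one would add the parallel transport step PT_{μ0→μ}. Function names are hypothetical and NumPy is assumed.

```python
import numpy as np


def lorentz_inner(x, y):
    # <x, y>_L = -x_0 y_0 + sum_{i>=1} x_i y_i over the last axis
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)


def exp_map_at_origin(v_tilde, eps=1e-12):
    """exp_{mu0}([0, v~]) = (cosh r, sinh(r) v~ / r) with r = |v~| and mu0 = (1, 0, ..., 0)."""
    r = np.sqrt(np.clip(np.sum(v_tilde * v_tilde, axis=-1, keepdims=True), eps, None))
    return np.concatenate([np.cosh(r), np.sinh(r) * v_tilde / r], axis=-1)


rng = np.random.default_rng(0)
num_samples, n = 1000, 2
# Base distribution on R^n: independent Laplace coordinates (illustrative choice).
v_tilde = rng.laplace(loc=0.0, scale=0.5, size=(num_samples, n))
v = np.concatenate([np.zeros((num_samples, 1)), v_tilde], axis=1)  # v = [0, v~] in T_{mu0} H^n

# At the origin the Lorentzian norm of v equals the Euclidean norm of v~,
# so any distribution on R^n can serve as the base distribution.
metric_matches = np.allclose(lorentz_inner(v, v), np.sum(v_tilde * v_tilde, axis=-1))

z = exp_map_at_origin(v_tilde)  # samples of a wrapped Laplace centered at the origin
print(metric_matches, z.shape)
```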
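Density: $\log p(z) = \log \mathcal{N}(v; 0, \Sigma) - (n - 1) \log \left( \frac{\sinh \|u\|_{\mathcal{L}}}{\|u\|_{\mathcal{L}}} \right)$, where $\|u\|_{\mathcal{L}} = \sqrt{\langle u, u \rangle_{\mathcal{L}}}$
Projection: $z = \exp_{\mu}(u)$ with $u = \mathrm{PT}_{\mu_0 \to \mu}(v)$ and $v \in T_{\mu_0}\mathbb{H}^n \simeq \mathbb{R}^n$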
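Because the projection is invertible, the density can be evaluated by undoing it: u = exp_μ^{-1}(z), v = PT_{μ→μ0}(u) = [0, ṽ], then applying the density formula with ‖u‖_L = ‖ṽ‖. The logarithmic map exp_μ^{-1}(z) = arccosh(α) / √(α² − 1) · (z − αμ), α = −⟨μ, z⟩_L, is the standard closed form on the Lorentz model. This is a hedged NumPy sketch with hypothetical names; it is not numerically hardened (sinh overflows for very large distances).

```python
import numpy as np


def lorentz_inner(x, y):
    """Lorentzian product <x, y>_L = -x_0 y_0 + sum_{i>=1} x_i y_i (last axis)."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)


def parallel_transport(v, nu, mu):
    """PT_{nu -> mu}(v) = v + <mu - alpha nu, v>_L / (alpha + 1) * (nu + mu), alpha = -<nu, mu>_L."""
    alpha = -lorentz_inner(nu, mu)
    coef = lorentz_inner(mu - np.expand_dims(alpha, -1) * nu, v) / (alpha + 1.0)
    return v + np.expand_dims(coef, -1) * (nu + mu)


def log_map(z, mu, eps=1e-12):
    """exp_mu^{-1}(z) = arccosh(alpha) / sqrt(alpha^2 - 1) * (z - alpha mu), alpha = -<mu, z>_L."""
    # alpha >= 1 on the hyperboloid; clamp away rounding noise below 1
    alpha = np.expand_dims(np.maximum(-lorentz_inner(mu, z), 1.0), -1)
    # the clip avoids 0/0 when z == mu (then z - alpha mu is ~0 anyway)
    scale = np.arccosh(alpha) / np.sqrt(np.clip(alpha * alpha - 1.0, eps, None))
    return scale * (z - alpha * mu)


def wrapped_normal_log_prob(z, mu, cov):
    """Hypothetical helper: log-density at z in H^n of the wrapped normal with mean mu, covariance cov."""
    n = cov.shape[0]
    mu0 = np.zeros(n + 1)
    mu0[0] = 1.0                                       # origin of H^n
    u = log_map(z, mu)                                 # u = exp_mu^{-1}(z) in T_mu H^n
    v_tilde = parallel_transport(u, mu, mu0)[..., 1:]  # v = PT_{mu -> mu0}(u) = [0, v~]
    # log N(v~; 0, cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("...i,ij,...j->...", v_tilde, np.linalg.inv(cov), v_tilde)
    log_normal = -0.5 * (n * np.log(2.0 * np.pi) + logdet + maha)
    # r = |u|_L = |v~| because parallel transport preserves the Lorentzian norm
    r = np.maximum(np.linalg.norm(v_tilde, axis=-1), 1e-12)
    # subtract the log-determinant (n - 1) log(sinh r / r) of the projection's Jacobian
    return log_normal - (n - 1) * np.log(np.sinh(r) / r)
```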
Numerical Evaluations: VAEs on Synthetic Data
Hyperbolic VAE
[Image: first page of the paper (authors, abstract, and introduction), shown for Figure 1]
Panels: (a) A tree representation of the training dataset; (b) Normal VAE (β = 1.0); (c) Hyperbolic VAE
Figure 1: The visual results of Hyperbolic VAE applied to an artificial dataset generated by applying random perturbations to a binary tree. The visualization is done on the Poincaré ball. The red points are the embeddings of the original tree, and the blue points are the embeddings of noisy observations generated from the tree. The pink × represents the origin of the hyperbolic space. The VAE was trained without prior knowledge of the tree structure. Please see Section 6.1 for experimental details.
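The figure is drawn on the Poincaré ball, whereas the wrapped distribution lives on the Lorentz model. The slides do not say how points were converted; a standard isometry between the two models is p = (z_1, …, z_n) / (1 + z_0). A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def lorentz_to_poincare(z):
    """Map z = (z_0, z_1, ..., z_n) on H^n to the Poincare ball: p = (z_1, ..., z_n) / (1 + z_0)."""
    return z[..., 1:] / (1.0 + z[..., :1])


origin = np.array([1.0, 0.0, 0.0])
z = np.array([np.sqrt(26.0), 3.0, 4.0])  # on H^2 since -26 + 9 + 16 = -1
p = lorentz_to_poincare(z)
print(lorentz_to_poincare(origin), p, np.linalg.norm(p))
```

A point at hyperbolic distance d = arccosh(z_0) from the origin lands at Euclidean radius tanh(d/2) < 1, so points far from the origin crowd toward the boundary of the ball.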
Numerical Evaluations: VAEs on Breakout
Atari 2600 Breakout-v4, DQN [Mnih+ 2015]
VAE (≒ )
Panels: Vanilla | Vanilla, $\|v\|_2 = 200$ | Hyperbolic
Numerical Evaluations: Word Embeddings
WordNet Nouns word embedding
Euclidean baseline: [Vilnis & McCallum 2015]
Conclusion
Proposed a projection-based hyperbolic wrapped distribution
Evaluated with VAEs and word embeddings on MNIST, Atari 2600 Breakout, and WordNet
GitHub: pfnet-research/hyperbolic_wrapped_distribution
Acknowledgements
Masaki Watanabe, Tomohiro Hayase, Kenta Oono, Takeru Miyato, Sosuke Kobayashi
PFN2018
