Automatic variational inference
with
latent categorical variables
Tomasz Kuśmierczyk
2020-04-15
Resources
Mixture of discrete normalizing flows for variational inference:
● ArXiv text: https://arxiv.org/abs/2006.15568
● GitHub code:
https://github.com/tkusmierczyk/mixture_of_discrete_normalizing_flows
Discrete normalizing flows:
● ArXiv text: https://arxiv.org/abs/1905.10347
● GitHub implementation:
https://github.com/google/edward2/blob/master/edward2/tensorflow/layers/discrete_flows.py
Agile AI with Probabilistic Programming
● easy specification & model development
● scalability thanks to variational inference
● handling latent discrete variables?
Latent categorical variables
● allocation models with hand-crafted algorithms
○ mixture models
○ hidden Markov models
○ topic models
● expert’s knowledge encoding
● … ?
1D categorical distribution specification
(unordered) set of categories x ∈ {a, b, c, d} → probability p
p(x=a) = ½
p(x=b) = 0
p(x=c) = ½
p(x=d) = 0
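The specification above can be sampled directly; a minimal numpy illustration (not from the slides):

```python
import numpy as np

# Categories and probabilities from the 1D example above.
categories = ["a", "b", "c", "d"]
probs = np.array([0.5, 0.0, 0.5, 0.0])

rng = np.random.default_rng(0)
samples = rng.choice(categories, size=1000, p=probs)

# Only categories with non-zero probability ever appear.
print(sorted(set(samples)))
```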
2D categorical distribution specification
p(x=a) = ½
p(x=b) = 0
p(x=c) = ½
p(x=d) = 0

joint distribution p(x, z):

        z=e    z=f    z=g    z=h
x=a     0.25   0.20   0.05   0
x=b     0      0      0      0
x=c     0.20   0.05   0.05   0.20
x=d     0      0      0      0

conditional distribution p(z | x):

        z=e    z=f    z=g    z=h
x=a     0.50   0.40   0.10   0
x=b     0      0      0      0
x=c     0.40   0.10   0.10   0.40
x=d     0      0      0      0
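The conditional table follows from the joint one by normalizing each row by the marginal p(x); a minimal numpy sketch with the values from the tables above:

```python
import numpy as np

# Joint p(x, z): rows x ∈ {a, b, c, d}, columns z ∈ {e, f, g, h}.
joint = np.array([
    [0.25, 0.20, 0.05, 0.00],
    [0.00, 0.00, 0.00, 0.00],
    [0.20, 0.05, 0.05, 0.20],
    [0.00, 0.00, 0.00, 0.00],
])

marginal_x = joint.sum(axis=1)   # p(x) = (1/2, 0, 1/2, 0)

# p(z | x) = p(x, z) / p(x); rows with p(x) = 0 are left as zeros.
cond = np.divide(joint, marginal_x[:, None],
                 out=np.zeros_like(joint), where=marginal_x[:, None] > 0)

print(cond[0])   # first row: p(z | x=a) = (0.5, 0.4, 0.1, 0)
```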
Toy model
https://en.wikipedia.org/wiki/Bayesian_network#/media/File:SimpleBayesNet.svg
Automatic Variational Inference
Why: ecient inference of approximate posteriors q without tedious math
qx(x) ≈ p(x|D)
Reparametrized ELBO:
Requirements for q to perform automatic VI with reparametrization gradients:
● sample generation differentiable w.r.t. the distribution parameters
● log-probability of samples
Reparametrization:
For example: normal distribution with parameters λ = (μ, σ):
fλ(u) = μ + σ·u ,  u ~ N(0, 1)
Normalizing flows:
● parameters λ do not have this interpretation
Reparametrization vs normalizing flows
, u ~ pu
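The normal-distribution reparametrization above, sketched in numpy (an illustration, not from the slides): the noise u is parameter-free, so the samples are a differentiable function of (μ, σ).

```python
import numpy as np

# Reparametrization for the normal distribution: x = f_λ(u) = μ + σ·u,
# with u ~ N(0, 1) drawn independently of the parameters λ = (μ, σ).
rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5

u = rng.standard_normal(100_000)   # parameter-free noise
x = mu + sigma * u                 # samples differentiable w.r.t. (μ, σ)

# ∂x/∂μ = 1 and ∂x/∂σ = u: gradients flow through the samples,
# which is what reparametrization-gradient VI relies on.
print(x.mean(), x.std())           # ≈ 2.0, 0.5
```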
Discrete flows
, u ~ pu
fλ
sampling:
probability evaluation:
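A toy instance of the two operations above for a discrete flow over K categories, using a shift-only flow f(u) = (u + μ) mod K (a minimal sketch; the Edward2 implementation is more general and also uses a scale that is invertible mod K):

```python
import numpy as np

K = 4
mu = 3
p_u = np.array([0.7, 0.2, 0.1, 0.0])   # base distribution over u

def f(u):
    # sampling direction: x = f(u)
    return (u + mu) % K

def f_inv(x):
    # inverse, needed for probability evaluation
    return (x - mu) % K

# The flow is a bijection on {0, …, K-1}, so it only permutes the
# probability mass: q(x) = p_u(f⁻¹(x)).
q_x = p_u[f_inv(np.arange(K))]
print(q_x)   # mass 0.7 moved from u=0 to x=3, etc.
```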
Individual flow vs. mixture of flows
q(x) = Σb ρb qb(x)   (sum over flows b; ρb — mixing weight; qb — b-th flow)
Accuracy of B-flows
Base distributions with probability mass concentrated at exactly one category:
p(ub = c) = 1
Equal mixing weights: ρb = 1/B
➔ each flow allocates probability 1/B
➔ |true probability for category k − approximation| ≤ 1/B
➔ works well for concentrated distributions
➔ fails for the uniform distribution
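The 1/B bound above can be checked numerically: with B equally-weighted point-mass flows, the mixture can only represent probabilities that are multiples of 1/B. Largest-remainder rounding (an illustrative allocation scheme, not the fitting procedure from the paper) achieves the stated per-category error bound:

```python
import numpy as np

def best_B_flow_approx(p, B):
    # Approximate categorical p with B point masses of weight 1/B each.
    counts = np.floor(p * B).astype(int)
    remainder = p * B - counts
    # hand out the leftover 1/B units to the largest remainders
    for k in np.argsort(-remainder)[: B - counts.sum()]:
        counts[k] += 1
    return counts / B

p = np.array([0.37, 0.33, 0.18, 0.12])   # target categorical
for B in (2, 5, 10, 50):
    q = best_B_flow_approx(p, B)
    # per-category error is at most 1/B, as stated on the slide
    assert np.abs(p - q).max() <= 1.0 / B
    print(B, q)
```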
Multivariate categorical distributions
p(x) = p(x1) p(x2 | x1) p(x3 | x2, x1) … p(xd | xd-1, xd-2, …, x1)
fd(u) = (μd + σd·u) mod Kd
where
μd := μd(xd-1, xd-2, …, x1, *)
σd := σd(xd-1, xd-2, …, x1, *)
are neural networks trained with the straight-through estimator of argmax
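The straight-through argmax, sketched in numpy (frameworks implement the same idea with stop_gradient / detach): the forward pass emits a hard one-hot vector, while the backward pass pretends the output was the soft relaxation, so gradients reach the network producing the logits.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def straight_through_argmax(logits):
    soft = softmax(logits)               # differentiable relaxation
    hard = np.zeros_like(soft)
    hard[np.argmax(logits)] = 1.0        # hard one-hot, used in the forward pass
    # In an autodiff framework this would be written as
    #   out = soft + stop_gradient(hard - soft)
    # forward value = hard, gradient = gradient of soft.
    return hard, soft

hard, soft = straight_through_argmax(np.array([1.0, 3.0, 0.5]))
print(hard)   # one-hot at index 1
```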
Practical probability evaluation in the entropy term
assuming independence:
full covariance:
Toy example
true posterior:
p( … | grass wet = yes )   sprinkler = no   sprinkler = yes
rain = no                   0.000            0.642
rain = yes                  0.353            0.004

approximate posterior (10 flows, found in 15 iterations):
p( … | grass wet = yes )   sprinkler = no   sprinkler = yes
rain = no                   0.000            0.600
rain = yes                  0.400            0.000
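The true posterior can be computed exactly by enumeration; a short sketch assuming the standard CPT values from the Wikipedia figure linked above (P(rain)=0.2, P(sprinkler|rain)=0.01, P(sprinkler|no rain)=0.4, P(wet|sprinkler,rain) = 0.0 / 0.9 / 0.8 / 0.99):

```python
import numpy as np

p_rain = np.array([0.8, 0.2])            # [no, yes]
p_sprinkler = np.array([[0.60, 0.40],    # rain = no:  sprinkler no/yes
                        [0.99, 0.01]])   # rain = yes: sprinkler no/yes
p_wet = np.array([[0.0, 0.9],            # rain = no:  wet given sprinkler no/yes
                  [0.8, 0.99]])          # rain = yes: wet given sprinkler no/yes

# joint p(rain, sprinkler, grass wet = yes), then normalize
joint = p_rain[:, None] * p_sprinkler * p_wet
posterior = joint / joint.sum()

# rows: rain = no/yes; columns: sprinkler = no/yes
print(np.round(posterior, 3))
```

With these values the result reproduces the "true posterior" table above.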
Experiments
● Gaussian mixture model
● (large) Bayesian networks
● (higher-order) hidden Markov model
● variational autoencoder