Automatic variational inference
with
latent categorical variables
Tomasz Kuśmierczyk
2020-04-15
Resources
Mixture of discrete normalizing flows for variational inference:
● ArXiv text: https://arxiv.org/abs/2006.15568
● GitHub code:
https://github.com/tkusmierczyk/mixture_of_discrete_normalizing_flows
Discrete normalizing flows:
● ArXiv text: https://arxiv.org/abs/1905.10347
● GitHub implementation:
https://github.com/google/edward2/blob/master/edward2/tensorflow/layers/discrete_flows.py
Agile AI with Probabilistic Programming
● easy specification & model development
● scalability thanks to variational inference
● handling latent discrete variables?
Latent categorical variables
● allocation models with hand-crafted algorithms
○ mixture models
○ hidden Markov models
○ topic models
● expert’s knowledge encoding
● … ?
1D categorical distribution specification
(unordered) set of categories x ∈ {a, b, c, d} → probability p
p(x=a) = ½
p(x=b) = 0
p(x=c) = ½
p(x=d) = 0
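The specification above can be sampled directly; a minimal numpy illustration (not from the slides):

```python
import numpy as np

# Categories and probabilities from the 1D example above.
categories = ["a", "b", "c", "d"]
probs = np.array([0.5, 0.0, 0.5, 0.0])

rng = np.random.default_rng(0)
samples = rng.choice(categories, size=1000, p=probs)

# Only categories with non-zero probability ever appear.
print(sorted(set(samples)))
```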
2D categorical distribution specification
p(x=a) = ½
p(x=b) = 0
p(x=c) = ½
p(x=d) = 0

joint distribution p(x, z):

        z=e    z=f    z=g    z=h
x=a     0.25   0.20   0.05   0
x=b     0      0      0      0
x=c     0.20   0.05   0.05   0.20
x=d     0      0      0      0

conditional distribution p(z | x):

        z=e    z=f    z=g    z=h
x=a     0.50   0.40   0.10   0
x=b     0      0      0      0
x=c     0.40   0.10   0.10   0.40
x=d     0      0      0      0
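The conditional table follows from the joint one by normalizing each row by the marginal p(x); a minimal numpy sketch with the values from the tables above:

```python
import numpy as np

# Joint p(x, z): rows x ∈ {a, b, c, d}, columns z ∈ {e, f, g, h}.
joint = np.array([
    [0.25, 0.20, 0.05, 0.00],
    [0.00, 0.00, 0.00, 0.00],
    [0.20, 0.05, 0.05, 0.20],
    [0.00, 0.00, 0.00, 0.00],
])

marginal_x = joint.sum(axis=1)   # p(x) = (1/2, 0, 1/2, 0)

# p(z | x) = p(x, z) / p(x); rows with p(x) = 0 are left as zeros.
cond = np.divide(joint, marginal_x[:, None],
                 out=np.zeros_like(joint), where=marginal_x[:, None] > 0)

print(cond[0])   # first row: p(z | x=a) = (0.5, 0.4, 0.1, 0)
```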
Toy model
https://en.wikipedia.org/wiki/Bayesian_network#/media/File:SimpleBayesNet.svg
Automatic Variational Inference
Why: ecient inference of approximate posteriors q without tedious math
qx(x) ≈ p(x|D)
Reparametrized ELBO:
Requirements for q to perform automatic VI with reparametrization gradients:
● sample generation differentiable w.r.t. the distribution parameters
● log-probability of samples
Reparametrization:
For example: normal distribution with parameters λ = (μ, σ):
fλ(u) = μ + σ·u ,  u ~ N(0, 1)
Normalizing flows:
● parameters λ do not have this interpretation
Reparametrization vs normalizing flows
, u ~ pu
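The normal-distribution reparametrization above, sketched in numpy (an illustration, not from the slides): the noise u is parameter-free, so the samples are a differentiable function of (μ, σ).

```python
import numpy as np

# Reparametrization for the normal distribution: x = f_λ(u) = μ + σ·u,
# with u ~ N(0, 1) drawn independently of the parameters λ = (μ, σ).
rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5

u = rng.standard_normal(100_000)   # parameter-free noise
x = mu + sigma * u                 # samples differentiable w.r.t. (μ, σ)

# ∂x/∂μ = 1 and ∂x/∂σ = u: gradients flow through the samples,
# which is what reparametrization-gradient VI relies on.
print(x.mean(), x.std())           # ≈ 2.0, 0.5
```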
Discrete flows
, u ~ pu
fλ
sampling:
probability evaluation:
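A toy instance of the two operations above for a discrete flow over K categories, using a shift-only flow f(u) = (u + μ) mod K (a minimal sketch; the Edward2 implementation is more general and also uses a scale that is invertible mod K):

```python
import numpy as np

K = 4
mu = 3
p_u = np.array([0.7, 0.2, 0.1, 0.0])   # base distribution over u

def f(u):
    # sampling direction: x = f(u)
    return (u + mu) % K

def f_inv(x):
    # inverse, needed for probability evaluation
    return (x - mu) % K

# The flow is a bijection on {0, …, K-1}, so it only permutes the
# probability mass: q(x) = p_u(f⁻¹(x)).
q_x = p_u[f_inv(np.arange(K))]
print(q_x)   # mass 0.7 moved from u=0 to x=3, etc.
```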
Individual flow vs. mixture of flows
q(x) = Σb ρb qb(x)   (sum over flows b; ρb — mixing weight; qb — b-th flow)
Accuracy of B-flows
Base distributions with probability mass concentrated at exactly one category:
p(ub = c) = 1
Equal mixing weights: ρb = 1/B
➔ each flow allocates probability 1/B
➔ |true probability for category k − approximation| ≤ 1/B
➔ works well for concentrated distributions
➔ fails for the uniform distribution
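The 1/B bound above can be checked numerically: with B equally-weighted point-mass flows, the mixture can only represent probabilities that are multiples of 1/B. Largest-remainder rounding (an illustrative allocation scheme, not the fitting procedure from the paper) achieves the stated per-category error bound:

```python
import numpy as np

def best_B_flow_approx(p, B):
    # Approximate categorical p with B point masses of weight 1/B each.
    counts = np.floor(p * B).astype(int)
    remainder = p * B - counts
    # hand out the leftover 1/B units to the largest remainders
    for k in np.argsort(-remainder)[: B - counts.sum()]:
        counts[k] += 1
    return counts / B

p = np.array([0.37, 0.33, 0.18, 0.12])   # target categorical
for B in (2, 5, 10, 50):
    q = best_B_flow_approx(p, B)
    # per-category error is at most 1/B, as stated on the slide
    assert np.abs(p - q).max() <= 1.0 / B
    print(B, q)
```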
Multivariate categorical distributions
p(x) = p(x1) p(x2 | x1) p(x3 | x2, x1) … p(xd | xd-1, xd-2, …, x1)
fd(u) = (μd + σd·u) mod Kd
where
μd := μd(xd-1, xd-2, …, x1, *)
σd := σd(xd-1, xd-2, …, x1, *)
are neural networks trained with the straight-through estimator of argmax
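The straight-through argmax, sketched in numpy (frameworks implement the same idea with stop_gradient / detach): the forward pass emits a hard one-hot vector, while the backward pass pretends the output was the soft relaxation, so gradients reach the network producing the logits.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def straight_through_argmax(logits):
    soft = softmax(logits)               # differentiable relaxation
    hard = np.zeros_like(soft)
    hard[np.argmax(logits)] = 1.0        # hard one-hot, used in the forward pass
    # In an autodiff framework this would be written as
    #   out = soft + stop_gradient(hard - soft)
    # forward value = hard, gradient = gradient of soft.
    return hard, soft

hard, soft = straight_through_argmax(np.array([1.0, 3.0, 0.5]))
print(hard)   # one-hot at index 1
```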
Practical probability evaluation in the entropy term
assuming independence:
full covariance:
Toy example
true posterior:
p( … | grass wet = yes )   sprinkler = no   sprinkler = yes
rain = no                   0.000            0.642
rain = yes                  0.353            0.004

approximate posterior (10 flows, found in 15 iterations):
p( … | grass wet = yes )   sprinkler = no   sprinkler = yes
rain = no                   0.000            0.600
rain = yes                  0.400            0.000
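The true posterior can be computed exactly by enumeration; a short sketch assuming the standard CPT values from the Wikipedia figure linked above (P(rain)=0.2, P(sprinkler|rain)=0.01, P(sprinkler|no rain)=0.4, P(wet|sprinkler,rain) = 0.0 / 0.9 / 0.8 / 0.99):

```python
import numpy as np

p_rain = np.array([0.8, 0.2])            # [no, yes]
p_sprinkler = np.array([[0.60, 0.40],    # rain = no:  sprinkler no/yes
                        [0.99, 0.01]])   # rain = yes: sprinkler no/yes
p_wet = np.array([[0.0, 0.9],            # rain = no:  wet given sprinkler no/yes
                  [0.8, 0.99]])          # rain = yes: wet given sprinkler no/yes

# joint p(rain, sprinkler, grass wet = yes), then normalize
joint = p_rain[:, None] * p_sprinkler * p_wet
posterior = joint / joint.sum()

# rows: rain = no/yes; columns: sprinkler = no/yes
print(np.round(posterior, 3))
```

With these values the result reproduces the "true posterior" table above.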
Experiments
● Gaussian mixture model
● (large) Bayesian networks
● (higher-order) hidden Markov model
● variational autoencoder