Lecture 5
Variational Estimation and Inference
Dahua Lin
The Chinese University of Hong Kong
1
Outline
• Estimation of Models in Exponential Families
• Estimation with partial observations - EM
• Mean Field Methods
2
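Factorized Exponential Family
Consider an exponential family of joint distributions over $x$:
$$p(x \mid \theta) = \exp\Big( \sum_{k} \langle \theta_k, \phi_k(x_{C_k}) \rangle - A(\theta) \Big)$$
Here, $C_k$ indicates the subset of components involved in the $k$-th factor.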
3
With Complete Observations
Given samples $x_1, \ldots, x_n$, the optimal estimates are given by
$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)$$
• With canonical parameterization, this is a convex problem.
• Parameters may come with constraints.
• There can be analytic solutions; otherwise, one can solve this using numerical methods.
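The contrast between an analytic and a numerical solution can be sketched on the simplest canonical exponential family. The example below is an illustration, not part of the lecture: it assumes a Bernoulli model with $A(\theta) = \log(1 + e^{\theta})$, and the function names and toy data are hypothetical.

```python
import math

def sigmoid(t):
    """Mean of a canonical Bernoulli: A'(theta) = 1 / (1 + exp(-theta))."""
    return 1.0 / (1.0 + math.exp(-t))

def mle_closed_form(xs):
    """Analytic solution: match the sample mean, theta = logit(mean(x))."""
    m = sum(xs) / len(xs)
    return math.log(m / (1.0 - m))

def mle_newton(xs, iters=50, theta=0.0):
    """Numerical solution: Newton's method on the concave objective
    theta * mean(x) - A(theta)."""
    m = sum(xs) / len(xs)
    for _ in range(iters):
        p = sigmoid(theta)
        grad = m - p               # gradient: mean(x) - A'(theta)
        hess = -p * (1.0 - p)      # second derivative: -A''(theta), always negative
        theta -= grad / hess
    return theta

xs = [1, 0, 1, 1, 0, 1, 1, 0]      # toy data: 5 ones out of 8
theta_cf = mle_closed_form(xs)
theta_nt = mle_newton(xs)
```

Both routes agree because the objective is concave in the canonical parameter, so the stationary point found by Newton's method is the global maximum.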
4
Example: GMM
• GMM involves: observed feature $x$ and component indicator $z$.
• How to estimate if both $x$ and $z$ are observed?
5
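Estimate GMM
For $\pi$, we maximize
$$\sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{1}(z_i = k) \log \pi_k \quad \text{s.t.} \quad \sum_{k=1}^{K} \pi_k = 1$$
Using Lagrange multipliers, we get
$$\pi_k = \frac{n_k}{n}, \qquad n_k = \sum_{i=1}^{n} \mathbb{1}(z_i = k)$$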
6
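Estimate GMM (cont'd)
For $(\mu_k, \Sigma_k)$, we minimize
$$\frac{1}{2} \sum_{i: z_i = k} (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) + \frac{n_k}{2} \log |\Sigma_k|$$
where $n_k$ is the number of samples with $z_i = k$, thus
$$\mu_k = \frac{1}{n_k} \sum_{i: z_i = k} x_i, \qquad \Sigma_k = \frac{1}{n_k} \sum_{i: z_i = k} (x_i - \mu_k)(x_i - \mu_k)^T$$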
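When both the features and the component indicators are observed, estimation reduces to per-component counting and averaging. A minimal sketch for one-dimensional data follows; the function name `estimate_gmm_complete` and the toy data are hypothetical, and every component is assumed to have at least one sample.

```python
def estimate_gmm_complete(xs, zs, K):
    """Complete-data MLE for a 1-D GMM: mixing weights, means, variances."""
    n = len(xs)
    counts = [0] * K
    sums = [0.0] * K
    for x, z in zip(xs, zs):
        counts[z] += 1
        sums[z] += x
    pis = [c / n for c in counts]                    # pi_k = n_k / n
    means = [s / c for s, c in zip(sums, counts)]    # per-component sample mean
    sq = [0.0] * K
    for x, z in zip(xs, zs):
        sq[z] += (x - means[z]) ** 2
    variances = [s / c for s, c in zip(sq, counts)]  # per-component ML variance
    return pis, means, variances

xs = [0.0, 1.0, 2.0, 10.0, 12.0]   # toy features
zs = [0, 0, 0, 1, 1]               # observed component indicators
pis, means, variances = estimate_gmm_complete(xs, zs, K=2)
```

The variance uses the maximum likelihood normalization $1/n_k$ rather than the unbiased $1/(n_k - 1)$.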
7
Partially Observed Models
Consider an exponential family involving observed variables $x$ and latent variables $z$:
$$p(x, z \mid \theta) = \exp\big( \langle \theta, \phi(x, z) \rangle - A(\theta) \big)$$
Here, $x$ and $z$ refer to the observed parts and latent parts of the entire sample set.
8
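Partially Observed Models (cont'd)
Given an observation $x$, we have
$$p(z \mid x; \theta) = \exp\big( \langle \theta, \phi(x, z) \rangle - A_x(\theta) \big)$$
where $A_x(\theta)$ is called the conditional log-partition:
$$A_x(\theta) = \log \int \exp\big( \langle \theta, \phi(x, z) \rangle \big) \, dz$$
This also belongs to an exponential family.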
9
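MLE with Partial Observations
The maximum likelihood estimate is obtained by maximizing the marginal likelihood over observed data:
$$\hat{\theta} = \arg\max_{\theta} \log L(\theta \mid x), \qquad \log L(\theta \mid x) = \log p(x \mid \theta) = A_x(\theta) - A(\theta)$$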
10
Issues
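• The conditional log-partition $A_x(\theta)$ as below is often very difficult to evaluate:
$$A_x(\theta) = \log \int \exp\big( \langle \theta, \phi(x, z) \rangle \big) \, dz$$
• We usually resort to Expectation-Maximization (EM), a strategy that iteratively constructs and maximizes lower bounds of $\log L(\theta \mid x)$.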
11
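Lower Bound of $\log L(\theta \mid x)$
Let $\log L(\theta \mid x) = A_x(\theta) - A(\theta)$.
By conjugate duality:
$$A_x(\theta) = \sup_{\mu} \big\{ \langle \theta, \mu \rangle - A_x^*(\mu) \big\}$$
with
$$A_x^*(\mu) = \sup_{\theta} \big\{ \langle \theta, \mu \rangle - A_x(\theta) \big\}$$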
12
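Lower Bound of $\log L(\theta \mid x)$ (cont'd)
Hence, we have
$$A_x(\theta) \ge \langle \theta, \mu \rangle - A_x^*(\mu)$$
Hence, $Q(\theta; \mu) = \langle \theta, \mu \rangle - A_x^*(\mu) - A(\theta)$ is a lower bound of $\log L(\theta \mid x)$ for any $\mu$.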
13
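Expectation-Maximization
The Expectation-Maximization (EM) algorithm is coordinate ascent on $Q(\theta; \mu)$:
• E-step: $\mu^{(t+1)} = \arg\max_{\mu} Q(\theta^{(t)}; \mu)$
• M-step: $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta; \mu^{(t+1)})$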
14
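E-step
• Each E-step reduces to maximizing $\langle \theta^{(t)}, \mu \rangle - A_x^*(\mu)$; the optimal solution is the expectation of $\phi(x, Z)$:
$$\mu^{(t+1)} = E_{\theta^{(t)}}[\phi(x, Z) \mid x] = \nabla A_x(\theta^{(t)})$$
• By conjugate duality, with $\mu^{(t+1)} = \nabla A_x(\theta^{(t)})$, we have $\langle \theta^{(t)}, \mu^{(t+1)} \rangle - A_x^*(\mu^{(t+1)}) = A_x(\theta^{(t)})$, thus:
$$Q(\theta^{(t)}; \mu^{(t+1)}) = \log L(\theta^{(t)} \mid x)$$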
15
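M-step
• Each M-step reduces to maximizing $\langle \theta, \mu^{(t+1)} \rangle - A(\theta)$; the optimal solution is attained when
$$\nabla A(\theta) = E_{\theta}[\phi(X, Z)] = \mu^{(t+1)}$$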
16
It can be shown that EM optimizes $\log L(\theta \mid x)$. Why?
17
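EM Optimizes $\log L(\theta \mid x)$
[Figure: the curve $\log L(\theta \mid x)$ together with the lower bounds $Q(\theta; \mu^{(t)})$ and $Q(\theta; \mu^{(t+1)})$, with the iterates $\theta^{(t-1)}$, $\theta^{(t)}$, $\theta^{(t+1)}$ marked.]
A stationary point is attained when $\theta$ and $\mu$ are dually coupled, w.r.t. both $A$ and $A_x$:
$$\mu = \nabla A(\theta) = \nabla A_x(\theta)$$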
18
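Info. Geo. Interpretation
• A parameter $\theta$ indicates a conditional distribution over $z$: $p(z \mid x; \theta)$.
• A mean $\mu$ is realized by another conditional distribution $q_\mu(z) = p(z \mid x; \theta')$ with $\nabla A_x(\theta') = \mu$.
• The KL divergence between them:
$$KL\big( q_\mu \,\|\, p(\cdot \mid x; \theta) \big) = A_x(\theta) + A_x^*(\mu) - \langle \theta, \mu \rangle$$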
19
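Info. Geo. Interpretation
• For any $\theta$ and $\mu$, with $q_\mu$ the conditional distribution over $z$ that realizes the mean $\mu$:
$$\log L(\theta \mid x) - Q(\theta; \mu) = KL\big( q_\mu \,\|\, p(\cdot \mid x; \theta) \big)$$
• E-step: minimize $KL( q_\mu \,\|\, p(\cdot \mid x; \theta) )$ over $\mu$ to close the gap between $Q(\theta; \mu)$ and $\log L(\theta \mid x)$.
• M-step: M-projection of $q_\mu$ onto the model family $\{ p(\cdot \mid \theta) \}$.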
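The claim that the gap between the log-likelihood and the lower bound is a KL divergence can be checked numerically on a toy discrete model. The sketch below is written in terms of distributions rather than mean parameters, which is an assumption made for illustration: the bound is taken as $E_q[\log p(x, z)] + H(q)$, and the joint values in `joint` are hypothetical.

```python
import math

def lower_bound(q, joint):
    """E_q[log p(x, z)] + H(q) for a discrete latent variable z."""
    return sum(qz * (math.log(pz) - math.log(qz))
               for qz, pz in zip(q, joint) if qz > 0)

def kl(q, p):
    """KL(q || p) between two discrete distributions on the same support."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)

joint = [0.10, 0.05, 0.25]             # hypothetical p(x, z) for z = 0, 1, 2 at the observed x
evidence = sum(joint)                  # p(x), marginalizing out z
posterior = [pz / evidence for pz in joint]
log_lik = math.log(evidence)

q = [0.5, 0.3, 0.2]                    # an arbitrary distribution over z
gap = log_lik - lower_bound(q, joint)  # equals KL(q || posterior)
```

Setting `q` to `posterior` closes the gap, which is what the E-step does.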
20
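EM with iid samples
Consider a common problem: $(x_1, z_1), \ldots, (x_n, z_n)$ are generated from an exponential family distribution, and only $x_i$ is observed for each $i$:
$$p(x_i, z_i \mid \theta) = \exp\big( \langle \theta, \phi(x_i, z_i) \rangle - A(\theta) \big)$$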
21
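EM with iid samples (cont'd)
Lower bound $Q$ is:
$$Q(\theta; \mu_1, \ldots, \mu_n) = \sum_{i=1}^{n} \big( \langle \theta, \mu_i \rangle - A_{x_i}^*(\mu_i) \big) - n A(\theta)$$
It has:
$$\nabla_\theta Q = \sum_{i=1}^{n} \mu_i - n \nabla A(\theta)$$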
22
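• E-step: $\mu_i^{(t+1)} = E_{\theta^{(t)}}[\phi(x_i, Z_i) \mid x_i]$ for each $i$
• M-step: $\theta^{(t+1)} = \arg\max_{\theta} \big\langle \theta, \sum_{i} \mu_i^{(t+1)} \big\rangle - n A(\theta)$,
optimum attained when
$$\nabla A(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mu_i^{(t+1)}$$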
23
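EM for GMM
• The conditional expectation is determined by $p(z_i = k \mid x_i; \theta)$.
• E-step computes:
$$r_{ik} = p(z_i = k \mid x_i; \theta) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{l} \pi_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$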
24
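EM for GMM (cont'd)
Given the posterior probabilities $r_{ik} = p(z_i = k \mid x_i; \theta)$, M-step:
$$\pi_k = \frac{1}{n} \sum_{i} r_{ik}, \qquad \mu_k = \frac{\sum_i r_{ik} x_i}{\sum_i r_{ik}}, \qquad \Sigma_k = \frac{\sum_i r_{ik} (x_i - \mu_k)(x_i - \mu_k)^T}{\sum_i r_{ik}}$$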
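The two steps can be put together in a short loop. The following is a minimal sketch for a one-dimensional GMM, not the lecture's own code: the function names, the initial values, the synthetic data, and the variance floor `min_var` (a numerical safeguard, not part of EM itself) are all assumptions.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(xs, pis, means, variances):
    return sum(math.log(sum(p * normal_pdf(x, m, v)
                            for p, m, v in zip(pis, means, variances)))
               for x in xs)

def e_step(xs, pis, means, variances):
    """Posterior probability of each component for each sample."""
    resp = []
    for x in xs:
        w = [p * normal_pdf(x, m, v) for p, m, v in zip(pis, means, variances)]
        total = sum(w)
        resp.append([wk / total for wk in w])
    return resp

def m_step(xs, resp, K, min_var=1e-6):
    """Weighted versions of the complete-data estimates."""
    n = len(xs)
    pis, means, variances = [], [], []
    for k in range(K):
        nk = sum(r[k] for r in resp)                  # expected count of component k
        mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
        pis.append(nk / n)
        means.append(mean)
        variances.append(max(var, min_var))           # floor guards against collapse
    return pis, means, variances

random.seed(0)
xs = ([random.gauss(-2.0, 0.5) for _ in range(100)]
      + [random.gauss(3.0, 1.0) for _ in range(100)])
pis, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
history = []
for _ in range(30):
    resp = e_step(xs, pis, means, variances)
    pis, means, variances = m_step(xs, resp, K=2)
    history.append(log_likelihood(xs, pis, means, variances))
```

The recorded `history` should be non-decreasing, which matches the coordinate-ascent view of EM on the lower bound.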
25
What if it is intractable to compute the expected sufficient statistics $E_{\theta}[\phi(x, Z) \mid x]$?
26
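Variational EM
• Basic idea: Use a distribution $q$ from a tractable family $\mathcal{Q}$ to approximate $p(z \mid x; \theta)$, and thus $E_q[\phi(x, Z)]$ to approximate $E_{\theta}[\phi(x, Z) \mid x]$.
• This is to restrict $\mu$ to $\{ E_q[\phi(x, Z)] : q \in \mathcal{Q} \}$.
• The lower bound becomes:
$$Q(\theta; q) = \langle \theta, E_q[\phi(x, Z)] \rangle + H(q) - A(\theta)$$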
27
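Variational EM (cont'd)
• Variational E-step: with restriction to $\mathcal{Q}$, computing $\mu$ is tractable:
$$q^{(t+1)} = \arg\max_{q \in \mathcal{Q}} \; \langle \theta^{(t)}, E_q[\phi(x, Z)] \rangle + H(q), \qquad \mu^{(t+1)} = E_{q^{(t+1)}}[\phi(x, Z)]$$
• M-step: remains the same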
28
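Variational E-step
• $\mathcal{Q}$ is usually chosen to be an exponential family, parameterized by $\eta$. Then the variational E-step reduces into two steps.
• Step 1: Find optimal $\eta$ through I-projection:
$$\hat{\eta} = \arg\min_{\eta} KL\big( q_\eta \,\|\, p(\cdot \mid x; \theta) \big)$$
• Step 2: Compute $\mu = E_{q_{\hat{\eta}}}[\phi(x, Z)]$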
29
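Key Problem
• With $x$ given, $p(z \mid x; \theta)$ remains an exponential family distribution:
$$p(z \mid x; \theta) = \exp\big( \langle \theta, \phi_x(z) \rangle - A_x(\theta) \big)$$
with $\phi_x(z) = \phi(x, z)$ and $A_x(\theta) = \log \int \exp( \langle \theta, \phi_x(z) \rangle ) \, dz$.
• $E_{\theta}[\phi_x(Z)]$ plays a key role in model estimation.
• Key problem: choose a tractable distribution $q$ from $\mathcal{Q}$ to approximate $p(z \mid x; \theta)$ and compute $E_q[\phi_x(Z)]$.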
30
Mean Field Methods
• Consider an exponential family distribution $p(x \mid \theta)$ for which it is intractable to compute the mean given $\theta$.
• Mean field methods use a distribution $q$ from a tractable family, usually in a product form, to approximate the given distribution $p(x \mid \theta)$, and use $E_q[\phi(x)]$ to approximate $E_{\theta}[\phi(x)]$.
31
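Product Form
• We say a joint distribution over $x = (x_1, \ldots, x_m)$ is of the product form, if its density can be written:
$$q(x) = \prod_{i=1}^{m} q_i(x_i)$$
• An exponential family of product form:
$$q(x \mid \eta) = \prod_{i=1}^{m} \exp\big( \langle \eta_i, \phi_i(x_i) \rangle - A_i(\eta_i) \big)$$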
32
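Product Form (cont'd)
• Log-partition function: $A(\eta) = \sum_{i} A_i(\eta_i)$, the sum of the per-factor log-partition functions
• Expectation: $E_q[\phi_i(x_i)] = \nabla A_i(\eta_i)$
• If each factor is tractable, then the whole distribution is tractable.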
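The additivity of the log-partition function can be verified by brute force on a small case. The sketch below assumes a product of independent Bernoulli factors with natural parameters `etas` (hypothetical values) and compares the factorized formula with an explicit sum over all joint configurations.

```python
import math
from itertools import product

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_partition_factor(eta):
    """Log-partition of one Bernoulli factor: log(1 + exp(eta))."""
    return math.log1p(math.exp(eta))

def log_partition_bruteforce(etas):
    """Log of the sum of exp(sum_i eta_i * x_i) over all binary configurations."""
    total = 0.0
    for x in product([0, 1], repeat=len(etas)):
        total += math.exp(sum(e * xi for e, xi in zip(etas, x)))
    return math.log(total)

def mean_bruteforce(etas, i):
    """E[x_i] under the joint distribution, computed by enumeration."""
    num, den = 0.0, 0.0
    for x in product([0, 1], repeat=len(etas)):
        w = math.exp(sum(e * xi for e, xi in zip(etas, x)))
        num += w * x[i]
        den += w
    return num / den

etas = [0.3, -1.2, 2.0]                                   # hypothetical natural parameters
factored = sum(log_partition_factor(e) for e in etas)     # cost linear in the number of factors
brute = log_partition_bruteforce(etas)                    # cost exponential in the number of factors
```

The enumeration grows as $2^m$ while the factorized form needs only $m$ evaluations, which is the sense in which a product-form distribution is tractable.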
33
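Ising Model (formulation)
$$p(x \mid \theta) = \exp\Big( \sum_{i \in V} \theta_i x_i + \sum_{(i,j) \in E} \theta_{ij} x_i x_j - A(\theta) \Big), \qquad x_i \in \{0, 1\}$$
It is intractable to compute $A(\theta)$ exactly.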
34
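Ising Model (factorized model)
Consider a factorized model over $x_i \in \{0, 1\}$
$$q(x \mid \mu) = \prod_{i \in V} \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}$$
where $\mu_i = E_q[x_i]$. Then
$$E_q[x_i x_j] = \mu_i \mu_j \quad \text{for } i \ne j$$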
35
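Ising Model (approximation)
To find $q$ that approximates $p(x \mid \theta)$, we perform I-projection of $p(x \mid \theta)$ onto the factorized family $\mathcal{Q}$:
$$\hat{q} = \arg\min_{q \in \mathcal{Q}} KL\big( q \,\|\, p(\cdot \mid \theta) \big), \qquad KL\big( q \,\|\, p(\cdot \mid \theta) \big) = A(\theta) - \sum_{i} \theta_i \mu_i - \sum_{(i,j) \in E} \theta_{ij} \mu_i \mu_j - \sum_{i} H(\mu_i)$$
with $\mu_i = E_q[x_i]$ and $H(\mu_i) = -\mu_i \log \mu_i - (1 - \mu_i) \log (1 - \mu_i)$.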
36
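Ising Model (approximation)
The best approximation can be solved iteratively (for $x_i \in \{0, 1\}$ and $\mu_i = E_q[x_i]$):
$$\mu_i \leftarrow \sigma\Big( \theta_i + \sum_{j \in N(i)} \theta_{ij} \mu_j \Big), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}$$
Whereas $q$ is in a product form, the parameters associated with different components are usually coupled in the optimal approximation.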
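The iterative solution can be sketched on a graph small enough to enumerate. The example below assumes binary variables in {0, 1}, a 4-node cycle, and hypothetical parameter values; under those assumptions each coordinate update is a logistic function of the local field, and the resulting bound is compared against the exact log-partition function.

```python
import math
from itertools import product

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def binary_entropy(m):
    if m <= 0.0 or m >= 1.0:
        return 0.0
    return -m * math.log(m) - (1.0 - m) * math.log(1.0 - m)

def exact_log_partition(theta, edges):
    """Brute-force log-partition; feasible only for a handful of nodes."""
    total = 0.0
    for x in product([0, 1], repeat=len(theta)):
        score = sum(t * xi for t, xi in zip(theta, x))
        score += sum(w * x[i] * x[j] for (i, j), w in edges.items())
        total += math.exp(score)
    return math.log(total)

def mean_field_bound(theta, edges, mu):
    """Expected energy under the product distribution plus its entropy."""
    value = sum(t * m for t, m in zip(theta, mu))
    value += sum(w * mu[i] * mu[j] for (i, j), w in edges.items())
    value += sum(binary_entropy(m) for m in mu)
    return value

def naive_mean_field(theta, edges, sweeps=100):
    n = len(theta)
    neighbors = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    mu = [0.5] * n
    bounds = [mean_field_bound(theta, edges, mu)]
    for _ in range(sweeps):
        for i in range(n):
            # Coordinate update: uses the latest means of the neighbors.
            field = theta[i] + sum(w * mu[j] for j, w in neighbors[i])
            mu[i] = sigmoid(field)
        bounds.append(mean_field_bound(theta, edges, mu))
    return mu, bounds

theta = [0.5, -0.3, 0.2, 0.1]                                   # hypothetical node parameters
edges = {(0, 1): 1.0, (1, 2): -0.5, (2, 3): 0.8, (3, 0): 0.4}  # hypothetical edge parameters
mu, bounds = naive_mean_field(theta, edges)
log_z = exact_log_partition(theta, edges)
```

The update for each mean depends on the current means of its neighbors, which is the coupling between components even though the approximating distribution itself factorizes.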
37
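Mean Field Theory
Consider an exponential family:
$$p(x \mid \theta) = \exp\big( \langle \theta, \phi(x) \rangle - A(\theta) \big)$$
and a tractable family $\mathcal{Q}$. Then for any $q \in \mathcal{Q}$:
$$A(\theta) \ge \langle \theta, E_q[\phi(x)] \rangle + H(q)$$
The right-hand side can generally be factorized into simpler forms.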
38
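Mean Field Theory (cont'd)
The difference between $A(\theta)$ and the tractable lower bound is the KL divergence:
$$A(\theta) - \big( \langle \theta, \mu_q \rangle + H(q) \big) = KL\big( q \,\|\, p(\cdot \mid \theta) \big)$$
with $\mu_q = E_q[\phi(x)]$. The optimal $q$ is the I-projection:
$$\hat{q} = \arg\min_{q \in \mathcal{Q}} KL\big( q \,\|\, p(\cdot \mid \theta) \big)$$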
39
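Naive Mean Field
The mean field methods are called naive mean field when $q$ is of product form. Consider:
$$p(x \mid \theta) = \exp\big( \langle \theta, \phi(x) \rangle - A(\theta) \big)$$
and
$$q(x) = \prod_{i=1}^{m} q_i(x_i)$$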
40
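Hence, the negative entropy of $q$ can be factorized:
$$-H(q) = \sum_{i=1}^{m} E_{q_i}[\log q_i(x_i)]$$
The optimal $q$ can be solved by minimizing:
$$\sum_{i=1}^{m} E_{q_i}[\log q_i(x_i)] - \langle \theta, \mu_q \rangle$$
where $\mu_q = E_q[\phi(x)]$.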
41
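Naive Mean Field (Optima)
• This problem can be solved by coordinate descent.
• When the optimum is attained, with $E_{q_{-i}}$ denoting the expectation over all components other than $x_i$:
$$\log q_i(x_i) = E_{q_{-i}}[\langle \theta, \phi(x) \rangle] + \text{const}$$
• Hence, the optimal $q_i$ is given by
$$q_i(x_i) \propto \exp\big( E_{q_{-i}}[\langle \theta, \phi(x) \rangle] \big)$$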
42
Naive Mean Field (Discussion)
• In naive mean field, while $q$ is of a product form, the parameters associated with different components are generally coupled in the optimal approximation.
• The I-projection problem in naive mean field is non-convex in general. In practice, the coordinate descent procedure can be trapped in a local valley.
• Generally, it is unclear how far $q$ is from $p(x \mid \theta)$.
43
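Variational EM (Recap)
• E-step (for each sample $x_i$):
$$q_i = \arg\min_{q \in \mathcal{Q}} KL\big( q \,\|\, p(\cdot \mid x_i; \theta) \big), \qquad \mu_i = E_{q_i}[\phi(x_i, Z)]$$
• M-step: solve
$$\nabla A(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mu_i$$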
44
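Latent Dirichlet Allocation
[Figure: graphical model of LDA; labels: $M$, $N$, $n_d$, $\alpha$, $\theta_d$, $z_{di}$, $w_{di}$, $k$.]
• Variables
• Parameters: $\alpha$, $\beta$
• Observed: $w_{di}$
• Latent: $\theta_d$, $z_{di}$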
45
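Conditional Distribution
Let $w_d = (w_{d1}, \ldots, w_{d n_d})$ and $z_d = (z_{d1}, \ldots, z_{d n_d})$:
$$p(\theta_d, z_d \mid w_d; \alpha, \beta) \propto \prod_{k} \theta_{dk}^{\alpha_k - 1} \prod_{i=1}^{n_d} \theta_{d, z_{di}} \, \beta_{z_{di}, w_{di}}$$
Two latent sufficient statistics: $\log \theta_{dk}$ and $\mathbb{1}(z_{di} = k)$.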
46
Variational Distribution
• $q(\theta_d)$: Dirichlet with parameter $\gamma_d$
• $q(z_{di})$: Categorical with parameter $\rho_{di}$.
47
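Variational E-Steps
• For $q(\theta_d)$, a Dirichlet with parameter $\gamma_d$: $\gamma_{dk} = \alpha_k + \sum_{i=1}^{n_d} \rho_{dik}$
• For $q(z_{di})$, a Categorical with parameter $\rho_{di}$: $\rho_{dik} \propto \beta_{k, w_{di}} \exp\big( \Psi(\gamma_{dk}) \big)$, where $\Psi$ is the digamma function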
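The alternating updates for a single document can be sketched as follows. This is an illustration under the standard variational treatment of LDA, not the lecture's own code: the names `gamma` (Dirichlet parameter of the topic proportions) and `rho` (per-word categorical parameters) are assumed, the topic-word table `beta` is a hypothetical toy, and `digamma` is a small hand-written approximation so the block needs only the standard library.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))
    return result

def variational_e_step(words, alpha, beta, iters=50):
    """Alternate the categorical and Dirichlet updates for one document."""
    K = len(alpha)
    rho = [[1.0 / K] * K for _ in words]
    gamma = [a + len(words) / K for a in alpha]
    for _ in range(iters):
        for i, w in enumerate(words):
            # Categorical update: topic-word probability times exp(digamma(gamma_k)).
            weights = [beta[k][w] * math.exp(digamma(gamma[k])) for k in range(K)]
            total = sum(weights)
            rho[i] = [wk / total for wk in weights]
        # Dirichlet update: prior plus expected topic counts.
        gamma = [alpha[k] + sum(r[k] for r in rho) for k in range(K)]
    return gamma, rho

alpha = [0.5, 0.5]                            # hypothetical Dirichlet prior
beta = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]     # hypothetical topic-word probabilities
words = [0, 0, 0, 2]                          # word indices of one toy document
gamma, rho = variational_e_step(words, alpha, beta)
```

The term `digamma(sum(gamma))` that appears in the expected log topic proportions is omitted because it is constant across topics and cancels in the normalization of `rho`.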
48
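M-Step
$$\beta_{kv} \propto \sum_{d} \sum_{i=1}^{n_d} \rho_{dik} \, \mathbb{1}(w_{di} = v), \qquad \rho_{dik} = q(z_{di} = k)$$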
49