A Gentle Introduction to the EM Algorithm
Ted Pedersen
Department of Computer Science, University of Minnesota Duluth
[email_address]
A unifying methodology
• Dempster, Laird & Rubin (1977) unified many strands of apparently unrelated work under the banner of the EM Algorithm.
• EM had gone incognito for many years: Newcomb (1887), McKendrick (1926), Hartley (1958), Baum et al. (1970).
A general framework for solving many kinds of problems
• Filling in missing data in a sample
• Discovering the values of latent variables
• Estimating the parameters of HMMs
• Estimating the parameters of finite mixtures
• Unsupervised learning of clusters
• …
EM allows us to make MLEs under adverse circumstances
• What are maximum likelihood estimates?
• What are these adverse circumstances?
• How does EM triumph over adversity?
• PANEL: When does it really work?
Maximum Likelihood Estimates
• Parameters describe the characteristics of a population; their values are estimated from samples collected from that population.
• An MLE is the parameter estimate most consistent with the sampled data: it maximizes the likelihood function.
Coin Tossing!
• How likely am I to toss a head? A series of 10 trials/tosses yields (h,t,t,t,h,t,t,h,t,t), so x1 = 3 heads, x2 = 7 tails, n = 10.
• Probability of tossing a head = 3/10. That's an MLE! This estimate is absolutely consistent with the observed data.
• A few underlying details are masked…
Coin tossing unmasked
• Coin tossing is well described by the binomial distribution, since there are n independent trials with two outcomes.
• Given 10 tosses, how likely is 3 heads? (The binomial probability is reconstructed below.)
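The equation on this slide is an image in the transcript; it is the standard binomial probability:

$$
P(X = x_1) = \binom{n}{x_1}\,\theta^{x_1}(1-\theta)^{\,n-x_1},
\qquad
P(X = 3) = \binom{10}{3}\,\theta^{3}(1-\theta)^{7}
$$

where θ is the unknown probability of a head.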
Maximum Likelihood Estimates
• We seek the parameter estimate that maximizes the likelihood function.
• Take the first derivative of the likelihood function with respect to the parameter θ, set it equal to zero, and solve for θ.
• That value maximizes the likelihood function and is the MLE.
Maximizing the likelihood (equation slide; a reconstruction follows)
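A reconstruction of the derivation this slide shows, using the binomial likelihood above (standard calculus, not copied from the image):

$$
\log L(\theta) = x_1 \log\theta + (n - x_1)\log(1-\theta) + \text{const}
$$
$$
\frac{d}{d\theta}\log L(\theta) = \frac{x_1}{\theta} - \frac{n - x_1}{1-\theta} = 0
\;\Longrightarrow\;
\hat{\theta} = \frac{x_1}{n} = \frac{3}{10}
$$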
Multinomial MLE example
• There are n animals classified into one of four possible categories (Rao 1973).
• Category counts are the sufficient statistics for estimating the multinomial parameters.
• The technique for finding MLEs is the same: take the derivative of the likelihood function and set it equal to zero.
Multinomial MLE example (two equation slides; a reconstruction follows)
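These two slides are equation images. The reconstruction below assumes the standard Rao (1973) genetic-linkage data, with which the numbers on the later slides (125, 0.608, 0.627) are consistent: observed counts y = (125, 18, 20, 34), n = 197, and cell probabilities

$$
p(y_1) = \tfrac{1}{2} + \tfrac{\pi}{4}, \qquad
p(y_2) = p(y_3) = \tfrac{1-\pi}{4}, \qquad
p(y_4) = \tfrac{\pi}{4}
$$

so the multinomial likelihood is

$$
L(\pi) \;\propto\;
\left(\tfrac{1}{2}+\tfrac{\pi}{4}\right)^{125}
\left(\tfrac{1-\pi}{4}\right)^{18}
\left(\tfrac{1-\pi}{4}\right)^{20}
\left(\tfrac{\pi}{4}\right)^{34}
$$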
Multinomial MLE runs aground?
• Adversity strikes! The observed data are incomplete: there are really 5 categories.
• y1 is the composite of two categories (x1 + x2): p(y1) = ½ + ¼·π, with p(x1) = ½ and p(x2) = ¼·π.
• How can we make an MLE, since we can't observe the category counts x1 and x2?! Unobserved sufficient statistics!?
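Under the same assumed counts, the complete data replace y1 = 125 with its unobserved split:

$$
x = (x_1,\, x_2,\, 18,\, 20,\, 34), \qquad x_1 + x_2 = 125, \qquad
p(x_1) = \tfrac{1}{2},\;\; p(x_2) = \tfrac{\pi}{4}
$$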
EM triumphs over adversity!
• E-STEP: Find the expected values of the sufficient statistics for the complete data X, given the incomplete data Y and the current parameter estimates.
• M-STEP: Use those sufficient statistics to make an MLE as usual!
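In general form (a standard statement of the algorithm, not on the slide), each iteration maximizes the expected complete-data log-likelihood:

$$
\theta^{(t+1)} = \arg\max_{\theta}\;
\mathbb{E}\!\left[\log L(\theta; X)\,\middle|\, Y,\ \theta^{(t)}\right]
$$

For an exponential-family model like this multinomial, that reduces exactly to the slide's recipe: plug the expected sufficient statistics into the usual MLE formula.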
MLE for complete data (two equation slides; a reconstruction follows)
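Another pair of equation images. Under the assumed counts, the complete-data log-likelihood depends on π only through x2 and the last three cells:

$$
\log L(\pi; x) \;\propto\; (x_2 + 34)\log\pi + (18 + 20)\log(1-\pi)
$$
$$
\Longrightarrow\;
\hat{\pi} = \frac{x_2 + 34}{x_2 + 18 + 20 + 34} = \frac{x_2 + 34}{x_2 + 72}
$$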
E-step
• What are the sufficient statistics? x1 (and with it x2, since x2 = 125 − x1).
• How can their expected values be computed? E[x1 | y1] = n·p(x1), where here n = y1 = 125 and p(x1) is the conditional probability of cell x1 given cell y1.
• The unobserved counts x1 and x2 are the categories of a binomial distribution with a sample size of 125, since p(x1) + p(x2) = p(y1) = ½ + ¼·π.
E-Step
• E[x1 | y1] = n·p(x1), with p(x1) = ½ / (½ + ¼·π)
• E[x2 | y1] = n·p(x2) = 125 − E[x1 | y1], with p(x2) = ¼·π / (½ + ¼·π)
• Iteration 1? Start with π = 0.5 (this is just a random guess…)
E-Step Iteration 1
• E[x1 | y1] = 125 · (½ / (½ + ¼·0.5)) = 100
• E[x2 | y1] = 125 − 100 = 25
• These are the expected values of the sufficient statistics, given the observed data and the current parameter estimate (which was just a guess).
M-Step Iteration 1
• Given the sufficient statistics, make MLEs as usual (the computation, an image on the slide, is reconstructed below).
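Under the assumed counts, the iteration-1 update is

$$
\hat{\pi}^{(1)} = \frac{25 + 34}{25 + 72} = \frac{59}{97} \approx 0.608
$$

which matches the 0.608 plugged into the next E-step.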
E-Step Iteration 2
• E[x1 | y1] = 125 · (½ / (½ + ¼·0.608)) = 95.86
• E[x2 | y1] = 125 − 95.86 = 29.14
• These are the expected values of the sufficient statistics, given the observed data and the current parameter estimate (from iteration 1).
M-Step Iteration 2
• Given the sufficient statistics, make MLEs as usual (again reconstructed below).
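Under the same assumptions, the iteration-2 update is

$$
\hat{\pi}^{(2)} = \frac{29.14 + 34}{29.14 + 72} = \frac{63.14}{101.14} \approx 0.624
$$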
Result?
• Converges in 4 iterations to π = 0.627
• E[x1 | y1] = 95.2, E[x2 | y1] = 29.8
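The whole procedure fits in a few lines. A minimal sketch in Python, assuming the Rao (1973) counts used throughout this reconstruction (the function name and tolerance are my own):

```python
def em_linkage(y1=125, y2=18, y3=20, y4=34, pi=0.5, tol=1e-3, max_iter=100):
    """One EM pass per loop: the E-step splits y1, the M-step re-estimates pi."""
    for i in range(max_iter):
        # E-step: expected split of the composite count y1 into (x1, x2);
        # given y1, x2 is binomial with success probability (pi/4) / (1/2 + pi/4).
        x2 = y1 * (pi / 4) / (0.5 + pi / 4)
        x1 = y1 - x2  # the other expected sufficient statistic
        # M-step: complete-data MLE, (x2 + y4) "successes" out of (x2 + y2 + y3 + y4).
        pi_new = (x2 + y4) / (x2 + y2 + y3 + y4)
        if abs(pi_new - pi) < tol:
            return pi_new, i + 1
        pi = pi_new
    return pi, max_iter

pi_hat, iters = em_linkage()
print(f"pi = {pi_hat:.3f} after {iters} iterations")  # pi = 0.627 after 4 iterations
```

With tol = 1e-3 this stops after 4 iterations at π ≈ 0.627, matching the slide; a tighter tolerance just adds an iteration or two.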
Conclusion
• The distribution must be appropriate to the problem.
• Sufficient statistics should be identifiable and have computable expected values.
• The maximization operation should be possible.
• Initialization should be good or lucky, to avoid saddle points and local maxima.
• Then… it might be safe to proceed…
