Network Intelligence and Analysis Lab
Clustering Methods via the EM Algorithm
2014.07.10
Sanghyuk Chun
Machine Learning and Unsupervised Learning
• Machine Learning
  • Training data
  • Learning model
• Unsupervised Learning
  • Training data without labels
  • Input data: $D = \{x_1, x_2, \ldots, x_N\}$
  • Most unsupervised learning problems try to find hidden structure in unlabeled data
  • Examples: clustering, dimensionality reduction (PCA, LDA), ...
Unsupervised Learning and Clustering
• Clustering
  • Grouping objects in such a way that objects in the same group are more similar to each other than to objects in other groups
  • Input: a set of objects (or data) without group information
  • Output: a cluster index for each object
  • Usage: customer segmentation, image segmentation, ...
• (Figure: input data → clustering algorithm → clustered output)
K-means Clustering
• Introduction
• Optimization
K-means Clustering
• Intuition: data points in the same cluster are closer to each other than to data points in other clusters
• Goal: minimize the distance between data points in the same cluster
• Objective function:
  • $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2$
  • where N is the number of data points and K is the number of clusters
  • $r_{nk} \in \{0, 1\}$ is an indicator variable describing which of the K clusters the data point $\mathbf{x}_n$ is assigned to
  • $\boldsymbol{\mu}_k$ is a prototype associated with the k-th cluster
  • Eventually $\boldsymbol{\mu}_k$ coincides with the center (mean) of cluster k
K-means Clustering – Optimization
• Objective function:
  • $\arg\min_{\{r_{nk}, \boldsymbol{\mu}_k\}} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2$
• This problem can be solved through an iterative procedure
  • Step 1: minimize J with respect to $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed
  • Step 2: minimize J with respect to $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed
  • Repeat Steps 1 and 2 until convergence
• Does it always converge?
Optional – Biconvex Optimization
• Biconvex optimization is a generalization of convex optimization in which the objective function and the constraint set may be biconvex
• $f(x, y)$ is biconvex if, for fixed x, $f_x(y) = f(x, y)$ is convex over Y and, for fixed y, $f_y(x) = f(x, y)$ is convex over X
• One way to solve a biconvex optimization problem is to iteratively solve the corresponding convex subproblems
  • This does not guarantee the global optimum
  • But it always converges to some local optimum
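As a concrete illustration (my addition, not from the slides), $f(x, y) = x^2 y^2$ is biconvex but not jointly convex:

```latex
% For fixed y, f(x, y) = (y^2) x^2 is convex in x; by symmetry it is convex in y for fixed x.
% Joint convexity fails, as the Hessian shows:
\[
  f(x, y) = x^2 y^2, \qquad
  \nabla^2 f(x, y) =
  \begin{pmatrix}
    2y^2 & 4xy \\
    4xy  & 2x^2
  \end{pmatrix},
  \qquad
  \det \nabla^2 f = 4x^2 y^2 - 16x^2 y^2 = -12\, x^2 y^2 \le 0 .
\]
% The determinant is negative whenever xy != 0, so f is not jointly convex,
% even though each partial minimization (the convex subproblem above) is convex.
```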
K-means Clustering – Optimization
• $\arg\min_{\{r_{nk}, \boldsymbol{\mu}_k\}} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2$
• Step 1: minimize J with respect to $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed
  • $r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|\mathbf{x}_n - \boldsymbol{\mu}_j\|^2 \\ 0 & \text{otherwise} \end{cases}$
• Step 2: minimize J with respect to $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed
  • Setting the derivative with respect to $\boldsymbol{\mu}_k$ to zero gives
  • $2 \sum_n r_{nk} (\mathbf{x}_n - \boldsymbol{\mu}_k) = 0$
  • $\boldsymbol{\mu}_k = \dfrac{\sum_n r_{nk} \mathbf{x}_n}{\sum_n r_{nk}}$
  • $\boldsymbol{\mu}_k$ is equal to the mean of all the data assigned to cluster k (a code sketch of this procedure follows below)
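A minimal NumPy sketch of the two-step procedure above (illustrative only; the function name, initialization scheme, and stopping rule are my own assumptions, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Alternate the two minimization steps of the K-means objective J."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    mu = X[rng.choice(N, size=K, replace=False)]   # initialize prototypes from the data

    for _ in range(n_iters):
        # Step 1: assign each point to its nearest prototype (minimize J w.r.t. r_nk)
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (N, K)
        assign = dists.argmin(axis=1)

        # Step 2: move each prototype to the mean of its assigned points (minimize J w.r.t. mu_k)
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):   # converged: the means no longer change
            break
        mu = new_mu
    return assign, mu
```

For example, `assign, mu = kmeans(np.random.randn(500, 2), K=3)` partitions 500 random 2-D points into three clusters.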
K-means Clustering – Conclusion
• Advantages of K-means clustering
  • Easy to implement (kmeans in Matlab, kcluster in Python)
  • In practice, it works well
• Disadvantages of K-means clustering
  • It can converge to a local optimum
  • Computing the Euclidean distance to every point is expensive
    • Solution: batch K-means
  • The Euclidean distance is not robust to outliers
    • Solution: K-medoids algorithms (use a different metric)
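For reference, a short usage sketch with an off-the-shelf implementation (assuming scikit-learn is installed; the slides themselves only mention Matlab's kmeans and Python's kcluster):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.randn(500, 2)  # toy data
# n_init restarts mitigate (but do not remove) the local-optimum issue noted above
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```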
Mixture of Gaussians
• Mixture Model
• EM Algorithm
• EM for Gaussian Mixtures
Mixture of Gaussians
• Assumption: there are k components $\{c_i\}_{i=1}^{k}$
• Component $c_i$ has an associated mean vector $\mu_i$
• Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$
• (Figure: five Gaussian components with means $\mu_1, \ldots, \mu_5$)
Gaussian Mixture Model
• Represent the model as a linear combination of Gaussians
• Probability density function of a GMM:
  • $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$
  • $\mathcal{N}(x \mid \mu_k, \Sigma_k) = \dfrac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\!\left\{ -\tfrac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right\}$
• This model is called a mixture of Gaussians, or Gaussian Mixture Model (GMM)
• Each Gaussian density is called a component of the mixture and has its own mean $\mu_k$ and covariance $\Sigma_k$
• The parameters $\pi_k$ are called mixing coefficients ($\sum_k \pi_k = 1$)
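A small sketch of evaluating this density for a toy two-component mixture (the parameters are my own illustrative choices; assumes SciPy is available):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative two-component mixture in 2D (parameters chosen arbitrarily)
pis   = np.array([0.4, 0.6])                               # mixing coefficients, sum to 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs  = [np.eye(2), np.array([[1.0, 0.3], [0.3, 1.0]])]

def gmm_pdf(x):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=m, cov=S)
               for pi, m, S in zip(pis, means, covs))

print(gmm_pdf(np.array([1.0, 1.0])))
```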
Clustering using a Mixture Model
• $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$, where $\sum_k \pi_k = 1$
• Input:
  • The training set: $\{x_i\}_{i=1}^{N}$
  • Number of clusters: k
• Goal: model this data using a mixture of Gaussians
  • Mixing coefficients $\pi_1, \pi_2, \ldots, \pi_k$
  • Means and covariances: $\mu_1, \mu_2, \ldots, \mu_k; \; \Sigma_1, \Sigma_2, \ldots, \Sigma_k$
Maximum Likelihood of GMM
• $p(x \mid G) = p(x \mid \pi_1, \mu_1, \ldots) = \sum_i p(x \mid c_i)\, p(c_i) = \sum_i \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)$
• $p(x_1, x_2, \ldots, x_N \mid G) = \prod_i p(x_i \mid G)$
• The log-likelihood function is given by
  • $\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$
• Goal: find the parameters that maximize the log-likelihood
• Problem: the maximum likelihood solution is hard to compute directly
• Solution: use the EM algorithm
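A sketch of evaluating this log-likelihood numerically (it reuses hypothetical parameters pis/means/covs as in the earlier density sketch; the use of logsumexp for numerical stability is my addition, not discussed in the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, pis, means, covs):
    """ln p(X | pi, mu, Sigma) = sum_n ln sum_k pi_k * N(x_n | mu_k, Sigma_k)."""
    # log_probs[n, k] = ln pi_k + ln N(x_n | mu_k, Sigma_k)
    log_probs = np.column_stack([
        np.log(pi) + multivariate_normal.logpdf(X, mean=m, cov=S)
        for pi, m, S in zip(pis, means, covs)
    ])
    return logsumexp(log_probs, axis=1).sum()
```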
EM (Expectation-Maximization) Algorithm
• The EM algorithm is an iterative procedure for finding the MLE
  • An expectation (E) step creates a function for the expectation of the log-likelihood, evaluated using the current estimate of the parameters
  • A maximization (M) step computes parameters maximizing the expected log-likelihood found in the E step
  • These parameter estimates are then used to determine the distribution of the latent variables in the next E step
• EM always converges to a local optimum
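Written out formally (a standard formulation added here for reference, where $\theta$ denotes all model parameters and $Z$ the latent variables):

```latex
% E step: form the expected complete-data log-likelihood under the current posterior over Z
\[
  Q(\theta \mid \theta^{\text{old}})
  = \mathbb{E}_{Z \mid X, \theta^{\text{old}}}\!\left[ \ln p(X, Z \mid \theta) \right]
\]
% M step: maximize it with respect to the parameters
\[
  \theta^{\text{new}} = \arg\max_{\theta} \; Q(\theta \mid \theta^{\text{old}})
\]
```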
K-means Revisited: EM and K-means
• $\arg\min_{\{r_{nk}, \boldsymbol{\mu}_k\}} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2$
• E-step: minimize J with respect to $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed
  • $r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|\mathbf{x}_n - \boldsymbol{\mu}_j\|^2 \\ 0 & \text{otherwise} \end{cases}$
• M-step: minimize J with respect to $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed
  • $\boldsymbol{\mu}_k = \dfrac{\sum_n r_{nk} \mathbf{x}_n}{\sum_n r_{nk}}$
Latent Variable for GMM
• Let $z_k$ be a Bernoulli random variable with probability $\pi_k$
  • $p(z_k = 1) = \pi_k$, where $\sum_k z_k = 1$ and $\sum_k \pi_k = 1$
• Because z uses a 1-of-K representation, this distribution can be written in the form
  • $p(z) = \prod_{k=1}^{K} \pi_k^{z_k}$
• Similarly, the conditional distribution of x given a particular value of z is a Gaussian
  • $p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}$
Latent Variable for GMM
• The joint distribution is given by $p(x, z) = p(z)\, p(x \mid z)$
  • $p(x) = \sum_z p(z)\, p(x \mid z) = \sum_k \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$
• Thus the marginal distribution of x is a Gaussian mixture of the above form
• Now we are able to work with the joint distribution instead of the marginal distribution
• Graphical representation of a GMM for a set of N i.i.d. data points $\{x_n\}$ with corresponding latent variables $\{z_n\}$, where n = 1, ..., N
  • (Figure: plate diagram with latent variable $\mathbf{z}_n$ and observation $\mathbf{x}_n$ inside a plate of size N, and parameters $\boldsymbol{\pi}$, $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$ outside)
EM for Gaussian Mixtures (E-step)
• Conditional probability of z given x
• From Bayes' theorem,
  • $\gamma(z_k) \equiv p(z_k = 1 \mid \mathbf{x}) = \dfrac{p(z_k = 1)\, p(\mathbf{x} \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1)\, p(\mathbf{x} \mid z_j = 1)} = \dfrac{\pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$
• $\gamma(z_k)$ can also be viewed as the responsibility that component k takes for 'explaining' the observation x
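A sketch of the E-step as a function, computing responsibilities for all data points at once (the helper name is my own; assumes SciPy):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, means, covs):
    """Return the (N, K) responsibility matrix gamma(z_nk) via Bayes' theorem."""
    weighted = np.column_stack([
        pi * multivariate_normal.pdf(X, mean=m, cov=S)   # pi_k * N(x_n | mu_k, Sigma_k)
        for pi, m, S in zip(pis, means, covs)
    ])
    return weighted / weighted.sum(axis=1, keepdims=True)  # normalize over components
```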
EM for Gaussian Mixtures (M-step)
• Likelihood function for the GMM:
  • $\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$
• Setting the derivative of the log-likelihood with respect to the means $\mu_k$ of the Gaussian components to zero, we obtain
  • $\mu_k = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, \mathbf{x}_n$, where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$
EM for Gaussian Mixtures (M-step)
• Setting the derivative of the log-likelihood with respect to $\Sigma_k$ to zero, we obtain
  • $\boldsymbol{\Sigma}_k = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (\mathbf{x}_n - \mu_k)(\mathbf{x}_n - \mu_k)^\top$
• Maximizing the likelihood with respect to the mixing coefficients $\pi_k$ using a Lagrange multiplier, we obtain
  • $\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$
  • $\pi_k = \dfrac{N_k}{N}$
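A matching sketch of the M-step updates (same assumptions as the E-step sketch above; gamma is the (N, K) responsibility matrix):

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate (pis, means, covs) from the responsibilities gamma."""
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)                     # effective number of points per component
    pis = Nk / N                               # pi_k = N_k / N
    means = (gamma.T @ X) / Nk[:, None]        # mu_k = (1/N_k) sum_n gamma_nk x_n
    covs = []
    for k in range(K):
        diff = X - means[k]                    # (N, d)
        covs.append((gamma[:, k, None] * diff).T @ diff / Nk[k])   # Sigma_k
    return pis, means, covs
```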
EM for Gaussian Mixtures
• $\mu_k$, $\Sigma_k$, and $\pi_k$ do not constitute a closed-form solution for the parameters of the mixture model, because the responsibilities $\gamma(z_{nk})$ depend on those parameters in a complex way
  • $\gamma(z_{nk}) = \dfrac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$
• In the EM algorithm for a GMM, $\gamma(z_{nk})$ and the parameters are optimized iteratively
  • In the E step, the responsibilities (posterior probabilities) are evaluated using the current values of the parameters
  • In the M step, the means, covariances, and mixing coefficients are re-estimated using the E-step results
EM for Gaussian Mixtures
• Initialize the means $\mu_k$, covariances $\Sigma_k$, and mixing coefficients $\pi_k$, and evaluate the initial value of the log-likelihood
• E step: evaluate the responsibilities using the current parameters
  • $\gamma(z_{nk}) = \dfrac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$
• M step: re-estimate the parameters using the current responsibilities
  • $\mu_k^{\text{new}} = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, \mathbf{x}_n$
  • $\boldsymbol{\Sigma}_k^{\text{new}} = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (\mathbf{x}_n - \mu_k^{\text{new}})(\mathbf{x}_n - \mu_k^{\text{new}})^\top$
  • $\pi_k^{\text{new}} = \dfrac{N_k}{N}$, where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$
• Repeat the E step and M step until convergence
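Putting the steps together, a compact end-to-end sketch (initialization from random data points and a fixed iteration count are my own simplifications; a log-likelihood-based stopping rule, as on the slide, could replace them):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """Fit a K-component GMM to X with EM; returns (pis, means, covs, gamma)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pis = np.full(K, 1.0 / K)
    means = X[rng.choice(N, size=K, replace=False)]            # initialize means from the data
    covs = [np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)]  # small jitter keeps covariances invertible

    for _ in range(n_iters):
        # E step: responsibilities gamma(z_nk)
        weighted = np.column_stack([
            pis[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
            for k in range(K)
        ])
        gamma = weighted / weighted.sum(axis=1, keepdims=True)

        # M step: re-estimate parameters from the responsibilities
        Nk = gamma.sum(axis=0)
        pis = Nk / N
        means = (gamma.T @ X) / Nk[:, None]
        covs = []
        for k in range(K):
            diff = X - means[k]
            covs.append((gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d))
    return pis, means, covs, gamma
```

Cluster assignments can then be read off as `gamma.argmax(axis=1)`; in practice one would also monitor the log-likelihood (as in the earlier sketch) to detect convergence.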
Relationship between the K-means Algorithm and GMM
• We can derive the K-means algorithm as a particular limit of EM for the Gaussian Mixture Model
• Consider a Gaussian mixture model whose covariance matrices are given by $\varepsilon I$, where $\varepsilon$ is a variance parameter and $I$ is the identity matrix
• If we consider the limit $\varepsilon \to 0$, the expected complete-data log-likelihood of the GMM becomes
  • $\mathbb{E}_z\!\left[\ln p(X, Z \mid \mu, \Sigma, \pi)\right] \;\to\; -\dfrac{1}{2} \sum_n \sum_k r_{nk} \, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2 + C$
• Thus we see that, in this limit, maximizing the expected complete-data log-likelihood is equivalent to the K-means algorithm
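The key intermediate step (the standard argument, e.g. Bishop, PRML §9.3.2; added here for completeness): with $\Sigma_k = \varepsilon I$, the responsibilities collapse to the hard K-means assignments as $\varepsilon \to 0$.

```latex
\[
  \gamma(z_{nk})
  = \frac{\pi_k \exp\!\left( -\lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 / 2\varepsilon \right)}
         {\sum_{j} \pi_j \exp\!\left( -\lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 / 2\varepsilon \right)}
  \;\xrightarrow[\;\varepsilon \to 0\;]{}\;
  r_{nk} =
  \begin{cases}
    1 & \text{if } k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\
    0 & \text{otherwise,}
  \end{cases}
\]
% because the term with the smallest squared distance decays most slowly and dominates the sum.
```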
