Clustering
Shadi Albarqouni, M.Sc.
Graduate Research Assistant | PhD Candidate
shadi.albarqouni@tum.de
Computer Aided Medical Procedures (CAMP) | TU München (TUM)
Machine Learning in Medical Imaging (MLMI)
Winter Semester 15/16
BioMedical Computing (BMC) Master Program
Outline
1 Introduction
2 Parametric, cost-based clustering
1. K-Means
2. K-Medoids
3. Kernel K-Means
4. Spectral Clustering
5. Extensions
6. Comparison
3 Parametric, model-based clustering
1. Mixture Models
4 Non-parametric, model-based clustering
1. Mean-shift
What is clustering?
Definition (Clustering)
Given n unlabelled data points, separate them into K clusters.
Dilemma! [6]
• What is a Cluster?
(Compact vs. Connected)
• How many clusters K?
(Parametric vs. Non-parametric)
• Soft vs. Hard clustering.
(Model vs. Cost based)
• Data representation.
(Vector vs. Similarities)
• Classification vs. Clustering.
• Stability [7].
Applications
• Image Retrieval
• Image Compression
• Image Segmentation
• Pattern Recognition
Notation
• D = {x_1, x_2, ..., x_n}, x_i \in \mathbb{R}^m, is the data set (arranged as an m \times n matrix).
• m is the feature dimension of x_i.
• n is the number of instances.
• K is the number of clusters.
• \Pi = {C_1, C_2, ..., C_K} is the partition of D into clusters C_k.
• c(x_i) is the label/cluster of instance x_i.
• r_{nk} \in {0, 1} indicates whether instance n is assigned to cluster k.
Objective
Find the partition \Pi minimizing the cost function L(\Pi).
Parametric, cost-based clustering
Parametric: K is defined in advance.
Cost-based: hard clustering driven by a cost function.
Selected Algorithms:
• K-Means [8].
• K-Medoids [11].
• Kernel K-Means [12].
• Spectral Clustering [10].
K-Means
• K-Means algorithm:
1. Initialize: Pick K random samples from the dataset D as the cluster
centroids {\mu_1, \mu_2, ..., \mu_K}.
2. Assign points to the clusters: Partition the data points D into K
clusters \Pi = {C_1, C_2, ..., C_K} based on the Euclidean distance
between the points and the centroids (searching for the closest centroid).
3. Centroid update: Based on the points assigned to each cluster, a
new centroid \mu_k is computed.
4. Repeat: Do steps 2 and 3 until convergence.
5. Convergence: when the cluster centroids barely change, or when we have
compact and/or isolated clusters. Mathematically, when the cost
(distortion) function
L(\Pi) = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|^2
reaches a minimum.
• Practical issues:
a) The initialization. b) Pre-processing.
K-Means – Algorithm
input : Data points D = {x1, x2, ..., xn}, number of clusters K
output: Clusters \Pi = {C_1, C_2, ..., C_K}
Pick K random samples as the cluster centroids µk.
repeat
for i = 1 to n do
c(x_i) = \arg\min_k \|x_i - \mu_k\|_2^2 %Assign points to clusters
end
for k = 1 to K do
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i %Update the cluster centroid
end
until convergence;
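A minimal NumPy sketch of this loop (the function kmeans and its defaults are my own, not from the slides); it mirrors the assignment and update steps above:

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm. X: (n, m) data matrix; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # K random samples as centroids
    for _ in range(n_iter):
        # Assignment step: closest centroid in squared Euclidean distance
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (n, K)
        c = d2.argmin(axis=1)
        # Update step: centroid = mean of the points assigned to it
        new_mu = np.stack([X[c == k].mean(axis=0) if (c == k).any() else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # centroids barely change -> converged
            break
        mu = new_mu
    return c, mu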
K-Medoids (1)
• K-Medoids algorithm:
1. Initialize: Pick K random samples from the dataset D as the medoids
{\mu_1, \mu_2, ..., \mu_K}.
2. Assign points to the clusters: Partition the data points D into K
clusters \Pi = {C_1, C_2, ..., C_K} based on the (Manhattan) dissimilarity
between the points and the medoids (searching for the minimum
dissimilarity).
3. Medoid update: Based on the points assigned to each cluster, swap
the medoid with a new data point and compute the cost (undo the
swap if the cost increases).
4. Repeat: Do steps 2 and 3 until convergence.
5. Convergence: when the cluster medoids barely change. Mathematically,
when the cost (distortion) function
L(\Pi) = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|_1
reaches a minimum.
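A naive PAM-style sketch of the swap loop under these assumptions (Manhattan dissimilarities, greedy swaps; the helper kmedoids is mine, not from the slides):

import numpy as np

def kmedoids(X, K, n_iter=50, seed=0):
    """Naive K-Medoids with greedy swaps. X: (n, m) data matrix."""
    rng = np.random.default_rng(seed)
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(-1)  # Manhattan dissimilarities (n, n)
    med = rng.choice(len(X), size=K, replace=False)    # K random samples as medoids
    for _ in range(n_iter):
        cost = D[:, med].min(axis=1).sum()
        improved = False
        for k in range(K):
            for j in range(len(X)):                    # try swapping medoid k with point j
                cand = med.copy()
                cand[k] = j
                new_cost = D[:, cand].min(axis=1).sum()
                if new_cost < cost:                    # undo the swap unless the cost drops
                    med, cost, improved = cand, new_cost, True
        if not improved:                               # medoids barely change -> stop
            break
    return D[:, med].argmin(axis=1), med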
K-Medoids (2)
Figure: K-Means vs. K-Medoids
Kernel K-Means (1)
Definition
It is a generalization of the
standard K-Means algorithm.
• What happens if the clusters
are not linearly separable?
• Euclidean distance vs. Geodesic
distance.
Figure: Spiral and Jain datasets
Kernel K-Means (2)
• K-Means cannot be applied right away.
• Map the data points x_i \in D to a high-dimensional feature space
M using a nonlinear function \phi(x_i).
• Assume the clusters in the high-dimensional feature space (an RKHS)
are linearly separable; then K-Means can be applied.
• The cost function becomes
L_K(\Pi) = \sum_{k=1}^{K} \sum_{i \in C_k} \|\phi(x_i) - \phi(\mu_k)\|^2,
where
\|\phi(x_i) - \phi(\mu_k)\|^2 = \phi(x_i)^T \phi(x_i) - 2 \phi(x_i)^T \phi(\mu_k) + \phi(\mu_k)^T \phi(\mu_k).
Kernel K-Means (3)
• Using the kernel trick, K_{ij} = \phi(x_i)^T \phi(x_j), the Euclidean distance
in L_K(\Pi) can be computed easily using any kernel function K_{ij},
without explicitly knowing the nonlinear transformation \phi(x_i).
• Examples of (positive semidefinite) kernel functions:
1. Homogeneous polynomial kernel: K_{ij} = (x_i^T x_j)^\delta
2. Inhomogeneous polynomial kernel: K_{ij} = (x_i^T x_j + \gamma)^\delta
3. Gaussian kernel: K_{ij} = e^{-\|x_i - x_j\|^2 / (2\sigma^2)}
4. Laplacian kernel: K_{ij} = e^{-\|x_i - x_j\| / \sigma}
5. Sigmoid kernel: K_{ij} = \tanh(\gamma (x_i^T x_j) + \theta)
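Since \phi is never computed explicitly, the distance to a cluster's feature-space mean expands entirely into entries of the kernel matrix; a sketch of that expansion (the helper kernel_dist2 is hypothetical, not from the slides):

import numpy as np

def kernel_dist2(Kmat, i, members):
    """||phi(x_i) - (1/|C_k|) sum_{j in C_k} phi(x_j)||^2 from the kernel matrix.
    Kmat: (n, n) kernel matrix; members: indices of the points in C_k."""
    m = len(members)
    return (Kmat[i, i]
            - 2.0 * Kmat[i, members].sum() / m
            + Kmat[np.ix_(members, members)].sum() / m ** 2)

Note that the feature-space centroid \phi(\mu_k) is taken here as the mean of \phi over the cluster, which is how kernel K-Means sidesteps ever evaluating \phi.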
Kernel K-Means – Algorithm
input : Data points D = {x1, x2, ..., xn}, Kernel matrix Kij, number
of clusters K
output: Clusters \Pi = {C_1, C_2, ..., C_K}
Pick K random samples as the cluster centroids \mu_k.
repeat
for i = 1 to n do
for k = 1 to K do
Compute \|\phi(x_i) - \phi(\mu_k)\|^2 using K_{ij}
end
c(x_i) = \arg\min_k \|\phi(x_i) - \phi(\mu_k)\|^2
end
for k = 1 to K do
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i %Update the centroid (held implicitly, in feature space)
end
until convergence;
Spectral Clustering
Graph - Overview
• Fully connected, undirected, and weighted graph with N vertices
• The graph is represented by G = {\nu, \varepsilon, \omega}, where \nu is the set of
N vertices, \varepsilon the set of edges, and \omega the set of edge weights,
assigned using a heat kernel to build the adjacency matrix W:
W_{ij} = \begin{cases} e^{-\|x_i - x_j\|_2^2 / \sigma^2} & e_{ij} \in \varepsilon \\ 0 & \text{else} \end{cases}
• The degree matrix D is diagonal with entries D_{ii} = \sum_j W_{ij}
• Compute the normalized graph Laplacian matrix
\tilde{L} = I - D^{-1/2} W D^{-1/2}
Spectral Clustering – Algorithm
input : Normalized graph Laplacian matrix \tilde{L}, number of clusters K
output: Clusters \Pi = {C_1, C_2, ..., C_K}
Compute the first K eigenvectors U = {u_1, u_2, ..., u_K} \in \mathbb{R}^{n \times K} of \tilde{L}.
Compute \tilde{U} by normalizing the rows of U to norm 1.
Run K-Means on \tilde{U} \in \mathbb{R}^{n \times K}, treating the rows as K-dimensional
data points, or simply: D \leftarrow \tilde{U}^T.
Pick K random samples as the cluster centroids µk.
repeat
for i = 1 to n do
c(x_i) = \arg\min_k \|x_i - \mu_k\|_2^2 %Assign points to clusters
end
for k = 1 to K do
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i %Update the cluster centroid
end
until convergence;
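Putting the graph construction and the spectral embedding together, a compact sketch (it reuses the kmeans sketch from above; sigma is a free parameter one has to tune):

import numpy as np

def spectral_clustering(X, K, sigma=1.0):
    """Normalized spectral clustering on a fully connected heat-kernel graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / sigma ** 2)                 # adjacency from the heat kernel
    np.fill_diagonal(W, 0.0)
    Dm12 = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - (Dm12[:, None] * W) * Dm12[None, :]  # I - D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :K]                              # eigenvectors of the K smallest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # normalize rows to norm 1
    return kmeans(U, K)[0]                       # K-Means on the embedded rows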
Extensions (1)
• Alternative cost (distortion) functions follow from the decomposition
\sum_{i=1}^{n} \sum_{j=1}^{n} \|x_i - x_j\|^2 = \underbrace{\sum_{k=1}^{K} \sum_{i,j \in C_k} \|x_i - x_j\|^2}_{\text{intracluster distance}} + \underbrace{\sum_{k=1}^{K} \sum_{i \in C_k} \sum_{j \notin C_k} \|x_i - x_j\|^2}_{\text{intercluster distance}}
Since the left-hand side does not depend on the partition, minimizing the
first term is equivalent to maximizing the second:
1. Intracluster distance:
L(\Pi) = \sum_{k=1}^{K} \sum_{i,j \in C_k} \|x_i - x_j\|^2 + \text{constant}
2. Intercluster distance:
L(\Pi) = -\sum_{k=1}^{K} \sum_{i \in C_k} \sum_{j \notin C_k} \|x_i - x_j\|^2 + \text{constant}
Extensions (2)
3. K-Median:
L(\Pi) = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|
• Alternative Initialization:
1. K-Means++ [1]
2. Global Kernel K-Means [13]
• On selecting K¹:
1. Rule of thumb: K \approx \sqrt{n/2}
2. Elbow Method (see the sketch below)
3. Silhouette
• Soft clustering: Fuzzy C-Means [2]
• Variant: Spectral Clustering [14]
• Hierarchical Clustering
¹https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
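A sketch of the Elbow Method referenced above, again reusing the kmeans sketch: compute the distortion for a range of K and pick the K where the curve stops dropping sharply.

def elbow_curve(X, K_max=10):
    """Distortion L(Pi) for K = 1..K_max; the 'elbow' suggests a good K."""
    curve = []
    for K in range(1, K_max + 1):
        c, mu = kmeans(X, K)
        curve.append(((X - mu[c]) ** 2).sum())
    return curve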
Comparison
Algorithm            Data Rep.    Comp.   Out.   Cent.²
K-Means              Vectors      Low     No     ∉ D
K-Medians            Vectors      High    No     ∉ D
K-Medoids            Similarity   High    Yes    ∈ D
Kernel K-Means       Kernel       High    N/A    ∉ D
Spectral Clustering  Similarity   High    N/A    ∉ D

²Data Rep.: Data Representation; Comp.: Computational cost; Out.:
Handling outliers; Cent.: Centroids.
Parametric, model-based clustering
Parametric: K and the density function are defined (e.g. Gaussian).
Model-based: soft clustering based on the mixture density f(x):
f(x) = \sum_{k=1}^{K} \pi_k f_k(x), \quad \text{s.t. } \pi_k \geq 0, \; \sum_{k=1}^{K} \pi_k = 1,
where f_k(x) is a component of the mixture. f(x) is a Gaussian Mixture
Model (GMM) when f_k(x) = \mathcal{N}(x; \mu_k, \sigma_k^2).
Degree of membership:
\gamma_{ki} = P[x_i \in C_k] = \frac{\pi_k f_k(x_i)}{f(x_i)}
GMM parameters: \theta = \{\pi_{1:K}, \mu_{1:K}, \sigma_{1:K}\}.
Selected algorithm to estimate the parameters:
• EM-Algorithm [5].
Expectation-Maximization (EM) Algorithm
• Given data points D sampled i.i.d. from an unknown distribution f
• We model the distribution using the Maximum Likelihood (ML)
principle (log-likelihood):
l(\theta) = \ln f_\theta(D) = \sum_{i=1}^{n} \ln f_\theta(x_i) = \sum_{i=1}^{n} \ln \sum_{k=1}^{K} \pi_k f_k(x_i)
The objective: \theta_{ML} = \arg\max_\theta l(\theta)
Figure: GMM Clustering
EM – Algorithm
input : data points D, number of clusters K
output: Parameters \theta_{ML} = \{\pi_{1:K}, \mu_{1:K}, \sigma_{1:K}\}
Initialize the parameters θ at random.
repeat
for i = 1 to n do
for k = 1 to K do
\gamma_{ik} = \frac{\pi_k f_k(x_i)}{f(x_i)} %E-Step
end
end
for k = 1 to K do
\pi_k = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ik} %M-Step
\mu_k = \frac{1}{n \pi_k} \sum_{i=1}^{n} \gamma_{ik} x_i
\sigma_k = \frac{1}{n \pi_k} \sum_{i=1}^{n} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T
end
until convergence;
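A sketch of these E- and M-steps for a GMM with full covariances (scipy's multivariate_normal supplies f_k; the initialization and the small regularizer are my own choices):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    """EM for a Gaussian mixture. X: (n, m); returns (pi, mu, Sigma, gamma)."""
    n, m = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, size=K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(m)] * K)
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ik = pi_k f_k(x_i) / f(x_i)
        g = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                      for k in range(K)], axis=1)
        g /= g.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the responsibilities
        Nk = g.sum(axis=0)
        pi = Nk / n
        mu = (g.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - mu[k]
            Sigma[k] = (g[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(m)
    return pi, mu, Sigma, g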
Non-parametric, model-based clustering
Idea: group the points by the peaks of the data density.
Parameters: the shape and the number of clusters K are determined by the
algorithm; however, you must define:
1. the smoothness of the density estimate (h)³
2. what counts as a peak
Selected Algorithm:
• Mean-shift [4].
³Rule of thumb: h = \left( \frac{4 \hat{\sigma}^5}{3n} \right)^{1/5}, where \hat{\sigma} is the standard deviation of the samples.
Mean-shift Algorithm
• Given data points D sampled i.i.d. from an unknown density f
• We estimate the shape of the density using the Kernel Density
Estimation (KDE) principle:
f_h(x) = \frac{1}{n h^m} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right),
where K(\cdot) is a kernel function that must be positive, symmetric,
and differentiable, e.g. the Gaussian kernel
K(z) = \frac{1}{(2\pi)^{m/2}} e^{-\|z\|^2 / 2}
• The objective: find the peaks of f_h(x) by setting \nabla f_h(x) = 0
• For the Gaussian kernel, that yields the fixed-point (mean-shift) update
x = \frac{\sum_{i=1}^{n} x_i K\left( \frac{x - x_i}{h} \right)}{\sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right)}
Figure: Mean-shift Clustering
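A sketch of the mean-shift iteration with a Gaussian kernel (the final mode-grouping step is a simple heuristic of mine, not from the slides):

import numpy as np

def mean_shift(X, h, n_iter=100, tol=1e-5):
    """Shift every point to a density peak, then group points by their peak."""
    modes = X.copy()
    for _ in range(n_iter):
        d2 = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * h ** 2))                # Gaussian kernel weights K((x - x_i)/h)
        new = (w @ X) / w.sum(axis=1, keepdims=True)  # mean-shift update
        if np.abs(new - modes).max() < tol:
            break
        modes = new
    # group points whose modes coincide up to the bandwidth h
    _, labels = np.unique(np.round(modes / h).astype(int), axis=0, return_inverse=True)
    return labels, modes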
Summary
Acknowledgment
This tutorial was prepared with the help of
• Bishop's book [3],
• Meila's slides at MLSS 2011 [9], and
• Lichao's slides from the previous semester (SS15)
References
[1] David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
[2] James C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Springer Science & Business Media, 2013.
[3] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[4] Yizong Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):790–799, 1995.
[5] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.
[6] Anil K. Jain and Martin H. C. Law. Data clustering: A user's dilemma. In Pattern Recognition and Machine Intelligence, pages 1–10. Springer, 2005.
[7] Tilman Lange, Volker Roth, Mikio L. Braun, and Joachim M. Buhmann. Stability-based validation of clustering solutions. Neural Computation, 16(6):1299–1323, 2004.
[8] S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
[9] Marina Meila. Classic and modern data clustering. Machine Learning Summer School (MLSS), 2011.
[10] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849–856, 2002.
[11] Hae-Sang Park and Chi-Hyuck Jun. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36(2):3336–3341, 2009.
[12] Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
[13] Grigorios F. Tzortzis and Aristidis C. Likas. The global kernel k-means algorithm for clustering in feature space. IEEE Transactions on Neural Networks, 20(7):1181–1194, 2009.
[14] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.