Clustering Part 1
Abdul Kawsar Tushar
Nadeem Ahmed
CSE, UAP
What is Clustering
• visualization of data
• hypothesis generation
Overview of Clustering
• Feature Selection
• Feature Extraction
• transformations of the input features to produce
new salient features.
• Inter-pattern Similarity
• Grouping
Formal Definition
• Clustering is the classification of objects into different
groups, or more precisely, the partitioning of a data set into
subsets (clusters), so that the data in each subset (ideally)
share some common trait - often according to some defined
distance measure.
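One common way to write the partitioning idea down is sketched below. The symbols $X$, $C_j$, $k$, and $d$ are notation assumed here for illustration; they do not appear on the slides.

```latex
X = \{x_1, \dots, x_n\}, \qquad
X = \bigcup_{j=1}^{k} C_j, \qquad
C_i \cap C_j = \emptyset \quad (i \neq j), \qquad
C_j \neq \emptyset \quad (j = 1, \dots, k)
```

Under a chosen distance measure $d$, points inside the same $C_j$ should ideally be closer to each other than to points in other clusters.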
Notion of a Cluster can be Ambiguous
How many clusters?
Two Clusters
Four Clusters
Six Clusters
Hierarchical Clustering: Example
Hierarchical Clustering: Example Using Single Linkage
Hierarchical Clustering: Forming Clusters
• Forming clusters from dendrograms
Hierarchical Clustering
• Advantages
• Dendrograms are great for visualization
• Provides hierarchical relations between clusters
• Shown to be able to capture concentric clusters
• Disadvantages
• Not easy to decide at which level of the hierarchy to cut to obtain the clusters
• In reported experiments, other clustering techniques have outperformed hierarchical clustering
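The bottom-up procedure behind a dendrogram can be sketched in a few lines of Python. This is a minimal, unoptimized sketch and not the presenters' implementation: the function names `single_link` and `agglomerative`, the sample points, and the choice to stop once `k` clusters remain (which corresponds to cutting the dendrogram at the level with `k` clusters) are all assumptions made for illustration.

```python
import math

def single_link(a, b):
    """Single-link distance: smallest pairwise distance between two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, k):
    """Repeatedly merge the two closest clusters until only k remain."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > k:
        # Find the pair of clusters (i < j) with the smallest single-link distance.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters

# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(points, 3)
print(result)
```

Recording the order of merges and the distance at which each merge happens would give the information a dendrogram displays; the sketch keeps only the final grouping.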
How to Define Inter-Cluster Similarity
Similarity?
• Single Link
• Complete Link
• Average Link
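The three linkage criteria can be compared side by side with a short sketch. It assumes clusters are lists of coordinate tuples and that Euclidean distance is used between points; the helper names and the two sample clusters are hypothetical. Note that the functions return distances, so a smaller value means the clusters are more similar.

```python
import math
from itertools import product

def pairwise(a, b):
    """All point-to-point Euclidean distances between clusters a and b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def single_link(a, b):
    """Distance between the two closest points, one from each cluster."""
    return min(pairwise(a, b))

def complete_link(a, b):
    """Distance between the two farthest points, one from each cluster."""
    return max(pairwise(a, b))

def average_link(a, b):
    """Mean of all point-to-point distances between the clusters."""
    d = pairwise(a, b)
    return sum(d) / len(d)

# Hypothetical clusters on a line, so the distances are easy to check by hand.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(A, B), complete_link(A, B), average_link(A, B))
```

For these two clusters the pairwise distances are 3, 5, 2, and 4, so the three criteria give 2, 5, and 3.5 respectively.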
Common Similarity Measures
• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.
Common choices include:
1. The Euclidean distance (also called the 2-norm distance), given by:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
2. The Manhattan distance (also called the taxicab norm or 1-norm), given by:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
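The two distance measures named above can be written as short Python functions. This is a sketch using only the standard library; the function names are chosen here for illustration, and the sample points are the two initial centroids from the k-means example that follows.

```python
import math

def euclidean(x, y):
    """2-norm distance: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """1-norm (taxicab) distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

p, q = (1.0, 1.0), (5.0, 7.0)
print(euclidean(p, q))  # sqrt(4^2 + 6^2) = sqrt(52)
print(manhattan(p, q))  # 4 + 6 = 10.0
```

The Euclidean distance is never larger than the Manhattan distance for the same pair of points, which is one reason the choice of measure changes which points end up grouped together.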
A simple example showing the implementation of the k-means algorithm
(using K=2)
Step 1:
Initialization: We randomly choose the following two centroids
(k=2) for the two clusters.
In this case the two centroids are: m1=(1.0,1.0) and
m2=(5.0,7.0).
Step 2:
• Assigning each object to its nearest centroid (by Euclidean
distance), we obtain two clusters containing:
{1,2,3} and {4,5,6,7}.
• Their new centroids are:
Step 3:
• Using these centroids, we now compute the Euclidean
distance of each object to each centroid, as shown in the table.
• Therefore, the new clusters are:
{1,2} and {3,4,5,6,7}
• The next centroids are:
m1=(1.25,1.5) and m2=(3.9,5.1)
Step 4:
• The clusters obtained are:
{1,2} and {3,4,5,6,7}
• Therefore, there is no change in the clusters.
• Thus, the algorithm halts here, and the final result
consists of 2 clusters: {1,2} and {3,4,5,6,7}.
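The four steps above can be replayed with a short batch k-means sketch. One caveat matters here: the data table from the slides is not reproduced in the text, so the seven points below are an assumption. They were chosen because they reproduce the clusters and centroids stated in Steps 2 to 4. Function names are hypothetical as well.

```python
import math

# ASSUMED data: the slide's table is not in the text. These seven objects
# (numbered 1..7) are consistent with the clusters and centroids quoted above.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]

def assign(points, centroids):
    """Index of the nearest centroid for each point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, k):
    """Mean of the points in each cluster (assumes no cluster becomes empty)."""
    centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids

def kmeans(points, centroids):
    """Alternate assignment and update until the assignment stops changing."""
    labels = None
    while True:
        new_labels = assign(points, centroids)
        if new_labels == labels:      # no change in the clusters: halt
            return labels, centroids
        labels = new_labels
        centroids = update(points, labels, len(centroids))

labels, centroids = kmeans(points, [(1.0, 1.0), (5.0, 7.0)])
clusters = [[i + 1 for i, label in enumerate(labels) if label == j] for j in range(2)]
print(clusters)
print(centroids)
```

With the assumed data, object 3 is exactly equidistant from both initial centroids, so the tie-breaking rule (lower index wins) is what places it in the first cluster during Step 2.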
PLOT
(with K=3)
Step 1 Step 2
PLOT
Two different K-means Clustering
[Figure: three scatter plots (axes x and y) titled Original Points, Sub-optimal Clustering, and Optimal Clustering]
Importance of Choosing Initial Centroids
[Figure: six scatter plots (axes x and y) showing the clustering at Iteration 1 through Iteration 6]
Importance of Choosing Initial Centroids
[Figure: five scatter plots (axes x and y) showing the clustering at Iteration 1 through Iteration 5]
Clustering Non-clustered Data
Getting Stuck In A Local Minimum
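A small synthetic case makes the local-minimum problem concrete. The sketch below is an illustration under stated assumptions: the four points, both sets of starting centroids, and the helper names `kmeans` and `sse` are hypothetical and not taken from the slides. It runs the same batch k-means from two starting positions and compares the sum of squared errors (SSE) of the results.

```python
import math

def kmeans(points, centroids):
    """Batch k-means from the given starting centroids (assumes no empty cluster)."""
    labels = None
    while True:
        new_labels = [min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            return labels, centroids
        labels = new_labels
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        centroids = new_centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

# Four corners of a wide rectangle: the natural split is left pair vs right pair.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

bad_labels, bad_centroids = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])
good_labels, good_centroids = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])

bad_sse = sse(points, bad_labels, bad_centroids)
good_sse = sse(points, good_labels, good_centroids)
print(bad_labels, bad_sse)
print(good_labels, good_sse)
```

The first start converges to a bottom-row versus top-row split and stays there, because no single reassignment lowers the error. A common remedy is to run k-means from several starting positions and keep the result with the lowest SSE.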
Can k-means Handle Non-spherical Clusters?
…Maybe not.
Let’s Try Single Linkage Hierarchical Clustering
K-means with Polar Coordinates
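The idea behind this slide can be sketched with two concentric rings. In Cartesian coordinates both rings share the same centre, so centroid-based clustering cannot separate them; after converting to polar coordinates, the radius alone does. Everything in the sketch is assumed for illustration: the synthetic rings, the helper names, and the decision to cluster on the radius only and ignore the angle.

```python
import math

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius around the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

# Synthetic data: an inner ring (radius 1) followed by an outer ring (radius 5).
points = ring(1.0, 12) + ring(5.0, 12)

# Polar transform: keep only the radial coordinate r = sqrt(x^2 + y^2).
radii = [math.hypot(x, y) for x, y in points]

def kmeans_1d(values, c0, c1):
    """Two-cluster k-means on scalar values, starting from centroids c0 and c1."""
    while True:
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        group0 = [v for v, label in zip(values, labels) if label == 0]
        group1 = [v for v, label in zip(values, labels) if label == 1]
        n0, n1 = sum(group0) / len(group0), sum(group1) / len(group1)
        if (n0, n1) == (c0, c1):   # centroids unchanged: converged
            return labels
        c0, c1 = n0, n1

# Start from the smallest and largest radius so that neither cluster is empty.
labels = kmeans_1d(radii, min(radii), max(radii))
print(labels)
```

This works because the transformation turns a ring-shaped cluster into a compact one along the radius axis, which is the shape k-means handles well.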