Clustering
- Dr. Sifat Momen (SfM1)
Lecture 9
11/30/2024 Slides by Dr. Sifat Momen
Learning goals
• After this presentation, you should be able to
• Understand what clustering is
• Understand the different types of clustering
• Apply the k-means clustering algorithm
• Understand the notion of silhouette clustering
What is Clustering?
• Organizing data into classes such that there is
• high intra-class similarity
• low inter-class similarity
• Finding the class labels and the number of classes directly
from the data (in contrast to classification).
• More informally, finding natural groupings among objects.
Also called unsupervised learning; sometimes called
classification by statisticians, sorting by
psychologists, and segmentation by people in marketing.
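One way to make "high intra-class similarity, low inter-class similarity" concrete is to compare average distances within groups and between groups. The sketch below is only an illustration: it assumes 2-D points, uses Euclidean distance as the (dis)similarity measure, and runs on made-up toy data. None of the names or values come from the slides.

```python
import math
from itertools import combinations, product

def mean_intra_distance(cluster):
    """Average Euclidean distance between all pairs inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(cluster_a, cluster_b):
    """Average Euclidean distance between points of two different clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical toy data: two well-separated groups.
group_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
group_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

intra_a = mean_intra_distance(group_a)
intra_b = mean_intra_distance(group_b)
inter_ab = mean_inter_distance(group_a, group_b)

# A good clustering has small intra-cluster and large inter-cluster distances.
print(f"intra A = {intra_a:.2f}, intra B = {intra_b:.2f}, inter = {inter_ab:.2f}")
```

Small distance corresponds to high similarity here, so a good grouping shows intra-cluster values well below the inter-cluster value.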
What is a natural grouping among these objects?
Clustering is subjective
[Figure: the same objects grouped two ways, as Simpson's Family vs. School Employees and as Females vs. Males]
What is Similarity?
The quality or state of being similar; likeness; resemblance; as, a similarity of features. (Webster's Dictionary)
Similarity is hard to define, but… “We know it when we see it”
The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.
Two Types of Clustering
• Partitional algorithms: Construct various partitions and then
evaluate them by some criterion (we will see an example called BIRCH)
• Hierarchical algorithms: Create a hierarchical decomposition of
the set of objects using some criterion
[Figure: two panels, Hierarchical and Partitional]
Partitional Clustering
• Nonhierarchical: each instance is placed in
exactly one of K nonoverlapping clusters.
• Since only one set of clusters is output, the user
normally has to input the desired number of
clusters K.
Unlabeled Dataset
After clustering (Decision Boundaries)
[Also called a Voronoi diagram or Voronoi
tessellation]
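The decision boundaries of a partitional clustering follow from a nearest-center rule: each Voronoi cell is the region that lies closer to one center than to any other. The sketch below assumes Euclidean distance and three hypothetical centers; the function name and the grid are illustrative only.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Hypothetical cluster centers; each one owns a Voronoi cell.
centers = [(1.0, 1.0), (4.0, 1.0), (2.5, 4.0)]

# Label a coarse grid of the plane. The printed label changes exactly
# where a decision boundary (Voronoi edge) is crossed.
for y in range(5, -1, -1):
    print(" ".join(str(nearest_center((x, y), centers)) for x in range(6)))
```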
Squared Error
[Figure: objective function illustrated on a scatter plot, both axes 1 to 10]
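The squared-error objective is commonly written as follows. The notation is an assumption and is not taken from the slide: $C_j$ is the set of objects assigned to cluster $j$, $\mu_j$ is its center, and $k$ is the number of clusters.

```latex
\[
SE = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
\]
```

Lower values mean the objects sit closer to their cluster centers.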
Algorithm k-means
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if
necessary).
3. Decide the class memberships of the N objects by
assigning them to the nearest cluster center.
4. Re-estimate the k cluster centers, by assuming the
memberships found above are correct.
5. If none of the N objects changed membership in
the last iteration, exit. Otherwise, go to step 3.
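The five steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, picks the initial centers at random from the data, and adds two hypothetical parameters (`seed`, `max_iter`) that the slide does not mention. The toy data are made up.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length tuples.

    Returns (centers, labels). `seed` and `max_iter` are illustrative
    parameters added for this sketch.
    """
    rng = random.Random(seed)
    # Step 2: initialize the k centers by picking k distinct objects at random.
    centers = rng.sample(points, k)
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Step 3: assign each object to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 5: stop when no object changed membership.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: re-estimate each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster became empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Step 1: decide on k. Hypothetical data with two visible groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

The membership check is placed right after the assignment step, which is equivalent to testing at the end of the previous iteration as the slide describes. The result depends on the random initialization, so different seeds may give different clusterings.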
K-means Clustering: Steps 1 to 5
Algorithm: k-means, Distance Metric: Euclidean Distance
[Figures: five scatter plots, one per step, with both axes running from 0 to 5 and cluster centers k1, k2, k3. The Step 5 plot labels its axes "expression in condition 1" and "expression in condition 2".]
Comments on the K-Means Method
• Strength
• Relatively efficient: O(tkn), where n is # objects, k is # clusters,
and t is # iterations. Normally, k, t << n.
• Comment
• Often terminates at a local optimum. The global optimum may
be found using techniques such as deterministic annealing and
genetic algorithms.
• Weakness
• Applicable only when a mean is defined; what about
categorical data?
• Need to specify k, the number of clusters, in advance
• Sensitive to noisy data and outliers
• Not suitable for discovering clusters with non-convex shapes
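The weakness concerning noisy data and outliers comes from the use of the mean as the cluster center. The short sketch below uses made-up 1-D values to show how a single outlier drags the mean; the median is shown only for comparison and is not part of k-means.

```python
from statistics import mean, median

# Hypothetical 1-D cluster, then the same cluster with one noisy value added.
cluster = [2.0, 2.1, 1.9, 2.2, 1.8]
with_outlier = cluster + [50.0]

# The mean is dragged toward the outlier.
print(mean(cluster), mean(with_outlier))
# The median barely moves.
print(median(cluster), median(with_outlier))
```

Because k-means recomputes every center as a mean, one extreme object can shift a center and change the memberships of many other objects.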
How can we tell the right number of clusters?
In general, this is an unsolved problem. However, there are many approximate methods. In the
next few slides we will see an example.
For our example, we will use the familiar katydid/grasshopper dataset. However, in this case we are imagining that we do NOT know the class labels. We are only clustering on the X and Y axis values.
[Figure: scatter plot of the dataset, both axes 1 to 10]
When k = 1, the objective function is 873.0
When k = 2, the objective function is 173.1
When k = 3, the objective function is 133.6
We can plot the objective function values for k equals 1 to 6…
[Figure: objective function (vertical axis, 0 to 1,000) plotted against k (horizontal axis, 1 to 6)]
The abrupt change at k = 2 is highly suggestive of two clusters
in the data. This technique for determining the number of
clusters is known as “knee finding” or “elbow finding”.
Note that the results are not always as clear-cut as in this toy example.
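The size of the "knee" can be checked with simple arithmetic on the three objective values quoted on the slides. The sketch below computes the relative drop at each step; picking the k with the largest relative drop is a crude heuristic assumed here for illustration, not a method stated in the slides.

```python
# Objective-function values quoted on the slides for k = 1, 2, 3.
objective = {1: 873.0, 2: 173.1, 3: 133.6}

def relative_drops(values):
    """Fractional decrease in the objective when going from k-1 to k."""
    ks = sorted(values)
    return {k: (values[prev] - values[k]) / values[prev]
            for prev, k in zip(ks, ks[1:])}

drops = relative_drops(objective)
for k, drop in drops.items():
    print(f"k = {k}: objective falls by {drop:.1%}")

# Heuristic (an assumption of this sketch): take the k with the largest relative drop.
elbow_k = max(drops, key=drops.get)
print("suggested k:", elbow_k)
```

Going from k = 1 to k = 2 removes about 80% of the objective, while going from k = 2 to k = 3 removes only about 23% of what is left, which matches the visual impression of an elbow at k = 2.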
Lecture 9 -Clustering(ML algorithms: Clustering, KNN, DBScan).pptx

