K-MEANS CLUSTERING
UNSUPERVISED MACHINE LEARNING
• A type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.
• The model learns through observation and finds structure in the data.
• Once the model is given a dataset, it automatically finds patterns and relationships in the data by creating clusters in it.
CLUSTERING
• Clustering means grouping objects based on the information found in the data describing the objects.
• Objects in one group should be similar to each other but different from objects in another group.
• Clustering finds structure in a collection of unlabeled data.
Organizing data into clusters aims at:
 High intra-class similarity.
 Low inter-class similarity.
 Finding the natural grouping among objects.
(The short sketch below illustrates the first two criteria.)
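As a minimal sketch of these two criteria (assuming NumPy is available; the point values are made up for illustration), the snippet below computes an average intra-cluster distance and an average inter-cluster distance for two small groups of points. A good clustering keeps the first number small relative to the second.

```python
import numpy as np

# Two hypothetical groups of 2-D points (illustrative values only)
cluster_a = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]])
cluster_b = np.array([[5.0, 7.0], [5.3, 6.8], [4.9, 7.2]])

def mean_pairwise_distance(p, q):
    # Average Euclidean distance between every point in p and every point in q
    return float(np.mean([np.linalg.norm(a - b) for a in p for b in q]))

print("average intra-cluster distance:", round(mean_pairwise_distance(cluster_a, cluster_a), 2))
print("average inter-cluster distance:", round(mean_pairwise_distance(cluster_a, cluster_b), 2))
```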
APPLICATIONS
• Information retrieval: document clustering.
• Land use: identification of areas of similar land use in an earth-observation database.
• Marketing: help marketers discover distinct groups in their customer bases, then use this knowledge to develop targeted marketing programs.
• City planning: identifying groups of houses according to their house type, value, and geographical location.
• Climate: understanding Earth's climate by finding patterns in atmospheric and ocean data.
• Economic science: market research.
K-MEANS : A CENTROID-BASED TECHNIQUE
Centroid - can be defined in various ways, such as the mean or the medoid of the objects assigned to the cluster.
Dataset - the data set D contains n objects in Euclidean space. Partitioning methods distribute the objects in D into k clusters, C1, ..., Ck.
Euclidean distance - used to measure the distance between an object p and the centroid ci of cluster Ci:
 dist(p, ci) = sqrt( Σj (pj − cij)² ), summed over the feature dimensions j.
Quality of cluster Ci - can be measured by the within-cluster variation, the sum of squared errors between all objects in Ci and the centroid ci:
 E = Σ(i = 1..k) Σ(p ∈ Ci) dist(p, ci)².
A centroid-based partitioning technique uses the centroid (center point) ci of a cluster Ci to represent that cluster.
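A small sketch of this quality measure (assuming NumPy; the cluster points are illustrative), computing the within-cluster sum of squared errors around the cluster mean:

```python
import numpy as np

def within_cluster_sse(points, centroid):
    """Sum of squared Euclidean distances from each object in a cluster to its centroid."""
    points = np.asarray(points, dtype=float)
    centroid = np.asarray(centroid, dtype=float)
    return float(np.sum((points - centroid) ** 2))

# Illustrative cluster: the SSE of these three points around their mean
cluster = [[2, 3], [1, 4], [2, 4]]
centroid = np.mean(cluster, axis=0)
print(within_cluster_sse(cluster, centroid))
```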
CLASSIFICATION OF CLUSTERING
DISSIMILARITY MEASURES
EUCLIDEAN AND MANHATTAN
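A brief sketch of the two dissimilarity measures named above, using only the Python standard library:

```python
import math

def euclidean_distance(p, q):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan_distance(p, q):
    # Sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (2, 3), (5, 6)
print(euclidean_distance(p, q))  # ~4.24
print(manhattan_distance(p, q))  # 6
```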
K MEANS CLUSTERING
PROCESS FLOW FOR K MEANS
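A minimal sketch of the k-means loop that this process flow describes (assuming NumPy; the function and variable names are mine, not from the slides). Run on the data of Example 2 below with initial centroids (2, 3) and (5, 6), this loop converges to the centroids (2, 3.75) and (6.75, 6.5) reported there.

```python
import numpy as np

def kmeans(X, k, initial_centroids, max_iter=100):
    """Plain k-means loop: assign each point to its nearest centroid,
    recompute the centroids as cluster means, repeat until assignments stop changing."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(initial_centroids, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = distances.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no reassignment -> converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return centroids, labels
```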
A SIMPLE EXAMPLE SHOWING THE
IMPLEMENTATION OF K-MEANS
ALGORITHM (USING K=2)
Step 1:
Initialization: we randomly choose the following two centroids (k = 2) for the two clusters.
In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).
K-MEANS CLUSTERING (figures showing the successive iterations of the example)
Step 2:
Thus, we obtain two clusters containing {1,2,3} and {4,5,6,7}. Their new centroids are recomputed as the means of the members of each cluster.
Step 3:
Using these centroids, we compute the Euclidean distance of each object, as shown in the table.
Therefore, the new clusters are {1,2} and {3,4,5,6,7}, and the next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).
Step 4:
The clusters obtained are again {1,2} and {3,4,5,6,7}, so there is no change in the clusters. Thus, the algorithm halts, and the final result consists of the 2 clusters {1,2} and {3,4,5,6,7}.
K-MEANS IMPLEMENTATION (Example 2)
Apply the k-means algorithm to group the following samples into two clusters.
Sample | Feature 1 (x) | Feature 2 (y)
1 | 2 | 3
2 | 5 | 6
3 | 8 | 7
4 | 1 | 4
5 | 2 | 4
6 | 6 | 7
7 | 3 | 4
8 | 8 | 6
STEPS 2-3: compute the squared Euclidean distance from each sample to the initial centroids C1 = (2, 3) and C2 = (5, 6), and assign each sample to the nearer centroid.
x | y | C1: (x−2)² + (y−3)² | C2: (x−5)² + (y−6)² | Cluster assignment
2 | 3 | 0 | 18 | C1
5 | 6 | 18 | 0 | C2
8 | 7 | 52 | 10 | C2
1 | 4 | 2 | 20 | C1
2 | 4 | 1 | 13 | C1
6 | 7 | 32 | 2 | C2
3 | 4 | 2 | 8 | C1
8 | 6 | 45 | 9 | C2
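The assignment step in this table can be reproduced with a few lines of NumPy (a sketch; the variable names are mine):

```python
import numpy as np

samples = np.array([[2, 3], [5, 6], [8, 7], [1, 4], [2, 4], [6, 7], [3, 4], [8, 6]])
centroids = np.array([[2, 3], [5, 6]])  # initial centroids C1 and C2

# Squared Euclidean distance from every sample to each centroid
d_sq = ((samples[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
assignment = d_sq.argmin(axis=1)  # 0 -> C1, 1 -> C2

for (x, y), (d1, d2), a in zip(samples, d_sq, assignment):
    print(f"({x},{y}): d(C1)={d1}  d(C2)={d2}  -> C{a + 1}")
```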
Step 4: Calculate the mean of each cluster and place a new centroid there.
From step 3, the newly assigned cluster members are:
 C1 = (2,3), (1,4), (2,4) and (3,4)
 C2 = (5,6), (8,7), (6,7) and (8,6)
Calculate the mean: mean(x, y) = ( (x1+x2+x3+x4)/4 , (y1+y2+y3+y4)/4 )
Mean for cluster 1 = ( (2+1+2+3)/4 , (3+4+4+4)/4 ) = (8/4, 15/4) = (2, 3.75)
Mean for cluster 2 = ( (5+8+6+8)/4 , (6+7+7+6)/4 ) = (27/4, 26/4) = (6.75, 6.5)
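The same centroid update, written as a short NumPy check:

```python
import numpy as np

c1_members = np.array([[2, 3], [1, 4], [2, 4], [3, 4]])
c2_members = np.array([[5, 6], [8, 7], [6, 7], [8, 6]])

# New centroid = mean of the points assigned to the cluster
print(c1_members.mean(axis=0))  # [2.   3.75]
print(c2_members.mean(axis=0))  # [6.75 6.5 ]
```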
Step 5: Calculate the new centroids and repeat step 3: reassign each data point to the closest of the new centroids.
x | y | C1: (x−2)² + (y−3.75)² | C2: (x−6.75)² + (y−6.5)² | Cluster assignment
2 | 3 | 0.56 | 34.81 | C1
5 | 6 | 14.06 | 3.31 | C2
8 | 7 | 46.56 | 1.81 | C2
1 | 4 | 1.06 | 39.31 | C1
2 | 4 | 0.06 | 28.81 | C1
6 | 7 | 26.56 | 0.81 | C2
3 | 4 | 1.06 | 20.31 | C1
8 | 6 | 41.06 | 1.81 | C2
Calculate the mean for the new clusters:
mean(x, y) = ( (x1+x2+x3+x4)/4 , (y1+y2+y3+y4)/4 )
Mean for cluster 1 = (2, 3.75)
Mean for cluster 2 = (6.75, 6.5)
Step 6: If any reassignment occurred, go back to step 4; otherwise, FINISH.
Here the assignments and the means (c1 = (2, 3.75), c2 = (6.75, 6.5)) are unchanged, so no further iteration is needed.
Step 7: The model is ready.
How to choose the optimal "K" value in k-means clustering?
The Elbow method:
 One of the most popular ways to find the optimal number of clusters.
 Uses the WCSS value.
 WCSS (Within-Cluster Sum of Squares) measures the total variation within the clusters.
 WCSS formula (for 3 clusters):
 WCSS = Σ(p ∈ Cluster1) dist(p, c1)² + Σ(p ∈ Cluster2) dist(p, c2)² + Σ(p ∈ Cluster3) dist(p, c3)²
It is the sum of the squared distances between each data point and its centroid within cluster 1, plus the corresponding terms for the other two clusters. (Any distance measure, such as the Euclidean or Manhattan distance, can be used.)
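A sketch of the WCSS computation for an arbitrary number of clusters (assuming NumPy; the function name is mine):

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-Cluster Sum of Squares: for every cluster, add up the squared
    Euclidean distances between its points and its centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j, c in enumerate(np.asarray(centroids, dtype=float)):
        members = X[labels == j]
        total += float(np.sum((members - c) ** 2))
    return total
```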
ELBOW METHOD
Run K-means clustering on the given dataset for different K values (for example, K ranging from 1 to 10).
For each value of K, calculate the WCSS value.
Plot a curve of the calculated WCSS values against the number of clusters K.
The sharp point of bend, where the plot looks like an arm, is taken as the best value of K.
Because the graph shows a sharp bend that looks like an elbow, the technique is known as the elbow method.
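A sketch of the elbow procedure using scikit-learn and matplotlib (both assumed to be installed); KMeans exposes the WCSS of a fitted model as its inertia_ attribute. The data reuses the eight samples from Example 2, so K only ranges up to 8 here:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.array([[2, 3], [5, 6], [8, 7], [1, 4], [2, 4], [6, 7], [3, 4], [8, 6]])

k_values = range(1, 9)   # K from 1 up to the number of samples
wcss_values = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss_values.append(km.inertia_)  # inertia_ is the within-cluster sum of squares

plt.plot(list(k_values), wcss_values, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.title("Elbow method")
plt.show()
```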