Clustering
• Clustering: the process of partitioning a set of
data objects into subsets (clusters) such that
objects in a cluster are similar to one another, yet
dissimilar to objects in other clusters.
• Considered unsupervised learning: there are no
predefined classes (learning by observation
vs. learning by examples).
• A descriptive data mining task.
Clustering
OBJECT  X  Y
A       1  1
B       2  1
C       4  3
D       5  4
Types of Clustering:
• Partitioning approach: construct
various partitions and then evaluate
them by some criterion (e.g.
minimizing the sum of squared errors).
• Hierarchical approach: create a
hierarchical decomposition of the set
of data using some criterion.
Partitioning approach
• Partitioning methods: partition a
dataset D of n objects into a set of k
clusters.
• A centroid-based partitioning
technique uses the centroid of a
cluster, Ci, to represent that cluster.
• The centroid can be defined in
various ways, such as by the mean
or the medoid of the objects (or points)
assigned to the cluster.
What is K-Means Clustering?
• It is an algorithm that groups objects,
based on their attributes/features, into K
groups.
• K is a positive integer.
• The cluster representative can be:
  - mean / centroid (the average of the data
    points in the cluster)
  - medoid (an actual data point chosen to represent
    the cluster; in these slides, the point closest to the mean)
Distance Function
• Euclidean distance
• Manhattan distance
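
The two distance functions named above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function names `euclidean` and `manhattan` are chosen here, and points are assumed to be tuples of equal length. Euclidean distance is the square root of the summed squared coordinate differences; Manhattan distance is the sum of the absolute coordinate differences.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((4, 3), (1, 1)))  # sqrt(3^2 + 2^2) = sqrt(13)
print(manhattan((4, 3), (1, 1)))  # 3 + 2 = 5
```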
Ex:
• Find the centroid and the medoid of a cluster containing three
two-dimensional points: (1,1), (2,3) and (6,2).
Centroid (mean) = ((1+2+6)/3, (1+3+2)/3) = (3,2)
To find the medoid, find the data point closest
to the mean (Manhattan distance is used here):
- For (1,1): |3-1|+|2-1| = 3
- For (2,3): |3-2|+|2-3| = 2 (closest point)
- For (6,2): |3-6|+|2-2| = 3
Medoid = (2,3)
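
The same calculation can be checked with a short Python sketch. The helper names (`centroid`, `manhattan`, `medoid_near_mean`) are illustrative, and the medoid is picked with the shortcut used in this example: the data point closest to the mean. The more common definition, the point with the smallest total dissimilarity to the other cluster members, is not what this sketch implements.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def medoid_near_mean(points):
    """Data point closest to the mean (the shortcut used in this example)."""
    c = centroid(points)
    return min(points, key=lambda p: manhattan(p, c))

cluster = [(1, 1), (2, 3), (6, 2)]
print(centroid(cluster))          # (3.0, 2.0)
print(medoid_near_mean(cluster))  # (2, 3)
```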
Partitioning approach
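• The grouping is done by minimizing
the sum of squared distances
between the data objects and the
corresponding cluster centroids.
• The quality of a clustering can be
measured by the within-cluster
variation, which is the sum of
squared errors between all objects in
each cluster Ci and its centroid ci, defined as
E = \sum_{i=1}^{k} \sum_{p \in C_i} dist(p, c_i)^2
where dist(p, c_i) is the Euclidean distance between object p and centroid ci.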
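
A minimal Python sketch of this within-cluster variation (the sum of squared Euclidean distances from each object to its own cluster centroid) follows. The function name `sse` and the split of the four objects A, B, C, D into two clusters are illustrative choices made here.

```python
def sse(clusters, centroids):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for points, c in zip(clusters, centroids):
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, c))
    return total

clusters = [[(1, 1), (2, 1)], [(4, 3), (5, 4)]]  # {A, B} and {C, D}
centroids = [(1.5, 1.0), (4.5, 3.5)]             # the mean of each cluster
print(sse(clusters, centroids))  # 0.25 + 0.25 + 0.5 + 0.5 = 1.5
```

A lower value means the objects sit closer to their centroids, which is the quantity k-means tries to reduce.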
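Main steps for K-means
1. Choose the number of clusters, K.
2. Choose K initial centroids.
3. Calculate the distance between each object and each centroid.
4. Assign each object to its nearest centroid.
5. Recompute each centroid as the mean of the objects assigned to it.
6. Repeat steps 3 to 5 until the centroids no longer change.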
Example: Suppose we have 4 objects as
training data points, and each object has 2 attributes.
Each attribute represents a coordinate of the
object.
OBJECT  X  Y
A       1  1
B       2  1
C       4  3
D       5  4
• First step: choose the number of clusters, K.
Here K = 2.
• Choose the initial centroids:
c1 = (1,1) and c2 = (2,1)
Calculate the distance between each object and each centroid, with c1 = (1,1) and c2 = (2,1).
OBJECT     d(object, c1)  d(object, c2)  NEAREST
A = (1,1)  0 (min)        1              c1
B = (2,1)  1              0 (min)        c2
C = (4,3)  3.61           2.83 (min)     c2
D = (5,4)  5              4.24 (min)     c2
New centroids
c1 = (1,1) (mean of cluster {A})
c2 = ((2+4+5)/3, (1+3+4)/3) = (3.67, 2.67) (mean of cluster {B, C, D})
Calculate the distance between each object and the new centroids, c1 = (1,1) and c2 = (3.67, 2.67).
OBJECT     d(object, c1)  d(object, c2)  NEAREST
A = (1,1)  0 (min)        3.14           c1
B = (2,1)  1 (min)        2.36           c1
C = (4,3)  3.61           0.47 (min)     c2
D = (5,4)  5              1.89 (min)     c2
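New centroids
c1 = ((1+2)/2, (1+1)/2) = (1.5, 1) (mean of cluster {A, B})
c2 = ((4+5)/2, (3+4)/2) = (4.5, 3.5) (mean of cluster {C, D})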
Calculate the distance between each object and the new centroids, c1 = (1.5, 1) and c2 = (4.5, 3.5).
OBJECT     d(object, c1)  d(object, c2)  NEAREST
A = (1,1)  0.5 (min)      4.30           c1
B = (2,1)  0.5 (min)      3.54           c1
C = (4,3)  3.20           0.71 (min)     c2
D = (5,4)  4.61           0.71 (min)     c2
New centroids
c1 = ((1+2)/2, (1+1)/2) = (1.5, 1)
c2 = ((4+5)/2, (3+4)/2) = (4.5, 3.5)
The centroids have not changed, so the algorithm stops.
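
The whole worked example can be reproduced with a short Python sketch of the k-means loop. This is an illustrative implementation under stated assumptions: Euclidean distance via `math.dist` (Python 3.8+), ties broken in favour of the lower-numbered centroid, and an empty cluster (which does not occur here) keeping its previous centroid. The name `kmeans` and the `max_iter` safeguard are choices made for this sketch.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means with Euclidean distance, starting from the given centroids."""
    centroids = [tuple(map(float, c)) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # Assumption: an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # centroids unchanged: stop
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (2, 1), (4, 3), (5, 4)]  # objects A, B, C, D
centroids, clusters = kmeans(points, [(1, 1), (2, 1)])
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
print(clusters)   # [[(1, 1), (2, 1)], [(4, 3), (5, 4)]]
```

Starting from c1 = (1,1) and c2 = (2,1), the loop ends with clusters {A, B} and {C, D}, matching the hand calculation.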
Ex:
• The following is a set of one-dimensional
points: {6; 12; 18; 24; 30; 42; 48}.
For each of the following sets of initial centroids,
create two clusters by assigning each point to
the nearest centroid, and then calculate the
total squared error for each set of two clusters.
Show both the clusters and the total squared
error for each set of
centroids.
  - {18; 45}
  - {15; 40}
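Sol (initial centroids {18; 45}):
• First round of k-means
- Cluster assignment: {6, 12, 18, 24, 30} to centroid 18 and {42, 48} to centroid 45
- Recompute means: (6+12+18+24+30)/5 = 18 and (42+48)/2 = 45
Each new centroid is the same as the previous
centroid, so the algorithm stops.
The final clusters are {6,12,18,24,30} and
{42,48}.
SSE1 = (6-18)^2 + (12-18)^2 + (18-18)^2 + (24-18)^2 + (30-18)^2 = 360
SSE2 = (42-45)^2 + (48-45)^2 = 18
Total squared error is 360 + 18 = 378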
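
A small Python sketch can check this result and apply the same procedure to the second set of initial centroids, {15; 40}, which is not worked out above. The function name `assign_and_score` is illustrative, and the error is measured against the given centroids. In both cases those centroids already equal the means of the clusters they produce, so a further k-means round would change nothing.

```python
def assign_and_score(points, centroids):
    """Assign each 1-D point to its nearest centroid; return clusters and total SSE."""
    clusters = {c: [] for c in centroids}
    for x in points:
        nearest = min(centroids, key=lambda c: abs(x - c))
        clusters[nearest].append(x)
    total = sum((x - c) ** 2 for c, members in clusters.items() for x in members)
    return clusters, total

points = [6, 12, 18, 24, 30, 42, 48]
for init in ([18, 45], [15, 40]):
    clusters, total = assign_and_score(points, init)
    print(init, clusters, total)
# [18, 45] {18: [6, 12, 18, 24, 30], 45: [42, 48]} 378
# [15, 40] {15: [6, 12, 18, 24], 40: [30, 42, 48]} 348
```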
True or false
• K-means is a hierarchical clustering
method.
• In k-means clustering, the number
of clusters produced is not known in advance.
• A partitional clustering is a division of
data objects into overlapping
clusters.
• K-means always results in an optimal
clustering of the data.
• A centroid must be an actual data
point.

K-means machine learning clustering .pptx
