K-MEANS
CLUSTERING
Topics to cover
Introduction of Clustering
K-Means Clustering
Examples
Conclusion
INTRODUCTION
What is clustering?
• Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure.
K-MEANS CLUSTERING
• The k-means algorithm clusters n objects, based on their attributes, into k partitions, where k < n.
• It assumes that the object attributes form a vector space.
• It is an algorithm for partitioning (or clustering) N data points into K disjoint subsets S_j of data points so as to minimize the sum-of-squares criterion

$$ J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2 $$

where x_n is a vector representing the nth data point and \mu_j is the geometric centroid of the data points in S_j.
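To make the sum-of-squares criterion concrete, here is a minimal sketch that evaluates it for a given partition: for every cluster, it adds up the squared Euclidean distances from each point to that cluster's centroid. The function names and the toy data are illustrative assumptions, not part of the original slides.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def sum_of_squares(clusters):
    """Sum over all clusters of squared distances to the cluster centroid."""
    total = 0.0
    for members in clusters:
        mu = centroid(members)
        for x in members:
            total += sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return total


# Hypothetical toy data: the same four points, partitioned two ways.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(sum_of_squares(tight))  # small: each cluster is compact
print(sum_of_squares(mixed))  # large: each cluster spans both groups
```

A partition that groups nearby points scores lower, which is the sense in which k-means prefers it.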
• Simply put, k-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into K groups.
• K is a positive integer.
• The grouping is done by minimizing the sum of squared distances between the data points and the corresponding cluster centroids.
Simplify K-means:
How does the K-means clustering algorithm work?
• Step 1: Decide on the value of k, the number of clusters.
• Step 2: Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
  1. Take the first k training samples as single-element clusters.
  2. Assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
• Step 3: Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
• Step 4: Repeat Step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
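The four steps above can be sketched in plain Python as follows. This is one possible reading of the procedure, not a definitive implementation: the function names, the `max_passes` safety cap, the tie-breaking rule (lowest cluster index wins), and the choice to leave a centroid unchanged when its cluster becomes empty are all assumptions added for illustration.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def nearest(x, centroids):
    """Index of the centroid closest to x (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))


def sequential_kmeans(samples, k, max_passes=100):
    # Step 2.1: the first k samples start as single-element clusters.
    labels = list(range(k)) + [None] * (len(samples) - k)
    centroids = [tuple(s) for s in samples[:k]]

    def recompute(j):
        members = [s for s, lab in zip(samples, labels) if lab == j]
        if members:  # assumption: an emptied cluster keeps its old centroid
            centroids[j] = mean(members)

    # Step 2.2: assign the remaining samples one by one, updating the
    # centroid of the gaining cluster after each assignment.
    for i in range(k, len(samples)):
        j = nearest(samples[i], centroids)
        labels[i] = j
        recompute(j)

    # Steps 3 and 4: sweep over the samples until a full pass moves nothing.
    for _ in range(max_passes):
        changed = False
        for i, x in enumerate(samples):
            j = nearest(x, centroids)
            if j != labels[i]:
                old = labels[i]
                labels[i] = j
                recompute(j)    # cluster gaining the sample
                recompute(old)  # cluster losing the sample
                changed = True
        if not changed:
            break
    return labels, centroids


# Hypothetical data: two well-separated groups.
data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0)]
labels, centroids = sequential_kmeans(data, k=2)
print(labels, centroids)
```

Because centroids move after every single reassignment, the result can depend on the order of the samples; batch variants that update centroids only once per pass behave slightly differently.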
A simple example showing the implementation of the k-means algorithm (using K=2)
Step 1:
Initialization: we randomly choose the following two centroids (k=2) for the two clusters.
In this case the two centroids are m1=(1.0,1.0) and m2=(5.0,7.0).
Step 2:
• Thus, we obtain two clusters containing {1,2,3} and {4,5,6,7}.
• Their new centroids are:
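The assignment in this step can be reproduced with a short sketch. The data table from the slide is not available in the text, so the seven points below are an assumption: a stand-in data set chosen because it is consistent with the clusters and final centroids reported in this example. The sketch assigns every object to the nearer of m1 and m2 in one batch and then averages each cluster. With this stand-in data, object 3 is exactly equidistant from m1 and m2, and the sketch breaks the tie in favor of the first cluster.

```python
import math

# ASSUMPTION: stand-in data, not taken from the slides.
points = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}
m1, m2 = (1.0, 1.0), (5.0, 7.0)  # initial centroids from Step 1


def assign(points, centroids):
    """Group object labels by nearest centroid (ties go to the first)."""
    clusters = [[] for _ in centroids]
    for label, x in points.items():
        dists = [math.dist(x, c) for c in centroids]
        clusters[dists.index(min(dists))].append(label)
    return clusters


def centroid(labels, points):
    """Mean position of the objects with the given labels."""
    n = len(labels)
    return tuple(sum(points[lab][d] for lab in labels) / n for d in range(2))


clusters = assign(points, [m1, m2])
new_centroids = [centroid(c, points) for c in clusters]
print(clusters)
print(new_centroids)
```

Under the assumed data this yields the clusters {1,2,3} and {4,5,6,7}; the printed centroids hold only for that stand-in data set.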
Step 3:
• Now, using these centroids, we compute the Euclidean distance of each object to each centroid, as shown in the table.
• Therefore, the new clusters are {1,2} and {3,4,5,6,7}.
• The next centroids are m1=(1.25,1.5) and m2=(3.9,5.1).
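The distance table for this step can be recomputed with a few lines of Python. As before, the seven data points are an assumption (the slide's table is not in the text); they are a stand-in chosen to agree with the clusters and centroids quoted here. The sketch starts from the previous clusters {1,2,3} and {4,5,6,7}, prints each object's distance to both centroids, and reassigns each object to the closer one.

```python
import math

# ASSUMPTION: stand-in data, not taken from the slides.
points = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}


def centroid(labels):
    """Mean position of the objects with the given labels."""
    n = len(labels)
    return tuple(sum(points[lab][d] for lab in labels) / n for d in range(2))


previous = [[1, 2, 3], [4, 5, 6, 7]]  # clusters from Step 2
centroids = [centroid(c) for c in previous]

new_clusters = [[], []]
for label, x in points.items():
    row = [math.dist(x, c) for c in centroids]
    new_clusters[row.index(min(row))].append(label)
    print(f"object {label}: d(m1)={row[0]:.2f}  d(m2)={row[1]:.2f}")

next_centroids = [centroid(c) for c in new_clusters]
print(new_clusters)
print([tuple(round(v, 2) for v in c) for c in next_centroids])
```

With the assumed data, object 3 moves to the second cluster and the recomputed centroids round to the values quoted in this step.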
Step 4:
• The clusters obtained are {1,2} and {3,4,5,6,7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm halts here, and the final result consists of the 2 clusters {1,2} and {3,4,5,6,7}.
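The whole example can be run end to end with a small loop that stops once an assignment pass changes nothing. Two caveats: the data points are an assumed stand-in (the slide's table is not in the text), and this sketch uses the batch form of k-means, updating all centroids once per pass, rather than the sample-by-sample updates described in the algorithm steps. The function name `kmeans` and the `max_iter` cap are illustrative.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Batch k-means: reassign all points, then recompute all centroids."""
    centroids = list(initial_centroids)
    assignment = None
    passes = 0
    for _ in range(max_iter):
        passes += 1
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
            for x in points
        ]
        if new_assignment == assignment:
            break  # a full pass caused no new assignments
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids, passes


# ASSUMPTION: stand-in data, objects 1..7 in order.
data = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
        (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
assignment, centroids, passes = kmeans(data, [(1.0, 1.0), (5.0, 7.0)])
print(assignment, centroids, passes)
```

Under these assumptions the loop stops on its third pass, the first one that changes no assignment, with objects 1 and 2 in the first cluster and objects 3 to 7 in the second.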
[Figure: plot]
[Figure: example with K=3, panels "Step 1" and "Step 2"]
[Figure: plot]
k-mean-clustering big data analaysis.ppt