Copyright © 2017, edureka and/or its affiliates. All rights reserved.
Agenda of Today’s Session
▪ What is Clustering?
▪ Types of Clustering
▪ What is K-Means Clustering?
▪ How does the K-Means Algorithm work?
▪ K-Means with Python
What is Clustering?
“Clustering is the process of dividing a dataset
into groups consisting of similar data points”
▪ Points in the same group are as similar as possible
▪ Points in different groups are as dissimilar as possible
Examples of clustering in everyday life:
▪ A group of diners in a restaurant
▪ Items arranged in a mall
Where is it Used?
▪ Recommendation systems, e.g. recommended movies
▪ Flickr’s photos
How do businesses use Clustering?
▪ Insurance companies
▪ Retail stores
▪ Banking
Types of Clustering
▪ Exclusive Clustering
▪ Overlapping Clustering
▪ Hierarchical Clustering
Exclusive Clustering
▪ Hard Clustering
▪ Data Point / Item belongs exclusively to one cluster
▪ For Example: K-Means Clustering
Overlapping Clustering
▪ Soft Clustering
▪ Data Point / Item can belong to multiple clusters
▪ For Example: Fuzzy C-Means Clustering
Hierarchical Clustering
[Figure: hierarchy built over data points 1, 2, 3 and 4]
What is K-Means Clustering?
“K-Means is a clustering algorithm whose main goal is
to group similar elements or data points into a
cluster.”
NOTE: ‘K’ in K-Means represents the number of clusters
Pile of dirty clothes
Where Can I Apply K-Means?
https://gifer.com/en/Ckp3
Document Classifier
K-Means
Algorithm
Number of Clusters, K = 3
▪ Step 1: Select the number of clusters to be identified,
i.e. select a value for K (K = 3 in this case)
▪ Step 2: Randomly select 3 distinct data points to act as the initial clusters
▪ Step 3: Measure the distance between the 1st point
and each of the 3 selected clusters (red, blue and green)
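Steps 1 to 3 can be sketched in a few lines of plain Python. This is only an illustration: the 1-D data values, the fixed random seed and the names `points`, `centres` and `nearest` are assumptions and do not come from the slides.

```python
import random

# Hypothetical 1-D data standing in for the points drawn on the slide.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0, 15.0, 15.5, 16.0]

k = 3                              # Step 1: choose the number of clusters
rng = random.Random(0)             # fixed seed so the run is repeatable
centres = rng.sample(points, k)    # Step 2: pick k distinct data points

# Step 3: distance from the 1st point to each of the k initial clusters
first = points[0]
distances = [abs(first - c) for c in centres]

# Index of the closest cluster, which is what Step 4 needs
nearest = distances.index(min(distances))
```

Because the data values are all different, `random.Random.sample` is enough to guarantee three distinct starting points.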
K-Means
Algorithm
▪ Step 4: Assign the 1st point to the nearest cluster
(red in this case)
▪ Step 5: Calculate the mean value of the red cluster,
including the new point
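Recalculating a mean after one more point joins the cluster does not require storing every member. A minimal sketch, assuming the cluster is tracked only by its current mean and size (the helper name `add_point` and the sample values are hypothetical):

```python
def add_point(mean, count, x):
    """Return the new mean and size of a cluster after x joins it."""
    count += 1
    mean += (x - mean) / count     # running-mean update
    return mean, count

# A cluster that so far holds only its initial point, 2.0.
mean, count = 2.0, 1
mean, count = add_point(mean, count, 1.0)   # now the mean of [2.0, 1.0]
```

The update is equivalent to summing all members and dividing by the new size; it just avoids keeping the full list around.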
K-Means
Algorithm
To which cluster does point 2 belong?
▪ Repeat the same procedure, but measure the distance to the
red mean (and, as before, to the blue and green clusters)
▪ Add the point to the nearest cluster
▪ Calculate the cluster mean including the new point
K-Means
Algorithm
To which cluster does point 3 belong?
▪ Repeat the same procedure, but measure the
distance to the red mean
▪ Add the 3rd point to the nearest cluster (red)
▪ Calculate the new cluster mean using the new point
K-Means
Algorithm
To which cluster does this point belong?
▪ Measure the distance
▪ Assign the point to the nearest cluster
▪ Calculate the cluster mean using the new point
K-Means
Algorithm
To which cluster does this point belong?
▪ Measure the distance from the cluster means (centroids)
▪ Assign the point to the nearest cluster
▪ Calculate the cluster mean using the new point
Since the point is located closest to the green cluster, it is assigned to the green cluster.
K-Means
Algorithm
Since all of these points are located closest to the green cluster,
all of them will be assigned to the green cluster
K-Means
Algorithm
[Figure: the original/expected result next to the result from the 1st iteration]
K-Means
Algorithm
Total variation within the clusters
The K-Means algorithm iterates again and again
until the data points within each cluster stop changing
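The stopping rule and the total variation can be sketched as follows. Two assumptions are made here: "total variation" is taken to be the usual within-cluster sum of squared distances (the slides do not define it), and the loop is the common batch form of K-Means that reassigns all points before recomputing the means. The function names and the four sample points are illustrative only.

```python
import math

def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points]

def update(points, labels, centres):
    """Mean of each cluster; an empty cluster keeps its old centre."""
    new = []
    for j, old in enumerate(centres):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new.append(old)
    return new

def kmeans(points, centres, max_iter=100):
    """Repeat assign/update until no point changes cluster."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centres)
        if new_labels == labels:       # memberships stopped changing
            break
        labels = new_labels
        centres = update(points, labels, centres)
    return labels, centres

def total_variation(points, labels, centres):
    """Sum of squared distances from each point to its cluster centre."""
    return sum(math.dist(p, centres[l]) ** 2 for p, l in zip(points, labels))

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centres = kmeans(points, centres=[(1.0, 1.0), (9.0, 8.0)])
variation = total_variation(points, labels, centres)
```

`math.dist` requires Python 3.8 or later. The `max_iter` cap is a safety net and is not part of the procedure described on the slides.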
K-Means
Algorithm
Iteration 2: Again we start from the beginning, but this time
we select different initial random points (compared
with those we chose in the 1st iteration)
▪ Step 1: Select the number of clusters to be identified, i.e. K = 3 in this case
▪ Step 2: Randomly select 3 distinct data points
▪ Step 3: Measure the distance between the 1st point and each of the 3 selected
clusters
K-Means
Algorithm
The algorithm picks 3 initial clusters and adds each remaining point to
the cluster with the nearest mean, recalculating that mean
each time a new point is added to the cluster
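One full pass of this point-by-point procedure might look like the sketch below. It assumes 1-D data, and the function name `sequential_pass`, the seed and the sample values are hypothetical; the slides show the idea only as pictures.

```python
import random

def sequential_pass(points, k, seed=0):
    """Pick k initial clusters, then add the remaining points one by one."""
    rng = random.Random(seed)
    order = points[:]
    rng.shuffle(order)
    means = order[:k]                  # k initial clusters, one point each
    clusters = [[m] for m in means]
    for x in order[k:]:
        # nearest cluster by distance to its current mean
        j = min(range(k), key=lambda i: abs(x - means[i]))
        clusters[j].append(x)
        # recalculate that cluster's mean, including the new point
        means[j] += (x - means[j]) / len(clusters[j])
    return clusters, means

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0, 15.0, 15.5, 16.0]
clusters, means = sequential_pass(points, k=3)
```

The outcome depends on which points are drawn first, which is why the slides repeat the procedure with different starting points.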
Total variation within the cluster
K-Means
Algorithm
Iteration 3: Again we start from the beginning and select
different initial random points (compared with those we chose in
the 1st and 2nd iterations)
▪ Pick 3 initial clusters
▪ Cluster the remaining points
▪ Finally, sum the variation within each cluster to get the total variation
K-Means
Algorithm
The algorithm can now compare the results of the 1st, 2nd and 3rd
iterations and select the one with the lowest total variation
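Running the clustering several times from different random starting points and keeping the best result can be sketched like this. The sketch assumes 1-D data and uses the sum of squared distances to the cluster mean as the total variation; `kmeans_1d`, `variation` and `best_of` are hypothetical names, and each "iteration" on the slides corresponds to one run here.

```python
import random

def kmeans_1d(points, centres, max_iter=100):
    """Batch K-Means on plain numbers; returns (labels, centres)."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centres)), key=lambda j: abs(p - centres[j]))
                      for p in points]
        if new_labels == labels:       # memberships stopped changing
            break
        labels = new_labels
        for j in range(len(centres)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:                # an empty cluster keeps its old centre
                centres[j] = sum(members) / len(members)
    return labels, centres

def variation(points, labels, centres):
    """Total variation: squared distance of every point to its centre."""
    return sum((p - centres[l]) ** 2 for p, l in zip(points, labels))

def best_of(points, k, runs, seed=0):
    """Run K-Means `runs` times and keep the lowest total variation."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        labels, centres = kmeans_1d(points, rng.sample(points, k))
        score = variation(points, labels, centres)
        if best is None or score < best[0]:
            best = (score, labels, centres)
    return best

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0, 15.0, 15.5, 16.0]
score, labels, centres = best_of(points, k=3, runs=3)
```

Three runs mirror the three iterations shown on the slides; in practice more runs give a better chance of finding a good starting point.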
K-Means
Algorithm
Now, what if our data is plotted on the X and Y axes?
K-Means
Algorithm
Similarly, pick 3 initial random points.
K-Means
Algorithm
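We will be using the Euclidean distance (in 2D it is the same as
the Pythagorean theorem):
$h^2 = p^2 + b^2$
where p and b are the two perpendicular sides of the right triangle and h is its hypotenuse, i.e. the distance between the two points.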
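As a small illustration, the Euclidean distance between two points can be computed as below. The helper name `euclidean` and the sample coordinates are assumptions chosen so that the two perpendicular sides are 3 and 4.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The two points differ by 3 along X and by 4 along Y.
d = euclidean((1.0, 2.0), (4.0, 6.0))
```

The same function works unchanged for points with more than two coordinates.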
K-Means
Algorithm
Again assign the point to the nearest cluster
K-Means
Algorithm
Finally, calculate the centroid (mean of the cluster),
including the new point
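For 2-D data the centroid is simply the mean of the X values paired with the mean of the Y values. A minimal sketch, with a hypothetical helper `centroid` and made-up coordinates:

```python
def centroid(cluster):
    """Coordinate-wise mean of a list of (x, y) points."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n,
            sum(y for _, y in cluster) / n)

cluster = [(1.0, 1.0), (3.0, 1.0)]   # points already in the cluster
cluster.append((2.0, 4.0))           # the newly assigned point
c = centroid(cluster)
```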
K-Means
Algorithm
At the end of the first iteration you get something like this. Again,
you have to iterate this process to get the final clusters
How will you find the K value?
In the previous scenario the value of K was
known to be 3, but this is not always the case
To decide the value of K, you have to use the hit
and trial method, starting from K = 1
K = 1 is the worst-case scenario; you can
cross-verify this with the total variation
Now try with K = 2
K = 2 is better than K = 1 (lower total variation)
Now try with K = 3
K = 3 is even better than K = 2 (lower total variation)
Now try with K = 4
The total variation for K = 4 is less than for K = 3
Each time you increase the number of clusters, the total variation
decreases. When the number of clusters equals the number of data points,
the variation is 0
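A tiny worked example makes the trend concrete. It assumes that variation means the sum of squared distances from each value to its cluster mean, and the four data values and their groupings are invented for illustration.

```python
def variation(clusters):
    """Sum of squared distances of every value from its cluster's mean."""
    total = 0.0
    for cluster in clusters:
        mean = sum(cluster) / len(cluster)
        total += sum((x - mean) ** 2 for x in cluster)
    return total

data = [1.0, 2.0, 9.0, 10.0]                   # hypothetical 1-D data
k1 = variation([data])                         # K = 1: one cluster holds everything
k2 = variation([[1.0, 2.0], [9.0, 10.0]])      # K = 2: two natural groups
k4 = variation([[x] for x in data])            # K = number of data points
```

With every point in its own cluster, each point sits exactly on its cluster mean, so nothing is left to vary.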
How will you find the K value?
[Figure: reduction in variance plotted against the number of clusters]
The point marked on the plot is the elbow point, and it is
used to determine the value of K (the number of clusters)
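The elbow is usually read off the plot by eye, but a rough automatic version can be sketched. The heuristic below, which picks the K where the reduction in variation shrinks the most, is only one possible choice and is not taken from the slides; the list of totals is invented.

```python
def elbow(variations):
    """Return the K at which the drop in variation falls off most sharply.

    variations[i] is the total variation obtained with K = i + 1.
    """
    # Reduction in variation gained by going from K to K + 1
    drops = [a - b for a, b in zip(variations, variations[1:])]
    # How much each reduction shrinks compared with the previous one
    bends = [d1 - d2 for d1, d2 in zip(drops, drops[1:])]
    return bends.index(max(bends)) + 2

# Hypothetical total variation for K = 1..6
variations = [100.0, 60.0, 15.0, 12.0, 10.0, 9.0]
k = elbow(variations)
```

Here the reduction is large up to K = 3 and small afterwards, so the function returns 3. On noisy curves it is worth checking the plot as well.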
DEMO
Let’s learn to code
K-Means
Algorithm
Summarizing the K-Means Algorithm

K Means Clustering Algorithm | K Means Example in Python | Machine Learning Algorithms | Edureka