Clustering:
K-means Clustering
• With the abundance of raw data and the need for analysis, the concept
of unsupervised learning became popular over time.
• The main goal of unsupervised learning is to discover hidden and
interesting patterns in unlabelled data.
• The most common unsupervised learning technique is clustering.
• Typical applications: grouping documents by topic, market segmentation,
statistical data analysis, social network analysis, image segmentation,
anomaly detection, etc.
• Amazon uses clustering in its recommendation system to suggest products
based on a user's past searches.
• Netflix uses the same technique to recommend movies and web series to its
users based on their watch history.
K-means Clustering:
• K-Means Clustering is an unsupervised learning algorithm that groups an
unlabeled dataset into different clusters.
• K defines the number of clusters to be created in the process: if K=2,
there will be two clusters; for K=3, there will be three clusters, and so on.
• It is a centroid-based algorithm, where each cluster is associated with
a centroid.
• The main aim of the algorithm is to minimize the sum of distances between
the data points and the centroids of their assigned clusters.
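A standard way to write this objective (the notation here is assumed, not taken from the slides) is the within-cluster sum of squared distances:

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```

where C_k is the set of points assigned to cluster k and mu_k is its centroid; the assignment and centroid-update steps below each reduce J.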
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids (they need not come
from the input dataset).
Step-3: Assign each data point to its closest centroid; this forms the K
clusters.
Step-4: Compute a new centroid for each cluster (the mean of the points
assigned to it).
Step-5: Repeat Step-3, i.e. reassign each data point to the new closest
centroid.
Step-6: If any reassignment occurred, go back to Step-4; otherwise FINISH.
Step-7: The model is ready.
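A minimal NumPy sketch of the steps above; the names (kmeans, max_iters, seed) and the synthetic data are illustrative, not taken from the slides.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step-2: pick K random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = None
    for it in range(max_iters):
        # Step-3 / Step-5: assign every point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step-6: stop when no reassignment occurs
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of its assigned points
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

# Tiny usage example on two synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centroids = kmeans(X, K=2)
print(centroids)
```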
Elbow Method:
• Used to find the optimal number of clusters K.
• The method is based on the WCSS value.
• WCSS stands for Within-Cluster Sum of Squares, the total variation within
the clusters.
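A sketch of the elbow method using scikit-learn: fit K-means for a range of K and record the WCSS, which sklearn exposes as the inertia_ attribute. The data is synthetic; the "elbow" is the K where the curve stops dropping sharply.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1, (50, 2)) for c in (0, 5, 10)])

for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"K={k}: WCSS={km.inertia_:.1f}")
# Plotting WCSS against K and picking the bend of the curve gives the elbow.
```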
Hierarchical Clustering, Choosing the Number of Clusters
• Hierarchical clustering is another unsupervised machine learning algorithm
used to group unlabeled datasets into clusters; it is also known as
hierarchical cluster analysis (HCA).
• It develops a hierarchy of clusters in the form of a tree; this tree-shaped
structure is known as a dendrogram.
• We do not need to know the number of clusters in advance.
• To group the data into clusters, it follows a bottom-up (agglomerative)
approach.
• The algorithm treats each data point as a single cluster at the beginning.
Step-1: Treat each data point as a single cluster. If there are N data
points, there are N clusters at the start.
Step-2: Take the two closest data points or clusters and merge them into one
cluster, leaving N-1 clusters.
Step-3: Again take the two closest clusters and merge them, leaving N-2
clusters.
Step-4: Repeat Step-3 until only one cluster is left.
Step-5: Once all the clusters are combined into one big cluster, use the
dendrogram to split them back into the number of clusters the problem requires.
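A sketch of the bottom-up merging above using SciPy's agglomerative tools: linkage() records the N-1 merges and dendrogram() draws the resulting tree. The data and the choice of 'ward' linkage are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

# Each row of Z is one merge: (cluster_a, cluster_b, distance, size)
Z = linkage(X, method="ward")

dendrogram(Z)
plt.xlabel("data points")
plt.ylabel("merge distance")
plt.show()
```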
To decide the number of clusters from the dendrogram, a simple strategy is to
look for the longest vertical line that is not crossed by any (extended)
horizontal line. Draw a horizontal cut through that stretch and count how many
vertical lines it intersects; that count is the number of clusters. The idea is
to cut where the distance between successive merges is largest, meaning the
clusters being separated are the most dissimilar and should remain separate.
However, the decision also depends on the context and the specific problem you
are trying to solve; domain knowledge can also help in choosing the number of
clusters.
In short: locate the largest vertical gap between merges in the dendrogram,
pass a horizontal line through its middle, and count the vertical lines it
intersects; that is the chosen number of clusters.
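Turning the chosen cut into flat cluster labels can be done with SciPy's fcluster(); the cut height t=10.0 below is an illustrative value, in practice it is read off the dendrogram at the largest vertical gap between merges.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

Z = linkage(X, method="ward")
# Cut the tree at height t; every merge above the cut stays separate.
labels = fcluster(Z, t=10.0, criterion="distance")
print("number of clusters:", len(np.unique(labels)))
```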
Measuring the distance between two clusters:
• How the distance between two clusters is measured is crucial for
hierarchical clustering.
• There are various ways to calculate this distance, and the choice decides
the rule for merging clusters.
• These measures are called linkage methods.
Single Linkage: the shortest distance between the closest points of the two
clusters.
Complete Linkage: the farthest distance between two points in the two
different clusters.
Average Linkage: the distances between every pair of points (one from each
cluster) are summed and divided by the number of pairs, giving the average
distance between the two clusters.
Centroid Linkage: the distance between the centroids of the two clusters.
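The four linkage methods above map directly onto SciPy's `method` argument. A small sketch comparing them on the same synthetic data; only the merge distances change, the bottom-up procedure is identical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(4, 1, (15, 2))])

for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)
    # Last row of the linkage matrix is the final merge of the two big clusters
    print(f"{method:>8} linkage: height of final merge = {Z[-1, 2]:.2f}")
```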
