UNSUPERVISED MACHINE LEARNING
Presented by: Pravinkumar Landge
• Introduction
• Clustering
• K-means clustering
• Hierarchical clustering
• Comparison between K-means and Hierarchical
• DBSCAN clustering
Introduction
• Unsupervised learning is a type of machine learning
used to draw inferences from datasets consisting of input
data without labeled responses. The most common
unsupervised learning method is cluster analysis, which is
used in exploratory data analysis to find hidden patterns
or groupings in data.
What is clustering?
• A cluster is a group of objects that are similar to one
another and dissimilar to the objects in other clusters.
• Clustering is the task of finding such groups in the data.
Use of clustering
Clustering has been widely used across industries for
years:
• Biology - for genetic and species grouping
• Medical imaging - for distinguishing between different
kinds of tissue
• Market research - for differentiating groups of customers
based on their attributes
• Recommender systems - for giving you better Amazon
purchase suggestions or Netflix movie matches
Clustering algorithms
• Partition-based clustering
  • Relatively efficient
  • E.g. k-means
• Hierarchical clustering
  • Produces trees of clusters
  • E.g. agglomerative, divisive
• Density-based clustering
  • Produces arbitrarily shaped clusters
  • E.g. DBSCAN
K-means clustering
• K-means is a partition-based clustering algorithm
• It divides the data into k non-overlapping subsets
(clusters) without any cluster-internal structure
• Examples within a cluster are very similar
• Examples in different clusters are very different
Determine the similarity or dissimilarity
1-dimensional similarity/distance
2-dimensional similarity/distance
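The figures for these slides did not survive extraction. As a minimal sketch, assuming Euclidean distance is the dissimilarity measure (the slides do not name one), the 1-dimensional and 2-dimensional cases could be computed as follows. The customer features (age, income) and their values are made up for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# 1-dimensional: each customer is described by age only (hypothetical values)
d1 = euclidean([54], [50])

# 2-dimensional: age and income in thousands (hypothetical values)
d2 = euclidean([54, 190], [50, 200])

print(d1)  # 4.0
print(d2)
```

A smaller distance means the two objects are more similar. In practice the features are usually rescaled first, otherwise the feature with the largest range (income here) dominates the result.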
How does k-means clustering work?
1. Randomly place k centroids, one for each cluster
2. Calculate the distance from each data point to each centroid
3. Assign each data point (object) to its closest centroid,
creating a cluster
4. Recalculate the position of each of the k centroids as the
mean of the points assigned to it
5. Repeat steps 2-4 until the centroids no longer move
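The steps above can be sketched in plain Python. This is an illustrative version, not a production implementation: it assumes Euclidean distance, picks k random data points as the initial centroids (one common way to "randomly place" them), and uses made-up sample points. The function name `kmeans` and its parameters are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids (assumption)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Steps 2-3: assign each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Step 5: stop once the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

The first three points end up in one cluster and the last three in the other. Which cluster gets label 0 depends on the random initialization.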
Choosing k
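The figure for this slide did not survive extraction, so the method it showed is not known. One common heuristic, assumed here, is the elbow method: run k-means for several values of k, record the within-cluster sum of squared distances (WCSS), and pick the k after which the improvement levels off. The sketch below uses made-up 1-dimensional data and a deterministic initialization to stay short. The names `kmeans_1d` and `wcss` are hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """1-D k-means with evenly spaced initial centroids; returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the k centroids evenly across the data range
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def wcss(values, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))

values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.2, 8.7]  # three made-up groups
errors = {}
for k in range(1, 6):
    centroids, labels = kmeans_1d(values, k)
    errors[k] = wcss(values, centroids, labels)
    print(k, round(errors[k], 3))
```

On this data the error drops sharply up to k = 3 and only marginally afterwards, which suggests three clusters.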
• K-means is a partitioning algorithm that is relatively
efficient for medium and large datasets
• Produces sphere-like clusters
• Requires the number of clusters (k) to be specified in advance
Hierarchical clustering
• Hierarchical clustering algorithms build a hierarchy of
clusters in which each node is a cluster consisting of the
clusters of its child nodes.
• Hierarchical clustering strategies
  • Divisive (top down)
  • Agglomerative (bottom up)
Agglomerative algorithm
1. Create n clusters, one for each data point
2. Compute the proximity matrix
3. Repeat
   1. Merge the two closest clusters
   2. Update the proximity matrix
4. Until only a single cluster remains
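The agglomerative steps above can be sketched as follows. This is a naive illustration that assumes Euclidean distance between points and single linkage (minimum distance) between clusters. The function name `agglomerative` and the sample points are hypothetical.

```python
import math

def agglomerative(points):
    """Naive single-linkage agglomerative clustering; returns the list of merges."""
    # Step 1: one cluster per data point, keyed by an integer id
    clusters = {i: [p] for i, p in enumerate(points)}
    # Step 2: proximity matrix stored as {(id_a, id_b): distance} with id_a < id_b
    prox = {
        (a, b): math.dist(points[a], points[b])
        for a in clusters for b in clusters if a < b
    }
    merges = []
    next_id = len(points)
    # Steps 3-4: keep merging until a single cluster remains
    while len(clusters) > 1:
        (a, b), dist = min(prox.items(), key=lambda item: item[1])
        merged = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, dist))
        # Update the proximity matrix: add a row for the new cluster ...
        new_row = {}
        for other in clusters:
            d_a = prox[(min(a, other), max(a, other))]
            d_b = prox[(min(b, other), max(b, other))]
            new_row[(other, next_id)] = min(d_a, d_b)  # single linkage (assumed)
        # ... and drop every entry that involves the two merged clusters
        prox = {pair: d for pair, d in prox.items() if a not in pair and b not in pair}
        prox.update(new_row)
        clusters[next_id] = merged
        next_id += 1
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
merges = agglomerative(points)
for a, b, dist in merges:
    print(f"merge {a} and {b} at distance {dist:.3f}")
```

The recorded merge order and distances are the information a dendrogram displays. Ids 0 to 3 are the original points, and each merge creates a new id (4, 5, 6).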
Similarity/Distance
Distance between clusters
• Single-linkage clustering
  • Minimum distance between any two points, one from each cluster
• Complete-linkage clustering
  • Maximum distance between any two points, one from each cluster
• Average-linkage clustering
  • Average distance over all pairs of points, one from each cluster
• Centroid-linkage clustering
  • Distance between the cluster centroids
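The four linkage criteria can be compared on a small example. This is a minimal sketch assuming Euclidean distance between points. The function names and the two sample clusters are hypothetical.

```python
import math
from itertools import product

def pairwise(a, b):
    """All point-to-point distances between clusters a and b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def single_linkage(a, b):
    return min(pairwise(a, b))

def complete_linkage(a, b):
    return max(pairwise(a, b))

def average_linkage(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)

def centroid_linkage(a, b):
    return math.dist(centroid(a), centroid(b))

a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
for name, fn in [("single", single_linkage), ("complete", complete_linkage),
                 ("average", average_linkage), ("centroid", centroid_linkage)]:
    print(name, round(fn(a, b), 3))
```

The same pair of clusters gets a different distance under each criterion, which is why the choice of linkage changes which clusters are merged first.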
• Advantages of hierarchical clustering
  • Does not require the number of clusters to be specified
  • Easy to implement
  • Produces a dendrogram, which helps with understanding the data
• Disadvantages of hierarchical clustering
  • Can never undo a previous merge or split
  • Generally has long runtimes
  • It is sometimes difficult to identify the number of clusters from the dendrogram
Hierarchical clustering vs. K-means
K-means | Hierarchical clustering
Much more efficient | Can be slow for large datasets
Requires the number of clusters to be specified | Does not require the number of clusters to run
Gives only one partitioning of the data, based on the predefined number of clusters | Gives more than one partitioning, depending on the resolution
Potentially returns different clusters each time it is run, due to random initialization of centroids | Always generates the same clusters
DBSCAN clustering
• When applied to tasks with arbitrarily shaped clusters or
clusters within clusters, traditional techniques might not
be able to achieve good results
• Partition-based algorithms have no notion of outliers; that
is, all points are assigned to a cluster even if they do not
belong in any
• In contrast, density-based clustering locates regions
of high density that are separated from one another by
regions of low density. Density in this context is defined as
the number of points within a specified radius.
K-means vs. density-based clustering
What is DBSCAN?
• DBSCAN (Density-Based Spatial Clustering of
Applications with Noise)
  • Is one of the most common clustering algorithms
  • Works based on the density of objects
• R (radius of neighborhood)
  • A neighborhood of radius R that contains enough
points is called a dense area
• M (minimum number of neighbors)
  • The minimum number of data points required in a
neighborhood to define a cluster
How does DBSCAN work?
DBSCAN algorithm: core point
• R = 2 units, M = 6
DBSCAN algorithm: border point
• R = 2 units, M = 6
DBSCAN algorithm: outliers
DBSCAN algorithm: identify all points
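The figures for the core point, border point, and outlier slides did not survive extraction. Under the usual DBSCAN definitions, assumed here, a core point has at least M points within distance R (counting itself, a convention that varies between implementations), a border point is not a core point but lies within R of one, and an outlier is neither. The sketch below labels every point using the slide's R = 2 and M = 6. The function name `classify_points` and the sample points are hypothetical.

```python
import math

def classify_points(points, radius, min_pts):
    """Label each point 'core', 'border', or 'outlier'.

    Neighborhood counts include the point itself (an assumed convention).
    """
    # For every point, the indices of all points within the given radius
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= radius]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}
    kinds = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nbrs):
            kinds.append("border")  # not dense itself, but next to a core point
        else:
            kinds.append("outlier")
    return kinds

points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.5, -0.5),  # dense group
    (2.6, 0.5),  # close to the group, but with too few neighbors of its own
    (8.0, 8.0),  # far away from everything
]
kinds = classify_points(points, radius=2, min_pts=6)
print(kinds)
```

The six grouped points are core points, the seventh is a border point, and the last one is an outlier.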
DBSCAN algorithm: clusters?
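The figure for this slide did not survive extraction. As a minimal sketch of how clusters are typically formed in DBSCAN, the code below starts a new cluster at each unvisited core point and grows it through every core point reachable within R, attaching border points along the way. Points that are never reached keep the label -1. The function name `dbscan`, the parameter values (R = 2, M = 3), and the sample points are hypothetical, and neighborhood counts include the point itself.

```python
import math

def dbscan(points, radius, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 marking outliers."""
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= radius]
        for p in points
    ]
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional outlier; may become a border point later
            continue
        # i is an unvisited core point: start a new cluster and expand it
        cluster_id += 1
        labels[i] = cluster_id
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),  # first dense group
    (10.0, 10.0), (11.0, 10.0), (10.0, 11.0),        # second dense group
    (5.0, 5.0),                                      # isolated point
]
labels = dbscan(points, radius=2, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is not given as input. It follows from the data and the two parameters R and M.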
Advantages of DBSCAN
1. Finds arbitrarily shaped clusters
2. Robust to outliers
3. Does not require the number of clusters to be specified
