Graph and Density Based Clustering
By – AYUSH
Netaji Subhash Engineering College, Kolkata
Introduction
• Clustering is the task of identifying groups of similar data points in a dataset.
• It is one of the most popular techniques in data science.
• Entities in each group are more similar to the other entities of that group than to those of other groups.
• In this presentation, I will take you through the types of clustering and different clustering algorithms, and give a brief view of two of the most commonly used clustering methods: Graph Based Clustering and Density Based Clustering.
Graph Based Clustering
Graph Theory :
• Graph theory can give detailed information about the internal structure of a dataset in terms of:
- cliques (subgraphs in which every pair of vertices is connected by an edge)
- clusters (highly connected groups of nodes)
- centrality (a measure of the importance of a node in the network)
- outliers (nodes that are only weakly connected to the rest of the graph)
• Applications :
- Social graphs (edges connecting people to other people and to the things they interact with)
- Path optimization algorithms (minimal spanning trees, Kruskal's and Prim's algorithms)
- GPS navigation systems (shortest-path APIs)
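As a small illustration of two of the notions above, the sketch below computes degree centrality (one simple centrality measure, chosen here as an assumption; many others exist) and checks whether a set of nodes forms a clique. The toy graph and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy graph, stored as an undirected adjacency dictionary.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),  # isolated node: a candidate outlier
}

def degree_centrality(g):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}

def is_clique(g, nodes):
    """True if every pair of the given nodes is joined by an edge."""
    return all(v in g[u] for u, v in combinations(nodes, 2))

print(degree_centrality(graph))  # {'a': 0.75, 'b': 0.5, 'c': 0.5, 'd': 0.25, 'e': 0.0}
print(is_clique(graph, ["a", "b", "c"]))  # True
print(is_clique(graph, ["a", "b", "d"]))  # False: b and d are not adjacent
```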
GRAPH BASED CLUSTERING
• Graph-based clustering is a method for identifying groups of similar cells or samples.
• It makes few prior assumptions about the clusters in the data.
• In particular, the number, size, density, and shape of the clusters do not need to be known or assumed before clustering.
• Consequently, graph-based clustering is useful for identifying clusters in complex datasets such as single-cell RNA sequencing (scRNA-seq) data.
IDEA :
• Graph-based clustering uses the proximity graph:
– Start with the proximity matrix.
– Consider each point as a node in a graph.
– Each edge between two nodes has a weight equal to the proximity between the two points.
– Initially, the proximity graph is fully connected.
– Hierarchical MIN (single-link) and MAX (complete-link) clustering can be viewed as starting with this graph.
• In the simplest case, after weak (low-proximity) edges are removed, clusters are the connected components of the remaining graph.
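One simple way to go from the fully connected proximity graph to clusters, assumed here for illustration, is to keep only the edges between points whose Euclidean distance is at most a threshold, then take the connected components. The threshold value and the function names are hypothetical choices for this sketch.

```python
import math
from collections import deque

def proximity_graph(points, threshold):
    """Keep an edge (i, j) only when the two points are within `threshold`."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def connected_components(adj):
    """Label each node with the index of its connected component (BFS)."""
    labels = {}
    current = 0
    for start in adj:
        if start in labels:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in labels:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return [labels[i] for i in range(len(adj))]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(connected_components(proximity_graph(points, threshold=1.5)))  # [0, 0, 0, 1, 1]
```

With a very large threshold the graph stays fully connected and everything falls into one component, which is why some sparsification step is needed before components can act as clusters.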
GRAPH CLUSTERING IDEA :
HIERARCHICAL METHOD :
1) Determine a minimal spanning tree (MST).
2) Delete branches iteratively.
Each new connected component is a cluster.
MINIMAL SPANNING TREE :
A minimal spanning tree of a connected, weighted graph G = (V,E) is a connected subgraph without cycles that contains all nodes of G and has the minimal total edge weight among all such subgraphs.
Minimal spanning trees can be calculated with :-
• Prim’s Algorithm
- Prim's (also known as Jarník's) algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph.
- It grows a single tree from a starting vertex, at each step adding the lightest edge that connects the tree to a vertex not yet in it. The result is a subset of the edges that forms a tree including every vertex, with the total weight of the edges minimized.
• Kruskal’s Algorithm
- Kruskal's algorithm is a minimum-spanning-tree algorithm that repeatedly adds an edge of the least possible weight that connects any two trees in the forest.
- It is a greedy algorithm: for a connected weighted graph, it considers the edges in order of increasing weight and adds each edge that does not create a cycle.
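Both algorithms can be sketched in a few lines of Python. This is a minimal version assuming nodes are labelled 0..n-1, edges are given as (u, v, weight) tuples, and the graph is connected; the function names are illustrative. Prim's version uses a binary heap of candidate edges, and Kruskal's version uses a union-find structure to detect cycles.

```python
import heapq

def prim_mst(n, edges):
    """Prim's algorithm: grow one tree from node 0 (graph assumed connected)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    in_tree = [False] * n
    mst, total = [], 0
    heap = [(0, -1, 0)]  # (edge weight, tree endpoint, new node); -1 = no parent
    while heap:
        w, parent, u = heapq.heappop(heap)
        if in_tree[u]:
            continue  # stale entry: u is already in the tree
        in_tree[u] = True
        if parent != -1:
            mst.append((parent, u, w))
            total += w
        for w2, v in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (w2, u, v))
    return mst, total

def kruskal_mst(n, edges):
    """Kruskal's algorithm: add edges by increasing weight unless they close a cycle."""
    parent = list(range(n))  # union-find forest, one tree per node initially

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:         # u and v lie in different trees of the forest
            parent[ru] = rv  # merge the two trees
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]
print(prim_mst(4, edges))     # ([(0, 2, 1), (2, 1, 2), (1, 3, 5)], 8)
print(kruskal_mst(4, edges))  # ([(0, 2, 1), (1, 2, 2), (1, 3, 5)], 8)
```

The two algorithms visit the edges in a different order but return trees of the same total weight (8 here).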
Branch Deletion
Delete Branches – Different Strategies :-
I. Delete the branch with maximum weight.
II. Delete inconsistent branches (branches whose weight is much larger than the average weight of nearby branches).
III. Delete by analysis of weights.
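The hierarchical method with strategy I can be sketched as follows: build the MST of the complete graph, delete the k - 1 branches with maximum weight, and read off the remaining connected components. This assumes Euclidean distances between 2-D points and a user-chosen number of clusters k; `mst_clusters` is a hypothetical name. Cutting the MST this way gives the same partition as single-link clustering stopped at k clusters.

```python
import math

def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def mst_clusters(points, k):
    """Cluster points by building an MST and deleting its k-1 heaviest branches."""
    n = len(points)
    # Complete graph: every pair of points, weighted by Euclidean distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    # Step 1: Kruskal's algorithm builds the minimal spanning tree.
    parent = list(range(n))
    mst = []
    for w, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    # Step 2: delete the k-1 branches with maximum weight (strategy I).
    kept = sorted(mst)[: len(mst) - (k - 1)]
    # Step 3: the remaining connected components are the clusters.
    parent = list(range(n))
    for _, i, j in kept:
        parent[_find(parent, i)] = _find(parent, j)
    roots = [_find(parent, i) for i in range(n)]
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(r, len(ids)) for r in roots]

points = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (20, 0)]
print(mst_clusters(points, k=3))  # [0, 0, 0, 1, 1, 2]
```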
SUMMARY :-
In graph-based clustering, objects are represented as nodes in a complete or connected graph.
The distance between two objects is given by the weight of the corresponding branch.
Hierarchical Method :
(1) Determine a minimal spanning tree (MST).
(2) Delete branches iteratively.
Visualization of information in large datasets.
Density Based Clustering
DBSCAN :
• Density-Based Spatial Clustering of Applications with Noise.
• It is one of the most cited clustering algorithms in the literature.
Features :-
• Spatial data
(geomarketing, tomography, satellite images)
• Discovery of clusters with arbitrary shape
(spherical, drawn out, linear, elongated)
• Good efficiency on large databases
(parallel programming)
• Only two parameters (Eps and MinPts) are required.
• No prior knowledge of the number of clusters is required.
IDEA :
Clusters have a high density of points.
In regions of noise, the density is lower than in any of the clusters.
Goal :
Formalize the notions of clusters and noise.
Density based cluster : definition
• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points.
• A cluster C is a non-empty subset of D satisfying:
- Maximality: for all p, q, if p ∈ C and q is density-reachable from p, then q ∈ C.
- Connectivity: for all p, q ∈ C, p is density-connected to q.
DENSITY BASED CLUSTERING: DATA
● Two Parameters:
- Eps : maximum radius of the neighbourhood
- MinPts : minimum number of points required in the Eps-neighbourhood of a point (for it to be a core point)
● Neps(p) = {q ∈ D | dist(p,q) ≤ Eps}
Problem :
• Each cluster contains two kinds of points:
- points inside the cluster (core points)
- points on the border of the cluster (border points)
• An Eps-neighbourhood of a border point contains significantly fewer points than an Eps-neighbourhood of a core point.
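The distinction between core and border points can be made concrete with a short sketch. It assumes Euclidean distance and, as in the definition of Neps(p), counts p itself as part of its own neighbourhood; points that are neither core points nor inside a core point's neighbourhood are labelled noise. The function names are hypothetical.

```python
import math

def eps_neighbourhood(points, p, eps):
    """Neps(p): indices of all points within distance eps of point p (p included)."""
    return [q for q in range(len(points)) if math.dist(points[p], points[q]) <= eps]

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise'."""
    neighbourhoods = [eps_neighbourhood(points, p, eps) for p in range(len(points))]
    is_core = [len(nb) >= min_pts for nb in neighbourhoods]
    labels = []
    for p, nb in enumerate(neighbourhoods):
        if is_core[p]:
            labels.append("core")
        elif any(is_core[q] for q in nb):
            labels.append("border")  # not core, but inside a core point's neighbourhood
        else:
            labels.append("noise")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (9, 9)]
print(classify_points(points, eps=1.5, min_pts=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here (2, 1) has only 3 points in its Eps-neighbourhood, fewer than MinPts, but one of them, (1, 1), is a core point, so (2, 1) is a border point.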
IDEA :
For every point p in a cluster C, there is a point q ∈ C such that
1) p is inside the Eps-neighbourhood of q, and
2) Neps(q) contains at least MinPts points.
● Directly density-reachable: a point p is directly density-reachable from a point q with regard to Eps and MinPts if
1) p ∈ Neps(q) (reachability condition)
2) |Neps(q)| ≥ MinPts (core point condition)
DEFINITION :
Density-reachable:
• A point p is density-reachable from a point q wrt. Eps and MinPts if there is a chain of points p1, ..., pn with p1 = q and pn = p such that pi+1 is directly density-reachable from pi.
Density-connected:
• A point p is density-connected to a point q wrt. Eps and MinPts if there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts.
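The three relations can be checked directly with a breadth-first search over chains of core points. This sketch assumes Euclidean distance and a small in-memory list of points; the function names are hypothetical, and `density_reachable(points, p, q, ...)` reads as "p is density-reachable from q". It also shows why density-connectivity is needed: two border points are not density-reachable from each other, yet they are density-connected through a core point.

```python
import math
from collections import deque

def neighbours(points, p, eps):
    """Neps(p), including p itself."""
    return [q for q in range(len(points)) if math.dist(points[p], points[q]) <= eps]

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q: p in Neps(q) and q is a core point."""
    nq = neighbours(points, q, eps)
    return p in nq and len(nq) >= min_pts

def density_reachable(points, p, q, eps, min_pts):
    """True if a chain q = p1, ..., pn = p exists with each step directly density-reachable."""
    seen = {q}
    queue = deque([q])
    while queue:
        current = queue.popleft()
        if current == p:
            return True
        nb = neighbours(points, current, eps)
        if len(nb) < min_pts:
            continue  # a chain can only be extended from a core point
        for r in nb:
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return False

def density_connected(points, p, q, eps, min_pts):
    """True if some point o has both p and q density-reachable from it."""
    return any(
        density_reachable(points, p, o, eps, min_pts)
        and density_reachable(points, q, o, eps, min_pts)
        for o in range(len(points))
    )

# Points 4 and 6 are border points; point 5 is noise.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (9, 9), (-1, 0)]
print(density_reachable(points, 4, 6, 1.5, 4))  # False
print(density_connected(points, 4, 6, 1.5, 4))  # True
```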
DBSCAN (algorithm) :
Start with an arbitrary point p from the database and retrieve all points density-reachable from p with regard to Eps and MinPts.
If p is a core point, this procedure yields a cluster with regard to Eps and MinPts, and the point is classified.
If p is a border point, no points are density-reachable from p, and DBSCAN visits the next unclassified point in the database.
Points that are not density-reachable from any core point are classified as noise.
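The procedure above can be sketched in Python as follows. This is a minimal, unoptimized version (each neighbourhood query scans all points, so it runs in O(n²) time), assuming Euclidean distance and counting p itself in Neps(p); `region_query`, `dbscan` and the label `NOISE = -1` are illustrative names, not taken from the slides.

```python
import math
from collections import deque

NOISE = -1

def region_query(points, p, eps):
    """Neps(p): indices of points within eps of point p, including p itself."""
    return [q for q in range(len(points)) if math.dist(points[p], points[q]) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id 0, 1, ... or NOISE (-1)."""
    labels = [None] * len(points)  # None = not yet classified
    cluster_id = 0
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        neighbours = region_query(points, p, eps)
        if len(neighbours) < min_pts:
            labels[p] = NOISE  # may later be relabelled as a border point
            continue
        # p is a core point: collect everything density-reachable from p.
        labels[p] = cluster_id
        queue = deque(neighbours)
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster_id  # former noise point becomes a border point
            if labels[q] is not None:
                continue  # already classified: do not expand again
            labels[q] = cluster_id
            q_neighbours = region_query(points, q, eps)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # q is a core point: expand through it
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (9, 9), (10, 10), (10, 9), (9, 10), (30, 30)]
print(dbscan(points, eps=1.5, min_pts=4))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A point that is first marked as noise can later be relabelled as a border point when a cluster expansion reaches it. A border point that lies within Eps of core points from two different clusters keeps the label of whichever cluster reaches it first, so that part of the result can depend on the order in which points are visited.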
Density based clustering – application
CONCLUSION
Clustering is a descriptive technique.
The solution is not unique, and it strongly depends on the analyst's choices.
We described how it is possible to combine different results in order to obtain stable clusters that do not depend too much on the criteria selected to analyze the data.
Clustering always produces groups, even if the data has no group structure.
REFERENCES :
• A big help from Eric Kropat.
• Wikipedia, Google searches
