Hierarchical Methods
 While partitioning methods meet the basic clustering requirement of
organizing a set of objects into a number of exclusive groups, in some
situations we may want to partition our data into groups at different
levels, such as in a hierarchy.
 A hierarchical clustering method works by grouping data objects
into a hierarchy or “tree” of clusters.
 Representing data objects in the form of a hierarchy is useful
for data summarization and visualization.
 Multiple-phase (or multiphase) clustering methods improve the quality of
hierarchical clustering by integrating it with other clustering techniques;
BIRCH and Chameleon, discussed below, are two such methods.
I. Agglomerative versus Divisive Hierarchical Clustering:
 A hierarchical clustering method can be either agglomerative or divisive,
depending on whether the hierarchical decomposition is formed in a
bottom-up or top-down fashion.
 An agglomerative hierarchical clustering method uses a bottom-up
strategy: it starts by treating each object as its own cluster and iteratively
merges clusters until all the objects are in a single cluster, which becomes
the hierarchy's root.
 A divisive hierarchical clustering method employs a top-down strategy. It
starts by placing all objects in one cluster, which is the hierarchy's root,
and then recursively splits clusters into smaller ones.
 In a single-linkage approach, each cluster is represented by all of its
objects, and the similarity between two clusters is measured by the
similarity of the closest pair of data points belonging to different clusters.
 The cluster-splitting process repeats until, eventually, each new cluster
contains only a single object.
 A tree structure called a dendrogram is commonly used to represent the
process of hierarchical clustering.
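As a concrete illustration (not in the original slides), here is a minimal Python sketch of bottom-up single-linkage clustering using SciPy's standard hierarchical-clustering routines; the five example points are made up for demonstration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Five made-up 2-D points: two tight pairs and one outlier.
X = np.array([[0, 0], [0, 1], [4, 0], [4, 1], [10, 5]], dtype=float)

# Agglomerative clustering, bottom-up: start with each point as its own
# cluster and repeatedly merge the two closest clusters. With
# method="single", cluster distance is the distance between the
# closest pair of points from the two clusters.
Z = linkage(X, method="single")   # (n-1) x 4 merge history

dendrogram(Z)                     # the dendrogram visualizes the merge tree
plt.show()
```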
II. Distance Measures in Algorithmic Methods:
 Whether using an agglomerative method or a divisive method, a core need
is to measure the distance between two clusters, where each cluster is
generally a set of objects.
 Four measures of the distance between clusters are widely used, where
|p - p'| is the distance between two objects or points p and p', mi is the
mean of cluster Ci, and ni is the number of objects in Ci.
 They are also known as linkage measures.
 Minimum distance: dist_min(Ci, Cj) = min { |p - p'| : p ∈ Ci, p' ∈ Cj }
 Maximum distance: dist_max(Ci, Cj) = max { |p - p'| : p ∈ Ci, p' ∈ Cj }
 Mean distance: dist_mean(Ci, Cj) = |mi - mj|
 Average distance: dist_avg(Ci, Cj) = (1 / (ni nj)) Σ_{p ∈ Ci, p' ∈ Cj} |p - p'|
III. BIRCH: Multiphase Hierarchical Clustering Using
Clustering Feature Trees
 Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)
is designed for clustering a large amount of numeric data by integrating
hierarchical clustering (at the initial microclustering stage) and other
clustering methods such as iterative partitioning (at the later
macroclustering stage).
 BIRCH overcomes two difficulties in agglomerative clustering methods:
scalability, and the inability to undo what was done in a previous step.
 BIRCH uses the notions of clustering feature to summarize a cluster, and
clustering feature tree (CF-tree) to represent a cluster hierarchy.
 Consider a cluster of n d-dimensional data objects or points.
 A clustering feature is essentially a summary of the statistics for the given
cluster: CF = <n, LS, SS>, where LS is the linear sum and SS is the square
sum of the n points. The cluster's centroid x₀, radius R, and diameter D
can all be derived from its clustering feature.
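To make the summary concrete, here is a minimal Python sketch of a clustering feature; the class name and example points are illustrative, and the full CF-tree bookkeeping is omitted.

```python
import numpy as np

class ClusteringFeature:
    """CF = <n, LS, SS>: point count, linear sum, and square sum."""

    def __init__(self, n, ls, ss):
        self.n, self.ls, self.ss = n, ls, ss

    @classmethod
    def from_points(cls, points):
        points = np.asarray(points, dtype=float)
        return cls(len(points), points.sum(axis=0), (points ** 2).sum())

    def merge(self, other):
        # CFs are additive: a merged cluster's CF is the sum of the two CFs,
        # so clusters combine without rescanning the raw points.
        return ClusteringFeature(self.n + other.n,
                                 self.ls + other.ls,
                                 self.ss + other.ss)

    def centroid(self):                       # x0 = LS / n
        return self.ls / self.n

    def radius(self):
        # R = sqrt(SS/n - |x0|^2): RMS distance of the points from the centroid.
        return np.sqrt(self.ss / self.n - (self.centroid() ** 2).sum())

    def diameter(self):
        # D = sqrt((2n*SS - 2|LS|^2) / (n(n-1))): RMS pairwise distance.
        return np.sqrt((2 * self.n * self.ss - 2 * (self.ls ** 2).sum())
                       / (self.n * (self.n - 1)))

cf = ClusteringFeature.from_points([[1, 2], [3, 4], [5, 6]])
print(cf.centroid(), cf.radius(), cf.diameter())  # [3. 4.] 2.309... 4.0
```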
 Phase 1:
BIRCH scans the database to build an initial in-memory CF-tree,
which can be viewed as a multilevel compression of the data that
tries to preserve the data's inherent clustering structure.
 Phase 2:
BIRCH applies a (selected) clustering algorithm to cluster the leaf
nodes of the CF-tree, which removes sparse clusters as outliers and
groups dense clusters into larger ones.
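As a usage sketch, scikit-learn ships a BIRCH implementation that follows these two phases; the threshold, branching factor, and synthetic data below are illustrative choices, not values from the slides.

```python
import numpy as np
from sklearn.cluster import Birch

# Synthetic numeric data (BIRCH targets large numeric datasets):
# three well-separated blobs of 100 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in (0, 3, 6)])

# Phase 1 builds the in-memory CF-tree (threshold bounds each leaf
# subcluster's radius; branching_factor bounds node fan-out).
# Phase 2 runs a global clustering step over the CF leaves (n_clusters=3).
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)
print(np.bincount(labels))  # roughly 100 points per cluster
```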
IV. Chameleon: Multiphase Hierarchical Clustering Using
Dynamic Modeling
 Chameleon is a hierarchical clustering algorithm that uses dynamic modeling
to determine the similarity between pairs of clusters.
 Cluster similarity is assessed based on how well connected the objects are
within a cluster and on the proximity of the clusters.
 That is, two clusters are merged if their interconnectivity is high and they
are close together.
 Chameleon uses a k-nearest-neighbor graph approach to construct a sparse
graph; a sketch of this construction appears at the end of this section.
 Chameleon uses a graph partitioning algorithm to partition the
k-nearest-neighbor graph into a large number of relatively small
subclusters such that it minimizes the edge cut.
 Chameleon determines the similarity between each pair of
clusters Ci and Cj according to their relative interconnectivity,
RI(Ci, Cj), and their relative closeness, RC(Ci, Cj).
 The processing cost for high-dimensional data may require
O(n²) time for n objects in the worst case.
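As referenced above, the first step, building the sparse k-nearest-neighbor graph, can be sketched in a few lines of Python; the brute-force distance matrix and the choice k=2 are illustrative (real implementations use spatial indexes to scale).

```python
import numpy as np

def knn_graph(points, k):
    """Sparse k-nearest-neighbor graph: each point becomes a vertex
    connected to its k closest neighbors (undirected edge set)."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbor
    edges = set()
    for i in range(len(points)):
        for j in np.argsort(dist[i])[:k]:   # indices of the k nearest points
            edges.add(tuple(sorted((i, int(j)))))
    return edges

pts = np.random.default_rng(0).normal(size=(10, 2))
print(sorted(knn_graph(pts, k=2)))  # the edges over which the edge cut is taken
```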
V. Probabilistic Hierarchical Clustering
 Algorithmic hierarchical clustering methods using linkage measures tend
to be easy to understand and are often efficient in clustering.
 They are commonly used in many cluster analysis applications.
 However, algorithmic hierarchical clustering methods can suffer from
several drawbacks: for example, choosing a good distance measure is often
nontrivial, and the optimization goal of the heuristic merging or splitting
process is not made explicit.
 One way to look at the clustering problem is to regard the set of data
objects to be clustered as a sample of the underlying data generation
mechanism to be analyzed or, formally, the generative model. Under this
view, two clusters should be merged only if doing so increases the
likelihood of the data under the model.
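A minimal sketch of this generative view, assuming Gaussian clusters fit by maximum likelihood; the function names and the diagonal-covariance simplification are illustrative. Two clusters are worth merging when the merged cluster explains the data at least as well as the two separate ones.

```python
import numpy as np

def gaussian_loglik(points):
    """Log-likelihood of points under one maximum-likelihood Gaussian
    (diagonal covariance for simplicity)."""
    points = np.asarray(points, dtype=float)
    mu = points.mean(axis=0)
    var = points.var(axis=0) + 1e-9               # guard against zero variance
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (points - mu) ** 2 / var)

def merge_gain(Ci, Cj):
    """Quality change from merging Ci and Cj:
    L(Ci U Cj) - (L(Ci) + L(Cj)). A probabilistic agglomerative step
    merges the pair with the largest gain and stops when no gain is positive."""
    merged = np.vstack([Ci, Cj])
    return gaussian_loglik(merged) - (gaussian_loglik(Ci) + gaussian_loglik(Cj))

# Two nearby made-up clusters: merging them may or may not pay off.
A = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0]]
B = [[0.15, 0.05], [0.25, 0.1], [0.05, 0.15]]
print(merge_gain(A, B))
```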