Clustering
Ms. Rashmi Bhat
What is Clustering??
▪ Grouping of objects
How will you group these together??
What is Clustering??
Option 1: By Type Option 2: By Color
What is Clustering??
Option 3: By Shape
What is Cluster Analysis??
▪ A cluster is a collection of data objects that are similar to one another
within the same cluster and are dissimilar to the objects in other clusters.
▪ Cluster analysis has focused mainly on distance-based methods.
The process of grouping a set of physical or abstract objects into classes of
similar objects is called Clustering.
What is Cluster Analysis??
▪ How clustering differs from classification???
What is Cluster Analysis??
▪ Clustering is also called data segmentation
▪ Clustering is finding borders between groups,
▪ Segmenting is using borders to form groups
▪ Clustering is the method of creating segments.
▪ Clustering can also be used for outlier detection
What is Cluster Analysis??
▪ Classification: Supervised Learning
▪ Classes are predetermined
▪ Based on training data set
▪ Used to classify future observations
▪ Clustering : Unsupervised Learning
▪ Classes are not known in advance
▪ No prior knowledge
▪ Used to explore (understand) the data
▪ Clustering is a form of learning by observation, rather than learning by
examples.
Applications of Clustering
▪ Marketing:
▪ Segmentation of the customer based on behavior
▪ Banking:
▪ ATM Fraud detection (outlier detection)
▪ Gene analysis:
▪ Identifying genes responsible for a disease
▪ Image processing:
▪ Identifying objects on an image (face detection)
▪ Houses:
▪ Identifying groups of houses according to their house type, value, and geographical location
Requirements of Clustering Analysis
▪ The following are typical requirements of clustering in data mining:
▪ Scalability
▪ Dealing with different types of attributes
▪ Discovering clusters with arbitrary shapes
▪ Ability to deal with noisy data
▪ Minimal requirements for domain knowledge to determine input parameters
▪ Incremental clustering
▪ High dimensionality
▪ Constraint-based clustering
▪ Interpretability and usability
Distance Measures
▪ Cluster analysis has focused mainly on distance-based methods.
▪ Distance is a quantitative measure of how far apart two objects are.
▪ A similarity measure quantifies how much alike two data objects are.
▪ If the distance is small, the objects have a high degree of similarity,
whereas a large distance indicates a low degree of similarity.
▪ Generally, similarity is measured in the range [0, 1].
▪ Similarity = 1 if X = Y (where X and Y are two objects)
▪ Similarity = 0 if X ≠ Y
Distance Measures
• Euclidean Distance
• Manhattan Distance
• Minkowski Distance
• Cosine Similarity
• Jaccard Similarity
Distance Measures: Euclidean Distance

D(X, Y) = √((x₂ − x₁)² + (y₂ − y₁)²)

• The Euclidean distance between two points is the length of the
path connecting them.
• The Pythagorean theorem gives this distance between two points.
Distance Measures: Manhattan Distance

D(A, B) = |x₂ − x₁| + |y₂ − y₁|

• Manhattan distance is a metric in which the distance between
two points is calculated as the sum of the absolute differences
of their Cartesian coordinates.
• It is the total sum of the absolute differences between the
x-coordinates and the y-coordinates.
Distance Measures: Minkowski Distance

D(X, Y) = ( Σ_{i=1}^{n} |xᵢ − yᵢ|ᵖ )^(1/p)

• It is the generalized form of the Euclidean and Manhattan distance
measures: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.
Distance Measures: Cosine Similarity

• The cosine similarity metric finds the normalized dot
product of the two attribute vectors:
cos(θ) = (A · B) / (‖A‖ ‖B‖)
• By determining the cosine similarity, we effectively
find the cosine of the angle between the two objects.
• The cosine of 0° is 1, and it is less than 1 for any other
angle.
Distance Measures: Jaccard Similarity

• For Jaccard similarity, the objects are treated as sets.

Jaccard Similarity J(A, B) = |A ∩ B| / |A ∪ B|

Example: if |A ∪ B| = 7 and |A ∩ B| = 2, then J(A, B) = 2/7 = 0.286
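To make these measures concrete, here is a small Python sketch of all five (a rough illustration with my own function names, not a library implementation):

```python
# Rough sketches of the distance and similarity measures discussed above.
from math import sqrt

def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norms = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norms

def jaccard_similarity(a, b):                      # a and b are sets
    return len(a & b) / len(a | b)

print(euclidean((2, 5), (2, 1)))                   # 4.0
print(manhattan((2, 5), (2, 1)))                   # 4
print(minkowski((2, 5), (2, 1), p=2))              # p = 2 reproduces Euclidean distance
print(cosine_similarity((1, 2, 3), (2, 4, 6)))     # parallel vectors give 1.0
print(jaccard_similarity({1, 2, 3, 4}, {3, 4, 5, 6, 7}))   # 2/7 ≈ 0.286
```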
Clustering Techniques
▪ Clustering techniques are categorized into the following categories:
Partitioning Methods
Hierarchical Methods
Density-based Methods
Grid-based Methods
Model-based Methods
Partitioning Method
▪ Construct a partition of a database 𝑫 of 𝒏 objects into 𝒌 clusters
▪ each cluster contains at least one object
▪ each object belongs to exactly one cluster
▪ Given 𝒌, find a partition of 𝒌 clusters that optimizes the chosen
partitioning criterion (e.g., minimum distance from cluster centers)
▪ Global optimum: exhaustively enumerating all partitions requires examining
Stirling(n, k) candidates (S(10,3) = 9,330; S(20,3) = 580,606,446; …)
▪ Heuristic methods: the k-means and k-medoids algorithms
▪ k-means: Each cluster is represented by the center (mean) of the cluster.
▪ k-medoids or PAM (Partitioning Around Medoids): Each cluster is represented by one of
the objects in the cluster.
𝑘-means Clustering
Input:
𝒌, the number of clusters, and a database 𝑫 of 𝒏 objects.
Output:
A set of 𝒌 clusters that minimizes the squared-error function.
Algorithm:
1. Arbitrarily choose 𝒌 objects from 𝑫 as the initial cluster centers;
2. Repeat
1. (Re)assign each object to the cluster to which the object is the most similar, based on
the mean value of the objects in the cluster;
2. Update the cluster means, i.e., calculate the mean value of the objects for each cluster;
3. Until no change;
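As a rough Python sketch of this loop (the function and variable names are mine, not from the slides), the algorithm might look like this:

```python
# Minimal k-means sketch: assign points to the nearest mean, then recompute the means.
from math import dist
import random

def k_means(points, k, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)                    # arbitrary initial centers
    while True:
        # Assignment step: each point goes to its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: each center becomes the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:                        # stop when nothing changes
            return centers, clusters
        centers = new_centers

points = [(2, 5), (2, 1), (7, 1), (3, 5), (4, 4), (6, 2), (1, 2), (6, 1), (3, 4), (2, 3)]
centers, clusters = k_means(points, k=3)
print(centers)
```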
𝑘-means Clustering
Example: Cluster the following data example into 3 clusters using k-means clustering and Euclidean
distance
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
𝑘-means Clustering
1. Arbitrarily choose 3 points as the initial cluster centers
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
C1 = (2,1)
C2 = (4,4)
C3 = (2,3)
𝑘-means Clustering
2. Assign each point to its closest cluster center. Calculate the distance of the point from each
cluster center and choose the closest one.
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
Euclidean distance: D = √((x₂ − x₁)² + (y₂ − y₁)²)

Cluster centers: C1 = (2,1), C2 = (4,4), C3 = (2,3)

For P1 = (2,5):
D(P1, C1) = √((2 − 2)² + (1 − 5)²) = √16 = 4
D(P1, C2) = √((4 − 2)² + (4 − 5)²) = √5 = 2.236
D(P1, C3) = √((2 − 2)² + (3 − 5)²) = √4 = 2
P1 is closest to C3, so: Cluster1 = { }, Cluster2 = { }, Cluster3 = {(2,5)}

For P2 = (2,1):
D(P2, C1) = √((2 − 2)² + (1 − 1)²) = 0
D(P2, C2) = √((4 − 2)² + (4 − 1)²) = √13 = 3.605
D(P2, C3) = √((2 − 2)² + (3 − 1)²) = √4 = 2
P2 is closest to C1, so: Cluster1 = {(2,1)}, Cluster2 = { }, Cluster3 = {(2,5)}

Similarly, assign the other points to the appropriate clusters (a short sketch automating this assignment step follows below).
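For instance, a small sketch like the following (my own helper code, shown only to mirror the hand calculation) reproduces the assignment step for the initial centers; ties go to the first, lowest-numbered center:

```python
# Assignment step for the first iteration, using the initial centers from the example.
from math import dist

points  = [(2, 5), (2, 1), (7, 1), (3, 5), (4, 4), (6, 2), (1, 2), (6, 1), (3, 4), (2, 3)]
centers = [(2, 1), (4, 4), (2, 3)]     # C1, C2, C3

for i, p in enumerate(points, start=1):
    distances = [dist(p, c) for c in centers]          # Euclidean distance to each center
    nearest = distances.index(min(distances)) + 1      # index of the closest center
    print(f"P{i} {p}: {[round(d, 3) for d in distances]} -> Cluster{nearest}")
```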
𝑘-means Clustering
2. Assign each point to its closest cluster center. Calculate the distance of the point from each
cluster center and choose the closest one.
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
C1 = (2,1)
C2 = (4,4)
C3 = (2,3)
Assigning the points one by one (intermediate states):
Cluster1 = { }, Cluster2 = { }, Cluster3 = {(2,5)}
Cluster1 = {(2,1)}, Cluster2 = { }, Cluster3 = {(2,5)}
Cluster1 = {(2,1)}, Cluster2 = {(7,1)}, Cluster3 = {(2,5)}
Cluster1 = {(2,1)}, Cluster2 = {(4,4), (7,1)}, Cluster3 = {(2,5)}
Cluster1 = {(2,1)}, Cluster2 = {(4,4), (7,1), (3,5)}, Cluster3 = {(2,5)}
Final assignment:
Cluster1 = {(2,1), (1,2)}
Cluster2 = {(4,4), (7,1), (3,5), (6,2), (6,1), (3,4)}
Cluster3 = {(2,3), (2,5)}
𝑘-means Clustering
3. Update the cluster means
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
Old Cluster Centers:
C1 = (2,1)
C2 = (4,4)
C3 = (2,3)
Clusters:
Cluster1 = {(2,1), (1,2)}
Cluster2 = {(4,4), (7,1), (3,5), (6,2), (6,1), (3,4)}
Cluster3 = {(2,3), (2,5)}
Calculate the mean of the points in each cluster:
mean1 = ((2 + 1)/2, (1 + 2)/2) = (1.5, 1.5)
mean2 = ((4 + 7 + 3 + 6 + 6 + 3)/6, (4 + 1 + 5 + 2 + 1 + 4)/6) = (4.83, 2.83)
mean3 = ((2 + 2)/2, (3 + 5)/2) = (2, 4)
New Cluster Centers:
C1 = (1.5, 1.5)
C2 = (4.83, 2.83)
C3 = (2, 4)
𝑘-means Clustering
2. Repeat: Assign each point to its closest cluster center. Calculate the distance of the point from
each cluster center and choose the closest one.
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
Updated Cluster Centers: C1 = (1.5, 1.5), C2 = (4.83, 2.83), C3 = (2, 4)

For P1 = (2,5):
D(P1, C1) = √((1.5 − 2)² + (1.5 − 5)²) = 3.535
D(P1, C2) = √((4.83 − 2)² + (2.83 − 5)²) = 3.566
D(P1, C3) = √((2 − 2)² + (4 − 5)²) = 1
P1 is closest to C3, so: Cluster1 = { }, Cluster2 = { }, Cluster3 = {(2,5)}

For P2 = (2,1):
D(P2, C1) = √((1.5 − 2)² + (1.5 − 1)²) = 0.707
D(P2, C2) = √((4.83 − 2)² + (2.83 − 1)²) = 3.370
D(P2, C3) = √((2 − 2)² + (4 − 1)²) = 3
P2 is closest to C1, so: Cluster1 = {(2,1)}, Cluster2 = { }, Cluster3 = {(2,5)}

Similarly, assign the other points to the appropriate clusters.
𝑘-means Clustering
2. Repeat: Assign each point to its closest cluster center. Calculate the distance of the point from
each cluster center and choose the closest one.
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
Updated Cluster Centers:
C1 = (1.5, 1.5)
C2 = (4.83, 2.83)
C3 = (2, 4)
Updated Clusters:
Cluster1 = {(2,1), (1,2)}
Cluster2 = {(7,1), (4,4), (6,2), (6,1)}
Cluster3 = {(3,5), (2,5), (3,4), (2,3)}
𝑘-means Clustering
2. Repeat: Assign each point to its closest cluster center. Calculate the distance of the point from
each cluster center and choose the closest one.
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
Old Cluster Centers:
C1 = (1.5, 1.5)
C2 = (4.83, 2.83)
C3 = (2, 4)
Updated Clusters
Cluster1 = {(2,1), (1,2) }
Cluster2 = {(7,1), (4,4), (6,2), (6,1)}
Cluster3 = {(3,5), (2,5), (3,4), (2,3)}
3. Update the cluster centers. Repeat the process until there is no
change in the clusters.
New Cluster Centers:
C1 = (1.5, 1.5)
C2 = (5.75, 2)
C3 = (2.5, 4.25)
𝑘-means Clustering
2. Repeat: Assign each point to its closest cluster center. Calculate the distance of the point from
each cluster center and choose the closest one.
Point X Y
P1 2 5
P2 2 1
P3 7 1
P4 3 5
P5 4 4
P6 6 2
P7 1 2
P8 6 1
P9 3 4
P10 2 3
Old Cluster Centers:
C1 = (1.5, 1.5)
C2 = (5.75, 2)
C3 = (2.5, 4.25)
Updated Clusters
Cluster1 = {(2,1), (1,2) }
Cluster2 = {(7,1), (6,2), (6,1)}
Cluster3 = {(3,5), (2,5), (4,4), (3,4), (2,3)}
3. Update the cluster centers. Repeat the process until there is no
change in the clusters.
New Cluster Centers:
C1 = (1.5, 1.5)
C2 = (6.33, 1.33)
C3 = (2.8, 4.2)
𝑘-means Clustering
Apply the k-means algorithm to the following data set, forming two
clusters.
D={15, 16, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65}
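One way to check your hand-worked answer to this exercise is to run an off-the-shelf implementation; a sketch using scikit-learn (assuming it is installed) could look like this:

```python
# Sketch: running k-means (k = 2) on the 1-D exercise data with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

data = np.array([15, 16, 19, 20, 20, 21, 22, 28, 35, 40,
                 41, 42, 43, 44, 60, 61, 65], dtype=float).reshape(-1, 1)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
for label in range(2):
    members = data[km.labels_ == label].ravel()
    print(f"Cluster {label + 1}: {members.tolist()}  (mean = {members.mean():.2f})")
```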
𝑘-means Clustering
▪ Advantages:
▪ Relatively scalable and efficient in processing large data sets
▪ The computational complexity of the algorithm is O(nkt)
▪ where 𝑛 is the total number of objects, 𝑘 is the number of clusters, and 𝑡 is the number of iterations
▪ This method terminates at a local optimum.
▪ Disadvantages:
▪ Can be applied only when the mean of a cluster is defined
▪ The necessity for users to specify 𝑘, the number of clusters, in advance.
▪ Sensitive to noise and outlier data points
𝑘-means Clustering
▪ How to cluster categorical data?
▪ A variant of 𝑘-means, the 𝑘-modes method, is used for clustering categorical data
▪ Replace the mean of a cluster with the mode of its data
▪ A new dissimilarity measure is used to deal with categorical objects
▪ A frequency-based method is used to update the modes of clusters (see the sketch below).
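As a rough Python illustration of these ideas (the dissimilarity used here, simple matching, i.e. the count of differing attributes, is an assumption on my part, as is all naming and the toy data), a k-modes style loop might look like this:

```python
# Sketch of the k-modes idea: matching dissimilarity + per-attribute modes.
from collections import Counter
import random

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(cluster):
    """Per-attribute mode of a list of categorical objects (tuples)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))

def k_modes(objects, k, max_iter=100, seed=0):
    random.seed(seed)
    modes = random.sample(objects, k)           # initial modes: k random objects
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for obj in objects:                     # assign each object to the nearest mode
            j = min(range(k), key=lambda c: matching_dissimilarity(obj, modes[c]))
            clusters[j].append(obj)
        new_modes = [mode_of(c) if c else modes[j] for j, c in enumerate(clusters)]
        if new_modes == modes:                  # stop when the modes no longer change
            break
        modes = new_modes
    return modes, clusters

# Hypothetical categorical data: (color, shape)
data = [("red", "circle"), ("red", "square"), ("blue", "circle"),
        ("blue", "triangle"), ("blue", "square"), ("red", "triangle")]
modes, clusters = k_modes(data, k=2)
print(modes, clusters)
```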
𝑘-Medoids Clustering
▪ Picks actual objects to represent the clusters, using one representative object per
cluster
▪ Each remaining object is clustered with the representative object to which it is the
most similar.
▪ The partitioning is then performed based on the principle of minimizing the
sum of the dissimilarities between each object and its corresponding reference
point
▪ The Absolute Error criterion is used:

E = Σ_{j=1}^{k} Σ_{p∈Cj} dist(p, Oj)   . . . . sum of absolute error

where
• p is the point in space representing a given object in cluster Cj
• Oj is the representative object of cluster Cj
𝑘-Medoids Clustering
▪ The iterative process of replacing representative objects by nonrepresentative objects
continues as long as the quality of the resulting clustering is improved.
▪ Quality is measured by a cost function that measures the average dissimilarity between an
object and the representative object of its cluster.
▪ Four cases are examined for each of the nonrepresentative objects, 𝑝.
▪ Suppose object 𝒑 is currently assigned to a cluster represented by medoid 𝑶𝒋
Fig. The four cases of the cost function for k-medoids clustering: reassignment of 𝒑 among 𝑶𝒊, 𝑶𝒋, and 𝑶𝒓𝒂𝒏𝒅𝒐𝒎 (Case 1 to Case 4, before and after swapping)
𝑘-Medoids Clustering
▪ Each time a reassignment occurs, a difference in absolute error, 𝐸, is
contributed to the cost function.
▪ Therefore, the cost function calculates the difference in absolute-error value if
a current representative object is replaced by a nonrepresentative object.
▪ The total cost of swapping is the sum of costs incurred by all nonrepresentative
objects.
▪ If the total cost is negative, then 𝑂𝑗 is replaced or swapped with 𝑂𝑟𝑎𝑛𝑑𝑜𝑚
▪ If the total cost is positive, the current representative object, 𝑂𝑗, is considered acceptable, and
nothing is changed.
▪ PAM (Partitioning Around Medoids) was one of the first k-medoids algorithms
𝑘-Medoids Clustering
Input: 𝑘 number of clusters, 𝑛 data objects from data set 𝐷
Output: a set of 𝑘 clusters
Algorithm:
1. Arbitrarily select 𝑘 objects as the representative objects or seeds
2. Repeat
1. Assign each remaining object to the cluster with the nearest representative object
2. Randomly select a non-representative object 𝑂𝑟𝑎𝑛𝑑𝑜𝑚
3. Compute the total cost 𝑆 of swapping a representative object 𝑂𝑗 with 𝑂𝑟𝑎𝑛𝑑𝑜𝑚
4. If 𝑆 < 0, then swap 𝑂𝑗 with 𝑂𝑟𝑎𝑛𝑑𝑜𝑚 to form the new set of 𝑘 representative objects
3. Until no change
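Below is a simplified k-medoids sketch in Python; it uses Manhattan distance (as in the worked example that follows) and greedily tries every possible swap instead of the single random swap per iteration described above, so treat it as an illustration of the idea rather than the exact PAM procedure. All names are my own.

```python
# Simplified PAM-style k-medoids sketch (Manhattan distance, greedy swap search).
import random

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def total_cost(objects, medoids):
    """Sum of distances from each object to its nearest medoid (absolute error E)."""
    return sum(min(manhattan(o, m) for m in medoids) for o in objects)

def k_medoids(objects, k, max_iter=100, seed=0):
    random.seed(seed)
    medoids = random.sample(objects, k)                 # arbitrary initial medoids
    best = total_cost(objects, medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for o in objects:                           # try swapping medoid i with a non-medoid o
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(objects, candidate)
                if cost < best:                         # S = cost - best < 0 -> accept the swap
                    medoids, best = candidate, cost
                    improved = True
        if not improved:
            break
    clusters = {m: [] for m in medoids}
    for o in objects:
        nearest = min(medoids, key=lambda m: manhattan(o, m))
        clusters[nearest].append(o)
    return medoids, clusters

# The ten objects from the example that follows:
data = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2), (6, 4), (7, 3), (7, 4), (8, 5), (7, 6)]
medoids, clusters = k_medoids(data, k=2)
print("Medoids:", medoids, "E =", total_cost(data, medoids))
```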
𝑘-Medoids Clustering
X Y
O1 2 6
O2 3 4
O3 3 8
O4 4 7
O5 6 2
O6 6 4
O7 7 3
O8 7 4
O9 8 5
O10 7 6
Data Objects
Aim: Create two Clusters
Step 1:
Choose randomly two medoids
(representative objects)
𝑂3 = (3, 8)
𝑂8 = (7,4)
𝑘-Medoids Clustering
X Y Cluster
O1 2 6
O2 3 4
O3 3 8
O4 4 7
O5 6 2
O6 6 4
O7 7 3
O8 7 4
O9 8 5
O10 7 6
Data Objects
Aim: Create two Clusters
Step 2:
Assign each object to the closest
representative object
Using Euclidean distance, we
form the following clusters
𝑘-Medoids Clustering
Data Objects
Aim: Create two Clusters
X Y Cluster
O1 2 6 C1
O2 3 4 C1
O3 3 8 C1
O4 4 7 C1
O5 6 2 C2
O6 6 4 C2
O7 7 3 C2
O8 7 4 C2
O9 8 5 C2
O10 7 6 C2
Step 2:
Assign each object to the closest
representative object
Using Euclidean distance, we
form the following clusters
C1={O1, O2, O3, O4}
C2={O5, O6, O7, O8, O9, O10}
𝑘-Medoids Clustering
Data Objects
Aim: Create two Clusters
X Y Cluster
O1 2 6 C1
O2 3 4 C1
O3 3 8 C1
O4 4 7 C1
O5 6 2 C2
O6 6 4 C2
O7 7 3 C2
O8 7 4 C2
O9 8 5 C2
O10 7 6 C2
Step 3:
Compute the absolute error (for the set of representative objects 𝑂3 and 𝑂8)
E = Σ_{j=1}^{k} Σ_{p∈Cj} |p − Oj|

E = |O1 − O3| + |O2 − O3| + |O3 − O3| + |O4 − O3|
+ |O5 − O8| + |O6 − O8| + |O7 − O8| + |O8 − O8| + |O9 − O8| + |O10 − O8|

where |O1 − O3| = |x1 − x3| + |y1 − y3| . . . . Manhattan Distance
𝑘-Medoids Clustering
Data Objects
Aim: Create two Clusters
X Y Cluster
O1 2 6 C1
O2 3 4 C1
O3 3 8 C1
O4 4 7 C1
O5 6 2 C2
O6 6 4 C2
O7 7 3 C2
O8 7 4 C2
O9 8 5 C2
O10 7 6 C2
Step 3:
Compute the absolute error (for the set of representative objects 𝑂3 and 𝑂8)
E = Σ_{j=1}^{k} Σ_{p∈Cj} |p − Oj|

E = |O1 − O3| + |O2 − O3| + |O3 − O3| + |O4 − O3|
+ |O5 − O8| + |O6 − O8| + |O7 − O8| + |O8 − O8| + |O9 − O8| + |O10 − O8|

E = (3 + 4 + 0 + 2) + (3 + 1 + 1 + 0 + 2 + 2)

E = 18
𝑘-Medoids Clustering
Data Objects
Aim: Create two Clusters
X Y Cluster
O1 2 6 C1
O2 3 4 C1
O3 3 8 C1
O4 4 7 C1
O5 6 2 C2
O6 6 4 C2
O7 7 3 C2
O8 7 4 C2
O9 8 5 C2
O10 7 6 C2
Step 4:
Choose a random non-representative object, O9
Consider swapping O8 with O9
Compute the absolute error (for
the set of representative objects
O3 and O9)
𝑘-Medoids Clustering
Data Objects
Aim: Create two Clusters
X Y Cluster
O1 2 6 C1
O2 3 4 C1
O3 3 8 C1
O4 4 7 C1
O5 6 2 C2
O6 6 4 C2
O7 7 3 C2
O8 7 4 C2
O9 8 5 C2
O10 7 6 C2
E = |O1 − O3| + |O2 − O3| + |O3 − O3| + |O4 − O3|
+ |O5 − O9| + |O6 − O9| + |O7 − O9| + |O8 − O9| + |O9 − O9| + |O10 − O9|

E = (3 + 4 + 0 + 2) + (5 + 3 + 3 + 2 + 0 + 2)

E = 24
Step 5:
Compute the cost of the swap
S = (Absolute Error for O3, O9) − (Absolute Error for O3, O8)
S = 24 − 18 = 6
Since S > 0, swapping O8 with O9 would increase the total error, so the swap is rejected and O8 is kept as a medoid (a short check of these numbers follows below).
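As a quick check of these numbers, a few lines of Python (my own helper code) recompute the two absolute errors with Manhattan distance:

```python
# Recomputing the absolute error E for the two candidate medoid sets (Manhattan distance).
objects = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2),
           (6, 4), (7, 3), (7, 4), (8, 5), (7, 6)]   # O1 .. O10

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def absolute_error(medoids):
    return sum(min(manhattan(o, m) for m in medoids) for o in objects)

O3, O8, O9 = objects[2], objects[7], objects[8]
e_before = absolute_error([O3, O8])   # 18
e_after  = absolute_error([O3, O9])   # 24
S = e_after - e_before                # cost of swapping O8 with O9
print(e_before, e_after, S)           # 18 24 6 -> S > 0, so the swap is rejected
```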
𝑘-Medoids Clustering
Data Objects
Aim: Create two Clusters
X Y Cluster
O1 2 6
O2 3 4
O3 3 8
O4 4 7
O5 6 2
O6 6 4
O7 7 3
O8 7 4
O9 8 5
O10 7 6
Step 6:
Since the swap was rejected, the medoids remain O3 and O8.
Repeat from Step 2 with another randomly selected
non-representative object: assign each object to the
closest representative object, and stop when no swap
reduces the total cost.
X Y Cluster
O1 2 6 C1
O2 3 4 C1
O3 3 8 C1
O4 4 7 C1
O5 6 2 C2
O6 6 4 C2
O7 7 3 C2
O8 7 4 C2
O9 8 5 C2
O10 7 6 C2
𝑘-Medoids Clustering
▪ Which method is more robust, 𝑘-Means or 𝑘-Medoids?
▪ The k-medoids method is more robust than k-means in the presence of noise and outliers,
because a medoid is less influenced by outliers or other extreme values than a mean.
▪ The processing of 𝑘-Medoids is more costly than the k-means method.
Hierarchical Clustering
▪ Groups data objects into a tree of clusters.
▪ Hierarchical clustering methods are of two types: Agglomerative and Divisive.
Hierarchical Clustering
▪ Agglomerative Hierarchical Clustering
▪ Starts by placing each object in its own cluster
▪ Merges these atomic clusters into larger and larger clusters
▪ It will halt when all of the objects are in a single cluster or until certain termination
conditions are satisfied.
▪ Bottom-Up Strategy.
▪ The user can specify the desired number of clusters as a termination condition.
Hierarchical Clustering
Application of Agglomerative NESting (AGNES) hierarchical clustering to objects {A, B, C, D, E, F, G}:
Step 0: each object is its own cluster: {A}, {B}, {C}, {D}, {E}, {F}, {G}
Step 1: {A,B} and {C,D} are formed
Step 2: {A,B,F} and {C,D,E} are formed
Step 3: {C,D,E,G} is formed
Step 4: {A,B,F,C,D,E,G}: all objects end up in a single cluster
Hierarchical Clustering
▪ Divisive Hierarchical Clustering Method
▪ Starting with all objects in one cluster.
▪ Subdivides the cluster into smaller and smaller pieces.
▪ It will halt when each object forms a cluster on its own or until it satisfies certain termination
conditions
▪ Top-Down Strategy
▪ The user can specify the desired number of clusters as a termination condition.
Hierarchical Clustering
Application of DIvisive ANAlysis (DIANA) hierarchical clustering to the same objects {A, B, C, D, E, F, G}:
Step 0: all objects start in a single cluster {A,B,F,C,D,E,G}
Steps 1 to 3: the cluster is repeatedly split (first into {A,B,F} and {C,D,E,G}, then further)
Step 4: each object ends up in its own cluster
(the splits mirror the AGNES merges above, read in reverse)
Hierarchical Clustering
▪ A tree structure called a dendrogram is used to represent the process of
hierarchical clustering.
Fig. Dendrogram representation for hierarchical clustering of data objects {a, b, c, d, e}
Hierarchical Clustering
▪ Four widely used measures for distance between clusters
▪ |p − p′| is the distance between two objects p and p′.
▪ mᵢ is the mean of cluster Cᵢ
▪ nᵢ is the number of objects in cluster Cᵢ.
Minimum distance:  d_min(Cᵢ, Cⱼ) = min_{p∈Cᵢ, p′∈Cⱼ} |p − p′|
Maximum distance:  d_max(Cᵢ, Cⱼ) = max_{p∈Cᵢ, p′∈Cⱼ} |p − p′|
Mean distance:     d_mean(Cᵢ, Cⱼ) = |mᵢ − mⱼ|
Average distance:  d_avg(Cᵢ, Cⱼ) = (1 / (nᵢ nⱼ)) Σ_{p∈Cᵢ} Σ_{p′∈Cⱼ} |p − p′|
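A small Python sketch of these four inter-cluster measures, for 2-D points and with my own helper names, might look like this (Euclidean distance is assumed for |p − p′|):

```python
# Sketch: the four inter-cluster distance measures for 2-D points.
from math import dist          # Euclidean distance between two points (Python 3.8+)
from itertools import product

def d_min(ci, cj):
    return min(dist(p, q) for p, q in product(ci, cj))

def d_max(ci, cj):
    return max(dist(p, q) for p, q in product(ci, cj))

def d_mean(ci, cj):
    mi = tuple(sum(x) / len(ci) for x in zip(*ci))   # mean of cluster Ci
    mj = tuple(sum(x) / len(cj) for x in zip(*cj))   # mean of cluster Cj
    return dist(mi, mj)

def d_avg(ci, cj):
    return sum(dist(p, q) for p, q in product(ci, cj)) / (len(ci) * len(cj))

Ci = [(2, 5), (3, 5), (3, 4)]     # hypothetical clusters
Cj = [(6, 1), (7, 1), (6, 2)]
print(d_min(Ci, Cj), d_max(Ci, Cj), d_mean(Ci, Cj), d_avg(Ci, Cj))
```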
Hierarchical Clustering
▪ If an algorithm uses minimum distance measure, an algorithm is called a
nearest-neighbor clustering algorithm.
▪If the clustering process is terminated when the minimum distance between
nearest clusters exceeds an arbitrary threshold, it is called a single-linkage
algorithm.
▪ If an algorithm uses maximum distance measure, an algorithm is called a
farthest-neighbor clustering algorithm.
▪ If the clustering process is terminated when the maximum distance between
nearest clusters exceeds an arbitrary threshold, it is called a complete-
linkage algorithm.
▪ An agglomerative hierarchical clustering algorithm that uses the minimum
distance measure is also called a minimal spanning tree algorithm.
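To see single linkage (minimum distance) and complete linkage (maximum distance) in practice, a short sketch using SciPy (assuming it is installed; the three-cluster cut is an arbitrary choice for illustration) could be:

```python
# Sketch: agglomerative clustering with single vs. complete linkage using SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[2, 5], [2, 1], [7, 1], [3, 5], [4, 4],
                   [6, 2], [1, 2], [6, 1], [3, 4], [2, 3]], dtype=float)

# 'single' uses the minimum (nearest-neighbor) distance between clusters,
# 'complete' uses the maximum (farthest-neighbor) distance.
for method in ("single", "complete"):
    Z = linkage(points, method=method)                  # the merge history (dendrogram data)
    labels = fcluster(Z, t=3, criterion="maxclust")     # cut the tree into 3 clusters
    print(method, labels)
```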