Clustering part 1

Abdul Kawsar Tushar
Nadeem Ahmed
CSE, UAP

What Is Clustering?
• Visualization of data
• Hypothesis generation

Overview of Clustering
• Feature Selection
• Feature Extraction
  • Transformations of the input features to produce new salient features
• Inter-pattern Similarity
• Grouping

Formal Definition
• Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure.

Notion of a Cluster can be Ambiguous
How many clusters?
[Figure: panels titled Two Clusters, Four Clusters, and Six Clusters]

Hierarchical Clustering: Example

Hierarchical Clustering: Example Using Single Linkage

Hierarchical Clustering: Forming Clusters
• Forming clusters from dendrograms
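
Forming clusters from a dendrogram amounts to cutting it at a chosen height. The sketch below is a minimal illustration, not the slides' own implementation: it merges clusters bottom-up with single linkage, records the height of each merge, and then cuts at a threshold. The function names (`agglomerate`, `cut`) and the five sample points are hypothetical, chosen only for illustration.

```python
from math import dist

def single_link(a, b):
    """Smallest pairwise distance between clusters a and b."""
    return min(dist(p, q) for p in a for q in b)

def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns a log of (height, clusters_after_merge) entries: the same
    information a dendrogram draws as a horizontal bar at that height.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-link distance.
        height, i, j = min(
            (single_link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((height, [sorted(c) for c in clusters]))
    return merges

def cut(points, merges, threshold):
    """Clusters obtained by cutting the dendrogram at `threshold`."""
    result = [[p] for p in points]
    for height, clusters in merges:
        if height > threshold:  # single-link heights never decrease
            break
        result = clusters
    return sorted(result)

# Hypothetical sample data: two tight pairs and one distant point.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 10)]
merges = agglomerate(points)
print(cut(points, merges, 2.0))  # three clusters
print(cut(points, merges, 6.0))  # two clusters
```

Raising the threshold yields fewer, larger clusters; choosing that level is the judgment call a dendrogram leaves to the analyst.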

Hierarchical Clustering
• Advantages
  • Dendrograms are great for visualization
  • Provides hierarchical relations between clusters
  • Has been shown to capture concentric clusters
• Disadvantages
  • Not easy to define levels for clusters
  • Experiments have shown that other clustering techniques can outperform hierarchical clustering

How to Define Inter-Cluster Similarity
• Single Link
• Complete Link
• Average Link
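
The three criteria can be sketched as functions of the pairwise distances between two clusters, following their standard definitions: single link takes the minimum, complete link the maximum, and average link the mean. Smaller distance here means higher similarity. The two sample clusters `a` and `b` are hypothetical.

```python
from itertools import product
from math import dist

def single_link(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))

def complete_link(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))

def average_link(a, b):
    """Mean distance over all pairs of points, one from each cluster."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

# Hypothetical clusters on a line; pairwise distances are 3, 5, 2, and 4.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(a, b))    # 2.0
print(complete_link(a, b))  # 5.0
print(average_link(a, b))   # 3.5
```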

Common Similarity Measures
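• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.
Common measures, for two points $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, include:
1. The Euclidean distance (also called the 2-norm distance), given by:
   $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
2. The Manhattan distance (also called the taxicab norm or 1-norm), given by:
   $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$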
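
Both measures can be sketched in a few lines of Python. The function names are illustrative; the two sample points are the initial centroids used in the k-means example later in these slides.

```python
def euclidean(x, y):
    """2-norm distance: square root of the summed squared differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def manhattan(x, y):
    """1-norm distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

p, q = (1.0, 1.0), (5.0, 7.0)
print(euclidean(p, q))  # sqrt(16 + 36) = sqrt(52), about 7.211
print(manhattan(p, q))  # 4 + 6 = 10.0
```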

A Simple Example Showing the Implementation of the k-means Algorithm (Using K=2)

Step 1:
Initialization: we randomly choose the following two centroids (k=2) for the two clusters.
In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).

Step 2:
• Assigning each object to its nearest centroid, we obtain two clusters containing {1,2,3} and {4,5,6,7}.
• The centroid of each cluster is then recomputed as the mean of its members.

Step 3:
• Using these centroids, we compute the Euclidean distance of each object to both centroids, as shown in the table.
• Therefore, the new clusters are {1,2} and {3,4,5,6,7}.
• The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).

Step 4:
• The clusters obtained are {1,2} and {3,4,5,6,7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm halts here, and the final result consists of 2 clusters: {1,2} and {3,4,5,6,7}.
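
The four steps above can be traced with a short script. This is a sketch under one loud assumption: the coordinates of the seven objects are not listed in the text, so the `points` below are hypothetical values chosen because they reproduce the stated clusters and the centroids m1 = (1.25, 1.5) and m2 = (3.9, 5.1). The helper names are likewise illustrative.

```python
from math import dist

# ASSUMPTION: the seven objects' coordinates are not given in the text.
# These hypothetical values reproduce the clusters and centroids quoted above.
points = [
    (1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
    (3.5, 5.0), (4.5, 5.0), (3.5, 4.5),
]

def nearest(p, centroids):
    """Index of the centroid closest to p (ties go to the lower index)."""
    distances = [dist(p, c) for c in centroids]
    return distances.index(min(distances))

def centroid(members):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = centroid(members)
    return labels, centroids

# Step 1: the initial centroids m1 and m2 from the text.
labels, centroids = kmeans(points, [(1.0, 1.0), (5.0, 7.0)])
clusters = [[i + 1 for i, lab in enumerate(labels) if lab == k] for k in range(2)]
print(clusters)   # [[1, 2], [3, 4, 5, 6, 7]]
print(centroids)  # [(1.25, 1.5), (3.9, 5.1)]
```

With these assumed points, object 3 is equidistant from both initial centroids in the first pass; the tie goes to the first cluster, which matches the {1,2,3} grouping in Step 2.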

Two Different K-means Clusterings
[Figure: three scatter plots (x vs. y) titled Original Points, Optimal Clustering, and Sub-optimal Clustering]

Importance of Choosing Initial Centroids
[Figure: six scatter plots (x vs. y) titled Iteration 1 through Iteration 6]

Importance of Choosing Initial Centroids
[Figure: five scatter plots (x vs. y) titled Iteration 1 through Iteration 5]

Getting Stuck In A Local Minimum
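
A small sketch of how this can happen, using hypothetical data (the four corners of a 4 x 1 rectangle) and a plain k-means loop written for illustration. Both runs converge, but one starting position ends with a much larger sum of squared errors (SSE) than the other.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Plain k-means; returns (labels, centroids) once labels stop changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical data: the corners of a 4 x 1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])  # starts left / right
bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])   # starts bottom / top

print(good[0], sse(points, *good))  # [0, 0, 1, 1] 1.0
print(bad[0], sse(points, *bad))    # [0, 1, 0, 1] 16.0
```

The second run is stable because every point is already closest to its own centroid, so no reassignment ever happens. A common remedy is to run k-means from several initializations and keep the result with the lowest SSE.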

Can k-means Handle Non-spherical Clusters?

Let’s Try Single Linkage Hierarchical Clustering

K-means with Polar Coordinates
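
The slide title does not say how polar coordinates are used, so the following is only one plausible reading: for concentric clusters, convert each point (x, y) to (r, θ) and run k-means on the radius r alone. The ring data, the helper names, and the choice to cluster on r only are all assumptions made for this sketch.

```python
from math import atan2, cos, hypot, pi, sin

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius (synthetic data)."""
    return [
        (radius * cos(2 * pi * i / n), radius * sin(2 * pi * i / n))
        for i in range(n)
    ]

def to_polar(p):
    """Convert Cartesian (x, y) to polar (r, theta)."""
    x, y = p
    return hypot(x, y), atan2(y, x)

def kmeans_1d(values, centroids, max_iter=100):
    """k-means on scalar values; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = sum(members) / len(members)
    return labels, centroids

# ASSUMPTION: two concentric rings of radius 1 and 5 stand in for the data.
points = ring(1.0, 12) + ring(5.0, 12)
radii = [to_polar(p)[0] for p in points]

# Seed with the smallest and largest radius, then cluster on r only.
labels, centroids = kmeans_1d(radii, [min(radii), max(radii)])
print(labels)     # inner ring gets label 0, outer ring gets label 1
print(centroids)  # approximately [1.0, 5.0]
```

In Cartesian coordinates both rings share the same mean, so centroid-based assignment cannot separate them; in the radius coordinate they become two well-separated groups.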
