K means clustering | K Means ++

Presentation Outline:
1. What is K-means Clustering
2. Limitations of K-means
3. K-means Algorithm
4. Example of k-means++ clustering
5. Initialization with k-means++
6. K-means++ Visualized

What is K-means Clustering:
K-means clustering is a type of unsupervised learning, which is used when you
have unlabeled data (i.e., data without defined categories or groups). The goal of
this algorithm is to find groups in the data, with the number of groups represented
by the variable K. The algorithm works iteratively to assign each data point to one
of K groups based on the features that are provided. Data points are clustered
based on feature similarity. The results of the K-means clustering algorithm are:
1. The centroids of the K clusters, which can be used to label new data
2. Labels for the training data (each data point is assigned to a single cluster)
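The iterative assignment described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes squared Euclidean distance, caller-supplied starting centroids, and the hypothetical helper names `squared_distance`, `nearest` and `kmeans`.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points (an assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(float(v) for v in c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    # Result 1: the centroids. Result 2: a label for every training point.
    return centroids, [nearest(p, centroids) for p in points]

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, [(0, 0), (10, 10)])
# A new, unseen point is labelled by its nearest centroid.
new_label = nearest((1, 1), centroids)
```

The two return values correspond to the two results listed above, and `nearest` is what labels new data once the centroids are known.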

Limitations of K-means:
1. Clusters of different size
2. Clusters of different density
3. Clusters of non-globular shape
4. Sensitive to initialization

K-means Algorithm:
The exact algorithm, with k-means++ seeding (Steps 1 to 4) followed by standard K-means (Step 5), is as follows:
1. Choose one center uniformly at random from among the data points.
2. For each data point x, compute D(x), the distance between x and the nearest center that
has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability
distribution where a point x is chosen with probability proportional to D(x)^2.
4. Repeat Steps 2 and 3 until k centers have been chosen.
5. Now that the initial centers have been chosen, proceed using standard K-means
clustering.
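Steps 1 to 4 can be sketched in Python as below. This is a minimal sketch under stated assumptions: D(x) is taken to be Euclidean distance, the data points are distinct, k does not exceed the number of points, and the function name `kmeans_pp_init` is hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance, i.e. D(x)^2 for one candidate center."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centers from points using D(x)^2 weighting."""
    rng = rng or random.Random()
    # Step 1: first center chosen uniformly at random.
    centers = [rng.choice(points)]
    # Step 4: repeat Steps 2 and 3 until k centers are chosen.
    while len(centers) < k:
        # Step 2: squared distance from each point to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        # Step 3: sample one point with probability proportional to D(x)^2.
        # Already chosen points have weight 0, so they are not picked again.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

data = [(2, 8), (6, 9), (4, 8), (8, 4), (4, 9), (5, 8), (3, 4)]
initial_centers = kmeans_pp_init(data, 3, random.Random(0))
```

The returned centers would then be handed to a standard K-means loop (Step 5). Passing a seeded `random.Random` makes the choice repeatable.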

Here, K = 3
Data set: A1(2,8), A2(6,9), A3(4,8), A4(8,4), A5(4,9), A6(5,8), A7(3,4)
Figure: Data points A1 to A7 with cluster centers C1 to C3

Data set:
A1(2,8)
A2(6,9)
A3(4,8)
A4(8,4)
A5(4,9)
A6(5,8)
A7(3,4)
Let the initial cluster centers be:
M1 = A1(2,8)
M2 = A5(4,9)
M3 = A7(3,4)
Calculation (Manhattan distance between a point (X1,Y1) and a center (X2,Y2)):
|X2-X1| + |Y2-Y1|
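As a quick check of the calculation, the distance can be written as a small Python function. The name `manhattan` is chosen here for illustration only.

```python
def manhattan(p, q):
    """|X2-X1| + |Y2-Y1| for points p = (X1, Y1) and q = (X2, Y2)."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

# Distance from A1(2,8) to M2 = A5(4,9): |4-2| + |9-8|
d_a1_m2 = manhattan((2, 8), (4, 9))
# Distance from A2(6,9) to M1 = A1(2,8): |2-6| + |8-9|
d_a2_m1 = manhattan((6, 9), (2, 8))
print(d_a1_m2, d_a2_m1)  # prints: 3 5
```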

Table: Manhattan distance from each data point to each center
Data point   K1: M1 = A1(2,8)   K2: M2 = A5(4,9)   K3: M3 = A7(3,4)   New cluster
A1(2,8)      0                  3                  5                  1
A2(6,9)      5                  2                  8                  2
A3(4,8)      2                  1                  5                  2
A4(8,4)      10                 9                  5                  3
A5(4,9)      3                  0                  6                  2
A6(5,8)      3                  2                  6                  2
A7(3,4)      5                  6                  0                  3
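The "New Cluster" column can be recomputed with a short script. This is a sketch of the assignment step only, using the Manhattan distance from the previous slide; the variable names are illustrative.

```python
points = {
    "A1": (2, 8), "A2": (6, 9), "A3": (4, 8), "A4": (8, 4),
    "A5": (4, 9), "A6": (5, 8), "A7": (3, 4),
}
centers = [(2, 8), (4, 9), (3, 4)]  # M1, M2, M3

def manhattan(p, q):
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

assignment = {}
for name, p in points.items():
    # One distance per center, in the order K1, K2, K3.
    dists = [manhattan(p, c) for c in centers]
    # Cluster numbers are 1-based; the smallest distance wins.
    assignment[name] = dists.index(min(dists)) + 1
    print(name, dists, "-> cluster", assignment[name])
```

With these centers, A1 falls in cluster 1, A2, A3, A5 and A6 in cluster 2, and A4 and A7 in cluster 3.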

Table-2: Manhattan distance from each data point to each updated center
Data point   K1: M1 = (2,8)   K2: M2 = (4,8)   K3: M3 = (5,4)   New cluster
A1(2,8)      0                2                7                1
A2(6,9)      5                3                6                2
A3(4,8)      2                0                5                2
A4(8,4)      10               8                3                3
A5(4,9)      3                1                6                2
A6(5,8)      3                1                4                2
A7(3,4)      5                5                2                3
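The update step behind Table-2 can be sketched as follows. The exact cluster means come out as (4.75, 8.5) and (5.5, 4); the centers (4,8) and (5,4) in the table match those means truncated to integers, which is an inference from the numbers rather than something the slides state.

```python
points = {
    "A1": (2, 8), "A2": (6, 9), "A3": (4, 8), "A4": (8, 4),
    "A5": (4, 9), "A6": (5, 8), "A7": (3, 4),
}
# Cluster membership after the first pass.
clusters = {1: ["A1"], 2: ["A2", "A3", "A5", "A6"], 3: ["A4", "A7"]}

def manhattan(p, q):
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

# Exact centroid of each cluster: the mean of its members' coordinates.
means = {}
for k, names in clusters.items():
    xs = [points[n][0] for n in names]
    ys = [points[n][1] for n in names]
    means[k] = (sum(xs) / len(xs), sum(ys) / len(ys))

# Centers as used in Table-2 (assumed to be the means truncated to integers).
table2_centers = [(2, 8), (4, 8), (5, 4)]
reassigned = {}
for name, p in points.items():
    dists = [manhattan(p, c) for c in table2_centers]
    reassigned[name] = dists.index(min(dists)) + 1

# True when no point changes cluster between the two passes.
unchanged = all(name in clusters[k] for name, k in reassigned.items())
```

Here `unchanged` comes out true: no point changes cluster on the second pass, which is the usual stopping condition for K-means.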

Initialization with k-means++
1. Choose the first cluster center uniformly at random from the data points.
2. For each observation x, compute the distance d(x) to the nearest cluster center.
3. Choose a new cluster center from among the data points, with the probability of
x being chosen proportional to d(x)^2.
4. Repeat steps 2 and 3 until k centers have been chosen.
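To make step 3 concrete, the selection probabilities can be worked out for the example data set. This sketch assumes, purely for illustration, that A1(2,8) was drawn as the first center and that d(x) is Euclidean distance.

```python
from fractions import Fraction

points = {
    "A1": (2, 8), "A2": (6, 9), "A3": (4, 8), "A4": (8, 4),
    "A5": (4, 9), "A6": (5, 8), "A7": (3, 4),
}
first_center = points["A1"]  # hypothetical outcome of step 1

# Step 2: d(x)^2 to the only center chosen so far.
d2 = {
    name: (x - first_center[0]) ** 2 + (y - first_center[1]) ** 2
    for name, (x, y) in points.items()
}
total = sum(d2.values())

# Step 3: probability of each point becoming the second center.
probs = {name: Fraction(v, total) for name, v in d2.items()}
for name, pr in probs.items():
    print(name, d2[name], pr)
```

Under these assumptions A4, the point farthest from A1, is chosen with probability 1/2, while A1 itself has probability 0. This is how the weighting pushes the initial centers apart.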

K-means++ Visualized
Figure: Data points

Thank you for being
with me up to now.
