Improving the accuracy of the K-means clustering algorithm
           Kasun Ranga Wijeweera
          (krw19870829@gmail.com)
This presentation is based on the following research paper:

   K. A. Abdul Nazeer, M. P. Sebastian, "Improving the Accuracy and
   Efficiency of the k-means Clustering Algorithm," Proceedings of the
   World Congress on Engineering 2009, Vol. I, WCE 2009, July 1–3, 2009,
   London, U.K.
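Consider a set of n data points,
$X = \{x_1, x_2, \dots, x_n\}$,
and a set of k clusters,
$C = \{C_1, C_2, \dots, C_k\}$.
The goal: partition X into the k clusters so that the sum of squared distances between each data point and the centroid $c_j$ of its cluster is as small as possible,
$$\min \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2.$$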
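For reference, standard k-means measures the quality of a partition by the within-cluster sum of squared Euclidean distances (SSE). A minimal sketch of that quantity in Python follows; the helper names `squared_distance` and `kmeans_objective` are hypothetical, and points are assumed to be plain tuples of floats.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_objective(X: List[Point], labels: List[int], centroids: List[Point]) -> float:
    """Within-cluster sum of squared distances: each point is measured
    against the centroid of the cluster it is assigned to (labels[i])."""
    return sum(squared_distance(x, centroids[j]) for x, j in zip(X, labels))


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]
print(kmeans_objective(X, labels, centroids))  # 4.0
```

Each of the four points lies at distance 1 from its centroid, so the objective is 4.0.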
Algorithm: k-means
1. Randomly choose k data items from X as the initial
   centroids.
2. Repeat
    Assign each data point to the cluster with the
   closest centroid.
    Recalculate the cluster centroids.
   Until the convergence criterion is met (for example,
   the assignments no longer change).
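A minimal Python sketch of the loop above, assuming Euclidean distance and taking "no assignment changes between two iterations" as the convergence criterion (the slide leaves the criterion open). The names `kmeans`, `mean`, and `squared_distance` are hypothetical; the optional `init` parameter is an assumption added so that another initialization scheme can replace the random choice in step 1.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(X: List[Point], k: int, init: Optional[List[Point]] = None,
           max_iter: int = 100, seed: Optional[int] = None):
    """Basic k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: k randomly chosen data items, unless initial centroids are supplied.
    centroids = list(init) if init is not None else rng.sample(X, k)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assign each point to the cluster whose centroid is closest.
        new_labels = [min(range(len(centroids)),
                          key=lambda j: squared_distance(x, centroids[j]))
                      for x in X]
        # Assumed convergence criterion: no assignment changed.
        if new_labels == labels:
            break
        labels = new_labels
        # Recalculate each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, labels


X = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(X, k=2, seed=0)
print(labels)
```

Because step 1 is random, the label numbering (and, on harder data, the final partition itself) can change with `seed`.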
K-means can get stuck in a local optimum: the final clusters
         depend on which initial centroids are chosen at random.
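To make the local-optimum problem concrete, here is a small hypothetical example (the four-point data set and the helper `lloyd` are illustrative, not taken from the paper). Two different choices of initial centroids, both drawn from the data as random selection could produce, converge to different partitions with very different SSE.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(X: List[Point], init: List[Point], max_iter: int = 100):
    """Run k-means iterations from fixed initial centroids.
    Returns (centroids, labels, sse)."""
    centroids = list(init)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
                      for x in X]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(x, centroids[j]) for x, j in zip(X, labels))
    return centroids, labels, sse


# Four corners of a 4 x 1 rectangle; k = 2.
X = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, good_labels, good_sse = lloyd(X, [X[0], X[2]])  # seeds on the left and right
_, bad_labels, bad_sse = lloyd(X, [X[0], X[1]])    # both seeds on the left edge
print(good_labels, good_sse)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_sse)    # [0, 1, 0, 1] 16.0
```

Both runs stop at a fixed point of the iteration, but the second one splits the rectangle into bottom and top instead of left and right, ending with an SSE 16 times larger.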
Algorithm: selection of initial centroids
1. Set m = 1;
2. Compute the distance between each data point and all
   other data points in the set X;
3. Find the closest pair of data points in X and form a
   data point set A[m] (1 <= m <= k) that contains these
   two data points. Delete these two data points from X;
4. Find the data point in X that is closest to the data
   point set A[m]. Add it to A[m] and delete it from X;
5. Repeat step 4 until the number of data points in A[m]
   reaches 0.75*(n/k);
6. If m < k, set m = m + 1, find the pair of data points
   remaining in X with the shortest distance between them,
   form another data point set A[m] from them, and delete
   them from X. Go to step 4;
7. For each data point set A[m] (1 <= m <= k), compute the
   arithmetic mean of the vectors of the data points in A[m].
   These means are the initial centroids.
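The seven steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the paper's code: Euclidean distance is assumed; the distance from a point to the set A[m] is taken as its distance to the nearest member of A[m] (the slides do not define it); step 5 is read as "grow A[m] until it holds at least 0.75*(n/k) points"; and the function name `select_initial_centroids` is hypothetical. The returned centroids would then replace the random choice in step 1 of k-means.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def select_initial_centroids(X: List[Point], k: int) -> List[Point]:
    n = len(X)
    if k < 1 or n < 2 * k:
        raise ValueError("need at least 2 data points per cluster")
    # Step 2: distance between every pair of data points.
    dist = [[math.dist(p, q) for q in X] for p in X]
    remaining = set(range(n))   # indices of points still in X
    threshold = 0.75 * n / k    # target size of each set A[m]
    centroids: List[Point] = []
    for _ in range(k):          # steps 1, 3 and 6: build A[1], ..., A[k]
        if len(remaining) < 2:
            raise ValueError("ran out of data points while forming the sets A[m]")
        # Steps 3 / 6: closest pair among the remaining points.
        i, j = min(((a, b) for a in remaining for b in remaining if a < b),
                   key=lambda ab: dist[ab[0]][ab[1]])
        A = [i, j]
        remaining -= {i, j}
        # Steps 4-5: add the remaining point closest to A[m] until it is big enough.
        # Assumption: point-to-set distance = distance to the nearest member of A[m].
        while len(A) < threshold and remaining:
            nearest = min(remaining, key=lambda r: min(dist[r][a] for a in A))
            A.append(nearest)
            remaining.remove(nearest)
        # Step 7: arithmetic mean of the vectors in A[m].
        centroids.append(tuple(sum(c) / len(A) for c in zip(*(X[a] for a in A))))
    return centroids


X = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
print(select_initial_centroids(X, k=2))
```

On this toy data each set A[m] ends up with three points (0.75 * 6 / 2 = 2.25, rounded up by the loop), one set per visible group, so the initial centroids already sit near the final cluster centres. Computing all pairwise distances makes the sketch O(n²) in memory, which is fine for illustration but not for large data sets.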
Any questions?
Thanks for your attention!
