K-Means Clustering Problem
            Ahmad Sabiq
          Febri Maspiyanti
       Indah Kuntum Khairina
          Wiwin Farhania
              Yonatan
What is k-means?
• To partition n objects into k clusters based on their
  attributes.
  – Objects in the same cluster are close: their
    attributes are similar to each other.
  – Objects in different clusters are far apart: their
    attributes are very dissimilar.
Algorithm
• Input: n objects, k (integer k ≤ n)
• Output: k clusters
• Steps:
   1. Select k initial centroids.
   2. Calculate the distance between each object and
      each centroid.
   3. Assign each object to the cluster with the nearest
      centroid.
   4. Recalculate each centroid.
   5. If the centroids don’t change, stop (convergence).
      Otherwise, go back to step 2.
• Complexity: O(k · n · d · t), where d is the number of
  attributes and t is the total number of iterations.
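The five steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: points are lists of numbers, distance is Euclidean, the helper names `dist`, `mean` and `kmeans` are illustrative, and an empty cluster simply keeps its previous centroid (the slides do not say what to do in that case). Each pass performs k · n distance computations of cost d, which is the O(k · n · d) per-iteration factor in the complexity above.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(p[j] for p in points) / len(points) for j in range(len(points[0]))]


def kmeans(data, centroids, max_iter=100):
    """Steps 2-5 above, starting from the given initial centroids (step 1).

    Returns (labels, centroids, iterations). Assumption: a cluster that
    becomes empty keeps its previous centroid.
    """
    centroids = [list(c) for c in centroids]
    k = len(centroids)
    labels = []
    for it in range(1, max_iter + 1):
        # Steps 2-3: distance to every centroid, keep the nearest: O(k * n * d).
        labels = [min(range(k), key=lambda c: dist(x, centroids[c])) for x in data]
        # Step 4: recompute each centroid as the mean of its members.
        new = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            new.append(mean(members) if members else centroids[c])
        # Step 5: stop once no centroid has moved (convergence).
        if new == centroids:
            return labels, centroids, it
        centroids = new
    return labels, centroids, max_iter
```

Because `kmeans` takes the starting centroids as an argument, the same data can be run from different initializations to compare the resulting clusters and iteration counts.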
Initialization
• Why is it important? What does it affect?
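  – The clustering result: k-means only reaches a local
    optimum, and which one depends on the initial centroids.
  – The total number of iterations, and hence the running time.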
Good Initialization
3 clusters with 2 iterations…
Bad Initialization
3 clusters with 4 iterations…
Initialization Methods
1. Random
2. Forgy
3. MacQueen
4. Kaufman
Random
• Algorithm:
  1. Assigns each object to a random cluster.
  2. Computes the initial centroid of each cluster.
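A minimal sketch of the Random method, assuming the same list-of-points representation as above. The function name `random_partition_init` is hypothetical, and redrawing the assignment when a cluster ends up empty is our own assumption; the slide does not cover that case.

```python
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(p[j] for p in points) / len(points) for j in range(len(points[0]))]


def random_partition_init(data, k, rng=random):
    """Random method: put every object in a random cluster, then use the
    cluster means as the initial centroids. Requires 1 <= k <= len(data)."""
    while True:
        # Step 1: a uniformly random cluster label for every object.
        labels = [rng.randrange(k) for _ in data]
        # Assumption: redraw if some cluster received no objects, so that
        # all k centroids are defined.
        if len(set(labels)) == k:
            break
    # Step 2: the initial centroid of each cluster is the mean of its members.
    return [mean([x for x, lab in zip(data, labels) if lab == c]) for c in range(k)]
```

Because each initial centroid is the mean of a random subset of the objects, the k centroids tend to lie close to the overall mean of the data.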
Random
[Figure: Random initialization on an example 2-D data set (x-axis 0 to 35, y-axis 0 to 9)]
Forgy
• Algorithm:
  1. Chooses k objects at random and uses them as the initial
     centroids.
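A minimal sketch of the Forgy method. The name `forgy_init` is hypothetical; `rng` can be the `random` module itself or any `random.Random` instance.

```python
import random


def forgy_init(data, k, rng=random):
    """Forgy method: k objects drawn at random become the initial centroids."""
    # rng.sample draws indices without replacement, so no object is picked twice.
    return [list(data[i]) for i in rng.sample(range(len(data)), k)]
```

Unlike the Random method, every Forgy centroid is an actual data object, so the seeds follow the distribution of the data instead of gathering near its mean.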
Forgy
[Figure: Forgy initialization on the example 2-D data set (x-axis 0 to 35, y-axis 0 to 9)]
MacQueen
• Algorithm:
  1. Chooses k objects at random and uses them as the initial
     centroids.
  2. Assigns each object to the cluster with the nearest
     centroid.
  3. Recalculates that cluster's centroid immediately after
     each assignment.
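A minimal sketch of MacQueen's method as listed above: random seeds, then a single pass in which each object updates its nearest centroid at once, using the running-mean update m ← m + (x − m) / count. The name `macqueen_init`, processing the objects in input order, and skipping the seed objects during the pass are assumptions made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def macqueen_init(data, k, rng=random):
    """MacQueen method: random seeds, then one incremental assignment pass."""
    # Step 1: k distinct random objects are the starting centroids.
    seed_idx = rng.sample(range(len(data)), k)
    centroids = [[float(v) for v in data[i]] for i in seed_idx]
    counts = [1] * k  # each centroid starts as the mean of its single seed
    seeds = set(seed_idx)
    for i, x in enumerate(data):
        if i in seeds:
            continue  # assumption: seeds are already counted in their cluster
        # Step 2: the nearest current centroid.
        c = min(range(k), key=lambda j: dist(x, centroids[j]))
        # Step 3: update that centroid right away (running mean).
        counts[c] += 1
        centroids[c] = [m + (v - m) / counts[c] for m, v in zip(centroids[c], x)]
    return centroids
```

Because centroids move during the pass, the result depends on the order in which objects are visited as well as on the random seeds.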
MacQueen
[Figures: MacQueen initialization on the example 2-D data set (x-axis 0 to 35, y-axis 0 to 9)]
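Kaufman
[Figures: Kaufman initialization on the example data set]
Example for one candidate and one other object: d = 24.33 is
the distance between them and D = 15.52 is the distance from
the object to its nearest selected seed. Because d > D,
C = max(D − d, 0) = 0.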
Kaufman
[Figure: five objects, each with C = 0]
Kaufman
[Figure: C values for candidate 1 (five objects with C = 0); total ∑C1 = 2.74]
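Kaufman
Sum of C for each candidate object:
• ∑C1 = 2.74
• ∑C2 = 12.21
• ∑C3 = 12.36
• ∑C4 = 8.38
• ∑C5 = 52.55
• ∑C6 = 55.88
• ∑C7 = 53.77
• ∑C8 = 51.16
• ∑C9 = 42.69
Candidate 6 has the largest sum, so it becomes the next seed.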
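The Kaufman slides show only a worked example. The sketch below follows the procedure as it is usually described for Kaufman and Rousseeuw's method (for example in reference 1), to the best of our understanding: the first seed is the most centrally located object; then, for every unselected candidate i and every other unselected object j, C_ji = max(D_j − d_ji, 0), where d_ji is the distance between i and j and D_j is the distance from j to its nearest selected seed; the candidate with the largest ∑C becomes the next seed, repeating until there are k seeds. The name `kaufman_init`, reading "most central" as the smallest total distance to all other objects, and breaking ties by lowest index are our assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kaufman_init(data, k):
    """Greedy Kaufman-style seeding; requires 1 <= k <= len(data)."""
    n = len(data)
    d = [[dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    # First seed: the most central object, taken here (assumption) as the one
    # with the smallest total distance to all other objects.
    seeds = [min(range(n), key=lambda i: sum(d[i]))]
    while len(seeds) < k:
        # D[j]: distance from object j to its nearest already-selected seed.
        D = [min(d[s][j] for s in seeds) for j in range(n)]
        candidates = [i for i in range(n) if i not in seeds]
        # Gain of candidate i: sum over the other unselected objects j of
        # C_ji = max(D_j - d_ji, 0), i.e. how much closer j would be to i
        # than to its current nearest seed.
        gains = {
            i: sum(max(D[j] - d[i][j], 0.0) for j in candidates if j != i)
            for i in candidates
        }
        # Greedy choice: the candidate with the largest gain becomes a seed
        # (ties go to the lowest index, since max keeps the first maximum).
        seeds.append(max(candidates, key=gains.get))
    return [list(data[i]) for i in seeds]
```

Computing all pairwise distances up front costs O(n² · d) time and O(n²) memory, so this seeding is noticeably more expensive than the three random methods on large data sets.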
References
1. J.M. Peña, J.A. Lozano, and P. Larrañaga. An Empirical
   Comparison of Four Initialization Methods for the K-Means
   Algorithm. Pattern Recognition Letters, vol. 20,
   pp. 1027–1040, 1999.
2. J.R. Cano, O. Cordón, F. Herrera, and L. Sánchez. A
   Greedy Randomized Adaptive Search Procedure
   Applied to the Clustering Problem as an Initialization
   Process Using K-Means as a Local Search Procedure.
   Journal of Intelligent and Fuzzy Systems, vol. 12,
   pp. 235–242, 2002.
3. L. Kaufman and P.J. Rousseeuw. Finding Groups in
   Data: An Introduction to Cluster Analysis. Wiley, 1990.
Questions
1. Why is initialization important in k-means?
2. Which initialization method has the greedy choice
   property?
3. Explain the O(nkd) complexity of the Random
   method.
