Parallel Algorithms K – means
Clustering
Final Results
By: Andreina Uzcategui
CSE 633: Parallel Algorithms
Spring 2014
Outline
The problem
Algorithm Description
Parallel Algorithm Implementation (MPI)
Test Cases
Results
The Problem
K-means Clustering
Divide a large vector of points into smaller groups, each
organized around a centroid point; every group should contain
roughly the same number of elements.
[Figure: sample points grouped around the centroids (k)]
Algorithm Description
K – means clustering
The objective is to partition n elements into k
clusters.
Each observed element is grouped according to its
proximity to the nearest of the k elements chosen as
centroids.
The distance between a centroid (k) and a point is
calculated by:
- Euclidean distance metric (one-dimensional case):
distance = |point - k| (absolute value of the difference)
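Since the points are one-dimensional, this metric reduces to an absolute difference. A minimal sketch of the assignment step (plain Python for illustration, not the original MPI code; the function name is assumed):

```python
def assign_clusters(points, centroids):
    """Assign each 1-D point to the nearest centroid by |point - k|."""
    clusters = {k: [] for k in centroids}
    for p in points:
        nearest = min(centroids, key=lambda k: abs(p - k))
        clusters[nearest].append(p)
    return clusters

# Example: three centroids, five points
assign_clusters([1, 2, 9, 10, 50], [0, 10, 40])
# -> {0: [1, 2], 10: [9, 10], 40: [50]}
```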
Parallel Algorithm Implementation
(MPI)
To parallelize the k – means clustering problem,
the following steps are implemented:
Data organization
1- P processors, each holding n x Tn data values
(points) randomly assigned to it.
2- Three k values (centroids) are used in each
iteration to determine the clusters.
[Diagram: tasks T1 … Tn distributed across processors P1 … Pn]
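One way to picture this data organization is a random scatter of the point vector across the P processes, each ending up with a near-equal share (a plain-Python sketch; the MPI program would use a scatter operation, and the round-robin chunking here is an assumption):

```python
import random

def scatter_points(all_points, num_procs, seed=0):
    """Randomly shuffle the point vector, then deal it round-robin
    so every process receives a near-equal share."""
    rng = random.Random(seed)
    shuffled = all_points[:]
    rng.shuffle(shuffled)
    return [shuffled[p::num_procs] for p in range(num_procs)]
```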
Parallel Algorithm Implementation
(MPI)
Algorithm
Iterative algorithm
1- For the first iteration, the 3 k values (centroids)
are chosen at random.
2- Each PE, in parallel, computes the clusters
associated with each k using the Euclidean distance
metric.
Parallel Algorithm Implementation
(MPI)
3- Each PE, in parallel, computes the median value
of each of its clusters.
- Median:
1- Build a frequency table recording the
frequency of each point in the cluster.
2- Locate the median position in the frequency
table; the value at that position is the median.
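The frequency-table approach in step 3 can be sketched as follows (plain Python; `Counter` stands in for the hand-built frequency table, and for clusters with an even number of points this sketch returns the upper median, a detail the slides do not pin down):

```python
from collections import Counter

def median_from_frequency_table(cluster):
    """Median of a cluster, computed via a frequency table of its points."""
    freq = Counter(cluster)          # point -> how often it occurs
    n = sum(freq.values())
    target = (n + 1) / 2             # 1-based median position
    cumulative = 0
    for point in sorted(freq):       # walk distinct points in order
        cumulative += freq[point]
        if cumulative >= target:
            return point

median_from_frequency_table([1, 2, 2, 3, 9])   # -> 2
```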
Parallel Algorithm Implementation
(MPI)
4- Each PE broadcasts its median for each
cluster to all other PEs.
5- In parallel, each PE determines a new median
for each cluster from the received medians and the
one it just computed.
6- For each cluster, each PE checks the difference
between the newly computed median and the
previous one.
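Steps 4-6 can be simulated without MPI to show the reduction: after the broadcast, every PE holds all local medians and collapses them to one new median per cluster. Taking the median of the per-PE medians is an assumption here; the slides only say the new median is computed from the received data and the local one.

```python
import statistics

def combine_medians(medians_per_pe):
    """medians_per_pe[pe][c] = local median of cluster c on that PE.
    Every PE computes the same new median per cluster: the median
    of the per-PE medians (step 5)."""
    n_clusters = len(medians_per_pe[0])
    return [statistics.median([m[c] for m in medians_per_pe])
            for c in range(n_clusters)]

def median_changes(old, new):
    """Step 6: per-cluster difference between old and new medians."""
    return [abs(a - b) for a, b in zip(old, new)]
```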
Parallel Algorithm Implementation
(MPI)
Final Conditions
- Under normal conditions, the iteration stops when the
difference between the old and new medians (the error
value) is minimal or zero.
- For simplicity, the number of iterations was fixed in
advance to avoid infinite iteration (10 iterations).
- In every iteration except the first, the K values are
the medians closest to 0 determined in the previous
iteration.
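Putting the steps and stopping conditions together, the per-PE control flow might look like the sketch below (sequential plain Python; the 10-iteration cap and the stop-on-zero-difference rule follow the slides, while the helper logic and two-centroid example are illustrative):

```python
import statistics

def kmeans_1d(points, initial_centroids, max_iters=10, tol=0.0):
    """Median-based 1-D k-means loop, capped at max_iters iterations."""
    centroids = list(initial_centroids)
    for _ in range(max_iters):
        # Assign every point to its nearest centroid by |point - k|.
        clusters = {k: [] for k in centroids}
        for p in points:
            clusters[min(centroids, key=lambda k: abs(p - k))].append(p)
        # New centroid for each cluster: the median of its points.
        new_centroids = [statistics.median(c) if c else k
                         for k, c in clusters.items()]
        # Stop when the medians no longer move (error value is zero).
        if all(abs(a - b) <= tol for a, b in zip(centroids, new_centroids)):
            break
        centroids = new_centroids
    return centroids

kmeans_1d([1, 2, 3, 10, 11, 12], [0, 10])   # -> [2, 11]
```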
Test Cases & Conclusions
1- Same centroids, different data, same # processors,
same # tasks.
2- Same centroids, same data, different # processors.
3- Same centroids, same data, different # tasks.
4- Different centroids, different data, different #
processors.
5- Different centroids, different data, different # tasks.
6- Same data, different # processors.
Test Case 1: Same centroids, different data, same # processors, same
# tasks.
K d P T Time (sec)
3 100 2 8 0.13
3 1000 2 8 0.28
3 2000 2 8 0.35
3 5000 2 8 0.80
3 9000 2 8 3.03
Conclusion
The processing time increases
dramatically as the data size grows.
K = # centroids
d = # data points
P = # processors
T = # tasks
Test Case 2: Same centroids, same data, different # processors.
K d P Time (sec)
3 100 2 0.13
3 100 4 0.14
3 100 8 0.29
3 100 16 0.43
Conclusion
The processing time increases
slowly as processors are added.
K = # centroids
d = # data points
P = # processors
Test Case 3: Same centroids, same data, different # tasks.
K d T Time (sec)
3 100 2 0.05
3 100 4 0.06
3 100 8 0.13
3 100 16 0.24
Conclusion
The processing time increases
slowly as the number of tasks grows.
K = # centroids
d = # data points
T = # tasks
Test Case 4: Different centroids, different data, different # processors.
K d P Time (sec)
3 100 2 0.1
6 1000 4 0.35
12 5000 8 25.54
Conclusion
The processing time increases
dramatically as K, d, and P grow together.
K = # centroids
d = # data points
P = # processors
Test Case 5: Different centroids, different data, different # tasks.
K d T Time (sec)
3 100 2 0.05
6 1000 4 0.12
12 5000 8 4.95
Conclusion
The processing time increases
dramatically as K, d, and T grow together.
K = # centroids
d = # data points
T = # tasks
Test Case 6: Same data, different # processors.
P Time (sec)
2 0.85
4 0.18
8 0.07
16 0.05
32 0.06
Conclusion
The processing time decreases
steadily until the number of
processors is too high and the
data per processor is too low.
The total data, N = 12288, is
divided among an increasing P
at every stage.
Questions?
