K-MEANS
CLUSTERING
METHOD BASED
NETWORK SHARED
RESOURCES MINING
A SHORT STORY PRESENTED BY
KANCHETI SAI PRAGNA
SJSU_ID: 016698552
WHY MINING NETWORK
SHARED RESOURCES?
 The demand for sharing data resources
over the internet keeps growing, which
has prompted many optimization
techniques for using resources efficiently.
 At present, at least 15 trillion files are
available on the internet. This sheer
volume makes retrieving the relevant
data resources efficiently a complex task.
 The need for data mining of network
shared data resources arose to address
the large amount of redundant
information and the difficulty of
searching for relevant data resources.
Existing
Methods of
network
shared
resources
mining
• Significant research has been done on data mining methods for finding relevant
data resources, and several techniques have emerged.
• One method is based on a clustering analysis algorithm: it uses the algorithm
to process resource data, construct the data preprocessing set, and calculate
the data feature vector.
• Another method is based on multi-dimensional resource coordination and
aggregation: it uses an analysis of the data center's network resource sharing
process as the basis for building a multidimensional resource aggregation data model.
• Fuzzy logic is used to build multidimensional collaborative fitness functions, and
data mining is used to optimize decision-making, in order to increase the execution
efficiency of the data mining process.
• Although these methods produced some excellent results, they fall short in
run-time efficiency and precision, and they are usually complex to apply in practice.
• To overcome these drawbacks, a new method based on the k-means
clustering algorithm has been proposed.
CLUSTERING
WHAT IS
CLUSTERING?
 Clustering assembles bulky data
into clusters, or groups, which
helps us visualize the internal
structure of the data. In essence,
it groups items by how similar to
or distinct from one another they
are.
 For example, an online shopping
site offers a variety of items:
electronics, clothing, books,
groceries, cosmetics, and
accessories. Figure 2 shows how
these items look after clustering
is done.
STAGES OF
CLUSTERING
 Raw Data
 Clustering Algorithm
 Clusters
STAGES OF CLUSTERING
 Raw Data: Raw data (not yet processed) are collected from various sources; this is the data to which we
want to apply a clustering algorithm.
 Clustering Algorithm: A specific algorithm is selected according to our requirements and then applied
to the selected raw data.
 Clusters: After applying the selected clustering algorithm to the raw data, we obtain our clusters.
TYPES OF
CLUSTERING
 Partitioning Method
 Density-based Method
 Hierarchical Method
 Grid-based method
 Model-based clustering method
 Constraint-based method
PARTITIONING METHOD
 In the partitioning clustering method, the
objects of the dataset are divided into a
number of subsets.
 Examples of partitioning algorithms are
K-means and PAM (Partitioning Around
Medoids).
 The figure shows how clusters are formed after
applying the partitioning clustering technique.
DENSITY-BASED METHOD
 The density-based clustering method identifies
distinct clusters in the data, based on the
idea that a cluster in a data space is a
contiguous region of high point density,
separated from other clusters by sparse
regions.
 In other words, the data space is partitioned
into clusters according to the density of the
data points in each region.
 The figure shows how clusters are formed after
applying the density-based method of clustering.
HIERARCHICAL METHOD
 In the hierarchical clustering method, the
objects of the dataset are organized into a
hierarchy of clusters or groups.
 Examples: the agglomerative hierarchical
clustering algorithm (AGNES) and the divisive
hierarchical clustering algorithm (DIANA).
 The figure shows how clusters are formed after
applying the hierarchical method of clustering.
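The bottom-up (agglomerative) idea behind AGNES can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: the single-link distance, the `target_clusters` stopping rule, and the function names are assumptions made for the example.

```python
import math


def single_link(cluster_a, cluster_b):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerate(points, target_clusters):
    """Start with one cluster per point and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(agglomerate(points, target_clusters=2))
```

A divisive algorithm such as DIANA works in the opposite direction, starting from one cluster and splitting it.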
GRID-BASED METHOD
 In the grid-based clustering method, the object
space is divided into a fixed number of cells that
form a grid-like structure. An example of a
grid-based clustering algorithm is STING
(Statistical Information Grid).
 The figure shows how clusters are formed after
applying the grid-based clustering method.
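The cell-division step can be illustrated with a small sketch. It is not STING itself, which keeps statistical summaries at several grid resolutions; the 2-D points, `cell_size`, and `min_points` threshold are assumptions chosen for the example.

```python
from collections import defaultdict


def grid_cells(points, cell_size):
    """Bucket 2-D points into square cells keyed by integer (column, row)."""
    cells = defaultdict(list)
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append((x, y))
    return dict(cells)


def dense_cells(cells, min_points):
    """Keep only the cells that hold at least min_points points."""
    return {key: pts for key, pts in cells.items() if len(pts) >= min_points}


points = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9), (3.1, 3.2)]
cells = grid_cells(points, cell_size=1.0)
print(dense_cells(cells, min_points=2))
```

Because the work is done per cell rather than per pair of points, the cost depends mainly on the number of cells.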
MODEL-BASED CLUSTERING METHOD
 Model-based clustering rests on the concept
of a probability model, a mathematical
representation of how the data in a dataset
could arise at random. Each group that forms
has its own probability model.
 The figure shows how clusters are formed after
applying the model-based clustering method.
CONSTRAINT-BASED METHOD
 The constraint-based clustering method is a
semi-supervised learning technique that
combines a small proportion of labeled
data with a large proportion of unlabeled
data.
 Constrained K-means (COP-K-Means)
is one of the common algorithms
using this method.
 The figure illustrates clustering using the
constraint-based method.
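COP-K-Means is usually described as K-means plus pairwise side information: must-link pairs that have to share a cluster and cannot-link pairs that must not. The slides do not spell this out, so the sketch below is an assumption-laden illustration of the constraint check only; the index-based points and all names are hypothetical.

```python
def partner(pair, point):
    """Return the other member of a constraint pair, or None if point is not in it."""
    a, b = pair
    if a == point:
        return b
    if b == point:
        return a
    return None


def violates_constraints(point, cluster, labels, must_link, cannot_link):
    """Check whether putting `point` into `cluster` breaks any pairwise constraint.

    labels maps already-assigned point indices to their cluster index.
    """
    for pair in must_link:
        other = partner(pair, point)
        if other is not None and other in labels and labels[other] != cluster:
            return True
    for pair in cannot_link:
        other = partner(pair, point)
        if other is not None and other in labels and labels[other] == cluster:
            return True
    return False


labels = {0: 0}            # point 0 already sits in cluster 0
must_link = [(0, 1)]       # points 0 and 1 belong together
cannot_link = [(0, 2)]     # points 0 and 2 must be separated
print(violates_constraints(1, 1, labels, must_link, cannot_link))  # True
print(violates_constraints(2, 1, labels, must_link, cannot_link))  # False
```

In the assignment step, each point would go to the nearest centre for which this check returns False.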
K-MEANS
CLUSTERING
K-MEANS CLUSTERING ALGORITHM
 The K-means algorithm is a partition-based clustering approach that belongs to the unsupervised
learning techniques. It divides a large set of data into K smaller groups. The two distinct phases
of this method are described below.
 a. First phase: K centroids (centers) are selected at random. K has a fixed value that cannot be
changed during the procedure.
 b. Second phase: Each data point is assigned to its closest centroid. Euclidean distance is used to
measure the separation between the cluster centroids and the data points.
 The Euclidean distance is the straight-line distance between any two points, say x and y. It is
symmetric: the separation between x and y equals the separation between y and x. For two points
x = (x_1, …, x_m) and y = (y_1, …, y_m), Equation (1) gives the Euclidean distance:
 $d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$   (1)
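As a quick illustration of the Euclidean distance described above, here is a minimal sketch; the function name and the sample points are chosen for the example only.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance between two points with the same number of coordinates."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```

Swapping the two arguments gives the same value, which is the symmetry property mentioned above.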
K-MEANS CLUSTERING ALGORITHM
 Algorithm for K-Means
 1. Input: Choose a database and the value of K, the number of clusters we want at the end.
 Let the database be D with n data objects: D = {d1, d2, d3, …, dn}.
 2. Output: An arrangement of the data objects into K clusters.
 3. Algorithm
 (i) Select the number of clusters, K.
 (ii) Choose the centers (centroids) of the K clusters. The initial values of the centers are selected
 arbitrarily.
K-MEANS CLUSTERING ALGORITHM
 (iii) Assign every data object to the closest cluster, as
determined by Euclidean distance.
 (iv) Recalculate the center of each cluster by taking the mean
of the data objects in that cluster. If a cluster contains n objects
x1, x2, x3, …, xn, the mean is given in Equation (2):
 $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$   (2)
 (v) Repeat steps (iii) and (iv) until convergence, that is, until
the assignments stop changing. K-means is therefore an
iterative technique.
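Steps (ii) to (v) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the deck: initial centers are sampled from the data with a fixed `seed`, iteration stops when assignments no longer change or after `max_iter` passes, and an empty cluster simply keeps its previous center.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance; the square root is not needed to compare distances."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group of points (step iv)."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(data: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0):
    rng = random.Random(seed)
    # Step (ii): pick K initial centers arbitrarily from the data.
    centroids: List[Point] = rng.sample(list(data), k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Step (iii): assign each object to its closest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        # Step (v): stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step (iv): recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(data, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centroids[j] = mean_point(members)
    return centroids, labels


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(data, k=2)
print(labels)
```

Which group receives label 0 depends on the random initial centers, so only the grouping, not the label numbers, is meaningful.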
APPLICATION OF K-MEANS CLUSTERING ALGORITHM IN
MINING OF NETWORK SHARED RESOURCES
K-MEANS-BASED DATA CLUSTERING OF NETWORK SHARED RESOURCES
 The K-means algorithm has become one of the most well-known and
widely used algorithms in data clustering, owing to its
high data processing efficiency, low computational
complexity, and strong scalability.
 The data of network shared resources is clustered into different
classes using K-means clustering, in the manner shown in the
image.
K-MEANS-BASED DATA CLUSTERING OF NETWORK SHARED RESOURCES
 Compared with the existing methods mentioned above, the K-means clustering algorithm has
the following advantages:
 The K-means clustering technique is notably robust when handling data sets. In particular,
when the classes in the data set are separated by large gaps, the classification results
improve.
 When numerical data sets are classified with the K-means clustering algorithm, the input order
of the data objects has almost no impact on the classification outcome.
K-MEANS-BASED DATA CLUSTERING OF NETWORK SHARED RESOURCES
 The reason is that, during clustering, the distance formula is applied to determine the distance
from each data object to the center point, and the data set is classified on that basis.
 This was not the case for the methods mentioned above, where the classification outcome
is heavily affected by the order of the input objects.
 The algorithm can handle big data sets. The clustering outcome is not affected if data
overlaps between different data sets, so the approach is practical to use.
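The order-independence claim can be checked for the assignment step with a small sketch. It assumes the centers are already fixed; with randomly chosen initial centers, different runs may still end in different clusterings. The points, centers, and function name are illustrative only.

```python
import math
import random


def assign_to_nearest(points, centers):
    """Map each point to the index of its nearest center."""
    return {
        p: min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    }


points = [(0.0, 0.0), (1.0, 0.5), (9.0, 9.5), (10.0, 10.0), (4.0, 6.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]

# Present the same points in a different order.
shuffled = points[:]
random.Random(1).shuffle(shuffled)

same = assign_to_nearest(points, centers) == assign_to_nearest(shuffled, centers)
print(same)  # True
```

Each point's label depends only on its own distances to the centers, so reordering the input cannot change it.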
COMPARISONS WITH EXISTING METHODS
ACCURACY
COMPARISON
 The accuracy of the K-means
based method stays close
to 97%, while the other
methods do not exceed
80% as the number of
experiments increases.
DATA MINING TIME
COMPARISON
 The average data mining
time of the K-means
clustering based method is
only 0.6 s, whereas the
average times of the other
methods are about 4.2 s
and 2.9 s.
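To put these averages in relative terms, the ratios can be computed directly. The sketch below uses only the three timings quoted above; the function and constant names are illustrative.

```python
def speedup(baseline_seconds: float, method_seconds: float) -> float:
    """How many times faster the method is than the baseline."""
    return baseline_seconds / method_seconds


K_MEANS_TIME = 0.6          # seconds, as reported above
OTHER_TIMES = (4.2, 2.9)    # seconds, the two other methods

for t in OTHER_TIMES:
    print(f"{t} s vs {K_MEANS_TIME} s: about {speedup(t, K_MEANS_TIME):.1f}x faster")
```

On these figures the K-means based method is roughly 7 times and 4.8 times faster than the two other methods.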
CONCLUSION
 The K-means clustering based method was proposed to
improve the quality of network shared resource data
mining. Its accuracy for in-depth data mining of network
shared resources is always over 94%, and the average
time of in-depth data mining is only 0.6 s.
 This suggests that the method can achieve fast and
accurate in-depth data mining of network shared
resources.
 Yet a number of challenges remain to be resolved,
including the deep mining of language and cross-cultural
resource sharing, as well as the security,
personalization, and intelligence of resource data
mining.
THANK YOU
