K- means clustering method based Data Mining of Network Shared Resources .pptx

K-MEANS
CLUSTERING
METHOD BASED
NETWORK SHARED
RESOURCES MINING
A SHORT STORY PRESENTED BY
KANCHETI SAI PRAGNA
SJSU_ID: 016698552

WHY MINING NETWORK
SHARED RESOURCES?
• The demand for sharing data resources over the internet keeps growing, and this has prompted many optimization techniques for using resources efficiently.
• At present, at least 15 trillion files are available on the internet. This sheer volume makes it a complex task to retrieve the relevant data resources efficiently.
• The need for data mining of network shared data resources arose from two problems: the large amount of redundant information and the difficulty of searching for relevant data resources.

Existing
Methods of
network
shared
resources
mining
• Significant research has been done on data mining methods for retrieving relevant data resources, and several techniques have emerged.
• One method is based on a clustering analysis algorithm: it uses the algorithm to process the resource data, construct the data preprocessing set, and calculate the data feature vector.
• Another method is based on multi-dimensional resource coordination and aggregation: it uses an analysis of the data center's network resource sharing process as the basis for building a multidimensional resource aggregation data model.
• Fuzzy logic is used to build multidimensional collaborative fitness functions, and data mining is used to optimize decision-making, in order to increase the execution efficiency of the data mining process.
• Although these methods have produced some excellent results, they fall short in run-time efficiency and precision, and they are usually complex to apply in practice.
• To overcome these drawbacks, a new method based on the K-means clustering algorithm has been proposed.

WHAT IS
CLUSTERING?
• Clustering assembles bulky data into clusters, or groups, which helps us visualize the internal structure of the data. In essence, it groups items by how similar to or distinct from one another they are.
• For example, an online shopping site may offer a wide variety of items: electronics, clothing, books, groceries, cosmetics, and accessories. Figure 2 shows how these items look after clustering.

STAGES OF
CLUSTERING
 Raw Data
 Clustering Algorithm
 Clusters

STAGES OF CLUSTERING
• Raw Data: Raw data (data that has not been processed yet) is collected from the various sources on which we want to apply a clustering algorithm.
• Clustering Algorithm: A specific algorithm is selected according to our requirements and is then applied to the selected raw data.
• Clusters: After applying the selected clustering algorithm to the raw data, we obtain our clusters.

TYPES OF
CLUSTERING
 Partitioning Method
 Density-based Method
 Hierarchical Method
 Grid-based method
 Model-based clustering method
 Constraint-based method

PARTITIONING METHOD
• In the partitioning clustering method, the objects of the data set are divided into a number of subsets.
• Examples of partitioning algorithms are K-means and PAM (Partitioning Around Medoids).
• The figure shows how clusters are formed after applying the partitioning clustering technique.

DENSITY-BASED METHOD
• The density-based clustering method identifies distinct clusters in the data, based on the idea that a cluster in a data space is a contiguous region of high point density, separated from other clusters by sparse regions.
• In this method, clusters are formed (that is, the data space is partitioned) according to the density of the data points in a particular region.
[Figure: clusters formed by applying the density-based method of clustering]

HIERARCHICAL METHOD
• In the hierarchical clustering method, the objects of the data set are organized into a hierarchy of clusters or groups.
• Examples: the agglomerative hierarchical clustering algorithm (AGNES) and the divisive hierarchical clustering algorithm (DIANA).
[Figure: clusters formed by applying the hierarchical method of clustering]

GRID-BASED METHOD
• In the grid-based clustering method, the object space is divided into a fixed number of cells that together form a grid-like structure.
• An example of a grid-based clustering algorithm is STING (Statistical Information Grid).
[Figure: clusters formed by applying the grid-based clustering method]
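The cell-division idea above can be illustrated with a minimal Python sketch. This is not STING itself, which the slides only name; it shows just the binning step, under the assumption of 2-D points and square cells. The function names `grid_cell_counts` and `dense_cells` and the parameters `cell_size` and `min_points` are hypothetical, chosen for illustration.

```python
import math
from collections import Counter

def grid_cell_counts(points, cell_size):
    """Count how many 2-D points fall into each square grid cell."""
    counts = Counter()
    for x, y in points:
        # A cell is identified by its integer column and row index.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

def dense_cells(points, cell_size, min_points):
    """Return the cells holding at least `min_points` points (assumed threshold)."""
    counts = grid_cell_counts(points, cell_size)
    return {cell for cell, n in counts.items() if n >= min_points}

# Hypothetical data: three points near the origin, two near (5, 5), one outlier.
points = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9), (5.1, 5.2), (5.7, 5.9), (9.5, 0.5)]
print(sorted(dense_cells(points, cell_size=1.0, min_points=2)))  # [(0, 0), (5, 5)]
```

A grid-based method would then work on the cells rather than on the individual points, which is why its cost depends mainly on the number of cells.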

MODEL-BASED CLUSTERING METHOD
• Model-based clustering works on the concept of a probability model, which is a mathematical representation of the random process assumed to generate the data set. Each group that forms has its own probability model.
[Figure: clusters formed by applying the model-based clustering method]

CONSTRAINT-BASED METHOD
• The constraint-based clustering method is a semi-supervised learning technique that combines a small proportion of labeled data with a large proportion of unlabeled data.
• Constrained K-means (COP-K-Means) is one of the common algorithms that use this method.
• The figure illustrates clustering using the constraint-based method.

K-MEANS CLUSTERING ALGORITHM
• The K-means algorithm is a partition-based clustering approach that belongs to the unsupervised learning techniques. It divides a large data set into K smaller groups. Its two distinct phases are described below.
• a. First phase: K centroids (centers) are selected at random. K has a fixed value and cannot be changed during the procedure.
• b. Second phase: Each data point is assigned to its closest centroid. The Euclidean distance is used to measure the separation between every data point and each cluster centroid.
• The Euclidean distance is the straight-line distance between any two points, say x and y. It is symmetric: the distance from x to y equals the distance from y to x. For two points x = (x_1, …, x_m) and y = (y_1, …, y_m), equation (1) gives the Euclidean distance:
  d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}    (1)
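The Euclidean distance and the closest-centroid assignment of the second phase can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation; the function names `euclidean_distance` and `closest_centroid` and the sample points are hypothetical.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def closest_centroid(point, centroids):
    """Index of the centroid nearest to `point` (ties go to the lowest index)."""
    distances = [euclidean_distance(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical 2-D example: the distance is the same in both directions,
# and the point (1, 1) is closer to the first centroid than to the second.
print(euclidean_distance((0, 0), (3, 4)))            # 5.0
print(euclidean_distance((3, 4), (0, 0)))            # 5.0
print(closest_centroid((1, 1), [(0, 0), (10, 10)]))  # 0
```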

 Algorithm for K-Means
 1. Input: Choose a database and select the value of K that is the number of clusters we want at the
end. Let the database be D with n data objects: D = {d1, d2, d3, …, dn}.
• 2. Output: An arrangement of K clusters.
• 3. Algorithm
• (i) Set the number of clusters, K, to the value chosen in the input step.
• (ii) Choose the centres (centroids) of the K clusters. The initial values of the centres are selected arbitrarily.
• (iii) Assign every data object to its closest cluster, as determined by the Euclidean distance.
• (iv) Recalculate the centre of each cluster by taking the mean of the data objects currently in that cluster. If a cluster contains n objects x1, x2, x3, …, xn, the mean is given in equation (2):
  \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i    (2)
• (v) Repeat steps (iii) and (iv) until convergence. K-means is therefore an iterative technique.
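Steps (ii) to (v) above can be sketched as a short, standard-library-only Python function. This is a minimal sketch under stated assumptions, not the implementation evaluated in the slides: the initial centroids are drawn from the data objects themselves, convergence is taken to mean "no assignment changed", and the names `k_means`, `max_iterations`, and `seed` are hypothetical.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `k` groups."""
    rng = random.Random(seed)
    # Step (ii): pick k data objects as the arbitrary initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Step (iii): assign every object to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step (v): stop once no assignment changes between iterations.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step (iv): recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

On this sample the first three points end up sharing one label and the last three the other; which group receives label 0 depends on the random initial centroids.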

APPLICATION OF K-MEANS CLUSTERING ALGORITHM IN
MINING OF NETWORK SHARED RESOURCES

K-MEANS-BASED DATA CLUSTERING OF NETWORK SHARED RESOURCES
• The K-means algorithm has emerged as one of the most well-known and widely used algorithms in data clustering, owing to its high data processing efficiency, low computational complexity, and strong scalability.
• The data of network shared resources is clustered into different classes using K-means clustering in the manner shown in the image.

 When compared to existing methods that are mentioned above the K-means clustering algorithm has
the following advantages:
• The K-means clustering technique is robust when managing data sets. In particular, when the classes in the data set are separated by a large gap, the classification results improve.
• The input order of the data objects has almost no impact on the classification outcome when numerical data sets are classified with the K-means clustering algorithm.

• The reason is that, during clustering, the distance formula is applied to determine the distance from each data object to the center point, and this distance does not depend on the order in which the objects arrive.
• This was not the case for the methods mentioned above, where the classification outcome is strongly affected by the order of the input objects.
• The algorithm is capable of handling big data sets. The clustering outcome is not affected if there is data overlap between different data sets, so the approach has good practical use.
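The order-independence argument above can be checked with a small Python sketch. It covers only the assignment step for a fixed set of centroids, which is an assumption of this illustration; the final K-means result can still depend on how the initial centroids are chosen. The function name `assign` and the sample data are hypothetical.

```python
import math
import random

def assign(points, centroids):
    """Map each point to the index of its nearest centroid."""
    return {
        p: min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    }

centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 2.0), (9.0, 8.0), (0.5, 0.5), (11.0, 9.5)]

# Present the same objects in a different input order.
shuffled = points[:]
random.Random(1).shuffle(shuffled)

# Each object's distance to each centroid is computed independently,
# so every object receives the same cluster in both orders.
print(assign(points, centroids) == assign(shuffled, centroids))  # True
```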

COMPARISONS WITH EXISTING METHODS

ACCURACY
COMPARISON
• The accuracy of the K-means-based method is close to 97%, while the other methods do not exceed 80% as the number of experiments increases.

DATA MINING TIME
COMPARISON
• The average data mining time of the K-means-clustering-based method is only 0.6 s, whereas the average times of the other methods are almost 4.2 s and 2.9 s.

CONCLUSION
• To improve the quality of network shared resource data mining, a data mining technique based on K-means clustering is used. Its accuracy for in-depth data mining of network shared resources is always over 94%, and its average mining time is only 0.6 s.
• This suggests that the method can achieve fast and accurate in-depth data mining of network shared resources.
• Yet a number of challenges remain to be resolved, including the deep mining of language and cross-cultural resource sharing, as well as the security, personalization, and intelligence of resource data mining.
