Data Mining In Social Networks Using K-Means Clustering Algorithm

Social Media Analysis Using K-Means Clustering
Made By
Nishant Alsatwar

Introduction
• This social media analysis is based on a
Facebook data set obtained from the UCI
Repository.
• We use the K-Means clustering algorithm to
group the data into clusters.
• The clusters are then analyzed to draw
conclusions.

Motivation
• Social media is now a very popular way to stay
connected with friends and colleagues.
• A friend request usually has a reason behind it:
common interests, a family relationship, a
working relationship, and so on.
• Our aim is to find the intention behind a friend
request. Clusters are formed on the basis of
common interests and groups.

Objective
• The general aim of this work is to identify the
intention behind a friend request that one
person sends to a friend or to somebody else,
so that the intention of a request is easy to
understand.
• To differentiate social media users on the
basis of their friendship network and divide
them into clusters according to their mutual
friendship relations.

What is Data Mining?
• Data mining is the extraction of useful
information from large volumes of raw,
unprocessed data, supported by techniques
such as data cleaning and preprocessing.

What is Data Analysis?
• Data analysis applies various techniques to
extract useful information from large volumes
of data; the results are then analyzed to
identify useful patterns.

Proposed System
• Minimum RAM : 2GB
• Minimum HDD : 250GB
• OS : Windows
• Application Platform : RStudio

K-Means Clustering
• The main idea is to define k centers, one for
each cluster.
• The next step is to take each point in the
given dataset and associate it with the
nearest center.
• The distance between each point and each
center is computed with the Euclidean
distance formula.

Mathematical Modelling
• Let $X = \{x_1, x_2, \ldots, x_n\}$ be the set of data points and
$V = \{v_1, v_2, \ldots, v_c\}$ be the set of centers.
• [1] Randomly select $c$ cluster centers.
• [2] Calculate the distance between each data point and each cluster center.
• [3] Assign each data point to the cluster center nearest to it.
• [4] Recalculate each cluster center using
$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$,
where $c_i$ is the number of data points in the $i$th cluster and the sum
runs over those points.
• [5] Recalculate the distance between each data point and the newly
obtained cluster centers.
• [6] If no data point was reassigned then stop, otherwise repeat from step
[3].
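The six steps above can be sketched in a few lines of code. This is a minimal illustration, not the implementation used for the slides: it is written in Python rather than on the R-based platform named earlier, the function names `euclidean` and `k_means` are hypothetical, the sample points are made-up two-dimensional features, and the `seed` and `max_iter` parameters are assumptions added so the run is repeatable and always terminates.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, c, seed=0, max_iter=100):
    """Cluster `points` into `c` groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # [1] Randomly select c cluster centers from the data points.
    centers = [list(p) for p in rng.sample(points, c)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # [2]-[3] Assign each point to its nearest center.
        new_labels = [
            min(range(c), key=lambda i: euclidean(p, centers[i]))
            for p in points
        ]
        # [6] Stop when no point was reassigned.
        if new_labels == labels:
            break
        labels = new_labels
        # [4] Recompute each center as the mean of its assigned points.
        for i in range(c):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical, well-separated sample points (not taken from the dataset).
points = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0), (8.0, 8.0), (8.5, 8.5), (9.0, 9.0)]
centers, labels = k_means(points, c=2)
print(labels)   # the first three points share one label, the last three the other
print(centers)
```

Which group receives label 0 depends on the random initial centers, so only the grouping itself is meaningful.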

Mathematical Modelling
• Compute the distance from each point to each
centroid using the Euclidean distance
formula.
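For reference, the Euclidean distance between a data point $x$ and a centroid $v$ is usually written as follows. The symbol $m$ (the number of coordinates of each point) and the function name $d$ are assumptions introduced here; they do not appear in the slides.

```latex
d(x, v) = \sqrt{\sum_{k=1}^{m} (x_k - v_k)^2}
```

For example, with $x = (1, 2)$ and $v = (4, 6)$ the distance is $\sqrt{(1-4)^2 + (2-6)^2} = \sqrt{9 + 16} = 5$.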

Dataset Description
Node 1    Node 2    Link    Timestamp
1         12        1       0
1         20        1       1.22E+09
1         24        1       1.23E+09
1         25        1       1.23E+09
1         26        1       0
1         27        1       0
2         3         1       1.18E+09

Dataset Description
• “Node 1” represents “Person 1” and “Node 2”
represents “Person 2”. “Link” records whether a
friendship exists between the two persons: the
value is “1” (one) if it does and “0” (zero)
otherwise. “Timestamp” records, as a Unix
timestamp, the time at which the friend request
was sent.
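As a rough sketch of how rows in this format could be turned into a friendship network, the snippet below parses the sample rows shown in the table and looks up mutual friends. It is written in Python for illustration only; the helper names `build_friend_sets` and `mutual_friends` are hypothetical, and treating every link as a two-way friendship is an assumption, not something the slides state.

```python
from collections import defaultdict

# Sample rows copied from the table: node 1, node 2, link, timestamp.
ROWS = """\
1 12 1 0
1 20 1 1.22E+09
1 24 1 1.23E+09
1 25 1 1.23E+09
1 26 1 0
1 27 1 0
2 3 1 1.18E+09
"""


def build_friend_sets(text):
    """Return {person: set of friends} for rows whose Link value is 1."""
    friends = defaultdict(set)
    for line in text.splitlines():
        node1, node2, link, _timestamp = line.split()
        if int(link) == 1:
            # Assumption: a link is a mutual (undirected) friendship.
            friends[int(node1)].add(int(node2))
            friends[int(node2)].add(int(node1))
    return friends


def mutual_friends(friends, a, b):
    """Friends that persons `a` and `b` have in common."""
    return friends[a] & friends[b]


friends = build_friend_sets(ROWS)
print(len(friends[1]))                   # number of friends of person 1
print(mutual_friends(friends, 12, 20))   # persons 12 and 20 share friend 1
```

Counts such as the number of friends or the number of mutual friends are one possible source of numeric features for clustering; the slides do not say which features were actually used.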

Timestamp Conversion
http://tools.zenverse.net/timestamp-to-date/
• Using the website mentioned above, we can
convert a timestamp into a human-readable
date and time.
• The website provides an online tool that
converts a Unix timestamp into a
human-readable format.
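The same conversion can also be done locally. The sketch below uses Python's standard `datetime` module; the function name `to_utc_string` is hypothetical, and the snippet assumes the timestamps count seconds since 1 January 1970 UTC. Because the table shows rounded values such as 1.22E+09, the resulting dates are only approximate.

```python
from datetime import datetime, timezone


def to_utc_string(timestamp):
    """Format a Unix timestamp (seconds since 1970-01-01 UTC) as a UTC date and time."""
    moment = datetime.fromtimestamp(float(timestamp), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(to_utc_string("1.22E+09"))  # 2008-08-29 08:53:20
print(to_utc_string(0))           # 1970-01-01 00:00:00
```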

Expected Output When Dataset is Linear

Expected Output When Dataset is Non-Linear

Friend Recommendation
• “Correlation” between networks means that
the topologies of different networks share
similar properties. According to these similar
properties, we can make inferences from one
network to another.

• For example, if two nodes have a strong tie in
the Flickr tag network, we might guess that
they are also in each other’s contact list.
However, we cannot say for certain that they
will be friends with each other on Flickr,
because the topologies of the tag and contact
networks are not the same. To make a more
precise recommendation, we should
determine how the two networks are
correlated.

Network Alignment
• “Network alignment” is defined as the action
of mapping one network to another under a
number of constraints/rules. It has been
widely applied in the fields of bio-informatics
and computer vision. Here, we draw on the
study of network alignment in those fields
and apply it as a new approach in social
media.

Network Alignment
• To model the network correlations, we propose to align
tag and contact networks through important tag
feature selection.
• An “important” feature is decided by whether it
contributes to the correlation of the tag network with
the contact network, or in other words, makes the
topologies of the two networks more similar. The
reason we select important features is that a person
usually presents many social features in social
networks, some of which are attractive to others, and
some of which are not very useful for building
relationships.

Example
• A photographer uploads images to Flickr with tags
such as “natural animals”, “historical buildings”,
“street views” and “people”. We view these tags
as different feature words. The photographer
may find that most of his friends in the Flickr
network contact him because of the photos
tagged with “natural animals” and “historical
buildings”, rather than “street views” and
“people”. This indicates that the first two feature
words are more important than the last two for
friend recommendation.

• If two users in the tag network have a strong
similarity in the selected features after the
alignment, we can infer that they have a
higher probability of having a relationship in
the contact network.
• To make friend recommendations more
precise, we also consider network structure
preservation in our algorithm, in addition to
network alignment.
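To illustrate the idea of comparing two users only on the selected features, here is a small sketch in Python. It is not the alignment method described in these slides: cosine similarity is a stand-in chosen for illustration, the function name `cosine_similarity` is hypothetical, the tag counts are made-up numbers, and the list of "important" tags is assumed rather than learned.

```python
import math


def cosine_similarity(u, v, selected):
    """Cosine similarity of two tag-count dicts, restricted to `selected` tags."""
    dot = sum(u.get(t, 0) * v.get(t, 0) for t in selected)
    norm_u = math.sqrt(sum(u.get(t, 0) ** 2 for t in selected))
    norm_v = math.sqrt(sum(v.get(t, 0) ** 2 for t in selected))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no overlap can be measured without any selected tags
    return dot / (norm_u * norm_v)


# Hypothetical tag counts for two users (illustrative numbers only).
user_a = {"natural animals": 30, "historical buildings": 10, "street views": 5}
user_b = {"natural animals": 15, "historical buildings": 5, "people": 40}

all_tags = ["natural animals", "historical buildings", "street views", "people"]
important = ["natural animals", "historical buildings"]  # assumed selection

sim_all = cosine_similarity(user_a, user_b, all_tags)
sim_important = cosine_similarity(user_a, user_b, important)
print(sim_all, sim_important)
```

In this made-up case the two users look far more alike once the comparison is restricted to the assumed important tags, which is the effect the feature selection is meant to produce.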

Preservation
• “Preservation” means that we do not
significantly change the tag network structure
before and after alignment. By preserving the
tag network structure on Flickr, we reduce the
over-fitting risk of our algorithm.

Social media analytics research serves
several purposes:
• facilitating conversations and interaction
between online communities and
• extracting useful patterns and intelligence to
serve entities that include, but are not limited
to, active contributors in ongoing dialogues.

Conclusion
• Recent work in machine learning and data mining has made impressive
strides toward learning highly accurate models of relational data.
• Using appropriate algorithms such as K-means to extract useful
patterns leads to useful results.
• We propose a new friend recommendation method, based on network
correlation, by considering the effect of different social roles.
• To model the correlation between different networks, we develop a
method that aligns these networks through important feature selection.
• We also consider preserving the network structure for a more precise
recommendation.
• We conduct comprehensive experiments to show that the proposed
method significantly improves the accuracy of friend recommendation.

