Social Media Analysis Using K-Means Clustering
Made By
Nishant Alsatwar
Introduction
• This social media analysis is based on a
Facebook data set obtained from the UCI
Repository.
• We’re going to use K-Means Clustering
Algorithm to obtain the results in the form of
clusters.
• Clusters are analyzed to conclude the results.
Motivation
• Nowadays, social media is a very popular way to
connect with friends and colleagues.
• When someone sends you a friend request, the
request usually stems from a common interest,
or the sender may be a family member, a
colleague, etc.
• Our aim is to find out the intention behind
a friend request. Clusters are formed
on the basis of common interests and groups.
Objective
• The general theme of this survey is to identify
the intention behind a friend request that a
person sends to a friend or to somebody
else, making that intention easy to
understand.
• To differentiate social media users
on the basis of their friendship network and
divide them into clusters according to
their mutual-friendship relations.
What is Data Mining?
• Data mining is the extraction of useful
information from large amounts of raw,
unprocessed data, using techniques such as
data cleaning and preprocessing.
What is Data Analysis?
• Data analysis applies various techniques
to extract useful information from large
volumes of data; the results obtained are
then analyzed in order to predict useful
patterns.
Proposed System
• Minimum RAM : 2GB
• Minimum HDD : 250GB
• OS : Windows
• Application Platform : R Studio
K-Means Clustering
• The main idea is to define k centers, one for
each cluster.
• The next step is to take each point belonging
to the given dataset and associate it with the
nearest center.
• The distance between each point and each
center is found with the Euclidean distance
formula.
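The distance and nearest-center steps can be illustrated with a minimal Python sketch. The function names and the two example centers are assumptions made for illustration; they do not come from the slides or the data set.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_center(point, centers):
    """Index of the center closest to `point` (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda i: euclidean(point, centers[i]))

# Illustrative centers only, not taken from the data set.
centers = [(0.0, 0.0), (10.0, 10.0)]

print(euclidean((0.0, 0.0), (3.0, 4.0)))    # 5.0
print(nearest_center((2.0, 1.0), centers))  # 0
```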
Block Diagram
Mathematical Modelling
• Let X = {x1, x2, x3, …, xn} be the set of data points and V = {v1, v2, …, vc}
be the set of centers.
• [1] Randomly select ‘c’ cluster centers.
• [2] Calculate the distance between each data point and each cluster center.
• [3] Assign each data point to the cluster center whose distance from the
data point is the minimum over all the cluster centers.
• [4] Recalculate each cluster center as the mean of the points assigned to it:
$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$, where the sum runs over the data points
in the ith cluster and ‘ci’ represents the number of data points in the ith cluster.
• [5] Recalculate the distance between each data point and the newly obtained
cluster centers.
• [6] If no data point was reassigned then stop, otherwise repeat from step
[3].
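Steps [1] to [6] can be sketched in plain Python as follows. This is a minimal illustration, not the R implementation used in the project: the function name `kmeans`, the `initial_centers` and `seed` parameters, and the six sample points are all assumptions introduced here.

```python
import math
import random

def kmeans(points, c, initial_centers=None, seed=0, max_iter=100):
    """Plain k-means over a list of equal-length tuples.

    Returns (centers, assignment), where assignment[j] is the index of the
    cluster that points[j] belongs to.
    """
    # Step 1: pick c starting centers (random sample unless supplied).
    if initial_centers is None:
        centers = random.Random(seed).sample(points, c)
    else:
        centers = list(initial_centers)
    assignment = None
    for _ in range(max_iter):
        # Steps 2-3 (and 5 on later passes): distance to every center,
        # then assign each point to its nearest center.
        new_assignment = [
            min(range(c), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Step 6: stop when no point changed cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each center to the mean of its assigned points.
        for i in range(c):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment

# Hypothetical sample points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = kmeans(points, 2, initial_centers=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing `initial_centers` makes the run reproducible; with random starting centers the result can depend on the draw, which is a known property of k-means.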
Mathematical Modelling
• Calculate the distance of each point from
the centroid using the Euclidean distance
formula.
Dataset Description
Node 1    Node 2    Link    Timestamp
1         12        1       0
1         20        1       1.22E+09
1         24        1       1.23E+09
1         25        1       1.23E+09
1         26        1       0
1         27        1       0
2         3         1       1.18E+09
Dataset Description
• “Node 1” represents “Person 1” and “Node 2”
represents “Person 2”. “Link” indicates whether
a friendship exists between those two
persons: if it does, the value in the Link field
is “1” (one); otherwise it is “0” (zero).
The “Timestamp” field records, in standard
timestamp format, the time at which the
friend request was sent to the person.
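As a rough illustration of how rows in this format might be loaded, the sketch below parses the seven sample rows shown in the table and builds a friend list per node. The names `parse_edges` and `friends_of` are hypothetical, and treating a link as mutual (undirected) is an assumption of this sketch.

```python
from collections import defaultdict

# The seven sample rows from the table: node 1, node 2, link, timestamp.
ROWS = """\
1 12 1 0
1 20 1 1.22E+09
1 24 1 1.23E+09
1 25 1 1.23E+09
1 26 1 0
1 27 1 0
2 3 1 1.18E+09
"""

def parse_edges(text):
    """Turn whitespace-separated rows into (node1, node2, link, timestamp) tuples."""
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        a, b, link, ts = line.split()
        edges.append((int(a), int(b), int(link), float(ts)))
    return edges

def friends_of(edges):
    """Map each node to the set of nodes it is linked to (link == 1).

    Assumption: a friendship link is treated as mutual.
    """
    adjacency = defaultdict(set)
    for a, b, link, _ in edges:
        if link == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

edges = parse_edges(ROWS)
print(sorted(friends_of(edges)[1]))  # [12, 20, 24, 25, 26, 27]
```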
Timestamp Conversion
http://tools.zenverse.net/timestamp-to-date/
• Using the website mentioned above, we can
convert the timestamp into a human-readable
time and date format.
• The website is an online application that
converts a Unix timestamp (seconds elapsed
since 1 January 1970, UTC) into a
human-readable format.
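The same conversion can also be done locally. The sketch below uses Python's standard `datetime` module; the helper name `to_utc` is hypothetical, and reading a timestamp of 0 as "time not recorded" is an assumption of this sketch, not something the slides state. The table values are rounded to three significant figures, so the resulting dates are only approximate.

```python
from datetime import datetime, timezone

def to_utc(ts):
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to a datetime.

    Assumption: a value of 0 means the time was not recorded, so None is returned.
    """
    if ts == 0:
        return None
    return datetime.fromtimestamp(ts, tz=timezone.utc)

print(to_utc(1.22e9))  # 2008-08-29 08:53:20+00:00
print(to_utc(0))       # None
```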
Expected Output When Dataset is Linear
Expected Output When Dataset is Non-Linear
Friend Recommendation
• “Correlation” between networks means that
the topologies of different networks share
similar properties. According to these similar
properties, we can make inferences from one
network to another.
Friend Recommendation
• For example, if two nodes have a strong tie in
the Flickr tag network, we might guess that
they are also in each other’s contact list.
However, we cannot say that they will be
friends with each other on Flickr: remember
that the topologies of the tag and contact
networks are not the same. To make a more
precise recommendation, we should
determine how the two networks are
correlated.
Network Alignment
• “Network alignment” is defined as the action
of mapping one network to another with a
number of constraints/rules. It has been
widely applied in the fields of bio-informatics
and computer vision. Here, we take advantage
of the study of network alignment in other
fields, such as bio-informatics, to use as a new
approach in social media.
Network Alignment
• To model the network correlations, we propose to align
tag and contact networks through important tag
feature selection.
• An “important” feature is decided by whether it
contributes to the correlation of the tag network with
the contact network, or in other words, makes the
topologies of the two networks more similar. The
reason we select important features is that a person
usually presents many social features in social
networks, some of which are attractive to others, and
some of which are not very useful for building
relationships.
Example
• A photographer uploads images to Flickr with tags
such as “natural animals”, “historical buildings”,
“street views” and “people”. We view these tags
as different feature words. The photographer
may find that most of his friends in the Flickr
network contact him because of the photos
tagged with “natural animals” and “historical
buildings”, rather than “street views” and
“people”. This indicates that the first two feature
words are more important than the last two for
friend recommendation.
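One simple way to picture "importance" in this example is to count, for each of the photographer's tags, the share of his contacts who use the same tag. The sketch below is only a stand-in for the alignment-based feature selection described in the slides: the scoring rule, the function name `tag_importance`, and the contact data are all hypothetical.

```python
def tag_importance(user_tags, contact_tags):
    """Fraction of contacts who share each of the user's tags.

    A simple illustrative heuristic, not the alignment method itself.
    """
    n = len(contact_tags)
    scores = {}
    for tag in sorted(user_tags):
        shared = sum(1 for tags in contact_tags.values() if tag in tags)
        scores[tag] = shared / n if n else 0.0
    return scores

photographer = {"natural animals", "historical buildings", "street views", "people"}

# Hypothetical contacts and the tags they use.
contacts = {
    "ann": {"natural animals", "historical buildings"},
    "bob": {"natural animals"},
    "cy": {"historical buildings", "people"},
    "dee": {"natural animals", "historical buildings"},
}

for tag, score in tag_importance(photographer, contacts).items():
    print(tag, score)
```

With this made-up data, "natural animals" and "historical buildings" each score 0.75, while "people" scores 0.25 and "street views" scores 0.0, matching the intuition in the example.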
Friend Recommendation
• If two users in the tag network have a strong
similarity in the selected features after the
alignment, we can infer that they have a
higher probability of having a relationship in
the contact network.
• To make friend recommendations more
precise, we also consider network
structure preservation in our algorithm in
addition to network alignment.
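The idea of recommending users who are similar on the selected features can be sketched as below. The slides do not name a similarity measure, so Jaccard similarity is an assumption here, as are the function names, the user names and the tag data. Structure preservation is not modelled.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, tags_by_user, selected, existing_contacts, top_n=3):
    """Rank other users by similarity to `user` on the selected tags only."""
    mine = tags_by_user[user] & selected
    scored = []
    for other, tags in tags_by_user.items():
        if other == user or other in existing_contacts:
            continue  # skip the user and people already in the contact list
        score = jaccard(mine, tags & selected)
        if score > 0:
            scored.append((other, score))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]

# Hypothetical users and tags.
tags_by_user = {
    "u1": {"natural animals", "street views"},
    "u2": {"natural animals", "historical buildings"},
    "u3": {"street views"},
    "u4": {"natural animals"},
}
selected = {"natural animals", "historical buildings"}

print(recommend("u1", tags_by_user, selected, existing_contacts=set()))
# [('u4', 1.0), ('u2', 0.5)]
```

Restricting the comparison to `selected` is what keeps "street views" from linking u1 and u3 in this toy data.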
Preservation
• “Preservation” means that we do not
significantly change the tag network structure
before and after alignment. By preserving the
tag network structure on Flickr, we reduce the
over-fitting risk of our algorithm.
Data Mining In Social Networks Using K-Means Clustering Algorithm
Social media analytics research serves
several purposes:
• facilitating conversations and interaction
between online communities and
• extracting useful patterns and intelligence to
serve entities that include, but are not limited
to, active contributors in ongoing dialogues.
Results
Conclusion
• Recent work in machine learning and data mining has made impressive
strides toward learning highly accurate models of relational data.
• Making use of appropriate algorithms such as K-means for the extraction of
useful patterns leads to useful results.
• We propose a new friend recommendation method, based on network
correlation, by considering the effect of different social roles.
• To model the correlation between different networks, we develop a
method that aligns these networks through important feature selection.
• We also consider preserving the network structure for a more precise
recommendation.
• We conduct comprehensive experiments to show that the proposed
method significantly improves the accuracy of friend-recommendation.
Thank You
