Social Network Analysis
Approach and Applications
Joshua S. White
PhD Candidate, Engineering Science
April 22, 2014
Committee Members:
Jeanna N. Matthews, PhD (Advisor)
John S. Bay, PhD (External Examiner)
Chris Lynch, PhD
Chen Liu, PhD
Stephanie C. Schuckers, PhD
| Clarkson University 1/42
Outline
Motivation
Problem Questions
Method & Publications
Coalmine
PySNAP
Established Dataset
Insights into the Data
Botnet Command & Control Detection
Phishing Website Detection
Phishing Website Detection Continuum: ML-Based Detection
Malware Infection Vector Detection
Actor Identification
Event Identification
Conclusions
Future Work
Acknowledgements
References
Contact
Questions
Supplemental Material
Motivation
Partially inspired by Gladwell’s book The Tipping Point [1], in which he describes how ideas and behaviors spread through a population like an epidemic. Gladwell’s rigor has been criticized; for our purposes, however, the book serves as inspiration and motivation rather than as a source of accurate results.
The Book’s Key Points (for Our Purposes)
• Actors (Connectors, Mavens, Salesmen).
• Information spreads like disease.
• Ideas reach a tipping point (critical mass).
Let’s Face It - Social Networks Are Fun
• We are a social species that enjoys communicating and self-adulation.
Problem Questions
• Can we come up with a way of classifying users based on actor types?
• Can we determine who the opinion leaders or influencers are?
• Can we determine how information spreads on these networks?
• Can we detect malicious social network use?
• Are there information security applications for social network data-mining?
Method & Publications
• Establish a reliable collection mechanism.
• Establish a large dataset that can be utilized to answer each question.
• Use a case study approach, whereby each case feeds the next.
• Produce each case study as an individual publication or presentation.
– 3 x Published Proceedings
– 2 x Pending Proceedings
– 3 x Invited Presentations
Coalmine
• Scales well based on initial tests
• Useful for both manual and automated detection
• Allowed us to refine our data collection capabilities
At the Time (Future Work)
• Rebuild of the tool to fix scaling limitations
• More extensible Map/Reduce method
• Inclusion of native multi-threading capability
• New storage and distribution method
• New algorithms for automated opinion leader detection
PySNAP
• Addresses the Coalmine limitations identified above
• Completely reimplemented in Python, with a few supporting Bash scripts
• Uses the Disco MapReduce framework, whose jobs are also written in Python
• Integrates a better data-capture method; capture had previously been bolted on to Coalmine
• Allowed us to establish a large dataset for future work
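PySNAP's jobs ran on Disco, whose job API is not reproduced here. As a framework-free sketch of the MapReduce pattern itself, the map, shuffle, and reduce phases can be written in plain Python; the hashtag-counting job, the sample texts, and all function names are hypothetical illustrations, not PySNAP code.

```python
from collections import defaultdict
from itertools import chain


def map_hashtags(text):
    """Map phase: emit (hashtag, 1) for each #tag in one tweet's text."""
    for token in text.split():
        tag = token.lower().rstrip(".,!?:;")
        if tag.startswith("#") and len(tag) > 1:
            yield tag, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: combine all counts emitted for one hashtag."""
    return key, sum(values)


def run_job(texts):
    """Run map -> shuffle -> reduce sequentially in a single process."""
    intermediate = chain.from_iterable(map_hashtags(t) for t in texts)
    return dict(reduce_counts(k, vs) for k, vs in shuffle(intermediate).items())


print(run_job(["Stop #KONY2012 now!", "#kony2012 #example", "no tags here"]))
# prints {'#kony2012': 2, '#example': 1}
```

In a real deployment the framework distributes the map and reduce calls across workers; and because tweet JSON already carries parsed hashtags under `entities`, tokenizing the text here only keeps the sketch self-contained.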
Established Dataset
• Over the course of 2012 we collected 165 TB of uncompressed Twitter data
– 175 days collected, 147 of them full days
∗ An estimated 45 billion tweets
– Recently released estimates place total Twitter traffic at 175 million tweets per
day in 2012
– Thus our daily collection rates varied between 50% and 80% of total Twitter
traffic.
– We captured complete tweet data in JSON format using Twitter’s REST API.
∗ This data includes many fields beyond the message text, all of which can be taken into account when making measurements.
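Each captured record is a tweet object serialized as JSON, with a `created_at` timestamp in Twitter's format (e.g. `Sun Mar 20 15:27:02 +0000 2011`). A minimal sketch of turning raw captures into per-day counts follows; it assumes, hypothetically, one JSON object per line, and the sample records are illustrative only.

```python
import json
from collections import Counter
from datetime import datetime

# Twitter's created_at format, e.g. "Sun Mar 20 15:27:02 +0000 2011".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_day(lines):
    """Count tweets per UTC calendar day in newline-delimited tweet JSON.

    Assumes one JSON object per line (a hypothetical storage layout); blank
    lines and objects without a created_at field are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        created = record.get("created_at")
        if created is None:
            continue
        day = datetime.strptime(created, CREATED_AT_FORMAT).date()
        counts[day.isoformat()] += 1
    return counts


sample = [
    '{"id_str": "49492150668365824", "created_at": "Sun Mar 20 15:27:02 +0000 2011", "text": "Shutdown -r now"}',
    "",
    '{"delete": {"status": {"id_str": "1"}}}',
]
print(dict(tweets_per_day(sample)))  # prints {'2011-03-20': 1}
```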
Insights into the Data
Botnet Command & Control Detection
• Joshua S White, Jeanna N Matthews, and John L Stacy. Coalmine: an experience in building a system for social media analytics. In SPIE Defense, Security, and Sensing, page 84080A. International Society for Optics and Photonics, 2012.
Botnet Command & Control Detection Continued
Date/Time | UID | Text | MSG Entropy | Source
Sun Mar 20 15:27:02 +0000 2011 | 49492150668365824 | Shutdown -r now | 3.373557 | 26227518, http://twitter.com/Ebastos
Sun Mar 20 01:25:20 +0000 2011 | 49280326475853825 | # shutdown -h now | 3.373557 | 26227518, http://twitter.com/ohdediku
Sun Mar 20 21:40:53 +0000 2011 | 49586229964062720 | $ sudo shutdown -h now | 3.373557 | 26227518, http://twitter.com/souzabruno
Sun Mar 20 19:38:41 +0000 2011 | 49555476769280000 | Text: sudo shutdown -h now | 3.373557 | 26227518, http://twitter.com/stormyblack
Sun Mar 20 18:51:51 +0000 2011 | 49543693820116992 | shutdown -now | 3.373557 | 26227518, http://twitter.com/godzilla2k9
Sun Mar 20 18:52:30 +0000 2011 | 49543856840126464 | shutdown -h now !: | 3.373557 | 26227518, http://twitter.com/ph3nagen
Sun Mar 20 18:52:30 +0000 2011 | 49600582113177600 | shutdown -H now. | 3.373557 | 26227518, http://twitter.com/willybistuer
Sun Mar 20 22:37:54 +0000 2011 | 49597117039251457 | elmenda: su shutdown -h now | 3.373557 | 26227518, http://twitter.com/NeoVasili
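Every row above carries the same MSG Entropy value, 3.373557. The slides do not define the column; assuming it is the per-character Shannon entropy H = -Σ p_i log2 p_i of the message text, a minimal sketch reproduces that value for the first row's text, `Shutdown -r now`. Whether Coalmine computed it exactly this way is our assumption.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Per-character Shannon entropy of a string, in bits."""
    if not text:
        return 0.0
    n = len(text)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return -sum((count / n) * math.log2(count / n) for count in Counter(text).values())


print(round(shannon_entropy("Shutdown -r now"), 6))  # prints 3.373557
```

The 15-character string has 7 characters that occur once and 4 that occur twice, which is where 3.3736 bits comes from.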
Phishing Website Detection
• Joshua S White, Jeanna N Matthews, and John L Stacy. A method for the automated detection of phishing websites through both site characteristics and image analysis. In SPIE Defense, Security, and Sensing, page 84080B. International Society for Optics and Photonics, 2012.
Phishing Website Detection Continued
(F)raud / (L)egit | URL | Structural Fingerprint | Page Title | pHash Value | Hamming Score
Paypal Fraudulent | http://si4r.com/_paypal.co.uk/webscr.html?cmd=SignIn&co_partnerId=2&pUserId=&siteid=0&pageType=&pa1=&i1=&bshowgif=&UsingSSL=&ru=&pp=&pa2=&errmsg=&runame= | 0,7,1,0,2 | RETURNED NOTHING | 16716169687489800000 | 1
Paypal Legitimate | https://www.paypal.com/cgi-bin/webscr?cmd=_login-submit&dispatch=5885d80a13c0db1f8e263663d3faee8d1e83f46a36995b3856cef1e18897ad75 | 27,3,0,0,2 | Redirecting - Paypal | 18439707190431800000 | 0
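The pHash/Hamming comparison can be illustrated with a minimal, dependency-free sketch. It follows the widely used DCT-based pHash recipe (take the low-frequency 8x8 block of a 32x32 image's DCT and set one bit per coefficient above the block's median); whether the original tool used exactly this variant is an assumption, and the `max_distance` threshold and synthetic test image are hypothetical. Real screenshots would first be converted to grayscale and downscaled with an imaging library.

```python
import math
from statistics import median


def phash(pixels, keep=8):
    """DCT-based perceptual hash of a square grayscale image.

    pixels: list of equal-length rows (e.g. a 32x32 downscaled logo).
    Returns a keep*keep-bit integer (64 bits for keep=8).
    """
    n = len(pixels)
    # cos_table[k][i] = cos((2i + 1) * k * pi / (2n)): unnormalized DCT-II basis,
    # only for the first `keep` frequencies that form the low-frequency block.
    cos_table = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
                 for k in range(keep)]
    coeffs = []
    for u in range(keep):
        for v in range(keep):
            total = 0.0
            for x in range(n):
                cx = cos_table[u][x]
                row = pixels[x]
                for y in range(n):
                    total += row[y] * cx * cos_table[v][y]
            coeffs.append(total)
    # One bit per coefficient: 1 if it lies above the median of the block.
    med = median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > med else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def looks_similar(hash_a, hash_b, max_distance=10):
    # max_distance is a hypothetical threshold, not a value from the slides.
    return hamming(hash_a, hash_b) <= max_distance


# Synthetic 32x32 "logo" and a uniformly brightened copy of it.
logo = [[(x * 7 + y * 13 + x * y * 3) % 256 for y in range(32)] for x in range(32)]
brighter = [[p + 20 for p in row] for row in logo]
print(hamming(phash(logo), phash(brighter)), looks_similar(phash(logo), phash(brighter)))
```

A uniform brightness change alters only the DC coefficient, so the hash is essentially unchanged; tolerance to small visual edits like this is what makes perceptual hashes suitable for spotting copied logos.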
Phishing Website Detection Continuum: ML-Based Detection
• Title: An Image-based Feature Extraction Approach for Phishing Website Detection
• Authors: Hao Jiang, Joshua White, Jeanna Matthews
• Builds on our previous work in phishing website detection, specifically the image analysis approach
• Uses a machine learning approach to identify the most prominent images on a web page, usually the site’s logo
• Can detect phishing sites that the pHash/Hamming-distance method judges to be dissimilar
– These are the “poor quality” phishing sites
Malware Infection Vector Detection
• BEK (the Blackhole Exploit Kit) was the predominant MaaS (Malware as a Service) offering in 2012.
• It accounted for an estimated 29% of all malicious URLs.
• BEK licenses sold for around $1,500 USD.
• BEK used Twitter as its primary means of spreading infectious URLs.
• Our method detects these malicious URLs and the infected accounts spreading them at large scale.
Malware Infection Vector Detection Continued
• Joshua S. White and Jeanna N. Matthews. It’s you on photo?: Automatic detection of Twitter accounts infected with the Blackhole Exploit Kit. In Malicious and Unwanted Software: “The Americas” (MALWARE), 2013 8th International Conference on, pages 51-58, October 22-24, 2013. doi: 10.1109/MALWARE.2013.6703685
Malware Infection Vector Detection Continued
Malware Infection Vector Detection Continued
Actor Identification
• Title: Connectors, Mavens, Salesmen and More: Actor Based Online Social Network
(OSN) Analysis Method Using Tensed Predicate Logic
• Authors: Joshua White and Jeanna Matthews
• Submitted to KDD2014 (Knowledge Discovery and Data Mining) Conference “Data
Mining for Social Good”
• Used multiple definitions of actor types to create tensed predicate logic descriptions
• Translated these logic descriptions into semantic queries
• Tested the queries against a known dataset
Actor Identification Continued
Actor Identification Continued
• Time is important
• Previous methods did not take event sequence into account
• Liaison Example:
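The liaison example on this slide is a figure that is not reproduced here. As a hypothetical illustration of why event order matters (this is not the paper's predicate-logic formulation, and all names are ours), the sketch below encodes one tensed reading of a liaison, in the brokerage sense of an actor who links two groups while belonging to neither: the account must receive a topic from group A before it passes that same topic to group B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    sender: str
    receiver: str
    topic: str   # e.g. a hashtag or URL being passed along
    time: int    # any monotonically increasing timestamp


def is_liaison(account, interactions, group_a, group_b):
    """Hypothetical tensed rule: `account`, a member of neither group, received
    a topic from group_a at time t1 and later, at t2 > t1, sent that same
    topic to group_b."""
    if account in group_a or account in group_b:
        return False
    # Earliest time the account received each topic from group A.
    first_received = {}
    for m in interactions:
        if m.receiver == account and m.sender in group_a:
            earliest = first_received.get(m.topic)
            if earliest is None or m.time < earliest:
                first_received[m.topic] = m.time
    # Liaison only if a forward to group B happens strictly afterwards.
    return any(
        m.sender == account
        and m.receiver in group_b
        and m.topic in first_received
        and m.time > first_received[m.topic]
        for m in interactions
    )


forward = [Interaction("a1", "hub", "#kony2012", 1), Interaction("hub", "b1", "#kony2012", 2)]
backward = [Interaction("a1", "hub", "#kony2012", 2), Interaction("hub", "b1", "#kony2012", 1)]
print(is_liaison("hub", forward, {"a1"}, {"b1"}), is_liaison("hub", backward, {"a1"}, {"b1"}))
# prints True False
```

Both interaction sets contain the same edges and differ only in timestamps, so a time-blind graph query could not tell them apart.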
Actor Identification Continued
Actor Identification Continued
Event Identification
• This part of our work is still in its initial stages
• Given a general topic (a search term or hashtag), we can identify most of the related content in the dataset
• We can alert on all new posts regarding that term
• We can dig back through the historical data and trace the path an idea took
• We can identify the influential individuals (accounts) that played a part in spreading the information
• Our test case was the KONY2012 event
Event Identification Continued
Event Identification Continued
• Top 10 Twitter accounts sending and receiving KONY2012-related tweets
Directed @ Account Name | In-Degree | Origin Account Name | Out-Degree
tothekidswho | 625 | twittonpeace | 47
Invisible | 125 | interhabernet | 44
youtube | 118 | DailyisOut | 44
helpspreadthis | 95 | MEDYA_TURK | 42
justinbieber | 83 | haber_42 | 35
prettypinkprobz | 48 | gundem_haber | 30
ninadobrev | 48 | twittofpeace | 22
MeekMill | 47 | korkmazhaber | 19
ladygaga | 43 | tarafsiz_haber | 14
KendallJenner | 39 | Son_DakikaHaber | 13
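In-degree and out-degree counts of this kind can be derived from raw tweets by filtering on the topic term and treating each @mention as a directed edge. A minimal sketch follows; the `(sender, text)` input shape, counting one edge per mention rather than per tweet, and the example accounts are all assumptions, not the study's actual pipeline.

```python
import re
from collections import Counter

# Twitter screen names consist of letters, digits, and underscores.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")


def mention_degrees(tweets, term):
    """In- and out-degree of the @-mention graph restricted to one topic.

    tweets: iterable of (sender_screen_name, text) pairs.
    Each @mention in a tweet whose text contains `term` (case-insensitive)
    counts as one directed edge sender -> mentioned account.
    """
    term = term.lower()
    in_degree, out_degree = Counter(), Counter()
    for sender, text in tweets:
        if term not in text.lower():
            continue
        for target in MENTION.findall(text):
            in_degree[target] += 1
            out_degree[sender] += 1
    return in_degree, out_degree


tweets = [
    ("alice", "Watch this #KONY2012 @carol @dave"),
    ("bob", "kony2012 matters @carol"),
    ("erin", "unrelated post @dave"),
]
in_deg, out_deg = mention_degrees(tweets, "kony2012")
print(in_deg.most_common(1), out_deg.most_common(1))
# prints [('carol', 2)] [('alice', 2)]
```

A plain substring filter is deliberately simple; for a short term it can also match unrelated words that merely contain it.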
Event Identification Continued
• Top 10 Twitter accounts retweeting and being retweeted regarding KONY2012
Retweeting Account | In-Degree | Message Source | Out-Degree
MedyaKonya | 8 | Stop____Kony | 2642
twittonpeace | 8 | tothekidswho | 753
haber_42 | 7 | konyfamous2012 | 716
gundem_haber | 7 | Kony2012Help | 615
korkmazhaber | 7 | stop______kony | 353
DailyisOut | 7 | WESTOPKONY | 225
interhabernet | 6 | zaynmalik | 221
KONYA_ZAMAN | 6 | iSayStopKony | 127
konya_time | 6 | Stop_2012_Kony | 80
konyagazetesi | 5 | Kony_Awareness | 72
Event Identification Continued
Event Identification Continued
Conclusions
• We aimed to answer the following questions when we started this work:
– Can we come up with a way of classifying users based on actor types?
– Can we determine who the opinion leaders or influencers are?
– Can we determine how information spreads on these networks?
– Can we detect malicious social network use?
– Are there information security applications for social network data-mining?
• I believe we have provided at least preliminary answers to each of these questions
Future Work
• We have applied for a data grant from Twitter
• We are in the process of moving our entire dataset to the lab at Clarkson and building a new capture/analysis system
• I plan to pursue the semantic side of social network analysis
– Currently only one SNA semantic ontology exists, and it exists only on paper.
– I plan to combine the actor and event analyses into a single approach, which will be part of a new ontology
Acknowledgements
• I would like to thank:
– Dr. Matthews
– Dr. Bay
– Dr. Lynch
– Dr. Schuckers
– Dr. Liu
References
[1] Gladwell, M. (2000). The Tipping Point. Boston: Little, Brown and Company.
Contact
whitejs@clarkson.edu
Questions
Questions?
Supplemental Material
• DDFS
• Twitter JSON Key Fields
profile_link_color, coordinates, verified, in_reply_to_screen_name, geo, time_zone,
in_reply_to_status_id, text, statuses_count, in_reply_to_status_id_str, entities, contributors,
in_reply_to_user_id, place, protected, profile_background_color, contributors_enabled, truncated,
profile_background_tile, default_profile, retweeted, default_profile_image, description, is_translator,
follow_request_sent, followers_count, location, friends_count, geo_enabled, favourites_count,
profile_image_url_https, listed_count, following, profile_background_image_url, notifications, retweet_count,
profile_background_image_url_https, name, created_at, profile_image_url, lang, favorited,
profile_sidebar_border_color, profile_use_background_image, id_str, profile_sidebar_fill_color, screen_name,
profile_text_color, show_all_inline_media, id, url, utc_offset
• BEK Infectious Account Visualization
• Tensed Predicate Logic Key
• Coalmine User Interface