Mentions of Security
Vulnerabilities on Reddit,
Twitter and GitHub
Sameera Horawalavithana*,
Abhishek Bhattacharjee, Renhao Liu, Nazim Choudhury,
Lawrence O. Hall, Adriana Iamnitchi
University of South Florida
IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece
Security Vulnerabilities
❏ Identified by CVE (Common
Vulnerabilities and Exposures)
identifiers:
❏ Each publicly known security
vulnerability is uniquely identified
by an ID of the form CVE-YYYY-NNNN
❏ Formally recorded in the National
Vulnerability Database (NVD)
❏ “U.S. government repository of
standards based vulnerability
management data represented
using the Security Content
Automation Protocol (SCAP)”
❏ Discussed on social media
Figure: CVEs published in NVD over time.
Research Questions
1) What is the relationship between
mentions of security
vulnerabilities as posted on
Twitter, Reddit and GitHub?
2) Can the software development
activities in GitHub be predicted
from the discussions on Reddit
and Twitter?
Outline
❏ Dataset
❏ Data analysis
❏ CVE mentions in Reddit and Twitter
❏ CVE mentions in GitHub actions
❏ Predicting GitHub activities by using Reddit and Twitter activity signals
❏ Summary
Datasets
❏ Two social-media platforms: Reddit and
Twitter
❏ One software collaborative platform:
GitHub
❏ 18 months of records: March 2016 to August 2017
❏ Data filtered with the regular expression
CVE-\d{4}-\d{4} to match CVE identifiers
that appear in Reddit posts and comments,
Twitter tweets and replies, and
GitHub event descriptions
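A minimal sketch of this filtering step, assuming the pattern is applied case-sensitively to the raw message text. The sample messages and the helper name `extract_cve_ids` are hypothetical and only illustrate the matching behavior.

```python
import re

# Pattern as given on the slide: "CVE-", a four-digit year, then four digits.
# CVE sequence numbers may be longer than four digits; this pattern then
# captures only the first four, so r"CVE-\d{4}-\d{4,}" is a possible variant.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4}")


def extract_cve_ids(text: str) -> list[str]:
    """Return the CVE identifiers found in a message, in order of appearance."""
    return CVE_PATTERN.findall(text)


# Hypothetical messages, for illustration only.
messages = [
    "Patch for CVE-2015-1805 landed in the kernel tree",
    "Nothing security related here",
    "Both CVE-2016-5195 and CVE-2017-0144 are being exploited",
]

# Keep only the messages that mention at least one CVE identifier.
filtered = [m for m in messages if extract_cve_ids(m)]
```

Messages without a match are dropped, which mirrors filtering posts, comments, tweets, replies, and event descriptions down to those that mention a CVE.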
RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume
of security vulnerability mentions?
CVE Mentions in Reddit and Twitter (1)
❏ 10,257 CVE identifiers are
mentioned in our Reddit/Twitter
dataset
❏ 95% of CVE identifiers are
mentioned only on Twitter
❏ 0.5% of CVE identifiers are
mentioned only on Reddit
❏ 4.5% are mentioned on both
platforms
More security vulnerabilities are discussed on Twitter
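The shares above are set-overlap percentages. A small sketch of how such a breakdown can be computed, assuming each platform's mentions have already been reduced to a set of CVE identifiers; the function name and the toy identifiers are hypothetical.

```python
def platform_shares(twitter_ids: set[str], reddit_ids: set[str]) -> dict[str, float]:
    """Percentage of all mentioned identifiers seen only on Twitter,
    only on Reddit, or on both platforms."""
    total = len(twitter_ids | reddit_ids)
    if total == 0:
        return {"twitter_only": 0.0, "reddit_only": 0.0, "both": 0.0}
    return {
        "twitter_only": 100 * len(twitter_ids - reddit_ids) / total,
        "reddit_only": 100 * len(reddit_ids - twitter_ids) / total,
        "both": 100 * len(twitter_ids & reddit_ids) / total,
    }


# Toy identifier sets, not the real dataset.
shares = platform_shares({"A", "B", "C"}, {"C", "D"})
```

The three shares always sum to 100 because every identifier falls into exactly one of the three groups.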
RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
CVE Mentions in Reddit and Twitter (2)
Figure panels: Reddit and Twitter
Both platforms show a peak in the mentions of CVE identifiers near their
public disclosure
❏ Day 0 represents the NVD public disclosure date
❏ The publication date of each message (post/tweet) is measured relative to the
NVD public disclosure date of the CVE identifier it mentions
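The relative timing can be sketched as a day offset per mention, followed by a count per offset. This is an illustration under assumptions: the identifier `CVE-0000-0001`, its disclosure date, and the mention dates are placeholders, not data from the study.

```python
from collections import Counter
from datetime import date


def relative_day(message_date: date, disclosure_date: date) -> int:
    """Days between a message and the NVD disclosure (negative = before day 0)."""
    return (message_date - disclosure_date).days


# Placeholder disclosure date and mentions, for illustration only.
nvd_disclosure = {"CVE-0000-0001": date(2017, 5, 12)}
mentions = [
    ("CVE-0000-0001", date(2017, 5, 10)),
    ("CVE-0000-0001", date(2017, 5, 12)),
    ("CVE-0000-0001", date(2017, 5, 12)),
    ("CVE-0000-0001", date(2017, 5, 15)),
]

# Number of mentions per relative day; a peak at 0 means mentions cluster
# on the disclosure date.
histogram = Counter(relative_day(d, nvd_disclosure[cve]) for cve, d in mentions)
```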
CVE Mentions in Reddit and Twitter (3)
Figure panels: Reddit and Twitter
❏ Timing of social-media messages, broken down by Reddit subreddit and
Twitter hashtag
Of the CVE identifiers discussed on Reddit, the majority are discussed
before public disclosure
RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the
timing of vulnerability mentions on the two platforms?
CVE Mentions in Reddit and Twitter (4)
❏ Timing of social-media messages
with respect to the severity of
mentioned security vulnerabilities
❏ We identified bot-driven
communities using the textual
description of the subreddit
❏ We used BotHunter to detect
Twitter bot users
Early discussions related to high
severity CVE identifiers occur on
Reddit
RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the
timing of vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
CVE Mentions in Reddit and Twitter (5)
❏ Three Cascade Types
❏ Before (completed): cascades start and end before the public disclosure day of the
mentioned CVE
❏ Before (not completed): cascades start before the public disclosure day, but continue
after the public disclosure day of the mentioned CVE
❏ After: cascades start and end after the public disclosure day of the mentioned CVE
Reddit discussions are viral before the CVE public disclosure,
Twitter re-shares emerge after the CVE public disclosure
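A sketch of how a cascade could be assigned to one of the three types from its first and last message dates. The handling of the boundary is an assumption: a cascade that starts on the disclosure day itself is treated here as "after", which the slides do not specify.

```python
from datetime import date


def cascade_type(start: date, end: date, disclosure: date) -> str:
    """Classify a cascade by when it starts and ends relative to disclosure."""
    if start < disclosure and end < disclosure:
        return "before (completed)"
    if start < disclosure:
        # Started before disclosure and still active on or after that day.
        return "before (not completed)"
    # Assumption: starting on the disclosure day counts as "after".
    return "after"


# Hypothetical disclosure day, for illustration only.
disclosure_day = date(2017, 5, 12)
early = cascade_type(date(2017, 5, 1), date(2017, 5, 3), disclosure_day)
spanning = cascade_type(date(2017, 5, 10), date(2017, 5, 14), disclosure_day)
late = cascade_type(date(2017, 5, 13), date(2017, 5, 20), disclosure_day)
```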
RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the
timing of vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. What types of sentiments fuel these discussions?
CVE Mentions in Reddit and Twitter (6)
● Uncertainty analysis of Reddit
comments
○ Used a pre-trained machine learning model
(Yu et al. [1]) to classify whether a comment
expresses certainty about the subject of the
conversation
● Reaction types of Twitter replies
○ Used a pre-trained machine learning model
(Glenski et al. [2]) to classify each reply
as an answer, elaboration, question,
appreciation, negative reaction,
or agreement
1. Ning Yu and Graham Horwood. 2018. Veracity Enriched Event Extraction. In 2018 International Workshop on Social Sensing (SocialSens). 3–3.
2. Maria Glenski, Tim Weninger, and Svitlana Volkova. 2018. Identifying and Understanding User Reactions to Deceptive and Trusted Social News
Sources. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 176–181.
More “certain” comments on Reddit.
The majority of Twitter replies are classified
as “elaboration”, followed by “answer”,
both before and after public disclosure
RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the
timing of vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. How does GitHub activity depend on the public disclosure of
security vulnerabilities?
CVE Mentions in GitHub Events (1)
❏ 10,502 CVE identifiers
mentioned in GitHub Events
❏ Overlap with the CVE
identifiers mentioned on the
social-media platforms
❏ 40% with Twitter
❏ 3% with Reddit
Moderate overlap between the CVE identifiers
involved in software development on GitHub
and those mentioned on Twitter
CVE Mentions in GitHub Events (2)
❏ The majority of GitHub events
mention only one CVE identifier
❏ One CVE identifier
(CVE-2015-1805) is mentioned
in more than 3,000 GitHub events
❏ CVE-2015-1805 was published in
NVD around August 2015
❏ We noticed an increased
volume of related GitHub
activities in early 2016
❏ What really happened?
RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the
timing of vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. How does GitHub activity depend on the public disclosure of
security vulnerabilities?
f. How does GitHub activity correlate with the number of CVEs
for the most vulnerable repositories?
CVE Mentions in GitHub Events (3)
❏ We selected the two most vulnerable
repositories with respect to the
number of associated CVE identifiers
❏ We compare the monthly number of
mentioned CVEs with three GitHub event
time-series: Forks, Watches,
and Push Events
❏ We calculate Dynamic Time
Warping (DTW) to measure the
similarity between each GitHub event
time-series and the CVE time-series
Push Events follow the pattern of
CVE mentions most closely
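A minimal sketch of the DTW comparison, using the classic dynamic-programming formulation with absolute difference as the local cost. The slides do not state the local cost, any windowing, or any normalization, so those choices are assumptions, and the monthly counts below are toy values rather than the study's data.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Dynamic Time Warping distance between two numeric series."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with b[j-1]'s predecessor path
                cost[i][j - 1],      # b[j-1] aligned with an already-matched a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Toy monthly series; a smaller distance means a closer match.
cve_mentions = [1, 3, 2, 5]
push_events = [1, 3, 3, 5]
fork_events = [5, 1, 4, 0]

push_distance = dtw_distance(cve_mentions, push_events)
fork_distance = dtw_distance(cve_mentions, fork_events)
```

Because DTW allows one point to align with several points in the other series, two series with the same shape but a small time shift can still have a low distance. In practice the series would likely be rescaled first, since raw event counts and CVE counts can differ by orders of magnitude.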
RQ2: Can the software development activities in GitHub be predicted
from the discussions on Reddit and Twitter?
Predicting GitHub Activities
A GitHub event is defined as a tuple (U, R, E_p, T_h):
❏ U: user
❏ R: repository
❏ E_p: type of action (PushEvent,
PullRequestEvent, IssuesEvent,
ForkEvent, WatchEvent,
CommitEvent, ReleaseEvent)
❏ T_h: the event time-stamp in hours
Figure: timeline of the prediction setup. Reddit and Twitter supply the features and GitHub supplies the target (event), in both the training and testing periods.
Training: January 2017 to May 2017; validation: June and July 2017; testing: August 2017.
Predicting GitHub Activities: Features and Approach
❏ Reddit time-series features
❏ Daily count of posts
❏ Daily count of active authors
❏ Daily count of active subreddits
❏ Daily count of comments
❏ Twitter time-series features
❏ Daily count of tweets
❏ Daily count of tweeting users
❏ Daily count of retweets
❏ Daily count of retweeting users
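A sketch of how the daily count features could be derived from raw records, shown for the Reddit post features. The record layout `(timestamp, author, subreddit)` and the function name are assumptions for illustration; the Twitter features would follow the same pattern with tweets and users.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_reddit_features(posts):
    """Aggregate (timestamp, author, subreddit) records into per-day counts."""
    per_day = defaultdict(lambda: {"posts": 0, "authors": set(), "subreddits": set()})
    for timestamp, author, subreddit in posts:
        bucket = per_day[timestamp.date()]
        bucket["posts"] += 1
        bucket["authors"].add(author)
        bucket["subreddits"].add(subreddit)
    # Unique authors/subreddits per day become the "active" counts.
    return {
        day: {
            "posts": b["posts"],
            "active_authors": len(b["authors"]),
            "active_subreddits": len(b["subreddits"]),
        }
        for day, b in sorted(per_day.items())
    }


# Hypothetical records, for illustration only.
posts = [
    (datetime(2017, 1, 1, 9, 0), "alice", "netsec"),
    (datetime(2017, 1, 1, 17, 30), "alice", "linux"),
    (datetime(2017, 1, 1, 18, 0), "bob", "netsec"),
    (datetime(2017, 1, 2, 8, 15), "carol", "netsec"),
]
features = daily_reddit_features(posts)
```

Each day then yields one feature vector, and the sequence of vectors forms the time-series.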
Figure (pipeline diagram), labels: Reddit/Twitter time-series features; NN; number of GitHub events in a day; likelihood of a user performing an action on a repository in an hour; LSTM; hourly GitHub activities of a user on a repository.
Renhao Liu, Frederick Mubang, Lawrence Hall, Sameera Horawalavithana, Adriana Iamnitchi, and John Skvoretz. 2019.
Predicting Longitudinal User Activity at Fine Time Granularity in Online Collaborative Platforms.
In IEEE International Conference on Systems, Man, and Cybernetics (SMC), Bari, Italy.
Predicting GitHub Activities: Results
Figure: two result panels.
JS-divergence: 0.0020 and R²: 0.6067; JS-divergence: 0.0029 and R²: 0.6300
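For reference, the two reported metrics can be computed as sketched below. This is a generic illustration, not the evaluation code used in the study: the slides do not say which distributions are compared or which logarithm base is used, so base 2 (which bounds the divergence between 0 and 1) and the toy series are assumptions.

```python
import math


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two non-negative vectors."""
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]  # normalize to probability distributions
    q = [x / sum_q for x in q]
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy daily event counts, for illustration only.
observed = [10, 20, 30, 40]
forecast = [12, 18, 33, 37]
divergence = js_divergence(observed, forecast)
fit = r_squared(observed, forecast)
```

A JS-divergence near 0 means the predicted and observed distributions have almost the same shape, while R² measures how much of the variance in the observed counts the prediction explains.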
Predicting GitHub Activities: Relevance
❏ Why is predicting GitHub activities
important?
❏ GitHub hosts many exploits and
patches related to CVE identifiers
❏ Predictions might reflect the
software development activities of
an attacker who develops an exploit
❏ Predictions can be used to estimate
the availability of a patch related to
a security vulnerability
Reddit/Twitter features are helpful for
predicting the number of GitHub events.
It is more difficult to predict the
identity of the user and the repository
in an event.
Summary
❏ We characterized a use-case scenario where diverse online platforms are
interconnected such that the activities in one platform can be predicted based
on the activities in the others.
Practical implications of our findings:
❏ Advance or calibrate security alert tools based on information from multiple
social media platforms.
❏ Better coordinate software development activities with the lessons learned
from social-media information
Acknowledgements
❏ Funded by DARPA SocialSim Program and the Air Force Research
Laboratory
❏ Data: Leidos, Netanomics
❏ Evaluation code provided by Pacific Northwest National Laboratory
Sameera Horawalavithana*
(sameera1@mail.usf.edu)
Check out our project @SocialSim
Backup
Thank you.
sameera1@mail.usf.edu
Related Work
❏ Different types of security vulnerability information available on Twitter (Syed
et al., Sauerwein et al.)
❏ Description of Vulnerabilities (e.g., URLs to security mailing list, expert blogs etc.)
❏ Demonstration of Exploits (e.g., URLs to YouTube videos)
❏ Unofficial proposals of countermeasures (e.g., URLs to security blogs describing unofficial
patches)
❏ Announcement of patch releases (e.g., URLs to official blog posts by vendors)
❏ Automatically discovering security threats from independent platforms.
❏ E.g., Twitter, Dark Web (Sapienza et al.), security blogs (Mittal et al.), etc.