WARNINGBIRD: A Near Real-time Detection System for
Suspicious URLs in Twitter Stream
ABSTRACT:
Twitter is prone to malicious tweets containing URLs for spam, phishing, and
malware distribution. Conventional Twitter spam detection schemes utilize account
features such as the ratio of tweets containing URLs and the account creation date,
or relation features in the Twitter graph. These detection schemes are either
ineffective against feature fabrication or consume considerable time and resources. Conventional
suspicious URL detection schemes utilize several features including lexical
features of URLs, URL redirection, HTML content, and dynamic behavior.
However, attackers can defeat them with evasion techniques such as time-based
evasion and crawler evasion. In this paper, we propose WARNINGBIRD, a suspicious URL detection
system for Twitter. Our system investigates correlations of URL redirect chains
extracted from several tweets. Because attackers have limited resources and
usually reuse them, their URL redirect chains frequently share the same URLs. We
develop methods to discover correlated URL redirect chains using the frequently
shared URLs and to determine their suspiciousness. We collect numerous tweets
from the Twitter public timeline and build a statistical classifier using them.
Evaluation results show that our classifier accurately and efficiently detects
suspicious URLs. We also present WARNINGBIRD as a near real-time system for
classifying suspicious URLs in the Twitter stream.
EXISTING SYSTEM:
In the existing system, attackers use shortened malicious URLs that redirect Twitter
users to external attack servers. To cope with malicious tweets, several Twitter
spam detection schemes have been proposed. These schemes can be classified into
account feature-based, relation feature-based, and message feature-based schemes.
Account feature-based schemes use the distinguishing features of spam accounts
such as the ratio of tweets containing URLs, the account creation date, and the
number of followers and friends. However, malicious users can easily fabricate
these account features. The relation feature-based schemes rely on more robust
features that malicious users cannot easily fabricate such as the distance and
connectivity apparent in the Twitter graph. Extracting these relation features from
a Twitter graph, however, requires a significant amount of time and resources as a
Twitter graph is tremendous in size. Message feature-based schemes focus on
the lexical features of messages. However, spammers can easily change the shape
of their messages. A number of suspicious URL detection schemes have also been
introduced.
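To make the account features mentioned above concrete, the following is a minimal sketch of how the ratio of tweets containing URLs could be computed for one account. The class name `AccountFeatures`, the method name, and the simple `https?://` pattern are assumptions made for illustration and are not taken from the paper. The simplicity of the computation also hints at why such a feature is easy to fabricate: an attacker only needs to post more tweets without URLs.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: computes one account feature from a list of tweet texts. */
final class AccountFeatures {

    // Assumed, simplified URL pattern; real tweets may need a stricter matcher.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    private AccountFeatures() {}

    /** Returns the fraction of tweets that contain at least one URL (0.0 if there are none). */
    static double urlTweetRatio(List<String> tweets) {
        if (tweets.isEmpty()) {
            return 0.0;
        }
        int withUrl = 0;
        for (String tweet : tweets) {
            if (URL.matcher(tweet).find()) {
                withUrl++;
            }
        }
        return (double) withUrl / tweets.size();
    }
}
```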
DISADVANTAGES OF EXISTING SYSTEM:
• Malicious servers can bypass an investigation by selectively providing
benign pages to crawlers. For instance, because static crawlers usually cannot
handle JavaScript or Flash, malicious servers can use them to deliver malicious
content only to normal browsers.
• A recent technical report from Google has also discussed techniques for
evading current Web malware detection systems.
• Malicious servers can also employ temporal behaviors, providing different
content at different times, to evade an investigation.
PROPOSED SYSTEM:
In this paper, we propose WARNINGBIRD, a suspicious URL detection system
for Twitter. Instead of investigating the landing pages of individual URLs in each
tweet, which may not be successfully fetched, we considered correlations of URL
redirect chains extracted from a number of tweets. Because attackers' resources are
generally limited and need to be reused, their URL redirect chains usually share the
same URLs. We therefore created a method to detect correlated URL redirect
chains using such frequently shared URLs. By analyzing the correlated URL
redirect chains and their tweet context information, we discovered several features
that can be used to classify suspicious URLs. We collected a large number of
tweets from the Twitter public timeline and trained a statistical classifier using the
discovered features.
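The idea of correlating redirect chains through shared URLs can be sketched as follows. This is a simplified illustration, not the authors' implementation: the class name `RedirectChainCorrelator`, the representation of a chain as a list of URL strings, and the `minChains` threshold are all assumptions. The sketch indexes every URL by the chains that pass through it and keeps only the URLs that occur in at least `minChains` distinct chains.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: groups redirect chains by the URLs they share. */
final class RedirectChainCorrelator {

    private RedirectChainCorrelator() {}

    /**
     * Maps each URL that appears in at least minChains distinct chains
     * to the indices of the chains that contain it.
     * minChains is a hypothetical threshold, not a value from the paper.
     */
    static Map<String, Set<Integer>> sharedUrls(List<List<String>> chains, int minChains) {
        // Index: URL -> indices of the chains that pass through it.
        Map<String, Set<Integer>> index = new LinkedHashMap<String, Set<Integer>>();
        for (int i = 0; i < chains.size(); i++) {
            for (String url : chains.get(i)) {
                Set<Integer> ids = index.get(url);
                if (ids == null) {
                    ids = new LinkedHashSet<Integer>();
                    index.put(url, ids);
                }
                ids.add(i);
            }
        }
        // Keep only the URLs shared by enough distinct chains.
        Map<String, Set<Integer>> shared = new LinkedHashMap<String, Set<Integer>>();
        for (Map.Entry<String, Set<Integer>> entry : index.entrySet()) {
            if (entry.getValue().size() >= minChains) {
                shared.put(entry.getKey(), entry.getValue());
            }
        }
        return shared;
    }
}
```

Each returned entry identifies one group of correlated chains. Features of such a group, together with the context of the tweets that carried the chains, would then be passed to the classifier.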
ADVANTAGES OF PROPOSED SYSTEM:
The trained classifier is shown to be accurate, with low false-positive and
false-negative rates. The contributions of this paper are as follows:
• We present a new suspicious URL detection system for Twitter that is based on
the correlations of URL redirect chains, which are difficult to fabricate. The system
can find correlated URL redirect chains using the frequently shared URLs and
determine their suspiciousness in almost real time.
• We introduce new features of suspicious URLs: some are newly
discovered, while others are variations of previously discovered features.
• We present the results of investigations conducted on suspicious URLs that have
been widely distributed through Twitter over several months.
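The false positives and negatives mentioned above can be made concrete with a short sketch. It assumes the usual definitions, which this summary does not spell out: the false-positive rate is FP / (FP + TN) over benign URLs, and the false-negative rate is FN / (FN + TP) over suspicious URLs. The class and method names are hypothetical.

```java
/** Illustrative only: error rates of a binary suspicious-URL classifier. */
final class ErrorRates {

    private ErrorRates() {}

    /** Fraction of benign URLs wrongly flagged as suspicious: FP / (FP + TN). */
    static double falsePositiveRate(boolean[] predicted, boolean[] actual) {
        int fp = 0;
        int tn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (!actual[i]) {
                if (predicted[i]) fp++; else tn++;
            }
        }
        return (fp + tn == 0) ? 0.0 : (double) fp / (fp + tn);
    }

    /** Fraction of suspicious URLs that were missed: FN / (FN + TP). */
    static double falseNegativeRate(boolean[] predicted, boolean[] actual) {
        int fn = 0;
        int tp = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i]) {
                if (predicted[i]) tp++; else fn++;
            }
        }
        return (fn + tp == 0) ? 0.0 : (double) fn / (fn + tp);
    }
}
```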
SYSTEM ARCHITECTURE:
ALGORITHM USED:
• Offline supervised learning algorithm
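This summary does not name the specific learning algorithm. As one possible illustration of offline supervised learning, the sketch below trains a small logistic regression model by stochastic gradient descent on labeled feature vectors and then classifies new vectors. Logistic regression, the class name `LogisticSketch`, the learning rate, the epoch count, and the 0.5 decision threshold are all assumptions chosen for the example and should not be read as the paper's method.

```java
/** Illustrative only: logistic regression trained offline on labeled feature vectors. */
final class LogisticSketch {

    // One weight per feature; the last slot holds the bias term.
    private final double[] weights;

    LogisticSketch(int featureCount) {
        this.weights = new double[featureCount + 1];
    }

    /** Estimated probability that the feature vector belongs to a suspicious URL. */
    double probability(double[] features) {
        double z = weights[weights.length - 1];
        for (int j = 0; j < features.length; j++) {
            z += weights[j] * features[j];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Offline training: repeated passes over a fixed labeled set (label 1 = suspicious, 0 = benign). */
    void train(double[][] samples, int[] labels, int epochs, double learningRate) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < samples.length; i++) {
                double error = labels[i] - probability(samples[i]);
                for (int j = 0; j < samples[i].length; j++) {
                    weights[j] += learningRate * error * samples[i][j];
                }
                weights[weights.length - 1] += learningRate * error;
            }
        }
    }

    /** Applies an assumed decision threshold of 0.5. */
    boolean isSuspicious(double[] features) {
        return probability(features) >= 0.5;
    }
}
```

Because training runs offline on a collected sample, only the cheap `probability` computation has to run when new tweets arrive, which fits the near real-time goal described above.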
SYSTEM CONFIGURATION:-
HARDWARE CONFIGURATION:-
• Processor - Pentium IV
• Speed - 1.1 GHz
• RAM - 256 MB (min)
• Hard Disk - 20 GB
• Keyboard - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse
• Monitor - SVGA
SOFTWARE CONFIGURATION:-
• Operating System : Windows XP
• Programming Language : Java
• Java Version : JDK 1.6 and above
REFERENCE:
Sangho Lee, Student Member, IEEE, and Jong Kim, Member, IEEE,
"WARNINGBIRD: A Near Real-time Detection System for Suspicious URLs in
Twitter Stream," IEEE TRANSACTIONS ON DEPENDABLE AND SECURE
COMPUTING, VOL. X, NO. Y, JANUARY 2013.
