Detection of Bots on Twitter
B.E Project: Detection of Bots on Twitter
Literature Review
Online Human-Bot Interactions: Detection, Estimation, and Characterization
Uses clustering analysis to characterize bots as:
● Spammers
● Self-promoters
● Accounts that post content from connected applications
The best classification performance, 0.95 AUC, was obtained by the Random Forest algorithm.
Limitations:
● In some cases, the boundary separating bots and humans is not sharp.
● Too many feature sets are used.
A New Approach to Bot Detection: Striking the Balance Between Precision and Recall
Bot detection that considers recall as well as precision in its formulation, using the AdaBoost algorithm.
Limitations:
● Increased computational complexity.
● Decreased precision.
● Requires a boosting algorithm.
Who is Tweeting on Twitter: Human, Bot, or Cyborg?
Classifies a Twitter user as a human or a bot by observing the differences between humans and bots in tweeting behavior, tweet content, and account properties.
The problem
Objective
Classify a Twitter user as a human or a bot by observing the differences between humans and bots in tweeting behavior, tweet content, and account properties.
Context
● Around 9 to 15 percent of Twitter's monthly active users are bots.
● Out of 319 million monthly active users, that translates into 28.7 million to 47.9 million bots.
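The range quoted in the context bullets follows directly from the two percentages. A minimal sketch of that arithmetic, using only the figures stated on the slide (the variable names are illustrative):

```python
# Figures taken from the slide: 319 million monthly active users,
# of which an estimated 9% to 15% are bots.
monthly_active_users_millions = 319
low_share_percent, high_share_percent = 9, 15

low_bots_millions = monthly_active_users_millions * low_share_percent / 100
high_bots_millions = monthly_active_users_millions * high_share_percent / 100

# The slide reports these two values rounded to one decimal place.
print(f"{low_bots_millions:.2f} to {high_bots_millions:.2f} million bots")
```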
Problem statement
To help human users identify who they are interacting with, our project focuses on classifying human and bot accounts on Twitter, using a combination of features extracted from a user's account to determine the likelihood that the account is a human or a bot.
Process
Features: Identifying Features
We will use various features, such as the followers-to-friends ratio and the URL ratio, for identification.
Classification: Classify bots from humans
The decision maker uses the identified features to determine whether the account is a human or a bot and classifies the account accordingly.
Analytics: Visualization of Analysis
Visualize each feature, showing the characteristics that distinguish a bot, and display the final analysis/classification.
Features
To Identify a Bot
1. Follows a large number of accounts but has few followers (followers-to-friends ratio)
2. Tweets posted through an API (tweeting device)
3. URL Ratio
4. Reciprocity
5. Entropy
Implementation
1. Followers to Friends Ratio
● The dataset has separate columns containing the followers count and friends count of every user (both humans and bots).
● We will create a new column in the final dataset holding the ratio of followers count to friends count.
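As a rough illustration of the new ratio column, here is a minimal sketch in plain Python. The rows, the column names, and the handling of a zero friends count are assumptions made for the example; the slides do not specify them.

```python
def followers_to_friends_ratio(followers_count: int, friends_count: int) -> float:
    """Return followers / friends for one account."""
    # Assumption: the slides do not say how a zero friends count is handled;
    # returning 0.0 is only a placeholder choice to avoid division by zero.
    if friends_count == 0:
        return 0.0
    return followers_count / friends_count


# Hypothetical rows standing in for the dataset's followers and friends columns.
dataset = [
    {"user": "human_example", "followers_count": 350, "friends_count": 280},
    {"user": "bot_example", "followers_count": 12, "friends_count": 4800},
]

for row in dataset:
    # Add the derived column alongside the two original columns.
    row["followers_friends_ratio"] = followers_to_friends_ratio(
        row["followers_count"], row["friends_count"]
    )
```

An account that follows thousands of users but has only a handful of followers ends up with a ratio close to zero, which is the pattern feature 1 is meant to capture.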
2. Recognizing the Tweeting Device
● Using the user ID or tweet ID and the Twitter API, identify the device or client from which each tweet was posted and classify it as mobile, laptop, API, third-party component, etc.
● Store this information in a column and use it for classification.
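One way to turn the posting client into a categorical column is sketched below. It assumes each tweet carries a source string, typically an HTML anchor wrapping the client name, as in the `source` field of a v1.1 tweet object. The client-to-category mapping and the fallback label are hypothetical and would have to be built from the sources seen in the real data.

```python
import re

# Hypothetical mapping from client names to coarse categories.
KNOWN_CLIENTS = {
    "Twitter for iPhone": "mobile",
    "Twitter for Android": "mobile",
    "Twitter Web Client": "web",
}


def client_name(source: str) -> str:
    """Strip any HTML tags that wrap the client name."""
    return re.sub(r"<[^>]+>", "", source).strip()


def device_category(source: str) -> str:
    """Map a tweet's source string to a coarse category for the dataset column."""
    # Assumption: anything not in the known list is treated as API/third-party.
    return KNOWN_CLIENTS.get(client_name(source), "api_or_third_party")
```

For example, `device_category('<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>')` falls in the `mobile` category, while an unrecognized client name falls back to `api_or_third_party`.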
3. URL Ratio
● The dataset has a feature column containing the number of URLs extracted from the user's tweets.
● We will create a new column in the final dataset holding the ratio of URL count to total tweet count.
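The URL ratio can be sketched as follows. The regular expression and the sample tweets are assumptions for illustration only; the project's dataset already provides a URL count column, so this merely shows how such a ratio could be derived from raw tweet text.

```python
import re

# Simple pattern for http/https links; an assumption for illustration,
# not necessarily how the project's dataset counted URLs.
URL_PATTERN = re.compile(r"https?://\S+")


def url_ratio(tweets: list[str]) -> float:
    """Return the number of URLs found divided by the number of tweets."""
    if not tweets:
        return 0.0  # placeholder choice for accounts with no tweets
    url_count = sum(len(URL_PATTERN.findall(text)) for text in tweets)
    return url_count / len(tweets)


# Hypothetical tweets: three URLs spread across four tweets.
sample_tweets = [
    "Great deals here http://example.com/a and http://example.com/b",
    "Good morning everyone",
    "Read this: https://example.org/post",
    "No links in this one",
]
ratio = url_ratio(sample_tweets)
```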
4. Reciprocity
1. For every user, randomly choose 20 users from the list of users that he/she follows.
2. Check whether each chosen user follows the user back.
3. Compute the ratio of the number of users who follow back to 20.
4. Store this ratio in the dataset.
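The four reciprocity steps can be sketched as a small function. The `follows` callback is a hypothetical stand-in for a Twitter API lookup. Dividing by the actual sample size when a user follows fewer than 20 accounts is an assumption, since the slides only describe the 20-user case.

```python
import random
from typing import Callable, Optional, Sequence


def reciprocity(
    user_id: str,
    friend_ids: Sequence[str],
    follows: Callable[[str, str], bool],
    sample_size: int = 20,
    rng: Optional[random.Random] = None,
) -> float:
    """Fraction of randomly sampled friends who follow the user back."""
    rng = rng or random.Random()
    if not friend_ids:
        return 0.0  # assumption: no friends means no reciprocity
    # Step 1: sample up to `sample_size` accounts that the user follows.
    k = min(sample_size, len(friend_ids))
    sampled = rng.sample(list(friend_ids), k)
    # Step 2: count how many of the sampled accounts follow the user back.
    follow_backs = sum(1 for friend in sampled if follows(friend, user_id))
    # Step 3: ratio of follow-backs to the number of sampled accounts.
    return follow_backs / k


# Hypothetical follow graph: account -> set of accounts it follows.
graph = {"u": {f"f{i}" for i in range(40)}}
for i in range(40):
    graph[f"f{i}"] = {"u"} if i % 2 == 0 else set()

score = reciprocity("u", sorted(graph["u"]), lambda a, b: b in graph.get(a, set()))
```

Step 4, storing the value, then amounts to writing `score` into the user's row of the dataset.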
5. Entropy component
● The entropy component detects periodic or regular timing of the messages
posted by a Twitter user.
● If the entropy or corrected conditional entropy is low for the inter-tweet
delays, it indicates periodic or regular behavior, a sign of automation.
● High entropy indicates irregularity, a sign of human participation.
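To make the entropy idea concrete, the sketch below computes plain Shannon entropy over binned inter-tweet delays. This is a simpler stand-in for the corrected conditional entropy mentioned above, not an implementation of it. The 60-second bin width and the sample timestamps are assumptions chosen for the example.

```python
import math
from collections import Counter


def inter_tweet_delay_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of binned delays between consecutive tweets."""
    ordered = sorted(timestamps)
    delays = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not delays:
        return 0.0
    # Group delays into fixed-width bins and turn the counts into probabilities.
    bins = Counter(int(delay // bin_seconds) for delay in delays)
    probabilities = [count / len(delays) for count in bins.values()]
    return sum(-p * math.log2(p) for p in probabilities)


# A strictly periodic account: one tweet every 600 seconds.
periodic = [i * 600 for i in range(10)]
# An irregular account: hypothetical posting times in seconds.
irregular = [0, 45, 400, 2000, 2100, 9000, 9500, 20000, 20050, 40000]

periodic_entropy = inter_tweet_delay_entropy(periodic)
irregular_entropy = inter_tweet_delay_entropy(irregular)
```

The periodic account has a single distinct delay, so its entropy is zero, while the irregular account spreads its delays over many bins and scores higher.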
Logistic Regression - The Classifier
Logistic regression does not try to predict the value of a numeric variable from a set of inputs. Instead, its output is the probability that the given input point belongs to a certain class (human or bot).
● 0 = you are absolutely sure that the user is a human.
● 1 = you are absolutely sure that the user is a bot.
● Any value above 0.5 = the user is more likely a bot than a human. For example, if you predict 0.8, you are 80% confident that the user is a bot. Likewise, a value below 0.5 indicates, with a corresponding degree of confidence, that the user is not a bot.
Logistic regression draws a linear decision boundary, so it works best when the data points can be separated into these two regions by a linear boundary.
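The probability output and the 0.5 cut-off can be illustrated with a minimal sketch of the logistic function. The weights, bias, and feature values are hypothetical; in the project they would come from a fitted model. Treating a probability of exactly 0.5 as human is an assumption, since the slide only covers values above and below 0.5.

```python
import math


def bot_probability(features, weights, bias):
    """Logistic model: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    """Label the account as a bot when the probability exceeds the threshold."""
    return "bot" if probability > threshold else "human"


# Hypothetical weights for two features (e.g. URL ratio and reciprocity).
weights, bias = [3.0, -2.0], 0.0
p = bot_probability([0.9, 0.1], weights, bias)
label = classify(p)
```

Because the weighted sum `z` is linear in the features, the set of points where the probability equals 0.5 is a straight line (or hyperplane), which is why the classifier's decision boundary is linear.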
Random Forest - The Classifier
The random forest classification algorithm is used to train the model on the dataset. A random forest is a collection of decision trees. Random forest learning is also robust when training with an imbalanced dataset, and it provides a high accuracy rate when training with a large dataset.
In the random forest classification algorithm, a random sample of the data is chosen from the training dataset for each tree. For the selected data, a random set of attributes is chosen from the original dataset: where M is the total number of input attributes in the dataset, only m attributes are chosen at random for each tree, where m < M.
[Figure: decision-tree outputs (Bot, Human, Bot) and the final prediction (Bot)]
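The random choices described for each tree, plus the vote that combines the trees, can be sketched in plain Python. Sampling rows with replacement (a bootstrap sample) is the usual choice but is an assumption here, since the slide only says the data is chosen at random. Note also that many implementations, scikit-learn's included, draw the feature subset at each split rather than once per tree. The feature names and rows are illustrative.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, m, rng):
    """Choose m of the M available features without replacement (m < M)."""
    return rng.sample(feature_names, m)


def majority_vote(predictions):
    """Combine the per-tree predictions into one label."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
features = ["followers_friends_ratio", "device", "url_ratio", "reciprocity", "entropy"]
rows = list(range(10))  # stand-ins for training rows

sample = bootstrap_sample(rows, rng)               # data for one tree
subset = random_feature_subset(features, 2, rng)   # m = 2 of M = 5 attributes
label = majority_vote(["bot", "human", "bot"])     # three trees vote
```

With two of the three trees voting "bot", the combined prediction is "bot".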
Visualization
It’s a bot!
Analytics
Seaborn builds on matplotlib to create attractive charts in a few lines of code. The key difference is Seaborn's default styles and color palettes, which are designed to be more aesthetically pleasing and modern.
Seaborn offers features such as built-in themes, color palettes, and functions and tools for visualizing univariate and bivariate data, linear regression, matrices of data, statistical time series, etc., which let us build complex visualizations.
Installation: pip install seaborn
Real-time detection
It’s a bot!
Algorithm
1. The user visits the web application.
2. The user signs in with his/her Twitter account and allows the web application to read the user's Twitter feed.
3. The web application uses the access granted by the user to fetch the user's Twitter feed through the Twitter API.
4. For every tweet in the user's Twitter feed, the web application sends the user ID of the tweeter's account to the server.
5. The server, on receiving a user ID, checks the Redis cache for an existing classification output.
   a. If the cache has the classification output, the server returns this output to the web application.
   b. Otherwise:
      i. The server uses the Twitter API to fetch the details that the classification model needs to predict the output.
      ii. On a successful fetch of the user details, the server sends them to the AWS Lambda function for classification.
      iii. The Lambda function, on receiving the user details, fetches the trained machine learning model from Amazon S3, uses it to classify the user, and returns the output to the server.
      iv. The server, on receiving the output, formats it and sends it to the web application.
6. The web application, on receiving the classification output, displays it to the user along with the tweet.
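The server-side part of the algorithm (step 5) is a cache-aside lookup. The sketch below shows that control flow only. A plain dictionary stands in for Redis, and the two callables are hypothetical stand-ins for the Twitter API fetch and the AWS Lambda classification call. Writing the result back to the cache is an assumption based on the caching suggestion elsewhere in the slides.

```python
def classify_user(user_id, cache, fetch_user_details, run_model):
    """Return a cached classification if present, otherwise compute and cache it."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached                      # step 5a: cache hit
    details = fetch_user_details(user_id)  # step 5b-i: fetch account details
    label = run_model(details)             # steps 5b-ii and 5b-iii: classify
    cache[user_id] = label                 # assumption: store for later requests
    return label                           # step 5b-iv: hand back to the web app


# Hypothetical stand-ins used to exercise the flow.
fetch_calls = []


def fake_fetch(user_id):
    fetch_calls.append(user_id)
    return {"url_ratio": 0.9}


def fake_model(details):
    return "bot" if details["url_ratio"] > 0.5 else "human"


cache = {}
first = classify_user("42", cache, fake_fetch, fake_model)
second = classify_user("42", cache, fake_fetch, fake_model)
```

The second call is served from the cache, so the fetch and the model each run only once for that user ID.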
Block Diagram
Fig: Block Diagram of Training Phase
Block Diagram
Fig: Architectural View of Deployment Phase
DFD Level - 0
DFD Level - 1
DFD Level - 2
Possible Problems and Suggestions
● Problem: The language of the tweet content can cause errors, since our framework/model will tend to assign high bot scores to users who tweet in multiple languages.
Solution: Ignore language-dependent features.
● Problem: Determining which machine learning algorithm will deliver accurate results with higher precision for our work.
Solution: Try both Logistic Regression and Random Forest and determine which algorithm solves our problem.
Possible Problems and Suggestions
● Problem: Real-time detection of bots may be slow because of delays in communicating with the server and running the trained model.
Solution: Cache or store the result of a username search in the database to avoid running the model for that username again.
● Problem: Deploying the trained machine learning model on a server.
Solution: Use AWS Lambda and S3 for deployment, and API Gateway for sending the request and receiving the response.
Technologies
Algorithms, Libraries, Tools
● Python
● Scikit-learn for machine learning algorithms
● AWS (Lambda, API Gateway, S3)
● Seaborn for graphing and visualization
● MongoDB (database)
● Python libraries like pickle
● Twitter API
● Redis (cache store)
● Node.js (back end)
● HTML, CSS, JS, jQuery, Bootstrap, AJAX (front end)
Implementation
Results
Logistic Regression
             Precision  Recall  F1-score  Support
0 (human)    0.52       1.00    0.69      92
1 (bot)      0.00       0.00    0.00      84
avg / total  0.27       0.52    0.36      176
Random Forest
             Precision  Recall  F1-score  Support
0 (human)    0.96       0.96    0.96      92
1 (bot)      0.95       0.95    0.95      84
avg / total  0.95       0.95    0.95      176
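The logistic regression figures are consistent with a model that labelled all 176 test accounts as human: recall is 1.00 for humans and 0.00 for bots. A short sketch under that reading reproduces the reported numbers from the support counts alone. The helper function is illustrative, and reporting 0.0 when a metric is undefined is a convention assumed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


humans, bots = 92, 84  # support values from the results table

# If every account is predicted "human": all 92 humans are correct,
# and all 84 bots are wrongly counted as human.
human_metrics = precision_recall_f1(tp=humans, fp=bots, fn=0)
bot_metrics = precision_recall_f1(tp=0, fp=0, fn=bots)

# Support-weighted average, matching the "avg / total" row.
total = humans + bots
weighted = tuple(
    (h * humans + b * bots) / total for h, b in zip(human_metrics, bot_metrics)
)
```

Rounded to two decimals, `human_metrics` gives 0.52, 1.00 and 0.69, and `weighted` gives 0.27, 0.52 and 0.36, in line with the logistic regression results.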
Conclusion
Logistic regression underperforms even though it is well known for binary classification. The reason is its inflexibility in capturing complex relationships: it tends to underperform when the decision boundary is non-linear. Logistic regression is also susceptible to outliers.
It must be noted that, in some cases, the boundary separating bots and humans is not sharp [3], and logistic regression performs best when the data points can be separated into the two regions (human and bot) by a linear boundary.
Random Forest is one of the most effective and versatile machine learning algorithms and achieves higher classification accuracy here (0.95).
The machine learning model will therefore be trained using the Random Forest algorithm to classify whether a given user is a bot or a human.
Visualization
Analysis Results – Random Forest (Heat Map, Box Plot, Pair Plot)
It’s a bot!
Handcrafted APIs
1. GET /api/classifyUserName
2. POST /api/extractUserData
3. GET /api/fetchTweets
Applications / Future Work
1. Our model/framework will be able to identify whether a Twitter user is a bot or a human.
2. We can extend our work to other social media platforms such as Facebook.
3. Our work can help individuals and organizations guard against false information and malicious content and protect their brand value.
4. Our project can also be used to separate human online traffic from bot activity.
References
[1] Title: Who is Tweeting on Twitter: Human, Bot, or Cyborg?
Authors: Zi Chu, Steven Gianvecchio, Haining Wang and Sushil Jajodia
URL: http://www.pensivepuffin.com/dwmcphd/syllabi/hcde530_wi17/twitter_readings/chu-who.tweets.ACSAC2010.pdf
[2] Title: A New Approach to Bot Detection: Striking the Balance Between Precision and Recall
Authors: Fred Morstatter, Liang Wu, Tahora H. Nazer, Kathleen M. Carley and Huan Liu
URL: http://www.public.asu.edu/~fmorstat/paperpdfs/asonam16.pdf
[3] Title: Online Human-Bot Interactions: Detection, Estimation, and Characterization
Authors: Onur Varol, Emilio Ferrara, Clayton A. Davis, Filippo Menczer and Alessandro Flammini
URL: https://arxiv.org/pdf/1703.03107.pdf
[4] Title: Using AWS for deploying trained model
Author: Jose Miguel Arrieta
URL: https://datascience.com.co/creating-an-api-using-scikit-learn-aws-lambda-s3-and-amazon-api-gateway-d9d10317e38d
Thank You

