Preliminary Screening
Project Title: Toxic Comment Detection
Presented by:
Under the Supervision of:
Contents:
• Introduction
• Motivation
• Problem Statement
• Literature Survey
• Meeting Details
• Workload Distribution
• Project Planning
• Screenshot of Approval of the Certificate of the Project Report
• Methodology Used
• Solution Approach
• Algorithms and Framework
• Outcome Produced
• Proof of the Outcome
Introduction
• Toxic texts are those that are rude, disrespectful, or likely to make people leave a conversation.
• If such texts could be identified automatically, social networks, news websites, and online forums could host healthier discussions.
• Highly toxic texts lead to personal insults, online abuse, and bullying, which harm a person's psychological health and emotional well-being.
• Many people stop expressing themselves because they fear online harassment and bullying.
• An automated system is needed to identify, filter, or remove such harmful content from online sites, but building a reliable toxicity detector is difficult for platform providers.
• Natural language processing (NLP) helps identify toxicity in content expressed as text or images.
• Detecting insulting comments is an important research area in NLP.
• The primary goal is to assess the toxicity and behavior expressed in words and their contexts.
• The objective of this project is to propose a model that classifies texts as toxic or non-toxic with high accuracy.
Motivation
• Toxicity on social media harms people's emotional and mental well-being, and many refrain from expressing themselves as a result.
• A system is needed to identify such toxicity in texts.
Problem Statement
• To propose a model that helps users stay away from the toxic text environment that exists on social media.
• To propose a model that identifies toxicity in texts, i.e., classifies them as toxic or non-toxic.
• To propose a model with high accuracy.
Literature Survey
1. Keeping Children Safe Online with Limited Resources: Analyzing What Is Seen and What Is Heard
• Authors: Aleksandar Jevremovic, Mladen Veinovic, Milan Cabarkapa
• Objective: Designed a framework (Casper) that directly analyzes the content the user sees and hears.
• Algorithms: BERT; CNN for images; LSTM; BLSTM
• Dataset: Parsed versions of four datasets: Twitter sexism, YouTube, Toxicity, and Attack
• Conclusion: Accuracy 95%; audio accuracy 91%
• Drawbacks: Online grooming and self-harm detection are left as future work.
2. Text Mining and Text Analytics of Research Articles
• Authors: Akshaya Udgave and Prasanna Kulkarni
• Objective: To analyze the use of text mining techniques and to explore recent developments in the field of design science.
• Algorithms: Text mining, NLP
• Conclusion: In the future, different design algorithms could help resolve various issues in the text mining field.
• Drawbacks: Integrating domain information, varying granularity principles, refining multilingual text, and handling ambiguity in natural language are major challenges that emerge during the text extraction or mining phase.
3. Multilingual Sentiment Analysis and Toxicity Detection for Text Messages in Russian
• Authors: Darya Bogoradnikova, Olesia Makhnytkina, Anton Matveev, Anastasia Zakharova, Artem Akulov
• Objective: Presents an approach to sentiment analysis and emotion identification for user comments.
• Algorithms: 1. Text pre-processing 2. Data augmentation 3. Sentiment analysis 4. Detection of toxic comments 5. Detection of toxic spans
• Dataset: 1703 user reviews in Russian from two online education platforms, Coursera and Stepik
• Conclusion: The authors arrived at a comprehensive solution for evaluating users' opinions about online courses.
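The entry lists toxic-span detection as a step separate from comment-level detection. To show what a span-level output looks like, here is a minimal Python sketch that returns character offsets of words found in a small hypothetical lexicon; the lexicon and the `toxic_spans` helper are illustrative assumptions, not the method of the surveyed paper.

```python
import re

# Hypothetical toy lexicon, used only to illustrate span-level output.
TOXIC_LEXICON = frozenset({"idiot", "stupid", "moron"})


def toxic_spans(text: str, lexicon: frozenset[str] = TOXIC_LEXICON) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of lexicon words found in text."""
    spans = []
    for match in re.finditer(r"\w+", text):
        if match.group().lower() in lexicon:
            spans.append((match.start(), match.end()))
    return spans


if __name__ == "__main__":
    comment = "You are a stupid idiot."
    for start, end in toxic_spans(comment):
        print(start, end, comment[start:end])
```

A learned span detector would replace the lexicon lookup, but the output format (offsets into the original text) stays the same.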
4. Comment Toxicity Detection via a Multichannel Convolutional Bidirectional Gated Recurrent Unit
• Authors: Ashok Kumar J, Abirami, Tina Esther Trueman, Erik Cambria
• Objective: To detect comment toxicity with neural-network-based ML algorithms.
• Algorithms: Natural language processing, MCBiGRU model, CNN
• Dataset: 223,549 instances with six labels: toxic, obscene, severe toxic, insult, threat, and identity hate. These labels define an instance as toxic or non-toxic.
• Conclusion: Achieves better training and testing accuracy than existing models that use only n-gram word embeddings; the proposed MCBiGRU model outperforms existing results.
• Drawbacks: None listed.
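The entry notes that the six labels together decide whether an instance counts as toxic. A minimal sketch, assuming the rule is "toxic if any of the six labels is set" and assuming the label names follow the Kaggle Jigsaw column names (`toxic`, `severe_toxic`, and so on):

```python
# Assumed column names for the six labels (Kaggle Jigsaw style).
LABELS = ("toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate")


def to_binary(row: dict[str, int]) -> int:
    """Collapse six 0/1 labels into one: 1 = toxic, 0 = non-toxic.

    Assumption: an instance is toxic if any label is 1; missing labels count as 0.
    """
    return int(any(row.get(label, 0) == 1 for label in LABELS))


if __name__ == "__main__":
    rows = [
        {"toxic": 0, "severe_toxic": 0, "obscene": 0, "threat": 0, "insult": 1, "identity_hate": 0},
        {label: 0 for label in LABELS},
    ]
    print([to_binary(r) for r in rows])
```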
5. Detecting Islamic Radicalism Arabic Tweets Using Natural Language Processing
• Authors: Khalid T. Mursi, Mohammad D. Alahmadi, Faisal S. Alsubaei, and Ahmed S. Alghamdi
• Objective: To automate the detection of hateful tweets using advanced machine learning (ML) techniques, and to perform sentiment analysis that captures the meaning of Arabic words with a word embedding (Word2Vec).
• Algorithms: Word2Vec
• Dataset: 100,000 tweets from the last decade
• Conclusion: Determined the most frequent terms in the radical tweets of each year, which include some jihadist groups, countries, and individuals. This work can help law enforcement analyze and detect extremism in social media.
• Drawbacks: Small dataset; the paper covers only a limited range of radical keywords.
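Counting the most frequent terms per year, as in this entry's conclusion, can be sketched in a few lines. The `top_terms_by_year` helper, whitespace tokenization, and the optional stop-word set are illustrative assumptions; real Arabic text would need proper tokenization and normalization, which this sketch skips.

```python
from collections import Counter, defaultdict


def top_terms_by_year(tweets, k=3, stopwords=frozenset()):
    """tweets: iterable of (year, text) pairs.

    Returns {year: [(term, count), ...]} with the k most frequent terms per year.
    Ties keep the order in which terms were first seen (Counter.most_common behavior).
    """
    counts = defaultdict(Counter)
    for year, text in tweets:
        tokens = [t for t in text.lower().split() if t not in stopwords]
        counts[year].update(tokens)
    return {year: counter.most_common(k) for year, counter in counts.items()}


if __name__ == "__main__":
    sample = [(2015, "a b a"), (2015, "b a c"), (2016, "x y x")]
    print(top_terms_by_year(sample, k=2))
```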
6. Offensive Language Detection in Arabic Social Networks Using Evolutionary-Based Classifiers Learned From Fine-Tuned Embeddings
• Authors: Fatima Shannaq, Bassam Hammo, Hossam Faris, and Pedro A.
• Objective: Detect offensive tweets.
• Algorithms: SVM (Support Vector Machine), XGBoost
• Dataset: ArCybC dataset
• Conclusion: An intelligent prediction system to detect offensive language in Arabic tweets is presented.
• Drawbacks: The ArCybC dataset is small; effectiveness on a bigger dataset is yet to be measured.
7. A Framework for Hate Speech Detection Using Deep Convolutional Neural Network
• Author: Pradeep Kumar Roy
• Objective: To monitor users' posts and filter hate-speech posts before they spread.
• Algorithms: Deep Convolutional Neural Network (DCNN)
• Conclusion: Using 10-fold cross-validation, the proposed DCNN achieved the best prediction recall: 0.88 for hate speech and 0.99 for non-hate speech.
• Drawbacks: It correctly predicts only 53% of the hate tweets in the dataset because the dataset is imbalanced (biased toward non-hate tweets). Images could also be used for the same task.
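Per-class recall explains how a model can score well on the majority class while missing much of the minority class. The toy labels below (1 = hate, 0 = non-hate) are hypothetical and only illustrate the computation; they are not the paper's data.

```python
def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions of that class / true instances of that class."""
    recall = {}
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recall[cls] = hits / total
    return recall


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 2 hate tweets, 8 non-hate tweets.
    y_true = [1, 1] + [0] * 8
    y_pred = [1, 0] + [0] * 8
    print(per_class_recall(y_true, y_pred))
```

On these labels the model finds every non-hate tweet but only half of the hate tweets, even though overall accuracy is 90%.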
8. An Assessment of Deep Learning Models and Word Embeddings for Toxicity Detection within Online Textual Comments
• Authors: Danilo Dessì, Diego Reforgiato Recupero, and Harald Sack
• Objective: Uses multiple deep learning models in multiple tests to check the toxicity of text.
• Algorithms: Natural language processing, sentiment analysis, emotion detection; CNN, BERT, LSTM
• Dataset: Kaggle-based dataset
• Conclusion: The LSTM-based model is the first choice among the tested models for detecting toxicity.
• Drawbacks: Different word embeddings represent domain knowledge in different ways, so a single model for all cases might be insufficient; BERT embeddings did not work well.
9. Toxic Comments Detection Using LSTM
• Authors: Krishna Dubey, Rahul Nair
• Objective: To perform text mining with deep learning models that can classify almost accurately whether a given text is toxic.
• Algorithms: ML algorithms, LSTM, NLP, artificial neural network
• Conclusion: Accuracy 94%
• Drawbacks: Could have been more precise; the ELMo model has not been widely used for this problem.
10. Detecting Toxic Remarks in Online Conversation
• Author: Pushpit Gautam
• Objective: To establish a toxicity classification scheme for online comments based on vocabulary and other characteristics of a sentence.
• Dataset: Multi-label Wikipedia talk-page edits dataset from a Kaggle competition
• Algorithms: Naive Bayes, Gaussian naive Bayes, support vector machine, backpropagation neural network
• Conclusion: The label powerset method with multinomial naive Bayes could be used to find toxic comments that belong to more than one type.
• Drawbacks: The dataset had more than 1.5 lakh (150,000) comments, so the kernel frequently crashed with errors. Future work: implementing AdaBoost in the scikit-learn library so that it can be used directly for multi-label classification problems.
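The label powerset method in this entry's conclusion turns a multi-label problem into a single multi-class problem: every distinct combination of labels becomes one class, so any multi-class learner (multinomial naive Bayes in this entry) can be trained on the result. The `label_powerset` helper below is a hypothetical sketch of only that transformation step.

```python
def label_powerset(label_rows):
    """Map each distinct label combination to one class id.

    label_rows: list of tuples of 0/1 flags, one flag per label.
    Returns (class_ids, mapping), where mapping[combination] = class id.
    """
    mapping = {}
    class_ids = []
    for row in label_rows:
        key = tuple(row)
        if key not in mapping:
            mapping[key] = len(mapping)  # new combination -> next free class id
        class_ids.append(mapping[key])
    return class_ids, mapping


if __name__ == "__main__":
    rows = [(1, 0, 1), (0, 0, 0), (1, 0, 1), (1, 1, 0)]
    ids, mapping = label_powerset(rows)
    print(ids)  # class ids for a multi-class classifier
    # Invert the mapping to turn a predicted class id back into label flags.
    inverse = {v: k for k, v in mapping.items()}
    print(inverse[ids[0]])
```

The trade-off is that only label combinations seen in training can ever be predicted.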
11. Detect Toxic Content to Improve Online Conversations
• Authors: Deepshi Mediratta, Nikhil Oswal
• Objective: Train on online text to detect offensive content.
• Algorithms: SVM, naive Bayes, GRU, and LSTM
• Conclusion: GRU with GloVe embeddings gave the best result (accuracy = 89.49%, F1 score = 0.72).
• Drawbacks: The dataset is highly imbalanced, and the data also contains noise, including questions that humans did not classify correctly.
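The gap between accuracy (89.49%) and F1 (0.72) in this entry is typical of imbalanced data. The toy example below uses hypothetical labels to show the extreme case: a model that predicts "non-toxic" for everything reaches 90% accuracy but an F1 score of 0 for the toxic class.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) from parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical data: 2 toxic comments among 20, model always says non-toxic.
    print(accuracy_and_f1([1, 1] + [0] * 18, [0] * 20))
```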
12. Convolutional Neural Networks for Toxic Comment Classification
• Author: Spiros V. Georgakopoulos
• Objective: Perform text mining using a CNN.
• Algorithms: Convolutional neural network, word2vec
• Conclusion: CNNs can outperform well-established methodologies, providing enough evidence that their use is appropriate for toxic comment classification.
• Drawbacks: The promising results motivate further development of CNN-based text mining methods in the near future; the authors are interested in employing adaptive learning methods and in further comparisons with n-gram-based approaches.
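For reference, the word n-gram features that this entry proposes comparing CNNs against can be extracted as follows. This is a minimal sketch with lowercasing and whitespace tokenization; `word_ngrams` is a hypothetical helper, not code from the paper.

```python
def word_ngrams(text, n_values=(1, 2)):
    """Return word n-grams of the given sizes, in order of n, then position."""
    tokens = text.lower().split()
    grams = []
    for n in n_values:
        # Slide a window of n tokens across the sentence.
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


if __name__ == "__main__":
    print(word_ngrams("You are rude"))
```

Counting these n-grams per comment gives the sparse feature vectors that classic n-gram baselines are trained on.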
13. Machine Learning Methods for Toxic Comment Classification: A Systematic Review
• Author: Darco Arcocez
• Objective: Toxic comment or reply detection using machine learning
• Algorithms: RPART, SVM, and GLM
• Conclusion: Evaluated 62 classifiers representing 19 major algorithmic families on features extracted from the Jigsaw dataset of Wikipedia comments, and compared the classifiers on statistically significant differences in accuracy and on relative execution time.
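The entry compares classifiers on statistically significant differences in accuracy but does not name the test. One common choice for two classifiers evaluated on the same test set is McNemar's test; the exact (binomial) version below is a sketch of that technique, swapped in for illustration, and may differ from what the review actually used.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test for two classifiers on the same test set.

    b: items classifier A got right and classifier B got wrong.
    c: items classifier A got wrong and classifier B got right.
    Under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    print(mcnemar_exact_p(1, 9))
```

For example, if classifier A is right and B wrong on 1 comment while the reverse holds on 9, the two-sided p-value is about 0.021, so the difference would count as significant at the 5% level.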
14. A Study of Multilingual Toxic Text Detection Approaches under Imbalanced Sample Distribution
• Authors: Guizhe Song, Degen Huang, and Zhifeng Xiao
• Objective: Use machine learning for toxic text detection on an imbalanced dataset.
• Algorithms: XLM-RoBERTa; mBERT
• Conclusion: Part of the English training corpus is divided into multiple languages.
• Drawbacks: Sample size reconstruction is required.
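The entry does not spell out what "sample size reconstruction" involves. Random oversampling of minority classes is one simple way to rebalance a skewed training set, shown here as a sketch under that assumption; it is not necessarily the approach of the surveyed paper.

```python
import random


def random_oversample(texts, labels, seed=0):
    """Duplicate random minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    print(random_oversample(["a", "b", "c", "d", "e"], [0, 0, 0, 0, 1]))
```

Oversampling should be applied to the training split only, so duplicated items never leak into the test set.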
Workload Distribution
1. Vishwajeet Kumar: research, coding, and documentation
2. Ashwani Tyagi: coding, related research, and product design
3. Arpit Rao: research, testing, coding, and product review
Project Planning
• Find a topic
• Research the topic
• Define the problem statement
• Distribute the workload
• Prioritize tasks
• Read previous years' research papers
• Implementation
Methodology
1. Collection of data
2. Data preprocessing
3. Data analysis
4. Detection of toxic words based on the proposed work
5. Performance analysis
Pipeline (figure): Raw data → Text pre-processing → Feature extraction → Training data / Test data → Classification → Binary classification → Toxic text / Non-toxic text
Solution Approach
• RAW DATA: We first collected a Twitter dataset from Kaggle.
• PRE-PROCESSING: We edited, cleaned, and transformed the data; the individual steps are shown on the slide.
• FEATURE EXTRACTION: Before training and testing, we examined which features are present in the data.
• TRAIN and TEST: We divided the dataset into two subsets, train and test.
• CLASSIFICATION: For classification we used linear regression, CNN, and LSTM.
• The trained classifier then labels each text as toxic or non-toxic.
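The steps above can be sketched end to end in plain Python. This is a minimal sketch under several assumptions: the toy `DATA` list is hypothetical and stands in for the Kaggle Twitter dataset; the pre-processing rules (lowercasing, dropping URLs and @mentions, keeping ASCII letters) and binary bag-of-words features are illustrative choices, since the slides do not list them; and "linear regression" is implemented as a least-squares model whose score is thresholded at 0.5. The CNN and LSTM classifiers are not shown.

```python
import random
import re


def preprocess(text: str) -> list[str]:
    """Lowercase, drop URLs and @mentions, keep ASCII letters only, split on whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def build_vocab(token_lists):
    """Assign each distinct training token a column index."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(tokens, vocab):
    """Binary bag-of-words vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] = 1.0
    return vec


def train_test_split(pairs, test_ratio=0.25, seed=42):
    """Shuffle (text, label) pairs reproducibly and cut them into train and test lists."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * (1 - test_ratio))
    return pairs[:cut], pairs[cut:]


def fit_linear_regression(X, y, lr=0.5, epochs=500):
    """Least-squares linear regression (weights + bias) fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify(w, b, x, threshold=0.5):
    """Threshold the regression score: 1 = toxic, 0 = non-toxic."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= threshold else 0


# Hypothetical toy data (1 = toxic, 0 = non-toxic).
DATA = [
    ("you are an idiot", 1),
    ("what an idiot", 1),
    ("shut up you loser", 1),
    ("have a nice day", 0),
    ("thanks for the help", 0),
    ("what a great point", 0),
]

if __name__ == "__main__":
    train, test = train_test_split(DATA)
    train_tokens = [preprocess(text) for text, _ in train]
    vocab = build_vocab(train_tokens)
    X = [featurize(toks, vocab) for toks in train_tokens]
    w, b = fit_linear_regression(X, [label for _, label in train])
    for text, label in test:
        pred = classify(w, b, featurize(preprocess(text), vocab))
        print(f"{text!r}: predicted={pred} actual={label}")
```

Thresholding a least-squares fit works as a simple baseline; logistic regression is the more common linear classifier for this kind of binary task.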
Algorithms and Framework
• Machine Learning: Linear Regression
• Deep Learning: CNN, LSTM
• NLP: Semantic Analysis
Outcome Produced
The outcome of this project is a research paper that we have submitted for publication in IEEE Xplore.
Proof of the Outcome
Thank you
