A Deep Learning Approach For Hate
Speech and Offensive Language
Detection on Twitter
Presented By:
Nasim Alam
M Tech Computer
INTRODUCTION
Hate Speech
● Hate speech is speech that attacks a person or group on the basis of attributes
such as race, religion, ethnic origin, national origin, gender, disability, or sexual
orientation.
● The law of some countries describes hate speech as speech, gesture or
conduct, writing, or display that incites violence or prejudicial action against a
protected group or individual on the basis of their membership of the group.
● Social media platforms such as Facebook and Twitter have raised concerns about
emerging harmful activity, in particular the intensity of hateful, abusive and
offensive behavior among users.
Motivation
Potential of social media for spreading hate speech
◉ 30% internet penetration in India (World Bank, 2016)
◉ 241 million users of Facebook alone (The Next Web Report, 2017)
◉ 136 million Indians are active social media users (Yral Report, 2016)
◉ 200 million WhatsApp users in India (Mashable, 2017)
OBJECTIVE
• The main objective of this work is to develop an automated, deep-learning-based
approach for detecting hate speech and offensive language.
• Automated detection relies on machine learning, which may be supervised or
unsupervised. We use a supervised learning method to detect hate speech and
offensive language.
• Classify tweets into three or four classes (e.g. racist, sexist, both, none) based
on tweet sentiment and other features that a tweet exhibits.
PROJECT CONTRIBUTION
• An efficient feature extraction and selection pipeline.
• A multi-layer perceptron (MLP) based model to train on and classify tweets
as hate, offensive or neither.
• A Dynamic CNN (DCNN) based classifier that uses GloVe embedding
vectors for feature extraction.
Literature survey
| Reference | Dataset | Technique | Results |
|---|---|---|---|
| Greevy et al. (2004) | PRINCIP corpus; size: 3 million words (web pages) | Model: SVM; feature extraction: BOW, bi-gram | Precision: 92.5% (BOW), 92.5% (bi-gram); Recall: 87% (BOW), 87% (bi-gram) |
| Waseem and Hovy (2016) | 16,914 annotated tweets: 3,383 sexist, 1,972 racist, 11,559 neither racist nor sexist | Models: char n-grams, word n-grams | Char n-grams: precision 73.89%, recall 77.75%, F1 72.87%; word n-grams: precision 64.58%, recall 71.93%, F1 64.58% |
| Jha and Mamidi (2017) | Waseem and Hovy, 2016; size: 22,142 tweets; classes: benevolent, hostile, others | Models: SVM, Seq2Seq (LSTM), FastText classifier (by Facebook AI Research); feature extraction: TF-IDF, bag of n-words | Average F1 across classes: 0.723 (SVM), 0.74 (Seq2Seq); overall F1: 0.84 (FastText) |
| Park and Fung (2017) | Waseem and Hovy, 2016; Waseem, 2016; classes: racism, sexism, none; size: 25k tweets | Model: hybrid CNN classifier (WordCNN + CharCNN) | Precision: 0.827; Recall: 0.827; F1: 0.827 |
| Davidson et al. (2017) | CrowdFlower (CF); classes: hate, offensive, none; size: 25k tweets | Models: linear SVM, logistic regression | Precision: 0.91; Recall: 0.90; F1: 0.90 |
| Zhang et al. (2018) | 7 publicly available datasets: DT (24k), RM (2k), WZ-LS (18k), WZ-L (16k), WZ-S.amt (6k), WZ-S.exp (6k), WZ-S.gb (6k) | Model: CNN + GRU | Accuracy: DT 0.94, RM 0.92, WZ-L 0.82, WZ-S.amt 0.92, WZ-S.exp 0.92, WZ-S.gb 0.93 |
Dataset
| Dataset | No. of tweets | Classes (% of tweets) | Target classes |
|---|---|---|---|
| WZ-LS | 18,595 | Racism (10.6%), Sexism (20.2%), None (68.8%) | Racism, Sexism |
| WZ-L | 16,093 | Racism (12.01%), Sexism (19.56%), None (68.41%) | Racism, Sexism |
| WZ-S.exp | 6,594 | Racism (1.2%), Sexism (11.7%), Both (0.53%), None (84.37%) | Racism, Sexism |
| Hate (Davidson) | 24,783 | Hate (11.6%), Offensive (76.6%), Neither (11.8%) | Hate, Offensive |
A Multi-Layer perceptron (MLP) based model
● Raw text: tweets crawled from Twitter using the Tweepy API and stored in a
CSV file.
● The raw tweets undergo extensive preprocessing to obtain cleaned text.
● Feature extraction:
○ TF-IDF feature matrix computed from the cleaned text.
○ POS-tag term-frequency (TF) feature matrix.
○ Other features: no_of_syllables, avg_syl_per_word,
no_of_unique_words, num_mentions, is_retweets, and VADER sentiment
scores (pos, neg, neutral, compound).
● These feature matrices are concatenated into one matrix.
● We use logistic regression with L1 regularization to select the most
important features, then pass the selected feature vector to an
MLP network for classification.
● The MLP consists of an input layer, three hidden layers and an output
(softmax) layer:
○ Input layer: size equal to the number of input features; activation:
sigmoid.
○ Hidden layers: 200, 140 and 70 nodes; activation: ReLU.
○ Softmax layer: 3 or 4 output classes; activation: softmax.
[Figure: Proposed MLP-based model]
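The layer layout listed above can be sketched as a plain NumPy forward pass. This is only an illustration of the shapes involved, not the authors' implementation: the weights are random rather than trained, the input size of 50 selected features is an arbitrary assumption, and applying a sigmoid directly to the input features is just one reading of "input layer activation: sigmoid". The names `init_mlp` and `forward` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row-wise maximum for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_mlp(n_features, hidden=(200, 140, 70), n_classes=3, seed=0):
    """Create random (weights, bias) pairs for every layer (untrained)."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, *hidden, n_classes]
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Sigmoid on the inputs, ReLU hidden layers, softmax output."""
    a = sigmoid(x)  # assumption: one interpretation of the input-layer activation
    for w, b in params[:-1]:
        a = relu(a @ w + b)
    w, b = params[-1]
    return softmax(a @ w + b)

# Four toy "tweets", each described by 50 selected features (assumed size).
x = np.random.default_rng(1).normal(size=(4, 50))
params = init_mlp(n_features=50)
probs = forward(params, x)  # one row of class probabilities per tweet
```

Each row of `probs` sums to 1, so the predicted class for a tweet would be the index of the largest entry in its row. Setting `n_classes=4` gives the four-class variant.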
A simple single layer CNN
● A sentence (a single tweet) of $n$ words is the concatenation of its word vectors: $X_{1:n} = X_1 \oplus X_2 \oplus \dots \oplus X_n$.
● All possible windows of length $h$: $\{X_{1:h}, X_{2:h+1}, \dots, X_{n-h+1:n}\}$.
● We can use multiple filters (windows) of different lengths, e.g. $h=1$ for unigrams, $h=2$ for bigrams, $h=3$
for trigrams, and so on.
● Each filter consists of randomly initialized weights and is convolved over the sentence matrix with overlapping
windows; at each position, the sum of the element-wise products of the filter and the window gives one entry of the feature map.
● A feature map is $C = [c_1, c_2, \dots, c_{n-h+1}] \in \mathbb{R}^{n-h+1}$; with $m$ filters we obtain $m$
feature maps $C_1, C_2, \dots, C_m$.
● Pooling: pooling selects only the region of interest from the convolutional feature vector.
With max pooling, the result for the $i$-th filter is $\hat{C}_i = \max\{C_i\}$.
● All pooled values are concatenated into a single feature vector $Z = [\hat{C}_1, \hat{C}_2, \dots, \hat{C}_m]$.
● Finally, the feature vector $Z$ is passed through a softmax function for final classification.
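The steps above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions: the sentence length, embedding size, filter weights and output weights `U` are random placeholders, one filter is used per window size, and the non-linearity that is usually applied to each feature-map entry is left out to keep the arithmetic visible. The helper name `feature_map` is hypothetical.

```python
import numpy as np

def feature_map(X, W, b=0.0):
    """Slide filter W (h x d) over sentence matrix X (n x d).

    Returns the n - h + 1 window responses c_1 .. c_{n-h+1}.
    """
    n, h = X.shape[0], W.shape[0]
    return np.array([np.sum(W * X[i:i + h]) + b for i in range(n - h + 1)])

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n, d, n_classes = 7, 5, 3                  # toy sizes (assumed)
X = rng.normal(size=(n, d))                # one tweet: n words, d-dim vectors

# One filter per window size h = 1, 2, 3 (unigram, bigram, trigram).
filters = [rng.normal(size=(h, d)) for h in (1, 2, 3)]
maps = [feature_map(X, W) for W in filters]

# Max pooling keeps the strongest response of each filter.
Z = np.array([c.max() for c in maps])

# Softmax over a linear projection of Z gives class probabilities.
U = rng.normal(size=(len(Z), n_classes))
probs = softmax(Z @ U)
```

With $n = 7$, the three feature maps have lengths 7, 6 and 5, which matches $n - h + 1$ for $h = 1, 2, 3$.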
Word2Vec Word Embedding
Word2vec
• Word2vec is a predictive model: it trains a shallow artificial neural network
(ANN) to predict a target word from its context words, and in doing so learns
a real-valued vector for each word.
• Mikolov et al. used continuous bag-of-words (CBOW) and skip-gram models,
trained over millions of words, to represent each word in a
vector space.
GloVe
• GloVe is a semantic vector space model of language that represents
each word with a real-valued vector.
• The GloVe model uses word frequencies and a global co-occurrence count
matrix.
• Count-based models learn their vectors by essentially performing
dimensionality reduction on the co-occurrence count matrix.
• These vectors can be used as features in a variety of applications,
such as information retrieval, document classification, question
answering, named entity recognition (NER), and parsing.
[Figure: Representation of words in vector space]
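To make the idea of words as vectors concrete, here is a minimal sketch that parses embeddings stored one word per line, followed by space-separated numbers (the plain-text layout used by the published GloVe vector files), and compares words by cosine similarity. The three 3-dimensional vectors are invented for illustration and are not real GloVe values; `parse_vectors` and `cosine` are hypothetical helper names.

```python
import math

def parse_vectors(lines):
    """Parse 'word v1 v2 ...' lines into a {word: [floats]} dictionary."""
    table = {}
    for line in lines:
        word, *values = line.split()
        table[word] = [float(v) for v in values]
    return table

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (made-up numbers, 3 dimensions instead of 300).
toy = [
    "king 0.9 0.8 0.1",
    "queen 0.85 0.82 0.15",
    "apple 0.1 0.2 0.9",
]
vecs = parse_vectors(toy)
sim_king_queen = cosine(vecs["king"], vecs["queen"])
sim_king_apple = cosine(vecs["king"], vecs["apple"])
```

In this toy space `sim_king_queen` is larger than `sim_king_apple`, which is the property a classifier exploits when related words receive nearby vectors.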
Text based CNN
[Figure: Text-based convolutional neural network operation (Source: Kim 2014)]
Dynamic Convolutional Neural Network
• Wide convolution: given an input sentence of length $S$, to obtain the first
layer of the DCNN we take the embedding $w_i \in \mathbb{R}^{d}$ of each word
in the sentence and construct the sentence matrix $\mathbf{s} \in \mathbb{R}^{d \times S}$.
• A convolutional layer in the network is obtained by convolving a
matrix of weights $\mathbf{m} \in \mathbb{R}^{d \times m}$ with the matrix of activations at the
layer below.
• Dynamic k-max pooling is a k-max pooling operation in which $k$ is a
function of the length of the sentence and the depth of the network.
Following Kalchbrenner et al. (2014), the pooling parameter is modelled as
$$k_i = \max\left(k_{top}, \left\lceil \frac{L - i}{L}\, S \right\rceil\right)$$
where $i$ is the index of the convolutional layer to which k-max pooling is
applied, $L$ is the total number of convolutional layers
in the network, $S$ is the input sentence length, and $k_{top}$ is the fixed pooling parameter of the topmost convolutional layer.
• Folding sums every two rows of a feature map
component-wise. For a feature map of $d$ rows, folding returns
$d/2$ rows.
[Figure: A DCNN architecture (Source: Kalchbrenner et al., 2014)]
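The pooling and folding operations can be sketched in pure Python. The sketch assumes the pooling rule from Kalchbrenner et al. (2014), $k_i = \max(k_{top}, \lceil (L - i)/L \cdot S \rceil)$, where $k_{top}$ is the fixed pooling size of the last convolutional layer. The function names are hypothetical, and feature maps are plain nested lists rather than tensors.

```python
def dynamic_k(i, L, S, k_top):
    """Pooling size for the i-th of L conv layers on a sentence of length S."""
    # Integer ceiling of (L - i) / L * S, avoiding floating-point rounding.
    return max(k_top, ((L - i) * S + L - 1) // L)

def k_max_pool(row, k):
    """Keep the k largest values of a row, preserving their original order."""
    top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
    return [row[j] for j in sorted(top)]

def fold(feature_map):
    """Sum every two adjacent rows component-wise: d rows become d/2 rows."""
    return [[a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(feature_map[0::2], feature_map[1::2])]

# Three conv layers, sentence of 18 words, top-level pooling size 3.
ks = [dynamic_k(i, L=3, S=18, k_top=3) for i in (1, 2, 3)]

pooled = k_max_pool([1, 5, 2, 9, 3], k=3)
folded = fold([[1, 2], [3, 4], [5, 6], [7, 8]])
```

For this configuration the pooling sizes shrink from 12 to 6 to 3 as depth increases. Unlike plain max pooling, k-max pooling keeps the relative order of the selected values, so `pooled` is `[5, 9, 3]` rather than a sorted list.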
A DCNN based Model for Hate speech detection
● Tweets: crawled using tweet IDs and saved as a CSV file containing tweets and labels.
● Preprocessing of tweets:
○ Convert to lowercase and remove stop words.
○ Remove unwanted symbols and retweets.
○ Normalize words so that they are meaningful.
○ Remove tokens with a document frequency below 7; this drops
sparse features, which are less informative.
● Word-to-vector conversion:
○ A 300-dimensional GloVe word embedding model, pre-trained on
the 4-billion-word Wikipedia text corpus by researchers from Stanford
University.
○ Embedding dimension: 100 × 300.
● Classification with the DCNN model:
○ Four conv1d layers of 300 filters each, with window sizes 1, 2, 3 and 4.
○ K-max pooling is applied to the output of each conv1d layer, and the results are merged into
one single vector.
○ The merged vector is passed through dropout, a dense layer and a softmax layer for
classification.
[Figure: Proposed DCNN-based model]
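The preprocessing steps listed for this model can be sketched with the standard library alone. Everything here is an assumption made for illustration: the stop-word list is a tiny hand-picked subset, the regular expressions are one plausible choice of "unwanted symbols", dropping the `rt` marker is only one reading of the retweet step, and the document-frequency threshold is lowered from 7 to 2 so the effect is visible on three toy tweets. `clean` and `filter_by_doc_freq` are hypothetical names.

```python
import re
from collections import Counter

# Illustrative subset only; a real stop-word list would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "rt"}

def clean(tweet):
    """Lowercase, strip URLs, mentions and symbols, then drop stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)   # URLs and @mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text)         # remaining symbols
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def filter_by_doc_freq(docs, min_df=7):
    """Drop tokens that occur in fewer than min_df documents."""
    df = Counter(tok for doc in docs for tok in set(doc))
    return [[tok for tok in doc if df[tok] >= min_df] for doc in docs]

tweets = [
    "RT @user: The match is GREAT!!! http://t.co/x",
    "Great match, great crowd",
    "An odd tweet",
]
docs = [clean(t) for t in tweets]
kept = filter_by_doc_freq(docs, min_df=2)   # the slides use a threshold of 7
```

After filtering, only `match` and `great` survive because they appear in two tweets; the third tweet ends up empty, which shows why rare tokens contribute little once a frequency cut-off is applied.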
Results and Discussion
| Dataset | SVM | MLP | CNN* | DCNN |
|---|---|---|---|---|
| WZ-LS | 0.73 | 0.83 | 0.82 | 0.83 |
| WZ-L | 0.74 | 0.83 | 0.82 | 0.83 |
| WZ-S.exp | 0.89 | 0.93 | 0.90 | 0.9283 |
| Hate | 0.87 | 0.92 | 0.91 | 0.92 |

Table 1: Test accuracy of four models on four publicly available hate and offensive
language datasets.
Performance of MLP based Model
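(a) WZ-LS
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Racist | 0.73 | 0.73 | 0.73 |
| Sexism | 0.77 | 0.56 | 0.65 |
| None | 0.85 | 0.92 | 0.88 |
| Both | 1.0 | 0.33 | 0.50 |
| Overall | 0.83 | 0.83 | 0.82 |

(b) WZ-L
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Racist | 0.81 | 0.68 | 0.74 |
| Sexism | 0.85 | 0.61 | 0.71 |
| None | 0.83 | 0.93 | 0.88 |
| Overall | 0.83 | 0.83 | 0.82 |

(c) WZ-S.exp
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Racist | 1.0 | 0.05 | 0.2 |
| Sexism | 0.85 | 0.77 | 0.81 |
| None | 0.95 | 0.99 | 0.97 |
| Both | 0.0 | 0.0 | 0.0 |
| Overall | 0.93 | 0.92 | 0.93 |

(d) DT
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Hate | 0.60 | 0.52 | 0.56 |
| Offensive | 0.95 | 0.80 | 0.87 |
| Neither | 0.87 | 0.91 | 0.89 |
| Overall | 0.92 | 0.91 | 0.92 |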
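For readers who want to check how the F1 column relates to the other two, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it for the WZ-L rows reported above; the function name `f1_score` is a local helper written for this example, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs taken from the WZ-L table.
wz_l = {
    "Racist": (0.81, 0.68),
    "Sexism": (0.85, 0.61),
    "None": (0.83, 0.93),
}
f1 = {cls: round(f1_score(p, r), 2) for cls, (p, r) in wz_l.items()}
```

Rounded to two decimals, the recomputed values agree with the F1 column of the WZ-L table. The zero guard covers rows such as a class with precision and recall both equal to 0.0.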
Conclusion
The propagation of hate speech on social media has increased
significantly in recent years, and it is recognised that effective counter-measures
rely on automated data mining techniques. Our work makes two contributions
to this problem. First, we introduced a method for automatically classifying hate
speech on Twitter using deep neural network models (DCNN and MLP) that
empirically improve classification accuracy over an SVM baseline. Second, we carried out a comparative analysis
of our models on four publicly available datasets.
Future Work
We plan to extend this work in several ways. First, further fine-tuning of
hyperparameters may improve accuracy. Second, we will use user metadata along with
the tweets, such as the number of followers, location, account age and total number of
posted, favorited or liked tweets. We will build a hybrid model
(DCNN + MLP) in which tweets are passed through the DCNN and metadata through the MLP
in parallel; the outputs of the two are then combined and passed
through a dense layer and a softmax layer for final classification.
References
• Greevy E and Smeaton A F. "Classifying racist texts using a support vector machine"; In Proceedings of the 27th Annual
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '04), pages 468–469, New
York, NY, USA, 2004. ACM.
• Davidson T, Warmsley D, Macy M and Weber I. "Automated hate speech detection and the problem of offensive language"; In
Proceedings of the 11th Conference on Web and Social Media. AAAI, 2017.
• Lozano E, Cedeño J, Castillo G, Layedra F, Lasso H and Vaca C. "Requiem for online harassers: Identifying racism from
political tweets"; In 4th IEEE Conference on eDemocracy & eGovernment (ICEDEG), pages 154–160, 2017.
• Jha A and Mamidi R. "When does a compliment become sexist? Analysis and classification of ambivalent sexism using
Twitter data"; In 2nd Workshop on NLP and Computational Social Science, pages 7–16, 2017.
• Park J H and Fung P. "One-step and two-step classification for abusive language detection on Twitter"; In ALW1: 1st Workshop on
Abusive Language Online, Vancouver, Canada, 2017. Association for Computational Linguistics.
• Zhang Z, Robinson D and Tepper J. "Detecting hate speech on Twitter using a convolution-GRU based DNN"; In 15th ESWC
conference on Semantic Web, 2018.
• Waseem Z and Hovy D. "Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter"; In
Proceedings of the NAACL Student Research Workshop, pages 88–93. Association for Computational Linguistics, 2016.
• Kalchbrenner N, Grefenstette E and Blunsom P. "A convolutional neural network for modelling sentences"; arXiv:1404.2188v1
[cs.CL], 8 Apr 2014.
