Multi-level classifier for the detection of insults in social media
MARCH 2015
CONFERENCE: 15TH PHILIPPINE COMPUTING SCIENCE CONGRESS
AT: UNIVERSITY OF ST. LOUIS, TUGUEGARAO CITY, PHILIPPINES
Outline:
Introduction
Old approaches in insult detection
The multi-level classifier:
Data
Lexicon-based classifier
N-gram SVM classifier
Neural network
Results
Conclusion
Introduction:
Progressive growth of social networks
Huge impact on our lives
Vulnerable age groups (children and teenagers) are easily affected
Urgent need for a solution
Old approaches in insult detection:
Lexical Syntactic Feature (LSF) approach
Naïve Bayes text classifier
The multi-level classifier:
Data:
• Training: 3947 rows
• Test and verification: 4881 rows
Second kind of data:
List of 1048 curse words
Second-person pronoun list:
you, your, ya, u, ur, yourself, yo and yours
The lexicon-based classifier:
A text containing both a curse word and a second-person pronoun is considered an insult
The closer the two are to each other, the more insulting the text is considered
Pseudocode to obtain the lexical score:
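The pseudocode on this slide did not survive extraction. The Python sketch below shows one possible way to implement the rule described on the previous slide (a curse word plus a second-person pronoun, with closer pairs scoring higher), followed by min-max scaling into the 0 to 1 range mentioned in the speaker notes. The scoring formula (each curse word adds 1/d, where d is the token distance to the nearest pronoun), the tokenizer, and the three-word `CURSES` set are assumptions for illustration, not the authors' method; the real curse list has 1048 entries.

```python
import re

# Second-person pronoun list from the slides (8 words).
PRONOUNS = {"you", "your", "ya", "u", "ur", "yourself", "yo", "yours"}

# Hypothetical stand-in for the 1048-word curse list used in the talk.
CURSES = {"idiot", "moron", "stupid"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def raw_lexical_score(text: str) -> float:
    """Assumed rule: each curse word adds 1/d, where d is the token distance
    to the nearest second-person pronoun. Texts lacking either kind score 0."""
    tokens = tokenize(text)
    curse_pos = [i for i, tok in enumerate(tokens) if tok in CURSES]
    pron_pos = [i for i, tok in enumerate(tokens) if tok in PRONOUNS]
    if not curse_pos or not pron_pos:
        return 0.0
    score = 0.0
    for c in curse_pos:
        d = min(abs(c - p) for p in pron_pos)
        score += 1.0 / max(d, 1)  # closer pairs contribute more
    return score


def min_max_normalize(scores: list[float]) -> list[float]:
    """Rescale raw scores over the whole dataset into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    texts = ["you idiot", "you are an idiot", "what a stupid day", "have a nice day"]
    raw = [raw_lexical_score(t) for t in texts]
    print(min_max_normalize(raw))  # [1.0, 0.3333333333333333, 0.0, 0.0]
```

Normalizing over the whole dataset (rather than per text) keeps scores comparable across texts; whether the original work used min-max scaling or another mapping is not stated.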
N-gram SVM classifiers:
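The content of this slide was an image that did not survive. The speaker notes state that word n-grams range from unigrams to 4-grams, character n-grams from unigrams to 10-grams, and that each of the two SVMs produces an output for the next level. The sketch below extracts both n-gram sets and trains a linear SVM with the Pegasos subgradient method in plain Python. Pegasos, the `w:`/`c:` feature prefixes, the regularization value, and the toy training texts are assumptions for illustration; the talk does not say which SVM implementation or kernel was used.

```python
import random
import re
from collections import Counter


def word_ngrams(text: str, n_min: int = 1, n_max: int = 4) -> Counter:
    """Count word n-grams (unigrams to 4-grams by default)."""
    words = re.findall(r"\w+", text.lower())
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            grams["w:" + " ".join(words[i:i + n])] += 1
    return grams


def char_ngrams(text: str, n_min: int = 1, n_max: int = 10) -> Counter:
    """Count character n-grams (unigrams to 10-grams by default)."""
    s = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(s) - n + 1):
            grams["c:" + s[i:i + n]] += 1
    return grams


def decision(w: dict, x: Counter) -> float:
    """Linear SVM output: dot product of the weights and the sparse features."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_pegasos(samples, labels, lam=0.01, epochs=20, seed=0) -> dict:
    """Train a linear SVM (hinge loss + L2 penalty, no bias term) with Pegasos.
    Labels must be +1 (insult) or -1 (not an insult)."""
    rng = random.Random(seed)
    w: dict = {}
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i], labels[i]
            violated = y * decision(w, x) < 1  # margin check uses the current w
            for f in w:  # shrink step from the L2 penalty
                w[f] *= 1.0 - eta * lam
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


if __name__ == "__main__":
    texts = ["you idiot", "you are stupid", "shut up you moron",
             "nice day", "thanks for the help", "great game today"]
    labels = [1, 1, 1, -1, -1, -1]
    w_words = train_pegasos([word_ngrams(t) for t in texts], labels)
    w_chars = train_pegasos([char_ngrams(t) for t in texts], labels)
    sample = "you moron"
    # The two SVM outputs that would be passed up to the neural network.
    print(decision(w_words, word_ngrams(sample)), decision(w_chars, char_ngrams(sample)))
```

Rescaling every weight on each step is O(number of features); a production version would keep a separate scale factor, but the explicit loop keeps the update rule easy to read.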
Neural network:
10 inputs:
3 of them are generated by the lower-level classifiers
One hidden layer with 15 nodes
Learning rate: 5 × 10^-6
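The slide gives the network's shape (10 inputs, one hidden layer of 15 nodes) and its learning rate. The sketch below is a plain-Python multilayer perceptron with those dimensions and a single backpropagation step. The sigmoid activations, the single sigmoid output, the cross-entropy loss, the uniform weight initialization, and the placeholder input vector are assumptions; the talk does not specify them.

```python
import math
import random

N_INPUTS = 10          # from the slide
N_HIDDEN = 15          # from the slide
LEARNING_RATE = 5e-6   # from the slide


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_params(seed: int = 0) -> dict:
    """Small random weights (assumed initialization) and zero biases."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)],
        "b1": [0.0] * N_HIDDEN,
        "w2": [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)],
        "b2": 0.0,
    }


def forward(p: dict, x: list[float]) -> tuple[list[float], float]:
    """Return the hidden activations and the predicted insult probability."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["w1"], p["b1"])]
    y = sigmoid(sum(w * hk for w, hk in zip(p["w2"], h)) + p["b2"])
    return h, y


def train_step(p: dict, x: list[float], target: float, lr: float = LEARNING_RATE) -> float:
    """One backpropagation step on binary cross-entropy; updates p in place."""
    h, y = forward(p, x)
    delta_out = y - target  # gradient w.r.t. the output pre-activation
    for k in range(N_HIDDEN):
        # The hidden-layer error uses w2[k] before it is updated.
        delta_h = delta_out * p["w2"][k] * h[k] * (1.0 - h[k])
        p["w2"][k] -= lr * delta_out * h[k]
        for j in range(N_INPUTS):
            p["w1"][k][j] -= lr * delta_h * x[j]
        p["b1"][k] -= lr * delta_h
    p["b2"] -= lr * delta_out
    return y


if __name__ == "__main__":
    params = init_params()
    x = [0.5] * N_INPUTS  # placeholder feature vector
    before = forward(params, x)[1]
    for _ in range(1000):
        train_step(params, x, target=1.0)
    print(before, forward(params, x)[1])  # the second value is slightly higher
```

The explicit loops keep the sketch dependency-free; with a learning rate this small, each step changes the output only marginally, so training would need many passes over the data.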
Results:
Conclusion:
The multi-level classifier approach showed that even when some classifiers produce only moderate results, combining them can improve overall performance
Thank you for your attention

Editor's Notes

  • #4: Social media are growing steadily and have become part of our daily lives. People of all ages spend much of their time on social networking sites, which shows how successful these networks are, but also how strongly they can influence our lives, positively or negatively. Because users include children and teenagers, who can be badly affected by insulting or offensive content, there is an urgent need for work and research on this topic.
  • #5: Because this is a critical need, many researchers have worked on the problem and proposed several solutions. One of them is the Lexical Syntactic Feature (LSF) approach, which detects insults on social media using two major kinds of features: lexical features, which treat each word or phrase as an entity, and syntactic features, which determine at whom those words are directed. In addition, experiments conducted by Vandermissen resulted in a text classifier based on naïve Bayes. Both approaches gave very poor precision.
  • #6: The classifier proposed in this project is composed of three main parts, or four if we count data preprocessing. After preprocessing, the datasets are passed to two classifiers: the lexicon-based classifier and the n-gram SVM classifier. The lexicon-based classifier (classifier 1) does not learn from the dataset with supervision; it uses only word lists to generate a lexical score, which I will explain later. Classifier 2 contains two SVM classifiers that reduce the dimension of the feature set: it receives thousands of n-grams from the preprocessing step, and feeding them directly to the neural network would make the network converge very slowly.
  • #7: Two kinds of data need to be gathered. The first is the training and test sets: this project used a dataset that a cyber security start-up released on Kaggle. It contains 3947 rows for training and 4881 rows for testing and verification, and has only three columns: the date the text was written, the text itself, and its classification. The classification is binary (insult or not), and a text counts as an insult only if it is intended to insult a person who is part of the larger blog or website conversation. This explains the need for the second kind of data.
  • #8: Before being sent to the first and second classifiers, the data is preprocessed and divided into three datasets: the word n-grams and the character n-grams, which are sent to the second classifier (the n-gram SVM classifier), and a third dataset, which is sent to the lexicon-based classifier and used together with the second kind of data.
  • #9: The second kind of data consists of two word lists needed by the lexicon-based classifier: the curse word list and the second-person pronoun list. The need for the curse word list is obvious; the second-person pronoun list ensures that the curse words are directed at someone who is part of the conversation. There are 1048 curse words, gathered from different sources. The second-person pronoun list consists of only 8 words: you, your, ya, u, ur, yourself, yo and yours.
  • #10: The lexicon-based classifier assumes that if a text contains both curse words and second-person pronouns, it is more likely to be insulting, and the closer they are to each other, the more insulting the text is considered.
  • #11: The lexical score is then normalized to fit the range 0 to 1.
  • #12: We have two SVM classifiers, one for the character n-grams and one for the word n-grams, both derived from the first kind of data discussed in the data section. Word n-grams range from unigrams to 4-grams, and character n-grams range from unigrams to 10-grams. After training, each SVM classifier generates an output vector: one for the characters and one for the words.
  • #13: The neural network, which is the last level, takes 10 inputs. The first three are obtained from the first two levels of classifiers: the lexical score and the two SVM output vectors. In addition to these three, we add the number of curse words, the number of second-person pronouns, the number of characters, the number of exclamation points, the number of asterisks, and the number of capital letters. The neural network has only one hidden layer with 15 nodes, and the learning rate is 5 × 10^-6.
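Besides the three outputs from the lower-level classifiers, the notes list six count features for the neural network. A minimal Python sketch of computing them for a single text follows. The pronoun set is the one given in the slides; the tokenizer and the three-word `CURSES` set are hypothetical stand-ins (the real curse list has 1048 entries).

```python
import re

# Second-person pronoun list from the slides.
PRONOUNS = {"you", "your", "ya", "u", "ur", "yourself", "yo", "yours"}
# Hypothetical stand-in for the 1048-word curse list.
CURSES = {"idiot", "moron", "stupid"}


def surface_features(text: str) -> dict[str, int]:
    """Count-based neural network inputs, as listed in the speaker notes."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenizer
    return {
        "curse_words": sum(tok in CURSES for tok in tokens),
        "second_person_pronouns": sum(tok in PRONOUNS for tok in tokens),
        "characters": len(text),
        "exclamation_points": text.count("!"),
        "asterisks": text.count("*"),
        "capital_letters": sum(ch.isupper() for ch in text),
    }


if __name__ == "__main__":
    print(surface_features("You are an IDIOT!!"))
    # {'curse_words': 1, 'second_person_pronouns': 1, 'characters': 18,
    #  'exclamation_points': 2, 'asterisks': 0, 'capital_letters': 6}
```

Counting asterisks and capital letters picks up masked profanity (such as "m*ron") and shouting that an exact-match word list misses.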