SMS spam classification using NLP: Methods, approaches, and applications
By Anisha Agarwal
Introduction
The easy accessibility and simplicity of SMS have made it
attractive to malicious users. Spam imposes unnecessary costs
on mobile users and jeopardizes secure mobile message
communication.
This article therefore identifies and reviews existing state-of-the-art
methods for SMS spam classification along several dimensions:
ML and AI methods and techniques, approaches, and
deployment environment.
Approach
1. Import the required libraries.
2. Data preprocessing.
3. Bag of words.
4. Adding new features, such as length of the text,
profanity of the text, and parts of speech (POS).
5. EDA of the dataset.
6. Word tokenization.
7. Implementing different ML classification models
(LogisticRegression, MultinomialNB,
RandomForestClassifier, LinearSVC, SGDClassifier,
GradientBoostingClassifier) and comparing them to
find which model is best for this classification.
Implementation
Libraries
Data Preprocessing
1. Removing unnecessary columns and renaming features.
2. Numericalizing the categorical feature that is our label (ham or spam).
3. Generating a corpus from the raw SMS messages (stopword removal, lowercasing, stemming).
4. Creating a bag-of-words model using CountVectorizer.
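The slide code for these steps is not preserved in the text, so the following is only a minimal sketch of steps 2 and 3 under stated assumptions. The stopword list, the crude_stem helper, and the two sample rows are all hypothetical; a real run would more likely use a full stopword list and a proper stemmer (for example from NLTK).

```python
import re

# Tiny illustrative stopword list (assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "your",
             "i", "for", "of", "and", "in", "on", "at", "have", "now"}

# Step 2: numeric encoding of the label column.
LABEL_MAP = {"ham": 0, "spam": 1}

def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean_message(text):
    """Step 3: keep letters only, lowercase, drop stopwords, stem the rest."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(crude_stem(w) for w in words if w not in STOPWORDS)

# Hypothetical (label, message) rows standing in for the dataset.
rows = [
    ("ham", "Are you coming to the meeting?"),
    ("spam", "WINNER!! You have won a prize, call now"),
]
labels = [LABEL_MAP[label] for label, _ in rows]
corpus = [clean_message(text) for _, text in rows]
```

Here corpus holds one cleaned string per message, which is the form a vectorizer expects as input.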
Bag of Words: Code to Generate Bag of Words
Code to plot word cloud of spam words
Code to plot word cloud of ham words
New Features added: Length of Text
New Features added: Profanity Check
New Features added: Readability Score
New Features added: Parts of Speech (POS)
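The slides do not say which readability formula was used. The sketch below assumes the Flesch Reading Ease score with a rough vowel-group syllable count, and pairs it with the text-length feature. The function names are hypothetical, and the profanity and POS features are omitted because they depend on external libraries.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

def extract_features(text):
    """Per-message features: character length and readability score."""
    return {"length": len(text), "readability": flesch_reading_ease(text)}

features = extract_features("Call now.")
```

Under this formula, text made of long, many-syllable words can score below zero, so negative readability values are possible.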
Exploratory Data Analysis:
Maximum Length of the Text Plotted
Spam and Ham Text against the Length
Distribution of text length
Ham Tokenization for first 50 Words:
Output
Spam Tokenization for first 50 Words:
Output
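The tokenization output itself is not preserved here. One simple way to get the most frequent tokens per class is sketched below, assuming a plain regex tokenizer and collections.Counter; the two sample spam messages are invented.

```python
import re
from collections import Counter

def top_tokens(messages, n=50):
    """Return the n most frequent lowercase word tokens across the messages."""
    counts = Counter()
    for msg in messages:
        counts.update(re.findall(r"[a-z]+", msg.lower()))
    return counts.most_common(n)

# Hypothetical spam messages; the real input would be all spam rows of the dataset.
spam_messages = ["Call now to claim your prize", "Txt STOP to end, call now"]
top_spam = top_tokens(spam_messages, n=3)
```

Running the same function on the ham messages gives the second list for comparison.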
Classification Model Data Preparation:
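A data preparation step of this kind usually means splitting messages and labels into training and test sets. The sketch below assumes an 80/20 stratified split with scikit-learn's train_test_split; the split ratio, the random seed, and the ten toy messages are assumptions, not values from the slides.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: five ham (0) and five spam (1) messages.
texts = [
    "see you at lunch", "call me when you are home", "running late sorry",
    "happy birthday", "can we talk tomorrow",
    "win a free prize now", "claim your cash reward", "txt stop to opt out",
    "urgent call this number", "you have won a voucher",
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# stratify keeps the ham/spam ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)
```

Stratifying matters for spam data because the classes are typically imbalanced, and an unstratified split can leave very few spam messages in the test set.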
Logistic Regression:
MultinomialNB:
Random Forest Classifier:
Linear SVC:
SGD Classifier:
Gradient Boosting Classifier:
Compare Models:
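The comparison can be sketched as a loop over the six classifiers named above, each wrapped in a pipeline with TfidfVectorizer. The toy data, default hyperparameters, and fixed random seeds are assumptions; accuracy on data this small says nothing about real performance.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data (0 = ham, 1 = spam).
train_texts = [
    "see you at lunch", "call me when you are home", "running late sorry",
    "can we talk tomorrow", "win a free prize now", "claim your cash prize",
    "txt stop to opt out", "urgent call to claim prize",
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
test_texts = ["are you home for lunch", "claim your free prize now"]
test_labels = [0, 1]

models = {
    "LogisticRegression": LogisticRegression(),
    "MultinomialNB": MultinomialNB(),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "LinearSVC": LinearSVC(),
    "SGDClassifier": SGDClassifier(random_state=0),
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, clf in models.items():
    # Each model gets its own TF-IDF step so nothing leaks between fits.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(train_texts, train_labels)
    scores[name] = accuracy_score(test_labels, pipe.predict(test_texts))

best_model = max(scores, key=scores.get)
```

Keeping the vectorizer inside the pipeline means it is fitted on training data only, which avoids leaking test vocabulary into the model.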
Inference
1. We took the raw text and refined it (removing stopwords and
punctuation, and performing lemmatization). This helped
improve the accuracy.
2. We used different model pipelines containing TfidfVectorizer;
the SVM model gives the best accuracy score, 98%.
3. The top spam tokens are Call, Txt, Claim, Prize, Stop,
etc. These words indicate that a message is either a commercial
SMS or a spam SMS; they are not common in everyday messages.
4. Spam SMS messages tend to be longer than
non-spam SMS messages.
5. The readability score is lower, or negative, for spam SMS compared to
non-spam SMS.
6. For parts of speech (adjectives and adverbs), we can see that
adjectives are used more frequently in spam SMS than in
non-spam SMS.
Thank You!!!
