International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 11 | Nov 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 735
Email Spam Detection Using Machine Learning
Prof. Prachi Nilekar, Tamboli Abdul Salam, Manish Kumar Gupta,
Krishna Sharma, Safwan Attar
ALARD COLLEGE OF ENGINEERING & MANAGEMENT
(ALARD Knowledge Park, Survey No. 50, Marunje, Near Rajiv Gandhi IT Park, Hinjewadi, Pune-411057)
Approved by AICTE. Recognized by DTE. NAAC Accredited. Affiliated to SPPU (Pune University).
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – Email spam has become a major problem: as the
number of internet users grows, so does the volume of spam.
Spam is used for phishing, fraud, and other illegal and
unethical practices, and the malicious links it carries can
harm or compromise the recipient's system. Spammers can
easily create fake profiles and email accounts, pose as real
people, and target recipients who are unaware of these
frauds. There is therefore a need to identify fraudulent spam
mail. This project identifies spam using machine learning
techniques: the paper discusses several machine learning
algorithms, applies each of them to our dataset, and selects
the algorithm with the best accuracy and precision for email
spam detection.
Key Words: (Machine Learning, Naive Bayes, Support
Vector Machine, DTS, Random Forest, Bagging, Boosting)
1. INTRODUCTION
Machine learning approaches are an efficient way to filter
email. They learn from a set of training data, which here is
a collection of emails that have already been classified.
Many machine learning algorithms can be used for email
filtering, including Naive Bayes, support vector machines,
neural networks, K-nearest neighbors, and random forests.
Why Machine Learning: Machine learning allows the user
to feed a computer algorithm an immense amount of data
and have the computer analyze and make data-driven
recommendations and decisions based on only the input
data.
What is a dataset: A dataset is a collection of related data
composed of separate elements. A dataset for email spam
detection contains both spam and non-spam messages.
What are train and test datasets: Training and test datasets
are two key concepts in machine learning. The training data
is the subset of the original data that is used to fit (train)
the model, whereas the test data is a separate subset that is
used to evaluate the model's accuracy. The training dataset is
usually larger than the test dataset.
Fig -1: Train and Test Model
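The split described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the helper name train_test_split, the 80/20 ratio, the seed, and the placeholder messages are all assumptions made for the example.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled emails as (text, label) pairs.
emails = [(f"message {i}", "spam" if i % 2 else "ham") for i in range(10)]

train, test = train_test_split(emails, test_fraction=0.2, seed=42)
print(len(train), "training emails,", len(test), "test emails")
```

Shuffling before splitting keeps the test set from being dominated by whatever order the emails were collected in.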
Machine learning algorithms are used to classify the text into
two categories, spam and ham (non-spam). The algorithm
predicts a score for each message, and the objective of
developing this model is to detect spam and score words
quickly and accurately.
2. MACHINE LEARNING CLASSIFICATION ALGORITHMS
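Naive Bayes: Naive Bayes is a classification algorithm
suitable for both binary and multiclass classification. Naive
Bayes performs better for categorical input variables than
for numerical variables. It is useful for making predictions
based on historical results and forecast data. It is based on
Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B)
P(A|B) is Posterior Probability: The probability of
hypothesis A given the observed evidence B.
P(B|A) is Likelihood: The probability of the evidence given
that the hypothesis is true.
P(A) is Prior Probability: The probability of a hypothesis
before seeing the evidence.
P(B) is Marginal Probability: The probability of the evidence.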
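As a rough illustration of how these probabilities combine for spam filtering, the sketch below implements a small multinomial Naive Bayes classifier over word counts. It is not the paper's implementation: the function names, the add-one (Laplace) smoothing, and the training messages are assumptions chosen for the example. P(B) is left out because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(messages):
    """Count labels and per-label words from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in messages:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior, log P(A)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # log likelihood of the word, with add-one smoothing
            seen = word_counts[label][word]
            score += math.log((seen + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages.
training = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("please review the project report", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
model = train_naive_bayes(training)
print(predict(model, "free prize offer"))
print(predict(model, "project meeting tomorrow"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.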
Support Vector Machine: SVMs are used in intrusion
detection, face detection, email classification, gene
classification, web page classification, etc. They can handle
classification and regression on linear and non-linear data.
Fig -2: Support Vector Machine
Decision tree: Decision trees are extremely useful for data
analytics and machine learning because they break down
complex data into more manageable chunks. They are often
used in these fields for predictive analysis, data
classification, and regression.
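Entropy using the frequency table of one attribute:
E(S) = -Σ p_i log2(p_i), summed over the classes i, where
p_i is the proportion of samples that belong to class i.
Entropy using the frequency table of two attributes:
E(T, X) = Σ P(c) E(c), summed over the values c of attribute
X, where P(c) is the proportion of samples with X = c and
E(c) is the entropy of the target T within that subset.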
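The two entropy calculations can be sketched as follows, assuming the standard definitions: Shannon entropy of the class labels for one attribute, and the weighted average entropy of the labels after grouping by a second attribute. The function names, the contains_link attribute, and the sample values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)

def conditional_entropy(attribute_values, labels):
    """Weighted entropy of the labels after grouping by an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / total * entropy(g) for g in groups.values())

# Hypothetical data: does the email contain a link, and is it spam?
contains_link = [True, True, True, True, False, False, False, False]
labels = ["spam", "spam", "spam", "ham", "spam", "ham", "ham", "ham"]

before = entropy(labels)
after = conditional_entropy(contains_link, labels)
print("information gain:", before - after)
```

A decision tree typically splits on the attribute with the largest information gain, that is, the largest drop from the first entropy to the second.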
KNN: The K-nearest neighbors (KNN) algorithm can make
highly accurate predictions, so it can compete with more
complex models. It is used for applications that require high
accuracy but do not require a human-readable model. The
quality of the predictions depends on the distance measure.
Formula (Euclidean distance):
dist((x, y), (a, b)) = √((x - a)² + (y - b)²)
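A minimal sketch of KNN classification with the Euclidean distance above is shown below. The two features (number of links and number of exclamation marks per email), the training points, and the choice k = 3 are illustrative assumptions, not values from the paper.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(training, query, k=3):
    """Label the query by majority vote among its k nearest neighbours."""
    by_distance = sorted(training, key=lambda item: euclidean(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features: (number of links, number of exclamation marks).
training = [
    ((5, 7), "spam"),
    ((4, 6), "spam"),
    ((6, 9), "spam"),
    ((0, 1), "ham"),
    ((1, 0), "ham"),
    ((0, 0), "ham"),
]
print(knn_predict(training, (5, 8)))
print(knn_predict(training, (1, 1)))
```

Because every prediction depends on distances, features measured on very different scales are usually normalized before KNN is applied.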
Random forest classifiers: Random forest classifiers can be
used to solve regression or classification problems. The
random forest algorithm is composed of a collection of
decision trees, and each tree in the ensemble is built from a
data sample drawn from the training set with replacement,
called a bootstrap sample.
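The sketch below illustrates only the two ensemble mechanics: drawing a bootstrap sample with replacement and, for classification, combining the trees' predictions by majority vote. It does not train any decision trees. The helper names, the seed, and the sample data are assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Return the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical training set of (text, label) pairs.
data = [("mail %d" % i, "spam" if i < 3 else "ham") for i in range(6)]

rng = random.Random(0)
# One bootstrap sample per tree in a three-tree ensemble.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in samples:
    print(sorted(set(text for text, _ in sample)))

# Combining hypothetical per-tree predictions for one email.
print(majority_vote(["spam", "ham", "spam"]))
```

Because sampling is done with replacement, each sample usually repeats some emails and omits others, which is what makes the individual trees differ from one another.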
3. OBJECTIVES OF THE STUDY
Machine learning algorithms are used to classify the text into
two categories, spam and ham (non-spam), and to predict a
score for each message. The purpose of developing this model
is to recognize spam and score words rapidly and accurately.
4. SCOPE OF THE STUDY
The proposed system will detect spam mail effectively: it
extracts spam using machine learning algorithms and gives
accurate results with good performance. The project requires
a clearly defined scope of work to keep it focused. The
scopes are:
• Modify existing machine learning algorithms.
• Use and classify datasets, including data
preparation, classification, and visualization.
• Score the data to determine the accuracy of spam
detection.
• Detect the credibility of each mail and filter spam
messages.
• Save the user's time and reduce the risk posed by
spam mail.
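The scoring step can be illustrated with the two measures named in the abstract, accuracy and precision. The sketch below is a minimal example: the function names and the true and predicted labels are made up for illustration and are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred, positive="spam"):
    """Fraction of emails predicted as spam that really are spam."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # nothing was flagged, so precision is reported as 0
    return sum(t == positive for t in flagged) / len(flagged)

# Hypothetical true labels and classifier predictions.
y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "spam", "ham", "ham"]

print("accuracy:", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
```

Precision matters for spam filtering because a low value means legitimate mail is being flagged as spam, even when overall accuracy looks acceptable.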
Use case diagrams describe the high-level functions and
scope of the system. These diagrams also identify the
interactions between the system and its actors. A use case
diagram outlines how external entities (users) interact with
an internal software system.
Fig -3: Use Case Diagram
A state diagram consists of states, transitions, activities, and
events. It describes the different states that an object moves
through, or provides an abstract description of the behavior
of a system.
Fig -4: State Diagram
Activity diagrams are graphical representations of
workflows with support for selection, repetition, and
concurrency of step-by-step activities and tasks.
Fig -5: Activity Diagram
5. PROJECT ARCHITECTURE DIAGRAM
An architectural diagram is a visual representation that
shows the physical implementation of the components of a
software system. It shows the general structure of the
software system and the associations, boundaries, and limits
between each element.
Fig -6: Architecture Diagram of Email Spam Detection
6. CONCLUSIONS
In addition to reducing the workload, this system corrects
any false data held about its users. It benefits users, whose
time and data are preserved, as well as affected users and
authorities whose highly important data is secured from
misuse.
We are able to classify emails as spam or non-spam.
However, because our project deals with only a limited
amount of mail, it will be difficult to handle every possible
email if many people use the system and the number of
emails becomes very large.
The website is intended for end users and is user-friendly: an
end user can use it without any other help and without
conflicts. The goal of the website is to classify email as
“Spam or Non Spam” using machine learning. It is free to
use, and its maintenance cost (coding, updates, uploading
data, datasets, etc.) is low. We successfully achieved our
goals.
ACKNOWLEDGEMENT
This paper was supported by Alard College of Engineering &
Management, Pune 411057. We are very thankful to all those
who have provided us valuable guidance towards the
completion of this Seminar Report on “Email Spam Detection
Using Machine Learning” as part of the syllabus of our
course. We express our sincere gratitude towards the
cooperative department, which has provided us with valuable
assistance and requirements for the system development.
We are very grateful to Prof. Prachi Nilekar for guiding us
in the right manner, resolving our doubts by giving us their
time whenever we required it, and sharing their knowledge
and experience in making this project.