Report
Abstract:
Pattern classification is a branch of machine learning that focuses on the recognition of patterns and
regularities in data. Pattern classification systems are used in adversarial applications such as
biometric authentication, spam filtering, and network intrusion detection. Our research paper
consists of a comprehensive study of spam detection algorithms under the categories of content-based
filtering and rule-based filtering. The implemented algorithms have been
benchmarked to analyze how accurately they classify messages into their original categories of
spam and ham. Further, a new filter is suggested in the proposed work that chains
rule-based filtering followed by content-based filtering for more efficient
results. The system evaluates at design phase the security of pattern classifiers, namely, the
performance degradation under potential attacks they may incur during operation. A framework
for the evaluation of classifier security is used that formalizes and generalizes the construction of
training and testing datasets. As this adversarial scenario is not considered by classical design
methods, pattern classification systems may exhibit vulnerabilities, whose exploitation might
severely affect their performance, and subsequently limit their practical utility.
Extending pattern classification theory and design methods to adversarial settings is
thus a novel and very relevant research direction, which has not yet been
pursued in a systematic way.
Keywords: Pattern classification; adversarial classification; performance evaluation; security
evaluation; robustness evaluation.
1. INTRODUCTION:
1.1 About the project:
Here we present a novel method to assess classifier security against attacks.
To demonstrate it, we build a spam filtering application with the
help of the "bag of words" method. We also
analyze the overall spam traffic on our email
server with the help of Hadoop.
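As an illustration of the bag-of-words method named above, the following is a minimal sketch in Python (chosen for brevity; the project itself targets Java and Hadoop). It pairs bag-of-words counts with a multinomial naive Bayes classifier, which is an assumption made for this example; the report does not state which classifier is used. The class name `NaiveBayesSpamFilter` and the toy training messages are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over bag-of-words counts (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Accumulate per-class word counts and per-class message counts.
        self.word_counts[label].update(bag_of_words(text))
        self.doc_counts[label] += 1

    def _log_score(self, bag, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        prior = self.doc_counts[label] / sum(self.doc_counts.values())
        score = math.log(prior)
        for word, n in bag.items():
            # Laplace (add-one) smoothing avoids log(0) for unseen words.
            p = (self.word_counts[label][word] + 1) / (total + len(vocab))
            score += n * math.log(p)
        return score

    def classify(self, text):
        bag = bag_of_words(text)
        spam = self._log_score(bag, "spam")
        ham = self._log_score(bag, "ham")
        return "spam" if spam > ham else "ham"


# Hypothetical toy training data; both classes must be trained before classifying.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("project report attached", "ham")

print(spam_filter.classify("claim your free money"))   # spam
print(spam_filter.classify("monday project meeting"))  # ham
```

Word order is discarded by design: only the token counts reach the classifier, which is what makes the representation a "bag".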
1.2 Purpose of the project:
• To develop suitable countermeasures before the
attack actually occurs.
• To provide practical guidelines for simulating
realistic attack scenarios, we define a general
model of the adversary.
• To provide an algorithm for the generation of training and
testing sets to be used for security evaluation.
1.3 Domain of the project:
This project falls into the data mining, pattern
classification, and Hadoop (big data) domains.
2. Literature survey:
Sr. No. | Title | Author | Description
1. | Security Evaluation of Pattern Classifiers under Attack. | Battista Biggio, Giorgio Fumera, Fabio Roli. | Study of existing systems: biometric, IDS, spam filtering.
2. | Identifying Security Evaluation of Pattern Classifiers under Attack. | S. P. MohanaPriya, S. Pothumani. | Empirical security evaluation of pattern classifiers.
3. | Comparison and analysis of a spam detection algorithm. | Sahil Puri, Dishant Gosain, Mehak Ahuja, Ishita Kathuria, Nishtha Jatana. | Different algorithms used for spam filtering and their comparison.
5. | Effectiveness and Limitations of statistical spam filters. | M. Tariq Banday, Tariq R. Jan. | Limitations of statistical spam filters.
6. | Spam Filtering using support vector machine. | Priyanka Chhabra, Rajesh Wadhvani, Sanyam Shukla. | The number of features for spam filtering is more than 7000.
3. System Analysis:
3.1 Existing System:
Pattern classification systems based on classical theory and design
methods do not consider adversarial settings, so they exhibit vulnerabilities to several potential
attacks, allowing adversaries to undermine their effectiveness. A systematic and unified treatment
of this issue is thus needed to allow the trusted adoption of pattern classifiers
in adversarial environments, starting from the theoretical foundations up to novel design
methods, extending the classical design cycle. Specifically, three fundamental
open issues can be identified: (i) analyzing the vulnerabilities of classification algorithms
and the corresponding attacks; (ii) developing novel methods to assess classifier security
against these attacks, which is not possible using classical performance evaluation
methods; (iii) developing novel design methods to guarantee classifier security in
adversarial environments.
In 2009, A. Kolcz and Teo presented "Feature weighting for
improved classifier robustness" at the 6th Conf. on Email and Anti-Spam [5], and in
2010 Abernethy, Chapelle, and Castillo developed "Graph regularization methods
for Web spam detection" [8].
3.2 Drawbacks of existing system:
The vulnerabilities of classification algorithms, and the corresponding
attacks, are poorly analyzed. For example, a malicious webmaster may manipulate search engine rankings to artificially promote
their website.
3.3 Feasibility of Project:
Avoid shoulder attack in card payment: given a failure case Q (the client uses an
invalid IP address or port number), we devise an algorithm for this problem as follows.
For a problem P1 to be NP-hard, the satisfiability problem (SAT) must be polynomial-time reducible to P1:
SAT ≤p P1
Let the propositional formula be: G = X1 ^ X2
where
X1: True if the client uses an invalid IP address or link
X2: True if the port number is invalid
Algosati()
{
    for i = 1 to 2
        xi = Choice(True, False);
    if G(x1, x2) then
        Success();
    else
        Failure();
}
Therefore, since a guessed assignment can be checked against G in polynomial time, the decision problem is in NP.
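The nondeterministic Choice step in Algosati can be simulated deterministically by trying every assignment. The sketch below, written in Python for brevity, does this for the formula G from the text; the function name `algo_sat` and its return convention (the satisfying assignment, or `None`) are assumptions made for illustration.

```python
from itertools import product


def G(x1, x2):
    """Propositional formula from the text: G = X1 AND X2."""
    return x1 and x2


def algo_sat(formula, num_vars):
    """Try every True/False assignment; return the first satisfying one, or None."""
    for assignment in product([True, False], repeat=num_vars):
        if formula(*assignment):
            return assignment  # corresponds to Success()
    return None  # corresponds to Failure()


print(algo_sat(G, 2))  # (True, True)
```

The loop evaluates up to 2^n assignments for n variables, while each individual check of `formula(*assignment)` is cheap. Membership in NP only requires that the single check be polynomial, not the whole search.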
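Satisfiability and Reducibility:
The 3SAT problem is NP-complete. The system can be reduced to the 3SAT problem. A
3SAT problem takes a Boolean formula S in CNF in which each clause has exactly
three literals. 3SAT is a restricted form of the CNF-SAT problem.
x1 - Receive email
x2 - Get pattern data from database
x3 - Pattern matching
S = (x1 ^ x2 ^ x3)
Algosat()
{
    for i = 1 to 3
        xi = Choice(True, False);
    if S(x1, x2, x3) = True then
        Success();
    else
        Failure();
}
Because a guessed assignment can be verified in polynomial time, the problem is in NP.
This alone does not make it NP-complete: that would also require a polynomial-time reduction
from a known NP-complete problem such as 3SAT to this problem, whereas reducing the system
to 3SAT shows only that it is no harder than 3SAT.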
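The polynomial-time check that the argument relies on can be made concrete. The following Python sketch verifies an assignment against a CNF formula in time linear in the number of literals. The clause encoding (a positive integer k for xk, a negative integer for its negation) and the function names are assumptions for illustration. Note that S as written in the text is a conjunction of three single-literal clauses, so the sketch encodes it that way; `example_3cnf` is a hypothetical formula included only to show the three-literal form.

```python
def is_3cnf(clauses):
    """True if every clause has exactly three literals, as 3SAT requires."""
    return all(len(clause) == 3 for clause in clauses)


def verify(clauses, assignment):
    """Check a CNF formula against an assignment {variable index: bool}.

    A clause is satisfied if at least one of its literals is true;
    the formula is satisfied if every clause is.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# S = (x1 ^ x2 ^ x3) from the text: three unit clauses.
S = [(1,), (2,), (3,)]
# Hypothetical 3-CNF formula: (x1 v ~x2 v x3) ^ (~x1 v x2 v x3).
example_3cnf = [(1, -2, 3), (-1, 2, 3)]

all_true = {1: True, 2: True, 3: True}
print(verify(S, all_true))                        # True
print(verify(S, {1: True, 2: False, 3: True}))    # False
print(is_3cnf(S), is_3cnf(example_3cnf))          # False True
```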
3.4 Proposed System:
In this work we address the issues above by developing a framework for the empirical
evaluation of classifier security at design phase that extends the model selection and
performance evaluation steps of the classical design cycle. We summarize previous work, and
point out three main ideas that emerge from it. We then formalize and
generalize them in our framework. First, to pursue security in the context of an
arms race it is not sufficient to react to observed attacks; it is also
necessary to proactively anticipate the adversary by predicting the most relevant potential attacks
through a what-if analysis. This allows one to develop
suitable countermeasures before the attack actually occurs, according to the
principle of security by design. Second, to provide practical guidelines for simulating realistic
attack scenarios, we define a general model of the adversary, in terms of her goal,
knowledge, and capability, which encompasses and generalizes models proposed in previous work.
Third, since the presence of carefully targeted attacks may affect the distribution of
training and testing data separately, we propose a model of the data distribution
that can formally characterize this behavior, and that allows us to consider a large
number of potential attacks. We also propose an algorithm for the generation of training and
testing sets to be used for security evaluation, which can naturally accommodate application-specific
and heuristic techniques for simulating attacks.
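The idea of evaluating performance degradation under a simulated attack can be sketched in a few lines of Python (used here for brevity). This is a simplified illustration, not the generation algorithm of the proposed system: it assumes a linear word-weight spam classifier and a "good word insertion" attack in which an adversary with full knowledge of the weights appends ham-like words to spam. The weights, threshold, test messages, and function names are all hypothetical.

```python
# Hypothetical classifier: positive weights are spam-like, negative are ham-like.
WEIGHTS = {"free": 2.0, "money": 1.5, "win": 1.5, "prize": 1.0,
           "meeting": -1.5, "report": -1.0, "agenda": -1.0, "monday": -0.5}
THRESHOLD = 1.0


def score(words):
    """Sum the weights of the distinct words present (binary bag of words)."""
    return sum(WEIGHTS.get(w, 0.0) for w in set(words))


def is_spam(words):
    return score(words) > THRESHOLD


def good_word_attack(words, max_words):
    """Append up to max_words of the most ham-like words not already present.

    Models an adversary with full knowledge of WEIGHTS; max_words bounds
    her capability (the attack strength).
    """
    candidates = sorted(
        (w for w in WEIGHTS if WEIGHTS[w] < 0 and w not in words),
        key=WEIGHTS.get,
    )
    return list(words) + candidates[:max_words]


def detection_rate(spam_messages, max_words):
    """Fraction of attacked spam messages that are still detected."""
    attacked = [good_word_attack(m, max_words) for m in spam_messages]
    return sum(is_spam(m) for m in attacked) / len(attacked)


def security_curve(spam_messages, strengths):
    """Detection rate as a function of attack strength."""
    return {n: detection_rate(spam_messages, n) for n in strengths}


# Hypothetical spam test set, already tokenized.
SPAM_TEST_SET = [["win", "free", "money"], ["free", "prize"], ["win", "money", "now"]]

curve = security_curve(SPAM_TEST_SET, range(0, 5))
for n, rate in curve.items():
    print(f"good words added: {n}  detection rate: {rate:.2f}")
```

With these toy values the detection rate stays at 1.00 for zero or one inserted word and falls to 0.00 at four, which is the kind of degradation curve a what-if analysis is meant to expose before deployment.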
3.5 Advantages of proposed system:
• Provides a method to assess classifier security against these
attacks.
• Accounts for the fact that the presence of an intelligent and adaptive adversary makes the classification
problem highly non-stationary.
• Reduces the chance of a successful attack by detecting it at an early stage.
• Saves cost, as this prototype can be used in multiple applications.
3.6 Proposed System Features:
3.7 Goals and Objectives:
• Build an efficient spam filtering system.
• Analyze spam emails on the Hadoop platform.
• Build an email application.
• Create our own database.
• Implement the spam filtering algorithm in Java and Hadoop.
• Produce results and perform testing.
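The first goal, together with the design outlined in the abstract (rule-based filtering followed by content-based filtering), could be sketched as below. The sketch is in Python for brevity, whereas the project targets Java and Hadoop. Every rule, domain, word weight, and threshold here is a hypothetical placeholder, and the content stage is a simple keyword score standing in for whichever content-based classifier the system actually uses.

```python
import re

# Stage 1 rules (all hypothetical).
BLOCKED_SENDER_DOMAINS = {"spam.example"}
RULE_PATTERNS = [re.compile(r"(?i)\bviagra\b"), re.compile(r"\${2,}")]

# Stage 2 content model (hypothetical weights and threshold).
SPAM_WORD_WEIGHTS = {"free": 2.0, "winner": 2.0, "offer": 1.0, "meeting": -2.0}
CONTENT_THRESHOLD = 2.5


def rule_stage(sender, body):
    """Return True if any hand-written rule flags the message."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in BLOCKED_SENDER_DOMAINS:
        return True
    return any(pattern.search(body) for pattern in RULE_PATTERNS)


def content_stage(body):
    """Return True if the summed weights of the distinct words exceed the threshold."""
    words = set(re.findall(r"[a-z']+", body.lower()))
    return sum(SPAM_WORD_WEIGHTS.get(w, 0.0) for w in words) > CONTENT_THRESHOLD


def classify(sender, body):
    """Rule-based filtering first; content-based filtering only if no rule fires."""
    if rule_stage(sender, body):
        return "spam"
    return "spam" if content_stage(body) else "ham"


print(classify("a@spam.example", "hello"))                                     # spam
print(classify("bob@mail.example", "earn $$$ fast"))                           # spam
print(classify("bob@mail.example", "You are a winner, claim your free offer")) # spam
print(classify("bob@mail.example", "Free lunch after the meeting"))            # ham
```

Running the cheap rules first means the content stage only sees messages that no rule has already rejected, which is one plausible reading of the "more efficient results" the abstract mentions.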
4. System Requirements:
4.1 Software Requirements:
Operating system : Windows XP/7.
Coding Language : Java/J2EE.
IDE : Eclipse.
Database : Hadoop, MySQL.
4.2 Hardware Requirements:
System : Pentium IV 2.4 GHz.
Hard Disk : 40 GB.
RAM : 4 GB.
Monitor : 15-inch VGA colour.
Mouse : Logitech.
