Spam email filtering

Report
Abstract:
Pattern classification is a branch of machine learning that focuses on the recognition of patterns and
regularities in data. Pattern classification systems are used in adversarial applications such as
biometric authentication, spam filtering, and network intrusion detection. This paper presents a
comprehensive study of spam detection algorithms in the categories of content-based filtering and
rule-based filtering. The implemented results have been benchmarked to analyze how accurately
emails are classified into their original categories of spam and ham. Further, a new filter is
proposed that applies rule-based filtering followed by content-based filtering for more efficient
results. The system evaluates, at the design phase, the security of pattern classifiers, namely the
performance degradation they may incur under potential attacks during operation. A framework
is used for the evaluation of classifier security that formalizes and generalizes the training and
testing datasets. Because this adversarial scenario is not considered by classical design
methods, pattern classification systems may exhibit vulnerabilities whose exploitation can
severely affect their performance and consequently limit their practical utility.
Extending pattern classification theory and design methods to adversarial settings is
thus a novel and highly relevant research direction, which has not yet been
pursued in a systematic way.
Keywords: Pattern classification; adversarial classification; performance evaluation; security
evaluation; robustness evaluation.

1. Introduction:
1.1 About the project:
We present a novel method to assess classifier security against attacks. To demonstrate it,
we build a spam filtering application using the Bag-of-Words method, and we analyze the
overall spam traffic on our email server with Hadoop.
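The Bag-of-Words method represents an email as a table of word counts and ignores word order. The report does not give its implementation, so the following is only a minimal sketch of the idea; the class name BagOfWords, the method count, and the tokenization rule (lower-casing and splitting on non-alphanumeric characters) are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: the report names the Bag-of-Words method but gives no code.
final class BagOfWords {
    // Lower-cases the text, splits on anything that is not a letter or digit,
    // and counts how often each token occurs. Word order is discarded.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {a=1, free=1, money=1, now=2, prize=1, win=2}
        System.out.println(count("Win money now! Win a FREE prize now."));
    }
}
```

The resulting counts can serve as the feature vector that a content-based filter scores; a sorted map is used here only so that the printed output is stable.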
1.2 Purpose of the project:
• Develop suitable countermeasures before the attack actually occurs.
• Provide practical guidelines for simulating realistic attack scenarios by defining a
general model of the adversary.
• Provide an algorithm for the generation of training and testing sets to be used for
security evaluation.
1.3 Domain of the project:
This project falls into the data mining, pattern classification, and Hadoop (big data) domains.
2. Literature survey:
Sr. No. | Title | Author | Description
------- | ----- | ------ | -----------
1 | Security Evaluation of Pattern Classifiers under Attack | Battista Biggio, Giorgio Fumera, Fabio Roli | Study of existing systems: biometrics, IDS, spam filtering.
2 | Identifying Security Evaluation of Pattern Classifiers under Attack | S. P. MohanaPriya, S. Pothumani | Empirical security evaluation of pattern classifiers.
3 | Comparison and analysis of a spam detection algorithm | Sahil Puri, Dishant Gosain, Mehak Ahuja, Ishita Kathuria, Nishtha Jatana | Different algorithms used for spam filtering and their comparison.
5 | Effectiveness and Limitations of statistical spam filters | M. Tariq Banday, Tariq R. Jan | Limitations of statistical spam filters.
6 | Spam Filtering using support vector machine | Priyanka Chhabra, Rajesh Wadhvani, Sanyam Shukla | The number of features for spam filtering is more than 7000.
3. System Analysis:
3.1 Existing System:
Pattern classification systems based on classical theory and design methods do not take
adversarial settings into account, so they exhibit vulnerabilities to several potential
attacks, allowing adversaries to undermine their effectiveness. A systematic and unified
treatment of this issue is therefore needed to allow the trusted adoption of pattern classifiers
in adversarial environments, starting from the theoretical foundations up to novel design
methods that extend the classical design cycle. Specifically, three main open issues can be
identified: (i) analyzing the vulnerabilities of classification algorithms and the
corresponding attacks; (ii) developing novel methods to assess classifier security against
these attacks, which is not possible using classical performance evaluation methods; and
(iii) developing novel design methods to guarantee classifier security in adversarial
environments.
In 2009, A. Kolcz and Teo proposed feature weighting for improved classifier robustness,
presented at the 6th Conf. on Email and Anti-Spam [5], and in 2010 Abernethy, Chapelle,
and Castillo developed graph regularization methods for Web spam detection [8].
3.2 Drawbacks of existing system:
Existing systems analyze the vulnerabilities of classification algorithms, and the
corresponding attacks, poorly. For example, a malicious webmaster may manipulate search
engine rankings to artificially promote their website.
3.3 Feasibility of Project:
Given a failure case Q, in which the client uses an invalid IP address or port number,
we devise an algorithm for this problem as follows.
For a problem P1 to be NP-hard, the satisfiability problem (SAT) must be reducible to P1
in polynomial time:
SAT ≤p P1
Let the propositional formula be G = X1 ∧ X2, where
X1: true if the client uses an invalid IP address or link
X2: true if the port number is invalid
Algosati()
{
    for i = 1 to 2
        xi = Choice(true, false);
    if G(x1, x2) then
        Success();
    else
        Failure();
}
Since the problem is a decision problem that this nondeterministic algorithm decides in
polynomial time, it is in NP.
Satisfiability and Reducibility:
The 3SAT problem is NP-complete. The system can be reduced to the 3SAT problem. A
3SAT problem takes a Boolean formula S in CNF in which each clause has exactly
three literals; 3SAT is a restricted form of the CNF-SAT problem.
x1: Receive email
x2: Get pattern data from database
x3: Pattern matching
S = (x1 ∧ x2 ∧ x3)
Algosat()
{
    for i = 1 to 3
        xi = Choice(true, false);
    if S(x1, x2, x3) = true then
        Success();
    else
        Failure();
}
The algorithm runs in nondeterministic polynomial time, so the problem is in NP. Reducing
the system to 3SAT establishes only this membership; showing NP-completeness would
additionally require a polynomial-time reduction from an NP-hard problem to the system.
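The Choice(true, false) primitive in the pseudocode is nondeterministic: it is assumed to guess a satisfying value if one exists. A deterministic program can only simulate it by trying every assignment, which takes 2^n evaluations for n variables. The following sketch illustrates that simulation for S; the class name SatCheck and the method findSatisfying are hypothetical and not part of the report.

```java
import java.util.function.Predicate;

// Hypothetical illustration of the Algosat() pseudocode, not the report's code.
final class SatCheck {
    // Deterministically simulates Choice(true, false) by trying all 2^n
    // assignments of n Boolean variables.
    // Returns the first satisfying assignment, or null if none exists.
    static boolean[] findSatisfying(int n, Predicate<boolean[]> formula) {
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] x = new boolean[n];
            for (int i = 0; i < n; i++) {
                x[i] = ((mask >> i) & 1) == 1;
            }
            if (formula.test(x)) {
                return x;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // S = x1 AND x2 AND x3 (array indices are zero-based).
        boolean[] model = findSatisfying(3, x -> x[0] && x[1] && x[2]);
        System.out.println(model == null ? "Failure" : "Success");
    }
}
```

With only three variables the loop evaluates at most eight assignments, so the check is cheap in practice; the exponential cost matters only as the number of variables grows.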

3.4 Proposed System:
In this work we address the issues above by developing a framework for the empirical
evaluation of classifier security at the design phase that extends the model selection and
performance evaluation steps of the classical design cycle. We summarize previous work and
point out three main ideas that emerge from it. We then formalize and
generalize them in our framework. First, to pursue security in the context of an
arms race it is not sufficient to react to observed attacks; it is also
necessary to proactively anticipate the adversary by predicting the most relevant potential
attacks through a what-if analysis. This allows one to develop
suitable countermeasures before the attack actually occurs, according to the
principle of security by design. Second, to provide practical guidelines for simulating realistic
attack scenarios, we define a general model of the adversary in terms of her goal,
knowledge, and capability, which encompasses and generalizes models proposed in previous work.
Third, since the presence of carefully targeted attacks may affect the distribution of
training and testing data separately, we propose a model of the data distribution
that can formally characterize this behavior and that allows us to consider a large
number of potential attacks. We also propose an algorithm for the generation of training and
testing sets to be used for security evaluation, which can naturally accommodate
application-specific and heuristic techniques for simulating attacks.
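Security evaluation in this sense means measuring how a classifier's performance degrades as a simulated attack becomes stronger. The report does not specify a classifier or an attack, so everything in the following sketch is an assumption made for illustration: the word weights, the linear scoring rule, and the attack itself, in which the adversary appends ham-like words to a spam email.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a what-if security evaluation, not the report's code.
final class SecurityEvaluation {
    // Assumed word weights of a linear spam classifier:
    // positive weights point to spam, negative weights to ham.
    static final Map<String, Double> WEIGHTS = Map.of(
            "free", 2.0, "win", 1.5, "prize", 1.5,
            "meeting", -2.0, "report", -1.5, "thanks", -1.5);

    // An email is flagged as spam when its summed word weights exceed 0.
    static boolean isSpam(List<String> words) {
        double score = 0.0;
        for (String w : words) {
            score += WEIGHTS.getOrDefault(w, 0.0);
        }
        return score > 0.0;
    }

    // Simulated attack (an assumption): the adversary appends up to
    // maxWords ham-like words to a spam email.
    static List<String> attack(List<String> spam, int maxWords) {
        List<String> goodWords = List.of("meeting", "report", "thanks");
        List<String> modified = new ArrayList<>(spam);
        modified.addAll(goodWords.subList(0, Math.min(maxWords, goodWords.size())));
        return modified;
    }

    // Fraction of spam emails still detected under attack strength k.
    static double detectionRate(List<List<String>> spamSet, int k) {
        int detected = 0;
        for (List<String> email : spamSet) {
            if (isSpam(attack(email, k))) {
                detected++;
            }
        }
        return (double) detected / spamSet.size();
    }

    public static void main(String[] args) {
        List<List<String>> spamSet = List.of(
                List.of("win", "free", "prize"),
                List.of("free", "offer"));
        for (int k = 0; k <= 3; k++) {
            System.out.println("attack strength " + k + ": " + detectionRate(spamSet, k));
        }
    }
}
```

On this toy data the detection rate falls from 1.0 with no attack to 0.0 at strength 3. A curve of this kind, computed at design time, is what lets a designer compare classifiers by robustness rather than by accuracy on unattacked data alone.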

3.5 Advantages of proposed system:
• Removes the need to develop a new method for assessing classifier security against
each attack.
• Accounts for an intelligent and adaptive adversary, whose presence makes the
classification problem highly non-stationary.
• Reduces the chance of a successful attack by detecting it at an early stage.
• Saves cost, as the prototype can be reused in multiple applications.
3.6 Proposed System Features:
3.7 Goals and Objectives:
• Build an efficient spam filtering system.
• Analyze spam emails on the Hadoop platform.
• Build an email application.
• Create our own database.
• Implement the spam filtering algorithm in Java and Hadoop.
• Produce results and perform testing.
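The filter suggested in the abstract applies rule-based filtering first and content-based filtering second. The report does not list its rules or features, so the following is only a sketch of how the two stages could be chained; the blocked domain, the spam word list, the threshold of two matching words, and all class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical two-stage filter, not the report's implementation.
final class TwoStageFilter {
    // Stage 1 (rule-based): assumed blacklist of sender domains.
    static final Set<String> BLOCKED_DOMAINS = Set.of("spam.example");
    // Stage 2 (content-based): assumed list of spam-indicative words.
    static final Set<String> SPAM_WORDS = Set.of("free", "win", "prize", "lottery");

    // Flags the email when the sender's domain is on the blacklist.
    static boolean rulesFlag(String sender) {
        String domain = sender.substring(sender.indexOf('@') + 1).toLowerCase(Locale.ROOT);
        return BLOCKED_DOMAINS.contains(domain);
    }

    // Flags the body when at least `threshold` tokens are spam words.
    static boolean contentFlags(String body, int threshold) {
        int hits = 0;
        for (String token : body.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (SPAM_WORDS.contains(token)) {
                hits++;
            }
        }
        return hits >= threshold;
    }

    // The cheap rule check runs first; content is examined only if it passes.
    static String classify(String sender, String body) {
        if (rulesFlag(sender)) {
            return "spam";
        }
        return contentFlags(body, 2) ? "spam" : "ham";
    }

    public static void main(String[] args) {
        System.out.println(classify("a@spam.example", "Hello"));
        System.out.println(classify("bob@mail.example", "Win a free prize"));
        System.out.println(classify("bob@mail.example", "Meeting at noon"));
    }
}
```

Running the rules first means that emails from known bad senders never reach the more expensive content stage, which is one plausible reading of the efficiency gain the abstract claims for this ordering.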
4. System Requirements:
4.1 Software Requirements:
Operating system : Windows XP/7.
Coding language : Java/J2EE.
IDE : Eclipse.
Database : Hadoop, MySQL.
4.2 Hardware Requirements:
System : Pentium IV 2.4 GHz.
Hard disk : 40 GB.
RAM : 4 GB.
Monitor : 15-inch VGA colour.
Mouse : Logitech.
