IJSRD - International Journal for Scientific Research & Development| Vol. 2, Issue 09, 2014 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 1
An Analysis of Effective Anti Spam Protocol Using Decision Tree
Classifiers
Sarasu.S¹ Sathees Kumar.B²
¹BCA, MSc., MPhil
²MCA, MPhil, SET, Assistant Professor
Abstract— As internet usage grows in day-to-day activities,
communication over it grows as well, with email at the
forefront of modern communication for businesses and
individuals alike. This has led to attempts to gain customer
attention by bombarding customers’ mail accounts with
unwanted and unsolicited advertisements, offers, phishing
attempts, viruses, worms, trojans, hate content, messages
that trick the customer into parting with sensitive
information such as passwords, and other unwanted content,
all of which is known as spam. Spam is the mass mailing or
flooding of mail accounts and servers with unwanted trash
data, sometimes causing damage. Spam filters have been in
use for as long as such mail flooding has occurred. Most
spam filters are manual: the user identifies an unwanted mail
in his or her account and blocks the sender, and henceforth
the system does not allow mail from that address into the
inbox. However, spammers are resilient and flood inboxes
with spam sent from different identities. This study focuses
on algorithms and data mining techniques used to unearth
spam mails. These techniques filter mails as they arrive at
the server, either according to rules defined in advance,
known as knowledge engineering techniques, or according to
models trained on labelled mails, known as supervised
learning methods. Here a classifier is trained on mails with
varying words so that it can identify the words that mark a
mail as spam. The Decision Tree model is used to analyze the
mails, identify spam mails and block them. The number of
mails sent, content, subject, type (reply or forward),
language etc. are identified, and a classifier such as Naive
Bayes is used to analyze them and filter the emails
accordingly.
Keywords: Spam, Phishing, Email, Clustering, Decision
Tree
I. INTRODUCTION
Spam is unsolicited commercial email sent in bulk; it is
considered an intrusive transmission. These bulk messages
often advertise commercial products, but sometimes contain
fraudulent offers and incentives. Due to the nature of
Internet mail, spammers can flood the net with thousands or
even millions of unwanted messages at negligible cost to
themselves; the actual cost is distributed among the
maintainers and users of the net. Their methods are
sometimes devious and unlawful and are designed to
transmit the maximum number of messages at the least
possible cost to them. Unfortunately, these emails impose a
significant burden upon recipients. Due to the dramatic
increase in the volume of spam over the past year, many
email users are searching for solutions to this growing
problem. The huge amount of unwanted email has led to
significant decreases in worker productivity, network
throughput, data storage space, and mail server efficiency.
In large organizations, a considerable portion of the time of
each worker is spent reviewing and deleting the spam itself,
leading to a decrease in productivity. The increased network
traffic has a deleterious effect on network performance, in
general, and on the organization’s mail server(s), in
particular. Also, data storage space is consumed by the need
to store the large volume of mail.
II. PURPOSE OF SPAM
The motivation behind spam is to have information
delivered to the recipient that contains a payload such as
advertising for a (likely worthless, illegal, or non-existent)
product, bait for a fraud scheme, promotion of a cause, or
computer malware designed to hijack the recipient’s
computer. Because it is so cheap to send information, only a
very small fraction of targeted recipients — perhaps one in
ten thousand or fewer — need to receive and respond to the
payload for spam to be profitable to its sender. A decade
ago, the mechanism, payload, and purpose of spam were
quite transparent. The majority of spam was sent by "cottage
industry" spammers who merely abused social norms to
promote their wares.
III. ANTI-SPAM TECHNIQUES / APPROACHES
Anti-spam methods can be grouped into a few, fairly well
defined, categories, though only some of these methods are
currently in use. There are two aspects to the response to
spam. The most commonly discussed problem relates to the
ability to distinguish between spam and legitimate email.
For a large percentage of email the decision is easy: more
than half of all email can readily be identified as either
definitely legitimate (white) or definitely spam (black). It
is the rest that is the most difficult to handle. We call
these mails "gray mail".
The second issue for any comprehensive spam
solution is the proper response to black and gray email. For
confirmed spam, the solution is often simply to delete
it. However, there may be instances where other or
additional actions are appropriate. Some possibilities are:
(1) Forward the spam to the abuse department at the
domain of the originator.
(2) Reply to the originator voicing displeasure at
receiving spam.
(3) Reply to the originator to advise them that the
email was not delivered.
(4) Report the spam to a spam gathering station.
This list is not exhaustive and multiple responses
may be appropriate in some situations. For gray mail, the
appropriate response is unclear. Thus, the goal of a
comprehensive anti-spam product is to be able to identify
every email as either white or black with a very high
probability of accuracy.
IV. IMPLEMENTATION
A. Spam Filter Inputs and Outputs
A spam filter with perfect knowledge might base its decision
on the content of the message, characteristics of the sender
and the target, knowledge as to whether the target or others
consider similar messages to be spam, or the sender to be a
spammer, and so on. But perfect knowledge does not exist
and it is therefore necessary to constrain the filter to use well
defined information sources such as the content of the
message itself, hand-crafted rules either embedded in the
filter or acquired from an external source, or statistical
information derived from feedback to the filter or from
external repositories compiled by third parties.
The desired result from a spam filter is some
indication of whether or not a message is spam. The
simplest result is a binary categorization, spam or non-spam,
which may be acted upon in various ways by the user or by
the system. We call a filter that returns such a binary
categorization a hard classifier. More commonly, the filter is
required to give some indication of how likely it considers
the message to be spam, either on a continuous scale (e.g., 1
= sure spam; 0 = sure non-spam) or on an ordinal categorical
scale (e.g., sure spam, likely spam, unsure, likely non-spam,
sure non-spam). We call such a filter a soft classifier. Many
filters are internally soft classifiers, but compare the soft
classification result to a sensitivity threshold t yielding a
hard classifier. Users may be able to adjust this sensitivity
threshold according to the relative importance they ascribe
to correctly classifying spam vs. correctly classifying non-
spam.
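The relationship between a soft classifier and a hard classifier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names SoftResult, harden and ordinal, the default threshold t = 0.5 and the ordinal cut points are hypothetical and are not taken from the proposed mechanism.

```python
from dataclasses import dataclass


@dataclass
class SoftResult:
    """Soft classification: a score on the continuous scale described
    above (1 = sure spam, 0 = sure non-spam)."""
    score: float


def harden(result: SoftResult, t: float = 0.5) -> str:
    """Compare the soft score with the sensitivity threshold t to get a
    hard spam/non-spam decision. The default t is an assumption."""
    if not 0.0 <= result.score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "spam" if result.score >= t else "non-spam"


def ordinal(result: SoftResult) -> str:
    """Map the score onto the five-level ordinal scale.
    The cut points below are hypothetical."""
    cuts = [
        (0.9, "sure spam"),
        (0.6, "likely spam"),
        (0.4, "unsure"),
        (0.1, "likely non-spam"),
    ]
    for lower, label in cuts:
        if result.score >= lower:
            return label
    return "sure non-spam"
```

Raising t makes the filter more conservative about calling a message spam: a message scored 0.7 is spam at t = 0.5 but non-spam at t = 0.8, which is the kind of adjustment a user who fears losing legitimate mail might make.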
B. Filter
A filter may also be called upon to justify its decision; for
example, by highlighting the features upon which it bases its
classification. The filter may also classify messages into
different genres of spam and good mail. For example, spam
might be advertising, phishing or a Nigerian scam, while
good email might be a personal correspondence, a news
digest or advertising. These genres may be important in
justifying the spam/non-spam classification of a message, as
well as in assessing its impact.
Consider the typical use of an email spam filter from the
perspective of a single user. Incoming messages are
processed by the filter one at a time and classified as ham (a
widely used colloquial term for non-spam) or spam. Ham is
directed to the user’s inbox, which is read regularly. Spam is
directed to a quarantine file, which is read irregularly (or
never) but may be searched in an attempt to find ham
messages which the filter has misclassified. If the user
discovers filter errors (either spam in the inbox or ham in the
quarantine), he or she may report these errors to the filter,
particularly if doing so is easy and he or she feels that doing
so will improve filter performance.
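The single-user workflow described above (classify, route to inbox or quarantine, accept error reports) can be sketched as follows. The class OnlineFilter and its word-count scoring are assumptions made purely for illustration; they stand in for whatever classifier is actually deployed and are not the proposed hybrid mechanism.

```python
from collections import Counter


class OnlineFilter:
    """Toy on-line filter: scores a message by how often its words were
    previously reported as spam versus ham (a hypothetical scoring rule)."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.inbox = []
        self.quarantine = []

    def classify(self, message: str) -> str:
        words = message.lower().split()
        spam_hits = sum(self.spam_words[w] for w in words)
        ham_hits = sum(self.ham_words[w] for w in words)
        # Ties go to ham so that mail with no evidence is delivered.
        return "spam" if spam_hits > ham_hits else "ham"

    def deliver(self, message: str) -> str:
        """Classify one incoming message and route it."""
        label = self.classify(message)
        (self.quarantine if label == "spam" else self.inbox).append(message)
        return label

    def report_error(self, message: str, true_label: str) -> None:
        """User feedback: move a misfiled message and learn from it."""
        if true_label == "spam":
            wrong, right = self.inbox, self.quarantine
        else:
            wrong, right = self.quarantine, self.inbox
        if message in wrong:
            wrong.remove(message)
            right.append(message)
        target = self.spam_words if true_label == "spam" else self.ham_words
        target.update(message.lower().split())
```

A new filter delivers everything to the inbox; once the user reports a spam message found there, later messages sharing its words are sent to quarantine. The sketch makes the dependence on user feedback explicit: without reports, the filter never changes its behaviour.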
V. EVALUATION
The filter is on-line in that it processes one message at a
time, classifying each in turn before examining the next.
Furthermore, it is passive in that it makes use only of
information at hand when the message is examined. Variants
of this deployment are possible, only some of which have
been systematically investigated:
Batch filtering, in which several messages are
presented to the filter at once for classification. This method
of deployment is atypical in that delivery of messages must
necessarily be delayed to form a batch. Nevertheless, it is
conceivable that filters could make use of information
contained in the batch to classify its members more
accurately than on-line.
Batch training, in which messages may be
classified online, but the classifier’s memory is updated only
periodically. Batch training is common for classifiers that
involve much computation, or human intervention, in
harnessing new information about spam.
Just-in-time filtering, in which the classification of
messages is driven by client demand. In this deployment a
filter would defer classification until the client opened his or
her mail client, sorting the messages in real-time into inbox
and quarantine.
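Of the variants above, batch training is the easiest to illustrate. In the sketch below, messages are still classified one at a time, but the classifier's memory changes only when a batch of labelled feedback has accumulated. The class name BatchTrainedFilter, the batch size and the word-set memory are hypothetical choices made for the example.

```python
class BatchTrainedFilter:
    """Classifies on-line, but updates its memory only once per batch."""

    def __init__(self, batch_size: int = 3):
        self.batch_size = batch_size
        self.spam_words = set()  # the classifier's "memory"
        self.pending = []        # labelled messages not yet learned

    def classify(self, message: str) -> str:
        words = set(message.lower().split())
        return "spam" if words & self.spam_words else "ham"

    def add_feedback(self, message: str, label: str) -> None:
        """Queue a labelled message; retrain when the batch is full."""
        self.pending.append((message, label))
        if len(self.pending) >= self.batch_size:
            self.retrain()

    def retrain(self) -> None:
        """Fold the pending batch into memory in one step."""
        spam, ham = set(), set()
        for message, label in self.pending:
            (spam if label == "spam" else ham).update(message.lower().split())
        # Words that also occur in ham are not treated as spam evidence.
        self.spam_words |= spam - ham
        self.spam_words -= ham
        self.pending.clear()
```

Until the batch fills, a reported spam message has no effect on classification, which is the delay the text describes; in exchange, the comparatively costly retraining step runs only periodically.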
The vast breadth of the spam ecosystem and
possible abatement techniques render impossible the direct
measurement of these quantities; there are simply too many
parameters for any single evaluation or experiment to
measure all their effects at once. Instead, we make various
simplifying assumptions which hold many of the parameters
constant, and conduct an experiment to measure a quantity
of interest subject to those assumptions. Such experiments
yield valuable insight, particularly if the assumptions are
reasonable and the quantities measured truly illuminate the
question under investigation. The validity of an experiment
may be considered to have two aspects: internal validity and
external validity or generalizability. Internal validity
concerns the veracity of the experimental results under the
test conditions and stated assumptions; external validity
concerns the generalizability of these results to other
situations where the stated assumptions, or hidden
assumptions, may or may not hold. Establishing internal
validity is largely a matter of good experimental design;
establishing external validity involves analysis and repeated
experiments using different assumptions and designs.
VI. RESULTS AND DISCUSSIONS
Detection time is an important factor, and the proposed
mechanism detects spam in less time than the existing one,
as shown in the table given below. The proposed time is
less than half of the existing time, a reduction of more
than 50% (roughly 78%), which is a heartening result.
Factor    Existing    Proposed
Time      0.56        0.125
A. Table for Time Factor to Detect Spam
Chart Showing Time Factor to Detect Spam
Training time for the spam detector was also taken
into account and was found to be lower for the
proposed mechanism. Since detection starts
immediately after training finishes, detection can
begin sooner. The table and graph below illustrate this.
Factor                   Existing    Proposed
Training Time (msecs)    42          20
B. Table for Training Time Factor
Chart Showing Training Time Factor
The next factor considered is the accuracy of spam
detection. This was also monitored and mapped into an
appropriate table and graph, which again showed that the
proposed hybrid mechanism was resilient and accurate in
detecting spam mails.
Factor      Existing    Proposed
Accuracy    65%         94%
Table showing Accuracy of the Spam Detector.
Chart showing Accuracy of the Spam Detector.
Thus the evaluation of the proposed model shows
improved accuracy, training time and detection time,
with fewer filtration techniques and resilience to
spam attacks.
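The size of each improvement follows directly from the three tables. The short calculation below uses only the reported figures; the helper name relative_reduction is introduced here for illustration. The unit of the detection time is not stated, so only the ratio is computed.

```python
def relative_reduction(existing: float, proposed: float) -> float:
    """Fraction by which the proposed value is lower than the existing one."""
    return (existing - proposed) / existing


# Figures taken from the tables above.
detection = relative_reduction(0.56, 0.125)   # detection time
training = relative_reduction(42, 20)         # training time in msecs
accuracy_gain = 94 - 65                       # percentage points

print(f"Detection time reduced by {detection:.1%}")
print(f"Training time reduced by {training:.1%}")
print(f"Accuracy improved by {accuracy_gain} percentage points")
```

Detection time therefore falls by about 78% and training time by about 52%, while accuracy rises by 29 percentage points.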
VII. CONCLUSION
In the modern communication world email spam is fast
becoming a serious threat for regular email users,
businesses and corporate firms. This study has shown how
such spam can be filtered by applying the proposed hybrid
algorithm to a set of emails consisting of spam and ham
mails collected from our sources. Analysis of the results
shows that the proposed method is effective and suitable
for all mails. It also did not show false positives or
negatives.
This model can be extended in the form of web
services as part of future enhancements. This will enable
different mail service providers to incorporate the spam
filtering features into their service without writing the
code themselves, by directly invoking the services.
[Charts: detection time (Existing 0.56, Proposed 0.125) and training time in msecs (Existing 42, Proposed 20).]
REFERENCES
[1] P. Sudhakar, G. Poonkuzhali, K. Thaigarajan,
K. Sarukesi, International Journal of Computers, Issue
3, Volume 5, 2011, pp. 332-345.
[2] T. Almeida, A. Yamakami, J. Almeida (2009)
Evaluation of approaches for dimensionality
reduction applied with Naive Bayes anti-spam filters.
In: Proceedings of the 8th IEEE International
Conference on Machine Learning and Applications,
Miami, FL, USA, pp. 517–
[3] G. Cormack (2008) Email spam filtering: a systematic
review. Found Trends Inf Retr 1(4):335–455.
[4] K. Tretyakov, Machine Learning Techniques in Spam
Filtering, kt@ut.ee, Institute of Computer Science,
University of Tartu, Data Mining Problem-oriented
Seminar, MTAT.03.177, May 2004, pp. 60-79.
[5] V. Christina et al., A Study on Email Spam Filtering
Techniques, International Journal of Computer
Applications (0975 – 8887), Volume 12, No. 1,
December 2010, pp. 07-09.

More Related Content

PPT
PPT
Evaluation of Spam Detection and Prevention Frameworks for Email and Image Sp...
PDF
IRJET- Email Spam Detection & Automation
PDF
Spam Filtering
PDF
A Survey: SMS Spam Filtering
PPTX
Email spam detection
PDF
Web Spam Detection Using Machine Learning
PPT
Spam and Anti-spam - Sudipta Bhattacharya
Evaluation of Spam Detection and Prevention Frameworks for Email and Image Sp...
IRJET- Email Spam Detection & Automation
Spam Filtering
A Survey: SMS Spam Filtering
Email spam detection
Web Spam Detection Using Machine Learning
Spam and Anti-spam - Sudipta Bhattacharya

What's hot (20)

PPTX
PDF
A multi layer architecture for spam-detection system
PDF
B0940509
PDF
DEVELOPMENT OF AN EFFECTIVE BAYESIAN APPROACH FOR SPAM FILTERING
PDF
How an Enterprise SPAM Filter Works
PPT
Spam and Anti Spam Techniques
PDF
Overview of Existing Methods of Spam Mining and Potential Usefulness of Sende...
PPTX
PPTX
Spam filtering with Naive Bayes Algorithm
PPT
Spamming and Spam Filtering
PDF
IRJET- Image Spam Detection: Problem and Existing Solution
PDF
A Model for Fuzzy Logic Based Machine Learning Approach for Spam Filtering
PDF
ACO-email spam filtering
PPT
E mail image spam filtering techniques
PDF
Spam Email identification
PPT
What is SPAM?
PPT
E Mail & Spam Presentation
PDF
How to Block NDR Spam
PPTX
The Path to the Inbox Part 2
A multi layer architecture for spam-detection system
B0940509
DEVELOPMENT OF AN EFFECTIVE BAYESIAN APPROACH FOR SPAM FILTERING
How an Enterprise SPAM Filter Works
Spam and Anti Spam Techniques
Overview of Existing Methods of Spam Mining and Potential Usefulness of Sende...
Spam filtering with Naive Bayes Algorithm
Spamming and Spam Filtering
IRJET- Image Spam Detection: Problem and Existing Solution
A Model for Fuzzy Logic Based Machine Learning Approach for Spam Filtering
ACO-email spam filtering
E mail image spam filtering techniques
Spam Email identification
What is SPAM?
E Mail & Spam Presentation
How to Block NDR Spam
The Path to the Inbox Part 2
Ad

Similar to AN ANALYSIS OF EFFECTIVE ANTI SPAM PROTOCOL USING DECISION TREE CLASSIFIERS (20)

PDF
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
PDF
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
PDF
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
PDF
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
PDF
A review of spam filtering and measures of antispam
PDF
Identification of Spam Emails from Valid Emails by Using Voting
PDF
Detecting spam mail using machine learning algorithm
PDF
Study of Various Techniques to Filter Spam Emails
PDF
Overview of Anti-spam filtering Techniques
PDF
Detection of Spam in Emails using Machine Learning
PDF
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
PDF
Analysis of an image spam in email based on content analysis
PPTX
Presentation2.pptx
PDF
SPAM FILTERS
PPTX
miniproject.ppt.pptx
PDF
A multi layer architecture for spam-detection system
PDF
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches
PDF
Cross breed Spam Categorization Method using Machine Learning Techniques
DOC
Survey on spam filtering
PDF
Prepare black list using bayesian approach to improve performance of spam fil...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
A review of spam filtering and measures of antispam
Identification of Spam Emails from Valid Emails by Using Voting
Detecting spam mail using machine learning algorithm
Study of Various Techniques to Filter Spam Emails
Overview of Anti-spam filtering Techniques
Detection of Spam in Emails using Machine Learning
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
Analysis of an image spam in email based on content analysis
Presentation2.pptx
SPAM FILTERS
miniproject.ppt.pptx
A multi layer architecture for spam-detection system
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches
Cross breed Spam Categorization Method using Machine Learning Techniques
Survey on spam filtering
Prepare black list using bayesian approach to improve performance of spam fil...
Ad

More from ijsrd.com (20)

PDF
IoT Enabled Smart Grid
PDF
A Survey Report on : Security & Challenges in Internet of Things
PDF
IoT for Everyday Life
PDF
Study on Issues in Managing and Protecting Data of IOT
PDF
Interactive Technologies for Improving Quality of Education to Build Collabor...
PDF
Internet of Things - Paradigm Shift of Future Internet Application for Specia...
PDF
A Study of the Adverse Effects of IoT on Student's Life
PDF
Pedagogy for Effective use of ICT in English Language Learning
PDF
Virtual Eye - Smart Traffic Navigation System
PDF
Ontological Model of Educational Programs in Computer Science (Bachelor and M...
PDF
Understanding IoT Management for Smart Refrigerator
PDF
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
PDF
A Review: Microwave Energy for materials processing
PDF
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
PDF
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
PDF
Making model of dual axis solar tracking with Maximum Power Point Tracking
PDF
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
PDF
Study and Review on Various Current Comparators
PDF
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
PDF
Defending Reactive Jammers in WSN using a Trigger Identification Service.
IoT Enabled Smart Grid
A Survey Report on : Security & Challenges in Internet of Things
IoT for Everyday Life
Study on Issues in Managing and Protecting Data of IOT
Interactive Technologies for Improving Quality of Education to Build Collabor...
Internet of Things - Paradigm Shift of Future Internet Application for Specia...
A Study of the Adverse Effects of IoT on Student's Life
Pedagogy for Effective use of ICT in English Language Learning
Virtual Eye - Smart Traffic Navigation System
Ontological Model of Educational Programs in Computer Science (Bachelor and M...
Understanding IoT Management for Smart Refrigerator
DESIGN AND ANALYSIS OF DOUBLE WISHBONE SUSPENSION SYSTEM USING FINITE ELEMENT...
A Review: Microwave Energy for materials processing
Web Usage Mining: A Survey on User's Navigation Pattern from Web Logs
APPLICATION OF STATCOM to IMPROVED DYNAMIC PERFORMANCE OF POWER SYSTEM
Making model of dual axis solar tracking with Maximum Power Point Tracking
A REVIEW PAPER ON PERFORMANCE AND EMISSION TEST OF 4 STROKE DIESEL ENGINE USI...
Study and Review on Various Current Comparators
Reducing Silicon Real Estate and Switching Activity Using Low Power Test Patt...
Defending Reactive Jammers in WSN using a Trigger Identification Service.

Recently uploaded (20)

PDF
Complications of Minimal Access Surgery at WLH
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Pre independence Education in Inndia.pdf
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Classroom Observation Tools for Teachers
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Sports Quiz easy sports quiz sports quiz
PDF
Insiders guide to clinical Medicine.pdf
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
Pharma ospi slides which help in ospi learning
PPTX
Institutional Correction lecture only . . .
PPTX
GDM (1) (1).pptx small presentation for students
PDF
Supply Chain Operations Speaking Notes -ICLT Program
Complications of Minimal Access Surgery at WLH
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
O5-L3 Freight Transport Ops (International) V1.pdf
Pre independence Education in Inndia.pdf
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Abdominal Access Techniques with Prof. Dr. R K Mishra
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
human mycosis Human fungal infections are called human mycosis..pptx
Classroom Observation Tools for Teachers
Final Presentation General Medicine 03-08-2024.pptx
Sports Quiz easy sports quiz sports quiz
Insiders guide to clinical Medicine.pdf
VCE English Exam - Section C Student Revision Booklet
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
Pharma ospi slides which help in ospi learning
Institutional Correction lecture only . . .
GDM (1) (1).pptx small presentation for students
Supply Chain Operations Speaking Notes -ICLT Program

AN ANALYSIS OF EFFECTIVE ANTI SPAM PROTOCOL USING DECISION TREE CLASSIFIERS

  • 1. IJSRD - International Journal for Scientific Research & Development| Vol. 2, Issue 09, 2014 | ISSN (online): 2321-0613 All rights reserved by www.ijsrd.com 1 An Analysis of Effective Anti Spam Protocol Using Decision Tree Classifiers Sarasu.S 1 Sathees Kumar.B2 1 BCA, MSc., MPhil 2 MCA, MPhil, SET, Assistant Professor Abstract— As the internet usage increases in day to day activities, there is an inherent corresponding increase in usage of communication through it with email being the mainstay or rather in the forefront of modern day communication methodologies for businesses and general persons as well. This has led to get customer attention in the form of unwanted and unsolicited bombarding of the customers mail accounts with advertisements, offers, phishing activities, viruses, worms, trojans, generating hate crimes, making the customer to part with sensitive information like passwords, and other media as well which is known as spam. Spam is mass mailing or flooding of mail account servers with unwanted trash data causing damage some times. Spam filters have been in use from the time such mail flooding happens. Most of the spam filters are manual meaning which the user after identifying a mail in his account blocks the sender and henceforth the system will not allow mails to the inbox from such addresses. However the spammers are resilient and send spam mails from different identities and flood the inboxes. This study focuses on algorithms and data mining techniques used to unearth spam mails. They filter the inbox mails as they arrive at the server depending on certain rules which are already defined known as supervised learning methods. Such technologies are known as knowledge engineering techniques. Here a decision classifier is used to train such mails with varying words to filter and identify the words in the mail as spam. The Decision Tree model is used to analyze the mails and identify spam mails and block them. 
The number of mails sent, content, subject, type whether reply or forward, language etc. are identified using the decision classifier like Naves Bayes and analyzed accordingly to filter the emails. Keywords: Spam, Phishing, Email, Clustering, Decision Tree I. INTRODUCTION Spam is unsolicited commercial email sent in bulk; it is considered an intrusive transmission. These bulk messages often advertise commercial products, but sometimes contain fraudulent offers and incentives. Due to the nature of Internet mail, spammers can flood the net with thousands or even millions of unwanted messages at negligible cost to themselves; the actual cost is distributed among the maintainers and users of the net. Their methods are sometimes devious and unlawful and are designed to transmit the maximum number of messages at the least possible cost to them. Unfortunately, these emails impose a significant burden upon recipients. Due to the dramatic increase in the volume of spam over the past year, many email users are searching for solutions to this growing problem. The huge amount of unwanted email has led to significant decreases in worker productivity, network throughput, data storage space, and mail server efficiency. In large organizations, a considerable portion of the time of each worker is spent reviewing and deleting the spam itself, leading to a decrease in productivity. The increased network traffic has a deleterious effect on network performance, in general, and on the organization’s mail server(s), in particular. Also, data storage space is consumed by the need to store the large volume of mail. II. PURPOSE OF SPAM The motivation behind spam is to have information delivered to the recipient that contains a payload such as advertising for a (likely worthless, illegal, or non-existent) product, bait for a fraud scheme, promotion of a cause, or computer malware designed to hijack the recipient’s computer. 
Because it is so cheap to send information, only a very small fraction of targeted recipients — perhaps one in ten thousand or fewer — need to receive and respond to the payload for spam to be profitable to its sender ]. A decade ago, the mechanism, payload, and purpose of spam were quite transparent. The majority of spam was sent by ―cottage industry‖ spammers who merely abused social norms to promote their wares. III. ANTI-SPAM TECHNIQUES / APPROACHES Anti-spam methods can be grouped into a few, fairly well defined, categories, though only some of these methods are currently in use. There are two aspects to the response to spam. The most commonly discussed problem relates to the ability to distinguish between spam and legitimate email. For a large percentage of email, the decision is easy, which can easily be identified by more than half of all email as either definitely legitimate (white) or definitely spam (black). It is the rest that is the most difficult to handle. We call these mails as ―gray mail‖. The second issue for any comprehensive spam solution is the proper response to black and gray email. For confirmed spam, the solution is often easy to simply delete them. However, there may be instances where other or additional actions are appropriate. Some possibilities are: (1) Forward the spam to the abuse department at the domain of the originator. (2) Reply to the originator voicing displeasure at receiving spam. (3) Reply to the originator to advise them that the email was not delivered. (4) Report the spam to a spam gathering station. This list is not exhaustive and multiple responses may be appropriate in some situations. For gray mail, the appropriate response is unclear. Thus, the goal of a comprehensive anti-spam product is to be able to identify every email as either white or black with a very high probability of accuracy.
  • 2. An Analysis of Effective Anti Spam Protocol Using Decision Tree Classifiers (IJSRD/Vol. 2/Issue 09/2014/001) All rights reserved by www.ijsrd.com 2 IV. IMPLEMENTATION A. Spam Filter Inputs and Outputs A spam filter with perfect knowledge might base its decision on the content of the message, characteristics of the sender and the target, knowledge as to whether the target or others consider similar messages to be spam, or the sender to be a spammer, and so on. But perfect knowledge does not exist and it is therefore necessary to constrain the filter to use well defined information sources such as the content of the message itself, hand-crafted rules either embedded in the filter or acquired from an external source, or statistical information derived from feedback to the filter or from external repositories compiled by third parties. The desired result from a spam filter is some indication of whether or not a message is spam. The simplest result is a binary categorization spam or non-spam which may be acted upon in various ways by the user or by the system. We call a filter that returns such a binary categorization a hard classifier. More commonly, the filter is required to give some indication of how likely it considers the message to be spam, either on a continuous scale (e.g., 1 = sure spam; 0= sure non-spam) or on an ordinal categorical scale (e.g., sure spam, likely spam, unsure, likely non-spam, sure non-spam). We call such a filter a soft classifier. Many filters are internally soft classifiers, but compare the soft classification result to a sensitivity threshold t yielding a hard classifier. Users may be able to adjust this sensitivity threshold according to the relative importance they ascribe to correctly classifying spam vs. correctly classifying non- spam. B. Filter A filter may also be called upon to justify its decision; for example, by highlighting the features upon which it bases is classification. 
The filter may also classify messages into different genres of spam and good mail. For example, spam might be advertising, phishing or a Nigerian scam, while good email might be a personal correspondence, a news digest or advertising. These genres may be important in justifying the spam/non-spam classification of a message, as well in assessing its impact. The typical use of an email spam filter from the perspective of a single user. Incoming messages are processed by the filter one at a time and classified as ham (a widely used colloquial term for non-spam) or spam. Ham is directed to the user’s inbox which is read regularly. Spam is directed to a quarantine file which is irregularly (or) never) read but may be searched in an attempt to find ham messages which the filter has misclassified. If the user discovers filter errors either spam in the inbox or ham in the quarantine — he or she may report these errors to the filter, particularly if doing so is easy and he or she feels that doing so will improve filter performance. V. EVALUATION The filter is on-line in that it processes one message at a time, classifying each in turn before examining the next. Furthermore, it is passive in that it makes use only of information at hand when the message is examined. Variants of this deployment are possible, only some of which have been systematically investigated: Batch filtering, in which several messages are presented to the filter at once for classification. This method of deployment is atypical in that delivery of messages must necessarily be delayed to form a batch. Nevertheless, it is conceivable that filters could make use of information contained in the batch to classify its members more accurately than on-line. Batch training, in which messages may be classified online, but the classifier’s memory is updated only periodically. Batch training is common for classifiers that
  • 3. involve much computation, or human intervention, in harnessing new information about spam.

Just-in-time filtering, in which the classification of messages is driven by client demand. In this deployment a filter would defer classification until the user opened his or her mail client, sorting the messages in real time into inbox and quarantine.

The vast breadth of the spam ecosystem and of possible abatement techniques makes it impossible to measure a filter's overall effectiveness directly; there are simply too many parameters for any single evaluation or experiment to measure all their effects at once. Instead, we make various simplifying assumptions which hold many of the parameters constant, and conduct an experiment to measure a quantity of interest subject to those assumptions. Such experiments yield valuable insight, particularly if the assumptions are reasonable and the quantities measured truly illuminate the question under investigation.

The validity of an experiment may be considered to have two aspects: internal validity and external validity, or generalizability. Internal validity concerns the veracity of the experimental results under the test conditions and stated assumptions; external validity concerns the generalizability of these results to other situations where the stated assumptions, or hidden assumptions, may or may not hold. Establishing internal validity is largely a matter of good experimental design; establishing external validity involves analysis and repeated experiments using different assumptions and designs.

VI. RESULTS AND DISCUSSIONS

Detection time is an important factor, and the proposed mechanism detects spam in less time than the existing approach, as shown in the table given below. The detection time is less than half that of the existing approach (0.125 versus 0.56, a reduction of roughly 78%), which is an encouraging result.
Factor          Existing    Proposed
Time            0.56        0.125

A. Table for Time Factor to Detect Spam

Chart Showing Time Factor to Detect Spam

Training time for the spam detector was also measured and was found to be lower for the proposed mechanism. Detection can start as soon as training finishes, so a shorter training time means detection begins sooner. The table and graph below illustrate this.

Factor                    Existing    Proposed
Training Time (msecs)     42          20

B. Table for Training Time Factor

Chart Showing Training Time Factor

The next factor is the accuracy of spam detection, which was also recorded in a table and graph. These again showed that the proposed hybrid mechanism was resilient and more accurate than the existing approach in detecting spam mails.

Factor          Existing    Proposed
Accuracy        65%         94%

Table showing Accuracy of the Spam Detector.

Chart showing Accuracy of the Spam Detector.

Thus the evaluation shows that the proposed model improves accuracy, training time and detection time while using fewer filtration techniques and remaining resilient to spam attacks.

VII. CONCLUSION

In the modern communication world, email spam is fast becoming a serious threat for regular users of email, businesses and corporate firms. This study has shown how the proposed hybrid algorithm filters spam from a set of emails, consisting of spam and ham mails, collected from our sources. Result analysis on the data shows that the proposed method is effective and also suitable for all mails. It also does not show false positives or negatives. As a future enhancement, this model can be extended in the form of web services. This will enable different mail service providers to incorporate the spam-filtering features into their service directly, without writing the code themselves.