IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 02 | Feb-2015, Available @ http://www.ijret.org 451
FALSE POSITIVE REDUCTION BY COMBINING SVM AND KNN ALGO
Sushil Kumar Mishra¹, Pankaj Bhatt²
¹PG Student, Computer Science Engineering, Graphic Era Hill University, Uttarakhand, India
²PG Student, Computer Science Engineering, Graphic Era Hill University, Uttarakhand, India
Abstract
With the growth of information technology, many intrusion detection problems have emerged in cyber security. An intrusion
detection system provides the basic infrastructure to detect a number of attacks. This research work focuses on the intrusion
detection problem in network security. The main goal is to classify network behaviour as normal or abnormal. In this work, two
different machine learning algorithms are combined to reduce their individual weaknesses and retain the strengths of both.
The experimental results are better than those of the individual algorithms in terms of performance, accuracy and false
positive rate. The combined algorithm is applied to the KDDCUP99 dataset to improve performance and accuracy and to
reduce the false positive rate.
Keywords: Intrusion detection system, KDDCUP99 dataset, False positive rate.
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
In this century, information security is one of the most
pressing problems. Many intrusion detection methods have
been introduced to handle it, but none is perfect. An
intrusion detection system (IDS) protects a computer
network, in which many computers are interconnected, from
malicious files such as viruses, spyware and Trojan horses.
An IDS monitors the behaviour of all files entering the
network and detects any file that is suspicious or
malicious. Many clustering based models have been built for
intrusion detection to separate normal files from abnormal
ones. Neural networks can also be used for intrusion
detection: the network is first trained on a dataset so
that it can recognize normal as well as abnormal activity.
By protecting network traffic from malicious files, an IDS
helps maintain the confidentiality and integrity of the
network. Unauthorized access to personal data is prevented,
so the secrecy of network traffic and information is
preserved. An IDS can only take preventive measures,
however, and no IDS protects a computer network perfectly.
Extensive research is under way to develop a system that
can fully protect network traffic or a computer network.
In this research work, a support vector machine (SVM)
builds a model from normal as well as abnormal data. The
model monitors normal as well as malicious behaviour to
protect a computer network from malicious attacks such as
viruses, worms, Trojan horses and rootkits.
Intrusion detection systems are divided into two types: anomaly based and signature based.
Fig. 1 Types of IDS
1.1 Anomaly Based Detection
An anomaly based intrusion detection system relies on a set
of heuristic rules that monitor normal as well as abnormal
behaviour in a computer network. If a file is
self-replicating or tries to damage another file, anomaly
based detection flags that behaviour. The main disadvantage
of anomaly based detection is its higher false positive
rate.
1.2 Signature Based Detection
A signature based intrusion detection system can detect
only known computer viruses. When a virus is discovered,
its signature is created and stored in a database. When a
file enters the network, its signature is matched against
all stored signatures. If it matches a virus signature, the
file is declared a computer virus; otherwise it is treated
as a normal file. The main disadvantage of signature based
detection is that it cannot detect a new computer virus.
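The matching step described above can be sketched in a few lines of Python. This is only an illustration: the use of a SHA-256 digest as the "signature", the sample payload and the names `KNOWN_SIGNATURES`, `signature_of` and `is_known_malware` are assumptions made for the example, not details given in this paper.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of already-discovered
# malicious payloads (an assumption; real products use richer signatures).
KNOWN_SIGNATURES = {
    hashlib.sha256(b"malicious-payload-example").hexdigest(),
}

def signature_of(data: bytes) -> str:
    """Return the SHA-256 hex digest used here as the file's signature."""
    return hashlib.sha256(data).hexdigest()

def is_known_malware(data: bytes, signatures=KNOWN_SIGNATURES) -> bool:
    """Flag a file only if its signature is already in the database."""
    return signature_of(data) in signatures

print(is_known_malware(b"malicious-payload-example"))   # known sample
print(is_known_malware(b"malicious-payload-example!"))  # one byte appended
```

The second call returns False because the changed payload has a different digest, which mirrors the weakness noted above: anything not already in the database passes as normal.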
2. EXPERIMENTAL PARAMETERS
Several parameters, such as performance, accuracy and
false positive rate, can be calculated for an intrusion
detection system.
Performance : Performance deals with achieving a target in
a more efficient manner.
Performance = (True Positive)/(True Positive + True Negative)
Accuracy : Accuracy measures how close the result is to
the actual value.
Accuracy = (True Positive + True Negative)/(True Positive + True Negative + False Positive + False Negative)
False positive rate : The fraction of normal files falsely
detected as abnormal.
False positive rate = (False Positive)/(False Positive + True Negative)
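As a worked illustration, the three measures can be computed from the four confusion counts as follows. The counts are hypothetical, and reading the Performance formula as TP / (TP + TN) is an assumption about the intended parenthesisation.

```python
def performance(tp: int, tn: int) -> float:
    """'Performance' as defined in this section, read as TP / (TP + TN)."""
    return tp / (tp + tn)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def false_positive_rate(fp: int, tn: int) -> float:
    """Share of normal samples wrongly flagged as abnormal."""
    return fp / (fp + tn)

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
tp, tn, fp, fn = 80, 90, 10, 20

print(round(performance(tp, tn), 4))          # 80 / 170
print(round(accuracy(tp, tn, fp, fn), 4))     # 170 / 200
print(round(false_positive_rate(fp, tn), 4))  # 10 / 100
```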
3. EVALUATION DATA SOURCES
The false positive rate was calculated on the standard
KDDCUP99 data set, which is based on data from the MIT
Lincoln Laboratory. The data set contains normal traffic as
well as different types of attacks, so its records can be
categorized as normal or abnormal.
MIT Lincoln Laboratory set up a computer network and
monitored its traffic, which contains normal as well as
abnormal data, for about seven weeks of training capture.
The KDDCUP99 data set contains normal records together with
attacks such as denial of service, buffer overflow,
guess_passwd(53) and probe.
Denial of service : In a denial of service (DOS) intrusion,
legitimate information cannot be made available to its
legitimate receiver. A DOS intrusion also slows down the
computer system.
User to Root (U2R) : In this type of attack, an attacker
who already has access to a normal user account exploits a
vulnerability to gain root privileges and can then access
personal or secret information on the computer system.
Buffer overflow is an example.
Remote to User (R2U) : In this attack, the attacker
transmits packets over the network that are not legitimate
for that network in order to gain access to a machine on
which the attacker has no account, for example by guessing
a password. The additional traffic can adversely affect the
performance of the network and slow down the computer
system.
Probe : In this attack, the attacker monitors the
information being sent over the network, typically by
scanning hosts and ports, and can access it.
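For experiments that only need the normal/abnormal distinction, the individual record labels can be folded into the categories above. The sketch below is a minimal illustration: the dictionary lists only a handful of labels from the data set, the names `CATEGORY_OF`, `categorize` and `is_abnormal` are hypothetical, and treating unlisted labels as abnormal is an assumption.

```python
# Partial mapping from raw record labels to the categories described above.
# Only a few labels are listed; a full experiment would need all of them.
CATEGORY_OF = {
    "normal": "normal",
    "smurf": "dos",
    "neptune": "dos",
    "buffer_overflow": "u2r",
    "rootkit": "u2r",
    "guess_passwd": "r2u",
    "ipsweep": "probe",
    "portsweep": "probe",
}

def categorize(label: str) -> str:
    """Map a raw label to its category; labels in the raw file carry a
    trailing period, which is stripped here."""
    return CATEGORY_OF.get(label.strip().rstrip("."), "unknown")

def is_abnormal(label: str) -> bool:
    """Everything that is not 'normal' counts as abnormal (assumption:
    this includes labels missing from the partial mapping)."""
    return categorize(label) != "normal"

for raw in ["normal.", "smurf.", "guess_passwd.", "buffer_overflow."]:
    print(raw, categorize(raw), is_abnormal(raw))
```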
4. COMBINING SVM AND KNN ALGORITHM
A support vector machine (SVM) is a supervised learning
method for classification. It constructs a hyperplane that
separates normal data from abnormal data. An SVM works in
two phases:
1- Training phase
2- Testing phase
1- Training phase : The SVM learns a large set of patterns
from the dataset. The dataset contains various homogeneous
and heterogeneous patterns, which help the SVM distinguish
normal from abnormal data.
2- Testing phase : The model produced by the training phase
is applied to test data, from which accuracy, performance
and other measures can be evaluated.
The false positive rate can also be evaluated for the SVM,
but on its own the SVM generates a very high false positive
rate.
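The two phases can be illustrated with a small linear SVM written in plain Python. This is a sketch under stated assumptions rather than the implementation used in this work: a linear kernel, sub-gradient descent on the regularised hinge loss, features centred around zero, labels +1 for abnormal and -1 for normal, and toy data points are all choices made for the example.

```python
def train_linear_svm(X, y, lam=0.01, epochs=100, lr=0.1):
    """Training phase: fit weights w and bias b by sub-gradient descent
    on the L2-regularised hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the
                # hyperplane towards classifying it with margin >= 1.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is fine: only the regulariser shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Testing phase: the side of the hyperplane decides the class."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy data (hypothetical): -1 = normal, +1 = abnormal.
X = [[-2, -2], [-3, -2], [-2, -3], [2, 2], [3, 2], [2, 3]]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
print(predict(w, b, [-2.5, -2.5]), predict(w, b, [2.5, 2.5]))
```

The training points that end up with a margin close to 1 are the ones that keep triggering updates, which is the role the support vectors play in fixing the hyperplane.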
K nearest neighbor (KNN) is a machine learning algorithm
that classifies a data point by the majority class among
its k nearest neighbors in the training data.
The false positive rate can be evaluated for the KNN
algorithm as well, but on its own it also gives a high
false positive rate.
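A minimal sketch of the KNN vote is given below. Euclidean distance, k = 3 and the toy points are assumptions for the example; the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.
    `train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical values).
train = [
    ((0, 0), "normal"), ((0, 1), "normal"), ((1, 0), "normal"),
    ((5, 5), "abnormal"), ((5, 6), "abnormal"), ((6, 5), "abnormal"),
]

print(knn_predict(train, (0.5, 0.5)))
print(knn_predict(train, (5.5, 5.5)))
```

Because KNN keeps the training points themselves, adding a newly labelled point to `train` immediately affects later votes, with no retraining step.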
Fig. 2: Intrusion detection system using CSVMKNN
The support vector machine (SVM) uses support vectors to
create a hyperplane, which separates normal from abnormal
data. The KNN algorithm is used to find new data added to
the training data set.
Here, the support vector machine (SVM) and K nearest
neighbor (KNN) algorithms are combined to evaluate the
false positive rate; the result is called the COMBINED
SUPPORT VECTOR K NEAREST NEIGHBOR (CSVMKNN) algorithm. The
two algorithms work together: the SVM learns from the
training data set, and whenever new data is added to the
data set, the KNN algorithm updates it. Like SVM and KNN
individually, CSVMKNN can be used to evaluate the false
positive rate (false alarm rate), and it produces a better
result. CSVMKNN is applied to the KDDCUP99 data set, which
contains several types of attack such as buffer overflow
and denial of service (DOS). The false positive rate it
generates is better than that of the SVM or KNN algorithm
alone.
5. CSVMKNN ALGORITHM
Algorithm 1 : SVM with KNN clustering
Input: Training data set containing normal and abnormal data (class type).
Output: SVM classifier.
start
  select data from the different classes;
  separate normal and abnormal data with the SVM classifier;
  while iterations remain for adding data to the data set
    use the support vectors to create a hyperplane;
    use the hyperplane to separate normal and abnormal data;
    apply KNN clustering;
    KNN clustering classifies the data into normal and abnormal clusters;
    if new data is added to the data set
      update the data set;
    else
      continue unchanged;
end
After this algorithm, the SVM learning process is applied
to the data set. It randomly chooses data points from the
KDDCUP99 data set, and a hyperplane is used to separate
normal from abnormal data points. There must therefore be a
separating hyperplane between the training data points,
which provides a better selection method for the data
points. An SVM training phase is introduced in which the
hyperplane is placed between the data points. A KNN
clustering phase is introduced to separate normal data from
abnormal data. If new data is added to the training data
set, the KNN clustering phase is used to update the
training data set with it. This strategy is carried out in
the next algorithm.
Algorithm 2:
Input: Training data set (KDDCUP99).
Input: S1 - number of iterations.
Input: S2 - maximum detection rate.
Input: S3 - minimum detection rate.
Output: Support vector machine (SVM) and K nearest neighbor (KNN) classifier.
Start
  initialize the data;
  let S2 be the maximum detection rate, initially zero;
  let S3 be the minimum detection rate, initially zero;
  while S3 < S2
    initialize i = 0;
    for i = 1, ..., S1
      Training phase:
        support vector machine (SVM) training phase;
      Clustering phase:
        K nearest neighbor (KNN) clustering phase;
    end
    use the support vector machine (SVM) classifier;
    use the hyperplane to separate normal and abnormal data;
    if new data is added to the data set
      use the KNN algorithm to update S2;
      update the learning process;
    else
      continue unchanged;
    end
  end
The KNN clustering phase provides a better selection
strategy. The CSVMKNN algorithm decreases false positives
when newly added data is declared normal. Otherwise, the
true positive rate increases, which adversely affects
performance and accuracy. If the SVM training phase
declares new data abnormal but the KNN clustering phase
declares it normal, the data is declared a new kind of
intrusion. If new data added to the training data set is
declared normal in the SVM training phase and again in the
KNN clustering phase, it decreases the false positive rate
(false alarm rate) and increases the performance and
accuracy of the algorithm.
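The two cases spelled out above can be written as a small decision function. The function name `combine_verdicts` and the string labels are hypothetical, and the handling of the cases the text does not cover (anything where KNN says abnormal) is an assumption, marked as such in the code.

```python
def combine_verdicts(svm_label: str, knn_label: str) -> str:
    """Combine the SVM and KNN verdicts for one newly added record.
    Both inputs are either "normal" or "abnormal"."""
    if svm_label == "abnormal" and knn_label == "normal":
        # Disagreement in this direction is treated as a new kind of intrusion.
        return "new intrusion"
    if svm_label == "normal" and knn_label == "normal":
        # Agreement on normal: the record is accepted as normal.
        return "normal"
    # Assumption: the remaining cases are not described in the text,
    # so they are conservatively reported as abnormal here.
    return "abnormal"

for svm_label in ("normal", "abnormal"):
    for knn_label in ("normal", "abnormal"):
        print(svm_label, knn_label, "->", combine_verdicts(svm_label, knn_label))
```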
The combined support vector machine K nearest neighbor
(CSVMKNN) algorithm provides a better selection strategy
than the SVM or KNN algorithm alone. It takes the positive
features of both algorithms and avoids their weaknesses.
By using this selection strategy, CSVMKNN reduces the
false positive rate and improves performance. CSVMKNN
therefore generates a lower false positive rate, and higher
performance and accuracy, than the SVM and KNN algorithms.
6. RESULTS
The support vector machine (SVM) algorithm, the K nearest
neighbor (KNN) algorithm and the CSVMKNN algorithm are
applied to the training data set (KDDCUP99), and the false
positive rate of each is calculated. These rates are
compared to determine which algorithm generates the lowest
false positive rate.
Support vector machine (SVM) classifier: The SVM classifier
uses support vectors to create a hyperplane between the
data points. This hyperplane separates normal from abnormal
data, and on that basis performance, accuracy and false
positive rate can be evaluated.
Class              Normal  Denial of service  User To Root  Remote To User  Probe
Normal                900                  7             8               1      0
Denial of service       3                345             0               2     11
User To Root          400                  0             0               0     10
Remote To User        345                  0            41              34      0
Probe                 127                100             0              10      0
Fig-3 SVM classifier
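A five-class matrix like Fig-3 can be collapsed into the four binary counts used by the formulas in Section 2. The sketch below assumes that rows are actual classes and columns are predicted classes, which the figure does not state, and it treats every attack class as "positive". How the summary values in Fig-6 were derived is not stated, so the sketch does not attempt to reproduce them. The name `binary_counts` is hypothetical.

```python
# Values copied from Fig-3; class order: Normal, DoS, U2R, R2U, Probe.
SVM_MATRIX = [
    [900, 7, 8, 1, 0],
    [3, 345, 0, 2, 11],
    [400, 0, 0, 0, 10],
    [345, 0, 41, 34, 0],
    [127, 100, 0, 10, 0],
]

def binary_counts(matrix, normal_index=0):
    """Collapse a multi-class matrix into (tp, tn, fp, fn), assuming
    rows = actual class and columns = predicted class."""
    tn = matrix[normal_index][normal_index]   # normal kept as normal
    fp = sum(matrix[normal_index]) - tn       # normal flagged as an attack
    fn = sum(row[normal_index]                # attack passed as normal
             for i, row in enumerate(matrix) if i != normal_index)
    total = sum(sum(row) for row in matrix)
    tp = total - tn - fp - fn                 # attack flagged as some attack
    return tp, tn, fp, fn

print(binary_counts(SVM_MATRIX))
```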
The K nearest neighbor (KNN) classifier is used to discover
new data added to the training data set and to determine
whether that data is normal or abnormal. The KNN algorithm
is applied to the KDDCUP99 data set to evaluate
performance, accuracy and false positive rate.
Class              Normal  Denial of service  User To Root  Remote To User  Probe
Normal                928                  1             5               0      1
Denial of service       0                 45             0             200      1
User To Root            4                  3             6               5      0
Remote To User          0                  0           412             234     15
Probe                   1                  4             0               0     23
Fig-4 KNN classifier
The CSVMKNN classifier combines features of both the
support vector machine (SVM) and the K nearest neighbor
(KNN) algorithm. CSVMKNN is applied to the KDDCUP99 dataset
to measure its performance, accuracy and false positive
rate.
Class              Normal  Denial of service  User To Root  Remote To User  Probe
Normal                100                  0             8               9     70
Denial of service      30                 35             0               0     89
User To Root            0                  0             0              50      0
Remote To User          0                  0             0              24      0
Probe                   1                  4             0               0      0
Fig-5 CSVMKNN Classifier
Evaluation Measure      SVM     KNN  CSVMKNN
False Positive Rate   12.00   11.00     6.00
False Negative Rate   26.00    6.00     0.89
Performance            8.00    9.00    14.50
Accuracy               7.50    3.50    16.00
Fig-6 Comparison of false positive rate
The CSVMKNN algorithm generates a lower false positive rate
than the support vector machine (SVM) and K nearest
neighbor (KNN) algorithms.
7. CONCLUSION
In this research work, the support vector machine (SVM),
K nearest neighbor (KNN) and CSVMKNN algorithms were
applied separately to the KDDCUP99 data set. CSVMKNN
generated a lower false positive rate than the SVM and KNN
algorithms, and it achieved better performance, accuracy
and detection rate than the other machine learning
algorithms. There is still room for improvement in this
algorithm until a zero false positive rate is reached.
BIOGRAPHIES
Sushil Kumar Mishra is an M.Tech student doing research
work in computer security.
Pankaj Bhatt is pursuing an M.Tech and doing research work
in computer security.

More Related Content

PDF
IRJET- Improving Cyber Security using Artificial Intelligence
PDF
Evaluation of network intrusion detection using markov chain
PDF
An approach for ids by combining svm and ant colony algorithm
PDF
An approach for ids by combining svm and ant colony algorithm
PDF
IRJET- Proximity Detection Warning System using Ray Casting
PDF
An efficient intrusion detection using relevance vector machine
PDF
IRJET- An Intrusion Detection Framework based on Binary Classifiers Optimized...
PDF
IRJET- Security in Ad-Hoc Network using Encrypted Data Transmission and S...
IRJET- Improving Cyber Security using Artificial Intelligence
Evaluation of network intrusion detection using markov chain
An approach for ids by combining svm and ant colony algorithm
An approach for ids by combining svm and ant colony algorithm
IRJET- Proximity Detection Warning System using Ray Casting
An efficient intrusion detection using relevance vector machine
IRJET- An Intrusion Detection Framework based on Binary Classifiers Optimized...
IRJET- Security in Ad-Hoc Network using Encrypted Data Transmission and S...

What's hot (20)

PDF
IRJET- Review on “Using Big Data to Defend Machines against Network Attacks”
PDF
IRJET- Implementation of Artificial Intelligence Methods to Curb Cyber Assaul...
PDF
Machine learning in network security using knime analytics
PDF
MACHINE LEARNING IN NETWORK SECURITY USING KNIME ANALYTICS
PDF
IRJET - Securing Computers from Remote Access Trojans using Deep Learning...
PDF
IRJET - Detection of False Data Injection Attacks using K-Means Clusterin...
PDF
Icacci presentation-cnn intrusion
PDF
Review on Intrusion Detection in MANETs
PDF
Online Intrusion Alert Aggregation with Generative Data Stream Modeling
PDF
IRJET- Review on Intrusion Detection System using Recurrent Neural Network wi...
PDF
A BAYESIAN CLASSIFICATION ON ASSET VULNERABILITY FOR REAL TIME REDUCTION OF F...
PDF
A PROPOSED MODEL FOR DIMENSIONALITY REDUCTION TO IMPROVE THE CLASSIFICATION C...
PDF
Volume 2-issue-6-2190-2194
PDF
Review of Intrusion and Anomaly Detection Techniques
PDF
IRJET- A Secured Method of Data Aggregation for Wireless Sensor Networks in t...
PDF
IRJET- Review on Network Intrusion Detection using Recurrent Neural Network A...
DOCX
Msc dare journal 1
PDF
An Investigation into the Effectiveness of Machine Learning Techniques for In...
PDF
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
PDF
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
IRJET- Review on “Using Big Data to Defend Machines against Network Attacks”
IRJET- Implementation of Artificial Intelligence Methods to Curb Cyber Assaul...
Machine learning in network security using knime analytics
MACHINE LEARNING IN NETWORK SECURITY USING KNIME ANALYTICS
IRJET - Securing Computers from Remote Access Trojans using Deep Learning...
IRJET - Detection of False Data Injection Attacks using K-Means Clusterin...
Icacci presentation-cnn intrusion
Review on Intrusion Detection in MANETs
Online Intrusion Alert Aggregation with Generative Data Stream Modeling
IRJET- Review on Intrusion Detection System using Recurrent Neural Network wi...
A BAYESIAN CLASSIFICATION ON ASSET VULNERABILITY FOR REAL TIME REDUCTION OF F...
A PROPOSED MODEL FOR DIMENSIONALITY REDUCTION TO IMPROVE THE CLASSIFICATION C...
Volume 2-issue-6-2190-2194
Review of Intrusion and Anomaly Detection Techniques
IRJET- A Secured Method of Data Aggregation for Wireless Sensor Networks in t...
IRJET- Review on Network Intrusion Detection using Recurrent Neural Network A...
Msc dare journal 1
An Investigation into the Effectiveness of Machine Learning Techniques for In...
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
Ad

Similar to False positive reduction by combining svm and knn algo (20)

DOC
Intrusion detection and anomaly detection system using sequential pattern mining
DOC
Intrusion detection and anomaly detection system using sequential pattern mining
PDF
FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...
PDF
Secure intrusion detection and countermeasure selection in virtual system usi...
PDF
A Study on Data Mining Based Intrusion Detection System
PDF
Intrusion Detection for HealthCare Network using Machine Learning
PDF
NSAS: NETWORK SECURITY AWARENESS SYSTEM
PDF
An Extensive Survey of Intrusion Detection Systems
PDF
Vulnerability Management System
PDF
A combined approach to search for evasion techniques in network intrusion det...
PDF
MACHINE LEARNING AND DEEP LEARNING MODEL-BASED DETECTION OF IOT BOTNET ATTACKS.
PDF
IRJET - A Secure Approach for Intruder Detection using Backtracking
PDF
Volume 2-issue-6-2190-2194
PDF
A Study on Data Mining Based Intrusion Detection System
PDF
IRJET- A Review on Application of Data Mining Techniques for Intrusion De...
PDF
Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection A...
PDF
A PHASED APPROACH TO INTRUSION DETECTION IN NETWORK
PDF
Single sign on mechanism for distributed computing
PDF
Kx3419591964
PDF
Alert Analysis using Fuzzy Clustering and Artificial Neural Network
Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...
Secure intrusion detection and countermeasure selection in virtual system usi...
A Study on Data Mining Based Intrusion Detection System
Intrusion Detection for HealthCare Network using Machine Learning
NSAS: NETWORK SECURITY AWARENESS SYSTEM
An Extensive Survey of Intrusion Detection Systems
Vulnerability Management System
A combined approach to search for evasion techniques in network intrusion det...
MACHINE LEARNING AND DEEP LEARNING MODEL-BASED DETECTION OF IOT BOTNET ATTACKS.
IRJET - A Secure Approach for Intruder Detection using Backtracking
Volume 2-issue-6-2190-2194
A Study on Data Mining Based Intrusion Detection System
IRJET- A Review on Application of Data Mining Techniques for Intrusion De...
Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection A...
A PHASED APPROACH TO INTRUSION DETECTION IN NETWORK
Single sign on mechanism for distributed computing
Kx3419591964
Alert Analysis using Fuzzy Clustering and Artificial Neural Network
Ad

More from eSAT Journals (20)

PDF
Mechanical properties of hybrid fiber reinforced concrete for pavements
PDF
Material management in construction – a case study
PDF
Managing drought short term strategies in semi arid regions a case study
PDF
Life cycle cost analysis of overlay for an urban road in bangalore
PDF
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
PDF
Laboratory investigation of expansive soil stabilized with natural inorganic ...
PDF
Influence of reinforcement on the behavior of hollow concrete block masonry p...
PDF
Influence of compaction energy on soil stabilized with chemical stabilizer
PDF
Geographical information system (gis) for water resources management
PDF
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
PDF
Factors influencing compressive strength of geopolymer concrete
PDF
Experimental investigation on circular hollow steel columns in filled with li...
PDF
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
PDF
Evaluation of punching shear in flat slabs
PDF
Evaluation of performance of intake tower dam for recent earthquake in india
PDF
Evaluation of operational efficiency of urban road network using travel time ...
PDF
Estimation of surface runoff in nallur amanikere watershed using scs cn method
PDF
Estimation of morphometric parameters and runoff using rs &amp; gis techniques
PDF
Effect of variation of plastic hinge length on the results of non linear anal...
PDF
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Mechanical properties of hybrid fiber reinforced concrete for pavements
Material management in construction – a case study
Managing drought short term strategies in semi arid regions a case study
Life cycle cost analysis of overlay for an urban road in bangalore
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of compaction energy on soil stabilized with chemical stabilizer
Geographical information system (gis) for water resources management
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Factors influencing compressive strength of geopolymer concrete
Experimental investigation on circular hollow steel columns in filled with li...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Evaluation of punching shear in flat slabs
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of operational efficiency of urban road network using travel time ...
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of morphometric parameters and runoff using rs &amp; gis techniques
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of use of recycled materials on indirect tensile strength of asphalt c...

Recently uploaded (20)

PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPT
Project quality management in manufacturing
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
Sustainable Sites - Green Building Construction
PPTX
web development for engineering and engineering
PDF
Digital Logic Computer Design lecture notes
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
UNIT 4 Total Quality Management .pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Automation-in-Manufacturing-Chapter-Introduction.pdf
OOP with Java - Java Introduction (Basics)
bas. eng. economics group 4 presentation 1.pptx
Project quality management in manufacturing
Embodied AI: Ushering in the Next Era of Intelligent Systems
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Model Code of Practice - Construction Work - 21102022 .pdf
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Internet of Things (IOT) - A guide to understanding
Sustainable Sites - Green Building Construction
web development for engineering and engineering
Digital Logic Computer Design lecture notes
CYBER-CRIMES AND SECURITY A guide to understanding
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
UNIT 4 Total Quality Management .pptx

False positive reduction by combining svm and knn algo

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 02 | Feb-2015, Available @ http://www.ijret.org 451 FALSE POSITIVE REDUCTION BY COMBINING SVM AND KNN ALGO Sushil Kumar Mishra1 , Pankaj Bhatt2 1 PG Student, Computer Science Engineering, Graphic Era Hill University, Uttarakhand, India 2 PG Student, Computer Science Engineering, Graphic Era Hill University, Uttarakhand, India Abstract With the growth of information technology. There emerges many intrusion detection problem such as cyber security. Intrusion detection system provides basic infrastructure to detect a number of attacks. This research work focuses on intrusion detection problem of network security. The main goal is to detect network behaviour as normal or abnormal. In this research work, two different machine learning algorithm have been combined together to reduce its weakness and takes positive feature of both algorithm. Its experimental results generates better result than other algorithm in terms of performance, accuracy and false positive rate. These combined algorithm has been applied on KDDCUP99 dataset to find better result by improving its performance, accuracy and reducing its false positive rate. Keywords: Intrusion detection system, KDDCUP99 dataset, False positive rate. --------------------------------------------------------------------***---------------------------------------------------------------------- 1. INTRODUCTION In this century, Information security is a most menacing problem. For handling these problem, many intrusion detection method has been introduced but no one is perfect. Intrusion detection system can provide protection for a computer network from malicious files such as virus, spyware and torjan horse. In which many computers are interconnected. 
An intrusion detection system can monitor the behaviour of all files those are coming in that computer network. If any file is suspicious or malicious. So Intrusion detection system can detect that malicious file or virus. Intrusion detection system has created many clustering based models separate normal and abnormal files. Intrusion detection system can be used for neural network also to provide security for computer network. Neural network first uses trained dataset to recognize normal as well as abnormal activity. Intrusion detection system protects a network traffics from malicious files. It basically maintains confidentiality and integrity of computer network. Any unauthorized access of any personal data can not be made possibled. So secrecy of network traffic and information can be well maintained. Intrusion detection system can only takes preventive majors to protect a computer network. No intrusion detection system (IDS) is perfect to protect a computer network. A very deep research work is going on intrusion detection system to develop a such system that can fully provide protection for a network traffic or a computer network. In this research work, support vector machine (SVM) basically creates clustering model. Which contains normal as well as abnormal data. Which can monitor normal as well as malicious behaviour to protect a computer network from any malicious attack such as virus , worms, torjan horse, rootkits attacks. Intrusion detection system has been divided into two parts. Fig. 1 Types of IDS 1.1 Anomaly Based Detection Anomaly based intrusion detection system is based on a set of heuristic rule. Which basically monitors a normal as well as abnormal behaviour in a computer network. If any file is self replicating in nature or trying to damage any other file, such behaviours are detected by anomaly based detection. The main disadvantage of anomaly based detection system is higher false positive rate. 

1.2 Signature Based Detection
A signature based intrusion detection system can detect only known computer viruses in a computer network. When a computer virus is discovered, its signature is created and stored in a database. When a file enters the computer network, it is matched against all stored signatures. If the file matches a virus signature, it is declared a computer virus; otherwise it is treated as a normal file. The main disadvantage of a signature based intrusion detection system is that it cannot detect a new computer virus.

2. EXPERIMENTAL PARAMETERS
Several parameters, such as performance, accuracy and false positive rate, can be calculated for an intrusion detection system.

Performance: Performance measures how efficiently a target is achieved.
Performance = (True Positive) / (True Positive + True Negative)

Accuracy: Accuracy measures how close the result is to the actual value.
Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)

False positive rate: The false positive rate measures how often a normal file is falsely detected as an abnormal file.
False positive rate = (False Positive) / (False Positive + True Negative)

3. EVALUATION DATA SOURCES
The false positive rate was calculated on the standard KDDCUP99 data set provided by the MIT Lincoln Laboratory. The data set contains different types of attacks, which are categorized as normal or abnormal data. MIT Lincoln Laboratory set up a computer network and monitored its traffic, which contained normal as well as abnormal data, for about 7 days. The KDDCUP99 data set basically contains normal, denial of service, buffer overflow, guess_passwd(53) and probe attacks.

Denial of service (DOS): A denial of service intrusion is an intrusion in which legitimate information cannot be made available to its legitimate receiver. A DOS intrusion also slows down the computer system.

User to Root (U2R): In this type of attack, the attacker obtains a client's password in an unauthorized manner and uses the stolen password to access personal or secret information on the computer system.

Remote to User (R2U): In this attack, the attacker transmits a packet over the network that is not legitimate for that network, which increases network traffic.
An R2U attack can adversely affect the performance of the computer network, and can slow down a computer system or restart it again and again.

Probe: In this attack, the attacker monitors all information being sent over the network and can access it.

4. COMBINING SVM AND KNN ALGORITHM
The support vector machine (SVM) is a supervised learning method for classification, in which a hyperplane is created that separates normal data from abnormal data. SVM basically has two phases:
1- Training phase
2- Testing phase

1- Training phase: SVM is able to learn a large set of patterns from the dataset. The dataset contains various kinds of homogeneous and heterogeneous patterns, which can provide a better classification between normal and abnormal data.

2- Testing phase: Using the model from the training phase, the SVM is tested. It can evaluate accuracy, performance, etc. It can also evaluate the false positive rate, but on its own it generates a very high false positive rate.

The K nearest neighbor (KNN) algorithm is a machine learning algorithm that classifies a data point according to the labels of its nearest neighbours. The false positive rate can be evaluated with KNN alone, but it also gives a high false positive rate.

Fig. 2: Intrusion detection system using CSVMKNN

SVM basically uses support vectors to create a hyperplane, which is used to separate normal and abnormal data. The KNN algorithm is used to find new data added to the training data set.
Here, the support vector machine (SVM) and K nearest neighbor (KNN) algorithms are combined to evaluate the false positive rate; the result is known as the COMBINED SUPPORT VECTOR K NEAREST NEIGHBOR (CSVMKNN) algorithm. CSVMKNN is a mixture of SVM and KNN in which the two algorithms work together: SVM uses the training data set to learn from it, and if any new data is added to the dataset, the dataset is updated by KNN. CSVMKNN can be used in the same way as SVM and KNN to evaluate the false positive rate (false alarm rate), and the false positive rate evaluated with CSVMKNN can be better. The CSVMKNN algorithm is applied to the KDDCUP99 data set, which contains several types of attack such as buffer overflow and denial of service (DOS). CSVMKNN generates a false positive rate that is better than that of SVM and of KNN.

5. CSVMKNN ALGORITHM
Algorithm 1: SVM with KNN clustering
Input: Training data set containing normal and abnormal data (class type).
Output: SVM classifier.
1 Start
2 Select data from the different classes;
3 Separate normal and abnormal data by the SVM classifier;
4 While iterations remain to add data to the data set
5   Use support vectors to create a hyperplane;
6   The hyperplane separates normal and abnormal data;
7   Apply KNN clustering;
8   KNN clustering classifies the data into a normal and an abnormal cluster;
9   If new data is added to the data set
10    Update the data set;
11  else
12    Continue as it is;
13 End.
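
As a rough illustration of Algorithm 1, the following Python sketch pairs a linear decision function (a stand-in for the hyperplane of an already trained SVM) with a k-nearest-neighbour vote, and appends each new record to the training set. The weights w, the bias b, the value k = 3, the two-feature toy records, the choice to store a record under its KNN label, and all function names are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math
from collections import Counter


def svm_side(w, b, x):
    """Label x by the side of the hyperplane w.x + b = 0 it falls on.

    Assumed convention: a non-negative score means "abnormal".
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "abnormal" if score >= 0 else "normal"


def knn_label(train, x, k=3):
    """Majority label among the k training records closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def add_record(train, w, b, x, k=3):
    """Classify a new record with both models, then update the data set.

    The record is stored under its KNN label (an assumption; the text
    only states that KNN updates the data set).
    """
    svm = svm_side(w, b, x)
    knn = knn_label(train, x, k)
    train.append((x, knn))
    return svm, knn


if __name__ == "__main__":
    # Hypothetical two-feature records; real KDDCUP99 records have 41 features.
    train = [((0, 0), "normal"), ((0, 1), "normal"), ((1, 0), "normal"),
             ((5, 5), "abnormal"), ((5, 6), "abnormal"), ((6, 5), "abnormal")]
    w, b = (1.0, 1.0), -5.0  # hypothetical hyperplane x1 + x2 = 5
    print(add_record(train, w, b, (0.5, 0.2)))
    print(add_record(train, w, b, (4.8, 5.1)))
```

In a real experiment the hyperplane would come from SVM training rather than from hand-picked weights; the sketch only shows how the two decisions and the data set update fit together.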

After Algorithm 1, the SVM learning process is applied to the data set. Its main goal is to randomly choose data points from the KDDCUP99 data set. A hyperplane is used to separate normal and abnormal data points, so there must be a separating hyperplane between the training data points; this provides a better selection method for the data points. An SVM training phase is introduced, in which the hyperplane is placed between the data points. A KNN clustering phase is introduced to separate normal data from abnormal data. If new data is added to the training data set, the KNN clustering phase is used to update the training data set with it. This strategy is carried out in the next algorithm.

Algorithm 2:
Input: Training data set (KDDCUP99).
Input: S1 - Number of iterations.
Input: S2 - Maximum detection rate.
Input: S3 - Minimum detection rate.
Output: Support vector machine (SVM) and K nearest neighbor (KNN) classifier.
1 Start
2 Initialize the data;
3 Let S2 be the maximum detection rate, initially zero;
4 Let S3 be the minimum detection rate, initially zero;
5 While S3 < S2
6   Initialize i = 0;
7   for i = 1, ..., S1
8     Training phase:
9       Support vector machine (SVM) training phase;
10    Clustering phase:
11      K nearest neighbor (KNN) clustering phase;
12  end
13  Use the support vector machine (SVM) classifier;
14  Use the hyperplane to separate normal and abnormal data;
15  if new data is added to the data set
16    Use the KNN algorithm to update S2;
17    Update the learning process;
18  else
19    Continue as it is;
20  end
21 end

The KNN clustering phase is used as a better selection strategy. The CSVMKNN algorithm can decrease the false positive rate if newly added data is declared normal; otherwise the true positive rate increases, which adversely affects performance and accuracy. If new data is declared abnormal in the SVM training phase but normal in the KNN clustering phase, it is declared a new kind of intrusion.
If new data added to the training data set is declared normal in the SVM training phase and again declared normal in the KNN clustering phase, such data decreases the false positive rate (false alarm rate) and increases the performance and accuracy of the machine learning algorithm. The combined support vector machine k nearest neighbor (CSVMKNN) algorithm basically provides a better selection strategy than SVM or KNN alone. It takes the positive features of the SVM and KNN algorithms and avoids the weaknesses of both. By using this better selection strategy, CSVMKNN reduces its false positive rate and improves its performance. As a result, the CSVMKNN algorithm generates a lower false positive rate than the SVM and KNN algorithms, and can produce higher performance and accuracy than either of them.
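
The accuracy and false positive rate formulas from Section 2 can be computed directly from binary counts. The sketch below is a minimal Python illustration, assuming that "abnormal" is treated as the positive class; the function names and the five sample labels are hypothetical and are not data from the paper.

```python
def binary_counts(actual, predicted, positive="abnormal"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # attack correctly flagged
            else:
                fp += 1  # normal record falsely flagged (false alarm)
        else:
            if a == positive:
                fn += 1  # attack missed
            else:
                tn += 1  # normal record correctly passed
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp, tn):
    """False positive rate = FP / (FP + TN)."""
    return fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    actual = ["normal", "normal", "abnormal", "abnormal", "normal"]
    predicted = ["normal", "abnormal", "abnormal", "normal", "normal"]
    tp, tn, fp, fn = binary_counts(actual, predicted)
    print("accuracy:", accuracy(tp, tn, fp, fn))
    print("false positive rate:", false_positive_rate(fp, tn))
```

For a multi-class result such as a five-class confusion matrix, the attack classes would first have to be merged into a single "abnormal" class, and it must be known whether rows hold actual or predicted labels; the sketch leaves both choices to the reader.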

6. RESULTS
The support vector machine (SVM) algorithm, the K nearest neighbor (KNN) algorithm and the CSVMKNN algorithm are applied to the training data set (KDDCUP99), from which the false positive rate is calculated. The false positive rates are then compared to determine which algorithm generates the lowest one.

Support vector machine (SVM) classifier: The SVM classifier uses support vectors to create a hyperplane between the data points. This hyperplane is used to separate normal and abnormal data. On this basis, performance, accuracy and false positive rate can be evaluated.

Class               Normal  Denial Of Service  User To Root  Remote To User  Probe
Normal                 900                  7             8               1      0
Denial Of Service        3                345             0               2     11
User To Root           400                  0             0               0     10
Remote To User         345                  0            41              34      0
Probe                  127                100             0              10      0
Fig-3: SVM classifier

The K nearest neighbor (KNN) classifier is used to discover new data added to the training data set. It also determines whether the newly added data is normal or abnormal. The KNN algorithm is applied to the KDDCUP99 data set to evaluate performance, accuracy and false positive rate.

Class               Normal  Denial Of Service  User To Root  Remote To User  Probe
Normal                 928                  1             5               0      1
Denial Of Service        0                 45             0             200      1
User To Root             4                  3             6               5      0
Remote To User           0                  0           412             234     15
Probe                    1                  4             0               0     23
Fig-4: KNN classifier

The CSVMKNN classifier basically combines features of both the support vector machine (SVM) and the K nearest neighbor (KNN) algorithm. The CSVMKNN algorithm is applied to the KDDCUP99 dataset to obtain its performance, accuracy and false positive rate.

Class               Normal  Denial Of Service  User To Root  Remote To User  Probe
Normal                 100                  0             8               9     70
Denial Of Service       30                 35             0               0     89
User To Root             0                  0             0              50      0
Remote To User           0                  0             0              24      0
Probe                    1                  4             0               0      0
Fig-5: CSVMKNN classifier

Evaluation Measure     SVM     KNN     CSVMKNN
False Positive Rate    12.00   11.00    6.00
False Negative Rate    26.00    6.00    0.89
Performance             8.00    9.00   14.50
Accuracy                7.50    3.50   16.00
Fig-6: Comparison of false positive rate

The CSVMKNN algorithm generates a lower false positive rate than the support vector machine (SVM) and K nearest neighbor (KNN) algorithms.

7. CONCLUSION
In this research work, the support vector machine (SVM) algorithm, the K nearest neighbor (KNN) algorithm and the CSVMKNN algorithm have been applied to the KDDCUP99 data set separately. Of these, the CSVMKNN algorithm generated a lower false positive rate than the SVM and KNN algorithms. It also achieved better performance, higher accuracy and a higher detection rate than the other machine learning algorithms. There is still room for improvement in this algorithm, since it does not yet achieve a zero false positive rate.

BIOGRAPHIES
Sushil Kumar Mishra is an M.Tech student doing research work in computer security.
Pankaj Bhatt is pursuing an M.Tech and doing research work in computer security.