IDS
[Intrusion Detection
System]
Analysis of Decision Trees and SVM
S. V. Farrahi
H. Manzari
N. Kharazmi
Shiraz University of Technology
What is intrusion detection?
»Intrusion detection systems (IDSs) are software
or hardware systems that automate the process of
monitoring the events occurring in a computer
system or network, analyzing them for signs of
security problems.
What is intrusion detection?
»Intrusion detection is the process of monitoring
the events occurring in a computer system or
network and analyzing them for signs of intrusions,
defined as attempts to compromise the
confidentiality, integrity, or availability of a computer
or network, or to bypass its security mechanisms.
Why do we need intrusion detection?
»Intrusions are caused by attackers accessing the
systems from the Internet, authorized users of the
systems who attempt to gain additional privileges
for which they are not authorized, and authorized
users who misuse the privileges given to them.
Classification of intrusion
detection systems
Generally speaking, intrusion detection systems can be
classified in two ways:
» By data source: host-based IDS and
network-based IDS.
» By analysis method: misuse detection and
anomaly detection.
Host-based and network-based
IDS
» Host-based systems base their decisions on
information obtained from a single host (usually
audit trails), while network-based intrusion
detection systems obtain data by monitoring the
traffic in the network to which the hosts are
connected.
Misuse Detection and Anomaly
Detection
» A signature (misuse) detection system identifies patterns of
traffic or application data presumed to be malicious,
while anomaly detection systems compare activities
against a "normal" baseline.
» Anomaly detection assumes that an intrusion will
always reflect some deviation from normal
patterns.
» Misuse detection is based on knowledge of
system vulnerabilities and known attack patterns.
Signature based: activities → pattern matching against intrusion patterns → intrusion
Example: if (src_ip == dst_ip) then "land attack"
Anomaly based: activity measures → probable intrusion
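The two detection styles can be contrasted in a few lines of Python. This is only a minimal sketch: the land-attack rule is the one given on the slide, while the packet dictionaries, the connections-per-minute baseline, and the three-standard-deviation threshold are hypothetical choices made for illustration.

```python
from statistics import mean, stdev

def signature_match(packet):
    """Misuse (signature) check: the land-attack rule from the slide."""
    return "land attack" if packet["src_ip"] == packet["dst_ip"] else None

def is_anomalous(value, baseline, threshold=3.0):
    """Anomaly check: flag a value far from the baseline mean.

    The threshold of 3 standard deviations is an assumption, not a
    value taken from the slides.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) > threshold * sigma

# Hypothetical packets (addresses are made up).
normal_packet = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9"}
land_packet = {"src_ip": "10.0.0.9", "dst_ip": "10.0.0.9"}

# Hypothetical baseline: connections per minute during normal operation.
baseline_rates = [48, 52, 50, 47, 53, 49, 51]

print(signature_match(normal_packet), signature_match(land_packet))
print(is_anomalous(51, baseline_rates), is_anomalous(400, baseline_rates))
```

The signature check can only fire on the pattern it encodes, whereas the anomaly check reacts to any sufficiently unusual measurement, whether or not it is actually an attack.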
Misuse detection Advantages and
disadvantages
» The primary advantage of signature detection is
that known attacks can be detected fairly reliably
with a low false positive rate.
» The drawback of the signature detection
approach is that such systems typically require a
signature to be defined for all of the possible
attacks that an attacker may launch against a
network
Misuse detection Advantages and
disadvantages
» The main disadvantage of misuse detection
approaches is that they detect only the attacks
they have been trained to detect.
» Novel or unknown attacks, and even variants
of common attacks, often go undetected. At a time
when new security vulnerabilities in software are
discovered and exploited every day, the reactive
approach embodied by misuse detection methods is
not feasible for defeating malicious attacks.
Anomaly detection Advantages and
disadvantages
» Anomaly detection systems have two major advantages
over signature-based intrusion detection systems. The first
is their ability to detect unknown
attacks, including "zero-day" attacks.
» The second is that profiles of normal activity are customized for every system,
application and/or network, making it very
difficult for an attacker to know with certainty what activities
can be carried out without being detected.
Anomaly detection Advantages and
disadvantages
» A disadvantage of the anomaly detection
approach is that well-known attacks may not be
detected, particularly if they fit the established
profile of the user.
» An attacker who knows that a profile is being stored
can change behavior slightly over time and train the
system in such a way that it comes to
consider the attack normal behavior.
Process model for Intrusion Detection
Three fundamental functional components of an IDS:
» Information Sources – the different sources of event
information used to determine whether an intrusion has
taken place. These sources can be drawn from different
levels of the system, with network, host, and application
monitoring most common.
» Analysis – the part of intrusion detection systems that
actually organizes and makes sense of the events derived
from the information sources, deciding when those events
indicate that intrusions are occurring or have already taken
place.
» Response – sends an alarm to the administrator.
Architecture
Architecture of an intrusion detection system
KDD Cup 99 dataset - A benchmark
» The training dataset contains approximately 4,940,000
records.
» The training data contains 23 types of attacks and the
test data contains 37, that is, 14 attack types that do not
appear in the training data.
» Each record (row) has 41 features plus one class
variable.
» The test data can therefore be used to assess the detection
capacity for unknown attacks.
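The record layout described above (41 features followed by one class label) can be checked with a short parser. This is a sketch under assumptions: the sample line is an illustrative record in the comma-separated layout of the dataset files, the trailing period on the label is assumed from that sample, and `parse_record` is a hypothetical helper name.

```python
def parse_record(line):
    """Split one comma-separated record into (features, label)."""
    fields = line.strip().split(",")
    if len(fields) != 42:  # 41 features + 1 class variable
        raise ValueError(f"expected 42 fields, got {len(fields)}")
    features = fields[:41]
    label = fields[41].rstrip(".")  # assumed: labels end with a period
    return features, label

# Illustrative record; the values are for demonstration only.
sample = (
    "0,tcp,http,SF,181,5450,"
    "0,0,0,0,0,1,"
    "0,0,0,0,0,0,0,0,0,0,"
    "8,8,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,"
    "9,9,"
    "1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,"
    "normal."
)

features, label = parse_record(sample)
print(len(features), label)
```

Rejecting rows with the wrong field count early is a cheap way to catch truncated or malformed lines before they reach a classifier.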
KDD Cup 99 dataset attacks
The attacks in KDD Cup 99 fall into four types:
» Probe: strictly speaking, this is not a true attack
but a preparatory step that attackers take before
launching attacks.
» DoS (Denial of Service): such an attack stops the
server from operating, so that it cannot
provide services. The attack usually consumes all
of the server's system resources or its bandwidth,
which brings operation to a halt.
KDD Cup 99 dataset attacks
(cont.)
» U2R (User to Root): a user takes advantage of a
system vulnerability to gain privileges they are not
entitled to, up to those of the administrator.
» R2L (remote to user): an attacker sends packets
to a machine over a network, then exploits the
machine's vulnerability to illegally gain local
access as a user.
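In practice the individual attack labels are usually collapsed into the four categories before training. The sketch below shows one way to do that; the mapping is an illustrative subset of commonly cited KDD Cup 99 label names, not the complete list, and `categorize` is a hypothetical helper.

```python
from collections import Counter

# Illustrative subset only: not every attack label in the dataset is listed.
CATEGORY = {
    "normal": "normal",
    "ipsweep": "Probe", "portsweep": "Probe", "nmap": "Probe", "satan": "Probe",
    "smurf": "DoS", "neptune": "DoS", "teardrop": "DoS", "land": "DoS",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R",
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L",
}

def categorize(label):
    """Map a raw label (with or without trailing period) to its category."""
    return CATEGORY.get(label.rstrip("."), "unknown")

# Hypothetical labels as they might appear in a handful of records.
raw_labels = ["normal.", "smurf.", "smurf.", "portsweep.", "rootkit.", "guess_passwd."]
counts = Counter(categorize(l) for l in raw_labels)
print(counts)
```

Labels missing from the mapping fall into "unknown", which is also where attack types that appear only in the test data would land.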
Evaluation steps
Classification tree
» The classification tree, also called a decision tree, is
one of the main techniques used in data mining.
» Its main goal is to learn from class-labeled training tuples
in order to predict the classes of new or previously unseen data.
» The two phases of building a tree are top-down construction and
bottom-up pruning.
» ID3 and C4.5, two common decision tree algorithms,
construct the tree in a top-down manner.
Steps of Classification tree
1) Compute the information gain for each attribute.
2) Select the attribute with the highest information gain
as the splitting attribute.
3) If the selected attribute is discrete (categorical), branch
the node on all of its possible values. If the attribute is
continuous, select the cut point with the highest information
gain.
4) After splitting, check whether each new node is a leaf
(all of its data belong to the same class); otherwise,
the new node becomes the root of a sub-tree.
5) Repeat the above steps until all new nodes are
leaves.
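Steps 1 and 2 can be sketched for discrete attributes as follows. This follows the plain information-gain criterion listed in the steps; C4.5 itself normalizes the gain into a gain ratio, which is not shown here. The toy rows and labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on the discrete attribute at index attr."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: (protocol, flag) -> class.
rows = [("tcp", "SF"), ("tcp", "S0"), ("icmp", "SF"), ("icmp", "S0")]
labels = ["normal", "normal", "attack", "attack"]

print(information_gain(rows, labels, 0), information_gain(rows, labels, 1))
print(best_attribute(rows, labels))
```

In this toy data the protocol separates the classes perfectly while the flag carries no information, so the protocol is chosen as the splitting attribute.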
SVM – Support Vector Machine
[Figure] Left: small distance between data and hyperplane; right: big distance
between data and hyperplane.
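The distance in the figure is the margin: an SVM prefers the separating hyperplane whose nearest data point is as far away as possible. The sketch below only measures the margin of two candidate hyperplanes of the form w·x + b = 0; it does not train an SVM, and the points and hyperplanes are hypothetical.

```python
from math import sqrt

def distance_to_hyperplane(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """Distance from the hyperplane to its nearest point."""
    return min(distance_to_hyperplane(w, b, x) for x in points)

# Hypothetical 2-D data: two points per class along the diagonal.
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]

# Two candidate separators, both of which split the classes correctly.
narrow = ((1.0, 1.0), -4.5)  # x + y = 4.5, passes close to (2, 2)
wide = ((1.0, 1.0), -6.0)    # x + y = 6, midway between (2, 2) and (4, 4)

print(margin(*narrow, points), margin(*wide, points))
```

Both hyperplanes classify the four points correctly, but the second leaves a larger gap on either side, which is the one a maximum-margin classifier would favor.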
Percentage of various data
10% kddcup.data_10_percent.gz.
Preprocessing of data
» The study samples from the training dataset (the 10%
subset, kddcup.data_10_percent.gz) and from the test dataset.
» For each proportion of normal records (10%, 20%,
30%, . . ., 90%), a group of 10,000 records with that
proportion is selected from the training dataset and from
the test dataset.
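One way to build such groups is sketched below, assuming each group holds 10,000 records and that normal and attack records are drawn at random without replacement. The record pools are stand-ins, and `sample_group`, the fixed seed, and the pool sizes are hypothetical.

```python
import random

def sample_group(normal, attack, normal_fraction, size=10_000, seed=0):
    """Draw `size` records of which `normal_fraction` are normal."""
    rng = random.Random(seed)
    n_normal = round(size * normal_fraction)
    group = rng.sample(normal, n_normal) + rng.sample(attack, size - n_normal)
    rng.shuffle(group)  # avoid all normal records coming first
    return group

# Stand-in pools; real code would hold parsed dataset records here.
normal_pool = [("normal", i) for i in range(20_000)]
attack_pool = [("attack", i) for i in range(20_000)]

# One group per normal proportion: 10%, 20%, ..., 90%.
groups = {p: sample_group(normal_pool, attack_pool, p / 100) for p in range(10, 100, 10)}
print({p: sum(1 for kind, _ in g if kind == "normal") for p, g in groups.items()})
```

Fixing the seed keeps the groups reproducible, so both classifiers can be compared on exactly the same samples.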
Comparison
Accuracy = (TP + TN)/(TP + TN + FP + FN) * 100%
False alarm rate = FP/(FP + TN) * 100%
Detection rate = TP/(TP + FN) * 100%
Precision = TP/(TP + FP) * 100%
Recall = TP/(TP + FN) * 100%
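These measures are straightforward to compute from the four confusion-matrix counts. The sketch below treats "attack" as the positive class, so the numerator of accuracy is the sum TP + TN taken as a whole; the counts in the example are hypothetical.

```python
def ids_metrics(tp, tn, fp, fn):
    """Evaluation measures in percent, with 'attack' as the positive class."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "false_alarm_rate": 100.0 * fp / (fp + tn),
        "detection_rate": 100.0 * tp / (tp + fn),
        "precision": 100.0 * tp / (tp + fp),
        "recall": 100.0 * tp / (tp + fn),
    }

# Hypothetical counts: 100 attacks (80 caught), 100 normal records (10 flagged).
m = ids_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.2f}%")
```

Detection rate and recall are the same quantity under two names, which is why they always agree.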
Accuracy comparison between C4.5 and
SVM
» When the proportion of normal data is
large (>70%), the two accuracies are close,
though SVM is the better of the two.
» On average, C4.5 is slightly better
than SVM.
Detection rate comparison between C4.5 and
SVM
» The detection rate of C4.5 declines as the percentage
of normal data rises, while that of SVM shows no steady trend.
» Overall, the curve of C4.5 lies above that of
SVM.
» The detection rate of C4.5 is therefore better than that of
SVM.
False alarm rate comparison between C4.5 and
SVM
» In false alarm rate, SVM is inferior
to C4.5 only when the proportion of normal
data is 30%, 50% or 60%; it is better
than C4.5 otherwise.
» On average, SVM is better than C4.5
in false alarm rate.
Comparison
» Comparing the results of C4.5 and SVM, we
find that C4.5 is superior to SVM in accuracy
and detection rate, while SVM is better in
false alarm rate.
Feature Selection
» In complex classification domains, features
may contain false correlations, which hinder
the process of detecting intrusions.
» Further, some features may be redundant
since the information they add is contained in
other features
» Extra features can increase computation time,
and can have an impact on the accuracy of the
IDS.
Feature Selection (cont.)
» Empirical results indicate that selecting the significant input
features is important for designing an IDS that is lightweight,
efficient and effective in real-world detection systems.
» IDSs try to perform their task in real time. Some data may
not be useful to the IDS and thus can be eliminated before
processing.
» Feature selection can help to reduce the time needed to
construct a model.
Correlation
coefficient(preprocessing)
Correlation coefficient of A and B is defined as follows :
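The formula itself did not survive in this transcript. Assuming the slide used the standard Pearson correlation coefficient, r(A, B) = Σ(a_i − mean(A))(b_i − mean(B)) / sqrt(Σ(a_i − mean(A))² · Σ(b_i − mean(B))²), a minimal sketch looks like this. The column names echo KDD Cup 99 features, but the values are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical feature columns.
src_bytes = [100, 200, 300, 400]
dst_bytes = [210, 390, 610, 790]  # roughly twice src_bytes
count = [4, 3, 2, 1]              # falls as src_bytes rises

print(pearson(src_bytes, dst_bytes), pearson(src_bytes, count))
```

A coefficient near +1 or -1 suggests that one of the two features adds little information beyond the other, which makes it a candidate for removal during preprocessing.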
Detection rate comparison between
C4.5 and SVM
Classification and Regression
Trees (CART)
» The Classification and Regression Trees (CART)
methodology is based on binary recursive partitioning
» The process is binary because parent nodes are always
split into exactly two child nodes and recursive because
the process is repeated by treating each child node as a
parent
» For splitting, the Gini rule is used which essentially is a
measure of how well the splitting rule separates the
classes contained in the parent node
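A binary split chosen by the Gini rule can be sketched as below for a single numeric feature. This is a simplified illustration rather than a full CART implementation (no recursion, pruning, or surrogate variables), and the feature values and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_gain(parent, left, right):
    """Impurity reduction achieved by splitting parent into left and right."""
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
    return gini(parent) - weighted

def best_binary_split(values, labels):
    """Try each threshold t (left: value <= t); return (threshold, gain)."""
    best = (None, 0.0)
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = gini_gain(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical numeric feature with two well-separated classes.
values = [1, 2, 3, 10, 11, 12]
labels = ["normal"] * 3 + ["attack"] * 3

print(best_binary_split(values, labels))
```

Every candidate threshold yields exactly two child nodes, which is the "binary" part of binary recursive partitioning; applying the same search to each child gives the recursion.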
Classification and Regression Trees
(CART)(cont…)
» Unlike other methods, CART does not stop in the
middle of the tree growing process, because there
might still be important information to be
discovered by drilling down several more levels.
» Once the maximal tree is grown and a set of
sub-trees is derived from it, CART determines the
best tree by testing for error rates or costs
Classification and Regression Trees
(CART)(cont…)
» The best sub-tree is the one with the lowest or
near-lowest cost, which may be a relatively small
tree
» The best variable selected at each node of the tree
is called the primary variable.
» Surrogate variables are defined as the variables
that most accurately predict the action of the
primary variable.
Result of CART
» The KDD Cup 99 dataset has 41 features, which makes it
high-dimensional.
» Intrusion detection is a real-time task, so feature reduction
can help reduce the time needed to construct a model.
» CART resulted in a reduced 12-variable dataset
with C, E, F, L, W, X, Y, AB, AE, AF, AG and AI as
variables.
Performance of CART
Experimental Result
Conclusion and future work
» Decision trees can help IDSs construct an
accurate model, but they do not do well on R2L and U2R attacks.
» From the empirical results on the U2R and R2L classes, which
have little training data and for which the decision tree gives
better performance than SVM, we can say that the decision
tree works well with small training data.
» We found that reducing the number of features does
not necessarily reduce the test time. This depends largely
on the relationships among the dataset features,
not on the number of features.
