An Experimental Study about Simple Decision Trees for
Bagging Ensemble on Datasets with Classification Noise
Joaquín Abellán and Andrés R. Masegosa
Department of Computer Science and Artificial Intelligence
University of Granada
Verona, July 2009
10th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
Part I
Introduction
Introduction
Ensembles of Decision Trees (DT)
Features
They usually build a different DT for each sample drawn from the training dataset.
The final prediction is a combination of the individual predictions of the trees.
They take advantage of the inherent instability of DT.
Bagging, AdaBoost and Randomization are the best-known approaches.
Introduction
Classification Noise (CN) in the class values
Definition
The class values of the samples given to the learning algorithm have some
errors.
Random classification noise: the label of each example is flipped randomly
and independently with some fixed probability called the noise rate.
Causes
It is mainly due to errors in the data capture process.
Very common in real world applications: surveys, biological or medical
information...
Effects on ensembles of decision trees
The presence of classification noise degrades the performance of any classification inducer.
AdaBoost is known to be strongly affected by classification noise.
Bagging is the ensemble approach with the best response to classification noise.
Introduction
Motivation of this study
Description
Decision trees built with different split criteria are considered in a Bagging scheme.
Common split criteria (InfoGain, InfoGain Ratio and Gini Index) and a new split criterion based on imprecise probabilities are analyzed.
The aim is to determine which split criterion is most robust to the presence of classification noise.
Outline
Description of the different split criteria.
Bagging Decision Trees.
Experimental Results.
Conclusions and Future Work.
Part II
Split Criteria
Split Criteria
Decision Trees
Description
Attributes are placed at the internal nodes.
Class values are placed at the leaves.
Each leaf corresponds to a decision rule.
Learning
The split criterion selects the attribute to place at each branching node.
The stop criterion decides when to fix a leaf and stop branching.
Split Criteria
Classic Split Criteria
Description
A real-valued function which measures the goodness of an attribute X as a split node in the decision tree.
A local measure that allows a recursive building of the decision tree.
Information Gain (IG)
Introduced by Quinlan as the basis of his ID3 model [18].
It is based on Shannon’s entropy.
IG(X, C) = H(C) - H(C|X) = \sum_{i,j} p(c_j, x_i) \log \frac{p(c_j, x_i)}{p(c_j)\, p(x_i)}
Tendency to select attributes with a high number of states.
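To make the formula concrete, here is a minimal numeric sketch (not the authors' code) that computes IG from a |X| x |C| contingency table of counts; the helper name info_gain and the toy table are ours.

```python
import numpy as np

def info_gain(counts):
    """IG(X, C) = H(C) - H(C|X) from a |X| x |C| contingency table of counts."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    p_c = counts.sum(axis=0) / N          # class marginal p(c_j)
    p_x = counts.sum(axis=1) / N          # attribute marginal p(x_i)

    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # H(C|X) = sum_i p(x_i) H(C | X = x_i)
    h_c_given_x = sum(
        p_x[i] * H(counts[i] / counts[i].sum())
        for i in range(counts.shape[0]) if counts[i].sum() > 0)
    return H(p_c) - h_c_given_x

# Toy example: attribute with 2 states, class with 2 states
print(info_gain([[30, 10], [5, 55]]))   # a clearly informative split
```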
Split Criteria
Classic Split Criteria
Information Gain Ratio (IGR)
Improved version of IG (Quinlan's C4.5 tree inducer [19]).
Normalizes the information gain by dividing by the entropy of the split attribute.
IGR(X, C) = \frac{IG(X, C)}{H(X)}
Penalizes attributes with many states.
Gini Index (GIx)
Measures the impurity degree of a partition.
Introduced by Breiman as the basis of the CART tree inducer [8].
GIx(X, C) = gini(C) - gini(C|X)
gini(C|X) = \sum_i p(x_i)\, gini(C \mid X = x_i), \qquad gini(C) = 1 - \sum_j p(c_j)^2
Tendency to select attributes with a high number of states.
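Analogous illustrative sketches for IGR and GIx from the same kind of |X| x |C| contingency table; again these are our own self-contained helpers, not the authors' implementation.

```python
import numpy as np

def _H(p):
    """Shannon entropy of a probability vector (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def info_gain_ratio(counts):
    """IGR(X, C) = IG(X, C) / H(X); assumes X has at least two observed states."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    p_x = counts.sum(axis=1) / N
    p_c = counts.sum(axis=0) / N
    h_c_given_x = sum(p_x[i] * _H(counts[i] / counts[i].sum())
                      for i in range(len(p_x)) if counts[i].sum() > 0)
    return (_H(p_c) - h_c_given_x) / _H(p_x)

def gini_index(counts):
    """GIx(X, C) = gini(C) - gini(C|X), with gini(C) = 1 - sum_j p(c_j)^2."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    p_x = counts.sum(axis=1) / N
    p_c = counts.sum(axis=0) / N
    gini_c = 1.0 - (p_c ** 2).sum()
    gini_c_given_x = sum(
        p_x[i] * (1.0 - ((counts[i] / counts[i].sum()) ** 2).sum())
        for i in range(len(p_x)) if counts[i].sum() > 0)
    return gini_c - gini_c_given_x

table = [[30, 10], [5, 55]]
print(info_gain_ratio(table), gini_index(table))
```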
Split Criteria
Split Criteria based on Imprecise Probabilities
Imprecise Information Gain (IIG) [3]
It is based on an uncertainty measure (maximum entropy) for convex sets of probability distributions.
Probability intervals for each state of the class variable are computed from the
dataset using Walley’s Imprecise Dirichlet Model (IDM) [24].
p(c_j) \in \left[ \frac{n_{c_j}}{N + s}, \frac{n_{c_j} + s}{N + s} \right] \equiv I_{c_j}, \qquad p(c_j \mid x_i) \in \left[ \frac{n_{c_j, x_i}}{N_{x_i} + s}, \frac{n_{c_j, x_i} + s}{N_{x_i} + s} \right] \equiv I_{c_j, x_i}
If we denote by K(C) and K(C|X = x_i) the following sets of probability distributions q on \Omega_C:
K(C) = \{ q \mid q(c_j) \in I_{c_j} \}, \qquad K(C \mid X = x_i) = \{ q \mid q(c_j) \in I_{c_j, x_i} \},
the Imprecise Info-Gain for each variable X is defined as:
IIG(X, C) = S(K(C)) - \sum_i p(x_i)\, S(K(C \mid X = x_i))
where S(\cdot) is the maximum entropy of a convex set of probability distributions.
It can be efficiently computed for s=1 [1].
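The sketch below illustrates one way to compute IIG under the IDM with s = 1, assuming the shortcut of sharing the extra mass s/(N+s) equally among the least frequent class values to obtain the maximum-entropy distribution (the s = 1 procedure of [1], as we recall it); the function names and the toy table are ours, not the authors' implementation.

```python
import numpy as np

def max_entropy_idm(counts, s=1.0):
    """Maximum entropy S(K(C)) of the IDM credal set built from class counts.

    Assumption: for s = 1 the maximising distribution is obtained by sharing
    the imprecision mass s/(N+s) equally among the least frequent classes.
    """
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    mass = counts.copy()
    minima = np.isclose(counts, counts.min())
    mass[minima] += s / minima.sum()          # share the imprecision mass
    p = mass / (N + s)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def imprecise_info_gain(table, s=1.0):
    """IIG(X, C) = S(K(C)) - sum_i p(x_i) S(K(C|X=x_i)) from a |X| x |C| table."""
    table = np.asarray(table, dtype=float)
    class_counts = table.sum(axis=0)
    p_x = table.sum(axis=1) / table.sum()
    cond = sum(p_x[i] * max_entropy_idm(table[i], s)
               for i in range(table.shape[0]) if table[i].sum() > 0)
    return max_entropy_idm(class_counts, s) - cond

print(imprecise_info_gain([[30, 10], [5, 55]]))
```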
Part III
Bagging Decision Trees
Bagging Decision Trees
Procedure
T_i samples are generated by random sampling with replacement from the initial training dataset.
From each sample T_i, a simple decision tree is built using a given split criterion.
The final prediction is made by majority voting.
Description
As Breiman [9] said about Bagging: "The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then Bagging can improve accuracy."
The combination of multiple models reduces the overfitting of the individual decision trees to the dataset (a minimal sketch of the procedure is given below).
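A minimal sketch of this bagging procedure, using scikit-learn decision trees as the base inducer (only its built-in 'entropy' and 'gini' criteria are available there, not IIG) and majority voting; the function names are ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_trees=100, criterion="entropy", seed=0):
    """Fit n_trees trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.RandomState(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.randint(0, n, size=n)          # sampling with replacement
        tree = DecisionTreeClassifier(criterion=criterion)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bagging_predict(trees, X):
    """Majority vote over the individual tree predictions.

    Assumes non-negative integer class labels.
    """
    votes = np.array([t.predict(X) for t in trees])   # (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```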
Part IV
Experiments
Experiments
Experimental Set-up
Datasets Benchmark
25 UCI datasets with very different features.
Missing values were replaced with the mean (continuous attributes) or the mode (discrete attributes).
Continuous attributes were discretized with Fayyad & Irani's method [13].
Preprocessing was carried out using only information from the training datasets.
Evaluated Algorithms
Bagging ensembles of 100 trees.
Different split criteria: IG, IGR, GIx and IIG.
Evaluation Method
Different noise rates were applied to the training datasets (not to the test datasets): 0%, 5%, 10%, 20% and 30%.
10-fold cross-validation repeated 10 times was used to estimate the classification accuracy.
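An illustrative sketch of this protocol: random class noise is injected into the training folds only, and accuracy is estimated with 10 repetitions of 10-fold cross-validation. It assumes scikit-learn (the `estimator` parameter of BaggingClassifier is called `base_estimator` before version 1.2) and integer-encoded labels in numpy arrays; a standard bagging classifier stands in for the authors' ensembles, and the function names are ours.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def add_class_noise(y, rate, rng):
    """Flip each label with probability `rate` to a different class, chosen uniformly."""
    y = y.copy()
    classes = np.unique(y)
    flip = rng.random(len(y)) < rate
    for i in np.where(flip)[0]:
        y[i] = rng.choice(classes[classes != y[i]])
    return y

def noisy_cv_accuracy(X, y, rate, seed=0):
    """10 x 10-fold CV; noise is added to the training folds only."""
    rng = np.random.default_rng(seed)
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=seed)
    scores = []
    for train, test in cv.split(X, y):
        y_train = add_class_noise(y[train], rate, rng)
        clf = BaggingClassifier(
            estimator=DecisionTreeClassifier(criterion="entropy"),
            n_estimators=100, random_state=seed)
        clf.fit(X[train], y_train)
        scores.append(accuracy_score(y[test], clf.predict(X[test])))
    return np.mean(scores)
```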
Experiments
Statistical Tests
Two classifiers on a single dataset
Corrected Paired T-test [26]: A corrected version of the paired T-test
implemented in Weka.
Two classifiers on multiple datasets
Wilcoxon Signed-Ranks Test [25]: A non-parametric test which ranks the differences on each dataset.
Sign Test [20,22]: A binomial test that counts the number of wins, losses and ties across the datasets.
Multiple classifiers on multiple datasets
Friedman Test [15,16]: A non-parametric test that ranks the algorithms on each dataset: the best one gets rank 1, the second one rank 2, and so on. The null hypothesis is that all algorithms perform equally well.
Nemenyi Test [17]: A post-hoc test employed to compare the algorithms among themselves when the Friedman null hypothesis is rejected.
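A sketch of how these tests can be run with scipy (version >= 1.7 is assumed for binomtest); the corrected resampled t-test follows the Nadeau-Bengio variance correction that, to our understanding, underlies Weka's corrected paired t-test [26]. The Nemenyi post-hoc test is omitted, and all function names are ours.

```python
import numpy as np
from scipy import stats

def wilcoxon_and_sign(acc_a, acc_b):
    """Compare two methods over multiple datasets (arrays of per-dataset accuracy)."""
    w_p = stats.wilcoxon(acc_a, acc_b).pvalue                    # Wilcoxon signed-ranks test
    wins = int(np.sum(acc_a > acc_b))
    losses = int(np.sum(acc_a < acc_b))
    sign_p = stats.binomtest(wins, wins + losses, p=0.5).pvalue  # sign test, ties dropped
    return w_p, sign_p

def friedman(*acc_by_method):
    """Friedman test; each argument is one method's per-dataset accuracy array."""
    return stats.friedmanchisquare(*acc_by_method).pvalue

def corrected_paired_ttest(diffs, n_train, n_test):
    """Corrected resampled t-test (Nadeau & Bengio), as we understand Weka's version [26].

    diffs: the k per-run accuracy differences from 10 x 10-fold CV (k = 100);
    the variance is inflated by (1/k + n_test/n_train) to account for the
    overlap between training sets.
    """
    diffs = np.asarray(diffs, dtype=float)
    k = len(diffs)
    var = np.var(diffs, ddof=1)
    t = diffs.mean() / np.sqrt((1.0 / k + n_test / n_train) * var)
    p = 2 * stats.t.sf(abs(t), df=k - 1)
    return t, p
```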
Experiments
Average Performance
Analysis
The average accuracy of the four criteria is similar when no noise is introduced.
The introduction of noise deteriorates the performance of all the classifiers.
However, IIG is more robust to noise: its average performance is the highest at every noise level.
Experiments
Corrected Paired T-Test at the 0.05 level
Number of accumulated Wins, Ties and Defeats (W/T/D) of IIG with respect to IG, IGR and GIx on the 25 datasets.
Noise IG IGR GIx
0% 2/22/1 1/23/1 2/22/1
5% 11/14/0 10/15/0 11/14/0
10% 13/12/0 10/15/0 13/12/0
20% 16/9/0 11/14/0 18/7/0
30% 17/8/0 11/14/0 17/8/0
Analysis
Without noise, there is a tie on almost all datasets.
The more noise is added, the higher the number of wins.
IIG wins on a large number of datasets and is not defeated on any of them.
Experiments
Wilcoxon and Sign Test at the 0.05 level
Comparison of IIG with respect to the rest of the split criteria.
'-' indicates no statistically significant difference.
         Wilcoxon Test        Sign Test
Noise    IG    IGR   GIx      IG    IGR   GIx
0%       IIG   -     IIG      IIG   -     IIG
5%       IIG   IIG   IIG      IIG   IIG   IIG
10%      IIG   IIG   IIG      IIG   IIG   IIG
20%      IIG   IIG   IIG      IIG   IIG   IIG
30%      IIG   IIG   IIG      IIG   IIG   IIG
Analysis
Without noise, IIG outperforms IG and GIx, but not IGR.
At every noise level, IIG outperforms the rest of the split criteria.
IGR also outperforms IG and GIx once some noise is present.
Experiments
Friedman Test at the 0.05 level
The ranks assigned by the Friedman test are shown.
The lower the rank, the better the performance.
Ranks in bold face indicate that IIG statistically outperforms that criterion according to the Nemenyi test.
Noise IIG IG IGR GIx
0% 1.86 2.92 2.52 2.70
5% 1.18 3.18 2.54 3.12
10% 1.12 3.26 2.36 3.26
20% 1.12 3.20 2.16 3.52
30% 1.12 3.36 2.26 3.26
Analysis
Without noise, IIG has the best ranking and outperforms IG.
With a noise level higher than 10%, IIG outperforms all the other criteria.
IGR also outperforms IG and GIx when the noise level is higher than 20%.
Experiments
Computational Time
Analysis
Without noise, all split criteria have a similar average running time.
The introduction of noise increases the computational cost of the classifiers.
IIG and GIx consume less time than the other split criteria; IGR is the most time-consuming.
Part V
Conclusions and Future Work
Conclusions and Future Work
Conclusions
An experimental study of the performance of different split criteria in a bagging scheme under classification noise.
Three classic split criteria (IG, IGR and GIx) and a new one based on imprecise probabilities (IIG) were compared.
Bagging with IIG is clearly more robust than the other criteria as the noise level increases.
IGR also performs well under noise, although worse than IIG.
Future Work
Extend the methods to handle continuous attributes and missing values directly.
Further investigate the computational cost of the models, as well as other factors such as the number of trees, pruning...
Introduce new imprecise models.
Thanks for your attention!!
Questions?