A Bayesian approach to estimate probabilities in
classification trees
Andrés Cano, Andrés R. Masegosa, Serafín Moral
Department of Computer Science and A.I.
University of Granada
1. Introduction
Classification trees (CT) are among the most widely used supervised classification models. However, one of their main weaknesses is the poor class probability estimates they produce [1].
Good class probability estimates are essential in many tasks, such as probability-based ranking problems [2].
This work proposes a Bayesian approach to build CT with accurate class probability estimates (CPE).
2. Bayesian Tree Induction (BTI)
In this work, CT induction is cast as a Bayesian model selection problem [3].
At each step, the tree with maximum a posteriori (MAP) probability given the data is selected. Two options are evaluated at each branch:
Branch by an attribute X not yet used in this branch.
Stop the branching.
The splitting attribute, or the decision to stop, is chosen by comparing the posterior scores of these options, as sketched below.
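The selection score itself appears on the poster only as an equation image; the following is a minimal reconstruction, assuming the standard Bayesian Dirichlet marginal likelihood of [3] with the prior stated in the notes accompanying Figure 2 (uniform α_c = S/|C|, total prior mass S). The exact expressions on the poster may differ.

```latex
% Assumed leaf score: Bayesian Dirichlet marginal likelihood of the class
% counts n_{lc} at a leaf l, with N_l = \sum_c n_{lc} and \alpha_c = S/|C|.
\[
  \mathrm{score}(l) \;=\; \frac{\Gamma(S)}{\Gamma(S + N_l)}
    \prod_{c \in C} \frac{\Gamma(\alpha_c + n_{lc})}{\Gamma(\alpha_c)}
\]
% Stopping keeps score(l); branching by an attribute X with values \Omega_X
% replaces it by the product of the scores of the children it induces.
\[
  \mathrm{score}(l, X) \;=\; \prod_{x \in \Omega_X} \mathrm{score}(l_x),
  \qquad
  X^{*} \;=\; \arg\max_{X} \mathrm{score}(l, X)
\]
% The branch is made only when score(l, X^*) exceeds score(l).
```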
3. Bayesian Tree Averaging (BMA)
In many cases, branching by an attribute is only slightly more probable than stopping the branching, so there is uncertainty in this decision. Bayesian model averaging (BMA) [4] is an approach to deal with this uncertainty.
Our application of BMA is an alternative to pruning the final tree. The probabilities at the leaves are estimated as follows.
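The averaging equation is likewise an image on the poster; assuming each option is weighted by its marginal likelihood from Section 2, a plausible form of the combination is:

```latex
% Assumed BMA combination at a node: mix the "stop" estimate with the
% branching estimates, each option weighted by its posterior score.
\[
  P(c \mid \mathrm{leaf}) \;\propto\;
    W_{\mathrm{stop}}\,\hat{P}_{\mathrm{stop}}(c)
    \;+\; \sum_{X} W_{X}\,\hat{P}_{X}(c),
  \qquad
  W_{\mathrm{stop}} \propto \mathrm{score}(l),\;\;
  W_{X} \propto \mathrm{score}(l, X)
\]
% Here \hat{P}(c) = (n_{lc} + \alpha_c)/(N_l + S) is the Dirichlet-smoothed
% estimate; W1 and W2 in Figure 1 are instances of such weights.
```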
4. Non-Uniform Priors (NUP)
In the previous analysis, uniform alpha values were considered for the Dirichlet prior distributions over the parameters.
Here we test a heuristic to define non-uniform alpha values. It is based on the fact that trees partition the data and create subsets in which some classes have no samples (a hypothetical instance is sketched below).
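The poster states this heuristic only informally, so the sketch below is a hypothetical instance: it keeps the Dirichlet-smoothed estimates of Section 2 but shifts the prior mass toward the classes actually observed in the parent subset. The shifting rule is our assumption, not the authors' formula.

```python
# Dirichlet-smoothed class estimates at a leaf; the non-uniform alpha
# assignment below is an illustrative assumption, not the poster's heuristic.
import numpy as np

def leaf_estimates(counts, alphas):
    """Posterior mean of a Dirichlet-multinomial leaf: (n_c + a_c)/(N + sum(a))."""
    counts = np.asarray(counts, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    return (counts + alphas) / (counts.sum() + alphas.sum())

S, n_classes = 2.0, 3
uniform = np.full(n_classes, S / n_classes)      # alphas = S/|C|, as in Section 2

# Hypothetical non-uniform alternative: keep total prior mass S but lean
# toward the classes seen in the parent subset (e.g. the [0.8, 1.75] branch).
parent_counts = np.array([0.0, 49.0, 5.0])
nonuniform = S * (parent_counts + 1.0) / (parent_counts + 1.0).sum()

empty_leaf = [0, 0, 0]                           # the red-bounded leaf of Figure 1
print(leaf_estimates(empty_leaf, uniform))       # -> [0.333 0.333 0.333]
print(leaf_estimates(empty_leaf, nonuniform))    # -> leans toward the parent's classes
```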
[Figure 1, Step 1 tree: the root splits on Petal Width into (-Inf, 0.8], [0.8, 1.75] and (1.75, +Inf); the [0.8, 1.75] branch splits on Petal Length into (-Inf, 2.45], [2.45, 4.75] and (4.75, +Inf). Leaf class counts: (50, 0, 0) for Petal Width <= 0.8, (0, 1, 45) for Petal Width > 1.75, and (0, 0, 0), (0, 44, 1), (0, 5, 4) at the three Petal Length leaves, the empty (0, 0, 0) leaf being the one bounded in red.]
Step 1: Tree Induction
- Firstly, the classification tree is induced following the classic recursive partitioning method for building CT; each attribute is evaluated with the equation of Section 2 (a worked check follows Figure 1).
- Note that there is no sample at the red-bounded leaf, so no decision is associated with that leaf.
- Secondly, the weight of each node is computed according to the quotient highlighted in red in Section 3.
- The weight of “Petal Width” is much higher than that of “Petal Length” because “Petal Width” induces the better partition.
Figure 1: Example Iris Data Classification
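As a worked check of the (assumed) Section 2 score, the subset reaching the [0.8, 1.75] Petal Width branch of Figure 1 can be scored both ways; under this form, branching by Petal Length beats stopping, which matches the tree that was induced.

```python
# Branch-vs-stop comparison with the assumed Bayesian Dirichlet score
# (S = 2, |C| = 3, alphas = S/|C|), in log space for numerical stability.
import numpy as np
from scipy.special import gammaln

S = 2.0

def log_score(counts):
    """log of Gamma(S)/Gamma(S+N) * prod_c Gamma(a + n_c)/Gamma(a), a = S/|C|."""
    counts = np.asarray(counts, dtype=float)
    a = S / len(counts)
    return (gammaln(S) - gammaln(S + counts.sum())
            + np.sum(gammaln(a + counts) - gammaln(a)))

stop = log_score([0, 49, 5])                  # keep the node as a leaf
branch = sum(log_score(c) for c in ([0, 0, 0], [0, 44, 1], [0, 5, 4]))
print(branch - stop)                          # positive: branching is preferred
```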
Step 2: Intermediate Tree
[Figure 1, Step 2 tree: same structure as Step 1, with Dirichlet-smoothed leaf probabilities (0.99, 0.005, 0.005), (0.01, 0.03, 0.96), (0.33, 0.33, 0.33) at the empty leaf, (0.04, 0.53, 0.43) and (0.01, 0.96, 0.03). The Petal Length node carries two weights: W1 = 6.31e59 for branching and W2 = 28.75 for stopping, the stop option having distribution (0.05, 0.90, 0.05).]
Step 3: Averaged Tree
[Figure 1, Step 3 tree: same structure, with averaged leaf probabilities (0.99, 0.005, 0.005), (0.01, 0.03, 0.96), (0.03, 0.55, 0.42), (0.01, 0.96, 0.03), and (0.325, 0.35, 0.325) at the formerly empty leaf.]
- Finally, the probabilities are weighted and updated following the summation equation of Section 3.
- As we can see, the red-bounded leaf now has an associated decision (see the sketch below). The effect is similar to that of a post-pruning process, but with this approach the CPE are more precise.
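A minimal numeric check of this update for the formerly empty leaf, using hypothetical normalized weights (the raw W1 and W2 of Figure 1 are unnormalized scores, so the values below are chosen only to reproduce the displayed result):

```python
# BMA update at a single leaf: normalized weighted sum of the "branch"
# and "stop" class distributions (the weights here are hypothetical).
import numpy as np

def average_leaf(p_branch, p_stop, w_branch, w_stop):
    lam = w_stop / (w_branch + w_stop)
    return lam * np.asarray(p_stop) + (1.0 - lam) * np.asarray(p_branch)

print(average_leaf([1/3, 1/3, 1/3], [0.05, 0.90, 0.05],
                   w_branch=0.97, w_stop=0.03))
# -> about (0.325, 0.350, 0.325), matching the averaged tree of Figure 1
```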
5. Experiments & Conclusions
Methods were evaluated on 27 UCI data sets.
We compare the following five methods:
C4.5 of Quinlan, with pruning (C4.5p) and without it (C4.5¬p).
BTI of Section 2, BTI+BMA of Section 3, and BTI+BMA+NUP of Section 4.
Several values of S were evaluated: S=1, S=2 and S=|C|.
Two scores were evaluated: the classic percentage of correct classifications and the log-likelihood of the true class (log-Score); the latter is introduced to evaluate the quality of the CPE.
Results are presented in Figure 2: the mean value of both scores and the outputs of a corrected paired t-test are plotted. For simplicity, only the models with S=2 are shown.
The main conclusions are:
BTI, BMA and NUP improve the CPE while maintaining the accuracy of C4.5p.
The Bayesian approach is a promising technique to deal with model uncertainty in CT.
Figure 2: Results
• %: Percentage of correct classifications.
• |Log-Score|: Absolute value of the log-score; the lower it is, the better the class probability estimates (a sketch of this metric follows the legend).
• W/D/L: The number of databases with a statistically significant (at the 1% level) win/draw/loss with respect to the score (% or |Log-S|) of C4.5p, which is set as the reference method.
• A Dirichlet prior distribution over the parameters is assumed, with uniform alphas α_c = S/|C|.
• S is the assumed global (prior) sample size.
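The |Log-Score| above can be made concrete with a short sketch; whether the poster averages or sums the log-likelihoods is an assumption on our part, as is the use of natural logarithms.

```python
# Hedged sketch of Figure 2's |Log-Score|: absolute value of the mean
# log-probability that the model assigns to the true class.
import numpy as np

def abs_log_score(probs, y_true):
    """probs: (n, |C|) class probability estimates; y_true: (n,) true labels."""
    p = np.asarray(probs)[np.arange(len(y_true)), np.asarray(y_true)]
    return abs(np.mean(np.log(p)))

# Example with two leaves of the averaged tree of Figure 1:
print(abs_log_score([[0.99, 0.005, 0.005], [0.01, 0.96, 0.03]], [0, 1]))
```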
References
[1] Pazzani et al. 1994. Reducing misclassification costs. In International Conference on Machine Learning, pages 217-225.
[2] Provost and Domingos. 2003. Tree induction for probability-based ranking. Machine Learning, 52(3):199-215.
[3] Heckerman, Geiger, and Chickering. 1994. Learning Bayesian networks: The combination of knowledge and statistical data. In KDD Workshop, pages 85-96.
[4] Hoeting, Madigan, Raftery and Volinsky. 1999. Bayesian model averaging: A tutorial. Statistical Science, 14(4):382-417.