Ensembles of example dependent cost-sensitive decision trees slides

Ensembles of example-dependent cost-sensitive
decision trees
April 28, 2015
Alejandro Correa Bahnsen
with
Djamila Aouada, SnT
Björn Ottersten, SnT

Motivation
2
• Classification: predicting the class of
a set of examples given their
features.
• Standard classification methods aim
at minimizing the errors.
• Such a traditional framework
assumes that all misclassification
errors carry the same cost.
• This is not the case in many real-world applications: credit card
fraud detection, churn modeling, credit scoring, direct marketing.

• Cost-sensitive classification
Background, previous contributions
• Cost-sensitive Ensembles
Introduction, random inducers, combination methods, proposed algorithms
• Datasets
Credit card fraud detection, churn modeling, credit scoring, direct marketing
• Experiments
Experimental setup, results
• Conclusions
Contributions
Agenda
3

Predict the class of a set S of examples given their features,
where each element of S is composed of a feature vector and a class label.
It is usually evaluated using a traditional misclassification measure such as
accuracy, F1 score, or AUC, among others.
However, these measures assume that different misclassification errors
carry the same cost.
Background - Binary classification
4

We define a cost measure based on the cost matrix [Elkan 2001]
From which we calculate the cost of applying a classifier to a given set
Background - Cost-sensitive evaluation
5
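The cost measure on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `total_cost`, the column order `[C_FP, C_FN, C_TP, C_TN]` of the cost matrix, and the toy numbers are all assumptions made for the example.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_mat):
    """Sum of example-dependent costs for a set of predictions.

    cost_mat has one row per example; the column order
    [C_FP, C_FN, C_TP, C_TN] is an assumption of this sketch.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    c_fp, c_fn, c_tp, c_tn = np.asarray(cost_mat, dtype=float).T
    per_example = (y_true * (y_pred * c_tp + (1 - y_pred) * c_fn)
                   + (1 - y_true) * (y_pred * c_fp + (1 - y_pred) * c_tn))
    return float(per_example.sum())

# Made-up toy data: a fixed cost of 5 for every positive prediction,
# and an example-dependent loss when a positive example is missed.
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
cost_mat = [[5, 100, 5, 0],
            [5, 50, 5, 0],
            [5, 30, 5, 0],
            [5, 10, 5, 0]]
example_cost = total_cost(y_true, y_pred, cost_mat)  # TP 5 + FP 5 + FN 30 + TN 0
```

Each example contributes exactly one of its four costs, selected by its true class and its predicted class.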

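However, the total cost may not be easy to interpret. Therefore, we propose
a savings measure that compares the cost of the classifier with the cost of using no algorithm at all,
where the cost of using no algorithm is the cost of predicting the costless class, that is, of assigning every example to whichever single class gives the lower total cost.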
Background - Cost-sensitive evaluation
6
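As a rough sketch of the savings measure, the snippet below compares a classifier's cost with the cheaper of the two blanket predictions (all negative or all positive). The helper names, the cost-matrix column order `[C_FP, C_FN, C_TP, C_TN]`, and the toy data are assumptions of this example rather than part of the slides.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_mat):
    """Total example-dependent cost; columns assumed [C_FP, C_FN, C_TP, C_TN]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    c_fp, c_fn, c_tp, c_tn = np.asarray(cost_mat, dtype=float).T
    return float(np.sum(y_true * (y_pred * c_tp + (1 - y_pred) * c_fn)
                        + (1 - y_true) * (y_pred * c_fp + (1 - y_pred) * c_tn)))

def savings(y_true, y_pred, cost_mat):
    """Fraction of the no-algorithm cost that the classifier saves."""
    y_true = np.asarray(y_true)
    cost_all_negative = total_cost(y_true, np.zeros_like(y_true), cost_mat)
    cost_all_positive = total_cost(y_true, np.ones_like(y_true), cost_mat)
    baseline = min(cost_all_negative, cost_all_positive)  # costless class
    # Note: undefined when the baseline cost is zero.
    return (baseline - total_cost(y_true, y_pred, cost_mat)) / baseline

# Made-up toy data
y_true = np.array([1, 0, 1, 0])
cost_mat = np.array([[5, 100, 5, 0],
                     [5, 50, 5, 0],
                     [5, 30, 5, 0],
                     [5, 10, 5, 0]])
perfect_savings = savings(y_true, y_true, cost_mat)             # (20 - 10) / 20
baseline_savings = savings(y_true, np.ones(4, dtype=int), cost_mat)
```

A savings of 0 means the classifier is no better than the blanket prediction, and negative values mean it is worse.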

Research in example-dependent cost-sensitive classification has been
limited, mostly because of the lack of publicly available datasets [Aodha
and Brostow 2013].
Standard approaches consist of re-weighting the training examples based
on their costs:
• Cost-proportionate rejection sampling [Zadrozny et al. 2003]
• Cost-proportionate oversampling [Elkan 2001]
Background - State-of-the-art methods
7
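Cost-proportionate rejection sampling can be illustrated as follows: each example is kept with probability equal to its misclassification cost divided by the largest cost in the set. This is a sketch under assumptions: the function name is hypothetical, and taking C_FN for positives and C_FP for negatives as the per-example cost is one common choice, not something stated on the slide.

```python
import numpy as np

def rejection_sample(cost_mis, rng):
    """Return the indices kept by cost-proportionate rejection sampling.

    cost_mis[i] is the misclassification cost of example i (assumed here
    to be C_FN for positives and C_FP for negatives).
    """
    cost_mis = np.asarray(cost_mis, dtype=float)
    accept_prob = cost_mis / cost_mis.max()
    keep = rng.random(cost_mis.shape[0]) <= accept_prob
    return np.flatnonzero(keep)

rng = np.random.default_rng(0)
cost_mis = np.array([100.0, 5.0, 30.0, 5.0])   # made-up costs
kept = rejection_sample(cost_mis, rng)
```

The most expensive example is always kept, while cheap examples are usually dropped, so a cost-insensitive learner trained on the kept examples implicitly focuses on the costly ones.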

• Bayes minimum risk
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Cost Sensitive Credit Card Fraud Detection
Using Bayes Minimum Risk,” in 2013 12th International Conference on Machine Learning and Applications.
Miami, USA: IEEE, Dec. 2013, pp. 333–338.
• Probability calibration for Bayes minimum risk (BMR)
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Improving Credit Card Fraud Detection with
Calibrated Probabilities,” in Proceedings of the fourteenth SIAM International Conference on Data Mining,
Philadelphia, USA, 2014, pp. 677 – 685.
• Cost-sensitive logistic regression (CSLR)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Logistic Regression for
Credit Scoring,” in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA:
IEEE, 2014, pp. 263–269.
• Cost-sensitive decision trees (CSDT)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Decision Trees,” Expert
Systems with Applications, in press, 2015.
Previous contributions
8

The main idea behind the ensemble methodology is to combine several
individual base classifiers in order to have a classifier that outperforms
every one of them.
“The Blind Men and the Elephant”, by John Godfrey Saxe
Introduction - Ensemble learning
10
[Figure: Models 1 to 6 and some unknown distribution]

A typical ensemble is made by combining T different base classifiers. Each
base classifier is trained by applying algorithm M to a random subset S_j of the training set S.
Introduction - Ensemble learning
11

Random inducers
12
[Figure: a training set of numbered examples and the subsamples drawn from it by each random inducer. Panels: Bagging, Pasting, Random forest, Random patches]
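The difference between the random inducers comes down to how rows and columns of the training set are drawn. The sketch below shows one plausible reading; the function names, the `fraction` parameter, and the exact sampling choices are assumptions, not the authors' code.

```python
import numpy as np

def bagging(n_examples, n_features, rng):
    """Bootstrap: draw examples with replacement, keep all features."""
    rows = rng.integers(0, n_examples, size=n_examples)
    return rows, np.arange(n_features)

def pasting(n_examples, n_features, rng, fraction=0.5):
    """Draw a subset of examples without replacement, keep all features."""
    n_rows = max(1, int(fraction * n_examples))
    rows = rng.choice(n_examples, size=n_rows, replace=False)
    return rows, np.arange(n_features)

def random_patches(n_examples, n_features, rng, fraction=0.5):
    """Bootstrap the examples and also draw a random subset of features."""
    rows = rng.integers(0, n_examples, size=n_examples)
    n_cols = max(1, int(fraction * n_features))
    cols = rng.choice(n_features, size=n_cols, replace=False)
    return rows, cols

# Random forest is not shown as a sampler: it bootstraps the examples like
# bagging, and the feature subset is drawn inside the tree learner at each
# split rather than once per base classifier.

rng = np.random.default_rng(0)
bag_rows, bag_cols = bagging(8, 4, rng)
paste_rows, paste_cols = pasting(8, 4, rng)
patch_rows, patch_cols = random_patches(8, 4, rng)
```

Bagging can repeat an example within one subsample, which is why some examples end up out of bag, whereas pasting never repeats one.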

After the base classifiers are constructed they are typically combined using
one of the following methods:
• Majority voting
• Proposed cost-sensitive weighted voting
Proposed combination methods
13
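A minimal sketch of the two voting rules for binary predictions is given below. The weights are taken to be the savings of each base classifier, normalized to sum to one. Clipping negative savings to zero and breaking ties in favor of the negative class are assumptions of this example, and the weights shown are made up.

```python
import numpy as np

def majority_voting(votes):
    """votes: array of shape (T, N) holding the 0/1 prediction of each model."""
    votes = np.asarray(votes, dtype=float)
    return (votes.mean(axis=0) > 0.5).astype(int)

def weighted_voting(votes, alpha):
    """Weight each model's vote by its normalized savings alpha_j."""
    votes = np.asarray(votes, dtype=float)
    alpha = np.clip(np.asarray(alpha, dtype=float), 0.0, None)  # assumption
    alpha = alpha / alpha.sum()   # requires at least one positive weight
    return (alpha @ votes > 0.5).astype(int)

# Three base classifiers, three examples (made-up values)
votes = np.array([[1, 0, 1],
                  [0, 0, 1],
                  [0, 1, 1]])
alpha = np.array([0.8, 0.1, 0.1])
mv = majority_voting(votes)          # each model counts the same
wv = weighted_voting(votes, alpha)   # the first model dominates
```

On the first example the two rules disagree: only one model of three votes positive, but that model carries most of the weight.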

• Proposed cost-sensitive stacking
Using the cost-sensitive logistic regression [Correa Bahnsen et al., 2014] model:
Then the weights β are estimated by minimizing the cost function J(S, M, β)
Proposed combination methods
14
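Cost-sensitive stacking can be sketched as a logistic model over the base classifiers' predictions whose weights minimize the expected example-dependent cost. The code below is an illustration under assumptions: the optimizer (SciPy's BFGS), the absence of an intercept, the cost-matrix column order `[C_FP, C_FN, C_TP, C_TN]`, and the toy data are choices made for this example only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def stacking_cost(beta, base_preds, y, cost_mat):
    """J(S, M, beta): expected cost of the stacked classifier.

    base_preds has shape (N, T): one column per base classifier.
    """
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    h = expit(base_preds @ beta)   # estimated probability of the positive class
    return float(np.sum(y * (h * c_tp + (1 - h) * c_fn)
                        + (1 - y) * (h * c_fp + (1 - h) * c_tn)))

def fit_stacking(base_preds, y, cost_mat):
    """Estimate beta by numerically minimizing the cost function."""
    beta0 = np.zeros(base_preds.shape[1])
    result = minimize(stacking_cost, beta0,
                      args=(base_preds, y, cost_mat), method="BFGS")
    return result.x

def predict_stacking(beta, base_preds):
    return (expit(base_preds @ beta) >= 0.5).astype(int)

# Made-up toy data: two base classifiers, four examples
base_preds = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
cost_mat = np.array([[5.0, 100.0, 5.0, 0.0],
                     [5.0, 50.0, 5.0, 0.0],
                     [5.0, 30.0, 5.0, 0.0],
                     [5.0, 10.0, 5.0, 0.0]])
beta = fit_stacking(base_preds, y, cost_mat)
H = predict_stacking(beta, base_preds)
```

Unlike the voting rules, the learned weights may be negative, so a base classifier that is consistently wrong can still contribute.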

The subsampling can be done either by: Bagging, pasting, random forest or
random patches
Proposed algorithms
Base classifiers
For j in 1..T:
1. Subsample from the training set
   S_j ← Subsample(S)
2. Train a CSDT on S_j
   M_j ← M(S_j)
3. Estimate the weight from the savings on the out-of-bag examples
   α_j ← savings(M_j(S_j^oob))
Combination
Select combination method:
1. Majority voting
   H ← f_mv(S, M)
2. CS-Weighted voting
   H ← f_wv(S, M, α)
3. CS-Stacking
   β ← argmin_β J(S, M, β)
   H ← f_s(S, M, β)
15
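The procedure on this slide can be sketched end to end as below. To keep the example self-contained, a depth-one stump that picks the split with the lowest total cost stands in for a real cost-sensitive decision tree; that stand-in, all function names, the bagging subsampler, the clipping of negative weights, and the synthetic data are assumptions of this sketch.

```python
import numpy as np

def total_cost(y, pred, cost_mat):
    # Column order [C_FP, C_FN, C_TP, C_TN] is an assumption of this sketch.
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    return float(np.sum(y * (pred * c_tp + (1 - pred) * c_fn)
                        + (1 - y) * (pred * c_fp + (1 - pred) * c_tn)))

def savings(y, pred, cost_mat):
    base = min(total_cost(y, np.zeros_like(y), cost_mat),
               total_cost(y, np.ones_like(y), cost_mat))
    return (base - total_cost(y, pred, cost_mat)) / base if base > 0 else 0.0

def predict_stump(stump, X):
    feature, threshold, sign = stump
    return (sign * (X[:, feature] - threshold) > 0).astype(int)

def fit_stump(X, y, cost_mat):
    """Depth-one stand-in for a CSDT: choose the split with the lowest cost."""
    best_cost, best_stump = np.inf, (0, 0.0, 1)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                cost = total_cost(y, predict_stump(stump, X), cost_mat)
                if cost < best_cost:
                    best_cost, best_stump = cost, stump
    return best_stump

def fit_ensemble(X, y, cost_mat, n_estimators=10, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models, alphas = [], []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)           # 1. subsample S_j (bagging)
        oob = np.setdiff1d(np.arange(n), idx)      # out-of-bag examples
        model = fit_stump(X[idx], y[idx], cost_mat[idx])   # 2. train M_j
        alpha = 0.0                                # 3. weight from oob savings
        if oob.size:
            alpha = savings(y[oob], predict_stump(model, X[oob]), cost_mat[oob])
        models.append(model)
        alphas.append(max(alpha, 0.0))             # clipping is an assumption
    return models, np.array(alphas)

def predict_weighted(models, alphas, X):
    votes = np.array([predict_stump(m, X) for m in models])
    total = alphas.sum()
    weights = alphas / total if total > 0 else np.full(len(models), 1.0 / len(models))
    return (weights @ votes > 0.5).astype(int)

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0.5).astype(int)
cost_mat = np.column_stack([np.full(200, 5.0), rng.uniform(10, 100, 200),
                            np.full(200, 5.0), np.zeros(200)])
models, alphas = fit_ensemble(X, y, cost_mat)
ensemble_savings = savings(y, predict_weighted(models, alphas, X), cost_mat)
```

Swapping the combination step for majority voting or stacking only changes `predict_weighted`; the training loop stays the same.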

Cost matrix
Database
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Cost Sensitive Credit Card Fraud Detection
Using Bayes Minimum Risk,” in 2013 12th International Conference on Machine Learning and Applications.
Miami, USA: IEEE, Dec. 2013, pp. 333–338.
Credit card fraud detection
17
# Examples    % Positives    Cost (Euros)
1,638,772     0.21%          860,448

Cost matrix
Database
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “A novel cost-sensitive
framework for customer churn predictive modeling,” Decision Analytics, vol. under review, 2015.
Churn modeling
18
# Examples    % Positives    Cost (Euros)
9,410         4.83%          580,884

Cost matrix
Database
A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Logistic Regression for
Credit Scoring,” in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA:
IEEE, 2014, pp. 263–269.
Credit scoring
19
Database          # Examples    % Positives    Cost (Euros)
Kaggle Credit     112,915       6.74%          83,740,181
PAKDD09 Credit    38,969        19.88%         3,117,960

Cost matrix
Database
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Improving Credit Card Fraud Detection with
Calibrated Probabilities,” in Proceedings of the fourteenth SIAM International Conference on Data Mining,
Philadelphia, USA, 2014, pp. 677 – 685.
Direct marketing
20
# Examples    % Positives    Cost (Euros)
37,931        12.62%         59,507

• Cost-insensitive (CI):
• Decision trees (DT)
• Logistic regression (LR)
• Random forest (RF)
• Under-sampling (u)
• Cost-proportionate sampling (CPS):
• Cost-proportionate rejection-sampling (r)
• Cost-proportionate over-sampling (o)
• Bayes minimum risk (BMR)
• Cost-sensitive training (CST):
• Cost-sensitive logistic regression (CSLR)
• Cost-sensitive decision trees (CSDT)
Experimental setup - Methods
22

• Ensemble cost-sensitive decision trees (ECSDT):
Random inducers:
• Bagging (CSB)
• Pasting (CSP)
• Random forest (CSRF)
• Random patches (CSRP)
Combination:
• Majority voting (mv)
• Cost-sensitive weighted voting (wv)
• Cost-sensitive stacking (s)
Experimental setup - Methods
23

• Each experiment was carried out 50 times
• The parameters of the algorithms were selected by grid search
• Results are measured by savings
• Then the Friedman ranking is calculated for each method
Experimental setup
24
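The ranking step can be illustrated as follows: within each database the methods are ranked by savings (1 = best), and the ranks are then averaged over databases. The savings values below are hypothetical placeholders, not results from the experiments, and the use of `scipy.stats.rankdata` is an assumption of this sketch.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical savings: one row per database, one column per method
savings_table = np.array([[0.70, 0.50, 0.10],
                          [0.15, 0.17, 0.05],
                          [0.50, 0.45, 0.20]])

# Rank within each database; higher savings gets the better (lower) rank,
# and ties receive the average of the ranks they span.
ranks = np.vstack([rankdata(-row) for row in savings_table])
mean_rank = ranks.mean(axis=0)
```

A method with a mean rank close to 1 is the best on almost every database, regardless of how large the savings differences are.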

Results
25
Results of the Friedman rank of the savings (1=best, 28=worst)
Family    Algorithm    Rank
ECSDT     CSRP-wv-t    2.6
ECSDT     CSRP-s-t     3.4
ECSDT     CSRP-mv-t    4
ECSDT     CSB-wv-t     5.6
ECSDT     CSP-wv-t     7.4
ECSDT     CSB-mv-t     8.2
ECSDT     CSRF-wv-t    9.4
BMR       RF-t-BMR     9.4
ECSDT     CSP-s-t      9.6
ECSDT     CSP-mv-t     10.2
ECSDT     CSB-s-t      10.2
BMR       LR-t-BMR     11.2
CPS       RF-r         11.6
CST       CSDT-t       12.6
CST       CSLR-t       14.4
ECSDT     CSRF-mv-t    15.2
ECSDT     CSRF-s-t     16
CI        RF-u         17.2
CPS       LR-r         19
BMR       DT-t-BMR     19
CPS       LR-o         21
CPS       DT-r         22.6
CI        LR-u         22.8
CPS       RF-o         22.8
CI        DT-u         24.4
CPS       DT-o         25
CI        DT-t         26
CI        RF-t         26.2

Results
26
Results of the Friedman rank of the savings organized by family

Results
27
Percentage of the highest savings
Database     Algorithm    Savings
Fraud        CSRP-wv-t    0.73
Churn        CSRP-s-t     0.17
Credit1      CSRP-mv-t    0.52
Credit2      LR-t-BMR     0.31
Marketing    LR-t-BMR     0.5

Results within the ECSDT family
28
[Figures: by combination method; by random inducer]

• New framework for ensembles of example-dependent cost-sensitive
decision trees
• Using five databases, from four real-world applications: credit card fraud
detection, churn modeling, credit scoring and direct marketing, we show
that the proposed algorithm significantly outperforms the state-of-the-art
cost-insensitive and example-dependent cost-sensitive algorithms
• Highlight the importance of using the real example-dependent financial
costs associated with the real-world applications
Conclusions
29

Costcla - Software
CostCla is a Python module for cost-sensitive machine learning built on
top of Scikit-Learn and SciPy, and distributed under the 3-Clause BSD license.
In particular, it provides:
• A set of example-dependent cost-sensitive algorithms
• Different real-world example-dependent cost-sensitive datasets.
Installation
pip install costcla
Documentation: https://pythonhosted.org/costcla/
Development: https://github.com/albahnsen/CostSensitiveClassification
30

Thank You!!
Alejandro Correa Bahnsen
