Ensemble Learning:
About Ensemble Learning
AAA-Python Edition
Plan
● 1- Ensemble Learning
● 2- Bagging & Pasting
● 3- Features sampling
● 4- Boosting: Adaptive Boosting
● 5- Boosting: Gradient boosting
● 6- Stacking or Blending
1- Ensemble Learning
Concept
● In Ensemble Learning, we combine several models to build a better model. The algorithm used in Ensemble Learning is called an Ensemble Method.
● We can combine classifiers or regressors.
● The models can all be of the same type, or of different types.
[Diagram: Model 1, Model 2, ..., Model n learn from the data and make predictions for new data; the final prediction is then selected.]
Voting
● It's about how to select the final prediction.
● In Classification:
  ➢ Hard Voting: for each sample, each classifier makes a prediction: a class for that sample.
    ➢ Select the class predicted most often by the classifiers, for that sample.
  ➢ Soft Voting: available when the classifiers can predict class probabilities.
    ● Select the class with the highest averaged probability.
● In Regression:
  ● The final prediction is the average of the predicted values.
Ensemble Methods
● Ensemble methods can vary by:
  ● Whether they combine models of the same type or of different types.
  ● Whether a given sample can be selected only once or several times for the same model.
  ● Whether each model uses all the features or only a subset of the features.
  ● Whether the models learn in parallel or sequentially.
  ● The type of mechanism used to make the final prediction.
Example
[Code screenshot: three models are used in this ensemble method; a voting classifier combines them, and each model is given a string label (e.g. 'dt' for a decision tree). Hard voting was used to determine the predicted classes. A sketch of such a setup follows below.]
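The slide's notebook code is only a screenshot, so here is a minimal sketch of what a hard-voting ensemble could look like with scikit-learn's VotingClassifier. The dataset and the two companion models (logistic regression and SVC) are assumptions for illustration; only the 'dt' decision tree label and the hard-voting choice come from the slide.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each model gets a string label, e.g. 'dt' for the decision tree.
voting_clf = VotingClassifier(
    estimators=[('dt', DecisionTreeClassifier()),
                ('lr', LogisticRegression(max_iter=1000)),   # assumed model
                ('svc', SVC())],                             # assumed model
    voting='hard')   # hard voting: majority class among the predictions

voting_clf.fit(X_train, y_train)
print(voting_clf.predict(X_test[:5]))   # classes determined by hard voting
```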
2- Bagging & Pasting
Definition
● Bagging (or Bootstrap aggregating) and pasting are both ensemble methods that combine models of the same type. They both train the models on different random subsets of the data. All the models run in parallel, and they use the voting mechanism for the final prediction.
● Both Bagging and Pasting apply random sampling:
  ➢ The training set for each model is a randomly selected subset of the original data.
  ➢ The same sample can be found in different models (different subsets).
● The difference is:
  ➢ Bagging: random sampling with replacement <==>
    ➔ One sample can be found several times in the same model (same subset).
  ➢ Pasting: random sampling without replacement <==>
    ➔ One sample can be found only once in the same model (same subset).
Bagging example using scikit-learn
● The data and the BaggingClassifier call are shown as a code screenshot; a sketch follows below. The annotations read:
  ➢ bootstrap=True ==> ensemble method: bagging
  ➢ TheModel2 = SVC ==> all the models are support vector machine classifiers
  ➢ max_samples=90 ==> the size of the subsets (bags) = 90 samples
  ➢ n_jobs=-1 ==> use all the available cores (to compute in parallel)
  ➢ n_estimators=300 ==> use 300 SVCs
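A minimal sketch of that setup, assuming a toy dataset (the slide's data loading is not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Toy data standing in for the slide's dataset.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

bag_clf = BaggingClassifier(
    SVC(),             # TheModel2: every estimator is a support vector classifier
    n_estimators=300,  # 300 SVCs
    max_samples=90,    # each bag contains 90 samples
    bootstrap=True,    # sampling with replacement ==> bagging
    n_jobs=-1,         # use all available cores
    random_state=0)
bag_clf.fit(X, y)
```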
Pasting example using scikit-learn
● bootstrap=False ==> ensemble method: pasting (see the sketch below).
● The remaining parameters are the same as the previous ones.
● The model used is a LogisticRegression classifier.
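A sketch of the pasting variant under the same assumptions as the bagging sketch above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

paste_clf = BaggingClassifier(
    LogisticRegression(max_iter=1000),  # the base model is a LogisticRegression
    n_estimators=300,
    max_samples=90,
    bootstrap=False,   # sampling without replacement ==> pasting
    n_jobs=-1,
    random_state=0)
paste_clf.fit(X, y)
```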
3- Features sampling
Definition
● All the following methods use feature sampling: each model will be trained on a random subset of the features.
● Sampling features can be done with or without replacement.
● Random Patches method:
  ➢ Sampling both training instances and features.
● Random Subspaces method:
  ➢ Keeping all training instances but sampling the features.
Random Patches method
● The data: 75 samples (instances), with 100 features.
● Instances: bagging (sampling with replacement), with number of selected samples = 50 < 75 ==> instance sampling.
● Features: sampling with replacement, with number of selected features = 80 < 100 ==> feature sampling.
(A sketch of this setup follows below.)
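A minimal sketch of the Random Patches setup. The data shape (75 x 100) and the sampling parameters come from the slide; the base estimator and n_estimators are assumptions.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(75, 100)               # 75 samples, 100 features
y = rng.randint(0, 2, size=75)

patches_clf = BaggingClassifier(
    DecisionTreeClassifier(),        # assumed base estimator
    n_estimators=100,                # assumed number of models
    max_samples=50,                  # 50 < 75  ==> instance sampling
    bootstrap=True,                  # instances sampled with replacement (bagging)
    max_features=80,                 # 80 < 100 ==> feature sampling
    bootstrap_features=True,         # features sampled with replacement
    random_state=0)
patches_clf.fit(X, y)
```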
Random Subspaces method
● Features: sampling 80 < 100 features, without replacement.
● Since max_samples = 1.0 (a float value) ==> max_samples = 100% of the training data (100% * 75 = 75).
● Since all samples are used without replacement (bootstrap=False, as in pasting) ==> the instances are not sampled ==> all the training data is used (75 samples).
(A sketch of this setup follows below.)
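A sketch of the Random Subspaces setup under the same assumptions as the Random Patches sketch:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(75, 100)               # 75 samples, 100 features
y = rng.randint(0, 2, size=75)

subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(),        # assumed base estimator
    n_estimators=100,                # assumed number of models
    max_samples=1.0,                 # float 1.0 ==> 100% of the 75 instances
    bootstrap=False,                 # instances are not resampled
    max_features=80,                 # 80 < 100 ==> feature sampling
    bootstrap_features=False,        # features sampled without replacement
    random_state=0)
subspaces_clf.fit(X, y)
```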
4- Boosting: Adaptive Boosting
Definition
● Boosting: an ensemble method that combines several weak learners into a stronger learner.
● This is done by training the models sequentially ==> each model corrects (boosts) its predecessor.
● The same type of model is used on the same data each time.
● The best known boosting methods are Adaptive Boosting and Gradient Boosting.
  ➢ Adaptive Boosting: each new predictor focuses on the training samples that its predecessor underfitted (for example, misclassified in a classification problem), by modifying the instance weights.
  ➢ Gradient Boosting: the new predictor tries to fit the residual errors made by the previous predictor.
AdaBoost: training
● Weighting samples ==> each sample's contribution will be multiplied by its weight.
● AdaBoost is applicable to binary classification.
● The steps of the algorithm are as follows (a sketch is given after this list):
  ➢ Initialize the sample weights w_i (for the first predictor) to 1/m, where m is the number of training samples.
  ➢ For each predictor j, compute:
    ➔ the weighted error rate: r_j = (∑ w_i where the prediction is wrong) / (∑ w_i over all samples)
    ➔ the j-th predictor's weight: α_j = η log((1 − r_j) / r_j), where η is the learning rate parameter.
    ➔ the new weights (to be used by the following predictor j+1):
        w_i = w_i             if y_true(i) = y_pred(i)
        w_i = w_i exp(α_j)    if y_true(i) ≠ y_pred(i)
    ➔ Normalize the new weights: w_i = w_i / ∑ w_i.
  ➢ The process is repeated until a perfect predictor is found, or the maximum number of predictors is reached.
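A minimal NumPy sketch of this training loop, using decision stumps as the weak learners. It illustrates the formulas above; it is not scikit-learn's internal implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, n_estimators=5, eta=1.0):
    m = len(X)
    w = np.full(m, 1.0 / m)               # initialize w_i = 1/m
    stumps, alphas = [], []
    for j in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)  # train on the weighted samples
        wrong = stump.predict(X) != y
        r_j = w[wrong].sum() / w.sum()    # weighted error rate
        if r_j == 0:                      # perfect predictor found
            stumps.append(stump)
            alphas.append(1.0)
            break
        alpha_j = eta * np.log((1 - r_j) / r_j)       # predictor's weight
        w = np.where(wrong, w * np.exp(alpha_j), w)   # boost misclassified samples
        w = w / w.sum()                   # normalize the new weights
        stumps.append(stump)
        alphas.append(alpha_j)
    return stumps, alphas
```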
AdaBoost: Predicting
➢ To make a prediction (a short sketch follows):
  ➔ make a prediction with each predictor j among the resulting N predictors.
  ➔ weight each prediction by the predictor's weight α_j.
  ➔ for each sample x, select the class k that receives the majority of weighted votes: for each predicted class k, sum up the corresponding α_j weights, then select the class k with the biggest sum.
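A continuation of the training sketch above, implementing the weighted vote (binary labels 0/1 are assumed):

```python
def adaboost_predict(X, stumps, alphas, classes=(0, 1)):
    votes = np.zeros((len(X), len(classes)))
    for stump, alpha in zip(stumps, alphas):
        pred = stump.predict(X)
        for k, c in enumerate(classes):
            votes[pred == c, k] += alpha               # add alpha_j to the voted class
    return np.asarray(classes)[votes.argmax(axis=1)]   # class with the biggest sum
```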
AdaBoost: SAMME
● SAMME: Stagewise Additive Modeling using a Multi-class Exponential loss function.
● An enhanced version of AdaBoost, applicable to multiclass classification.
● Same steps as AdaBoost; only the α weight is computed differently:
  α_j = η (log((1 − r_j) / r_j) + log(K − 1)), where K is the number of classes.
Example
➢ A weak classifier: a decision tree with one level (two leaves, i.e. the split of the root node). This tree is called a Decision Stump. (A sketch follows below.)
➢ In scikit-learn, to apply adaptive boosting to regression, the weights are adjusted according to the error of the predictions.
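A minimal scikit-learn sketch of AdaBoost with decision stumps; the dataset and the hyper-parameter values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # a decision stump: a single split
    n_estimators=200,                     # assumed value
    learning_rate=0.5,                    # the eta parameter in the formulas above
    random_state=0)
ada_clf.fit(X, y)
```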
5- Boosting: Gradient boosting
The concept
1. The predictor is first trained on a set of data: x, y.
2. The residual errors are computed from its predictions: r = y − y_pred.
3. A new predictor is trained on the new set of data: x, r.
4. The residual errors are computed again: r2 = r − r_pred.
5. Steps 3 and 4 are repeated until: you have used the whole predefined number of predictors; or you determine the optimal sequence of consecutive predictors (the one with the least generated error) and select those predictors as your final model; or you keep adding predictors until the errors no longer diminish.
6. The final prediction is the sum of all the predictions.
● Scikit-learn implements gradient tree boosting: the models used are decision trees. (A manual sketch of the residual-fitting idea follows.)
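A manual sketch of steps 1-6 with three decision tree regressors, on an assumed toy 1-D regression problem:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 6
y = np.sin(X).ravel() + rng.randn(100) * 0.1

tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
r = y - tree1.predict(X)                    # first residual errors
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, r)
r2 = r - tree2.predict(X)                   # second residual errors
tree3 = DecisionTreeRegressor(max_depth=2).fit(X, r2)

# The final prediction is the sum of all the predictions.
X_new = np.array([[2.5]])
y_hat = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
print(y_hat)
```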
Example: the data
● Boston House Prices dataset.
● The chosen feature represents the % lower status of the population.
Example: using a GBRT
● GBRT stands for Gradient Boosted Regression Trees. (A sketch follows below.)
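A minimal GradientBoostingRegressor sketch. The slides use the Boston House Prices data; since load_boston has been removed from recent scikit-learn releases, a synthetic single-feature dataset stands in here, and the hyper-parameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 1) * 30            # stand-in for the "% lower status" feature
y = 40 - X.ravel() + rng.randn(200) * 2

gbrt = GradientBoostingRegressor(
    n_estimators=100,    # trees added sequentially, each fitting the residuals
    learning_rate=0.1,
    max_depth=2,
    random_state=0)
gbrt.fit(X, y)
print(gbrt.predict([[10.0]]))
```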
6- Stacking or blending
Concept
● The idea here is to train a model that learns how to aggregate the ensemble models' predictions.
● The method is composed of:
  ➢ Learner models: they fit to the data and make the predictions.
  ➢ Blender: the final model, or meta learner, that makes the final prediction.
● There are different methods to train the blender:
  ● Hold-out set: Blending
  ● Out-of-fold: Stacking
Hold-out set: principle
● An example using 3 predictors and 1 blender (2 layers).
[Diagram: Layer 1 (the three predictors) is trained, then predicts on the held-out set; the predicted values (sets 1, 2 and 3) are used to train the blender in Layer 2.]
● To make a prediction, the new instance first goes through the first layer.
● The resulting predictions serve as input for the second layer.
● The prediction made by this latter one is the final result.
Hold-out set: Training
[Code screenshot: the training set is split into two subsets, xr1 for training and xr2 for predicting. The first level is trained on xr1 and then predicts on xr2; those predictions generate the new features used to train the blender. A sketch follows below.]
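A minimal sketch of this hold-out (blending) training procedure. The subset names xr1/xr2 follow the slide; the dataset and the choice of first-level models and blender are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Split the training set: xr1 to train the first level, xr2 held out.
xr1, xr2, yr1, yr2 = train_test_split(X_train, y_train, test_size=0.5,
                                      random_state=0)

# Train the first level (three predictors) on xr1.
level1 = [DecisionTreeClassifier(random_state=0),
          SVC(random_state=0),
          LogisticRegression(max_iter=1000)]
for model in level1:
    model.fit(xr1, yr1)

# Predict on the held-out subset xr2: these predictions are the new features.
blend_features = np.column_stack([m.predict(xr2) for m in level1])

# Train the blender (meta learner) on the new features.
blender = LogisticRegression()
blender.fit(blend_features, yr2)
```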
Hold-out set: Testing
● The test data is used by the first level to predict values that are then used as features by the blender.
● The final prediction is made by the blender. (The sketch continues below.)
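Continuation of the sketch above for the test phase:

```python
# The first level predicts on the test data; its predictions become the
# blender's input features for the final prediction.
test_features = np.column_stack([m.predict(X_test) for m in level1])
y_final = blender.predict(test_features)
print("blended accuracy:", (y_final == y_test).mean())
```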
Hold-out set: generalization
● It's possible to train several layers of blenders, and each one can be a set of models.
● The idea is to divide the original training set into several subsets: n subsets ==> n layers (n−1 blending phases).
● The first set of predictors trains on the first subset and makes predictions on the second one.
● The second set of predictors trains on those predictions, and then makes new predictions using the third subset.
● The process is repeated until the last set of predictors: it trains on the last predictions made by the previous predictors using the last subset of data.
[Diagram: Layer 1, Layer 2, Layer 3, ..., Layer n, built from the n subsets.]
References
● Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Inc., 2017.
● Scikit-learn.org. scikit-learn: machine learning in Python. Online at https://scikit-learn.org/stable/. Accessed on 03-11-2018.
Thank you!
FOR ALL YOUR TIME

PPTX
Ensemble methods in machine learning
PPTX
Ensemble hybrid learning technique
PPTX
(Machine Learning) Ensemble learning
PPTX
Machine Learning - Ensemble Methods
PPTX
Ensemble methods
PPTX
Ensemble learning
PDF
Understanding Bagging and Boosting
PPTX
Ensemble learning Techniques
Ensemble methods in machine learning
Ensemble hybrid learning technique
(Machine Learning) Ensemble learning
Machine Learning - Ensemble Methods
Ensemble methods
Ensemble learning
Understanding Bagging and Boosting
Ensemble learning Techniques

What's hot (20)

PPTX
Ensemble methods
PPTX
Ensemble learning
PPTX
Bag the model with bagging
PPTX
boosting algorithm
PPTX
Machine learning with ADA Boost
PDF
Ensemble modeling and Machine Learning
PPT
Learning On The Border:Active Learning in Imbalanced classification Data
PDF
Boosting Algorithms Omar Odibat
PPTX
Supervised Machine Learning in R
PPTX
Boosting Approach to Solving Machine Learning Problems
PDF
Classification
PPSX
ADABoost classifier
PDF
Introduction to Some Tree based Learning Method
PPTX
Lecture 6: Ensemble Methods
PDF
Machine Learning and Data Mining: 16 Classifiers Ensembles
PPTX
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
PPTX
Binary Class and Multi Class Strategies for Machine Learning
PPTX
Borderline Smote
PDF
L4. Ensembles of Decision Trees
PPTX
Presentation on supervised learning
Ensemble methods
Ensemble learning
Bag the model with bagging
boosting algorithm
Machine learning with ADA Boost
Ensemble modeling and Machine Learning
Learning On The Border:Active Learning in Imbalanced classification Data
Boosting Algorithms Omar Odibat
Supervised Machine Learning in R
Boosting Approach to Solving Machine Learning Problems
Classification
ADABoost classifier
Introduction to Some Tree based Learning Method
Lecture 6: Ensemble Methods
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Binary Class and Multi Class Strategies for Machine Learning
Borderline Smote
L4. Ensembles of Decision Trees
Presentation on supervised learning
Ad

Similar to Aaa ped-14-Ensemble Learning: About Ensemble Learning (20)

PPTX
AIML UNIT 4.pptx. IT contains syllabus and full subject
PDF
Supervised Learning Ensemble Techniques Machine Learning
PPTX
Unit V -Multiple Learners.pptx for artificial intelligence
PPTX
Unit V -Multiple Learners in artificial intelligence and machine learning
PPT
Lecture -8 Classification(AdaBoost) .ppt
PPT
Ensemble Learning in Machine Learning.ppt
PPT
INTRODUCTION TO BOOSTING.ppt
PDF
DMTM 2015 - 15 Classification Ensembles
PDF
BaggingBoosting.pdf
PDF
Complete picture of Ensemble-Learning, boosting, bagging
PPTX
Learn from Example and Learn Probabilistic Model
PDF
Ensemble Learning Notes for students of CS
PPT
Ensemble Learning bagging, boosting and stacking
PPT
Ensemble_Learning.ppt
PPT
Ensemble Learning bagging and boosting in ML
PPT
Ensemble_Learning_AND_ITS_TECHNIQUES.ppt
PPTX
Ensemble Learning.pptx machine learning1
PPTX
Gradient Boosted trees
PDF
dm1.pdf
PPT
Ensemble Learning Featuring the Netflix Prize Competition and ...
AIML UNIT 4.pptx. IT contains syllabus and full subject
Supervised Learning Ensemble Techniques Machine Learning
Unit V -Multiple Learners.pptx for artificial intelligence
Unit V -Multiple Learners in artificial intelligence and machine learning
Lecture -8 Classification(AdaBoost) .ppt
Ensemble Learning in Machine Learning.ppt
INTRODUCTION TO BOOSTING.ppt
DMTM 2015 - 15 Classification Ensembles
BaggingBoosting.pdf
Complete picture of Ensemble-Learning, boosting, bagging
Learn from Example and Learn Probabilistic Model
Ensemble Learning Notes for students of CS
Ensemble Learning bagging, boosting and stacking
Ensemble_Learning.ppt
Ensemble Learning bagging and boosting in ML
Ensemble_Learning_AND_ITS_TECHNIQUES.ppt
Ensemble Learning.pptx machine learning1
Gradient Boosted trees
dm1.pdf
Ensemble Learning Featuring the Netflix Prize Competition and ...
Ad

More from AminaRepo (20)

PDF
Aaa ped-23-Artificial Neural Network: Keras and Tensorfow
PDF
Aaa ped-22-Artificial Neural Network: Introduction to ANN
PDF
Aaa ped-21-Recommender Systems: Content-based Filtering
PDF
Aaa ped-20-Recommender Systems: Model-based collaborative filtering
PDF
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
PDF
Aaa ped-18-Unsupervised Learning: Association Rule Learning
PDF
Aaa ped-17-Unsupervised Learning: Dimensionality reduction
PDF
Aaa ped-16-Unsupervised Learning: clustering
PDF
Aaa ped-15-Ensemble Learning: Random Forests
PDF
Aaa ped-12-Supervised Learning: Support Vector Machines & Naive Bayes Classifer
PDF
Aaa ped-11-Supervised Learning: Multivariable Regressor & Classifers
PDF
Aaa ped-10-Supervised Learning: Introduction to Supervised Learning
PDF
Aaa ped-9-Data manipulation: Time Series & Geographical visualization
PDF
Aaa ped-Data-8- manipulation: Plotting and Visualization
PDF
Aaa ped-8- Data manipulation: Data wrangling, aggregation, and group operations
PDF
Aaa ped-6-Data manipulation: Data Files, and Data Cleaning & Preparation
PDF
Aaa ped-5-Data manipulation: Pandas
PDF
Aaa ped-4- Data manipulation: Numpy
PDF
Aaa ped-3. Pythond: advanced concepts
PDF
Aaa ped-2- Python: Basics
Aaa ped-23-Artificial Neural Network: Keras and Tensorfow
Aaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-20-Recommender Systems: Model-based collaborative filtering
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-18-Unsupervised Learning: Association Rule Learning
Aaa ped-17-Unsupervised Learning: Dimensionality reduction
Aaa ped-16-Unsupervised Learning: clustering
Aaa ped-15-Ensemble Learning: Random Forests
Aaa ped-12-Supervised Learning: Support Vector Machines & Naive Bayes Classifer
Aaa ped-11-Supervised Learning: Multivariable Regressor & Classifers
Aaa ped-10-Supervised Learning: Introduction to Supervised Learning
Aaa ped-9-Data manipulation: Time Series & Geographical visualization
Aaa ped-Data-8- manipulation: Plotting and Visualization
Aaa ped-8- Data manipulation: Data wrangling, aggregation, and group operations
Aaa ped-6-Data manipulation: Data Files, and Data Cleaning & Preparation
Aaa ped-5-Data manipulation: Pandas
Aaa ped-4- Data manipulation: Numpy
Aaa ped-3. Pythond: advanced concepts
Aaa ped-2- Python: Basics

Recently uploaded (20)

PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PPT
protein biochemistry.ppt for university classes
PPTX
Introduction to Fisheries Biotechnology_Lesson 1.pptx
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PPTX
2. Earth - The Living Planet Module 2ELS
PDF
Sciences of Europe No 170 (2025)
PPTX
2. Earth - The Living Planet earth and life
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
PPTX
BIOMOLECULES PPT........................
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
PDF
. Radiology Case Scenariosssssssssssssss
AlphaEarth Foundations and the Satellite Embedding dataset
protein biochemistry.ppt for university classes
Introduction to Fisheries Biotechnology_Lesson 1.pptx
POSITIONING IN OPERATION THEATRE ROOM.ppt
Taita Taveta Laboratory Technician Workshop Presentation.pptx
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
2. Earth - The Living Planet Module 2ELS
Sciences of Europe No 170 (2025)
2. Earth - The Living Planet earth and life
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
BIOMOLECULES PPT........................
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
Classification Systems_TAXONOMY_SCIENCE8.pptx
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
. Radiology Case Scenariosssssssssssssss

Aaa ped-14-Ensemble Learning: About Ensemble Learning

  • 1. Ensemble Learning: About Ensemble Learning AAA-Python Edition
  • 2. Plan ● 1- Ensemble Learning ● 2- Bagging & Pasting ● 3- Features sampling ● 4- Boosting : Adaptive Boosting ● 5- Boosting: Gradient boosting ● 6- Stacking or Blending
  • 3. 3 1-EnsembleLearning [By Amina Delali] ConceptConcept ● In Ensemble Learning, we combine several models to build a better model. The algorithm used in Ensemble learning is called: an Ensemble Method. ● We can combine classifers or regressors. ● The models can be all the same type, or diferent. Model 1 Model 2 ... Model n Learning from data Making predictions for new data Select the final prediction
  • 4. 4 1-EnsembleLearning [By Amina Delali] VotingVoting ● Its about: how to select the fnal prediction. ● In Classifcation ➢ Hard Voting: For each sample, a classifer will make a prediction : a class for that sample ➢ Select the most predicted class by all the classifers, for that sample . ➢ Soft Voting: available when the classifers can predict class probabilities. ● Select the class with the highest averaged probability ● In Regression ● The average of the predicted values.
  • 5. 5 1-EnsembleLearning [By Amina Delali] Ensemble MethodsEnsemble Methods ● The ensemble methods can vary by: ● Varying or not the types of the models: use the same or different models. ● Select the same sample only once or several times in the same model. ● Whether or not the model use all the features or only a subset of features. ● The models learn in parallel or sequentially. ● The type of mechanism used to make a prediction.
  • 6. 6 1-EnsembleLearning [By Amina Delali] ExampleExample The 3 models used in this ensemble method The will combine the models A string representing the type of the model: ‘dt’ for : decision tree Hard voting was used to determine these classes
  • 7. 7 2-Bagging&Pasting [By Amina Delali] DefnitionDefnition ● Bagging (or Bootstrap aggregating) and pasting are both ensemble methods that combine same type of models. They both train the models on diferent random sub sets. All the models run in parallel. They use the voting mechanism for the fnal prediction ● Both Bagging and Basting apply random sampling : ➢ The training set for each model is a subset of the original data randomly selected. ➢ The same sample can be found in different models (different subsets). ➢ ● The difference is: ➢ Bagging : random sampling with replacement <==> ➔ One sample can be found several times in the same model (same subset). ➢ Pasting: random sampling without replacement <==> ➔ One sample can be found only once in the same model (same subset).
  • 8. 8 2-Bagging&Pasting [By Amina Delali] Bagging example using scikit-learnBagging example using scikit-learn ● The data Bootstrap =True ==> Ensemble method : bagging TheModel2 = SVC == > all the models are support vector machine classifiers max_samples=90 ==> the size of the subsets (bags) == 90 sample n_jobs=-1 ==> use all the available cores (to compute in parallel) n_estimators= 300 ==> use 300 SVC
  • 9. 9 2-Bagging&Pasting [By Amina Delali] Pasting example using sckit-learnPasting example using sckit-learn ● Bootstrap =False ==> Ensemble method : pasting The remaining parameter are the same as the previous ones The model used is a LogisticRegression classifier
  • 10. 10 3-Featuressampling [By Amina Delali] DefnitionDefnition ● All the following methods use features sampling: each model will be trained in a random subset of features. ● Sampling features can be with or without replacement. ● Random patches method ➢ Sampling both “training instances” and “features” ● Random subspaces method ➢ Keeping all “training” instances but “sampling” features
  • 11. 11 3-Featuressampling [By Amina Delali] Random Patches methodRandom Patches method ● The data: 75 samples (instance), with 100 features Sampling features with replacement Bagging Number of selected features = 80 <100 features==> features sampling Number of selected samples = 50 < 75 ==> instances sampling
  • 12. 12 3-Featuressampling [By Amina Delali] Random Subspaces methodRandom Subspaces method ● Sampling features : 80 < 100 Pasting Features sampling without replacement Since max_samples = 1.0 (a float value)==> max_samples =100% of the training data (100% *75 = 75) Since all samples are used without replacement ==> the instances are not sampled ==> all the training data is used (75 samples)
  • 13. 13 4-Boosting: AdaptiveBoosting [By Amina Delali] DefnitionDefnition ● Boosting : Ensemble method, that combines several weak learners into a stronger learner. ● This is done by training the models sequentially ==> each model correct (boost) its predecessor. ● Uses the same models on the same data each time. ● The most known boosting methods are: Adaptive Boosting and Gradient Boosting. ➢ Adaptive Boosting: each new predictor focus on the training samples that its predecessor underftted ( for example: misclassifed in a classifcation problem) by modifying the instances weight . ➢ Gradient Boosting: the new predictor tries to ft to the residual errors made by the previous predictor.
  • 14. 14 4-Boosting: AdaptiveBoosting [By Amina Delali] AdaBoost: trainingAdaBoost: training ● Weighting samples ==> each sample value will be multiplied by its weight. ● AdaBoost is Applicable in binary classifcation. ● The steps of the algorithm are as follow: ➢ initialize the samples weight wi (for the frst predictor ) by 1/m. m is the number of the training samples. ➢ for each predictor j compute: ➔ the weighted error rate : rj =∑wi (whre the prediction is wrong) / ∑wi ➔ compute the j predictor's weight: αj =ηlog(1−rj) / rj ). η is the learning rate parameter. ➔ Compute the new weights (to be used by the following new predictor j+1) : wi =wi ,if,ytrue (i)=ypred (i) wi =wi exp(αj ),if,ytrue (i)≠ypred (i) ➔ Normalize the new weights wi by: wi =1/∑wi ➔ The process is repeated until the perfect predictor is found, or the maximum number of predictors is reached. ●
  • 15. 15 4-Boosting: AdaptiveBoosting [By Amina Delali] AdaBoost: PredictingAdaBoost: Predicting ➢ To make a prediction: ➔ make a prediction with each predictor j from the resulting N predictors. ➔ attribute a weight to each prediction by the predictor's j weight αj ➔ for each sample x select the class k that receives the majority of weighted votes: for each predicted class k sum up the corresponding αj weights, then select the class k with the biggest sum. AdaBoost: SAMMEAdaBoost: SAMME ● SAMME : Stagewise Additive Modeling using a Multi-class Exponential loss function ● Enhanced version of AdaBoost, applicable in multiclass classification. ● Same steps as AdaBoost, just the α weight is computed differently: αj =η (log[(1−∗ rj) /rj] + log(K−1)). K is the number of classes.
  • 16. 16 4-Boosting: AdaptiveBoosting [By Amina Delali] ExampleExample ➢ A weak classifier : a decision tree with 1 level: the 2 leafs, the split of the root node. This Tree is called a : Decision Stump In sckit-learn to apply the adaptive boosting to regression, the weights are adjusted according to the error of the predictions
  • 17. 17 5-Boosting: Gradientboosting [By Amina Delali] The conceptThe concept 1.The predictor will be frst trained on a set of data: x,y 2.The residual errors are computed from its prediction: r = y - ypred 3.A new predictor will be trained with the new set of data: x,r 4.The residual errors are computed again as follow: r2 = r - rpred 5.The steps 3 and 4 are repeated until: you predict using all the predefned number of predictors, or you determine the optimal consecutive predictors (the least generated error) and you select those predictors as your fnal model. Or, you continue adding predictors until the errors will not diminish 6. The fnal prediction will be the sum of all the predictions. ● Scikit learn implements gradient tree boosting: the models used are decision trees.
  • 18. 18 5-Boosting: Gradientboosting [By Amina Delali] Example: the dataExample: the data ● ● Boston House Prices dataset. ● The chosen feature represents the: % lower status of the population
  • 19. 19 5-Boosting: Gradientboosting [By Amina Delali] Example: using a GBRTExample: using a GBRT ● GBRT for Gradient Boosted Regression Trees
  • 20. 20 6-Stackingorblending [By Amina Delali] ConceptConcept ● The idea here, is to train a model to learn how to aggregate the ensemble models predictions. ● The method is composed of: ➢ Learner models : that will ft to the data, and make the predictions. ➢ Blender: the fnal model or meta learner, that will make the fnal prediction. ● There are different methods to train the blender: ● Hold-out set: Blending ● Out-of-fold: Stacking
  • 21. 21 6-Stackingorblending [By Amina Delali] Hold-out set: principleHold-out set: principle ● An example using 3 Predictors and 1 blender (2 layers) predicting Held-out set Predicted Values: Set 1 Blender Training Predicted Values: Set 3 Training Predicted Values: Set 2 New values Final prediction ● To make a prediction, the new instance will go through the first layer. ● The resulting predictions will serve as input for the second layer. ● The prediction made by this later one is the final result. Layer 1 Layer 2
  • 22. 22 6-Stackingorblending [By Amina Delali] Hold-out set: TrainingHold-out set: Training Subset xr1 for training Subset xr2 for predicting Training the first level Predicting with the first level Generate the new features Training the blender
  • 23. 23 6-Stackingorblending [By Amina Delali] Hold-out set: TestingHold-out set: Testing The test data will be used by the first level to predict the values == features used by later by the blender The final prediciton, will be done by the blender.
  • 24. 24 6-Stackingorblending [By Amina Delali] Hold-out set: generalizationHold-out set: generalization ● Its possible to train several type of blenders. And , each one can be a set of models. ● The idea is to divide the original training set into several subsets: n subsets ==> n layers (n-1 blending phase) ● The frst set of predictors will train from the frst subset, and make prediction with the second one. ● The second set of predictors will train from the previous predictions. And then make new predictions using the third subset. ● The process is repeated until the last subset of predictors: it will train from the last predictions made by the previous predictors using the last subset of data. Layer 1 …. Layer 2 Layer 3 ... Layer n n subsets
  • 25. References ● Aurélien Géron. Hands-on machine learning with Scikit-Learn and Tensor-Flow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media, Inc, 2017. ● Scikit-learn.org. scikit-learn, machine learning in python. On-line at https://scikit-learn.org/stable/. Accessed on 03-11-2018.