Supervised Learning
Understanding Bagging and Boosting
Both are ensemble techniques, in which a set of weak learners is combined to create a strong learner that obtains better performance than any single one.
Error = Bias + Variance + Noise
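For squared error, this shorthand corresponds to the standard decomposition of expected prediction error; spelled out (a standard textbook form, not given in the deck):

\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Noise}}

Bagging attacks the variance term, boosting mainly the bias term; the noise term is irreducible.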
Bagging, short for Bootstrap Aggregating
It's a way to increase accuracy by decreasing variance.
Done by generating additional datasets through sampling with replacement (combinations with repetition), producing multisets of the same cardinality/size as the original dataset.
Example: Random Forest
Develops fully grown decision trees (low bias, high variance) that are kept uncorrelated to maximize the decrease in variance.
Since bagging cannot reduce bias, it requires large, unpruned trees.
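A minimal sketch of the idea with scikit-learn, on a synthetic placeholder dataset (names and numbers are illustrative, not from the deck): BaggingClassifier draws bootstrap multisets of the original size and aggregates unpruned trees, and RandomForestClassifier adds per-split feature sampling to decorrelate them.

# Minimal bagging sketch (illustrative placeholder data, not the deck's dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Plain bagging: bootstrap multisets of the same size as the original data,
# each fitted with the default base learner (an unpruned decision tree:
# low bias, high variance); predictions are aggregated by majority vote.
bagging = BaggingClassifier(n_estimators=200, max_samples=1.0,
                            bootstrap=True, random_state=0)

# Random Forest: same idea, plus a random subset of features at each split,
# which decorrelates the trees and increases the variance reduction.
forest = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)

for name, model in (("bagging", bagging), ("random forest", forest)):
    print(name, cross_val_score(model, X, y, cv=5).mean())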
Boosting
It's a way to increase accuracy by reducing bias.
A 2-step process:
Develop averagely performing models over subsets of the original data.
Boost their performance by combining them using a cost function (e.g., majority vote).
Note: every subset contains the elements that were misclassified (or nearly misclassified) by the previous model.
Example: Gradient Boosted Trees
Develops shallow decision trees (high bias, low variance), a.k.a. weak learners.
Reduces error mainly by reducing bias, developing each new learner taking the previous learner into account (sequentially).
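A comparable sketch for boosting, again on placeholder data: GradientBoostingClassifier fits shallow trees sequentially, each one on the gradients (shortcomings) of the ensemble built so far, and staged_predict exposes that sequential improvement.

# Minimal gradient-boosting sketch (illustrative placeholder data).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Shallow trees = weak learners (high bias, low variance); each new tree is
# fit to the gradients of the loss of the ensemble built so far (sequential).
gbt = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                 learning_rate=0.05, random_state=0)
gbt.fit(X_tr, y_tr)

# Accuracy as trees are added: bias falls as the sequence grows.
for i, y_pred in enumerate(gbt.staged_predict(X_te), start=1):
    if i % 100 == 0:
        print(i, "trees:", accuracy_score(y_te, y_pred))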
Understanding Graphically
(Figures: graphical illustrations of bagging vs. boosting, not reproduced in this text.)
Comparison: Similarities and Differences
Both are ensemble methods to get N learners from 1 learner...
... but while they are built independently for Bagging, Boosting tries to add new models that do well where previous models fail.
Both generate several training datasets by random sampling...
... but only Boosting determines weights for the data, to tip the scales in favor of the most difficult cases.
Both make the final decision by averaging the N learners (or taking a majority vote)...
... but it is an equally weighted average for Bagging and a weighted average for Boosting, with more weight given to the learners that perform better on the training data.
Both are good at reducing variance and provide higher stability...
... but only Boosting tries to reduce bias. On the other hand, Bagging may solve the overfitting problem, while Boosting can increase it.
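To make the weighting difference concrete, a small sketch under the same placeholder-data assumption: bagging counts every member equally, while AdaBoost (used here as a stand-in for boosting) learns a weight per member and per sample.

# Equal-weight vs. performance-weighted voting (illustrative data).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

bag = BaggingClassifier(n_estimators=100, random_state=0)   # equal vote per learner
ada = AdaBoostClassifier(n_estimators=100, random_state=0)  # vote weighted by training performance

for name, model in (("bagging", bag), ("adaboost", ada)):
    print(name, cross_val_score(model, X, y, cv=5).mean())

# Boosting exposes the per-learner weights it computed; bagging has no
# analogue, because every member contributes with the same weight.
print(ada.fit(X, y).estimator_weights_[:5])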
Exploring the Scope of Supervised Learning in the Current Setup
Areas where Supervised Learning can be useful:
Feature Selection for Clustering
Evaluating Features
Increasing the Aggressiveness of the Current Setup
Bringing New Rules Idea
Feature Selection / Feature Importance & Model Accuracy and Threshold Evaluation
Algorithm Used | Feature Importance Metric
XGBoost | F Score
Random Forest | Gini Index, Entropy
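A hedged sketch of how those metrics can be read out in code, assuming xgboost and scikit-learn are available and using a synthetic placeholder dataset (feature names are illustrative): the F score is XGBoost's split-count ("weight") importance, while the Random Forest numbers are impurity-based importances under the chosen criterion.

# Pulling both kinds of importance scores (illustrative data and names).
import pandas as pd
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=15, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]

# XGBoost F score: how many times each feature is used in a split, summed over trees.
xgb_model = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)
f_score = xgb_model.get_booster().get_score(importance_type="weight")

# Random Forest: mean impurity decrease, with Gini or entropy as the impurity.
rf = {crit: RandomForestClassifier(n_estimators=200, criterion=crit,
                                   random_state=0).fit(X, y)
      for crit in ("gini", "entropy")}

importances = pd.DataFrame(
    {crit: model.feature_importances_ for crit, model in rf.items()},
    index=feature_names,
)
print(importances.sort_values("gini", ascending=False).head(15))
print(sorted(f_score.items(), key=lambda kv: kv[1], reverse=True)[:15])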
Feature Selection / Importance: XGBoost - F Score (importance chart)
Feature Selection / Importance: RF - Gini Index (importance chart)
Feature Selection / Importance: RF - Entropy (importance chart)
Feature Selection / Importance
Comparison between important features by Random Forest & XGBoost
Analysis of the top 15 important variables, as ranked by each model:

XGBoost - F Score: feature_1a2, feature_2c3, feature_hhs, feature_nrp, feature_urh, feature_nub, feature_nup, feature_psc, feature_sncp, feature_3e1, feature_tpa, feature_snc, feature_bst, feature_tbu, feature_nub
RF - Gini: feature_sut, feature_sc3, feature_21w, feature_sc18, feature_du1, feature_sc1, feature_drh, feature_drl, feature_1a2, feature_snc, feature_npb, feature_3e1, feature_tbu, feature_nub, feature_bst
RF - Entropy: feature_21w, feature_sut, feature_du1, feature_sc3, feature_drh, feature_1a2, feature_sc18, feature_drl, feature_snc, feature_sc1, feature_2c3, feature_npb, feature_3e1, feature_bst, feature_nub
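One way to summarize the comparison above is the overlap between the top-15 sets; a tiny sketch (the lists here are hypothetical stand-ins, not the real rankings):

# Overlap between top-15 feature rankings (placeholder lists, not the real ones).
top15 = {
    "xgb_f_score": ["feature_a", "feature_b", "feature_c"],  # ...truncated for brevity
    "rf_gini": ["feature_b", "feature_c", "feature_d"],
    "rf_entropy": ["feature_b", "feature_d", "feature_e"],
}

names = list(top15)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        shared = set(top15[a]) & set(top15[b])
        print(f"{a} vs {b}: {len(shared)} shared -> {sorted(shared)}")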
Feature Selection / Importance
Comparison between important features by Random Forest & XGBoost
Reason for the difference in feature importance between XGBoost & RF
When there are several correlated features, boosting tends to choose one of them and use it in several trees (if necessary). The other correlated features won't be used much (or at all): they can no longer help in the split process, because they bring no new information beyond the feature already used, and the learning is done serially.
Each tree of a Random Forest, by contrast, is not built from the same features (there is a random selection of features for each tree), so each correlated feature gets a chance to be selected in one of the trees. Looking at the whole model, it has therefore used all of the features. The learning is done in parallel, so no tree is aware of what the other trees have used.
Tree growth in XGBoost
When you grow too many trees, the trees start to look very similar (once there is little loss left to learn), so the dominant feature becomes even more important. Shallow trees reinforce this trend, because only a few features can appear at the root of a tree (and the features shared between trees are most often the ones at the root). So these results are not surprising.
In this case, random selection of columns (a rate around 0.8) may give interesting results. Decreasing eta (the learning rate) may also help, since it keeps more loss to explain after each iteration.
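These two remedies map, as far as I can tell (an assumption about the tooling behind the deck), onto XGBoost's column-subsampling and learning-rate parameters; a short illustrative sketch:

# Spreading splits across correlated features in XGBoost (illustrative values).
import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=500,
    max_depth=4,
    colsample_bytree=0.8,  # "random selection of columns (rate around 0.8)"
    learning_rate=0.05,    # eta; lower than the 0.3 default, keeps loss to explain
    random_state=0,
)
# model.fit(X_train, y_train)  # X_train / y_train: your own data (not defined here)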
Model Accuracy and Threshold Evaluation: XGBoost
(Figures: class A vs. class B score distributions at successive probability thresholds.)
Model Accuracy and Threshold Evaluation: XGBoost
Threshold Accuracy TN FP FN TP
0 0.059% 0 46990 0 2936
0.1 87.353% 42229 4761 1553 1383
0.2 93.881% 46075 915 2140 796
0.3 94.722% 46691 299 2336 600
0.4 94.894% 46866 124 2425 511
0.5 94.902% 46923 67 2478 458
0.6 94.866% 46956 34 2529 407
0.7 94.856% 46973 17 2551 385
0.8 94.824% 46977 13 2571 365
0.9 94.776% 46982 8 2600 336
1 94.119% 46990 0 2936 0
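A sketch of how a table like this can be generated from predicted probabilities; everything below is a synthetic stand-in (roughly 94% negative class, to mimic the imbalance), so the counts will not match the deck's.

# Accuracy / confusion-matrix sweep over probability thresholds (illustrative).
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50000, weights=[0.94], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

print("Threshold  Accuracy     TN     FP     FN     TP")
for t in np.linspace(0.0, 1.0, 11):
    pred = (proba >= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()
    print(f"{t:9.1f} {(tn + tp) / len(y_te):9.3%} {tn:6d} {fp:6d} {fn:6d} {tp:6d}")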
Model Accuracy and Threshold Evaluation
Random Forest, Criterion: Gini Index vs. Random Forest, Criterion: Entropy
Criteria Accuracy TN FP FN TP
Gini 94.800% 46968 22 2574 362
Entropy 94.788% 46967 23 2579 357
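The Gini-vs-entropy rows can be produced the same way, at the forest's default majority-vote decision; again a sketch on placeholder data, not a reproduction of the deck's numbers.

# Random Forest: Gini vs. entropy split criterion (illustrative data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50000, weights=[0.94], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

print("Criteria   Accuracy     TN     FP     FN     TP")
for criterion in ("gini", "entropy"):
    rf = RandomForestClassifier(n_estimators=200, criterion=criterion,
                                random_state=0).fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, rf.predict(X_te), labels=[0, 1]).ravel()
    print(f"{criterion:9s} {(tn + tp) / len(y_te):9.3%} {tn:6d} {fp:6d} {fn:6d} {tp:6d}")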
Model Accuracy and Threshold Evaluation
Comparison between Random Forest & XGBoost
Random Forest (criterion comparison):
Criteria Accuracy TN FP FN TP
Gini 94.800% 46968 22 2574 362
Entropy 94.788% 46967 23 2579 357
XGBoost (threshold sweep):
Threshold Accuracy TN FP FN TP
0 0.059% 0 46990 0 2936
0.1 87.353% 42229 4761 1553 1383
0.2 93.881% 46075 915 2140 796
0.3 94.722% 46691 299 2336 600
0.4 94.894% 46866 124 2425 511
0.5 94.902% 46923 67 2478 458
0.6 94.866% 46956 34 2529 407
0.7 94.856% 46973 17 2551 385
0.8 94.824% 46977 13 2571 365
0.9 94.776% 46982 8 2600 336
1 94.119% 46990 0 2936 0
Bringing New Rules Idea
Comparison between Random Forest & XGBoost