2. Ensembling Methods
• Ensembles involve a group of prediction models working
together to improve classification outcomes.
• Bagging Revisited: Bagging develops multiple models
using randomly selected subsets of the training dataset.
• Introduction to Boosting: Our focus today shifts to
Boosting, a technique where models are trained in
sequence, with an increased focus on examples that were
incorrectly predicted in previous rounds.
3. Decision Stump
• A weak learner is a model that predicts only marginally better than random guessing; for binary classification this means accuracy just above 50%, such as 55%.
• Boosting builds on these weak models, favoring ones that are computationally simple, such as a decision stump: a decision tree with only a single decision node.
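For concreteness, here is a minimal sketch of a decision stump as a standalone classifier. The feature index, threshold, polarity convention, and toy data are illustrative assumptions, not values from the lecture.

```python
import numpy as np

class DecisionStump:
    """A one-node decision tree: threshold a single feature and predict +1 or -1."""

    def __init__(self, feature_index, threshold, polarity=1):
        self.feature_index = feature_index  # which feature column to test
        self.threshold = threshold          # split point on that feature
        self.polarity = polarity            # which side of the split is labeled +1

    def predict(self, X):
        # Predict +1 when the chosen feature lies on the positive side of the threshold
        feature = X[:, self.feature_index]
        return np.where(self.polarity * (feature - self.threshold) > 0, 1, -1)

# Illustrative usage on made-up 2-D data
X = np.array([[0.2, 1.0], [0.8, 0.3], [0.6, 0.9], [0.1, 0.4]])
stump = DecisionStump(feature_index=0, threshold=0.5)
print(stump.predict(X))  # [-1  1  1 -1]
```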
4. Visualizing Decision Stump Classifiers
Decision stump classifiers split the feature space with axis-aligned (horizontal or vertical) decision boundaries, assigning each resulting half-space to a class.
6. Understanding the AdaBoost Algorithm
• Key steps of AdaBoost:
1. In every iteration, AdaBoost adjusts the weights of the training instances, emphasizing those
that were previously misclassified.
2. A fresh weak classifier is then trained on these reweighted instances.
3. Newly developed classifiers are merged into the current ensemble with an assigned weight
based on their accuracy, thereby strengthening the collective decision-making power.
4. Repeat this process many times.
• Each weak learner is tasked with minimizing the error on the weighted data.
• AdaBoost's strategy of focusing on prior errors reduces the overall bias of the model, sharpening
accuracy with each step.
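As a quick illustration of these steps in practice, the sketch below runs scikit-learn's AdaBoostClassifier with depth-1 trees (decision stumps) as the weak learners. The dataset and parameter values are arbitrary assumptions chosen only for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy binary classification data (values are arbitrary)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weak learner: a decision stump (a tree with a single decision node)
stump = DecisionTreeClassifier(max_depth=1)

# 50 boosting rounds; each round reweights the training samples and adds a new stump
# (use base_estimator= instead of estimator= on scikit-learn versions before 1.2)
model = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```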
10. Question
• If the decision stump is well optimized, its error rate should always be less than 0.5. Why?
• The confidence/trust level will then be greater than 0. When will the confidence level be high?
• The confidence/trust level is used as the weight of each individual learner (weak classifier).
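To see why, look at the classifier coefficient formula used later on slide 15 (a brief reasoning sketch based on the formula itself):

\alpha = \frac{1}{2}\log\frac{1-\epsilon}{\epsilon}
\quad\Longrightarrow\quad
\epsilon < 0.5 \;\Leftrightarrow\; \frac{1-\epsilon}{\epsilon} > 1 \;\Leftrightarrow\; \alpha > 0

A well-optimized stump achieves ε < 0.5 on the weighted data, because a stump that does worse than chance can simply have its prediction flipped (use the other side of the split) to do better. And since (1 − ε)/ε grows as ε shrinks, the confidence α is high exactly when the weighted error is small.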
11. Question
• In AdaBoost, after each boosting round, we update the weight of each individual sample using the weight-update formula (a common form is sketched below).
• What does this update mean?
• How should we interpret the accompanying figure?
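The formula itself is not reproduced in this transcript. A commonly used form of the per-sample weight update, consistent with the "increase weights of wrongly classified samples" step on slide 15, is the following (this particular notation is an assumption, not taken from the slide):

w_i \leftarrow w_i \cdot \exp\!\left( \alpha_t \, \mathbb{I}\{ h_t(x^{(i)}) \neq t^{(i)} \} \right)

In words: a sample the current stump misclassified has its weight multiplied by e^{α_t} > 1, while correctly classified samples keep their weight; the weights are then typically renormalized to sum to one, so the next stump pays relatively more attention to the previous mistakes.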
15. Key steps of the AdaBoost process
Weighted error ε and weighting factor α of the current stump:

\epsilon = \frac{\sum_{i} w_i \,\mathbb{I}\{\, h_t(x^{(i)}) \neq t^{(i)} \,\}}{\sum_{i} w_i}
\qquad
\alpha = \frac{1}{2}\log\left(\frac{1-\epsilon}{\epsilon}\right)
Boosting loop (one round): data samples with weights → train a decision stump (see the decision tree building lecture slides) → calculate the weighted classification error of the decision stump → calculate the weighting factor of the current decision stump → increase the data sample weights for wrongly classified samples → next round of boosting.
Final classifier after T rounds of boosting:
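The final-classifier formula is not rendered in this transcript; consistent with the combined model written in the notes below (H(x) = α1·h1(x) + α2·h2(x) + …), it is the α-weighted vote of all T stumps:

H(x) = \sum_{t=1}^{T} \alpha_t \, h_t(x)

with the predicted class given by the sign of H(x) when the labels are ±1.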
17. AdaBoost Algorithm Key Steps
1. Train a classifier on the weighted data samples
2. Calculate the weighted error of the current classifier
3. Calculate the classifier coefficient (trust level)
4. Update the data sample weights
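A compact from-scratch sketch of these four steps, using scikit-learn depth-1 trees as the weak learners. The toy data, number of rounds, and the ±1 label convention are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, n_rounds=10):
    """Train AdaBoost with decision stumps. Labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform sample weights
    stumps, alphas = [], []

    for _ in range(n_rounds):
        # 1. Train a classifier on the weighted data samples
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)

        # 2. Weighted error of the current classifier
        miss = (pred != y).astype(float)
        eps = np.sum(w * miss) / np.sum(w)
        eps = np.clip(eps, 1e-10, 1 - 1e-10)   # guard against log(0) / division by zero

        # 3. Classifier coefficient (trust level)
        alpha = 0.5 * np.log((1 - eps) / eps)

        # 4. Update data weights: increase weights of misclassified samples
        w = w * np.exp(alpha * miss)
        w = w / np.sum(w)                      # renormalize to sum to one

        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Final classifier: sign of the alpha-weighted vote of all stumps."""
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(votes)

# Illustrative usage on a tiny made-up 1-D dataset
X = np.array([[0.1], [0.4], [0.35], [0.8], [0.9], [0.6]])
y = np.array([-1, -1, -1, 1, 1, 1])
stumps, alphas = adaboost_train(X, y, n_rounds=5)
print(adaboost_predict(stumps, alphas, X))
```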
18. AdaBoost Examples
• The decision boundary of the most recently added weak learner is shown as a dashed black line.
• The combined decision boundary of the entire ensemble is shown in green.
19. Control AdaBoost Overfitting
• AdaBoost's training error can be shown theoretically to approach zero as more boosting rounds are run.
• However, as the number of classifiers grows, there is a risk of overfitting the model to the training data.
• To prevent overfitting, tune the number of boosting rounds; this is best done by monitoring performance on a separate validation set.
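One way to do this, as the slide suggests, is to track validation accuracy after every boosting round and keep the round count that performs best. The sketch below does this with scikit-learn's staged_score; the dataset, split sizes, and round budget are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Train with a deliberately large number of rounds, then evaluate every prefix of the ensemble
# (use base_estimator= instead of estimator= on scikit-learn versions before 1.2)
model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                           n_estimators=300, random_state=0)
model.fit(X_train, y_train)

# staged_score yields validation accuracy after 1, 2, ..., 300 boosting rounds
val_scores = list(model.staged_score(X_val, y_val))
best_rounds = int(np.argmax(val_scores)) + 1
print("Best number of boosting rounds on the validation set:", best_rounds)
```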
20. AdaBoost Application: Face Detection
• AdaBoost has gained prominence for its effective use in the
realm of facial detection.
• Achieved real-time facial detection capabilities as early as
2001.
21. AdaBoost Application: Face Detection
• The fundamental classifier, or weak learner, works by comparing sums of pixel intensities within specified rectangular regions, computed with efficient techniques such as integral images.
• As the number of boosting iterations increases, the selected features increasingly concentrate on specific facial regions, improving detection accuracy.
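The efficient computation referred to here is typically an integral image (summed-area table), which lets the sum of any rectangle be read off with four lookups. The sketch below illustrates the idea; the image size, rectangle coordinates, and the specific two-rectangle feature are made-up examples.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[r, c] = sum of img[:r, :c]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixel intensities in img[top:bottom, left:right] using 4 lookups."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

# Illustrative check on a random 24x24 grayscale window
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(24, 24))
ii = integral_image(img)

# A two-rectangle feature: difference between the pixel sums of two adjacent regions
left_half = rect_sum(ii, 4, 4, 12, 10)
right_half = rect_sum(ii, 4, 10, 12, 16)
print("feature value:", left_half - right_half)
print("direct check:", img[4:12, 4:10].sum() - img[4:12, 10:16].sum())
```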
22. AdaBoost Face Detection Experiments in Lab 2
• Please DO attend Lab 2 this week; we will study face detection with AdaBoost in detail through hands-on experiments there.
23. Course Summary
• Boosting strategically lowers bias by creating a collective of
weak classifiers, where each subsequent classifier is fine-
tuned to address the errors of the preceding ensemble.
• Carefully calibrating the number of boosting iterations helps avert the risk of overfitting.
• A practical application of boosting is face detection,
showcasing the effectiveness of this ensemble method.
Editor's Notes
#2:Simple Idea: Ensembles use many different models together to get better results than any single model alone.
Bagging, in Short:
How It Works: Bagging creates several models by taking random parts of the training data for each one.
Learning About Boosting:
What's Boosting: Boosting is when you train models one after the other. Each new model pays more attention to the training examples that the previous models got wrong.
#3:The Basics: A weak learner is a simple model that doesn't predict very well, just a bit better than if you were guessing without any information. Think of it like getting a grade just over passing, like 55% when 50% is a pass.
How Boosting Uses Them: Boosting takes these basic models that don't do much better than guessing and uses them to build a stronger prediction. It often starts with really simple models, like a decision stump, which is a decision tree with just one question.
#4:In Simple Terms: A decision stump classifier is a very simple way of making decisions. It uses straight lines to split up a space into different areas. Each area represents a different group or category.
#6:Adjusting Weights: AdaBoost changes the importance (weights) of the examples in the training set. Examples that were wrong before get more attention.
Training a New Model: It then makes a weak classifier based on the examples that now have different weights.
Combining Models: Each new model is added to the ensemble with a certain importance based on how well it performs. This helps the ensemble make better decisions together.
Repeat: Do these steps over and over, many times.
Focus on Mistakes: Every simple model tries to get better at predicting the examples that have bigger weights (the ones that were wrong before).
Learning from Errors: By always focusing on what it got wrong before, AdaBoost becomes more accurate with each step, reducing overall mistakes (bias).
#8:Understanding AdaBoost Terms and Training:
ε (Error): This is the mistake rate of the model, considering how important (weighted) each example is.
α (Model Weight): This is how much trust we put in the model's decisions.
Step-by-Step Example:
Starting Point: Imagine we have 10 examples to learn from, and we treat each one as equally important, so each one gets a weight of 1/10.
First Model: We train our first simple model (let's call it h1) using these equal weights.
Calculate Error (ε): We find out how often this model h1 makes mistakes, weighted by our initial equal weights. Let's say the error rate (ε) is 0.3, which means the model gets 30% of the weighted examples wrong.
Calculate Model Weight (α): We use a formula to decide how much to trust h1. The formula uses the error rate (ε) we just found. For our error of 0.3, the trust level (α) comes out to be 0.42.
Combine for Final Model (H(x)): We then create a combined model, H(x), which is just our first model h1 weighted by the trust level (α) we calculated.
In math terms, it looks like this: H(x) = α1 * h1(x), where α1 is 0.42 in this example.
#9:Understanding AdaBoost Terms and Training:
ε (Error): This is the mistake rate of the model, considering how important (weighted) each example is.
α (Model Weight): This is how much trust we put in the model's decisions.
Step-by-Step Example:
Starting Point: Imagine we have 10 examples to learn from, and we treat each one as equally important, so each one gets a weight of 1/10.
First Model: We train our first simple model (let's call it h1) using these equal weights.
Calculate Error (ε): We find out how often this model h1 makes mistakes, weighted by our initial equal weights. Let's say the error rate (ε) is 0.3, which means the model gets 30% of the weighted examples wrong.
Calculate Model Weight (α): We use a formula to decide how much to trust h1. The formula uses the error rate (ε) we just found. For our error of 0.3, the trust level (α) comes out to be 0.42.
Combine for Final Model (H(x)): We then create a combined model, H(x), which is just our first model h1 weighted by the trust level (α) we calculated.
In math terms, it looks like this: H(x) = α1 * h1(x), where α1 is 0.42 in this example.
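A quick numerical check of the trust level quoted in this example; the 0.42 matches the slide-15 formula when the log is taken as the natural logarithm (an assumption about the base, since the slides just write "log").

```python
import numpy as np

eps = 0.3
alpha = 0.5 * np.log((1 - eps) / eps)  # alpha = 1/2 * ln((1 - eps) / eps)
print(round(alpha, 2))                 # 0.42, matching the worked example
```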
#12:Using Updated Weights to Train a New Classifier:
Updated Weights (w): We start with new importance levels (weights) for each data point.
Train Classifier h2: We train a second decision stump (h2) using these updated weights.
Calculate New Error (ε): We find the error rate of h2, which is 0.21 (21% mistake rate).
Calculate New Trust Level (α2): We work out the trust level for h2 with a formula. Here, it's 0.66.
Create Combined Model (H(x)): We combine the predictions of the first and second decision stumps, each weighted by their respective trust levels (α1 for h1 and α2 for h2).
So, the formula for the combined model is:
H(x) = α1 * h1(x) + α2 * h2(x)
Here, α1 is the weight from the first stump, h1 is the first decision stump, α2 is 0.66 from our second stump, and h2 is the second decision stump.
#13:Training the Third Classifier with New Weights:
Updated Weights (w): Use the new weights for each data point that were adjusted from the last round.
Train Classifier h3: Now use these weights to train a third decision stump (h3).
Find Error Rate (ε): The error for this third stump is 0.14 (14% mistakes).
Calculate Trust Level (α3): The formula tells us to trust this stump's predictions with a level of 0.92.
Combine All Stumps for Final Model (H(x)): The final model adds up the predictions from all the stumps, each one multiplied by its trust level.
#15:Training a Decision Stump Step-by-Step:
Train the Stump: First, make a simple decision stump (a basic model) to classify data. (Look at the lecture slides on building decision trees for help.)
Find Mistakes: Next, figure out the decision stump's error rate, but make sure to consider the importance (weight) of each data sample.
Set the Trust Level: After that, decide how much trust you can put in the decision stump's results. This is called the 'weighting factor'.
Focus on Errors: Increase the importance (weights) of the samples that the decision stump got wrong.
Repeat: Now, you're ready for another round of boosting, where you'll make a new decision stump that tries to correct the errors of the first one.
#19:Getting Better: As AdaBoost keeps going, it's supposed to make fewer and fewer mistakes on the training data, almost reaching a point where it makes no errors.
Too Specific: If you keep adding more classifiers, AdaBoost might get too good at the training data and not good at new, unseen data. That's like memorizing the answers to a test without understanding the subject.
Finding the Balance: It's important to not go too far with boosting. You can find the right amount by testing the model on a different set of data (validation set) that it hasn't seen before. This helps you figure out when to stop adding new classifiers.
#20:AdaBoost and Facial Detection:
Success Story: AdaBoost is famous for being really good at recognizing faces.
Fast Results: It was even able to detect faces in real-time back in 2001.
#21:The fundamental classifier, or weak learner, operates by evaluating the sum of pixel intensities within a specified rectangular area, employing certain efficient computational techniques.
As the number of boosting iterations increases, the selected features increasingly concentrate on specific facial regions, improving detection accuracy.
#23:How Boosting Works to Recognize Faces:
Teamwork: Boosting builds a team of simple models. Each new model is made to fix the mistakes the team made before.
Just Enough Practice: You have to choose the right number of times to repeat the process. Too many, and the model might get too picky and not work well on new data it hasn't seen.
Real-World Use: Boosting is really good at finding faces in pictures, which proves it's a strong tool for building smart models.