Decision Trees and Random Forests
Machine Learning 2021
UML book chapter 18
Slides P. Zanuttigh
Decision Trees
Example: Decision Tree
[Figure: a decision tree whose leaves are labeled Class 0, Class 1, Class 0]
Grow a Decision Tree
Consider a binary classification setting and assume we have a gain
(performance) measure:
Start
❑ A single leaf assigning the more common of the two labels (i.e., the
label of the majority of the samples)
At each iteration
❑ Analyze the effect of splitting a leaf
❑ Among all possible splits, select the one leading to the largest gain and
split that leaf (or choose not to split)
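The greedy step above — evaluate every candidate split of a leaf and keep the one with the largest gain — can be sketched as follows. As a simple illustrative gain measure (the slides leave the measure abstract) we use the drop in training error of the majority-vote leaf:

```python
from collections import Counter

def majority_label(labels):
    """Label a single leaf assigns: the most common class."""
    return Counter(labels).most_common(1)[0][0]

def error_rate(labels):
    """Fraction of samples the majority-vote leaf misclassifies."""
    if not labels:
        return 0.0
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)

def split_gain(labels, mask):
    """Gain of splitting a leaf into the two groups mask=True / mask=False,
    measured as the drop in weighted training error."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    after = (len(left) / n) * error_rate(left) + (len(right) / n) * error_rate(right)
    return error_rate(labels) - after

# A split that separates the classes perfectly removes all the error:
print(split_gain([0, 0, 1, 1], [True, True, False, False]))  # 0.5
```

"Choose not to split" then corresponds to the case where no candidate split has positive gain.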
• Iterative Dichotomizer 3 (ID3)
❑ Find which split (i.e., splitting over which feature) leads to the
maximum gain
❑ Split on the selected feature xj and recursively call the algorithm on
each branch, considering only the remaining features*
❑ Stop when there are no more features to use
* Split on a (binary) feature only once along a path. If the features are
real-valued, a threshold also needs to be found, and the same feature can
then be split again with different thresholds.
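The ID3 recursion described above can be sketched for binary features as follows. Information gain via entropy is used as the gain measure, and the nested-dict tree representation is just an illustration, not a prescribed data structure:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of the class distribution of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(samples, labels, feature):
    """Entropy drop obtained by splitting on a binary feature."""
    gain = entropy(labels)
    for v in (0, 1):
        subset = [y for x, y in zip(samples, labels) if x[feature] == v]
        if subset:
            gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def id3(samples, labels, features):
    """ID3 sketch: samples are dicts feature -> 0/1; returns a nested dict
    {feature: {0: subtree-or-leaf, 1: subtree-or-leaf}} or a leaf label."""
    # Stop: pure node, or no features left -> leaf with the majority label
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(samples, labels, f))
    remaining = [f for f in features if f != best]  # each binary feature used once
    tree = {best: {}}
    for v in (0, 1):
        idx = [i for i, x in enumerate(samples) if x[best] == v]
        if not idx:  # empty branch: fall back to the parent's majority label
            tree[best][v] = Counter(labels).most_common(1)[0][0]
        else:
            tree[best][v] = id3([samples[i] for i in idx],
                                [labels[i] for i in idx], remaining)
    return tree
```

For instance, with four samples whose label equals feature 'a', the recursion stops after one split and returns `{'a': {0: 0, 1: 1}}`.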
Gain Measure
Example
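As one concrete choice of gain measure (the slides leave the measure abstract), the Gini impurity drop used by CART-style trees can be computed like this:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability that two samples drawn at random
    from the node have different labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(parent, left, right):
    """Impurity drop from splitting `parent` into `left` and `right`."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

parent = [0, 0, 1, 1]
print(gini(parent))                       # 0.5
print(gini_gain(parent, [0, 0], [1, 1]))  # 0.5 (perfect split)
print(gini_gain(parent, [0, 1], [0, 1]))  # 0.0 (useless split)
```

A perfectly separating split removes all the impurity, while a split that preserves the class mixture gains nothing, which is why the greedy procedure may choose not to split.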
Pruning
❑ Issue of ID3: the tree is typically very large, with a high risk of overfitting
❑ Prune the tree to reduce its size without affecting the performance too much
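One common pruning scheme (reduced-error pruning, shown here as a sketch on a hypothetical nested-dict tree representation, since the slides do not fix a specific method) collapses a subtree into a leaf whenever that does not hurt accuracy on held-out validation data:

```python
from collections import Counter

def predict(tree, x):
    """Walk a nested-dict tree {feature: {value: subtree-or-leaf}} to a leaf."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][x[feature]]
    return tree

def accuracy(tree, samples, labels):
    return sum(predict(tree, x) == y for x, y in zip(samples, labels)) / len(labels)

def prune(tree, samples, labels):
    """Reduced-error pruning: bottom-up, replace a subtree by its majority-label
    leaf whenever this does not reduce accuracy on the validation set."""
    if not isinstance(tree, dict) or not labels:
        return tree  # leaf, or no validation data reaches this node
    feature = next(iter(tree))
    for v, sub in tree[feature].items():  # prune the children first
        idx = [i for i, x in enumerate(samples) if x[feature] == v]
        tree[feature][v] = prune(sub, [samples[i] for i in idx],
                                 [labels[i] for i in idx])
    leaf = Counter(labels).most_common(1)[0][0]
    if accuracy(leaf, samples, labels) >= accuracy(tree, samples, labels):
        return leaf  # the whole subtree collapses into one leaf
    return tree
```

In the example below, the redundant subtree on feature 'b' (both branches predict 1) is collapsed, while the informative split on 'a' survives.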
Random Forests (RF)
❑ Introduced by Leo Breiman in 2001
❑ Instead of using a single large tree
construct an ensemble of simpler
trees
❑ A Random Forest (RF) is a classifier
consisting of a collection of
decision trees
❑ The prediction is obtained by
majority voting over the predictions
of the individual trees
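The majority vote over the trees can be sketched in a few lines; the per-tree predictors below are stubs standing in for trained decision trees:

```python
from collections import Counter

def forest_predict(trees, x):
    """Random-forest prediction: majority vote over the per-tree predictions.
    Each `tree` is any callable x -> class label."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three stub "trees" standing in for trained decision trees:
trees = [lambda x: 0, lambda x: 1, lambda x: 1]
print(forest_predict(trees, x=None))  # 1 (two of three trees vote for class 1)
```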
Random Forest: Example
Random Sampling with Replacement
Idea: randomly sample from a training dataset with replacement
❑ Assume a training set S of size m: we can build new training sets
by drawing m samples at random from S with replacement (i.e., the
same sample can be selected multiple times)
For example, if our training data is [1, 2, 3, 4, 5, 6], then we might sample
sets like [1, 2, 2, 3, 6, 6], [1, 2, 4, 4, 5, 6], [1, 1, 1, 1, 1, 1], etc.
I.e., all lists have a length of six, but some values can be repeated in the
random selection
❑ Notice that we are not subsetting the training data into smaller
chunks
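The sampling scheme described above takes only a couple of lines; the generator is seeded here only so the example is reproducible:

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) samples from data *with replacement* (one bootstrap
    replicate): same length as the original, but values may repeat."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)
data = [1, 2, 3, 4, 5, 6]
print(bootstrap_sample(data, rng))  # a list of six values drawn from data
```

Note the replicate always has the full length m — the training data is resampled, not split into smaller chunks.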
Bootstrap Aggregation (Bagging)
Bagging (Bootstrap Aggregation):
❑ Decision trees are very sensitive to the data they are trained on: small
changes to the training set can result in significantly different tree structures
❑ A random forest takes advantage of this by letting each individual tree
sample randomly with replacement from the dataset, so that different
training sets produce different trees
❑ This process is known as bagging
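Putting the pieces together, bagging trains each tree on its own bootstrap replicate and aggregates by majority vote. In this sketch, `fit_tree` is any tree-training routine returning a predictor; a trivial majority-class stub stands in for real tree training:

```python
import random
from collections import Counter

def bagging_fit(samples, labels, fit_tree, n_trees, rng):
    """Train n_trees predictors, each on its own bootstrap replicate."""
    m = len(samples)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(m) for _ in range(m)]  # m indices, with replacement
        trees.append(fit_tree([samples[i] for i in idx],
                              [labels[i] for i in idx]))
    return trees

def bagging_predict(trees, x):
    """Aggregate the ensemble by majority vote."""
    return Counter(t(x) for t in trees).most_common(1)[0][0]

# Stub "tree": always predicts the majority class of its training replicate.
def majority_stub(samples, labels):
    c = Counter(labels).most_common(1)[0][0]
    return lambda x: c

rng = random.Random(0)
trees = bagging_fit(list(range(8)), [0, 0, 0, 1, 1, 1, 1, 1],
                    majority_stub, 5, rng)
print(bagging_predict(trees, x=None))
```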
Bagging: Example
Randomization: Feature Randomness
❑ In a normal decision tree, when it is time to split a node, we consider every
possible feature and pick the one that produces the largest gain
❑ In contrast, each tree in a random forest can pick only from a random subset of
features (feature randomness)
❑ I.e., node splitting in a random forest is based on a random subset of
features for each tree
❑ This forces even more variation amongst the trees in the model and ultimately
results in lower correlation across trees and more diversification
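The feature-randomness step can be sketched as: before each split, restrict the candidate features to a random subset. The subset size √d used below is a common heuristic, not something the slides prescribe:

```python
import math
import random

def best_split_feature(features, gain, rng, k=None):
    """Pick the highest-gain feature among a random subset of size k,
    instead of among all features as a plain decision tree would."""
    if k is None:
        k = max(1, round(math.sqrt(len(features))))  # common sqrt(d) heuristic
    subset = rng.sample(features, k)  # random subset, without replacement
    return max(subset, key=gain)

rng = random.Random(0)
features = ['x1', 'x2', 'x3', 'x4']
gain = {'x1': 0.1, 'x2': 0.4, 'x3': 0.2, 'x4': 0.3}.get  # illustrative gains
print(best_split_feature(features, gain, rng))  # one of the four features
```

Because different trees see different feature subsets, a single dominant feature cannot head every tree, which is what lowers the correlation across trees.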
