zekeLabs
Decision Trees
“Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Overview of Decision Trees
● Introduction to Trees
● Construction of Trees
● Information
● Root-Node Decision
● Classification Tree
● Regression Tree
● Pruning
● Advantages and Disadvantages of Trees
Introduction to Trees
● Supervised learning algorithm
● Used for both classification & regression
● Flowchart-like structure
● Models consist of nested if-then rules (see the sketch below)
● Mimics human decision making
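To make the "nested if-then" point concrete, here is a hand-written sketch in Python. The rules happen to match the play-tennis tree derived later in this deck; the function itself is just for illustration.

```python
# A decision tree is literally nested if-then rules.
# These rules match the play-tennis tree derived later in the deck.
def predict_play(outlook: str, humidity: str, windy: bool) -> str:
    if outlook == "Overcast":
        return "Yes"
    elif outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    else:  # Rainy
        return "No" if windy else "Yes"

print(predict_play("Sunny", "High", False))  # -> "No"
```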
Construction of Tree
● Hierarchically partitions the feature space
● Each split conditions on one feature
● The partitioning is done greedily
Measuring Information
● Low-probability events carry more information
● Entropy is the average rate of information: H(S) = -Σc p(c) * log2(p(c))
● It is a measure of uncertainty (sketched in code below)
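A minimal sketch of entropy in Python, assuming the class labels come as a plain list:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A certain outcome carries no information; a 50/50 split carries 1 bit.
print(entropy(["Yes"] * 4))               # 0.0
print(entropy(["Yes", "No"]))             # 1.0
print(entropy(["Yes"] * 9 + ["No"] * 5))  # ~0.940 (used later in this deck)
```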
Information Gain
● Measures how much "information" a feature gives us about the class: Gain(S, A) = H(S) - Σv (|Sv| / |S|) * H(Sv), i.e., the drop in entropy after splitting S on feature A
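A sketch of that definition in Python (rows as dicts keyed by feature name; the entropy helper is repeated so the snippet is self-contained):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Gain(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder
```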
Classification Tree
● In this problem we have four features (the X values) and one response (the Y value)
● We need to learn the mapping between X and Y
Outlook Temp. Humidity Wind Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
The Root Node
Entropy of the whole data set (9 Yes, 5 No out of 14):
-(9/14)*log2(9/14) = 0.41 (the Yes term)
-(5/14)*log2(5/14) = 0.53 (the No term)
H(S) = 0.41 + 0.53 = 0.94 (verified in code below)
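Checking the arithmetic:

```python
from math import log2

h_yes = -(9 / 14) * log2(9 / 14)  # ~0.410
h_no = -(5 / 14) * log2(5 / 14)   # ~0.531
print(h_yes + h_no)               # ~0.940
```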
The Root Node
E(Outlook = Sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971
E(Outlook = Overcast) = -(1)*log2(1) - 0 = 0 (all four Overcast days are Yes)
E(Outlook = Rainy) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.971
Weighted average information for Outlook:
I(Outlook) = (5/14)*0.971 + (4/14)*0 + (5/14)*0.971 = 0.693
Gain(Outlook) = 0.94 - 0.693 = 0.247
The same computation for all four features:
Outlook: Info = 0.693, Gain = 0.247
Temperature: Info = 0.911, Gain = 0.029
Humidity: Info = 0.788, Gain = 0.152
Windy: Info = 0.892, Gain = 0.048
Outlook has the highest gain, so it becomes the root node.
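These numbers can be reproduced directly from the table (a self-contained sketch; feature order follows the table's columns):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# The play-tennis table: (Outlook, Temperature, Humidity, Windy, Play)
data = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]
labels = [row[-1] for row in data]

for name, i in [("Outlook", 0), ("Temperature", 1), ("Humidity", 2), ("Windy", 3)]:
    # Weighted average entropy over the feature's categories.
    info = sum(
        len(sub) / len(data) * entropy(sub)
        for v in set(row[i] for row in data)
        for sub in [[r[-1] for r in data if r[i] == v]]
    )
    print(f"{name}: Info = {info:.3f}, Gain = {entropy(labels) - info:.3f}")
# Outlook: Info = 0.693, Gain = 0.247  <- highest, so Outlook is the root
```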
Algorithm
● Compute the entropy of the data set
● For every attribute/feature:
○ Calculate the entropy for each of its categories
○ Take the weighted average information for the attribute
○ Calculate the gain for the attribute
● Split on the attribute with the highest gain
● Repeat recursively until the desired tree is obtained (see the sketch below)
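A compact recursive sketch of this algorithm (ID3-style, categorical features only; rows are dicts keyed by feature name):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, features):
    """Build a tree as nested dicts: {feature: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:        # pure node -> leaf
        return labels[0]
    if not features:                 # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(f):
        remainder = 0.0
        for v in set(r[f] for r in rows):
            sub = [lab for r, lab in zip(rows, labels) if r[f] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(features, key=gain)   # greedy: pick the highest-gain attribute
    branches = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [f for f in features if f != best])
    return {best: branches}
```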
Other Criteria
● The Gini index is defined as Gini = 1 - Σc p(c)², where p(c) denotes the proportion of instances belonging to class c
● The classification error is defined as Error = 1 - maxc p(c)
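Both criteria in Python, for comparison with the entropy helper above:

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum over classes of p(c)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def classification_error(labels):
    """Classification error: 1 - max over classes of p(c)."""
    n = len(labels)
    return 1.0 - max(Counter(labels).values()) / n

print(gini(["Yes"] * 9 + ["No"] * 5))                  # ~0.459
print(classification_error(["Yes"] * 9 + ["No"] * 5))  # ~0.357
```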
Regression Tree
● Divide the feature space into regions and fit a simple model to each region
● A constant can be fit to each region
● Deciding which feature and value to split on is an optimization problem
Optimization Function
● The optimization function for a regression split on feature j at point s is
min over (j, s) of [ min over c1 of Σ(yi - c1)² over R1(j, s) + min over c2 of Σ(yi - c2)² over R2(j, s) ]
with regions R1(j, s) = {x | xj <= s} and R2(j, s) = {x | xj > s}
● The average of the responses in each region is the best constant estimate for that region
● The split feature j and split point s are found by solving the above problem (sketched below)
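A brute-force sketch of that search: try every feature and candidate split point, and keep the pair minimizing the summed squared error around the two region means.

```python
import numpy as np

def best_split(X, y):
    """Exhaustively find the (feature j, threshold s) minimizing
    SSE(R1) + SSE(R2), with each region's mean as its constant fit."""
    best = (None, None, float("inf"))
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            sse = ((left - left.mean()) ** 2).sum() + \
                  ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

# A step function with a jump at x = 2.5: the search recovers it.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(80, 1))
y = np.where(X[:, 0] < 2.5, 1.0, 3.0) + rng.normal(0, 0.1, 80)
print(best_split(X, y))  # threshold found near 2.5
```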
Pruning
● Involves removing branches that contribute little predictive power
● Reduces the complexity of the tree
● Improves predictive power by reducing overfitting
● Two methods of pruning:
○ Pre-pruning
○ Post-pruning
Post-pruning
● Grow the tree fully, then cut it back by minimizing a global cost-complexity function, commonly C_alpha(T) = Σm Nm * Qm(T) + alpha * |T|, where |T| is the number of terminal nodes, Nm and Qm(T) are the size and impurity of node m, and alpha penalizes complexity
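scikit-learn (0.22 and later) exposes cost-complexity post-pruning through the ccp_alpha parameter; a short sketch:

```python
# Cost-complexity post-pruning in scikit-learn (0.22+):
# grow the full tree, then prune back with increasing alpha.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Candidate alphas along the pruning path of the fully grown tree.
path = full.cost_complexity_pruning_path(X, y)
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}")
```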
Pre-pruning
● Set constraining parameters before building the model:
○ Maximum tree depth
○ Maximum number of terminal nodes
○ Minimum samples required to split a node
○ Maximum number of features
● Controls the size of the resulting tree
● scikit-learn supports this method via constructor parameters (see below)
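A sketch of the pre-pruning knobs as scikit-learn exposes them (synthetic data via make_classification, just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Pre-pruning: constrain the tree while it is being grown.
clf = DecisionTreeClassifier(
    max_depth=4,           # maximum tree depth
    max_leaf_nodes=10,     # maximum number of terminal nodes
    min_samples_split=20,  # minimum samples needed to split a node
    max_features=4,        # features considered at each split
    random_state=0,
).fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```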
An Example - Regression Tree
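The original slide showed a figure; a reproducible stand-in in the same spirit: fit a noisy sine curve at two depths, the shallow tree producing a coarse piecewise-constant fit and the deeper one tracking the curve (and eventually the noise).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# A shallow tree fits a coarse staircase; a deeper one follows the curve.
for depth in (2, 5):
    reg = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    print(depth, reg.predict([[1.0], [4.0]]))
```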
Various Algorithms
● CART (Classification and Regression Trees) → uses the Gini index as its metric
● ID3 (Iterative Dichotomiser 3) → uses entropy and information gain as metrics
● C4.5, C5.0, CHAID, and QUEST are other well-known algorithms
● scikit-learn implements CART
The Iris Data set
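The slide's iris figure is an image; as a stand-in, a tiny classifier on the same data, with export_text printing the learned if-then rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The fitted tree is exactly a set of nested if-then rules.
print(export_text(clf, feature_names=list(iris.feature_names)))
```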
Advantages of Trees
● Simple to understand, interpret, and visualize
● Implicitly perform variable screening or feature selection
● Can handle both numerical and categorical data
● Can also handle multi-output problems
● Require relatively little effort from users for data preparation
● Nonlinear relationships between parameters do not affect tree performance
Disadvantages of Trees
● Decision-tree learners can build over-complex trees that overfit
● Trees can be unstable: small variations in the data can produce a very different tree
● Greedy algorithms cannot guarantee returning the globally optimal tree
● Trees can be biased if some classes dominate the data
