DECISION TREES
Overview
 What is a Decision Tree
 Sample Decision Trees
 How to Construct a Decision Tree
 Decision Tree Algorithm
 ID3 Heuristic
 Entropy
 Decision Tree Advantages and Limitations
 Summary
What is a Decision Tree?
 An inductive learning task
 Uses particular facts to reach more general conclusions
 A predictive model based on a branching
series of Boolean tests
 These smaller Boolean tests are less complex
than a one-stage classifier
 Let’s look at a sample decision tree…
[Sample decision tree: predicting commute time]
Leave At?
 8 AM → Long
 9 AM → Accident? (Yes → Long, No → Medium)
 10 AM → Stall? (Yes → Long, No → Short)
If we leave at 10 AM and there are no cars stalled on the road, what will our commute time be?
Inductive Learning
 In this decision tree, we made a series of
Boolean decisions and followed the
corresponding branch
 Did we leave at 10 AM?
 Did a car stall on the road?
 Is there an accident on the road?
 By answering each of these yes/no
questions, we then came to a conclusion
on how long our commute might take
Decision Trees as Rules
 We did not have to represent this tree graphically
 We could have represented it as a set of rules; however, these may be much harder to read…
Decision Tree as a Rule Set
if hour == 8am
    commute time = long
else if hour == 9am
    if accident == yes
        commute time = long
    else
        commute time = medium
else if hour == 10am
    if stall == yes
        commute time = long
    else
        commute time = short
Notice that not all attributes have to be used in each path of the decision.
As we will see, some attributes may not even appear in the tree at all (a Python sketch of this rule set follows).
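The same rule set can also be written directly as code. Below is a minimal Python sketch of these rules as a single function; the attribute names and string values are taken from the slides, and the function is only an illustration of the hand-built rules, not part of any tree-learning algorithm.

def commute_time(hour, accident, stall):
    """Classify commute time using the hand-written rule set above."""
    if hour == "8 AM":
        return "Long"
    if hour == "9 AM":
        return "Long" if accident == "Yes" else "Medium"
    if hour == "10 AM":
        return "Long" if stall == "Yes" else "Short"
    raise ValueError("unexpected hour: " + hour)

# Example: leaving at 10 AM with no stalled cars -> "Short"
print(commute_time("10 AM", accident="No", stall="No"))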
How to Create a Decision Tree
 We first make a list of attributes that we
can measure
 These attributes (for now) must be discrete
 We then choose a target attribute that
we want to predict
 Then create an experience table that
lists what we have seen in the past
Sample Experience Table
Example   Hour    Weather   Accident   Stall   Commute (target)
D1 8 AM Sunny No No Long
D2 8 AM Cloudy No Yes Long
D3 10 AM Sunny No No Short
D4 9 AM Rainy Yes No Long
D5 9 AM Sunny Yes Yes Long
D6 10 AM Sunny No No Short
D7 10 AM Cloudy No No Short
D8 9 AM Rainy No No Medium
D9 9 AM Sunny Yes No Long
D10 10 AM Cloudy Yes Yes Long
D11 10 AM Rainy No No Short
D12 8 AM Cloudy Yes No Long
D13 9 AM Sunny No No Medium
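For the calculations that follow, it helps to have the experience table in a machine-readable form. Below is a minimal sketch, assuming each example is stored as a Python dictionary keyed by attribute name; the representation is illustrative, and any tabular structure would do.

# Each row of the experience table; "Commute" is the target attribute.
examples = [
    {"Hour": "8 AM",  "Weather": "Sunny",  "Accident": "No",  "Stall": "No",  "Commute": "Long"},    # D1
    {"Hour": "8 AM",  "Weather": "Cloudy", "Accident": "No",  "Stall": "Yes", "Commute": "Long"},    # D2
    {"Hour": "10 AM", "Weather": "Sunny",  "Accident": "No",  "Stall": "No",  "Commute": "Short"},   # D3
    {"Hour": "9 AM",  "Weather": "Rainy",  "Accident": "Yes", "Stall": "No",  "Commute": "Long"},    # D4
    {"Hour": "9 AM",  "Weather": "Sunny",  "Accident": "Yes", "Stall": "Yes", "Commute": "Long"},    # D5
    {"Hour": "10 AM", "Weather": "Sunny",  "Accident": "No",  "Stall": "No",  "Commute": "Short"},   # D6
    {"Hour": "10 AM", "Weather": "Cloudy", "Accident": "No",  "Stall": "No",  "Commute": "Short"},   # D7
    {"Hour": "9 AM",  "Weather": "Rainy",  "Accident": "No",  "Stall": "No",  "Commute": "Medium"},  # D8
    {"Hour": "9 AM",  "Weather": "Sunny",  "Accident": "Yes", "Stall": "No",  "Commute": "Long"},    # D9
    {"Hour": "10 AM", "Weather": "Cloudy", "Accident": "Yes", "Stall": "Yes", "Commute": "Long"},    # D10
    {"Hour": "10 AM", "Weather": "Rainy",  "Accident": "No",  "Stall": "No",  "Commute": "Short"},   # D11
    {"Hour": "8 AM",  "Weather": "Cloudy", "Accident": "Yes", "Stall": "No",  "Commute": "Long"},    # D12
    {"Hour": "9 AM",  "Weather": "Sunny",  "Accident": "No",  "Stall": "No",  "Commute": "Medium"},  # D13
]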
Decision Tree Algorithms
 The basic idea behind any decision tree
algorithm is as follows:
 Choose the best attribute(s) to split the remaining
instances and make that attribute a decision node
 Repeat this process recursively for each child (a code sketch follows this list)
 Stop when:
○ All the instances have the same target attribute value
○ There are no more attributes
○ There are no more instances
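As a rough illustration, the recursion above can be sketched in Python. This is a simplified outline, assuming discrete attributes, examples stored as dictionaries (as in the experience-table sketch), and a choose_best_attribute helper such as the ID3 entropy heuristic described next; it is not a complete implementation.

from collections import Counter

def build_tree(examples, attributes, target, choose_best_attribute):
    # Internal nodes are nested dicts {attribute: {value: subtree}}; leaves are target values.
    labels = [ex[target] for ex in examples]
    # Stop: all remaining instances have the same target value.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes left to split on -> predict the majority value.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = choose_best_attribute(examples, attributes, target)
    node = {best: {}}
    # Only attribute values that actually occur are visited, so subsets are never empty.
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, remaining, target, choose_best_attribute)
    return node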
Identifying the Best Attributes
 Refer back to our original decision tree
[Original decision tree: Leave At? → 8 AM: Long; 9 AM: Accident? (Yes: Long, No: Medium); 10 AM: Stall? (Yes: Long, No: Short)]
 How did we know to split on Leave At first, then on Stall and Accident, and not on Weather?
ID3 Heuristic
 To determine the best attribute, we look
at the ID3 heuristic
 ID3 chooses which attribute to split on based on entropy
 Entropy is a measure of uncertainty (disorder) in the data…
Entropy
 Entropy is minimized when all values of the
target attribute are the same.
 If we know that commute time will always be
short, then entropy = 0
 Entropy is maximized when there is an equal chance of all values for the target attribute (i.e. the result is random)
 If commute time = short in 3 instances, medium in 3 instances and long in 3 instances, entropy is maximized (log2 3 ≈ 1.58 bits)
Entropy
 Calculation of entropy (a code sketch follows this slide)
 Entropy(S) = −∑(i = 1 to l) (|Si| / |S|) · log2(|Si| / |S|)
○ S = the set of examples
○ Si = the subset of S whose target-attribute value is vi
○ l = the number of distinct values in the range of the target attribute
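As a sketch, the formula translates directly into Python. The function below is assumed code (the name and data layout are my own), shown with the commute-time target from the experience table, which has 7 Long, 4 Short, and 2 Medium examples.

import math
from collections import Counter

def entropy(values):
    # Entropy of a list of target-attribute values, in bits.
    counts, total = Counter(values), len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Commute column of the experience table: 7 Long, 4 Short, 2 Medium.
commute = ["Long"] * 7 + ["Short"] * 4 + ["Medium"] * 2
print(round(entropy(commute), 4))   # roughly 1.4196 bits
print(entropy(["Short"] * 13))      # 0.0 -- every value is the same (Python may print -0.0)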
ID3
 ID3 splits on the attribute with the lowest expected entropy
 We calculate the expected entropy of an attribute as the weighted sum of subset entropies:
 ∑(i = 1 to k) (|Si| / |S|) · Entropy(Si), where k is the number of values in the range of the attribute we are testing
 Equivalently, we can measure information gain, which is largest when the expected entropy after the split is smallest (a code sketch of both quantities follows):
 Information gain = Entropy(S) − ∑(i = 1 to k) (|Si| / |S|) · Entropy(Si)
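Below is a minimal Python sketch of expected entropy and information gain, using the experience-table representation shown earlier; the helper names are my own. Splitting on Hour at the root should reproduce the numbers on the next slide (expected entropy of roughly 0.651 and gain of roughly 0.768).

import math
from collections import Counter

def entropy(values):
    counts, total = Counter(values), len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def expected_entropy(examples, attribute, target):
    # Weighted sum of subset entropies after splitting on `attribute`.
    total = len(examples)
    result = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        result += len(subset) / total * entropy(subset)
    return result

def information_gain(examples, attribute, target):
    return entropy([ex[target] for ex in examples]) - expected_entropy(examples, attribute, target)

# With the 13-row `examples` list from the experience-table sketch:
# expected_entropy(examples, "Hour", "Commute")   -> about 0.651
# information_gain(examples, "Hour", "Commute")   -> about 0.768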
ID3
 Given our commute time sample set, we can calculate the expected entropy and information gain of each candidate attribute at the root node:
Attribute   Expected Entropy   Information Gain
Hour        0.6511             0.768449
Weather     1.28884            0.130719
Accident    0.92307            0.496479
Stall       1.17071            0.248842
 Hour has the lowest expected entropy (highest information gain), so ID3 splits on Hour at the root, which matches the sample tree
Decision Tree Advantages
• Inexpensive to construct
• Extremely fast at classifying unknown
records
• Easy to interpret for small-sized trees
• Accuracy is comparable to other
classification techniques for many simple
data sets
Decision Tree Limitations
 No backtracking
 greedy splitting yields a locally optimal tree, not necessarily the globally optimal one
 lookahead during attribute selection may give us better trees
 Rectangular-shaped decision regions
 in two-dimensional space
○ regions are bounded by lines parallel to the x- and y-axes
 linear relationships that are not parallel to the axes are therefore hard to capture
Summary
 Decision trees can be used to help predict outcomes for new, unseen instances
 The trees are easy to understand
 Decision trees work most naturally with discrete attributes
 The trees may suffer from error propagation: a mistake near the root affects every decision below it