BAS 250
Lesson 5: Decision Trees
• Explain what decision trees are, how they are used, and the
benefits of using them
• Describe the best format for data in order to perform predictive
decision tree mining
• Interpret a visual tree’s nodes and leaves
• Explain the use of different algorithms in order to increase the
granularity of the tree’s detail
This Week’s Learning Objectives
 What is a Decision Tree
 Sample Decision Trees
 How to Construct a Decision Tree
 Problems with Decision Trees
 Summary
Overview
• Decision trees are excellent predictive models when the target attribute is categorical in
nature and when the data set is of mixed data types
• Compared to more numerically-based approaches, decision trees are better at handling attributes that
have missing or inconsistent values; decision trees will work around
such data and still generate usable results
• Decision trees are made of nodes and leaves to represent the best predictor attributes in
a data set
• Decision trees tell the user what is predicted, how confident that prediction can be, and
how we arrived at said prediction
Overview
An example of a Decision Tree developed in RapidMiner
Decision Trees
• Nodes are circular or oval shapes that represent
attributes which serve as good predictors for the label
attribute
• Leaves are end points that demonstrate the
distribution of categories from the label attribute that
follow the branch of the tree to the point of that leaf
Decision Trees
An example of meta data for playing golf based on a decision tree
Decision Trees
 An inductive learning task
o Use particular facts to make more generalized conclusions
 A predictive model based on a branching series of
Boolean tests
o These smaller Boolean tests are less complex than a one-
stage classifier
 Let’s look at a sample decision tree…
What is a Decision Tree?
Predicting Commute Time
[Decision tree diagram: the root node "Leave At" branches on 8 AM, 9 AM, and 10 AM. The 8 AM branch leads to a Long leaf; the 9 AM branch leads to an "Accident?" node (No → Medium, Yes → Long); the 10 AM branch leads to a "Stall?" node (No → Short, Yes → Long).]
If we leave at 10 AM and
there are no cars stalled
on the road, what will our
commute time be?
 In this decision tree, we made a series of Boolean
decisions and followed the corresponding branch
o Did we leave at 10 AM?
o Did a car stall on the road?
o Is there an accident on the road?
 By answering each of these yes/no questions, we
then came to a conclusion on how long our commute
might take
Inductive Learning
We did not have to represent this tree graphically
We could have represented it as a set of rules.
However, this may be much harder to read…
Decision Trees as Rules
if hour == 8am
    commute time = long
else if hour == 9am
    if accident == yes
        commute time = long
    else
        commute time = medium
else if hour == 10am
    if stall == yes
        commute time = long
    else
        commute time = short
Decision Tree as a Rule Set
• Notice that not all attributes
have to be used in each
path of the decision tree.
• As we will see, all attributes
may not even appear in the
tree.
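For reference, the same rule set can be written as a small function. This is a sketch in Python (an assumption for illustration; the course itself builds trees in RapidMiner), and the name predict_commute is purely made up for this example:

def predict_commute(hour, accident="no", stall="no"):
    # Follow the rule set above from the root of the tree down to a leaf
    if hour == "8am":
        return "long"
    elif hour == "9am":
        return "long" if accident == "yes" else "medium"
    elif hour == "10am":
        return "long" if stall == "yes" else "short"
    return None  # an hour outside the three discrete values the tree knows about

print(predict_commute("10am", stall="no"))  # -> short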
1. We first make a list of attributes that we can measure
 These attributes (for now) must be discrete
2. We then choose a target attribute that we want to predict
3. Then create an experience table that lists what we have
seen in the past
How to Create a Decision Tree
Example   Hour   Weather   Accident   Stall   Commute (Target)
D1 8 AM Sunny No No Long
D2 8 AM Cloudy No Yes Long
D3 10 AM Sunny No No Short
D4 9 AM Rainy Yes No Long
D5 9 AM Sunny Yes Yes Long
D6 10 AM Sunny No No Short
D7 10 AM Cloudy No No Short
D8 9 AM Rainy No No Medium
D9 9 AM Sunny Yes No Long
D10 10 AM Cloudy Yes Yes Long
D11 10 AM Rainy No No Short
D12 8 AM Cloudy Yes No Long
D13 9 AM Sunny No No Medium
Sample Experience Table
The previous experience table had 4 attributes:
1. Hour
2. Weather
3. Accident
4. Stall
But the decision tree only showed 3 attributes:
1. Hour
2. Accident
3. Stall
Why?
Choosing Attributes
 Methods for selecting attributes show that weather is
not a discriminating attribute
 We use the principle of Occam’s Razor: Given a
number of competing hypotheses, the simplest one
is preferable
Choosing Attributes
 The basic structure of creating a decision tree is
the same for most decision tree algorithms
 The difference lies in how we select the attributes
for the tree
 We will focus on the ID3 algorithm developed by
Ross Quinlan in 1975
Choosing Attributes
 The basic idea behind any decision tree algorithm is as
follows:
o Choose the best attribute(s) to split the remaining instances and make
that attribute a decision node
o Repeat this process recursively for each child
o Stop when:
 All the instances have the same target attribute value
 There are no more attributes
 There are no more instances
Decision Tree Algorithms
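A minimal recursive sketch of this generic procedure, assuming Python (not part of the slides); the attribute selection here is just a placeholder, since ID3's entropy-based choice is covered on the following slides:

from collections import Counter

def build_tree(rows, attributes, target):
    # Stop: there are no more instances
    if not rows:
        return None
    labels = [r[target] for r in rows]
    # Stop: all the instances have the same target attribute value
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: there are no more attributes; predict the majority label
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Choose an attribute to split on and make it a decision node
    # (placeholder: first remaining attribute; ID3 would choose by entropy)
    best = attributes[0]
    node = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, remaining, target)
    return node

rows = [{"Hour": "8 AM", "Commute": "Long"}, {"Hour": "10 AM", "Commute": "Short"}]
print(build_tree(rows, ["Hour"], "Commute"))  # e.g. {'Hour': {'8 AM': 'Long', '10 AM': 'Short'}}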
Original decision tree
Identifying the Best Attributes
[The original commute-time decision tree, repeated: "Leave At" at the root, with 8 AM → Long, 9 AM → "Accident?" (No → Medium, Yes → Long), and 10 AM → "Stall?" (No → Short, Yes → Long).]
How did we know to split on Leave At, and then on Stall and
Accident, and not on Weather?
 To determine the best attribute, we look at the
ID3 heuristic
 ID3 splits attributes based on their entropy.
Entropy is a measure of disorder (uncertainty) in the data…
ID3 Heuristic
 Entropy is minimized when all values of the target
attribute are the same
o If we know that commute time will always be short, then entropy = 0
 Entropy is maximized when there is an equal chance
of all values for the target attribute (i.e. the result is
random)
o If commute time = short in 3 instances, medium in 3 instances and long
in 3 instances, entropy is maximized
Entropy
 Calculation of entropy
o Entropy(S) = -∑(i = 1 to l) |Si|/|S| * log2(|Si|/|S|)
 S = set of examples
 Si = subset of S with value vi under the target attribute
 l = size of the range of the target attribute
Entropy
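A quick sketch of this calculation in Python (an assumption for illustration; the slides themselves use RapidMiner), using the label counts from the commute-time experience table (7 Long, 4 Short, 2 Medium):

from math import log2

def entropy(counts):
    # Entropy of a target attribute, given the count of each of its values
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([7, 4, 2]))  # commute-time target: about 1.4196
print(entropy([3, 3, 3]))  # equal chance of each value: log2(3), about 1.585 (maximized)
# If every instance had the same value, e.g. entropy([13, 0, 0]), the result is 0 (minimized)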
 ID3 splits on attributes with the lowest entropy
 We calculate the entropy for all values of an attribute
as the weighted sum of subset entropies as follows:
o ∑(i = 1 to k) |Si|/|S| Entropy(Si), where k is the size of the
range of the attribute we are testing
 We can also measure information gain (which is highest when
the weighted subset entropy is lowest) as follows:
o Entropy(S) - ∑(i = 1 to k) |Si|/|S| Entropy(Si)
ID3
Attribute Expected Entropy Information Gain
Hour 0.6511 0.768449
Weather 1.28884 0.130719
Accident 0.92307 0.496479
Stall 1.17071 0.248842
ID3
Given our commute time sample set, we can calculate
the entropy of each attribute at the root node
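The numbers in this table can be reproduced with a short script, assuming Python for illustration; the row values come straight from the experience table, and helper names such as expected_entropy are made up for this sketch:

from collections import Counter
from math import log2

# Rows D1-D13 from the experience table: (Hour, Weather, Accident, Stall, Commute)
rows = [
    ("8 AM", "Sunny", "No", "No", "Long"),    ("8 AM", "Cloudy", "No", "Yes", "Long"),
    ("10 AM", "Sunny", "No", "No", "Short"),  ("9 AM", "Rainy", "Yes", "No", "Long"),
    ("9 AM", "Sunny", "Yes", "Yes", "Long"),  ("10 AM", "Sunny", "No", "No", "Short"),
    ("10 AM", "Cloudy", "No", "No", "Short"), ("9 AM", "Rainy", "No", "No", "Medium"),
    ("9 AM", "Sunny", "Yes", "No", "Long"),   ("10 AM", "Cloudy", "Yes", "Yes", "Long"),
    ("10 AM", "Rainy", "No", "No", "Short"),  ("8 AM", "Cloudy", "Yes", "No", "Long"),
    ("9 AM", "Sunny", "No", "No", "Medium"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def expected_entropy(rows, idx):
    # Weighted sum of subset entropies for the attribute in column idx
    groups = {}
    for r in rows:
        groups.setdefault(r[idx], []).append(r[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

base = entropy([r[-1] for r in rows])  # entropy at the root, about 1.4196
for idx, name in enumerate(["Hour", "Weather", "Accident", "Stall"]):
    exp = expected_entropy(rows, idx)
    print(f"{name:10s} expected entropy {exp:.5f}   information gain {base - exp:.6f}")

Hour has the lowest expected entropy (highest information gain), which is why the sample tree splits on Leave At at the root.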
 There is another technique for reducing the
number of attributes used in a tree – pruning
 Two types of pruning:
o Pre-pruning (forward pruning)
o Post-pruning (backward pruning)
Pruning Trees
 In prepruning, we decide during the building process
when to stop adding attributes (possibly based on their
information gain)
 However, this may be problematic – Why?
o Sometimes attributes individually do not contribute much to a
decision, but combined, they may have a significant impact
Prepruning
 Postpruning waits until the full decision tree
has been built and then prunes attributes from it
 Two techniques:
o Subtree Replacement
o Subtree Raising
Postpruning
Entire subtree is replaced by a single leaf node
Subtree Replacement
[Diagram: a tree rooted at A; A's child B has children C, 4, and 5; node C has leaves 1, 2, and 3.]
• Node 6 replaced
the subtree
• Generalizes tree
a little more, but
may increase
accuracy
Subtree Replacement
[Diagram: after replacement, subtree C and its leaves are collapsed into a single new leaf 6, so B's children are now 6, 4, and 5.]
Entire subtree is raised onto another node
Subtree Raising
[Diagram: the same tree as before: root A, whose child B has children C, 4, and 5; node C has leaves 1, 2, and 3.]
Entire subtree is raised onto another node
We will NOT be using Subtree Raising in this course!
Subtree Raising
[Diagram: after raising, subtree C (with leaves 1, 2, and 3) replaces B as A's child; leaves 4 and 5 are dropped.]
 ID3 is not optimal
o Uses expected entropy reduction, not actual reduction
 Must use discrete (or discretized) attributes
o What if we left for work at 9:30 AM?
o We could break down the attributes into smaller
values…
Problems with ID3
If we broke down leave time to the minute, we
might get something like this:
Problems with ID3
[Diagram: one branch per individual leave time (8:02 AM, 8:03 AM, 9:05 AM, 9:07 AM, 9:09 AM, 10:02 AM), each ending in its own single leaf labeled Long, Medium, or Short.]
Since entropy is very low for each branch, we have n branches
with n leaves. This would not be helpful for predictive modeling.
 We can use a technique known as discretization
 We choose cut points, such as 9 AM, for splitting
continuous attributes
 These cut points generally lie in a subset of boundary
points, such that a boundary point is where two adjacent
instances in a sorted list have different target attribute
values
Problems with ID3
Consider the attribute commute time
Problems with ID3
8:00 (L), 8:02 (L), 8:07 (M), 9:00 (S), 9:20 (S), 9:25 (S), 10:00 (S), 10:02 (M)
When we split at these cut points, the entropy of each
branch increases slightly, but we avoid a decision tree
with as many leaves as distinct values
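A small sketch of locating those boundary points for the sorted list above, assuming Python for illustration (a candidate cut point falls between two adjacent instances whose target values differ):

times = ["8:00", "8:02", "8:07", "9:00", "9:20", "9:25", "10:00", "10:02"]
labels = ["L", "L", "M", "S", "S", "S", "S", "M"]

# A boundary lies between two adjacent instances with different target values
boundaries = [(times[i], times[i + 1])
              for i in range(len(times) - 1)
              if labels[i] != labels[i + 1]]
print(boundaries)  # [('8:02', '8:07'), ('8:07', '9:00'), ('10:00', '10:02')]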
 While decision trees classify quickly, the time for
building a tree may be higher than for other types of
classifiers
 Decision trees suffer from a problem of errors
propagating throughout a tree
 A very serious problem as the number of classes
increases
Problems with Decision Trees
 Since decision trees work by a series of local
decisions, what happens when one of these
local decisions is wrong?
o Every decision from that point on may be wrong
o We may never return to the correct path of the
tree
Error Propagation
 Decision trees can be used to help predict the
future
 The trees are easy to understand
 Decision trees work more efficiently with discrete
attributes
 The trees may suffer from error propagation
Summary
“This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s
Employment and Training Administration. The solution was created by the grantee and does not
necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor
makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such
information, including any information on linked sites and including, but not limited to, accuracy of the
information or its completeness, timeliness, usefulness, adequacy, continued availability, or ownership.”
Except where otherwise stated, this work by Wake Technical Community College Building Capacity in
Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative
Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/
Copyright Information