zekeLabs
Decision Trees
“Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Overview of Decision Trees
● Introduction to Trees
● Construction of Trees
● Information
● Root-Node Decision
● Classification Tree
● Regression Tree
● Pruning
● Advantages and Disadvantages of Trees
Introduction to Trees
● Supervised learning algorithm
● Used for both classification & regression
● Flowchart-like structure
● Models consist of nested if-then rules (see the sketch below)
● Mimics human decision making
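To make the "nested if-then" point concrete, here is a hand-written sketch in Python. The rules happen to match the play-tennis tree derived later in this deck; the function itself is just for illustration.

```python
# A decision tree is literally nested if-then rules.
# These rules match the play-tennis tree derived later in the deck.
def predict_play(outlook: str, humidity: str, windy: bool) -> str:
    if outlook == "Overcast":
        return "Yes"
    elif outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    else:  # Rainy
        return "No" if windy else "Yes"

print(predict_play("Sunny", "High", False))  # -> "No"
```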
Construction of Tree
● Hierarchically partitions the feature space
● Each split conditions on one feature
● The partitioning is done greedily
Measuring Information
● Low-probability events carry more information
● Entropy is the average rate of information: H(S) = -Σc p(c) * log2(p(c))
● It is a measure of uncertainty (sketched in code below)
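A minimal sketch of entropy in Python, assuming the class labels come as a plain list:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A certain outcome carries no information; a 50/50 split carries 1 bit.
print(entropy(["Yes"] * 4))               # 0.0
print(entropy(["Yes", "No"]))             # 1.0
print(entropy(["Yes"] * 9 + ["No"] * 5))  # ~0.940 (used later in this deck)
```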
Information Gain
● Measures how much "information" a feature gives us about the class: Gain(S, A) = H(S) - Σv (|Sv| / |S|) * H(Sv), i.e., the drop in entropy after splitting S on feature A
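A sketch of that definition in Python (rows as dicts keyed by feature name; the entropy helper is repeated so the snippet is self-contained):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Gain(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder
```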
Classification Tree
● In this problem we have four features (the X values) and one response (the Y value)
● We need to learn the mapping between X and Y
Outlook Temp. Humidity Wind Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
The Root Node
Entropy of the whole data set (9 Yes, 5 No out of 14):
-(9/14)*log2(9/14) = 0.41 (the Yes term)
-(5/14)*log2(5/14) = 0.53 (the No term)
H(S) = 0.41 + 0.53 = 0.94 (verified in code below)
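Checking the arithmetic:

```python
from math import log2

h_yes = -(9 / 14) * log2(9 / 14)  # ~0.410
h_no = -(5 / 14) * log2(5 / 14)   # ~0.531
print(h_yes + h_no)               # ~0.940
```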
The Root Node
E(Outlook = Sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971
E(Outlook = Overcast) = -(1)*log2(1) - 0 = 0 (all four Overcast days are Yes)
E(Outlook = Rainy) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.971
Weighted average information for Outlook:
I(Outlook) = (5/14)*0.971 + (4/14)*0 + (5/14)*0.971 = 0.693
Gain(Outlook) = 0.94 - 0.693 = 0.247
The same computation for all four features:
Outlook: Info = 0.693, Gain = 0.247
Temperature: Info = 0.911, Gain = 0.029
Humidity: Info = 0.788, Gain = 0.152
Windy: Info = 0.892, Gain = 0.048
Outlook has the highest gain, so it becomes the root node.
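These numbers can be reproduced directly from the table (a self-contained sketch; feature order follows the table's columns):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# The play-tennis table: (Outlook, Temperature, Humidity, Windy, Play)
data = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]
labels = [row[-1] for row in data]

for name, i in [("Outlook", 0), ("Temperature", 1), ("Humidity", 2), ("Windy", 3)]:
    # Weighted average entropy over the feature's categories.
    info = sum(
        len(sub) / len(data) * entropy(sub)
        for v in set(row[i] for row in data)
        for sub in [[r[-1] for r in data if r[i] == v]]
    )
    print(f"{name}: Info = {info:.3f}, Gain = {entropy(labels) - info:.3f}")
# Outlook: Info = 0.693, Gain = 0.247  <- highest, so Outlook is the root
```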
Algorithm
● Compute the entropy of the data set
● For every attribute/feature:
○ Calculate the entropy for each of its categories
○ Take the weighted average information for the attribute
○ Calculate the gain for the attribute
● Split on the attribute with the highest gain
● Repeat recursively until the desired tree is obtained (see the sketch below)
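A compact recursive sketch of this algorithm (ID3-style, categorical features only; rows are dicts keyed by feature name):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, features):
    """Build a tree as nested dicts: {feature: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:        # pure node -> leaf
        return labels[0]
    if not features:                 # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(f):
        remainder = 0.0
        for v in set(r[f] for r in rows):
            sub = [lab for r, lab in zip(rows, labels) if r[f] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(features, key=gain)   # greedy: pick the highest-gain attribute
    branches = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [f for f in features if f != best])
    return {best: branches}
```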
Other Criteria
● The Gini index is defined as Gini = 1 - Σc p(c)², where p(c) denotes the proportion of instances belonging to class c
● The classification error is defined as Error = 1 - maxc p(c)
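Both criteria in Python, for comparison with the entropy helper above:

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum over classes of p(c)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def classification_error(labels):
    """Classification error: 1 - max over classes of p(c)."""
    n = len(labels)
    return 1.0 - max(Counter(labels).values()) / n

print(gini(["Yes"] * 9 + ["No"] * 5))                  # ~0.459
print(classification_error(["Yes"] * 9 + ["No"] * 5))  # ~0.357
```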
Regression Tree
● Divide the feature space into regions and fit a simple model to each region
● A constant can be fit to each region
● Deciding which feature and value to split on is an optimization problem
Optimization Function
● The optimization function for a regression split on feature j at point s is
min over (j, s) of [ min over c1 of Σ(yi - c1)² over R1(j, s) + min over c2 of Σ(yi - c2)² over R2(j, s) ]
with regions R1(j, s) = {x | xj <= s} and R2(j, s) = {x | xj > s}
● The average of the responses in each region is the best constant estimate for that region
● The split feature j and split point s are found by solving the above problem (sketched below)
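A brute-force sketch of that search: try every feature and candidate split point, and keep the pair minimizing the summed squared error around the two region means.

```python
import numpy as np

def best_split(X, y):
    """Exhaustively find the (feature j, threshold s) minimizing
    SSE(R1) + SSE(R2), with each region's mean as its constant fit."""
    best = (None, None, float("inf"))
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            sse = ((left - left.mean()) ** 2).sum() + \
                  ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

# A step function with a jump at x = 2.5: the search recovers it.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(80, 1))
y = np.where(X[:, 0] < 2.5, 1.0, 3.0) + rng.normal(0, 0.1, 80)
print(best_split(X, y))  # threshold found near 2.5
```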
Pruning
● Involves removing branches that contribute little predictive power
● Reduces the complexity of the tree
● Improves predictive power by reducing overfitting
● Two methods of pruning:
○ Pre-pruning
○ Post-pruning
Post-pruning
● Grow the tree fully, then cut it back by minimizing a global cost-complexity function, commonly C_alpha(T) = Σm Nm * Qm(T) + alpha * |T|, where |T| is the number of terminal nodes, Nm and Qm(T) are the size and impurity of node m, and alpha penalizes complexity
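scikit-learn (0.22 and later) exposes cost-complexity post-pruning through the ccp_alpha parameter; a short sketch:

```python
# Cost-complexity post-pruning in scikit-learn (0.22+):
# grow the full tree, then prune back with increasing alpha.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Candidate alphas along the pruning path of the fully grown tree.
path = full.cost_complexity_pruning_path(X, y)
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}")
```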
Pre-pruning
● Set constraining parameters before building the model:
○ Maximum tree depth
○ Maximum number of terminal nodes
○ Minimum samples required to split a node
○ Maximum number of features
● Controls the size of the resulting tree
● scikit-learn supports this method via constructor parameters (see below)
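A sketch of the pre-pruning knobs as scikit-learn exposes them (synthetic data via make_classification, just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Pre-pruning: constrain the tree while it is being grown.
clf = DecisionTreeClassifier(
    max_depth=4,           # maximum tree depth
    max_leaf_nodes=10,     # maximum number of terminal nodes
    min_samples_split=20,  # minimum samples needed to split a node
    max_features=4,        # features considered at each split
    random_state=0,
).fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```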
An Example - Regression Tree
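The original slide showed a figure; a reproducible stand-in in the same spirit: fit a noisy sine curve at two depths, the shallow tree producing a coarse piecewise-constant fit and the deeper one tracking the curve (and eventually the noise).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# A shallow tree fits a coarse staircase; a deeper one follows the curve.
for depth in (2, 5):
    reg = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    print(depth, reg.predict([[1.0], [4.0]]))
```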
Various Algorithms
● CART (Classification and Regression Trees) → uses the Gini index as its metric
● ID3 (Iterative Dichotomiser 3) → uses entropy and information gain as metrics
● C4.5, C5.0, CHAID, and QUEST are other well-known algorithms
● scikit-learn implements CART
The Iris Data set
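The slide's iris figure is an image; as a stand-in, a tiny classifier on the same data, with export_text printing the learned if-then rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The fitted tree is exactly a set of nested if-then rules.
print(export_text(clf, feature_names=list(iris.feature_names)))
```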
Advantages of Trees
● Simple to understand, interpret, and visualize
● Implicitly perform variable screening or feature selection
● Can handle both numerical and categorical data
● Can also handle multi-output problems
● Require relatively little effort from users for data preparation
● Nonlinear relationships between parameters do not affect tree performance
Disadvantages of Trees
● Decision-tree learners can build over-complex trees that overfit
● Trees can be unstable: small variations in the data can produce a very different tree
● Greedy algorithms cannot guarantee returning the globally optimal tree
● Trees can be biased if some classes dominate the data
