2. INTRODUCTION
Classification Trees: Used when the decision tree has a categorical target variable. The above
tree is an example of a classification tree because the result has only two possible values.
Regression Trees: Used when the decision tree has a continuous target variable. For example, a
regression tree would be used to predict the price of a newly launched product, because the
price can take any value depending on various constraints.
Both types of decision trees fall under the Classification and Regression Tree (CART)
designation.
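One way to see the difference between the two tree types is to look at what a leaf returns. The sketch below assumes the usual CART convention that a classification leaf predicts the most common class among its training rows and a regression leaf predicts their mean. The function names and the Yes/No labels are hypothetical, chosen only for illustration.

```python
from collections import Counter
from statistics import mean

def classification_leaf(labels):
    """Predict the most common class among the training labels in a leaf."""
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(values):
    """Predict the mean of the training targets in a leaf."""
    return mean(values)

# Hypothetical labels for a classification leaf.
print(classification_leaf(["Yes", "Yes", "No"]))  # Yes
# Numeric targets for a regression leaf (illustrative values).
print(regression_leaf([46, 43, 52, 44]))          # 46.25
```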
5. Golf players for sunny outlook = {25, 30, 35, 38, 48}
Average of golf players for sunny outlook = (25+30+35+38+48)/5 = 35.2
Standard deviation of golf players for sunny outlook = √(((25 – 35.2)² + (30 – 35.2)² + … )/5) = 7.78
6. Golf players for overcast outlook = {46, 43, 52, 44}
Average of golf players for overcast outlook = (46 + 43 + 52 + 44)/4 = 46.25
Standard deviation of golf players for overcast outlook = √(((46 – 46.25)² + (43 – 46.25)² + … )/4) = 3.49
7. Golf players for rainy outlook = {45, 52, 23, 46, 30}
Average of golf players for rainy outlook = (45+52+23+46+30)/5 = 39.2
Standard deviation of golf players for rainy outlook = √(((45 – 39.2)² + (52 – 39.2)² + … )/5) = 10.87
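The per-outlook averages and standard deviations can be reproduced with a short sketch. It assumes the slides use the population standard deviation (dividing by n, not n - 1), which matches the divisors shown. The dictionary and function names are illustrative.

```python
from statistics import mean, pstdev

# Golf player counts per outlook value, copied from the slides above.
golf_players = {
    "sunny": [25, 30, 35, 38, 48],
    "overcast": [46, 43, 52, 44],
    "rainy": [45, 52, 23, 46, 30],
}

def summarize(groups):
    """Return {name: (mean, population standard deviation)} for each group."""
    return {name: (mean(vals), pstdev(vals)) for name, vals in groups.items()}

for name, (avg, std) in summarize(golf_players).items():
    print(f"{name}: mean={avg:.2f}, std={std:.2f}")
# sunny: mean=35.20, std=7.78
# overcast: mean=46.25, std=3.49
# rainy: mean=39.20, std=10.87
```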
8. Weighted standard deviation for outlook = (4/14)x3.49 + (5/14)x10.87 + (5/14)x7.78 = 7.66
Standard deviation reduction for outlook = 9.32 – 7.66 = 1.66
(9.32 is the global standard deviation of golf players over all 14 instances.)
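The weighting step can be sketched as follows, again assuming population standard deviations. Each branch's standard deviation is weighted by its share of the instances, and the result is subtracted from the standard deviation of all values pooled together. The helper names are hypothetical.

```python
from statistics import pstdev

def weighted_std(groups):
    """Size-weighted average of each group's population standard deviation."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * pstdev(g) for g in groups)

def std_reduction(groups):
    """Std of all values pooled, minus the weighted std after the split."""
    pooled = [v for g in groups for v in g]
    return pstdev(pooled) - weighted_std(groups)

outlook_groups = [
    [46, 43, 52, 44],      # overcast
    [45, 52, 23, 46, 30],  # rainy
    [25, 30, 35, 38, 48],  # sunny
]
print(round(weighted_std(outlook_groups), 2))   # 7.66
print(round(std_reduction(outlook_groups), 2))  # 1.66
```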
9. Weighted standard deviation for humidity = (7/14)x9.36 + (7/14)x8.73 = 9.04
Standard deviation reduction for humidity = 9.32 – 9.04 = 0.28
11. Root Node
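[Figure: tree with Outlook as the root node (14 data, global std dev). Branches with 5 data each use their own global std dev. For the Sunny branch, the candidate features are Temp (Hot, Mild, Cool), Wind (Weak, Strong), and Humidity (High, Normal).]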
12. Golf players for sunny outlook = {25, 30, 35, 38, 48}
Average of golf players for sunny outlook = (25+30+35+38+48)/5 = 35.2
Standard deviation of golf players for sunny outlook = √(((25 – 35.2)² + (30 – 35.2)² + … )/5) = 7.78
This value is treated as the global standard deviation for the sunny sub-dataset = 7.78
13. Standard deviation for sunny outlook and hot temperature = 2.5
Standard deviation for sunny outlook and cool temperature = 0
Standard deviation for sunny outlook and mild temperature = 6.5
14. Weighted standard deviation for sunny outlook and temperature = (2/5)x2.5 + (1/5)x0 + (2/5)x6.5 = 3.6
Standard deviation reduction for sunny outlook and temperature = 7.78 – 3.6 = 4.18
15. Weighted standard deviation for sunny outlook and humidity = (3/5)x4.08 + (2/5)x5 = 4.45
Standard deviation reduction for sunny outlook and humidity = 7.78 – 4.45 = 3.33
Weighted standard deviation for sunny outlook and wind = (2/5)x9 + (3/5)x5.56 = 6.93
Standard deviation reduction for sunny outlook and wind = 7.78 – 6.93 = 0.85
Summarizing standard deviations for windy feature when outlook is sunny
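The three candidate splits for the sunny subset can be compared in a short sketch. It assumes the feature with the largest standard deviation reduction is chosen for the next node. The branch counts and standard deviations are the rounded values from the slides, and the variable and function names are hypothetical.

```python
# (instance count, standard deviation) per branch of the sunny subset,
# copied from the slides above.
sunny_splits = {
    "temperature": [(2, 2.5), (1, 0.0), (2, 6.5)],  # hot, cool, mild
    "humidity": [(3, 4.08), (2, 5.0)],
    "wind": [(2, 9.0), (3, 5.56)],
}
SUNNY_STD = 7.78  # global standard deviation of the sunny subset

def reduction(branches, parent_std):
    """Parent std minus the size-weighted std of the branches."""
    total = sum(n for n, _ in branches)
    weighted = sum(n / total * std for n, std in branches)
    return parent_std - weighted

reductions = {f: reduction(b, SUNNY_STD) for f, b in sunny_splits.items()}
best = max(reductions, key=reductions.get)
print(best)  # temperature
```

Because the inputs are already rounded, the reductions computed this way can differ from the slide figures in the last decimal place.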
17. FINAL FORM OF REGRESSION TREE
https://sefiks.com/2018/08/28/a-step-by-step-regression-decision-tree-example/
[Figure: final regression tree. Leaf Node = Golf Player]
18. Decision Tree
Entropy and information gain
Information gain: the higher the gain, the better the candidate to be selected as a node
Entropy: if all the data belongs to the same class label, entropy = 0 (pure)
If the data is spread evenly across class labels, entropy is close to its maximum, which is 1 for two classes (impure)
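The two extremes of entropy can be checked with a minimal sketch. It assumes Shannon entropy with base-2 logarithms, under which a two-class set has a maximum entropy of 1. The function name and the example labels are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())

print(entropy(["Yes"] * 4))        # 0.0, all one class (pure)
print(entropy(["Yes", "No"] * 3))  # 1.0, evenly split between two classes
```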
Nodes: input attributes (Ex: Outlook)
Arcs/links/edges: values of input attributes (Ex: Sunny, Rainy, Overcast)
Top node: root node
Other nodes in the tree: intermediate nodes
Leaf node (last level of the tree): identifies the corresponding class label (Ex: Play = Yes/No)
Classification rules can be derived from the decision tree
How many rules can be derived? As many as there are leaf nodes, one rule per root-to-leaf path
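Rule derivation can be sketched by walking every root-to-leaf path. The tree below is hypothetical: only the Outlook attribute, its three values, and the Play = Yes/No labels come from the slides, while the Humidity and Wind subtrees are assumed for illustration. The nested-dict layout and the function name are also assumptions.

```python
# Hypothetical classification tree: an internal node maps one attribute name
# to {attribute value: subtree}; a leaf is a class label string.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def derive_rules(node, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if not isinstance(node, dict):         # leaf: emit the accumulated path
        return [(conditions, node)]
    (attribute, branches), = node.items()  # each internal node tests one attribute
    rules = []
    for value, subtree in branches.items():
        rules += derive_rules(subtree, conditions + ((attribute, value),))
    return rules

rules = derive_rules(tree)
for conds, label in rules:
    tests = " AND ".join(f"{a} = {v}" for a, v in conds)
    print(f"IF {tests} THEN Play = {label}")
print(len(rules))  # 5, one rule per leaf
```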