1. Dr C Deepa
Associate Professor and Head
CS(AI & DS)
Sri Ramakrishna College of Arts & Science
Coimbatore -6.
Decision Tree Classification System with
concepts of Information Gain
2. Classification
• Classification is a classical method used by machine learning researchers
and statisticians to predict the outcome of unknown samples.
• It is used for categorizing objects into a given discrete number of classes.
• Classification is of two types:
a) Binary class: the target attribute has only two possible values, e.g. a team will
either win or lose.
b) Multi class: the target attribute can have more than two values, e.g. a tumor can
be of type I, type II, or type III.
3. Working of classification
It is a two-step process. The first step is training the model by analysing the
database tuples, and the second step is testing the model for accuracy
on unknown instances.
Training and Testing of a classifier
5. Introduction to the Decision Tree Classifier
In the decision tree classifier, predictions are made by using multiple
'if…then…' conditions.
• The decision tree structure consists of a root
node, branches and leaf nodes.
• Each internal node represents a condition on
some input attribute, each branch specifies
the outcome of the condition and each leaf
node holds a class label.
• The root node is the topmost node in the tree.
• Decision trees can easily be converted to
classification rules in the form of if-then
statements.
• Each leaf node specifies a decision or
prediction.
• The training process that produces this tree is
known as induction.
Decision tree to predict whether a customer will buy a
laptop or not
6. Building decision tree
The decision tree is a common machine learning technique which has been
implemented in many machine learning tools such as Weka, R and Matlab, as
well as in programming languages such as Python, Java, etc.
Decision trees are based on the concepts of Information Gain and the Gini
Index.
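As a minimal sketch in Python (assuming scikit-learn is available; the slides only list Python as one possible language), an entropy-based decision tree can be fitted to the weather dataset that is introduced later in this deck:

```python
# A sketch of training an entropy-based decision tree with scikit-learn (an assumed dependency).
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# The 14-record weather dataset from Table 1 (introduced later in these slides).
X_raw = [
    ["Sunny", "Hot", "High", "False"],      ["Sunny", "Hot", "High", "True"],
    ["Overcast", "Hot", "High", "False"],   ["Rainy", "Mild", "High", "False"],
    ["Rainy", "Cool", "Normal", "False"],   ["Rainy", "Cool", "Normal", "True"],
    ["Overcast", "Cool", "Normal", "True"], ["Sunny", "Mild", "High", "False"],
    ["Sunny", "Cool", "Normal", "False"],   ["Rainy", "Mild", "Normal", "False"],
    ["Sunny", "Mild", "Normal", "True"],    ["Overcast", "Mild", "High", "True"],
    ["Overcast", "Hot", "Normal", "False"], ["Rainy", "Mild", "High", "True"],
]
y = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
     "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

# Encode the categorical attributes as integers and train with the entropy criterion,
# which corresponds to the information-gain splitting discussed on the following slides.
X = OrdinalEncoder().fit_transform(X_raw)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=["Outlook", "Temperature", "Humidity", "Windy"]))
```

Note that scikit-learn grows binary splits over the encoded values rather than the multiway splits derived by hand on the following slides, but its "entropy" criterion corresponds to the same information-gain idea.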
7. Understanding Information Gain
It is basically related to the uncertainty in the information.
• For example, if a biased coin has a head on both sides, the result of tossing the coin carries no
information. Similarly, if an unbiased coin is tossed, then the result of the toss conveys some information.
• If, in your university or college, there is a holiday on Sunday, then a notice regarding the same
will not carry any information (because it is certain), but if some particular Sunday becomes
a working day then it will be information and hence becomes news.
The information of an event is computed from the probability of its occurrence. For an event E
with probability P(E), the information is I(E) = - log2 P(E) (the logarithm is taken to base 2).
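A small sketch of this idea in Python, assuming base-2 logarithms (the slides write log without a base):

```python
# Self-information of an event: rare events carry more information than certain ones.
import math

def information(p):
    """Information, in bits, of an event that occurs with probability p."""
    return -math.log2(p)

print(information(1.0))  # a certain event (two-headed coin shows heads): 0.0 bits
print(information(0.5))  # a fair coin shows heads: 1.0 bit
```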
8. Understanding the Entropy
Information theory defines entropy as the average amount of information
given by a source of data.
• Entropy plays a key role in selecting the root node, i.e. the split attribute, for building a
decision tree.
• The split attribute is the attribute that reduces the uncertainty by the largest amount, and it is
chosen as the root node. So, the attribute must distribute the objects such that
each attribute value results in a subset of objects with as little uncertainty as possible.
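A minimal entropy function in Python, assuming base-2 logarithms so that entropy is measured in bits:

```python
# Entropy: the average information of a source, estimated from a list of observed labels.
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["Yes"] * 9 + ["No"] * 5))  # the Play column used later, ~0.94 bits
print(entropy(["Yes"] * 7 + ["No"] * 7))  # maximum uncertainty for two classes: 1.0 bit
```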
9. Information gain
Information gain specifies the amount of information that is gained by knowing the value of
the attribute.
Mathematically, it is defined as the entropy of the distribution before the split minus the
(weighted average) entropy of the distribution after the split:
Information gain = (Entropy of distribution before the split) – (Entropy of
distribution after the split)
The largest information gain is equivalent to the smallest entropy (the least remaining
uncertainty) after the split.
After computing the information gain for every attribute, the attribute with the highest
information gain is selected as the split attribute.
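The same definition as a Python sketch; the names rows, attribute and target are illustrative (the slides do not define a data structure), and base-2 logarithms are assumed:

```python
# Information gain = entropy before the split - weighted entropy after the split.
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """`rows` is a list of dicts; `attribute` and `target` are column names."""
    before = entropy([row[target] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after
```

With the dataset written out after Table 1 below, a call such as information_gain(DATASET, "Outlook", "Play") reproduces the gain derived step by step on the following slides.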
10. ⦁ Problem: whether a play took place or not.
⦁ Here, Play is the output attribute.
⦁ These 14 records contain
information about the weather.
Building Decision Tree (1)
Outlook        Temp.      Humidity      Windy       Play
Sunny = 5      Hot = 4    High = 7      True = 6    Yes = 9
Overcast = 4   Mild = 6   Normal = 7    False = 8   No = 5
Rainy = 5      Cool = 4
Table 2: Summary of the dataset
Table 1: A sample dataset
Instance number   Outlook    Temperature   Humidity   Windy   Play
1 Sunny Hot High False No
2 Sunny Hot High True No
3 Overcast Hot High False Yes
4 Rainy Mild High False Yes
5 Rainy Cool Normal False Yes
6 Rainy Cool Normal True No
7 Overcast Cool Normal True Yes
8 Sunny Mild High False No
9 Sunny Cool Normal False Yes
10 Rainy Mild Normal False Yes
11 Sunny Mild Normal True Yes
12 Overcast Mild High True Yes
13 Overcast Hot Normal False Yes
14 Rainy Mild High True No
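To make the calculations on the following slides reproducible, the records of Table 1 can be written out in Python; the dictionary encoding below is an illustrative choice, not something the slides prescribe:

```python
# The 14-record weather dataset from Table 1, as a list of dictionaries.
DATASET = [
    {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Windy": "False", "Play": "No"},
    {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Windy": "True", "Play": "No"},
    {"Outlook": "Overcast", "Temperature": "Hot", "Humidity": "High", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy", "Temperature": "Mild", "Humidity": "High", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy", "Temperature": "Cool", "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy", "Temperature": "Cool", "Humidity": "Normal", "Windy": "True", "Play": "No"},
    {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Windy": "True", "Play": "Yes"},
    {"Outlook": "Sunny", "Temperature": "Mild", "Humidity": "High", "Windy": "False", "Play": "No"},
    {"Outlook": "Sunny", "Temperature": "Cool", "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy", "Temperature": "Mild", "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Sunny", "Temperature": "Mild", "Humidity": "Normal", "Windy": "True", "Play": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High", "Windy": "True", "Play": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Hot", "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy", "Temperature": "Mild", "Humidity": "High", "Windy": "True", "Play": "No"},
]

# Reproduce the counts summarised in Table 2.
from collections import Counter
for column in ["Outlook", "Temperature", "Humidity", "Windy", "Play"]:
    print(column, dict(Counter(row[column] for row in DATASET)))
```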
11. I. Information Gain before the Split
⦁ I (Play) = - probability for Class Yes * log (probability for
Class Yes) – probability for Class No * log (probability
for Class No)
⦁ I (Play) = -(9/14) log (9/14) – (5/14) log (5/14)
⦁ I (Play) = 0.9435142
Building Decision Tree (2)
Play Probability
Yes = 9 P (9 / 14)
No = 5 P (5 / 14)
Total 14 / 14
Table 3: Probability of Play
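A quick check of this value, assuming base-2 logarithms (the slides write log without a base):

```python
# Entropy of the Play column before any split: 9 "Yes" and 5 "No" out of 14.
import math

p_yes, p_no = 9 / 14, 5 / 14
i_play = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(i_play, 4))  # ~0.9403 bits with exact base-2 logs (the slide reports 0.9435)
```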
12. II. Information Gain after the Split (Outlook)
Total information for sub-trees = P(Sunny) * I(Sunny) + P(Overcast) * I(Overcast) + P(Rainy) * I(Rainy)
I (Outlook) = (5/14) * 0.97428 + (4/14) * 0.00 + (5/14) * 0.97428
I (Outlook) = 0.695917
Building Decision Tree (3)
Table 4: Outlook variable
Outlook        Yes         No          P (Outlook)
Sunny = 5      P (2 / 5)   P (3 / 5)   P (5 / 14)
Overcast = 4   P (4 / 4)   P (0 / 4)   P (4 / 14)
Rainy = 5      P (3 / 5)   P (2 / 5)   P (5 / 14)
Total          9 / 14      5 / 14      14 / 14
⦁ I (Sunny) = -(2/5) log (2/5) – (3/5) log (3/5)
   I (Sunny) = 0.97428
⦁ I (Overcast) = -(4/4) log (4/4) – (0/4) log (0/4)
   I (Overcast) = 0.00
⦁ I (Rainy) = -(3/5) log (3/5) – (2/5) log (2/5)
   I (Rainy) = 0.97428
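The Outlook figures can be checked directly from the class counts in Table 4; base-2 logarithms are assumed:

```python
# Per-branch entropy for Outlook and the weighted "information after the split".
import math

def branch_info(yes, no):
    """Entropy of a branch containing `yes` positive and `no` negative examples."""
    total = yes + no
    return -sum(c / total * math.log2(c / total) for c in (yes, no) if c)

i_sunny, i_overcast, i_rainy = branch_info(2, 3), branch_info(4, 0), branch_info(3, 2)
i_outlook = (5 / 14) * i_sunny + (4 / 14) * i_overcast + (5 / 14) * i_rainy
print(round(i_outlook, 3))  # ~0.694 bits (the slide reports 0.6959)
```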
13. II. Information Gain after the Split (Temperature)
⦁ Total information for sub-trees = P(Hot) * I(Hot) + P(Mild) * I(Mild) + P(Cool) * I(Cool)
I (Temperature) = (4/14) * 1.003433 + (6/14) * 0.9214486 + (4/14) * 0.814063501
I (Temperature) = 0.91419125587
Building Decision Tree (4)
Table 5: Temperature variable
Temperature   Yes         No          P (Temp.)
Hot = 4       P (2 / 4)   P (2 / 4)   P (4 / 14)
Mild = 6      P (4 / 6)   P (2 / 6)   P (6 / 14)
Cool = 4      P (3 / 4)   P (1 / 4)   P (4 / 14)
Total = 14    9 / 14      5 / 14      14 / 14
⦁ I (Hot) = -(2/4) log (2/4) – (2/4) log (2/4)
   I (Hot) = 1.003433
⦁ I (Mild) = -(4/6) log (4/6) – (2/6) log (2/6)
   I (Mild) = 0.9214486
⦁ I (Cool) = -(3/4) log (3/4) – (1/4) log (1/4)
   I (Cool) = 0.814063501
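The corresponding check for Temperature, using the counts in Table 5 (base-2 logarithms assumed):

```python
import math

def branch_info(yes, no):
    # Entropy of a branch with `yes` positive and `no` negative examples.
    total = yes + no
    return -sum(c / total * math.log2(c / total) for c in (yes, no) if c)

i_temperature = (4 / 14) * branch_info(2, 2) + (6 / 14) * branch_info(4, 2) + (4 / 14) * branch_info(3, 1)
print(round(i_temperature, 3))  # ~0.911 bits (the slide reports 0.9142)
```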
14. II. Information Gain after the Split (Humidity)
Total information for sub-trees = P(High) * I(High) + P(Normal) * I(Normal)
I (Humidity) = (7/14) * 0.98861 + (7/14) * 0.593704
I (Humidity) = 0.791157
Building Decision Tree (5)
Table 6: Humidity variable
Humidity     Yes         No          P (Humidity)
High = 7     P (3 / 7)   P (4 / 7)   P (7 / 14)
Normal = 7   P (6 / 7)   P (1 / 7)   P (7 / 14)
Total = 14   9 / 14      5 / 14      14 / 14
⦁ I (High) = -(3/7) log (3/7) – (4/7) log (4/7)
   I (High) = 0.98861
⦁ I (Normal) = -(6/7) log (6/7) – (1/7) log (1/7)
   I (Normal) = 0.593704
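The corresponding check for Humidity, using the counts in Table 6 (base-2 logarithms assumed):

```python
import math

def branch_info(yes, no):
    # Entropy of a branch with `yes` positive and `no` negative examples.
    total = yes + no
    return -sum(c / total * math.log2(c / total) for c in (yes, no) if c)

i_humidity = (7 / 14) * branch_info(3, 4) + (7 / 14) * branch_info(6, 1)
print(round(i_humidity, 3))  # ~0.788 bits (the slide reports 0.7912)
```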
15. II. Information Gain after the Split (Windy)
Total information for sub-trees = P(True) * I(True) + P(False) * I(False)
I (Windy) = (6/14) * 1.003433 + (8/14) * 0.81406
I (Windy) = 0.89522
Building Decision Tree (6)
Table 7: Windy variable
Windy        Yes         No          P (Windy)
True = 6     P (3 / 6)   P (3 / 6)   P (6 / 14)
False = 8    P (6 / 8)   P (2 / 8)   P (8 / 14)
Total = 14   9 / 14      5 / 14      14 / 14
⦁ I (True) = -(3/6) log (3/6) – (3/6) log (3/6)
   I (True) = 1.003433
⦁ I (False) = -(6/8) log (6/8) – (2/8) log (2/8)
   I (False) = 0.81406
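The corresponding check for Windy, using the counts in Table 7 (base-2 logarithms assumed):

```python
import math

def branch_info(yes, no):
    # Entropy of a branch with `yes` positive and `no` negative examples.
    total = yes + no
    return -sum(c / total * math.log2(c / total) for c in (yes, no) if c)

i_windy = (6 / 14) * branch_info(3, 3) + (8 / 14) * branch_info(6, 2)
print(round(i_windy, 3))  # ~0.892 bits (the slide reports 0.8952)
```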
16. ⦁ Outlook is selected as Root
Building Decision Tree (7)
III. Final Information Gain
Table 8: Information Gain
Potential Split Attributes   Information before split   Information after split   Information Gain
Outlook 0.9435 0.6959 0.2476
Temperature 0.9435 0.9142 0.0293
Humidity 0.9435 0.7912 0.15234
Windy 0.9435 0.8952 0.0483
Figure 3: Data splitting based on the Outlook attribute (Outlook at the root, with the
Sunny, Overcast and Rainy branches still to be resolved).
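Table 8 and the choice of root can be reproduced from the per-attribute class counts in Tables 4 to 7; base-2 logarithms are assumed, so the exact values differ slightly from the rounded ones above:

```python
# Information gain of each candidate split attribute and selection of the root node.
import math

def info(*counts):
    """Entropy of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

before = info(9, 5)  # Play column: 9 Yes, 5 No
after = {
    "Outlook":     (5 * info(2, 3) + 4 * info(4, 0) + 5 * info(3, 2)) / 14,
    "Temperature": (4 * info(2, 2) + 6 * info(4, 2) + 4 * info(3, 1)) / 14,
    "Humidity":    (7 * info(3, 4) + 7 * info(6, 1)) / 14,
    "Windy":       (6 * info(3, 3) + 8 * info(6, 2)) / 14,
}
gains = {attribute: before - a for attribute, a in after.items()}
print(gains)                       # Outlook has the largest gain (~0.247 bits)
print(max(gains, key=gains.get))   # -> 'Outlook', selected as the root node
```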
17. IV. Re-Calculate Information Gain
Table 9: Dataset for Outlook "Sunny"
⦁ The dataset consists of 5 samples
⦁ Yes = 2 / 5
⦁ No = 3 / 5
Building Decision Tree (8)
Temperature   Humidity   Windy   Play
Hot High False No
Hot High True No
Mild High False No
Cool Normal False Yes
Mild Normal True Yes
I (Play) = - probability for Class Yes * log (probability for Class Yes)
– probability for Class No * log (probability for Class No)
⦁ I (Play) = -(2/5) log (2/5) – (3/5) log (3/5)
⦁ I (Play) = 0.97
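Checking this value for the Sunny subset in Python, again with base-2 logarithms:

```python
# Entropy of Play within the Outlook = "Sunny" subset: 2 "Yes" and 3 "No" out of 5.
import math

p_yes, p_no = 2 / 5, 3 / 5
i_play_sunny = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(i_play_sunny, 3))  # ~0.971 bits (the slide reports 0.97)
```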
18. IV. Re-Calculate Information Gain
Building Decision Tree (9)
Attribute ‘Temperature’
19. IV. Re-Calculate Information Gain
Building Decision Tree (10)
Attribute ‘Humidity’
20. IV. Re-Calculate Information Gain
Building Decision Tree (11)
Attribute ‘Windy’
21. IV. Re-Calculate Information Gain
Building Decision Tree (12)
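The remaining slides set up this re-calculation for the attributes Temperature, Humidity and Windy but do not show the resulting numbers. As a sketch (base-2 logarithms assumed), the gains on the Sunny subset of Table 9 can be computed as follows; Humidity separates the subset perfectly and would therefore be chosen as the next split under the Sunny branch:

```python
# Information gain of each remaining attribute within the Outlook = "Sunny" subset (Table 9).
import math
from collections import Counter, defaultdict

SUNNY = [
    {"Temperature": "Hot",  "Humidity": "High",   "Windy": "False", "Play": "No"},
    {"Temperature": "Hot",  "Humidity": "High",   "Windy": "True",  "Play": "No"},
    {"Temperature": "Mild", "Humidity": "High",   "Windy": "False", "Play": "No"},
    {"Temperature": "Cool", "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Temperature": "Mild", "Humidity": "Normal", "Windy": "True",  "Play": "Yes"},
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target="Play"):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - after

for attribute in ["Temperature", "Humidity", "Windy"]:
    print(attribute, round(information_gain(SUNNY, attribute), 3))
# Humidity has the largest gain (~0.971 bits): High is always "No", Normal is always "Yes".
```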