2. Decision Tree
• A decision tree is a supervised machine learning technique based on the divide-and-conquer paradigm.
• The basic idea behind decision trees is to partition the instance space into patches and to fit a simple model to each patch.
• A decision tree is a tree structure, where each internal node
(non-leaf node) denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf node (or
terminal node) holds a class label.
• A decision tree is a classifier expressed as a recursive partition
of the instance space.
• Decision trees are used for classification tasks.
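As a concrete illustration, here is a minimal sketch of fitting such a classifier with scikit-learn; the toy data and its integer encoding are hypothetical, not from the slides:

# A minimal sketch, assuming scikit-learn is installed.
# The toy data is hypothetical: features are integer-encoded,
# e.g. Outlook: 0 = Sunny, 1 = Cloudy, 2 = Rain; Wind: 0 = Weak, 1 = Strong.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [2, 1]]   # [Outlook, Wind] per day
y = ["No", "No", "Yes", "No"]          # class label per day

clf = DecisionTreeClassifier(criterion="entropy")  # entropy-based splits, as in ID3
clf.fit(X, y)
print(clf.predict([[1, 1]]))           # classify an unseen day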
4. Building Decision Tree
The core algorithm for building decision trees, called ID3, employs a top-down, greedy search through the space of possible branches, with no backtracking. ID3 uses Entropy and Information Gain to construct a decision tree.
• The tree starts as a single node, N, representing the training records in
D.
• If the records in D are all of the same class, then node N becomes a
leaf and is labeled with that class.
• Otherwise, the algorithm calls an attribute selection method to determine the splitting criterion. The splitting criterion tells us which attribute to test at node N by determining the “best” way to separate or partition the tuples in D into individual classes (a sketch of the full recursion follows this list).
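A minimal, self-contained Python sketch of this recursion (an illustration only, not the lecture's reference implementation; records are assumed to be dicts mapping attribute names to values):

from collections import Counter
import math

def entropy(labels):
    # Shannon entropy E(D) of a list of class labels
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Gain(D, A) = E(D) - E(D, A): entropy minus the weighted
    # entropy of the subsets produced by splitting on attr
    n = len(labels)
    after = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        after += len(subset) / n * entropy(subset)
    return entropy(labels) - after

def id3(rows, labels, attrs):
    # Top-down, greedy, no backtracking
    if len(set(labels)) == 1:                      # pure node: leaf with that class
        return labels[0]
    if not attrs:                                  # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree

Applied to the training set on slide 8 as id3(rows, labels, ["Outlook", "Humidity", "Wind"]), this sketch selects Outlook at the root, matching the gain calculation shown there.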
6. Information Gain
• Information gain, Gain(D, A), for a set D is the effective change in entropy after splitting on a particular attribute A:
Gain(D, A) = E(D) − E(D, A)
• The information gain is the decrease in entropy after a dataset
is split on an attribute.
• Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
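For example, if D contains 9 positive and 5 negative tuples, E(D) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.940; an attribute whose branches are purer than D lowers the weighted entropy E(D, A) and therefore yields a higher gain.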
8. Training set
Predict: will John play tennis on Day 15?

Day  Outlook  Humidity  Wind    Play
1    Sunny    High      Weak    No
2    Sunny    High      Strong  No
3    Cloudy   High      Weak    Yes
4    Rain     High      Weak    Yes
5    Rain     Normal    Weak    Yes
6    Rain     Normal    Strong  No
7    Cloudy   Normal    Strong  Yes
8    Sunny    High      Weak    No
9    Sunny    Normal    Weak    Yes
10   Rain     Normal    Weak    Yes
11   Sunny    Normal    Strong  Yes
12   Cloudy   High      Strong  Yes
13   Cloudy   Normal    Weak    Yes
14   Rain     High      Strong  No
15   Rain     High      Weak    ?
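Applying the information-gain formula from slide 6 to the 14 labeled days (rounded to three decimals):

E(D) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.940
Gain(D, Outlook) ≈ 0.940 − [(5/14)·0.971 + (4/14)·0 + (5/14)·0.971] ≈ 0.246
Gain(D, Humidity) ≈ 0.940 − [(7/14)·0.985 + (7/14)·0.592] ≈ 0.152
Gain(D, Wind) ≈ 0.940 − [(8/14)·0.811 + (6/14)·1.000] ≈ 0.048

Outlook has the highest gain and becomes the root. Day 15 (Rain, High, Weak) follows the Rain branch; within the Rain subset, splitting on Wind gives pure branches (Weak → Yes, Strong → No), so the tree predicts that John will play tennis.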
21.
RID  Age     Salary  Employee  Feedback   Purchase
1    <=30    High    No        Fair       No
2    <=30    High    No        Excellent  No
3    31..40  High    No        Fair       Yes
4    >40     Medium  No        Fair       Yes
5    >40     Low     Yes       Fair       Yes
6    >40     Low     Yes       Excellent  No
7    31..40  Low     Yes       Excellent  Yes
8    <=30    Medium  No        Fair       No
9    <=30    Low     Yes       Fair       Yes
10   >40     Medium  Yes       Fair       Yes
11   <=30    Medium  Yes       Excellent  Yes
12   31..40  Medium  No        Excellent  Yes
13   31..40  High    Yes       Fair       Yes
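As a quick check with the same formula, for the 13 rows shown (9 Yes, 4 No, rounded to three decimals): E(D) ≈ 0.890, Gain(D, Age) ≈ 0.267, Gain(D, Employee) ≈ 0.110, Gain(D, Salary) ≈ 0.056, Gain(D, Feedback) ≈ 0.018, so Age would be selected as the root.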
22. Sample MCQ
Internal nodes of a decision tree correspond to:
• A. Decision
• B. Classes
• C. Data instances
• D. None of the above
Correct Answer: Decision
23. Leaf nodes of a decision tree correspond to:
• A. Decision
• B. Classes
• C. Data instances
• D. None of the above
Correct Answer: Classes
24. Consider the following small data table for two classes of woods. Using
information gain, construct a decision tree to classify the data set.
Answer the following question for the resulting tree.
Which attribute would information gain choose as the root of the tree?
• A. Density
• B. Grain
• C. Hardness
• D. None of the above
Correct Answer: Hardness
25. ________is a decision support tool that uses a tree-like graph or
model of decisions and their possible consequences, including
chance event outcomes, resource costs, and utility.
(a) Decision tree
(b) Graphs
(c) Trees
(d) Networks
Correct Answer: Decision tree
26. Exercise

Age  Competition  Type      Profit
Old  Yes          Software  Down
Old  No           Software  Down
Old  No           Hardware  Down
Mid  Yes          Software  Down
Mid  Yes          Hardware  Down
Mid  No           Hardware  Up
Mid  No           Software  Up
New  Yes          Software  Up
New  No           Hardware  Up
New  No           Software  Up