22PCOAM16 ML Unit 3 Session 19 Constructing Decision Trees.pptx
1. 14/04/2025 1
Department of Computer Science & Engineering (SB-ET)
III B. Tech -I Semester
MACHINE LEARNING
SUBJECT CODE: 22PCOAM16
Academic Year: 2023-2024
by
Dr. M.Gokilavani
GNITC
Department of CSE (SB-ET)
2. 14/04/2025 Department of CSE (SB-ET) 2
22PCOAM16 MACHINE LEARNING
UNIT – III
Syllabus
Learning with Trees – Decision Trees – Constructing Decision Trees –
Classification and Regression Trees – Ensemble Learning – Boosting –
Bagging – Different ways to Combine Classifiers – Basic Statistics –
Gaussian Mixture Models – Nearest Neighbor Methods – Unsupervised
Learning – K means Algorithms
3. 14/04/2025 3
TEXTBOOK:
• Stephen Marsland, Machine Learning - An Algorithmic Perspective, Second Edition,
Chapman and Hall/CRC Machine Learning and Pattern Recognition Series, 2014.
REFERENCES:
• Tom M Mitchell, Machine Learning, First Edition, McGraw Hill Education, 2013.
• Ethem Alpaydin, Introduction to Machine Learning, Third Edition (Adaptive Computation and
Machine Learning Series), The MIT Press, 2014.
No of Hours Required: 13
Department of CSE (SB-ET)
UNIT - III LECTURE – 19
4. 14/04/2025 Department of CSE (SB-ET) 4
Constructing Decision Trees
• Starting at the Root: The algorithm begins at the top, called the “root
node,” representing the entire dataset.
• Asking the Best Questions: It looks for the most important feature or
question that splits the data into the most distinct groups.
• Branching Out: Based on the answer to that question, it divides the data
into smaller subsets, creating new branches. Each branch represents a
possible route through the tree.
• Repeating the Process: The algorithm continues asking questions and
splitting the data at each branch until it reaches the final “leaf nodes,”
representing the predicted outcomes or classifications.
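The four steps above can be illustrated with a minimal Python sketch (not the textbook's code). It grows a tiny tree on numeric features; the gini() impurity score used here as the splitting criterion is defined formally on a later slide, and the toy data are purely hypothetical.

from collections import Counter

def gini(labels):
    """Impurity of a list of class labels (Gini index, defined on a later slide)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Ask the best question: try every feature/threshold pair and keep the
    split with the lowest weighted impurity."""
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best[0], best[1]

def build_tree(X, y, depth=0, max_depth=3):
    """Branch out and repeat the process until leaf nodes are reached."""
    if len(set(y)) == 1 or depth == max_depth:      # pure node or depth limit: leaf
        return {"predict": Counter(y).most_common(1)[0][0]}
    f, t = best_split(X, y)
    if f is None:                                   # no useful split found: leaf
        return {"predict": Counter(y).most_common(1)[0][0]}
    left_ids = [i for i, row in enumerate(X) if row[f] <= t]
    right_ids = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "question": (f, t),                         # "is feature f <= t ?"
        "yes": build_tree([X[i] for i in left_ids], [y[i] for i in left_ids], depth + 1, max_depth),
        "no": build_tree([X[i] for i in right_ids], [y[i] for i in right_ids], depth + 1, max_depth),
    }

# Toy usage: two numeric features, binary labels (hypothetical values).
X = [[2.7, 1.0], [1.3, 3.1], [3.6, 0.5], [0.9, 2.8]]
y = ["no", "yes", "no", "yes"]
print(build_tree(X, y))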
UNIT - III LECTURE - 19
5. 14/04/2025 Department of CSE (SB-ET) 5
Constructing Decision Trees
• While constructing a decision tree, the main issue is how to select the best
attribute for the root node and for the sub-nodes.
• To solve this problem, we use a technique called the attribute selection
measure (ASM).
• With this measure, we can easily select the best attribute for the nodes of
the tree. There are two popular ASM techniques:
• Information Gain
• Gini Index
UNIT - III LECTURE - 19
6. 14/04/2025 Department of CSE (SB-ET) 6
INFORMATION GAIN
• Information gain measures the change in entropy after a dataset is split on
an attribute.
• It tells us how much information a feature provides about the class.
• According to the value of information gain, we split the node and build the
decision tree.
• A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest information gain
is split first.
• It can be calculated using the below formula:
Information Gain = Entropy(S) − [weighted average × Entropy(each subset produced by the split)]
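A minimal Python sketch of this calculation, using only the standard library; the entropy() helper follows the formula on the next slide, and the 9-yes/5-no example split is hypothetical, not data from the textbook.

import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = −Σ p_i · log2(p_i) over the class proportions in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_lists):
    """Entropy of the parent minus the weighted average entropy of the children."""
    n = len(parent_labels)
    weighted = sum(len(child) / n * entropy(child) for child in child_label_lists)
    return entropy(parent_labels) - weighted

# Hypothetical example: 14 samples (9 yes / 5 no) split by an attribute into
# three subsets, as in a typical play-tennis style dataset.
parent = ["yes"] * 9 + ["no"] * 5
children = [["yes"] * 2 + ["no"] * 3,   # attribute value 1
            ["yes"] * 4,                # attribute value 2
            ["yes"] * 3 + ["no"] * 2]   # attribute value 3
print(round(information_gain(parent, children), 3))   # ≈ 0.247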
UNIT - III LECTURE - 19
7. 14/04/2025 Department of CSE (SB-ET) 7
Entropy
UNIT - III LECTURE - 19
• Entropy: Entropy is a metric that measures the impurity of a given set of samples.
It quantifies the randomness in the data.
• Entropy can be calculated as:
Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no)
where
• S = the set of samples
• P(yes) = the proportion of "yes" samples in S
• P(no) = the proportion of "no" samples in S
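As a quick numeric check of this formula on a hypothetical node with 9 "yes" and 5 "no" samples:

import math

p_yes, p_no = 9 / 14, 5 / 14           # hypothetical class proportions
entropy = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(entropy, 3))                # 0.940 (maximum impurity would be 1.0 at a 50/50 split)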
8. 14/04/2025 Department of CSE (SB-ET) 8
Gini Index
• The Gini index is a measure of impurity (or purity) used while creating a decision
tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index is preferred over one with a high Gini index.
• The CART algorithm creates only binary splits, and it uses the Gini index to
choose them.
• Gini index can be calculated using the below formula:
UNIT - III LECTURE - 19
Gini Index = 1 − Σj (Pj)²
where Pj is the proportion of samples belonging to class j at the node.
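A minimal sketch of this formula on a hypothetical node with 9 "yes" and 5 "no" samples (the counts are chosen only for illustration):

from collections import Counter

def gini_index(labels):
    """Gini Index = 1 − Σj (Pj)², where Pj is the proportion of class j."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(round(gini_index(["yes"] * 9 + ["no"] * 5), 3))   # ≈ 0.459
print(gini_index(["yes"] * 14))                         # 0.0 for a pure node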
9. 14/04/2025 Department of CSE (SB-ET) 9
Pruning
• Pruning is a process of deleting the unnecessary nodes from a tree in
order to get the optimal decision tree.
• A tree that is too large increases the risk of overfitting, while a small tree may
not capture all the important features of the dataset.
• Pruning is therefore a technique that decreases the size of the learned tree
without reducing its accuracy.
• There are two main pruning techniques:
• Cost Complexity Pruning
• Reduced Error Pruning
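Cost complexity pruning is exposed in scikit-learn through the ccp_alpha parameter of DecisionTreeClassifier; the sketch below is illustrative only, and ccp_alpha = 0.02 is an arbitrary value chosen for demonstration, not a recommended setting.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unpruned tree: grown until every leaf is pure, so it tends to overfit.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Cost complexity pruning: a larger ccp_alpha removes more nodes.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_train, y_train)

print("full  :", full.get_n_leaves(), "leaves, test accuracy", full.score(X_test, y_test))
print("pruned:", pruned.get_n_leaves(), "leaves, test accuracy", pruned.score(X_test, y_test))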
UNIT - III LECTURE - 19
10. 14/04/2025 Department of CSE (SB-ET) 10
UNIT - III LECTURE - 19
Types of Decision Tree
• Classification Trees: Used when the target variable is categorical. For
example, predicting whether an email is spam or not spam.
• Regression Trees: Used when the target variable is continuous, like
predicting house prices.
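Both kinds of trees are available in scikit-learn; the tiny datasets below are hypothetical and serve only to show which estimator matches which target type.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical target (spam / not spam, encoded as 1 / 0).
X_cls = [[0, 3], [1, 0], [0, 1], [1, 4]]      # hypothetical email features
y_cls = [1, 0, 0, 1]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[1, 3]]))                   # predicted class label

# Regression tree: continuous target (hypothetical house prices).
X_reg = [[800], [1200], [1500], [2000]]        # area in square feet
y_reg = [45.0, 62.0, 75.0, 98.0]
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[1600]]))                   # predicted price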
11. 14/04/2025 Department of CSE (SB-ET) 11
Topics to be covered in the next session (Session 20)
• ID3 Algorithm
Thank you!!!
UNIT - III LECTURE - 19