Day 4 of 30 – Decision Trees in Machine Learning
#MachineLearning #AI #MLRoadmap #30DaysOfML #LearningTogether #DecisionTree #SupervisedLearning #PythonML #HandsOnML #GrowEveryday #SaileshWrites
What is a Decision Tree?
A Decision Tree is like a flowchart. Imagine asking a series of yes/no questions to arrive at a decision – like a game of “20 Questions”. Decision Trees follow a similar logic.
Each internal node represents a question (decision), each branch represents the outcome, and each leaf node represents the final result (prediction).
It's like asking:
Is the weather sunny?
  Yes → Is it hot?
    Yes → Stay indoors
    No → Go outside
  No → Carry an umbrella
This kind of structure is great for both classification (Yes/No, True/False, categories) and regression (predicting numbers).
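In code terms, that flowchart is just a chain of nested if/else checks. Here is a tiny sketch (the function name and weather values are made up purely for illustration):

```python
def what_to_do(weather: str, hot: bool) -> str:
    """Toy decision tree mirroring the flowchart above (illustration only)."""
    if weather == "sunny":        # root question: Is the weather sunny?
        if hot:                   # next question on the Yes branch: Is it hot?
            return "Stay indoors"
        return "Go outside"
    return "Carry an umbrella"    # the No branch goes straight to a leaf

print(what_to_do("sunny", hot=False))  # -> Go outside
```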
No Feature Scaling Needed!
Unlike logistic or linear regression, Decision Trees are not affected by the scale or range of your features, so you don't need to scale them. That's one less step!
Advantages
Easy to understand
Inbuilt feature selection
Requires little data preprocessing and preparation
Works for both classification and regression
No need for scaling
Performs well with large datasets
Disadvantages
Can overfit if not pruned
Sensitive to small data changes
Imbalanced datasets can create problems (the tree can become biased toward the majority class)
Decision Tree Metrics (Split Criteria)
Gini Impurity
Entropy
Information Gain
What is Gini Impurity?
Gini Impurity is one of the popular metrics used to decide how to split a node in a Decision Tree. It helps us measure how “pure” or “impure” a node is. In other words, it tells us how mixed up the classes are in a group of data.
Imagine This:
Let’s say you have a basket of fruits — apples and oranges.
If your basket contains only apples, it is pure.
If it contains 50% apples and 50% oranges, it is impure.
The Gini Impurity measures this impurity.
Gini Formula
The formula for Gini Impurity is:
Gini = 1 - (p1^2 + p2^2 + ... + pn^2)
Where:
pi is the probability (or proportion) of class i in the node.
Example 1: Pure Node (All Apples)
Suppose:
You have 10 fruits — all apples.
So, the probability of apple is p1 = 1, and of orange is p2 = 0
Gini = 1 - (1^2 + 0^2) = 1 - (1 + 0) = 0
A Gini score of 0 means the node is completely pure.
Example 2: Half Apples, Half Oranges
Suppose:
You have 10 fruits — 5 apples, 5 oranges.
So, p1 = 0.5; p2 = 0.5
Gini = 1 - (0.5^2 + 0.5^2) = 1 - (0.25 + 0.25) = 0.5
A Gini score of 0.5 means the node is as impure as it can get for two classes (completely mixed up).
Example 3: 80% Apples, 20% Oranges
So, p1 = 0.8; p2 = 0.2
Gini = 1 - (0.8^2 + 0.2^2) = 1 - (0.64 + 0.04) = 1 - 0.68 = 0.32
A Gini score of 0.32 — better than 0.5 — means this node is less impure.
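If you want to check these numbers yourself, here is a minimal helper in plain Python (no libraries needed):

```python
def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1 - sum(p ** 2 for p in proportions)

print(gini([1.0, 0.0]))  # all apples                -> 0.0
print(gini([0.5, 0.5]))  # half apples, half oranges -> 0.5
print(gini([0.8, 0.2]))  # 80% apples, 20% oranges   -> ~0.32
```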
Gini in Decision Tree
The Decision Tree algorithm chooses the split that gives us:
Lower (weighted) Gini Impurity after the split
Because lower Gini means the resulting groups are purer (more "certain" class predictions)
What is Entropy?
Entropy is a metric that tells us how pure or impure a dataset is. It comes from information theory and is used in decision trees to decide which attribute to split on.
If all elements in a dataset belong to the same class → Entropy = 0 (pure)
If the data is split evenly between two classes → Entropy = 1 (maximally impure)
Think of it like:
"How much disorder or uncertainty is in this group?"
Entropy Formula:
Entropy(S) = - Σ pi * log2(pi), summed over all c classes
Where:
S = dataset
c = number of classes
pi = proportion of class i
✅ Example:
Suppose you have 10 samples:
6 are "Yes" (positive)
4 are "No" (negative)
Then,
Entropy = -(0.6 * log2(0.6) + 0.4 * log2(0.4)) ≈ 0.971
So the entropy is 0.971, which means the data is somewhat impure.
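The same calculation in plain Python, using only the standard math module:

```python
import math

def entropy(proportions):
    """Shannon entropy in bits: -sum(p * log2(p)) over non-zero class proportions."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(round(entropy([0.6, 0.4]), 3))  # 6 "Yes", 4 "No" -> 0.971
```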
What is Information Gain?
Information Gain tells us how much entropy is reduced after we split the data on a particular feature.
"How much better did we make our dataset by splitting it on this feature?"
Formula:
IG(S, A) = Entropy(S) - Σ (|Sv| / |S|) * Entropy(Sv)
Here the sum runs over every value v of feature A, Sv is the subset of data with that value, and |Sv| / |S| is its share of the samples.
It calculates the reduction in entropy.
The higher the Information Gain, the better that feature is for splitting.
Example:
Suppose we want to decide whether to play outside based on the weather (Sunny, Rainy). We have:
Parent set:
6 Yes
4 No
Entropy = 0.971 (from earlier)
Split on “Weather”:
Sunny (5 samples) → 4 Yes, 1 No; Entropy ≈ 0.722
Rainy (5 samples) → 2 Yes, 3 No; Entropy ≈ 0.971
Weighted average of child entropies:
Weighted entropy = (5/10 * 0.722) + (5/10 * 0.971) ≈ 0.847
Information Gain:
IG = 0.971 − 0.847 = 0.124
So, splitting on "Weather" reduces impurity by 0.124.
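Putting it together, here is a small sketch of the same Weather split (the entropy helper is redefined so the snippet runs on its own; the counts match the example above):

```python
import math

def entropy(proportions):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

parent = entropy([6/10, 4/10])              # 6 Yes, 4 No       -> ~0.971

sunny = entropy([4/5, 1/5])                 # 4 Yes, 1 No       -> ~0.722
rainy = entropy([2/5, 3/5])                 # 2 Yes, 3 No       -> ~0.971
weighted = (5/10) * sunny + (5/10) * rainy  # weighted children -> ~0.847

info_gain = parent - weighted               # reduction in entropy
print(round(info_gain, 3))                  # prints 0.125; the 0.124 above comes from rounding the intermediate values
```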
Hands on
Click here to access the dataset
Click here to access the working code
TIP: If you want to see how to run the code, or where this code is written and executed, here is a short video to help: https://www.youtube.com/watch?v=RLYoEyIHL6A
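If you can't open the links right now, here is a minimal stand-in sketch using scikit-learn's built-in Iris dataset. The linked notebook and dataset may differ; this only shows the general train/test/evaluate flow:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Load a small sample dataset (a stand-in for the linked dataset)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train a Decision Tree – no feature scaling needed, as discussed above
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Evaluate with a confusion matrix and classification report
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```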
Learning Recap
Today we learned:
What Decision Trees are and how they work
Why they are intuitive and visual
How to train and test a Decision Tree model
How to interpret its accuracy using confusion matrix and reports
What’s Coming Next?
Next up – Day 5: K-Nearest Neighbors (KNN). A model that learns by “looking around” – quite literally!
Tip: Try changing the tree depth (the max_depth parameter) and see how the accuracy changes. A deeper tree may fit better but can overfit – watch out!
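For example, a quick loop over a few depths (using the same stand-in scikit-learn setup as in the Hands on sketch):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for depth in [1, 2, 3, 5, 10]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    print(f"max_depth={depth}: train acc={tree.score(X_train, y_train):.2f}, "
          f"test acc={tree.score(X_test, y_test):.2f}")
```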
Share this with a friend who's curious about how machines make decisions!
Repost to your network — let’s build a powerful ML community together. #MachineLearning #AI #Python #DataScience #SaileshWrites #MLCommunity #30DaysChallenge #TechLearning #GrowEveryday