Day 4 of 30 – Decision Trees in Machine Learning
#MachineLearning #AI #MLRoadmap #30DaysOfML #LearningTogether #DecisionTree #SupervisedLearning #PythonML #HandsOnML #GrowEveryday #SaileshWrites
What is a Decision Tree?
A Decision Tree is like a flowchart. Imagine asking a series of yes/no questions to arrive at a decision – like a game of “20 Questions”. Decision Trees follow a similar logic.
Each internal node represents a question (decision), each branch represents the outcome, and each leaf node represents the final result (prediction).
It's like asking:
Is the weather sunny?
  Yes → Is it hot?
    Yes → Stay indoors
    No → Go outside
  No → Carry an umbrella
This kind of structure is great for both classification (Yes/No, True/False, categories) and regression (predicting numbers).
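In code terms, that flowchart is just a chain of nested if/else checks. Here is a tiny sketch (the function name and weather values are made up purely for illustration):

```python
def what_to_do(weather: str, hot: bool) -> str:
    """Toy decision tree mirroring the flowchart above (illustration only)."""
    if weather == "sunny":        # root question: Is the weather sunny?
        if hot:                   # next question on the Yes branch: Is it hot?
            return "Stay indoors"
        return "Go outside"
    return "Carry an umbrella"    # the No branch goes straight to a leaf

print(what_to_do("sunny", hot=False))  # -> Go outside
```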
No Feature Scaling Needed!
Unlike logistic or linear regression, Decision Trees are not affected by the scale or range of your features, so you don't need to scale them. That's one less step!
Advantages
Easy to understand
Inbuilt feature selection
Requires little data preprocessing and preparation
Works for both classification and regression
No need for scaling
Performs well with large datasets
Disadvantages
Can overfit if not pruned
Sensitive to small data changes
Imbalanced datasets can create problems (the tree can become biased toward the majority class)
Decision Tree Metrics (Split Criteria)
Gini Impurity
Entropy
Information Gain
What is Gini Impurity?
Gini Impurity is one of the popular metrics used to decide how to split a node in a Decision Tree. It helps us measure how “pure” or “impure” a node is. In other words, it tells us how mixed up the classes are in a group of data.
Imagine This:
Let’s say you have a basket of fruits — apples and oranges.
If your basket contains only apples, it is pure.
If it contains 50% apples and 50% oranges, it is impure.
The Gini Impurity measures this impurity.
Gini Formula
The formula for Gini Impurity is:
Gini = 1 - (p1^2 + p2^2 + ... + pn^2)
Where:
pi is the probability (or proportion) of class i in the node.
Example 1: Pure Node (All Apples)
Suppose:
You have 10 fruits — all apples.
So, the probability of apple is p1 = 1, and of orange is p2 = 0
Gini = 1 - (1^2 + 0^2) = 1 - (1 + 0) = 0
A Gini score of 0 means the node is completely pure.
Example 2: Half Apples, Half Oranges
Suppose:
You have 10 fruits — 5 apples, 5 oranges.
So, p1 = 0.5; p2 = 0.5
Gini = 1 - (0.5^2 + 0.5^2) = 1 - (0.25 + 0.25) = 0.5
A Gini score of 0.5 means the node is as impure as it can get for two classes (completely mixed up).
Example 3: 80% Apples, 20% Oranges
So, p1 = 0.8; p2 = 0.2
Gini = 1 - (0.8^2 + 0.2^2) = 1 - (0.64 + 0.04) = 1 - 0.68 = 0.32
A Gini score of 0.32 — better than 0.5 — means this node is less impure.
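If you want to check these numbers yourself, here is a minimal helper in plain Python (no libraries needed):

```python
def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1 - sum(p ** 2 for p in proportions)

print(gini([1.0, 0.0]))  # all apples                -> 0.0
print(gini([0.5, 0.5]))  # half apples, half oranges -> 0.5
print(gini([0.8, 0.2]))  # 80% apples, 20% oranges   -> ~0.32
```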
Gini in Decision Tree
The Decision Tree algorithm chooses the split that gives us:
Lower (weighted) Gini Impurity after the split
Because lower Gini means the resulting groups are purer (more "certain" class predictions)
What is Entropy?
Entropy is a metric that tells us how pure or impure a dataset is. It comes from information theory and is used in decision trees to decide which attribute to split on.
If all elements in a dataset belong to the same class → Entropy = 0 (pure)
If the data is split evenly between two classes → Entropy = 1 (maximally impure)
Think of it like:
"How much disorder or uncertainty is in this group?"
Entropy Formula:
Entropy(S) = - Σ pi * log2(pi), summed over all c classes
Where:
S = dataset
c = number of classes
pi = proportion of class i
✅ Example:
Suppose you have 10 samples:
6 are "Yes" (positive)
4 are "No" (negative)
Then,
Entropy = -(0.6 * log2(0.6) + 0.4 * log2(0.4)) ≈ 0.971
So the entropy is 0.971, which means the data is somewhat impure.
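The same calculation in plain Python, using only the standard math module:

```python
import math

def entropy(proportions):
    """Shannon entropy in bits: -sum(p * log2(p)) over non-zero class proportions."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(round(entropy([0.6, 0.4]), 3))  # 6 "Yes", 4 "No" -> 0.971
```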
What is Information Gain?
Information Gain tells us how much entropy is reduced after we split the data on a particular feature.
"How much better did we make our dataset by splitting it on this feature?"
Formula:
IG(S, A) = Entropy(S) - Σ (|Sv| / |S|) * Entropy(Sv)
Here the sum runs over every value v of feature A, Sv is the subset of data with that value, and |Sv| / |S| is its share of the samples.
It calculates the reduction in entropy.
The higher the Information Gain, the better that feature is for splitting.
Example:
Suppose we want to decide whether to play outside based on the weather (Sunny, Rainy). We have:
Parent set:
6 Yes
4 No
Entropy = 0.971 (from earlier)
Split on “Weather”:
Sunny (5 samples) → 4 Yes, 1 No; Entropy ≈ 0.722
Rainy (5 samples) → 2 Yes, 3 No; Entropy ≈ 0.971
Weighted average of child entropies:
Weighted entropy = (5/10 * 0.722) + (5/10 * 0.971) ≈ 0.847
Information Gain:
IG = 0.971 − 0.847 = 0.124
So, splitting on "Weather" reduces impurity by 0.124.
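Putting it together, here is a small sketch of the same Weather split (the entropy helper is redefined so the snippet runs on its own; the counts match the example above):

```python
import math

def entropy(proportions):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

parent = entropy([6/10, 4/10])              # 6 Yes, 4 No       -> ~0.971

sunny = entropy([4/5, 1/5])                 # 4 Yes, 1 No       -> ~0.722
rainy = entropy([2/5, 3/5])                 # 2 Yes, 3 No       -> ~0.971
weighted = (5/10) * sunny + (5/10) * rainy  # weighted children -> ~0.847

info_gain = parent - weighted               # reduction in entropy
print(round(info_gain, 3))                  # prints 0.125; the 0.124 above comes from rounding the intermediate values
```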
Hands on
Click here to access the dataset
Click here to access the working code
TIP: If you want to see how to run the code, or where this code is written and executed, here is a short video to help: https://www.youtube.com/watch?v=RLYoEyIHL6A
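If you can't open the links right now, here is a minimal stand-in sketch using scikit-learn's built-in Iris dataset. The linked notebook and dataset may differ; this only shows the general train/test/evaluate flow:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Load a small sample dataset (a stand-in for the linked dataset)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train a Decision Tree – no feature scaling needed, as discussed above
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Evaluate with a confusion matrix and classification report
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```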
Learning Recap
Today we learned:
What Decision Trees are and how they work
Why they are intuitive and visual
How to train and test a Decision Tree model
How to interpret its accuracy using confusion matrix and reports
What’s Coming Next?
Next up – Day 5: K-Nearest Neighbors (KNN). A model that learns by “looking around” – quite literally!
Tip: Try changing the tree depth (the max_depth parameter) and see how the accuracy changes. A deeper tree may fit better but can overfit – watch out!
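For example, a quick loop over a few depths (using the same stand-in scikit-learn setup as in the Hands on sketch):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for depth in [1, 2, 3, 5, 10]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    print(f"max_depth={depth}: train acc={tree.score(X_train, y_train):.2f}, "
          f"test acc={tree.score(X_test, y_test):.2f}")
```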
Share this with a friend who's curious about how machines make decisions!
Repost to your network — let’s build a powerful ML community together. #MachineLearning #AI #Python #DataScience #SaileshWrites #MLCommunity #30DaysChallenge #TechLearning #GrowEveryday