Decision Trees &
Dimensionality Reduction: PCA
Maruthi Sai Sasank Thunuguntla
AP22110011297
CSE - S
March 30, 2025
SRM University - AP
Overview
• Introduction to Decision Trees
• Importance of Decision Trees
• Solved Example
• Dimensionality Reduction and Techniques
• PCA and Solved Example.
• Objectives of this presentation:
• Understand the working of Decision Trees.
• Learn about PCA and its role in dimensionality reduction.
• Comparison of PCA & Decision Trees
Decision Trees
Introduction: What is a Decision Tree?
• A supervised learning algorithm used for classification and
regression. It predicts the class label or value of a target
variable by learning simple decision rules from the data.
• Structure:
• Root Node: Starting point.
• Decision Nodes: Splitting points.
• Leaf Nodes: Final outcomes/decision.
• Example: Predicting weather-based activities.
Importance of Decision Trees
• Interpretability: Easy to understand and visualize, making
them useful for explaining model decisions.
• Versatility: Can handle both classification and regression
problems.
• Feature Selection: Inherently selects important features by
choosing the best splits.
• Non-linearity: Captures complex decision boundaries without
requiring feature transformations.
• Handles Missing Values: Can work with datasets that have
missing values.
• Low Computational Cost: Compared to other models,
decision trees are relatively efficient to train and evaluate.
• Foundation for Advanced Models: Used in ensemble
methods like Random Forest and Gradient Boosting.
How Do Decision Trees Work?
Steps:
1. Start at the root node.
2. Split data based on features (e.g., temperature, humidity).
3. Continue splitting until reaching leaf nodes.
Key Concepts:
• Entropy and Information Gain.
• Algorithms: ID3, C4.5, CART.
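To make entropy and information gain concrete before the worked example, here is a minimal Python sketch (not part of the original slides; the function names are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on the given feature column."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted
```

ID3 simply evaluates `information_gain` for every candidate feature and splits on the one with the largest value, recursing on each branch.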
Example: Decision Tree Using ID3 Algorithm
We solve the following example with the ID3 algorithm.
Dataset:
Outlook Temperature Windy Play?
Sunny Hot No No
Sunny Hot Yes No
Overcast Hot No Yes
Rainy Mild No Yes
Rainy Cool No Yes
Rainy Cool Yes No
Overcast Cool Yes Yes
Sunny Mild No Yes
Entropy Calculation
Entropy Formula:
Entropy(S) = -\sum_i p_i \log_2 p_i \quad (1)
Compute Entropy for Play? (Target Variable)
• Total samples = 8
• Positive (Yes) = 4, Negative (No) = 4
Entropy of Play:
Entropy(S) = -\left( \frac{4}{8} \log_2 \frac{4}{8} + \frac{4}{8} \log_2 \frac{4}{8} \right) \quad (2)
= -(0.5 \times -1 + 0.5 \times -1) = 1.0 \quad (3)
So the entropy of the target variable Play is Entropy(S) = 1.
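As a quick numerical check (my own addition, not from the slides), the entropy values used throughout this example can be reproduced directly:

```python
from math import log2

def H(*probs):
    # Shannon entropy in bits; the probabilities should sum to 1
    return -sum(p * log2(p) for p in probs if p > 0)

print(H(4/8, 4/8))  # 1.0    -> entropy of Play
print(H(1/3, 2/3))  # 0.918  -> the Sunny, Rainy, Hot, Cool subsets below
print(H(1/5, 4/5))  # 0.722  -> the Windy = No subset (rounded to 0.721 below)
```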
Information Gain Calculation
Information Gain Formula:
IG(S, A) = Entropy(S) - \sum_{v} \frac{|S_v|}{|S|} \, Entropy(S_v) \quad (4)
Compute Information Gains for the features
Splitting by Outlook:
• S_{Sunny} = {No, No, Yes} ⇒ Entropy = 0.918
• S_{Overcast} = {Yes, Yes} ⇒ Entropy = 0.0
• S_{Rainy} = {Yes, Yes, No} ⇒ Entropy = 0.918
1) Information Gain for Outlook:
IG(S, Outlook) = 1 - \left( \frac{3}{8} \times 0.918 + \frac{2}{8} \times 0 + \frac{3}{8} \times 0.918 \right) \quad (5)
= 1 - (0.344 + 0 + 0.344) = 0.311 \quad (6)
Information Gain Calculation
2) Information Gain for Temperature:
• S_{Hot} = {No, No, Yes} ⇒ Entropy = 0.918
• S_{Mild} = {Yes, Yes} ⇒ Entropy = 0.0
• S_{Cool} = {Yes, No, Yes} ⇒ Entropy = 0.918
IG(S, Temperature) = 1 - \left( \frac{3}{8} \times 0.918 + \frac{2}{8} \times 0 + \frac{3}{8} \times 0.918 \right) \quad (7)
= 1 - (0.344 + 0 + 0.344) = 0.311 \quad (8)
3) Information Gain for Windy:
• S_{Windy=Yes} = {No, No, Yes} ⇒ Entropy = 0.918
• S_{Windy=No} = {No, Yes, Yes, Yes, Yes} ⇒ Entropy = 0.721
IG(S, Windy) = 1 - \left( \frac{3}{8} \times 0.918 + \frac{5}{8} \times 0.721 \right) \quad (9)
= 1 - (0.344 + 0.451) = 0.205 \quad (10)
Information Gain Calculation
Overall Information Gains:
- Outlook = 0.311
- Temperature = 0.311
- Windy = 0.205
Outlook and Temperature tie for the highest gain; we break the tie by choosing Outlook as the root node and proceed.
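These gains can be reproduced from the 8-row dataset with a short, self-contained Python sketch (my own, not from the slides; names are illustrative):

```python
from collections import Counter
from math import log2

data = [  # (Outlook, Temperature, Windy, Play)
    ("Sunny", "Hot", "No", "No"),      ("Sunny", "Hot", "Yes", "No"),
    ("Overcast", "Hot", "No", "Yes"),  ("Rainy", "Mild", "No", "Yes"),
    ("Rainy", "Cool", "No", "Yes"),    ("Rainy", "Cool", "Yes", "No"),
    ("Overcast", "Cool", "Yes", "Yes"), ("Sunny", "Mild", "No", "Yes"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(col):
    labels = [row[-1] for row in data]
    groups = {}
    for row in data:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

for col, name in enumerate(["Outlook", "Temperature", "Windy"]):
    # Outlook 0.311, Temperature 0.311, Windy 0.204
    # (the slide gets 0.205 for Windy because it rounds intermediate values)
    print(name, round(info_gain(col), 3))
```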
Outlook (root node) branches:
• Overcast → Yes (pure leaf).
• Sunny → compute the information gain of Temperature and Windy within the Outlook = Sunny subset; Entropy(S_{Sunny}) = 0.918.
• Rainy → handled the same way on the next slide; Entropy(S_{Rainy}) = 0.918.
Further Splitting of Sunny and Rainy
1) Splitting the Sunny Branch (Temperature)
• S_{Hot} = {No, No} ⇒ Entropy = 0.0 (pure node)
• S_{Mild} = {Yes} ⇒ Entropy = 0.0 (pure node)
IG(S_{Sunny}, Temperature) = 0.918 - \left( \frac{2}{3} \times 0 + \frac{1}{3} \times 0 \right) = 0.918 \quad (11)
Decision: Since IG = 0.918, we split by Temperature.
2) Splitting the Rainy Branch (Windy)
• S_{Windy=No} = {Yes, Yes} ⇒ Entropy = 0.0 (pure node)
• S_{Windy=Yes} = {No} ⇒ Entropy = 0.0 (pure node)
IG(S_{Rainy}, Windy) = 0.918 - \left( \frac{2}{3} \times 0 + \frac{1}{3} \times 0 \right) = 0.918 \quad (12)
Decision: Since IG = 0.918, we split by Windy.
Final Decision Tree
Constructed Decision Tree:
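The slides show the constructed tree as a figure; the same structure, as implied by the splits derived above, can also be written as a small illustrative predict function (my own sketch, not from the slides):

```python
def predict_play(outlook, temperature, windy):
    # Root split: Outlook (highest information gain)
    if outlook == "Overcast":
        return "Yes"                                        # pure leaf
    if outlook == "Sunny":
        return "Yes" if temperature == "Mild" else "No"     # split on Temperature
    if outlook == "Rainy":
        return "Yes" if windy == "No" else "No"             # split on Windy
    raise ValueError("unknown Outlook value")

print(predict_play("Sunny", "Hot", "No"))   # "No", matching row 1 of the dataset
```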
Pros and Cons of Decision Trees
Advantages:
• Easy to interpret and visualize
• Handles numerical and categorical data
• Performs well with small datasets
• Feature selection is automatic
Disadvantages:
• Prone to overfitting
• Sensitive to small changes in data
• Less effective with complex relationships
Dimensionality Reduction - PCA
What is Dimensionality Reduction?
Definition:
• A technique used to reduce the number of features (variables)
in a dataset while preserving its key information.
• This is achieved by transforming data from a high-dimensional
space to a lower-dimensional space, simplifying models and
improving performance.
Why is it Needed?
• Avoids the Curse of Dimensionality: Too many features
can lead to overfitting and increased computation time.
• Improves Model Performance: Reduces noise and
redundant data, making the model more efficient.
• Simplification and Interpretability: Easier to understand
and visualize, which is crucial for gaining insight into the data.
Dimensionality Reduction : Common Techniques
Common Techniques:
• PCA (Principal Component Analysis): Transforms data
into new variables (principal components) that capture
maximum variance.
• LDA (Linear Discriminant Analysis): Focuses on
maximizing class separability.
• Singular Value Decomposition: SVD is a linear algebra
technique that decomposes a matrix (A) into three matrices:
U, Σ, and Vᵀ, such that A = U Σ Vᵀ.
• t-SNE (t-Distributed Stochastic Neighbor Embedding):
Used for visualizing high-dimensional data in 2D or 3D.
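For reference, each of these techniques is available in scikit-learn; a minimal usage sketch on toy random data (illustrative only, with assumed toy shapes and parameters):

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # 100 samples, 10 features (stand-in data)
y = rng.integers(0, 2, size=100)    # binary labels, needed only for LDA

X_pca = PCA(n_components=2).fit_transform(X)                              # max-variance directions
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)    # class separability
X_svd = TruncatedSVD(n_components=2).fit_transform(X)                     # SVD-based reduction
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)             # non-linear, for visualization
```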
What is PCA?
Principal Component Analysis (PCA) is a statistical technique that
transforms a set of possibly correlated variables into a smaller
number of uncorrelated variables called principal components,
which are linear combinations of the original variables.
How Does PCA Work?
• Identifies patterns in data and finds the directions (principal
components) that capture the most variance.
• Transforms the original data into a smaller set of new
variables while preserving key trends.
Why Use PCA?
• Reduces complexity and computation time.
• Helps visualize high-dimensional data in 2D or 3D.
• Removes noise and redundant features for better model performance.
Steps in PCA
Key Idea: Maximize variance in lower-dimensional space.
Applications: Image compression, preprocessing for ML models.
Steps in PCA:
1. Compute the mean of each feature and center the data (centered matrix).
2. Compute the Covariance Matrix
3. Calculate Eigenvalues and Eigenvectors
4. Sort Eigenvalues and Select Principal Components
5. Construct the Projection Matrix
6. Transform the Original Data (Projection onto Principal
Components)
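These steps map directly onto a few lines of NumPy; a minimal sketch of the procedure (my own, not from the slides):

```python
import numpy as np

def pca(X, n_components):
    # 1-2. Center the data and compute the covariance matrix
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)            # uses the 1/(n-1) convention
    # 3-4. Eigen-decomposition, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]
    # 5. Projection matrix from the top components
    W = eigvecs[:, order[:n_components]]
    # 6. Project the centered data onto the principal components
    return Xc @ W, eigvals[order]
```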
PCA Example: Step 1 - Compute Mean
Given Data:
X = \begin{bmatrix} 2 & 3 \\ 3 & 5 \\ 5 & 8 \\ 7 & 10 \end{bmatrix}
Step 1: Compute the mean of each column
\mu_x = \frac{2 + 3 + 5 + 7}{4} = 4.25, \qquad \mu_y = \frac{3 + 5 + 8 + 10}{4} = 6.5
Interpretation: The mean values represent the central tendency
of each feature (dimension). We will subtract these means to
center the data.
PCA Example: Step 2 - Center Data
Step 2: Center the data (subtract the mean)
X_{centered} = X - \mu
X_{centered} = \begin{bmatrix} 2-4.25 & 3-6.5 \\ 3-4.25 & 5-6.5 \\ 5-4.25 & 8-6.5 \\ 7-4.25 & 10-6.5 \end{bmatrix} = \begin{bmatrix} -2.25 & -3.5 \\ -1.25 & -1.5 \\ 0.75 & 1.5 \\ 2.75 & 3.5 \end{bmatrix}
Interpretation: The data is now centered at the origin, meaning
each feature has a mean of zero.
PCA Example: Step 3 - Compute Covariance Matrix
Step 3: Compute the Covariance Matrix
The covariance matrix is calculated using:
\Sigma = \frac{1}{n-1} X_{centered}^{T} X_{centered}
Expanding:
\Sigma = \frac{1}{3} \begin{bmatrix} -2.25 & -1.25 & 0.75 & 2.75 \\ -3.5 & -1.5 & 1.5 & 3.5 \end{bmatrix} \begin{bmatrix} -2.25 & -3.5 \\ -1.25 & -1.5 \\ 0.75 & 1.5 \\ 2.75 & 3.5 \end{bmatrix}
\Sigma = \begin{bmatrix} 4.916 & 6.83 \\ 6.83 & 9.67 \end{bmatrix}
Interpretation: The covariance matrix shows how the features are
correlated.
PCA Example: Step 4 - Compute Eigenvalues & Eigenvectors
Step 4: Compute Eigenvalues and Eigenvectors
Solve for eigenvalues (λ) using:
\det(\Sigma - \lambda I) = 0
\begin{vmatrix} 4.916 - \lambda & 6.83 \\ 6.83 & 9.67 - \lambda \end{vmatrix} = 0
Expanding:
(4.916 - \lambda)(9.67 - \lambda) - (6.83)^2 = 0
Solving, we obtain the eigenvalues:
\lambda_1 = 14.52, \qquad \lambda_2 = 0.06
PCA Example: Step 4
λ_1 is the larger eigenvalue. The corresponding eigenvectors are:
v_1 = \begin{bmatrix} -0.57 \\ -0.81 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -0.81 \\ 0.57 \end{bmatrix}
• v_1 corresponds to λ_1 and v_2 corresponds to λ_2.
• The sign of an eigenvector is arbitrary, so v_1 may equally be written as (0.57, 0.81); that form is used in the projection step.
Interpretation: The eigenvector with the largest eigenvalue
represents the principal direction of variance.
PCA Example: Step 5 - Project Data Onto Principal Component
Step 5: Project data onto the principal component
X_{projected} = X_{centered} \cdot v_1
Using v_1 \approx (0.57, 0.81), the sign-flipped form of the eigenvector above:
X_{projected} = \begin{bmatrix} -2.25 & -3.5 \\ -1.25 & -1.5 \\ 0.75 & 1.5 \\ 2.75 & 3.5 \end{bmatrix} \begin{bmatrix} 0.57 \\ 0.81 \end{bmatrix} = \begin{bmatrix} -4.12 \\ -1.93 \\ 1.64 \\ 4.40 \end{bmatrix}
Interpretation: The 2D data is now transformed into a 1D
representation.
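The worked example can be verified numerically; a quick check (not part of the slides) whose output differs only slightly from the hand-rounded values above:

```python
import numpy as np

X = np.array([[2, 3], [3, 5], [5, 8], [7, 10]], dtype=float)
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)                         # approx [0.06, 14.53]
v1 = eigvecs[:, np.argmax(eigvals)]    # approx [0.58, 0.82] (sign may be flipped)
print(Xc @ v1)                         # approx [-4.16, -1.95, 1.66, 4.45], up to an overall sign
```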
PCA Example: Projection Visualisation
• The principal component with the highest eigenvalue captures
the most variance.
• This transformation enables visualization, noise reduction, and
improved model performance.
PCA Example : Summary
Summary of PCA Example:
• Original data was in 2D.
• PCA reduced it to 1D while preserving variance.
• The first principal component captured the most important
information.
• This technique is widely used in dimensionality reduction and
data compression.
PCA vs. Decision Tree
Aspect               PCA                                             Decision Tree
Learning Type        Unsupervised                                    Supervised
Purpose              Dimensionality reduction                        Classification, regression
Approach             Transforms features into principal components   Splits data based on feature conditions
Feature Selection    Creates new features                            Selects important features
Interpretability     Hard to interpret                               Easy to interpret
Usage                Preprocessing, noise reduction                  Decision-making tasks
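The two methods are complementary rather than competing; a hedged scikit-learn sketch showing PCA used as a preprocessing step for a decision tree (toy Iris data, illustrative parameters):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features, 3 classes

# Decision tree on the raw features (supervised end-to-end)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# PCA (unsupervised) reduces 4 features to 2 components before the tree
pca_tree = make_pipeline(PCA(n_components=2),
                         DecisionTreeClassifier(max_depth=3, random_state=0))

print(cross_val_score(tree, X, y, cv=5).mean())      # mean CV accuracy on raw features
print(cross_val_score(pca_tree, X, y, cv=5).mean())  # mean CV accuracy after PCA preprocessing
```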
Conclusion
• Decision Trees are intuitive tools for classification/regression
tasks.
• Dimensionality reduction plays a crucial role in improving
computational efficiency, reducing noise, and enhancing model
performance.
• PCA reduces dimensions while retaining key information.
• Both are essential in machine learning workflows.
Thank you