Decision Trees &
Dimensionality Reduction: PCA
Maruthi Sai Sasank Thunuguntla
AP22110011297
CSE - S
March 30, 2025
SRM University - AP
Overview
• Introduction to Decision Trees
• Importance of Decision Trees
• Solved Example
• Dimensionality Reduction and Techniques
• PCA and Solved Example.
• Objectives of this presentation:
• Understand the working of Decision Trees.
• Learn about PCA and its role in dimensionality reduction.
• Comparison of PCA & Decision Trees
Decision Trees
Introduction: What is a Decision Tree?
• A supervised learning algorithm used for classification and
regression. It predicts the class label or value of a target
variable by learning simple decision rules from the data.
• Structure:
• Root Node: Starting point.
• Decision Nodes: Splitting points.
• Leaf Nodes: Final outcomes/decision.
• Example: Predicting weather-based activities.
Importance of Decision Trees
• Interpretability: Easy to understand and visualize, making
them useful for explaining model decisions.
• Versatility: Can handle both classification and regression
problems.
• Feature Selection: Inherently selects important features by
choosing the best splits.
• Non-linearity: Captures complex decision boundaries without
requiring feature transformations.
• Handles Missing Values: Can work with datasets that have
missing values.
• Low Computational Cost: Compared to other models,
decision trees are relatively efficient to train and evaluate.
• Foundation for Advanced Models: Used in ensemble
methods like Random Forest and Gradient Boosting.
How Do Decision Trees Work?
Steps:
1. Start at the root node.
2. Split data based on features (e.g., temperature, humidity).
3. Continue splitting until reaching leaf nodes.
Key Concepts:
• Entropy and Information Gain.
• Algorithms: ID3, C4.5, CART.
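To make entropy and information gain concrete before the worked example, here is a minimal Python sketch (not part of the original slides; the function names are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on the given feature column."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted
```

ID3 simply evaluates `information_gain` for every candidate feature and splits on the one with the largest value, recursing on each branch.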
Example: Decision Tree Using ID3 Algorithm
We solve the following example with the ID3 algorithm.
Dataset:
Outlook Temperature Windy Play?
Sunny Hot No No
Sunny Hot Yes No
Overcast Hot No Yes
Rainy Mild No Yes
Rainy Cool No Yes
Rainy Cool Yes No
Overcast Cool Yes Yes
Sunny Mild No Yes
Entropy Calculation
Entropy Formula:
Entropy(S) = -\sum_i p_i \log_2 p_i \quad (1)
Compute Entropy for Play? (Target Variable)
• Total samples = 8
• Positive (Yes) = 4, Negative (No) = 4
Entropy of Play:
Entropy(S) = -\left( \frac{4}{8} \log_2 \frac{4}{8} + \frac{4}{8} \log_2 \frac{4}{8} \right) \quad (2)
= -(0.5 \times -1 + 0.5 \times -1) = 1.0 \quad (3)
So the entropy of the target variable Play is Entropy(S) = 1.
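As a quick numerical check (my own addition, not from the slides), the entropy values used throughout this example can be reproduced directly:

```python
from math import log2

def H(*probs):
    # Shannon entropy in bits; the probabilities should sum to 1
    return -sum(p * log2(p) for p in probs if p > 0)

print(H(4/8, 4/8))  # 1.0    -> entropy of Play
print(H(1/3, 2/3))  # 0.918  -> the Sunny, Rainy, Hot, Cool subsets below
print(H(1/5, 4/5))  # 0.722  -> the Windy = No subset (rounded to 0.721 below)
```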
Information Gain Calculation
Information Gain Formula:
IG(S, A) = Entropy(S) - \sum_{v} \frac{|S_v|}{|S|} \, Entropy(S_v) \quad (4)
Compute Information Gains for the features
Splitting by Outlook:
• S_{Sunny} = {No, No, Yes} ⇒ Entropy = 0.918
• S_{Overcast} = {Yes, Yes} ⇒ Entropy = 0.0
• S_{Rainy} = {Yes, Yes, No} ⇒ Entropy = 0.918
1) Information Gain for Outlook:
IG(S, Outlook) = 1 - \left( \frac{3}{8} \times 0.918 + \frac{2}{8} \times 0 + \frac{3}{8} \times 0.918 \right) \quad (5)
= 1 - (0.344 + 0 + 0.344) = 0.311 \quad (6)
Information Gain Calculation
2) Information Gain for Temperature:
• S_{Hot} = {No, No, Yes} ⇒ Entropy = 0.918
• S_{Mild} = {Yes, Yes} ⇒ Entropy = 0.0
• S_{Cool} = {Yes, No, Yes} ⇒ Entropy = 0.918
IG(S, Temperature) = 1 - \left( \frac{3}{8} \times 0.918 + \frac{2}{8} \times 0 + \frac{3}{8} \times 0.918 \right) \quad (7)
= 1 - (0.344 + 0 + 0.344) = 0.311 \quad (8)
3) Information Gain for Windy:
• S_{Windy=Yes} = {No, No, Yes} ⇒ Entropy = 0.918
• S_{Windy=No} = {No, Yes, Yes, Yes, Yes} ⇒ Entropy = 0.721
IG(S, Windy) = 1 - \left( \frac{3}{8} \times 0.918 + \frac{5}{8} \times 0.721 \right) \quad (9)
= 1 - (0.344 + 0.451) = 0.205 \quad (10)
Information Gain Calculation
Overall Information Gains:
- Outlook = 0.311
- Temperature = 0.311
- Windy = 0.205
Outlook and Temperature tie for the highest gain; we break the tie by choosing Outlook as the root node and proceed.
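These gains can be reproduced from the 8-row dataset with a short, self-contained Python sketch (my own, not from the slides; names are illustrative):

```python
from collections import Counter
from math import log2

data = [  # (Outlook, Temperature, Windy, Play)
    ("Sunny", "Hot", "No", "No"),      ("Sunny", "Hot", "Yes", "No"),
    ("Overcast", "Hot", "No", "Yes"),  ("Rainy", "Mild", "No", "Yes"),
    ("Rainy", "Cool", "No", "Yes"),    ("Rainy", "Cool", "Yes", "No"),
    ("Overcast", "Cool", "Yes", "Yes"), ("Sunny", "Mild", "No", "Yes"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(col):
    labels = [row[-1] for row in data]
    groups = {}
    for row in data:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

for col, name in enumerate(["Outlook", "Temperature", "Windy"]):
    # Outlook 0.311, Temperature 0.311, Windy 0.204
    # (the slide gets 0.205 for Windy because it rounds intermediate values)
    print(name, round(info_gain(col), 3))
```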
Outlook (root node) branches:
• Overcast → Yes (pure leaf).
• Sunny → compute the information gain of Temperature and Windy within the Outlook = Sunny subset; Entropy(S_{Sunny}) = 0.918.
• Rainy → handled the same way on the next slide; Entropy(S_{Rainy}) = 0.918.
Further Splitting of Sunny and Rainy
1) Splitting the Sunny Branch (Temperature)
• S_{Hot} = {No, No} ⇒ Entropy = 0.0 (pure node)
• S_{Mild} = {Yes} ⇒ Entropy = 0.0 (pure node)
IG(S_{Sunny}, Temperature) = 0.918 - \left( \frac{2}{3} \times 0 + \frac{1}{3} \times 0 \right) = 0.918 \quad (11)
Decision: Since IG = 0.918, we split by Temperature.
2) Splitting the Rainy Branch (Windy)
• S_{Windy=No} = {Yes, Yes} ⇒ Entropy = 0.0 (pure node)
• S_{Windy=Yes} = {No} ⇒ Entropy = 0.0 (pure node)
IG(S_{Rainy}, Windy) = 0.918 - \left( \frac{2}{3} \times 0 + \frac{1}{3} \times 0 \right) = 0.918 \quad (12)
Decision: Since IG = 0.918, we split by Windy.
Final Decision Tree
Constructed Decision Tree:
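The slides show the constructed tree as a figure; the same structure, as implied by the splits derived above, can also be written as a small illustrative predict function (my own sketch, not from the slides):

```python
def predict_play(outlook, temperature, windy):
    # Root split: Outlook (highest information gain)
    if outlook == "Overcast":
        return "Yes"                                        # pure leaf
    if outlook == "Sunny":
        return "Yes" if temperature == "Mild" else "No"     # split on Temperature
    if outlook == "Rainy":
        return "Yes" if windy == "No" else "No"             # split on Windy
    raise ValueError("unknown Outlook value")

print(predict_play("Sunny", "Hot", "No"))   # "No", matching row 1 of the dataset
```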
Pros and Cons of Decision Trees
Advantages:
• Easy to interpret and visualize
• Handles numerical and categorical data
• Performs well with small datasets
• Feature selection is automatic
Disadvantages:
• Prone to overfitting
• Sensitive to small changes in data
• Less effective with complex relationships
Dimensionality Reduction - PCA
What is Dimensionality Reduction?
Definition:
• A technique used to reduce the number of features (variables)
in a dataset while preserving its key information.
• This is achieved by transforming data from a high-dimensional
space to a lower-dimensional space, simplifying models and
improving performance.
Why is it Needed?
• Avoids the Curse of Dimensionality: Too many features
can lead to overfitting and increased computation time.
• Improves Model Performance: Reduces noise and
redundant data, making the model more efficient.
• Simplification and Interpretability: Easier to understand
and visualize, which is crucial for gaining insight into the data.
Dimensionality Reduction : Common Techniques
Common Techniques:
• PCA (Principal Component Analysis): Transforms data
into new variables (principal components) that capture
maximum variance.
• LDA (Linear Discriminant Analysis): Focuses on
maximizing class separability.
• Singular Value Decomposition: SVD is a linear algebra
technique that decomposes a matrix (A) into three matrices:
U, Σ, and Vᵀ, such that A = U Σ Vᵀ.
• t-SNE (t-Distributed Stochastic Neighbor Embedding):
Used for visualizing high-dimensional data in 2D or 3D.
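For reference, each of these techniques is available in scikit-learn; a minimal usage sketch on toy random data (illustrative only, with assumed toy shapes and parameters):

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # 100 samples, 10 features (stand-in data)
y = rng.integers(0, 2, size=100)    # binary labels, needed only for LDA

X_pca = PCA(n_components=2).fit_transform(X)                              # max-variance directions
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)    # class separability
X_svd = TruncatedSVD(n_components=2).fit_transform(X)                     # SVD-based reduction
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)             # non-linear, for visualization
```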
What is PCA?
Principal Component Analysis (PCA) is a statistical technique that
transforms a set of possibly correlated variables into a smaller
number of uncorrelated variables called principal components,
which are linear combinations of the original variables.
How Does PCA Work?
• Identifies patterns in data and finds the directions (principal
components) that capture the most variance.
• Transforms the original data into a smaller set of new
variables while preserving key trends.
Why Use PCA?
• Reduces complexity and computation time.
• Helps visualize high-dimensional data in 2D or 3D.
• Removes noise and redundant features for better model performance.
Steps in PCA
Key Idea: Maximize variance in lower-dimensional space.
Applications: Image compression, preprocessing for ML models.
Steps in PCA:
1. Compute the mean of each feature and center the data (centered matrix).
2. Compute the Covariance Matrix
3. Calculate Eigenvalues and Eigenvectors
4. Sort Eigenvalues and Select Principal Components
5. Construct the Projection Matrix
6. Transform the Original Data (Projection onto Principal
Components)
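These steps map directly onto a few lines of NumPy; a minimal sketch of the procedure (my own, not from the slides):

```python
import numpy as np

def pca(X, n_components):
    # 1-2. Center the data and compute the covariance matrix
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)            # uses the 1/(n-1) convention
    # 3-4. Eigen-decomposition, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]
    # 5. Projection matrix from the top components
    W = eigvecs[:, order[:n_components]]
    # 6. Project the centered data onto the principal components
    return Xc @ W, eigvals[order]
```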
PCA Example: Step 1 - Compute Mean
Given Data:
X = \begin{bmatrix} 2 & 3 \\ 3 & 5 \\ 5 & 8 \\ 7 & 10 \end{bmatrix}
Step 1: Compute the mean of each column
\mu_x = \frac{2 + 3 + 5 + 7}{4} = 4.25, \qquad \mu_y = \frac{3 + 5 + 8 + 10}{4} = 6.5
Interpretation: The mean values represent the central tendency
of each feature (dimension). We will subtract these means to
center the data.
PCA Example: Step 2 - Center Data
Step 2: Center the data (subtract the mean)
X_{centered} = X - \mu
X_{centered} = \begin{bmatrix} 2-4.25 & 3-6.5 \\ 3-4.25 & 5-6.5 \\ 5-4.25 & 8-6.5 \\ 7-4.25 & 10-6.5 \end{bmatrix} = \begin{bmatrix} -2.25 & -3.5 \\ -1.25 & -1.5 \\ 0.75 & 1.5 \\ 2.75 & 3.5 \end{bmatrix}
Interpretation: The data is now centered at the origin, meaning
each feature has a mean of zero.
PCA Example: Step 3 - Compute Covariance Matrix
Step 3: Compute the Covariance Matrix
The covariance matrix is calculated using:
\Sigma = \frac{1}{n-1} X_{centered}^{T} X_{centered}
Expanding:
\Sigma = \frac{1}{3} \begin{bmatrix} -2.25 & -1.25 & 0.75 & 2.75 \\ -3.5 & -1.5 & 1.5 & 3.5 \end{bmatrix} \begin{bmatrix} -2.25 & -3.5 \\ -1.25 & -1.5 \\ 0.75 & 1.5 \\ 2.75 & 3.5 \end{bmatrix}
\Sigma = \begin{bmatrix} 4.916 & 6.83 \\ 6.83 & 9.67 \end{bmatrix}
Interpretation: The covariance matrix shows how the features are
correlated.
PCA Example: Step 4 - Compute Eigenvalues & Eigenvectors
Step 4: Compute Eigenvalues and Eigenvectors
Solve for eigenvalues (λ) using:
\det(\Sigma - \lambda I) = 0
\begin{vmatrix} 4.916 - \lambda & 6.83 \\ 6.83 & 9.67 - \lambda \end{vmatrix} = 0
Expanding:
(4.916 - \lambda)(9.67 - \lambda) - (6.83)^2 = 0
Solving, we obtain the eigenvalues:
\lambda_1 = 14.52, \qquad \lambda_2 = 0.06
PCA Example: Step 4
λ_1 is the larger eigenvalue. The corresponding eigenvectors are:
v_1 = \begin{bmatrix} -0.57 \\ -0.81 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -0.81 \\ 0.57 \end{bmatrix}
• v_1 corresponds to λ_1 and v_2 corresponds to λ_2.
• The sign of an eigenvector is arbitrary, so v_1 may equally be written as (0.57, 0.81); that form is used in the projection step.
Interpretation: The eigenvector with the largest eigenvalue
represents the principal direction of variance.
PCA Example: Step 5 - Project Data Onto Principal Component
Step 5: Project data onto the principal component
X_{projected} = X_{centered} \cdot v_1
Using v_1 \approx (0.57, 0.81), the sign-flipped form of the eigenvector above:
X_{projected} = \begin{bmatrix} -2.25 & -3.5 \\ -1.25 & -1.5 \\ 0.75 & 1.5 \\ 2.75 & 3.5 \end{bmatrix} \begin{bmatrix} 0.57 \\ 0.81 \end{bmatrix} = \begin{bmatrix} -4.12 \\ -1.93 \\ 1.64 \\ 4.40 \end{bmatrix}
Interpretation: The 2D data is now transformed into a 1D
representation.
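The worked example can be verified numerically; a quick check (not part of the slides) whose output differs only slightly from the hand-rounded values above:

```python
import numpy as np

X = np.array([[2, 3], [3, 5], [5, 8], [7, 10]], dtype=float)
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)                         # approx [0.06, 14.53]
v1 = eigvecs[:, np.argmax(eigvals)]    # approx [0.58, 0.82] (sign may be flipped)
print(Xc @ v1)                         # approx [-4.16, -1.95, 1.66, 4.45], up to an overall sign
```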
PCA Example: Projection Visualisation
• The principal component with the highest eigenvalue captures
the most variance.
• This transformation enables visualization, noise reduction, and
improved model performance.
PCA Example : Summary
Summary of PCA Example:
• Original data was in 2D.
• PCA reduced it to 1D while preserving variance.
• The first principal component captured the most important
information.
• This technique is widely used in dimensionality reduction and
data compression.
PCA vs. Decision Tree
Aspect               PCA                                             Decision Tree
Learning Type        Unsupervised                                    Supervised
Purpose              Dimensionality reduction                        Classification, regression
Approach             Transforms features into principal components   Splits data based on feature conditions
Feature Selection    Creates new features                            Selects important features
Interpretability     Hard to interpret                               Easy to interpret
Usage                Preprocessing, noise reduction                  Decision-making tasks
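The two methods are complementary rather than competing; a hedged scikit-learn sketch showing PCA used as a preprocessing step for a decision tree (toy Iris data, illustrative parameters):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features, 3 classes

# Decision tree on the raw features (supervised end-to-end)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# PCA (unsupervised) reduces 4 features to 2 components before the tree
pca_tree = make_pipeline(PCA(n_components=2),
                         DecisionTreeClassifier(max_depth=3, random_state=0))

print(cross_val_score(tree, X, y, cv=5).mean())      # mean CV accuracy on raw features
print(cross_val_score(pca_tree, X, y, cv=5).mean())  # mean CV accuracy after PCA preprocessing
```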
Conclusion
• Decision Trees are intuitive tools for classification/regression
tasks.
• Dimensionality reduction plays a crucial role in improving
computational efficiency, reducing noise, and enhancing model
performance.
• PCA reduces dimensions while retaining key information.
• Both are essential in machine learning workflows.
Thank you