Dimensionality Reduction
• Dimensionality reduction is the process of reducing the number of random variables or
attributes under consideration.
• As the number of dimensions grows, the data become sparse and the distance between any two
independent points increases. Points therefore look less similar to one another, which increases
error in most machine learning and data mining techniques. Compensating would require feeding in
a very large number of data points, which is practically impossible in high dimensions, and
inefficient even when it is possible.
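To see this concentration-of-distances effect concretely, here is a minimal NumPy sketch (mine, not from the slides): it samples random points at increasing dimensionality and measures how close the nearest and farthest pairwise distances become.

```python
import numpy as np

rng = np.random.default_rng(0)

# For each dimensionality d, sample 100 random points and compare the
# spread of their pairwise distances: as d grows, the nearest and the
# farthest neighbour end up almost equally far away.
for d in [2, 10, 100, 1000]:
    X = rng.random((100, d))
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists = dists[np.triu_indices(100, k=1)]  # unique pairs only
    spread = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:4d}  relative spread of distances: {spread:.2f}")
```

The relative spread shrinks toward zero as d grows, which is why distance-based notions of similarity degrade in high dimensions.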
Techniques of dimensionality reduction
Dimensionality reduction is accomplished through either feature selection or feature
extraction.
Feature selection omits those features from the available measurements that do not
contribute to class separability; in other words, redundant and irrelevant features are
ignored.
Feature extraction, on the other hand, considers the whole information content and maps the
useful information into a lower-dimensional feature space.
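The distinction is easy to see in code. Below is a minimal scikit-learn sketch (not from the slides; it assumes the standard sklearn API and the bundled iris data): feature selection keeps a subset of the original columns, while feature extraction builds new columns from all of them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 original features

# Feature selection: keep the 2 original features that best separate
# the classes; the other columns are discarded outright.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: build 2 brand-new features (principal components)
# as linear combinations of all 4 original features.
X_ext = PCA(n_components=2).fit_transform(X)

print(X_sel.shape, X_ext.shape)  # (150, 2) (150, 2)
```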
Why Dimensionality Reduction is Important
• Dimensionality reduction brings many advantages to your machine learning data,
including:
• Fewer features mean less complexity
• You will need less storage space because you have less data
• Fewer features require less computation time
• Model accuracy can improve because there is less misleading data
• Algorithms train faster thanks to less data
• Reducing the data set’s feature dimensions makes the data easier to visualize
• It removes noise and redundant features
Dimensionality Reduction Techniques
• Here are some techniques machine learning professionals use.
• Principal Component Analysis (feature extraction): PCA extracts a new set of variables, called
“principal components,” from an existing, larger set.
• Backward Feature Elimination (feature selection).
• Forward Feature Selection (feature selection).
• Low Variance Filter (feature selection; see the sketch after this list).
• High Correlation Filter (feature selection; see the sketch after this list).
• Decision Trees (feature selection).
• Random Forest (feature selection).
• Factor Analysis (feature extraction).
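For illustration, here is a sketch (mine, not from the slides) of the two filter techniques above, written with pandas; the function names and thresholds are my own choices.

```python
import numpy as np
import pandas as pd

def low_variance_filter(df: pd.DataFrame, threshold: float = 0.01) -> pd.DataFrame:
    """Drop columns whose variance falls below `threshold` (near-constant features)."""
    return df.loc[:, df.var() > threshold]

def high_correlation_filter(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """For every pair of columns correlated above `threshold`, drop one of the pair."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

df = pd.DataFrame({"a": [1, 2, 3, 4],          # informative
                   "b": [1.1, 2.0, 3.2, 3.9],  # nearly duplicates "a"
                   "c": [5, 5, 5, 5]})         # constant
print(high_correlation_filter(low_variance_filter(df)).columns.tolist())  # ['a']
```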
How do you do a PCA?
1. Standardize the range of the continuous initial variables.
2. Compute the covariance matrix to identify correlations.
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the
principal components.
4. Create a feature vector to decide which principal components to keep.
5. Recast the data along the principal component axes.
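These five steps map directly onto a few lines of NumPy. A minimal sketch (mine, not from the slides; the `pca` function and its signature are illustrative):

```python
import numpy as np

def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    # 1. Standardize each variable to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Compute the covariance matrix of the standardized data.
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decompose it (eigh, since the matrix is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Feature vector: the eigenvectors of the largest eigenvalues, as columns.
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]
    # 5. Recast the data along the principal component axes.
    return X_std @ W
```

One detail worth knowing: scikit-learn’s PCA centers the data but does not scale it, so step 1 is usually performed separately (e.g. with StandardScaler).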
Exercise:
• Consider the two-dimensional patterns
(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
• Compute the principal component using the PCA algorithm.
Solution (the intervening slides show the computation as images): the mean vector of the six
points is x̄ = (4.5, 5). Centering each point about the mean and averaging the outer products
(dividing by n = 6) gives the covariance matrix C = [[2.92, 3.67], [3.67, 5.67]]. Solving the
characteristic equation det(C − λI) = 0 yields the eigenvalues.
Thus, the two eigenvalues are λ1 = 8.22 and λ2 = 0.38.
Clearly, the second eigenvalue is very small compared to the first, so the second
eigenvector can be left out.
The eigenvector corresponding to the greatest eigenvalue is the principal component for the
given data set.
So, we find the eigenvector corresponding to eigenvalue λ1.
Solving (C − λ1I)e = 0 gives the principal eigenvector; normalized to unit length,
e1 ≈ (0.57, 0.82).
We project the centered data points onto the new one-dimensional subspace as
yi = e1 · (xi − x̄).
The projected points are approximately −4.71, −0.85, −1.93, 1.11, 2.50, 3.89 (one scalar
coordinate per original point; see the check below).
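To double-check the worked example, here is a small NumPy sketch (mine, not from the slides). It uses the population covariance (dividing by n = 6), which reproduces the slide’s numbers; the leading eigenvalue comes out as 8.21 rather than 8.22 only because of rounding.

```python
import numpy as np

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)

centered = X - X.mean(axis=0)         # mean vector = (4.5, 5.0)
cov = centered.T @ centered / len(X)  # population covariance (divide by n = 6)
eigvals, eigvecs = np.linalg.eigh(cov)

print(np.round(cov, 2))            # [[2.92 3.67] [3.67 5.67]]
print(np.round(eigvals, 2))        # [0.38 8.21] -- eigh sorts ascending
e1 = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
print(np.round(e1, 2))             # ~[0.57 0.82] (overall sign is arbitrary)
print(np.round(centered @ e1, 2))  # projections, up to the same sign:
                                   # [-4.71 -0.85 -1.93 1.11 2.5 3.89]
```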
Apply PCA to the following dataset (given on the original slide as an image).