Feature Extraction
Principal Component Analysis (PCA)
Dimensionality Reduction
Curse of Dimensionality
 Increasing the number of features will
not always improve classification
accuracy.
 In practice, the inclusion of more
features might actually lead to worse
performance.
 The number of training examples required increases exponentially with the dimensionality D (i.e., k^D, where k is the number of bins per feature).
[Figure: with k = 3 bins per feature, the total number of bins is 3^1, 3^2, and 3^3 for D = 1, 2, and 3.]
Dimensionality Reduction
 What is the objective?
Choose an optimum set of
features d* of lower
dimensionality to improve
classification accuracy.
 Different methods can be used
to reduce dimensionality:
Feature extraction
Feature selection
Dimensionality Reduction (cont’d)
Feature extraction:
computes a new set of
features from the original
features through some
transformation f() .
x = [x1 x2 ... xD]^T  →  y = f(x) = [y1 y2 ... yK]^T, where K << D

Feature selection:
chooses a subset of the original features:
x = [x1 x2 ... xD]^T  →  [x_i1 x_i2 ... x_iK]^T, where K << D

f() could be linear or non-linear
Feature Extraction
 Linear transformations are particularly attractive because they are simpler to compute and analytically tractable.
 Given x ∈ R^D, find a K x D matrix T such that y = Tx ∈ R^K, where K << D.
This is a projection from D dimensions to K dimensions: each new feature yi is a linear combination of the original features x1, ..., xD.
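As a small illustration of such a linear projection (not from the slides; the matrix T and the sample x below are made up), the mapping y = Tx can be computed directly with NumPy:

```python
import numpy as np

# Hypothetical projection: D = 4 original features, K = 2 new features.
T = np.array([[0.5, 0.5, 0.0, 0.0],    # y1: average of x1 and x2
              [0.0, 0.0, 0.5, 0.5]])   # y2: average of x3 and x4

x = np.array([1.0, 3.0, 2.0, 4.0])     # a single D-dimensional sample

y = T @ x                              # y = Tx, a K-dimensional feature vector
print(y)                               # [2. 3.]
```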
Feature Extraction (cont’d)
 From a mathematical point of view, finding an
optimum mapping y=𝑓(x) can be formulated as an
optimization problem (i.e., minimize or maximize an
objective criterion).
 Commonly used objective criteria:
 Minimize Information Loss: projection in the lower-
dimensional space preserves as much information in the
data as possible.
 Maximize Discriminatory Information: projection in the
lower-dimensional space increases class separability.
Feature Extraction (cont’d)
 Popular linear feature extraction methods:
 Principal Components Analysis (PCA): Seeks a projection
that minimizes information loss.
 Linear Discriminant Analysis (LDA): Seeks a projection
that maximizes discriminatory information.
 Many other methods:
 Making features as independent as possible (Independent
Component Analysis).
 Retaining interesting directions (Projection Pursuit).
 Embedding to lower dimensional manifolds (Isomap,
Locally Linear Embedding).
Feature Extraction
Principal Component
Analysis (PCA)
Intuition behind PCA
Find the tallest person.
Person   Height
A        185
B        145
C        160
It is easy to see that person A is the tallest.
Intuition behind PCA
Find the tallest person.
Person   Height
A        173
B        172
C        174
It's tough when they are very similar in height.
What Is Principal Component Analysis?
 Principal component analysis (PCA) is a statistical technique for reducing the dimensionality of complex, high-volume datasets: it extracts the principal components that carry the most information and rejects noise and less important variation while preserving the crucial structure of the data.
Principal Component Analysis (PCA)
 PCA comes under the unsupervised machine learning category.
 Its main goal is to reduce the number of variables in a data collection while retaining as much information as feasible.
 PCA is mainly used for dimensionality reduction, and can also be used for selecting important features.
 It turns correlated features into uncorrelated (independent) features.
What Is Principal Component Analysis?
 PCA is a method to reduce the dimensionality
of enormous data collections.
 This approach transforms an extensive
collection of variables into a smaller group that
retains nearly all the data in the larger set.
 Lowering the number of variables in a data set inevitably reduces its precision.
What Is Principal Component Analysis?
 The purpose of dimension reduction is to forgo
a certain level of precision for simplification.
 Smaller data collections are easier to
investigate and visualize.
 This facilitates and accelerates the analysis of
data points by machine learning (ML)
algorithms.
What Is Principal Component Analysis?
 PCA is designed to limit the total quantity of
variables in a data set while maintaining as
many details as possible.
 The altered new features or PCA’s results
are known as principal components (PCs)
once PCA has been performed.
 The number of PCs is less than or equal to the number of original features in the dataset.
Characteristics of principal components:
1. Each principal component is a linear combination of the original features.
2. The components are orthogonal, which means there is no correlation between any pair of them.
3. The significance of the components diminishes from 1 to n: the first PC is the most important, while the nth PC is the least significant.
Basic Terminologies of PCA
 Variance – measures how the data is spread along each dimension of the feature space.
 Covariance – measures the dependency and relationship between pairs of features.
 Standardizing data – scaling the dataset so that every feature contributes equally, giving unbiased output.
Basic Terminologies of PCA
 Covariance matrix – captures the interdependencies between the features (variables); identifying and removing these redundancies helps improve performance.
Basic Terminologies of PCA
 EigenValues and EigenVectors – eigenvectors identify the directions of largest variance in the dataset, which are used to compute the principal components.
 An eigenvalue is the magnitude associated with its eigenvector: it indicates the amount of variance in that direction.
 Under the transformation, an eigenvector is only stretched or shrunk (expanding or contracting the X-Y graph) without altering its direction.
EigenValues and EigenVectors
• In this shear mapping, the blue arrow changes direction whereas the pink
arrow does not.
• The pink arrow in this instance is an eigenvector because of its constant
orientation.
• The length of this arrow is also unaltered, and its eigenvalue is 1.
• Technically, PC is a straight line that captures the maximum variance
(information) of the data.
• A PC shows direction and magnitude. PCs are perpendicular to each other.
Basic Terminologies of PCA
 Dimensionality Reduction – project the data onto the derived feature vector: multiply the (standardized) original data by the matrix of selected eigenvectors.
 This reduces the features while losing as little information as possible.
How does PCA work?
 The steps involved in PCA are as follows (a minimal NumPy sketch follows the list):
1. Original Data
2. Normalize the original data (mean =0, variance =1)
3. Calculating covariance matrix
4. Calculating Eigen values, Eigen vectors, and
normalized Eigenvectors
5. Calculating Principal Component (PC)
6. Plot the graph for orthogonality between PCs
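A minimal NumPy sketch of these steps, using a small made-up data matrix X (rows are samples, columns are features); this is an illustration of the procedure, not code from the slides:

```python
import numpy as np

# Hypothetical data: 5 samples, 3 features (rows = samples, columns = features).
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.9],
              [2.2, 2.9, 0.8],
              [1.9, 2.2, 1.2],
              [3.1, 3.0, 0.4]])

# Step 2: normalize each feature (mean = 0, variance = 1).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 3: covariance matrix of the standardized features.
C = np.cov(Xs, rowvar=False)

# Step 4: eigenvalues and (already normalized) eigenvectors.
eigvals, eigvecs = np.linalg.eigh(C)          # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: principal components = projection onto the top-k eigenvectors.
k = 2
PCs = Xs @ eigvecs[:, :k]
print(PCs)
```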
STEP 1: STANDARDIZATION
 In this step the range of each variable is standardized so that every variable contributes equally to the analysis.
 Variables with large ranges would otherwise dominate variables with small ranges.
 Standardizing first helps you avoid biased results at the end of the analysis.
STEP 1: STANDARDIZATION
 To bring every variable onto the same standard scale, apply the following formula:
z = (X − mean) / (standard deviation)
where X is a value in the data set and N is the number of values in the data set (used when computing the mean and the standard deviation).
STEP 1: STANDARDIZATION
 Example: [the example data table appears on the slide]
STEP 1: STANDARDIZATION
 Calculate the Mean and Standard Deviation for
each feature and then, tabulate the same as
follows.
STEP 1: STANDARDIZATION
 After standardizing each variable, the results are tabulated below.
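As a small illustration of the standardization step (the feature values below are made up, not the table from the slides):

```python
import numpy as np

# Hypothetical feature columns F1 and F2 (5 observations each).
F = np.array([[90.0, 60.0],
              [95.0, 55.0],
              [88.0, 70.0],
              [92.0, 65.0],
              [97.0, 58.0]])

mean = F.mean(axis=0)                 # per-feature mean
std = F.std(axis=0)                   # per-feature standard deviation
F_standardized = (F - mean) / std     # z-scores: mean 0, variance 1 per feature

print(F_standardized.mean(axis=0))    # ~[0, 0]
print(F_standardized.std(axis=0))     # [1, 1]
```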
STEP 2: COVARIANCE MATRIX COMPUTATION
 In this step you find out how each variable of the given data varies around the mean, and which variables are interrelated with one another.
 To identify the highly interrelated variables, you calculate the covariance matrix using the formula:
COV(X, Y) = Σ (Xi − mean(X)) (Yi − mean(Y)) / N
 Note: a covariance matrix is an N x N symmetric matrix (N = number of variables) that contains the covariances of all possible pairs of variables.
STEP 2: COVARIANCE MATRIX COMPUTATION
 The covariance matrix of two-dimensional data (variables X and Y) is:
[ COV(X,X)  COV(X,Y) ]
[ COV(Y,X)  COV(Y,Y) ]
 The covariance matrix of three-dimensional data (X, Y, Z) extends this to a 3 x 3 matrix containing the covariance of every pair of variables.
STEP 2: COVARIANCE MATRIX COMPUTATION
Insights into the covariance matrix
 Since the covariance of a variable with itself is its variance (COV(X, X) = Var(X)), the entries on the main diagonal (top left to bottom right) are the variances of the individual variables.
 The matrix is symmetric about the main diagonal, because covariance is commutative (COV(X, Y) = COV(Y, X)).
STEP 2: COVARIANCE MATRIX COMPUTATION
Insights into the covariance matrix
 If a covariance entry is positive, the two variables are correlated (if X increases, Y also increases, and vice versa).
 If a covariance entry is negative, the two variables are inversely correlated (if X increases, Y decreases, and vice versa).
 At the end of this step you know which pairs of variables are correlated with each other, so you can categorize them more easily.
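A small illustration of these sign rules with made-up variables (not data from the slides):

```python
import numpy as np

# Hypothetical variables: y1 moves with x, y2 moves against x.
x  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y1 = np.array([2.0, 4.1, 5.9, 8.2, 10.0])   # increases with x
y2 = np.array([9.8, 8.1, 6.0, 3.9, 2.2])    # decreases as x increases

C = np.cov(np.stack([x, y1, y2]))  # 3 x 3 covariance matrix (rows = variables)

print(C[0, 1])   # positive: x and y1 are correlated
print(C[0, 2])   # negative: x and y2 are inversely correlated
```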
STEP 2: COVARIANCE MATRIX COMPUTATION
 Example: the formula above is applied to the standardized features of the given example, as computed below.
STEP 2: COVARIANCE MATRIX COMPUTATION
 Since you have already standardized the features, you can take Mean = 0 and Standard Deviation = 1 for each feature.
VAR(F1) = ((-1.0695 - 0)² + (0.5347 - 0)² + (-1.0695 - 0)² + (0.5347 - 0)² + (1.0695 - 0)²) / 5
 On solving the equation, you get VAR(F1) = 0.78
COV(F1, F2) = ((-1.0695 - 0)(0.8196 - 0) + (0.5347 - 0)(-1.6393 - 0) + (-1.0695 - 0)(0.0000 - 0) + (0.5347 - 0)(0.0000 - 0) + (1.0695 - 0)(0.8196 - 0)) / 5
 On solving the equation, you get COV(F1, F2) = -0.8586
Covariance matrix
 Solving similarly for all the features gives the full covariance matrix (shown on the slide).
STEP 4: FEATURE VECTOR
 To determine the principal components of the variables, we have to find the eigenvalues and eigenvectors of the covariance matrix.
 Let A be any square matrix. A non-zero vector v is an eigenvector of A if
Av = λv
for some number λ, called the corresponding eigenvalue.
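A minimal check of this definition with NumPy (the 2 x 2 matrix A below is made up for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # a hypothetical symmetric matrix

eigvals, eigvecs = np.linalg.eig(A)      # columns of eigvecs are eigenvectors

v = eigvecs[:, 0]
lam = eigvals[0]
print(A @ v)                             # equals lam * v (up to rounding)
print(lam * v)
```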
STEP 4: FEATURE VECTOR
 Once you have computed the eigenvectors, sort the eigenvalues in descending order (for all variables); the eigenvectors ranked by their eigenvalues give you the list of principal components.
 The eigenvalues measure the importance of the principal components, and the components give the directions of the data.
STEP 4: FEATURE VECTOR
 A direction (line) with large variance has many data points spread along it, and therefore carries more information.
 Finally, these principal components form the new axes, along which the data are easier to evaluate and the differences between observations are easier to see.
STEP 4: FEATURE VECTOR
 Example:
 Let ν be a non-zero vector and λ a scalar. If Aν = λν, then λ is called the eigenvalue associated with the eigenvector ν of A.
STEP 4: FEATURE VECTOR
 Example:
 Substituting the values into the characteristic equation det(A − λI) = 0 gives the matrix equation shown on the slide.
STEP 4: FEATURE VECTOR
 Solving this equation (with 0 on the right-hand side) gives the eigenvalues
λ = 2.11691, 0.855413, 0.481689, 0.334007
 Then substitute each eigenvalue into the equation (A − λI)ν = 0 and solve it for the corresponding eigenvectors v1, v2, v3 and v4.
 For instance, for λ = 2.11691, solving the above equation using Cramer's rule gives the vector components
v1 = 0.515514, v2 = -0.616625, v3 = 0.399314, v4 = 0.441098
STEP 4: FEATURE VECTOR
 Following the same process for the remaining eigenvalues, you can form the matrix of eigenvectors (as shown on the slide).
STEP 4: FEATURE VECTOR
 Sort the Eigenvalues in decreasing order.
STEP 4: FEATURE VECTOR
 Arrange the eigenvalues in descending order and pick the topmost ones; the corresponding eigenvectors (columns) form the feature vector.
 These are your principal components.
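A minimal sketch of this selection step; the covariance matrix C below is a made-up placeholder, not the matrix from the slides:

```python
import numpy as np

# Hypothetical 4 x 4 covariance matrix standing in for the one computed earlier.
C = np.array([[0.78, -0.86, 0.10, 0.05],
              [-0.86, 1.00, -0.20, 0.03],
              [0.10, -0.20, 0.90, 0.40],
              [0.05, 0.03, 0.40, 0.70]])

eigvals, eigvecs = np.linalg.eigh(C)       # ascending order for symmetric C
order = np.argsort(eigvals)[::-1]          # indices of eigenvalues, largest first

top_k = 2                                  # keep the top 2 principal components
feature_vector = eigvecs[:, order[:top_k]] # columns = chosen eigenvectors
print(eigvals[order])                      # eigenvalues in descending order
print(feature_vector.shape)                # (4, 2)
```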
STEP 5: RECAST THE DATA ALONG THE
PRINCIPAL COMPONENTS AXES
 Until now, apart from standardization, you haven't made any changes to the original data.
 You have just selected the principal components and formed a feature vector.
 The initial data still sits on its original axes.
 This step reorients the data from the original axes to the axes defined by the principal components.
 This is done with the following formula:
Final Data Set = Standardized Original Data Set * FeatureVector
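A short sketch of this formula in NumPy; the standardized data and feature vector below are made-up placeholders standing in for the quantities from the worked example:

```python
import numpy as np

# Hypothetical standardized data (5 samples x 4 features) and a feature vector
# built from the top-2 eigenvectors (4 x 2); both are placeholders.
Xs = np.random.default_rng(0).standard_normal((5, 4))
feature_vector = np.linalg.eigh(np.cov(Xs, rowvar=False))[1][:, ::-1][:, :2]

final_data = Xs @ feature_vector    # Final Data Set = Standardized Data * FeatureVector
print(final_data.shape)             # (5, 2)
```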
Example:
 The Standardized Original Data Set and the FeatureVector are as shown on the slide.
Example:
 By solving the above equation, you get the transformed data shown on the slide.
 Your large dataset is now compressed into a smaller dataset with minimal loss of information!
Example in Python
 Step 1: Load the Iris Data Set
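The code on this slide did not survive extraction; a minimal sketch of the step, assuming the scikit-learn built-in copy of the Iris data and the usual column names (these are assumptions, not necessarily the slide's exact code):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris data set into a DataFrame (one common way; the slide may
# load it from a CSV instead).
iris = load_iris()
df = pd.DataFrame(iris.data, columns=['sepal length', 'sepal width',
                                      'petal length', 'petal width'])
df['target'] = [iris.target_names[i] for i in iris.target]
print(df.head())
```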
Step 2: Standardize the Data
• Use StandardScaler to help you standardize the data set’s features
onto unit scale (mean = 0 and variance = 1), which is a requirement
for the optimal performance of many machine learning algorithms.
• If you don’t scale your data, it can have a negative effect on your
algorithm.
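A minimal sketch of this step, again assuming the scikit-learn Iris data and the column names from Step 1 (not necessarily the slide's exact code):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

features = ['sepal length', 'sepal width', 'petal length', 'petal width']
iris = load_iris()
df = pd.DataFrame(iris.data, columns=features)            # as built in Step 1
df['target'] = iris.target

x = StandardScaler().fit_transform(df[features].values)   # mean = 0, variance = 1
y = df['target'].values                                    # class labels
```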
Step 3: PCA Projection to 2D
 The original data has four columns (sepal length, sepal width,
petal length and petal width).
 In this section, the code projects the original data, which is
four-dimensional, into two dimensions.
Step 3: PCA Projection to 2D
 Concatenating DataFrame along axis = 1.
 finalDf is the final DataFrame before plotting the
data.
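A minimal sketch of the projection and concatenation described above; the DataFrame and column names (principalDf, finalDf, 'principal component 1/2') are assumed, not taken from the slide's code:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x = StandardScaler().fit_transform(iris.data)      # standardized features (Step 2)

pca = PCA(n_components=2)                           # project 4-D data to 2-D
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(principalComponents,
                           columns=['principal component 1',
                                    'principal component 2'])

target = pd.Series(iris.target_names[iris.target], name='target')
finalDf = pd.concat([principalDf, target], axis=1)  # DataFrame used for plotting
print(finalDf.head())
```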
Step 4: Visualize 2D Projection
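The plot and its code were images on the slide; a minimal sketch of a 2-D scatter of the two components, colored by class, assuming the finalDf built in the Step 3 sketch:

```python
import matplotlib.pyplot as plt

# Assumes finalDf from the Step 3 sketch: columns 'principal component 1',
# 'principal component 2', and 'target'.
fig, ax = plt.subplots(figsize=(8, 8))
ax.set_xlabel('Principal Component 1')
ax.set_ylabel('Principal Component 2')
ax.set_title('2 component PCA')

for species, color in zip(['setosa', 'versicolor', 'virginica'],
                          ['r', 'g', 'b']):
    subset = finalDf[finalDf['target'] == species]
    ax.scatter(subset['principal component 1'],
               subset['principal component 2'],
               c=color, s=50, label=species)

ax.legend()
ax.grid()
plt.show()
```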
Explained Variance
 The explained variance tells you how much information
(variance) can be attributed to each of the principal
components.
 By using the attribute explained_variance_ratio_, you can
see that the first principal component contains 72.96 percent of
the variance, and the second principal component contains
22.85 percent of the variance.
 Together, the two components contain about 95.81 percent of the information.
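A minimal sketch, assuming the fitted pca object from the Step 3 sketch:

```python
# Assumes the fitted PCA object `pca` from the Step 3 sketch.
print(pca.explained_variance_ratio_)        # e.g. ~[0.7296, 0.2285] for Iris
print(pca.explained_variance_ratio_.sum())  # total variance retained, ~0.958
```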
Perform a Scree Plot of the
Principal Components
 A scree plot is like a bar chart showing the size
of each of the principal components.
 It helps us to visualize the percentage of
variation captured by each of the principal
components.
 To perform a scree plot you need to:
 first, create a list of the principal-component labels (columns)
 then, a list of the variance explained by each PC
 finally, draw the scree plot using plt (a sketch follows below)
Scree Plot
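The plot itself was an image on the slide; a minimal sketch of a scree plot, assuming the fitted pca object from the Step 3 sketch:

```python
import matplotlib.pyplot as plt

# Assumes the fitted PCA object `pca` from the Step 3 sketch.
pc_labels = ['PC' + str(i + 1) for i in range(len(pca.explained_variance_ratio_))]
var_pct = pca.explained_variance_ratio_ * 100   # percentage of variance per PC

plt.bar(pc_labels, var_pct)
plt.xlabel('Principal Component')
plt.ylabel('Percentage of Explained Variance')
plt.title('Scree Plot')
plt.show()
```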