Principal Component Analysis
Linkon Chowdhury
Dept. of Computer Science & Engineering, CUET
Outline
• Introduction
• Objective
• Coordinate System
• PCA Visualization
• Steps of Principal Component Analysis
• Variance & Covariance
• Eigenvector & Eigenvalue
• Conclusion
Introduction
PCA (Principal Component Analysis) is defined as an
orthogonal linear transformation that transforms the
data to a new coordinate system such that the greatest
variance comes to lie on the first coordinate, the second
greatest variance on the second coordinate and so on.
Objective
Principal component analysis (PCA) is a way to reduce
data dimensionality
PCA projects high dimensional data to a lower dimension
PCA projects the data in the least-squares sense: it captures the
big (principal) variability in the data and ignores the small
variability
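As a concrete illustration of this objective, here is a minimal sketch (assuming NumPy and scikit-learn are installed; the data are made up): a 3-dimensional data set is projected down to 2 dimensions, and the output shows how much of the original variance the two retained components capture.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up 3-dimensional data: the third variable nearly copies the first,
# so almost all of the variability lives in a 2-D subspace.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + 0.1 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3])       # shape (100, 3)

pca = PCA(n_components=2)               # keep the two largest-variance directions
scores = pca.fit_transform(X)           # shape (100, 2): the projected data

print(scores.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)    # most of the variance is retained
```

The two retained columns are the projections onto the first two principal components (the big variability); the discarded third component carries only the small residual variability.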
Philosophy of PCA
Introduced by Pearson (1901) and Hotelling
(1933) to describe the variation in a set of
multivariate data in terms of a set of uncorrelated
variables
We typically have a data matrix of n observations
on p correlated variables x1,x2,…xp
PCA looks for a transformation of the xi into p
new variables yi that are uncorrelated
Data set
Principal Component Analysis
Each coordinate in Principal Component Analysis
is called a principal component.
Ci = bi1(x1) + bi2(x2) + … + bin(xn)
where Ci is the ith principal component, bij is the
regression coefficient for observed variable j in
principal component i, and the xj are the observed
variables/dimensions.
Principal Component Analysis (cont.)
From k original variables: x1,x2,...,xk:
Produce k new variables: y1,y2,...,yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
yk's are uncorrelated (orthogonal)
y1 explains as much as possible of original variance in data set
y2 explains as much as possible of the remaining variance, and so on (checked numerically in the sketch below)
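The properties listed above can be verified numerically. A minimal sketch (assuming NumPy; the data are made up for illustration): after transforming with the eigenvectors of the covariance matrix, the new variables y have a diagonal covariance matrix, i.e. they are uncorrelated, with variances sorted from largest to smallest.

```python
import numpy as np

# Made-up correlated 2-D data for illustration.
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.2], [1.2, 1.0]],
                            size=500)

Xc = X - X.mean(axis=0)                       # mean-center the variables
C = np.cov(Xc, rowvar=False)                  # covariance of the original x's
eigvals, eigvecs = np.linalg.eigh(C)          # symmetric matrix -> real eigenpairs
order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
A = eigvecs[:, order].T                       # rows are the coefficient vectors a1, a2, ...

Y = Xc @ A.T                                  # y_i = a_i1*x_1 + ... + a_ik*x_k per sample
print(np.round(np.cov(Y, rowvar=False), 3))   # ~diagonal: the y's are uncorrelated,
                                              # with variances in decreasing order
```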
PCA: Visually
Data points are represented in a rotated orthogonal coordinate system:
the origin is the mean of the data points and the axes are provided by
the eigenvectors
Steps to Find the Principal Components
1. Adjust the dataset to a zero-mean dataset.
2. Find the covariance matrix M.
3. Calculate the normalized eigenvectors and eigenvalues of M.
4. Sort the eigenvectors according to their eigenvalues, from
highest to lowest (a short code sketch of these steps follows below).
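A minimal from-scratch sketch of these four steps (assuming NumPy; X is any samples-by-variables array):

```python
import numpy as np

def principal_components(X):
    """Return eigenvalues and unit eigenvectors of the covariance matrix of X,
    sorted from highest to lowest eigenvalue."""
    X_adj = X - X.mean(axis=0)             # step 1: zero-mean dataset
    M = np.cov(X_adj, rowvar=False)        # step 2: covariance matrix M
    eigvals, eigvecs = np.linalg.eigh(M)   # step 3: normalized eigenvectors / eigenvalues
    order = np.argsort(eigvals)[::-1]      # step 4: sort by eigenvalue, highest first
    return eigvals[order], eigvecs[:, order]

# Usage with the 2-D data set from the Example slides below:
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
eigvals, eigvecs = principal_components(X)
print(eigvals)   # the larger eigenvalue belongs to the first principal component
```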
Eigenvectors and Principal Components
It turns out that the eigenvectors of the covariance matrix of
the data set are the principal components of the data set.
The eigenvector with the highest eigenvalue is the first principal
component, the eigenvector with the second-highest eigenvalue is
the second principal component, and so on.
Example
Adjusted Data Set = Original Data - Mean

Original Data Set      Adjusted Data Set
   X     Y                X       Y
  2.5   2.4              0.69    0.49
  0.5   0.7             -1.31   -1.21
  2.2   2.9              0.39    0.99
  1.9   2.2              0.09    0.29
  3.1   3.0              1.29    1.09
  2.3   2.7              0.49    0.79
  2.0   1.6              0.19   -0.31
  1.0   1.1             -0.81   -0.81
  1.5   1.6             -0.31   -0.31
  1.1   0.9             -0.71   -1.01
Variance & Covariance
The variance is a measure of how far a set of numbers is
spread out. The equation of variance is

Var(X) = Σ_{i=1}^{n} (Xi - X̄)(Xi - X̄) / (n - 1)

Variance & Covariance (cont.)
• Covariance measures how much two random variables change
together. The equation of covariance is

Cov(x, y) = Σ_{i=1}^{n} (xi - x̄)(yi - ȳ) / (n - 1)
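These two formulas translate directly into code. A sketch assuming NumPy (note the division by n - 1, the sample estimate, which matches np.cov's default but not np.var's):

```python
import numpy as np

def variance(X):
    """Var(X) = sum_i (X_i - mean(X))^2 / (n - 1)"""
    X = np.asarray(X, dtype=float)
    return np.sum((X - X.mean()) ** 2) / (len(X) - 1)

def covariance(x, y):
    """Cov(x, y) = sum_i (x_i - mean(x)) * (y_i - mean(y)) / (n - 1)"""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
```

Applied to the X and Y columns of the example data above, these functions reproduce the entries of the covariance matrix shown on the next slides.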
Covariance Matrix
A covariance matrix is an n × n matrix where each element is
defined as

Mij = Cov(i, j)

A covariance matrix for a 2-dimensional data set:

M = [ Cov(x, x)   Cov(x, y) ]
    [ Cov(y, x)   Cov(y, y) ]

Covariance Matrix (cont.)
For the example data set above:

M = [ 0.616555556   0.615444444 ]
    [ 0.615444444   0.716555556 ]
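The same matrix can be reproduced from the example data with NumPy (a sketch; np.cov uses the n - 1 denominator by default):

```python
import numpy as np

# Original (X, Y) example data from the earlier slide.
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

M = np.cov(data, rowvar=False)   # 2x2 covariance matrix of the two columns
print(M)
# [[0.61655556 0.61544444]
#  [0.61544444 0.71655556]]
```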
Eigenvector & Eigenvalue
The eigenvectors of a square matrix A are the
non-zero vectors x that, after being multiplied by
the matrix, remain parallel to the original vector.
For example:

[ 2  1 ] [  3 ]   [  3 ]
[ 1  2 ] [ -3 ] = [ -3 ]

Eigenvector & Eigenvalue (cont.)
For each eigenvector, the corresponding eigenvalue is the
factor by which the eigenvector is scaled when multiplied
by the matrix. In the example above,

[ 2  1 ] [  3 ]       [  3 ]
[ 1  2 ] [ -3 ] = 1 · [ -3 ]

so the eigenvalue associated with [3, -3]ᵀ is 1.
Eigenvector & Eigenvalue (cont.)
The vector x is an eigenvector of the matrix A with
eigenvalue λ (lambda) if the following equation holds:

Ax = λx,   or   Ax - λx = 0,   or   (A - λI)x = 0
Eigenvector & Eigenvalue (cont.)
Calculating eigenvalues:   |A - λI| = 0
Calculating eigenvectors:  (A - λI)x = 0
Example…
Suppose A is the matrix

A = [ 1  0  -1 ]
    [ 1  2   1 ]
    [ 2  2   3 ]

Finding the eigenvalues using |A - λI| = 0:

| 1-λ    0    -1  |
|  1    2-λ    1  | = 0
|  2     2    3-λ |

or, (λ - 1)(λ - 2)(λ - 3) = 0, so λ1 = 1, λ2 = 2, λ3 = 3.
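As a numerical cross-check of this example (a sketch assuming NumPy), the roots of det(A - λI) = 0, i.e. the eigenvalues of A, come out as 1, 2 and 3:

```python
import numpy as np

A = np.array([[1.0, 0.0, -1.0],
              [1.0, 2.0,  1.0],
              [2.0, 2.0,  3.0]])

# The roots of det(A - lambda*I) = 0 are the eigenvalues of A.
eigenvalues = np.linalg.eigvals(A)
print(np.sort(eigenvalues.real))   # [1. 2. 3.]
```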
Example…
Finding the eigenvectors using (A - λI)x = 0.

For λ = 1:

[ 0  0  -1 ] [x]   [0]
[ 1  1   1 ] [y] = [0]
[ 2  2   2 ] [z]   [0]

which gives -z = 0 and x + y + z = 0, so z = 0 and x + y = 0.
So, let x = k; then y = -k, and the eigenvector is

x1 = [k, -k, 0]ᵀ = k [1, -1, 0]ᵀ
Example…
For λ = 2, the eigenvector is x2 = [-2, 1, 2]ᵀ.
For λ = 3, the eigenvector is x3 = [-1, 1, 2]ᵀ.
So, collecting the eigenvectors as columns (each can be scaled
to unit length to normalize it), the eigenvector matrix is

x = [  1  -2  -1 ]
    [ -1   1   1 ]
    [  0   2   2 ]
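A quick check that each vector really is an eigenvector with the stated eigenvalue (a sketch assuming NumPy): A·xi should equal λi·xi for every pair.

```python
import numpy as np

A = np.array([[1.0, 0.0, -1.0],
              [1.0, 2.0,  1.0],
              [2.0, 2.0,  3.0]])

eigenpairs = {1: np.array([1.0, -1.0, 0.0]),
              2: np.array([-2.0, 1.0, 2.0]),
              3: np.array([-1.0, 1.0, 2.0])}

for lam, x in eigenpairs.items():
    # A x should equal lambda * x for an eigenpair.
    print(lam, np.allclose(A @ x, lam * x))   # True, True, True
```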
PCA Presentation
[Scatter plot of the data with the 1st principal component axis (y1) and the 2nd principal component axis (y2) drawn through the mean.]

PCA Scores
[The same scatter plot; each point's coordinates along the new axes, (yi,1, yi,2), are its PCA scores, replacing the original coordinates (xi1, xi2).]

PCA Eigenvalues
[The same plot; the eigenvalues λ1 and λ2 measure the spread of the data along the 1st and 2nd principal component axes.]
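For the 2-D example data, these scores and eigenvalues can be computed directly (a sketch assuming NumPy); λ1 is much larger than λ2, so the first component carries almost all of the spread shown in the plots.

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

adjusted = data - data.mean(axis=0)              # origin moved to the mean
eigvals, eigvecs = np.linalg.eigh(np.cov(adjusted, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = adjusted @ eigvecs                      # (y_i1, y_i2) for every data point
print(np.round(eigvals, 3))                      # approx [1.284 0.049]: lambda1 >> lambda2
```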
Application
Uses:
Data Visualization
Data Reduction
Data Classification
Trend Analysis
Factor Analysis
Noise Reduction
Examples:
How many unique “sub-sets” are in the
sample?
How are they similar / different?
What are the underlying factors that influence
the samples?
Which time / temporal trends are
(anti)correlated?
Which measurements are needed to
differentiate?
How to best present what is “interesting”?
To which “sub-set” does this new sample
rightfully belong?
Thanks to All