CS229: Machine Learning
Dimensionality Reduction: Principal Component Analysis (PCA)
Carlos Guestrin, Stanford University
Slides include content developed by and co-developed with Emily Fox
©2022 Carlos Guestrin
Embedding
Example: embedding images to visualize data [Saul & Roweis ’03]
(Diagram: Data → ML Method (PCA) → Intelligence)
Images with thousands or millions of pixels: can we give each image a coordinate, such that similar images are near each other?
Embedding words [Joseph Turian 2008]
Embedding words (zoom in)
[Joseph Turian 2008]
Dimensionality reduction
• Input data may have thousands or millions of
dimensions!
- e.g., text data
• Dimensionality reduction: represent data with fewer
dimensions
- easier learning – fewer parameters
- visualization – hard to visualize more than 3D or 4D
- discover “intrinsic dimensionality” of data
• high dimensional data that is truly lower dimensional
Lower dimensional projections
• Rather than picking a subset of the features, we can create
new features that are combinations of existing features
• Let’s see this in the unsupervised setting
- just x, but no y
Linear projection and reconstruction
(Figure: points plotted on #awesome vs. #awful axes, projected onto one direction)
• Project onto 1 dimension
• Reconstruction: knowing only z, what was (x1, x2)?
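This projection-and-reconstruction step can be sketched in NumPy; the toy (#awesome, #awful) counts and the direction u below are assumptions for illustration, not data from the slides:

```python
import numpy as np

# Hypothetical 2-D points: (#awesome, #awful) counts for four documents.
X = np.array([[3.0, 1.0],
              [2.0, 1.0],
              [4.0, 2.0],
              [1.0, 0.5]])

# An assumed unit-length direction to project onto.
u = np.array([2.0, 1.0])
u = u / np.linalg.norm(u)

z = X @ u               # 1-D coordinate of each point along u
X_hat = np.outer(z, u)  # reconstruction: knowing only z, the best guess at (x1, x2)

# The residual of each point is orthogonal to the projection direction.
residual = X - X_hat
print(np.allclose(residual @ u, 0.0))  # True
```

The reconstruction lies on the line spanned by u; whatever is lost is exactly the component of each point perpendicular to u.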
What if we project onto d vectors?
(Figure: the same #awesome vs. #awful scatter, projected onto d directions)
Perfect reconstruction!
If I had to choose one of these vectors, which do I prefer?
(Figure: the same #awesome vs. #awful scatter, with candidate projection directions)
Principal component analysis (PCA) –
Basic idea
• Project d-dimensional data into k-dimensional space
while preserving as much information as possible:
- e.g., project a space of 10,000 words into 3 dimensions
- e.g., project 3-D into 2-D
• Choose projection with minimum reconstruction error
“PCA explained visually”
http://setosa.io/ev/principal-component-analysis/
Linear projections, a review
• Project a point into a (lower-dimensional) space:
- point: x = (x_1,…,x_d)
- select a basis – a set of basis vectors – (u_1,…,u_k)
• we consider an orthonormal basis:
- u_i·u_i = 1, and u_i·u_j = 0 for i ≠ j
- select a center x̄, which defines the offset of the space
- the best coordinates in the lower-dimensional space – best in the minimum-squared-error sense – are given by dot products:
(z_1,…,z_k), with z_i = (x − x̄)·u_i
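A minimal NumPy sketch of these dot-product coordinates; the point, center, and orthonormal basis below are assumed for illustration:

```python
import numpy as np

x = np.array([2.0, 0.0, 3.0])      # a point in d = 3 dimensions
x_bar = np.array([1.0, 1.0, 1.0])  # the chosen center

# Orthonormal basis of a k = 2 subspace (one row per basis vector u_i).
U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

# Coordinates via dot products: z_i = (x - x_bar) . u_i
z = U @ (x - x_bar)

# Reconstruction from the coordinates: x_bar + sum_i z_i u_i
x_hat = x_bar + U.T @ z
print(z)      # [ 1. -1.]
print(x_hat)  # [2. 0. 1.]
```

Because the basis spans only two of the three dimensions, the third component of x is replaced by the center's value in the reconstruction.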
PCA finds the projection that minimizes reconstruction error
• Given N data points: x^i = (x^i_1,…,x^i_d), i = 1…N
• Represent each point as a projection:
x̂^i = x̄ + Σ_{j=1..k} z^i_j u_j, here: z^i_j = (x^i − x̄)·u_j and x̄ = (1/N) Σ_{i=1..N} x^i
• PCA:
- Given k ≪ d, find (u_1,…,u_k) minimizing the reconstruction error:
error_k = Σ_{i=1..N} ‖x^i − x̂^i‖²
Understanding the reconstruction error
• Given k ≪ d, find (u_1,…,u_k) minimizing the reconstruction error:
error_k = Σ_{i=1..N} ‖x^i − x̂^i‖²
• Note that x^i can be represented exactly by its d-dimensional projection:
x^i = x̄ + Σ_{j=1..d} z^i_j u_j
• Rewriting the error using the orthonormality of the u_j:
error_k = Σ_{i=1..N} ‖Σ_{j=k+1..d} z^i_j u_j‖² = Σ_{i=1..N} Σ_{j=k+1..d} (z^i_j)²
Reconstruction error and the covariance matrix
Substituting z^i_j = (x^i − x̄)·u_j:
error_k = Σ_{i=1..N} Σ_{j=k+1..d} u_jᵀ (x^i − x̄)(x^i − x̄)ᵀ u_j = N Σ_{j=k+1..d} u_jᵀ S u_j
where S = (1/N) Σ_{i=1..N} (x^i − x̄)(x^i − x̄)ᵀ is the covariance matrix of the data.
Minimizing reconstruction error and eigenvectors
• Minimizing the reconstruction error is equivalent to picking an orthonormal basis (u_1,…,u_d) minimizing:
error_k = N Σ_{j=k+1..d} u_jᵀ S u_j
• Eigenvector: S u_j = λ_j u_j, so for a unit eigenvector, u_jᵀ S u_j = λ_j
• Minimizing the reconstruction error is therefore equivalent to picking (u_{k+1},…,u_d) to be the eigenvectors with the smallest eigenvalues
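This equivalence can be checked numerically: keeping the top-k eigenvectors leaves a reconstruction error equal to N times the sum of the discarded eigenvalues. The random anisotropic data below is an illustrative assumption, used only for the check:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 1.0, 0.5])  # anisotropic data
Xc = X - X.mean(axis=0)
N = X.shape[0]

S = (Xc.T @ Xc) / N               # covariance matrix
evals, evecs = np.linalg.eigh(S)  # eigenvalues in ascending order

k = 2
U = evecs[:, -k:]                 # top-k eigenvectors
error = np.sum((Xc - (Xc @ U) @ U.T) ** 2)

# error_k = N * (sum of the d - k smallest eigenvalues)
print(np.isclose(error, N * evals[:-k].sum()))  # True
```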
Basic PCA algorithm
• Start from the N × d data matrix X
• Recenter: subtract the mean from each row of X
- X_c ← X − X̄
• Compute the covariance matrix:
- S ← (1/N) X_cᵀ X_c
• Find the eigenvectors and eigenvalues of S
• Principal components: the k eigenvectors with the highest eigenvalues
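The algorithm above, sketched in NumPy (the function name and the synthetic data are illustrative assumptions):

```python
import numpy as np

def pca(X, k):
    """PCA via eigendecomposition of the covariance matrix, following the slide."""
    Xc = X - X.mean(axis=0)           # recenter: subtract mean from each row
    S = (Xc.T @ Xc) / X.shape[0]      # d x d covariance matrix
    evals, evecs = np.linalg.eigh(S)  # eigh: S is symmetric; eigenvalues ascending
    order = np.argsort(evals)[::-1]   # re-sort descending
    return evals[order][:k], evecs[:, order[:k]]

# Synthetic check: inflate the variance of one coordinate,
# so the first principal component should align with that axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 0] *= 10.0

vals, comps = pca(X, k=1)
print(comps.shape)  # (5, 1) -- the first principal component
```

`eigh` is used rather than `eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.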
PCA example
PCA example – reconstruction using only the first principal component
Eigenfaces [Turk & Pentland ’91]
• Input images (figure)
• Principal components (figure)
Eigenfaces reconstruction
• Each image corresponds to adding 8 principal
components:
Scaling up
• The covariance matrix can be really big!
- S is d × d
- even with “only” 10,000 features, that is 10⁸ entries
- finding eigenvectors is very slow…
• Use the singular value decomposition (SVD)
- finds the top k eigenvectors directly
- great implementations available, e.g., in Python, R, Matlab (svd)
SVD
• Write X = W S Vᵀ
- X ← data matrix, one row per datapoint
- W ← weight matrix, one row per datapoint – the coordinates of x^i in eigenspace
- S ← singular value matrix, a diagonal matrix
• in our setting, each singular value σ_j relates to an eigenvalue of the covariance matrix: λ_j = σ_j²/N
- Vᵀ ← singular vector matrix
• in our setting, each row is an eigenvector v_j of the covariance matrix
PCA using SVD algorithm
• Start from the N × d data matrix X
• Recenter: subtract the mean from each row of X
- X_c ← X − X̄
• Call an SVD algorithm on X_c, asking for k singular vectors
• Principal components: the k singular vectors with the highest singular values (rows of Vᵀ)
- Coefficients become: Z = W S (each row of Z gives a datapoint’s coordinates in the principal-component space)
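A sketch of the SVD route in NumPy, cross-checked against the covariance route; the function name and the random data are illustrative assumptions:

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD of the centered data matrix, following the slide."""
    Xc = X - X.mean(axis=0)
    W, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # Xc = W diag(s) Vt
    components = Vt[:k]    # top-k principal components (rows of Vt)
    Z = Xc @ components.T  # coefficients of each datapoint in that space
    return components, Z, s

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
comps, Z, s = pca_svd(X, k=2)

# Cross-check: squared singular values over N match covariance eigenvalues.
Xc = X - X.mean(axis=0)
evals = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / X.shape[0]))[::-1]
print(np.allclose(s**2 / X.shape[0], evals))  # True
```

The SVD route never forms the d × d covariance matrix explicitly, which is what makes it preferable when d is large.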
What you need to know
• Dimensionality reduction
- why and when it’s important
• Simple feature selection
• Principal component analysis
- minimizing reconstruction error
- relationship to covariance matrix and eigenvectors
- using SVD