CS229: Machine Learning
Dimensionality Reduction: Principal Component Analysis (PCA)
Carlos Guestrin, Stanford University
Slides include content developed by and co-developed with Emily Fox
©2022 Carlos Guestrin
Embedding
Example: embedding images to visualize data [Saul & Roweis ’03]
(Diagram: Data → ML Method (PCA) → Intelligence)
Images with thousands or millions of pixels: can we give each image a coordinate, such that similar images are near each other?
Embedding words [Joseph Turian 2008]
Embedding words (zoom in)
[Joseph Turian 2008]
Dimensionality reduction
• Input data may have thousands or millions of
dimensions!
- e.g., text data
• Dimensionality reduction: represent data with fewer
dimensions
- easier learning – fewer parameters
- visualization – hard to visualize more than 3D or 4D
- discover “intrinsic dimensionality” of data
• high dimensional data that is truly lower dimensional
Lower dimensional projections
• Rather than picking a subset of the features, we can create
new features that are combinations of existing features
• Let’s see this in the unsupervised setting
- just x, but no y
Linear projection and reconstruction
(Figure: points plotted on #awesome vs. #awful axes, projected onto one direction)
• Project onto 1 dimension
• Reconstruction: knowing only z, what was (x1, x2)?
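This projection-and-reconstruction step can be sketched in NumPy; the toy (#awesome, #awful) counts and the direction u below are assumptions for illustration, not data from the slides:

```python
import numpy as np

# Hypothetical 2-D points: (#awesome, #awful) counts for four documents.
X = np.array([[3.0, 1.0],
              [2.0, 1.0],
              [4.0, 2.0],
              [1.0, 0.5]])

# An assumed unit-length direction to project onto.
u = np.array([2.0, 1.0])
u = u / np.linalg.norm(u)

z = X @ u               # 1-D coordinate of each point along u
X_hat = np.outer(z, u)  # reconstruction: knowing only z, the best guess at (x1, x2)

# The residual of each point is orthogonal to the projection direction.
residual = X - X_hat
print(np.allclose(residual @ u, 0.0))  # True
```

The reconstruction lies on the line spanned by u; whatever is lost is exactly the component of each point perpendicular to u.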
What if we project onto d vectors?
(Figure: the same #awesome vs. #awful scatter, projected onto d directions)
Perfect reconstruction!
If I had to choose one of these vectors, which do I prefer?
(Figure: the same #awesome vs. #awful scatter, with candidate projection directions)
Principal component analysis (PCA) –
Basic idea
• Project d-dimensional data into k-dimensional space
while preserving as much information as possible:
- e.g., project a space of 10,000 words into 3 dimensions
- e.g., project 3-D into 2-D
• Choose projection with minimum reconstruction error
“PCA explained visually”
http://setosa.io/ev/principal-component-analysis/
Linear projections, a review
• Project a point into a (lower-dimensional) space:
- point: x = (x_1,…,x_d)
- select a basis – a set of basis vectors – (u_1,…,u_k)
• we consider an orthonormal basis:
- u_i·u_i = 1, and u_i·u_j = 0 for i ≠ j
- select a center x̄, which defines the offset of the space
- the best coordinates in the lower-dimensional space – best in the minimum-squared-error sense – are given by dot products:
(z_1,…,z_k), with z_i = (x − x̄)·u_i
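A minimal NumPy sketch of these dot-product coordinates; the point, center, and orthonormal basis below are assumed for illustration:

```python
import numpy as np

x = np.array([2.0, 0.0, 3.0])      # a point in d = 3 dimensions
x_bar = np.array([1.0, 1.0, 1.0])  # the chosen center

# Orthonormal basis of a k = 2 subspace (one row per basis vector u_i).
U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

# Coordinates via dot products: z_i = (x - x_bar) . u_i
z = U @ (x - x_bar)

# Reconstruction from the coordinates: x_bar + sum_i z_i u_i
x_hat = x_bar + U.T @ z
print(z)      # [ 1. -1.]
print(x_hat)  # [2. 0. 1.]
```

Because the basis spans only two of the three dimensions, the third component of x is replaced by the center's value in the reconstruction.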
PCA finds the projection that minimizes reconstruction error
• Given N data points: x^i = (x^i_1,…,x^i_d), i = 1…N
• Represent each point as a projection:
x̂^i = x̄ + Σ_{j=1..k} z^i_j u_j, here: z^i_j = (x^i − x̄)·u_j and x̄ = (1/N) Σ_{i=1..N} x^i
• PCA:
- Given k ≪ d, find (u_1,…,u_k) minimizing the reconstruction error:
error_k = Σ_{i=1..N} ‖x^i − x̂^i‖²
Understanding the reconstruction error
• Given k ≪ d, find (u_1,…,u_k) minimizing the reconstruction error:
error_k = Σ_{i=1..N} ‖x^i − x̂^i‖²
• Note that x^i can be represented exactly by its d-dimensional projection:
x^i = x̄ + Σ_{j=1..d} z^i_j u_j
• Rewriting the error using the orthonormality of the u_j:
error_k = Σ_{i=1..N} ‖Σ_{j=k+1..d} z^i_j u_j‖² = Σ_{i=1..N} Σ_{j=k+1..d} (z^i_j)²
Reconstruction error and the covariance matrix
Substituting z^i_j = (x^i − x̄)·u_j:
error_k = Σ_{i=1..N} Σ_{j=k+1..d} u_jᵀ (x^i − x̄)(x^i − x̄)ᵀ u_j = N Σ_{j=k+1..d} u_jᵀ S u_j
where S = (1/N) Σ_{i=1..N} (x^i − x̄)(x^i − x̄)ᵀ is the covariance matrix of the data.
Minimizing reconstruction error and eigenvectors
• Minimizing the reconstruction error is equivalent to picking an orthonormal basis (u_1,…,u_d) minimizing:
error_k = N Σ_{j=k+1..d} u_jᵀ S u_j
• Eigenvector: S u_j = λ_j u_j, so for a unit eigenvector, u_jᵀ S u_j = λ_j
• Minimizing the reconstruction error is therefore equivalent to picking (u_{k+1},…,u_d) to be the eigenvectors with the smallest eigenvalues
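This equivalence can be checked numerically: keeping the top-k eigenvectors leaves a reconstruction error equal to N times the sum of the discarded eigenvalues. The random anisotropic data below is an illustrative assumption, used only for the check:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 1.0, 0.5])  # anisotropic data
Xc = X - X.mean(axis=0)
N = X.shape[0]

S = (Xc.T @ Xc) / N               # covariance matrix
evals, evecs = np.linalg.eigh(S)  # eigenvalues in ascending order

k = 2
U = evecs[:, -k:]                 # top-k eigenvectors
error = np.sum((Xc - (Xc @ U) @ U.T) ** 2)

# error_k = N * (sum of the d - k smallest eigenvalues)
print(np.isclose(error, N * evals[:-k].sum()))  # True
```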
Basic PCA algorithm
• Start from the N × d data matrix X
• Recenter: subtract the mean from each row of X
- X_c ← X − X̄
• Compute the covariance matrix:
- S ← (1/N) X_cᵀ X_c
• Find the eigenvectors and eigenvalues of S
• Principal components: the k eigenvectors with the highest eigenvalues
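The algorithm above, sketched in NumPy (the function name and the synthetic data are illustrative assumptions):

```python
import numpy as np

def pca(X, k):
    """PCA via eigendecomposition of the covariance matrix, following the slide."""
    Xc = X - X.mean(axis=0)           # recenter: subtract mean from each row
    S = (Xc.T @ Xc) / X.shape[0]      # d x d covariance matrix
    evals, evecs = np.linalg.eigh(S)  # eigh: S is symmetric; eigenvalues ascending
    order = np.argsort(evals)[::-1]   # re-sort descending
    return evals[order][:k], evecs[:, order[:k]]

# Synthetic check: inflate the variance of one coordinate,
# so the first principal component should align with that axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 0] *= 10.0

vals, comps = pca(X, k=1)
print(comps.shape)  # (5, 1) -- the first principal component
```

`eigh` is used rather than `eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.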
PCA example
PCA example – reconstruction using only the first principal component
Eigenfaces [Turk & Pentland ’91]
• Input images (figure)
• Principal components (figure)
Eigenfaces reconstruction
• Each image corresponds to adding 8 principal
components:
Scaling up
• The covariance matrix can be really big!
- S is d × d
- even with “only” 10,000 features, that is 10⁸ entries
- finding eigenvectors is very slow…
• Use the singular value decomposition (SVD)
- finds the top k eigenvectors directly
- great implementations available, e.g., in Python, R, Matlab (svd)
SVD
• Write X = W S Vᵀ
- X ← data matrix, one row per datapoint
- W ← weight matrix, one row per datapoint – the coordinates of x^i in eigenspace
- S ← singular value matrix, a diagonal matrix
• in our setting, each singular value σ_j relates to an eigenvalue of the covariance matrix: λ_j = σ_j²/N
- Vᵀ ← singular vector matrix
• in our setting, each row is an eigenvector v_j of the covariance matrix
PCA using SVD algorithm
• Start from the N × d data matrix X
• Recenter: subtract the mean from each row of X
- X_c ← X − X̄
• Call an SVD algorithm on X_c, asking for k singular vectors
• Principal components: the k singular vectors with the highest singular values (rows of Vᵀ)
- Coefficients become: Z = W S (each row of Z gives a datapoint’s coordinates in the principal-component space)
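A sketch of the SVD route in NumPy, cross-checked against the covariance route; the function name and the random data are illustrative assumptions:

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD of the centered data matrix, following the slide."""
    Xc = X - X.mean(axis=0)
    W, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # Xc = W diag(s) Vt
    components = Vt[:k]    # top-k principal components (rows of Vt)
    Z = Xc @ components.T  # coefficients of each datapoint in that space
    return components, Z, s

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
comps, Z, s = pca_svd(X, k=2)

# Cross-check: squared singular values over N match covariance eigenvalues.
Xc = X - X.mean(axis=0)
evals = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / X.shape[0]))[::-1]
print(np.allclose(s**2 / X.shape[0], evals))  # True
```

The SVD route never forms the d × d covariance matrix explicitly, which is what makes it preferable when d is large.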
What you need to know
• Dimensionality reduction
- why and when it’s important
• Simple feature selection
• Principal component analysis
- minimizing reconstruction error
- relationship to covariance matrix and eigenvectors
- using SVD