Foundations of Machine Learning
Sudeshna Sarkar
IIT Kharagpur
Module 3: Instance Based Learning and
Feature Reduction
Part C: Feature Extraction
Feature extraction - definition
• Given a set of features 𝐹 = {𝑥1, … , 𝑥𝑁},
the feature extraction ("construction") problem
is to map 𝐹 to some feature set 𝐹′ that maximizes
the learner’s ability to classify patterns
Feature Extraction
• Find a projection matrix 𝑊 from 𝑁-dimensional to 𝑀-dimensional vectors that keeps the error low
• Assume 𝑀 < 𝑁 new features, each a linear combination of the original 𝑁 features:
𝑧𝑖 = 𝑤𝑖1𝑥1 + ⋯ + 𝑤𝑖𝑁𝑥𝑁
𝒛 = 𝑊ᵀ𝒙 (see the sketch below)
• What we expect from such a basis:
– Uncorrelated components, which cannot be reduced further
– Components with large variance; low-variance components bear little information
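A minimal NumPy sketch of this projection (the random data and random 𝑊 are assumptions for illustration; the slides do not fix them):

```python
import numpy as np

# Illustrative only: reduce N-dimensional samples to M < N features
# with a projection matrix W, i.e. z = W^T x for each sample x.
rng = np.random.default_rng(0)
N, M, n_samples = 5, 2, 100

X = rng.normal(size=(n_samples, N))  # rows are observations x
W = rng.normal(size=(N, M))          # columns are the M basis vectors

Z = X @ W                            # z = W^T x applied to every row of X
print(Z.shape)                       # (100, 2): each sample now has M features
```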
Geometric picture of principal components (PCs)
Algebraic definition of PCs
Given a sample of 𝑝 observations on a vector of 𝑁 variables,
𝒙1, 𝒙2, … , 𝒙𝑝 ∈ ℝᴺ,
define the first principal component of the sample by the linear transformation
𝑧1 = 𝒘1ᵀ𝒙𝑗 = ∑𝑖=1..𝑁 𝑤𝑖1𝑥𝑖𝑗 ,  𝑗 = 1, 2, … , 𝑝
where 𝒘1 = (𝑤11, 𝑤21, … , 𝑤𝑁1) and 𝒙𝑗 = (𝑥1𝑗, 𝑥2𝑗, … , 𝑥𝑁𝑗),
with 𝒘1 chosen such that var[𝑧1] is maximum.
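A minimal NumPy sketch of this definition (the data and the eigendecomposition route are assumptions, not from the slides): the variance-maximizing 𝒘1 is the leading eigenvector of the sample covariance matrix.

```python
import numpy as np

# Illustrative data: p = 200 observations of N = 3 variables with
# very different variances along the coordinate axes.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

Xc = X - X.mean(axis=0)                  # center the observations
C = np.cov(Xc, rowvar=False)             # N x N sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # symmetric input, ascending eigenvalues

w1 = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
z1 = Xc @ w1                             # scores z1 = w1^T x_j, j = 1..p
print(np.var(z1, ddof=1), eigvals[-1])   # var[z1] equals the top eigenvalue
```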
PCA
1. Maximize total variance
• Choose directions such that the total variance of the data is maximum
2. Minimize correlation
• Choose directions that are orthogonal
⇒ Choose 𝑀 < 𝑁 orthogonal directions that maximize total variance
PCA
• 𝑁-dimensional feature space
• 𝑁 × 𝑁 symmetric covariance matrix estimated from samples: 𝐶𝑜𝑣(𝒙) = Σ
• Select the 𝑀 largest eigenvalues of the covariance matrix and the 𝑀 associated eigenvectors (see the sketch below)
• The first eigenvector is the direction of largest variance
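A hedged sketch of the whole procedure (the helper name pca and the random data are illustrative assumptions):

```python
import numpy as np

def pca(X, M):
    """Project rows of X onto the M eigenvectors of the covariance
    matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)          # N x N symmetric covariance
    eigvals, eigvecs = np.linalg.eigh(C)  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:M]   # indices of the M largest
    W = eigvecs[:, top]                   # N x M projection matrix
    return Xc @ W, W

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
Z, W = pca(X, M=3)
print(Z.shape)                            # (500, 3)
```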
PCA for image compression
[Figure: the original image alongside PCA reconstructions using p = 1, 2, 4, 8, 16, 32, 64, and 100 components]
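The slide does not spell out the compression setup; one plausible reading, sketched below, treats the rows of the image as observations and reconstructs the image from the top 𝑝 eigenvectors (the helper compress and the random image are assumptions):

```python
import numpy as np

def compress(image, p):
    """Keep the top-p principal components of the row distribution
    and reconstruct the image from them."""
    mean = image.mean(axis=0)
    Xc = image - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = eigvecs[:, np.argsort(eigvals)[::-1][:p]]  # top-p eigenvectors
    return (Xc @ W) @ W.T + mean                   # project, then reconstruct

image = np.random.default_rng(3).random((100, 100))
for p in (1, 2, 4, 8, 16, 32, 64, 100):
    err = np.linalg.norm(image - compress(image, p))
    print(f"p={p:3d}  reconstruction error {err:.2f}")
```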
Is PCA a good criterion for classification?
• Data variation
determines the
projection direction
• What’s missing?
– Class information
What is a good projection?
• Similarly, what is a
good criterion?
– Separating different
classes
[Figure: two candidate projection directions; under one the two classes overlap, under the other the two classes are separated]
What class information may be useful?
Between-class distance
• Between-class distance
– Distance between the centroids
of different classes
What class information may be useful?
Within-class distance
• Between-class distance
– Distance between the centroids of different classes
• Within-class distance
– Accumulated distance of each instance to the centroid of its class (see the sketch below)
• Linear discriminant analysis (LDA) finds the most discriminant projection by
– maximizing the between-class distance
– and minimizing the within-class distance
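A small sketch of the two distances above on assumed 2-D data (the class clouds are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))  # class 1 instances
X2 = rng.normal(loc=[4.0, 2.0], size=(50, 2))  # class 2 instances

c1, c2 = X1.mean(axis=0), X2.mean(axis=0)      # class centroids
between = np.linalg.norm(c1 - c2)              # distance between centroids
within = (np.linalg.norm(X1 - c1, axis=1).sum()
          + np.linalg.norm(X2 - c2, axis=1).sum())
print(f"between-class: {between:.2f}, within-class (accumulated): {within:.2f}")
```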
Linear Discriminant Analysis
• Find a low-dimensional space such that when
𝒙 is projected, classes are well-separated
Means and Scatter after projection
Good Projection
• Means are as far apart as possible
• Scatter is as small as possible
• Fisher Linear Discriminant:
𝐽(𝒘) = (𝑚1 − 𝑚2)² / (𝑠1² + 𝑠2²)
where 𝑚𝑖 and 𝑠𝑖² are the mean and scatter of class 𝑖 after projection (see the sketch below).
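A minimal two-class sketch (assumed Gaussian data; the helper fisher_direction is illustrative): 𝐽(𝒘) is maximized in closed form by 𝒘 ∝ 𝑆𝑊⁻¹(𝒎1 − 𝒎2), where 𝑆𝑊 is the within-class scatter matrix.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Direction w maximizing J(w) for two classes of samples."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m1 - m2)   # w proportional to S_W^{-1}(m1 - m2)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(4)
X1 = rng.normal(loc=[0.0, 0.0], size=(100, 2))
X2 = rng.normal(loc=[3.0, 1.0], size=(100, 2))
print("Fisher direction:", fisher_direction(X1, X2))
```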
Thank You