Foundations of Machine Learning
Sudeshna Sarkar
IIT Kharagpur
Module 3: Instance Based Learning and
Feature Reduction
Part C: Feature Extraction
Feature extraction - definition
• Given a set of features 𝐹 = {𝑥1, … , 𝑥𝑁},
the feature extraction ("construction") problem
is to map 𝐹 to some feature set 𝐹′ that maximizes
the learner’s ability to classify patterns
Feature Extraction
• Find a projection matrix 𝑊 from 𝑁-dimensional to 𝑀-dimensional vectors that keeps the error low
• Assume 𝑀 < 𝑁 new features, each a linear combination of the original 𝑁 features:
𝑧𝑖 = 𝑤𝑖1𝑥1 + ⋯ + 𝑤𝑖𝑁𝑥𝑁
𝒛 = 𝑊ᵀ𝒙 (see the sketch below)
• What we expect from such a basis:
– Uncorrelated components, which cannot be reduced further
– Components with large variance; low-variance components bear little information
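A minimal NumPy sketch of this projection (the random data and random 𝑊 are assumptions for illustration; the slides do not fix them):

```python
import numpy as np

# Illustrative only: reduce N-dimensional samples to M < N features
# with a projection matrix W, i.e. z = W^T x for each sample x.
rng = np.random.default_rng(0)
N, M, n_samples = 5, 2, 100

X = rng.normal(size=(n_samples, N))  # rows are observations x
W = rng.normal(size=(N, M))          # columns are the M basis vectors

Z = X @ W                            # z = W^T x applied to every row of X
print(Z.shape)                       # (100, 2): each sample now has M features
```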
Geometric picture of principal components (PCs)
Algebraic definition of PCs
Given a sample of 𝑝 observations on a vector of 𝑁 variables,
𝒙1, 𝒙2, … , 𝒙𝑝 ∈ ℝᴺ,
define the first principal component of the sample by the linear transformation
𝑧1 = 𝒘1ᵀ𝒙𝑗 = ∑𝑖=1..𝑁 𝑤𝑖1𝑥𝑖𝑗 ,  𝑗 = 1, 2, … , 𝑝
where 𝒘1 = (𝑤11, 𝑤21, … , 𝑤𝑁1) and 𝒙𝑗 = (𝑥1𝑗, 𝑥2𝑗, … , 𝑥𝑁𝑗),
with 𝒘1 chosen such that var[𝑧1] is maximum.
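A minimal NumPy sketch of this definition (the data and the eigendecomposition route are assumptions, not from the slides): the variance-maximizing 𝒘1 is the leading eigenvector of the sample covariance matrix.

```python
import numpy as np

# Illustrative data: p = 200 observations of N = 3 variables with
# very different variances along the coordinate axes.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

Xc = X - X.mean(axis=0)                  # center the observations
C = np.cov(Xc, rowvar=False)             # N x N sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # symmetric input, ascending eigenvalues

w1 = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
z1 = Xc @ w1                             # scores z1 = w1^T x_j, j = 1..p
print(np.var(z1, ddof=1), eigvals[-1])   # var[z1] equals the top eigenvalue
```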
PCA
1. Maximize total variance
• Choose directions such that the total variance of the data is maximum
2. Minimize correlation
• Choose directions that are orthogonal
⇒ Choose 𝑀 < 𝑁 orthogonal directions that maximize total variance
PCA
• 𝑁-dimensional feature space
• 𝑁 × 𝑁 symmetric covariance matrix estimated from samples: 𝐶𝑜𝑣(𝒙) = Σ
• Select the 𝑀 largest eigenvalues of the covariance matrix and the 𝑀 associated eigenvectors (see the sketch below)
• The first eigenvector is the direction of largest variance
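A hedged sketch of the whole procedure (the helper name pca and the random data are illustrative assumptions):

```python
import numpy as np

def pca(X, M):
    """Project rows of X onto the M eigenvectors of the covariance
    matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)          # N x N symmetric covariance
    eigvals, eigvecs = np.linalg.eigh(C)  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:M]   # indices of the M largest
    W = eigvecs[:, top]                   # N x M projection matrix
    return Xc @ W, W

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
Z, W = pca(X, M=3)
print(Z.shape)                            # (500, 3)
```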
PCA for image compression
[Figure: the original image alongside PCA reconstructions using p = 1, 2, 4, 8, 16, 32, 64, and 100 components]
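The slide does not spell out the compression setup; one plausible reading, sketched below, treats the rows of the image as observations and reconstructs the image from the top 𝑝 eigenvectors (the helper compress and the random image are assumptions):

```python
import numpy as np

def compress(image, p):
    """Keep the top-p principal components of the row distribution
    and reconstruct the image from them."""
    mean = image.mean(axis=0)
    Xc = image - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = eigvecs[:, np.argsort(eigvals)[::-1][:p]]  # top-p eigenvectors
    return (Xc @ W) @ W.T + mean                   # project, then reconstruct

image = np.random.default_rng(3).random((100, 100))
for p in (1, 2, 4, 8, 16, 32, 64, 100):
    err = np.linalg.norm(image - compress(image, p))
    print(f"p={p:3d}  reconstruction error {err:.2f}")
```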
Is PCA a good criterion for classification?
• Data variation
determines the
projection direction
• What’s missing?
– Class information
What is a good projection?
• Similarly, what is a
good criterion?
– Separating different
classes
[Figure: two candidate projection directions; under one the two classes overlap, under the other the two classes are separated]
What class information may be useful?
Between-class distance
• Between-class distance
– Distance between the centroids
of different classes
What class information may be useful?
Within-class distance
• Between-class distance
– Distance between the centroids of different classes
• Within-class distance
– Accumulated distance of each instance to the centroid of its class (see the sketch below)
• Linear discriminant analysis (LDA) finds the most discriminant projection by
– maximizing the between-class distance
– and minimizing the within-class distance
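A small sketch of the two distances above on assumed 2-D data (the class clouds are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))  # class 1 instances
X2 = rng.normal(loc=[4.0, 2.0], size=(50, 2))  # class 2 instances

c1, c2 = X1.mean(axis=0), X2.mean(axis=0)      # class centroids
between = np.linalg.norm(c1 - c2)              # distance between centroids
within = (np.linalg.norm(X1 - c1, axis=1).sum()
          + np.linalg.norm(X2 - c2, axis=1).sum())
print(f"between-class: {between:.2f}, within-class (accumulated): {within:.2f}")
```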
Linear Discriminant Analysis
• Find a low-dimensional space such that when
𝒙 is projected, classes are well-separated
Means and Scatter after projection
Good Projection
• Means are as far apart as possible
• Scatter is as small as possible
• Fisher Linear Discriminant:
𝐽(𝒘) = (𝑚1 − 𝑚2)² / (𝑠1² + 𝑠2²)
where 𝑚𝑖 and 𝑠𝑖² are the mean and scatter of class 𝑖 after projection (see the sketch below).
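A minimal two-class sketch (assumed Gaussian data; the helper fisher_direction is illustrative): 𝐽(𝒘) is maximized in closed form by 𝒘 ∝ 𝑆𝑊⁻¹(𝒎1 − 𝒎2), where 𝑆𝑊 is the within-class scatter matrix.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Direction w maximizing J(w) for two classes of samples."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m1 - m2)   # w proportional to S_W^{-1}(m1 - m2)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(4)
X1 = rng.normal(loc=[0.0, 0.0], size=(100, 2))
X2 = rng.normal(loc=[3.0, 1.0], size=(100, 2))
print("Fisher direction:", fisher_direction(X1, X2))
```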
Thank You