Principal Component Analysis
October 9, 2019
Feature Extraction
Feature selection loses information, because entire features are discarded.
Even though care is taken to remove dimensions that are unlikely to contribute much to data mining, we would still prefer to retain all of the input data in one form or another.
So, how do we retain all of the input data and still reduce the number of dimensions?
Feature Extraction maps the data from a higher-dimensional feature space to a lower-dimensional feature space without much loss of information.
The most common feature extraction technique is Principal Component Analysis (PCA).
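As a rough, concrete illustration (not from the slides), scikit-learn's PCA maps a 4-dimensional dataset down to 2 dimensions in a couple of lines; the rest of these slides work out how those new dimensions are actually chosen.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 4))  # 100 samples, 4 features

Z = PCA(n_components=2).fit_transform(X)            # same 100 samples, 2 extracted features
print(Z.shape)                                       # (100, 2)
```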
The Idea behind PCA
(Figure-only slides: illustrations omitted.)
Data as Vectors
A vector is a geometric object that has magnitude and direction.
For example, consider the vector v = [3 4]ᵀ.
It has a magnitude of ||v|| = √(3² + 4²) = 5.
Its direction is the angle tan⁻¹(4/3) anticlockwise from the x-axis.
A unit vector is one which has magnitude 1 and is often used to specify directions.
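A quick NumPy check of these numbers (an illustrative sketch, not part of the slides):

```python
import numpy as np

v = np.array([3.0, 4.0])

magnitude = np.linalg.norm(v)                 # sqrt(3^2 + 4^2) = 5.0
angle = np.degrees(np.arctan2(v[1], v[0]))    # tan^-1(4/3), about 53.13 degrees from the x-axis
unit_v = v / magnitude                        # unit vector in the same direction

print(magnitude, angle, unit_v)
```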
Vector Spaces
A vector space is a collection of vectors, together with the associated operations of vector addition and scalar multiplication.
For example, all two-dimensional vectors belong to the vector space denoted R².
Basis of a Vector Space
A basis for a vector space is a minimal set of vectors from which all other vectors in the space can be generated.
For example, every vector in R² can be written as a linear combination of [1 0] and [0 1], so these two vectors form a basis for R².
For a set of vectors {v1, v2, ..., vn} to form a basis for a vector space V:
They must be linearly independent.
They must be able to generate every vector in V.
No proper subset of {v1, v2, ..., vn} may itself be a basis for V.
A vector space can have many different bases. For example, [1 2] and [1 −1] also form a basis for R², as the sketch below illustrates.
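For instance, the coordinates of any vector in the basis {[1 2], [1 −1]} can be found by solving a small linear system; a minimal NumPy sketch (the vector x is my own example):

```python
import numpy as np

# The basis vectors [1, 2] and [1, -1] placed as the columns of B
B = np.array([[1.0,  1.0],
              [2.0, -1.0]])

x = np.array([3.0, 4.0])           # a vector written in the standard basis

coords = np.linalg.solve(B, x)     # its coordinates in the new basis
assert np.allclose(B @ coords, x)  # the linear combination reconstructs x
print(coords)
```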
Change of Basis
Oftentimes, vectors need to be converted from one basis to another for ease of representation or processing. This is done by specifying a Change of Basis Matrix.
For example, consider the vector space R² with the standard basis B = {[1 0], [0 1]} and an alternate basis B′ = {[1 2], [1 −1]}.
The change of basis matrix from B′ to B is simply the matrix whose columns are the new basis vectors, M = [1 1; 2 −1].
The change of basis matrix from B to B′ is its inverse, M⁻¹ = [1 1; 2 −1]⁻¹.
If the new basis B′ is orthonormal (so that M is an orthogonal matrix), the inverse is simply the transpose: M⁻¹ = Mᵀ.
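A NumPy sketch of this change of basis (illustrative; it assumes the new basis vectors are stored as the columns of M):

```python
import numpy as np

# Columns of M are the new basis vectors [1, 2] and [1, -1]
M = np.array([[1.0,  1.0],
              [2.0, -1.0]])

x_std = np.array([3.0, 4.0])           # a vector in the standard basis

x_new = np.linalg.inv(M) @ x_std       # standard basis -> new basis (uses M^-1)
x_back = M @ x_new                     # new basis -> standard basis (uses M)
assert np.allclose(x_back, x_std)

# If the new basis were orthonormal, M^-1 would simply equal M^T.
```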
Linear Transformations
A linear transformation is a function from one vector space to another (or to itself) that transforms the vectors in the space.
Linear transformations can be represented as matrices; for example, [2 1; 1.5 1] represents a linear transformation.
Linear transformations change the magnitude and/or direction of vectors. For example,
[2 1; 1.5 1] [3 4]ᵀ = [10 8.5]ᵀ
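The multiplication above, checked with NumPy (illustrative sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.5, 1.0]])   # the linear transformation
v = np.array([3.0, 4.0])

print(A @ v)                 # [10.   8.5] -- both magnitude and direction change
```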
Another Example
M = [2 2; 5 −1]
(Two figures omitted: plots in the x–y plane illustrating the effect of the transformation M.)
Eigenvectors and Eigenvalues
Eigenvectors of a matrix A representing a linear transformation are vectors which, when acted on by A, change only in magnitude, not in direction.
Mathematically, a vector v is said to be an eigenvector of the matrix A iff
Av = λv
where λ is the factor by which v is stretched, and is called the eigenvalue corresponding to v.
Think of a linear transformation (or the matrix representing it) as a force; the eigenvectors are the directions in which the force acts.
The eigenvalues represent the strength of the force.
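A NumPy sketch (not from the slides) that computes the eigenvalues and eigenvectors of the matrix from the previous example and verifies Av = λv:

```python
import numpy as np

A = np.array([[2.0,  2.0],
              [5.0, -1.0]])               # matrix from the "Another Example" slide

eigvals, eigvecs = np.linalg.eig(A)       # columns of eigvecs are the eigenvectors

for i in range(len(eigvals)):
    v = eigvecs[:, i]
    # A only stretches v by the factor eigvals[i]; its direction is unchanged
    assert np.allclose(A @ v, eigvals[i] * v)

print(eigvals)                            # 4 and -3 (in some order)
```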
What is so special about Eigenvectors?
Eigenvectors are important in a number of fields because they possess some useful properties.
The eigenvectors of a symmetric matrix can be chosen to be orthogonal (and hence linearly independent), which makes them a very convenient basis in which to represent vectors.
Eigenvectors can also be used to convert a matrix into its diagonal form, i.e. a square matrix A can be decomposed as
A = PDP⁻¹, or equivalently D = P⁻¹AP
where P is a square matrix whose columns are the eigenvectors of A and D is a diagonal matrix of the corresponding eigenvalues.
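A quick check of this diagonalization in NumPy (an illustrative sketch with a symmetric matrix of my own choosing, so that P is orthogonal):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])              # a symmetric matrix

eigvals, P = np.linalg.eigh(A)          # eigh is the symmetric eigensolver; P is orthogonal
D = np.diag(eigvals)

assert np.allclose(A, P @ D @ P.T)      # A = P D P^-1 (here P^-1 = P^T)
assert np.allclose(D, P.T @ A @ P)      # D = P^-1 A P
```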
Change of Basis and Linear Transformations
If A is a data vector, it can be converted to a new basis by simply multiplying it with the change of basis matrix.
If A is a linear transformation, something more involved is required.
Let A be a linear transformation in the standard (canonical) basis, and let V be the new basis. Then A_V, the same linear transformation expressed in V, is given by
A_V = V⁻¹AV
An Example
Let A = [5 −3; 2 −2] be a linear transformation in R² with the canonical basis.
Let V = [3 1; 1 2] be a new basis in R².
Also let a = [1 3] be a data vector in the canonical basis.
a after applying the linear transformation A is
a_A = a Aᵀ = [1 3] [5 2; −3 −2] = [−4 −4]
An Example
A transformed into the new basis is given by
A_V = V⁻¹AV
A_V = [3 1; 1 2]⁻¹ [5 −3; 2 −2] [3 1; 1 2]
A_V = (1/5) [2 −1; −1 3] [5 −3; 2 −2] [3 1; 1 2]
A_V = [4 0; 0 −1]
An Example
a transformed into the new basis is given by
a_V = a V⁻¹ = (1/5) [1 3] [2 −1; −1 3] = (1/5) [−1 8]
a_V transformed by the linear transformation A_V is given by
a_trans = a_V A_V = (1/5) [−1 8] [4 0; 0 −1] = (1/5) [−4 −8]
Transforming a_trans back into the canonical basis,
a_trans,back = a_trans V = (1/5) [−4 −8] [3 1; 1 2] = [−4 −4] = a_A
which is the same as the result of applying the transformation directly in the canonical basis.
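The whole example can be reproduced in a few lines of NumPy (a sketch; vectors are kept as rows, as in the slides, and V happens to be symmetric so no extra transposes are needed):

```python
import numpy as np

A = np.array([[5.0, -3.0],
              [2.0, -2.0]])        # the linear transformation in the canonical basis
V = np.array([[3.0, 1.0],
              [1.0, 2.0]])         # the new basis
a = np.array([1.0, 3.0])           # the data vector in the canonical basis

a_A = a @ A.T                      # apply A directly: [-4, -4]
A_V = np.linalg.inv(V) @ A @ V     # A expressed in the new basis: [[4, 0], [0, -1]]
a_V = a @ np.linalg.inv(V)         # a expressed in the new basis: (1/5)[-1, 8]
a_trans = a_V @ A_V                # apply the transformation inside the new basis
a_back = a_trans @ V               # convert the result back to the canonical basis

assert np.allclose(a_back, a_A)    # both routes give [-4, -4]
```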
Coming Back to PCA...
Our basic question is: how do we find the new basis in which to represent our data?
What are the new dimensions/axes/features of the transformed data?
The Covariance Matrix
The first step in PCA is to compute the covariance matrix of our data.
If X is our input data matrix, with one observation per row and each column (variable) mean-centered, then the covariance matrix is given by
Cx = (1/n) XᵀX
Think of the covariance matrix as describing the interplay of the forces of covariance between the different variables in the system.
Ideally, we want the variance within each variable to be high, and the covariance between different variables to be nearly zero.
In other words, we want our covariance matrix to be a diagonal matrix.
The Maths...
Let our input data matrix be X. We want to transform X into new coordinates Y such that the covariance between the variables of Y is zero. Let P be the matrix whose columns form the new orthogonal basis, so that
Y = XP
Let Cx be the covariance matrix of X and Cy be the covariance matrix of Y.
Cx = (1/n) XᵀX
Cy = (1/n) YᵀY
Cy = (1/n) (XP)ᵀ(XP)
Cy = Pᵀ ((1/n) XᵀX) P
Cy = Pᵀ Cx P
The Maths...
Our previous result:
Cy = Pᵀ Cx P
Ideally we want Cy to be a diagonal matrix, and we know that the matrix that can diagonalize Cx is the matrix of its eigenvectors!
In other words, P is the eigenvector matrix of Cx, and since Cx is symmetric, P is orthogonal, i.e. P⁻¹ = Pᵀ, so Cy = Pᵀ Cx P = D is diagonal.
So finally, the new data points in our new basis (which is nothing but the eigenvectors of the covariance matrix) can be obtained by
Y = XP
where the columns of P are the eigenvectors of the covariance matrix of X.
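Putting the recipe together as a NumPy sketch (my own illustration of the steps above, not code from the slides): center the data, form the covariance matrix, take its eigenvectors as the new basis P, and project with Y = XP.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                 # center each variable
    Cx = (Xc.T @ Xc) / X.shape[0]           # covariance matrix, (1/n) X^T X
    eigvals, eigvecs = np.linalg.eigh(Cx)   # Cx is symmetric, so its eigenvectors are orthogonal
    order = np.argsort(eigvals)[::-1]       # sort eigenvalues from largest to smallest
    P = eigvecs[:, order[:k]]               # keep the k directions with the most variance
    return Xc @ P                           # Y = XP, the data in the new basis

# Example usage on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Y = pca(X, k=2)
print(Y.shape)                              # (50, 2)
```

Keeping only the top k eigenvectors in this sketch is exactly the dimensionality reduction discussed on the next slide.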
But wait... we still have n dimensions...
If X is a matrix with n columns, then Cov(X) is an n × n matrix.
Cov(X) has n eigenvectors, and thus Y also has n features...
By itself, PCA only re-expresses an n-dimensional data space as another n-dimensional data space.
However, in the new space we know which dimensions carry the most variance (given by the eigenvalues).
We can therefore retain only the top k eigenvalues and their eigenvectors to get a new k-dimensional feature space.
Example
Let X hold the marks of five students in three subjects:

Student   Maths   English   Art
1         90      60        90
2         90      90        30
3         60      60        60
4         60      60        90
5         30      30        30

The covariance matrix is
Cx = [504 360 180; 360 360 0; 180 0 720]
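The covariance matrix above can be reproduced with a short NumPy sketch (centering each subject's marks and using the 1/n normalization):

```python
import numpy as np

X = np.array([[90., 60., 90.],     # columns: Maths, English, Art
              [90., 90., 30.],
              [60., 60., 60.],
              [60., 60., 90.],
              [30., 30., 30.]])

Xc = X - X.mean(axis=0)            # subtract each subject's mean mark
Cx = (Xc.T @ Xc) / X.shape[0]
print(Cx)
# [[504. 360. 180.]
#  [360. 360.   0.]
#  [180.   0. 720.]]
```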
Example
To find the eigenvectors and eigenvalues of Cx, we have to solve |Cx − λI| = 0:
det [504−λ 360 180; 360 360−λ 0; 180 0 720−λ] = 0
which gives −λ³ + 1584λ² − 641520λ + 25660800 = 0.
Solving, λ = 44.8, 629.11, 910.06, which are the eigenvalues of Cx.
The eigenvectors corresponding to these eigenvalues are
[−3.75 4.28 1]ᵀ, [−0.5 −0.675 1]ᵀ, [1.055 0.69 1]ᵀ
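The same numbers can be obtained numerically; a NumPy sketch (eigh returns unit-norm eigenvectors, so they are rescaled here to have a last component of 1, matching the slides):

```python
import numpy as np

Cx = np.array([[504., 360., 180.],
               [360., 360.,   0.],
               [180.,   0., 720.]])

eigvals, eigvecs = np.linalg.eigh(Cx)   # symmetric eigensolver; eigenvalues in ascending order
print(eigvals)                          # approximately [ 44.8  629.1  910.1]

# Rescale each eigenvector (column) so that its last component is 1, as on the slide
print(eigvecs / eigvecs[-1, :])
```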
Example
We select the two eigenvectors with the highest eigenvalues to build our eigenvector matrix P:
P = [1.055 −0.5; 0.69 −0.675; 1 1]
The new data matrix is
Y = XP = [90 60 90; 90 90 30; 60 60 60; 60 60 90; 30 30 30] [1.055 −0.5; 0.69 −0.675; 1 1]
which is now in a 2-dimensional feature space.
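A NumPy sketch of the final projection. As on the slide, the raw marks are projected; projecting the mean-centered marks instead would only shift every row of Y by the same constant offset.

```python
import numpy as np

X = np.array([[90., 60., 90.],
              [90., 90., 30.],
              [60., 60., 60.],
              [60., 60., 90.],
              [30., 30., 30.]])

P = np.array([[1.055, -0.5  ],
              [0.69 , -0.675],
              [1.0  ,  1.0  ]])    # the two chosen eigenvectors as columns (slide scaling)

Y = X @ P                          # each student is now described by just 2 features
print(Y)
```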