Unsupervised Learning:
Dimensionality reduction
AAA-Python Edition
Plan
● 1- Dimensionality reduction
● 2- Some Math
● 3- PCA
● 4- PCA in scikit-learn
● 5- Manifold Learning
● 6- Manifold Examples
3
1- Dimensionality reduction
[By Amina Delali]
Objective
● Dimensionality reduction in machine learning is reducing the number of features of the training dataset.
● This reduction is necessary to:
➢ Eliminate the noise from the data
➢ Visualize the data in 2 or 3 dimensions
➢ Speed up the learning process
➢ Enhance the learning results by eliminating correlated features
➢ Eliminate unnecessary features
➢ Compress the data size
● The two main approaches to dimensionality reduction are:
➢ Projection: project the data into a lower dimensional space.
➢ Manifold learning: assume that the data in the higher dimension is just a manifold of a representation of the data in a lower dimension.
4
1- Dimensionality reduction
[By Amina Delali]
Projection
● Sometimes the degree of variation of the data differs from one dimension to another. So, for some features the values can be very diverse, and for others they can barely change.
● So we project the data into a lower dimension in order to keep only the most influential information: we define a mapping from the original data in the higher dimension to new data in a lower dimension.
● The most used technique to define this mapping is PCA (Principal Component Analysis) and its variations:
➢ Incremental PCA
➢ Randomized PCA
➢ Kernel PCA
5
1- Dimensionality reduction
[By Amina Delali]
Manifold
● As we said earlier, we make the hypothesis that our data is created from a manifold of data in a lower dimension. So, reducing it to this lower dimension is like straightening up (or unrolling) this manifold.
● The different techniques used are:
➢ MDS: Multidimensional Scaling. Tries to preserve the distances between instances.
➢ LLE: Locally Linear Embedding. Tries to preserve the relationship between a sample and its closest points.
➢ Isomap: the samples represent the nodes of a graph, and these nodes are connected to their closest neighbors. The algorithm tries to preserve the number of nodes in the shortest path connecting two nodes.
6
2- Some Math
[By Amina Delali]
Singular value decomposition
● It is the decomposition of a matrix M(m,n) into 3 matrices: U(m,m), S(m,n), and V(n,n). Considering only real values, we have the following characteristics:
➢ M = U · S · Vᵀ (Vᵀ is the transpose matrix of V: the value at i,j becomes the value at j,i)
➢ U · Uᵀ = Uᵀ · U = I(m,m) (the identity matrix)
➢ V · Vᵀ = Vᵀ · V = I(n,n)
➢ The diagonal of S (values with the same row and column indices) holds the singular values of M
➔ The singular values are the square roots of the eigenvalues of Mᵀ · M (equivalently, of M · Mᵀ)
➔ The other values of S are zeros.
➢ The columns of U are the eigenvectors of M · Mᵀ.
➢ The columns of V are the eigenvectors of Mᵀ · M.
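A minimal NumPy sketch, using an arbitrary example matrix, to verify these properties:

```python
import numpy as np

# An arbitrary real matrix M with m=3 rows and n=2 columns, used only for illustration
M = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# full_matrices=True returns U(m,m), the singular values, and V^T(n,n)
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild S(m,n) with the singular values on its diagonal and zeros elsewhere
S = np.zeros(M.shape)
S[:len(s), :len(s)] = np.diag(s)

print(np.allclose(M, U @ S @ Vt))          # True: M = U . S . V^T
print(np.allclose(U @ U.T, np.eye(3)))     # True: U . U^T = I(m,m)
print(np.allclose(Vt.T @ Vt, np.eye(2)))   # True: V . V^T = I(n,n)

# The squared singular values are the eigenvalues of M^T . M (ascending order here)
print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(M.T @ M)))
```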
7
2- Some Math
[By Amina Delali]
Eigenvectors, Eigenvalues
● Given A(n,n), a square matrix:
➔ If A · V(n) = λ · V(n), then V is an eigenvector and λ is its corresponding eigenvalue.
➔ The above equation can be rewritten as follows: (A − λ·I) · V = 0
➔ Several values of λ can solve the equation. For each λ value, an eigenvector is computed.
● Example:
➢ If A = [[2, 1], [1, 2]]
● Its eigenvalues will be: 1, 3
● And their corresponding eigenvectors will be: [1, −1] and [1, 1]
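A quick NumPy check of this example (the returned eigenvectors are normalized, so they are proportional to the ones above):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and the eigenvectors (as columns)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)     # 3 and 1 (the order is not guaranteed)
print(eigenvectors)    # columns proportional to [1, 1] and [1, -1]

# Check that A . V = lambda . V holds for each eigenpair
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True, True
```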
8
2- Some Math
[By Amina Delali]
Standard Deviation
● The standard deviation measures how the data is spread (or distant from the mean). It is the square root of the variance.
● The variance is computed as follows:
➢ variance = (1/N) · Σᵢ₌₁..N (xᵢ − μ)²
● And the standard deviation: σ = √variance
● To project data onto new axes, we select the axes that preserve the maximum possible variance of the data. This way, most of the information is preserved.
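A small NumPy illustration of these two formulas, using arbitrary sample values:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # arbitrary sample values
mu = x.mean()

variance = ((x - mu) ** 2).sum() / len(x)   # population variance, as in the formula above
sigma = np.sqrt(variance)

print(variance, sigma)                      # 4.0 2.0
print(np.var(x), np.std(x))                 # the same values, computed directly by NumPy
```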
9
3- PCA
[By Amina Delali]
Definition
● It is a linear dimensionality reduction technique that projects data using orthogonal axes (components) that preserve the maximum variance possible. One of the methods used is the singular value decomposition of the mean-centered training data.
● As stated before, the decomposition leads to 3 matrices. The vectors of the matrix Vᵀ will be used to project the data. They are the “principal components”.
● Each component conserves a certain amount of variance. The variance obtained after projection is the accumulation of the variances obtained by each component.
● To project, we select a sufficient number of components to preserve the maximum of variance, then we apply the transformation (the projection) using only this number of vectors.
● The number of vectors determines the dimension of the projection.
10
3- PCA
[By Amina Delali]
Example
● Center the data to the mean before applying the decomposition.
● The decomposition.
● To project, we multiply the centered data by the first selected components: with 3 components, we obtain a 3-dimensional projection.
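The slides show these steps as code screenshots, which are not reproduced here. A minimal NumPy sketch of the same steps, assuming the Iris dataset as a stand-in for the labeled data used in the slides:

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumed stand-in for the labeled dataset of the slides: 150 samples, 4 features
X, y = load_iris(return_X_y=True)

# 1) center the data to the mean before applying the decomposition
X_centered = X - X.mean(axis=0)

# 2) the decomposition
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# 3) multiply the centered data by the first selected components
#    (3 components here, giving a 3-dimensional projection; y will only serve for coloring)
X_3d = X_centered @ Vt[:3].T
print(X_3d.shape)   # (150, 3)
```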
11
3- PCA
[By Amina Delali]
Results
● Since our data was originally labeled (we don’t use those labels for the decomposition), we used them to colorize the data. What is obvious is that the data is clustered according to its classes, which proves:
● that clustering can, in certain cases, classify data.
● that the decomposition preserved the most important amount of information.
[Figures: 3D projection and 2D projection of the data, colored by class]
12
4- PCA in scikit-learn
[By Amina Delali]
With matplotlib
● It tells it to only center the data, and not to standardize it.
● It will drop all the axes with a variance ratio < minfrac. In this case, it will only keep 2 axes.
● Same results as in our previous implementation.
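For reference, a hedged sketch of this matplotlib usage; mlab.PCA was deprecated in matplotlib 2.2 and later removed, so it only runs on older versions, and both the dataset and the minfrac value below are placeholders:

```python
# Requires an old matplotlib version (< 3.1), where mlab.PCA still exists
from matplotlib.mlab import PCA as mlabPCA
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)        # assumed stand-in for the slides' dataset

# standardize=False: the data is only centered to the mean, not scaled by sigma
pca = mlabPCA(X, standardize=False)
print(pca.fracs)                         # fraction of the variance carried by each axis

# project() drops every axis whose variance fraction is below minfrac
X_projected = pca.project(X, minfrac=0.01)   # the threshold value is only an example
print(X_projected.shape)
```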
13
4- PCA in scikit-learn
[By Amina Delali]
With sklearn
● We have to select the number of components before transforming the data.
● Comparing with matplotlib, we see that the directions are inverted. The reason for this inversion is that sklearn flips the eigenvectors’ signs before the projection: it applies the svd_flip method on the matrices U and V in its fitting methods.
● As with matplotlib, we don’t have to center the data ourselves.
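A short scikit-learn sketch of this usage (the Iris dataset is assumed, as above):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)    # assumed stand-in for the slides' dataset

# The number of components is selected before transforming the data;
# centering is done internally, so the raw data can be passed directly.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)          # (150, 2)
print(pca.components_)     # the principal components (rows of V^T, sign-flipped by svd_flip)
```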
14
4- PCA in scikit-learn
[By Amina Delali]
Explained variance ratio
● The correct number of components can be defined by the explained variance ratio of each component.
● It is computed as the explained variance of the component divided by the sum of all the variances.
● The ratios of the components are summed up until a certain percentage is reached.
● The variances can be computed from the squares of the singular values in S.
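A scikit-learn sketch of this selection, again assuming the Iris data as a placeholder:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)               # assumed stand-in dataset

pca = PCA().fit(X)                              # keep all components
print(pca.explained_variance_ratio_)            # variance of each component / sum of all variances
print(pca.singular_values_**2 / (pca.singular_values_**2).sum())  # same ratios, from the squares of S

# Sum the ratios until a chosen percentage (e.g. 95%) is reached
d = np.argmax(np.cumsum(pca.explained_variance_ratio_) >= 0.95) + 1
print(d)

# scikit-learn can also take the target ratio directly
X_reduced = PCA(n_components=0.95).fit_transform(X)
print(X_reduced.shape)
```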
15
5- Manifold Learning: LLE
[By Amina Delali]
Algorithm
● LLE stands for Locally Linear Embedding. The algorithm consists of 3 major steps (a minimal NumPy sketch follows the steps below):
● Step 1 - identify the neighbors of each sample xᵢ from the data X(N,D) (for N samples and D features):
➢ Compute the distances of the other samples from xᵢ
➢ Select the k smallest distances.
● Step 2 - for each sample xᵢ, compute its neighbors’ weights:
➢ Create the matrix Z(k,D) with the k rows of X(N,D) corresponding to the neighbors of xᵢ
➢ Subtract the values of xᵢ from each row of Z(k,D)
➢ Compute C(k,k) = Z(k,D) · Zᵀ(D,k) (in the original page it is inverted, because X and Z are transposed there)
➢ Compute the row i of the matrix W(N,N) as follows:
➔ Compute the weights in the column vector w(k,1) that solves the equation C(k,k) · w(k,1) = 1(k,1) (1 is a column vector with only 1 as values)
16
5- Manifold Learning: LLE
[By Amina Delali]
Algorithm (continued)
➔ For the samples j that do not belong to the neighbors of xᵢ, set the weights to 0.
➔ For each neighbor b of xᵢ, set the weight to w(p)/sum(w(k,1)), where p is the index in w corresponding to the neighbor b of xᵢ.
● Step 3 – reduce the dimensionality to d < D in a new matrix Y(N,d):
➢ Compute the matrix M(N,N) = (I(N,N) – W(N,N))ᵀ · (I(N,N) – W(N,N))
➢ Select the d+1 eigenvectors of M(N,N) corresponding to the d+1 smallest eigenvalues. Order these eigenvectors according to the corresponding eigenvalues sorted in decreasing order.
➢ For each column q in Y, set the values equal to the values of the (q+1)-th smallest eigenvector, counting from the bottom (to discard the last eigenvector, which corresponds to the eigenvalue 0).
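As announced above, a minimal NumPy sketch of these three steps. The helper name lle and the regularization term are assumptions (the regularization keeps the local matrix C invertible when k > D); this is not scikit-learn's implementation:

```python
import numpy as np

def lle(X, k=12, d=2, reg=1e-3):
    """Minimal LLE sketch following the 3 steps above (illustrative only)."""
    N, D = X.shape

    # Step 1: the k nearest neighbours of each sample (excluding the sample itself)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = np.argsort(dists, axis=1)[:, 1:k + 1]

    # Step 2: reconstruction weights
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbors[i]] - X[i]              # (k, D): neighbours centred on x_i
        C = Z @ Z.T                              # (k, k)
        C += reg * np.trace(C) * np.eye(k)       # regularisation (needed when k > D)
        w = np.linalg.solve(C, np.ones(k))       # solve C . w = 1
        W[i, neighbors[i]] = w / w.sum()         # weights of non-neighbours stay 0

    # Step 3: embedding from the eigenvectors of M = (I - W)^T . (I - W)
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    # discard the eigenvector of the ~0 eigenvalue, keep the next d ones
    return eigvecs[:, 1:d + 1]

# Tiny usage example on random 3-D data
Y = lle(np.random.RandomState(0).rand(200, 3), k=12, d=2)
print(Y.shape)   # (200, 2)
```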
17
5- Manifold Learning: LLE
[By Amina Delali]
Example
● N = 1500, D = 3
● LLE: k = 12, d = 2
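A scikit-learn version with these parameters, assuming the usual swiss-roll dataset as the 3-D data (the slides' figures are not reproduced here):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, noise=0.0)         # N = 1500, D = 3
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)  # k = 12, d = 2
X_2d = lle.fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=color, cmap="viridis", s=5)
plt.title("Swiss roll unrolled by LLE (k=12, d=2)")
plt.show()
```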
18
6- Manifold Examples: MDS
[By Amina Delali]
Algorithm
● There are two types of Multidimensional Scaling: classical (or metric) MDS, which tries to reproduce the original distances, and non-metric MDS (NMDS), which tries to reproduce only the rank of the distances.
● We will describe the algorithm of the classical method using the Euclidean distance (a NumPy sketch follows the description below):
➢ Compute the distances between all points, and store them in a matrix D.
➢ Compute the matrix A as follows: A(i,j) = -1/2 · D(i,j)²
➢ Compute the matrix B as follows: B(i,j) = A(i,j) − A(i,.) − A(.,j) + A(.,.)
where: A(i,.) is the average of all A(i,j) for a selected i
A(.,j) is the average of all A(i,j) for a selected j
A(.,.) is the average of all values of A
➢ Find the p largest eigenvalues of B, λ1 > λ2 > ... > λp (p is the new dimension, smaller than the original one), and their corresponding normalized eigenvectors L1, L2, …, Lp, scaled so that Lᵢᵀ · Lᵢ = λᵢ
19
6- Manifold Examples: MDS
[By Amina Delali]
Algorithm (continued)
➢ Form the matrix L as follows: L = (L1, L2, …, Lp). The new values (coordinates) are the rows of L.
● This method minimizes the value of the stress.
● The stress is a measure that can be used to find the optimal lower dimension. It is computed as follows:
● stress = √( Σᵢ<ⱼ (D(i,j) − Δ(i,j))² / Σᵢ<ⱼ D(i,j)² )
● where Δ is the matrix of the distances between the rows of the new matrix L.
➢ A stress value < 0.05 is acceptable; below 0.01 it is considered to be good.
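As announced above, a minimal NumPy sketch of this classical MDS algorithm, including the stress computation (the helper name classical_mds and the random test data are illustrative assumptions):

```python
import numpy as np

def classical_mds(X, p=2):
    """Minimal sketch of classical (metric) MDS with Euclidean distances."""
    # D: matrix of pairwise Euclidean distances
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # A(i,j) = -1/2 * D(i,j)^2, then double centering to get B
    A = -0.5 * D ** 2
    B = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

    # p largest eigenvalues of B and their eigenvectors, scaled so that Li^T . Li = lambda_i
    eigvals, eigvecs = np.linalg.eigh(B)               # ascending order
    idx = np.argsort(eigvals)[::-1][:p]
    L = eigvecs[:, idx] * np.sqrt(eigvals[idx])        # the new coordinates are the rows of L

    # Stress between the original distances D and the new distances Delta
    delta = np.linalg.norm(L[:, None, :] - L[None, :, :], axis=-1)
    i, j = np.triu_indices(len(X), k=1)                # pairs with i < j
    stress = np.sqrt(((D[i, j] - delta[i, j]) ** 2).sum() / (D[i, j] ** 2).sum())
    return L, stress

# Usage on arbitrary 3-D data
L, stress = classical_mds(np.random.RandomState(0).rand(100, 3), p=2)
print(L.shape, stress)
```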
20
6- Manifold Examples: MDS
[By Amina Delali]
Example in Scikit-learn
● The results are completely different from the previous manifold technique. We see that here, the goal is to keep the original distance values unchanged as much as possible.
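A scikit-learn sketch of such an example; note that sklearn's MDS minimizes the stress with the SMACOF algorithm rather than the eigendecomposition described above, and the swiss-roll data here is an assumption (with fewer samples, since MDS is much slower than LLE):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS

X, color = make_swiss_roll(n_samples=500, noise=0.0)   # smaller N: MDS is slow on large datasets

mds = MDS(n_components=2)          # metric MDS (metric=True) is the default
X_2d = mds.fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=color, cmap="viridis", s=5)
plt.title("Swiss roll reduced to 2D with MDS (distances preserved)")
plt.show()
```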
References
● Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc, 2017.
● J. D. Hunter. Matplotlib: A 2D graphics environment. Computing In Science & Engineering, 9(3):90–95, 2007.
● NCSS Statistical Software. Multidimensional Scaling. NCSS, LLC edition.
● Scikit-learn.org. scikit-learn: machine learning in Python. On-line at https://scikit-learn.org/stable/. Accessed on 03-11-2018.
● Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data. O’Reilly Media, Inc, 2017.
● web.mit.edu. Singular Value Decomposition (SVD) tutorial. On-line at https://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm. Accessed on 28-12-2018.
● wikipedia.org. Wikipedia, the free encyclopedia. On-line at https://www.wikipedia.org/. Accessed on 25-12-2018.
Thank you!
FOR ALL YOUR TIME

PPTX
PDF
Principal Components Analysis, Calculation and Visualization
PPT
Understandig PCA and LDA
PPTX
PCA (Principal component analysis) Theory and Toolkits
PPTX
Introduction to Neural Netwoks
PDF
Tensor representations in signal processing and machine learning (tutorial ta...
PDF
Machine learning in science and industry — day 4
PDF
Designing a Minimum Distance classifier to Class Mean Classifier
Principal Components Analysis, Calculation and Visualization
Understandig PCA and LDA
PCA (Principal component analysis) Theory and Toolkits
Introduction to Neural Netwoks
Tensor representations in signal processing and machine learning (tutorial ta...
Machine learning in science and industry — day 4
Designing a Minimum Distance classifier to Class Mean Classifier

What's hot (15)

PDF
Implementing Minimum Error Rate Classifier
PPTX
Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)
PDF
505 260-266
PDF
Nonnegative Matrix Factorization
PPT
PPT
Machine Learning and Statistical Analysis
ODP
Svm V SVC
PDF
Implementation of K-Nearest Neighbor Algorithm
PDF
K-means and GMM
PDF
Performance evaluation of ds cdma
PPTX
Deep learning paper review ppt sourece -Direct clr
PDF
Implementing the Perceptron Algorithm for Finding the weights of a Linear Dis...
PDF
International Journal of Computational Engineering Research(IJCER)
PDF
Clustering tutorial
PPTX
Face Recognition using PCA-Principal Component Analysis using MATLAB
Implementing Minimum Error Rate Classifier
Analysis of Feature Selection Algorithms (Branch & Bound and Beam search)
505 260-266
Nonnegative Matrix Factorization
Machine Learning and Statistical Analysis
Svm V SVC
Implementation of K-Nearest Neighbor Algorithm
K-means and GMM
Performance evaluation of ds cdma
Deep learning paper review ppt sourece -Direct clr
Implementing the Perceptron Algorithm for Finding the weights of a Linear Dis...
International Journal of Computational Engineering Research(IJCER)
Clustering tutorial
Face Recognition using PCA-Principal Component Analysis using MATLAB
Ad

Similar to Aaa ped-17-Unsupervised Learning: Dimensionality reduction (20)

PPTX
machine learning.pptx
PDF
Lecture7 xing fei-fei
PPT
Machine Learning and Statistical Analysis
PPT
Machine Learning and Statistical Analysis
PPT
Machine Learning and Statistical Analysis
PPT
Machine Learning and Statistical Analysis
PPT
Machine Learning and Statistical Analysis
PPT
Machine Learning and Statistical Analysis
PDF
30thSep2014
PDF
Machine Learning Foundations for Professional Managers
PDF
Nonlinear dimension reduction
PPTX
ML unit2.pptx
PDF
Dimensionality Reduction
PDF
Survey on Feature Selection and Dimensionality Reduction Techniques
PPT
PDF
Dimensionality reduction
PPT
SVM (2).ppt
PDF
Machine learning (11)
PPT
Introduction to Support Vector Machine 221 CMU.ppt
PDF
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
machine learning.pptx
Lecture7 xing fei-fei
Machine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
30thSep2014
Machine Learning Foundations for Professional Managers
Nonlinear dimension reduction
ML unit2.pptx
Dimensionality Reduction
Survey on Feature Selection and Dimensionality Reduction Techniques
Dimensionality reduction
SVM (2).ppt
Machine learning (11)
Introduction to Support Vector Machine 221 CMU.ppt
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
Ad

More from AminaRepo (20)

PDF
Aaa ped-23-Artificial Neural Network: Keras and Tensorfow
PDF
Aaa ped-22-Artificial Neural Network: Introduction to ANN
PDF
Aaa ped-21-Recommender Systems: Content-based Filtering
PDF
Aaa ped-20-Recommender Systems: Model-based collaborative filtering
PDF
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
PDF
Aaa ped-18-Unsupervised Learning: Association Rule Learning
PDF
Aaa ped-16-Unsupervised Learning: clustering
PDF
Aaa ped-15-Ensemble Learning: Random Forests
PDF
Aaa ped-14-Ensemble Learning: About Ensemble Learning
PDF
Aaa ped-12-Supervised Learning: Support Vector Machines & Naive Bayes Classifer
PDF
Aaa ped-11-Supervised Learning: Multivariable Regressor & Classifers
PDF
Aaa ped-10-Supervised Learning: Introduction to Supervised Learning
PDF
Aaa ped-9-Data manipulation: Time Series & Geographical visualization
PDF
Aaa ped-Data-8- manipulation: Plotting and Visualization
PDF
Aaa ped-8- Data manipulation: Data wrangling, aggregation, and group operations
PDF
Aaa ped-6-Data manipulation: Data Files, and Data Cleaning & Preparation
PDF
Aaa ped-5-Data manipulation: Pandas
PDF
Aaa ped-4- Data manipulation: Numpy
PDF
Aaa ped-3. Pythond: advanced concepts
PDF
Aaa ped-2- Python: Basics
Aaa ped-23-Artificial Neural Network: Keras and Tensorfow
Aaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-20-Recommender Systems: Model-based collaborative filtering
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-18-Unsupervised Learning: Association Rule Learning
Aaa ped-16-Unsupervised Learning: clustering
Aaa ped-15-Ensemble Learning: Random Forests
Aaa ped-14-Ensemble Learning: About Ensemble Learning
Aaa ped-12-Supervised Learning: Support Vector Machines & Naive Bayes Classifer
Aaa ped-11-Supervised Learning: Multivariable Regressor & Classifers
Aaa ped-10-Supervised Learning: Introduction to Supervised Learning
Aaa ped-9-Data manipulation: Time Series & Geographical visualization
Aaa ped-Data-8- manipulation: Plotting and Visualization
Aaa ped-8- Data manipulation: Data wrangling, aggregation, and group operations
Aaa ped-6-Data manipulation: Data Files, and Data Cleaning & Preparation
Aaa ped-5-Data manipulation: Pandas
Aaa ped-4- Data manipulation: Numpy
Aaa ped-3. Pythond: advanced concepts
Aaa ped-2- Python: Basics

Recently uploaded (20)

PDF
An interstellar mission to test astrophysical black holes
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
2Systematics of Living Organisms t-.pptx
PDF
Sciences of Europe No 170 (2025)
PDF
Biophysics 2.pdffffffffffffffffffffffffff
PDF
Phytochemical Investigation of Miliusa longipes.pdf
PPT
protein biochemistry.ppt for university classes
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PPTX
Cell Membrane: Structure, Composition & Functions
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PDF
HPLC-PPT.docx high performance liquid chromatography
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
PDF
bbec55_b34400a7914c42429908233dbd381773.pdf
An interstellar mission to test astrophysical black holes
TOTAL hIP ARTHROPLASTY Presentation.pptx
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
2Systematics of Living Organisms t-.pptx
Sciences of Europe No 170 (2025)
Biophysics 2.pdffffffffffffffffffffffffff
Phytochemical Investigation of Miliusa longipes.pdf
protein biochemistry.ppt for university classes
Classification Systems_TAXONOMY_SCIENCE8.pptx
Cell Membrane: Structure, Composition & Functions
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
Comparative Structure of Integument in Vertebrates.pptx
ECG_Course_Presentation د.محمد صقران ppt
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
HPLC-PPT.docx high performance liquid chromatography
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
bbec55_b34400a7914c42429908233dbd381773.pdf

Aaa ped-17-Unsupervised Learning: Dimensionality reduction

  • 2. Plan ● 1- Dimensionality reduction ● 2- Some Math ● 3- PCA ● 4- PCA in scikit-learn ● 5- Manifold Learning ● 6- Manifold Examples
  • 3. 3 1-Dimensionality reduction [By Amina Delali] ObjectiveObjective ● Dimensionality reduction in machine learning is reducing the number of features of the training dataset. ● This reduction is necessary to: ➢ Eliminate the noise from the data ➢ Visualize the data in 2 or 3 dimensions ➢ Speed up the learning process ➢ Enhance the learning results by eliminating correlated features. ➢ Eliminate unnecessary features. ➢ Compress the data size. ● Two main approaches to dimensionality redcution are: ➢ Projection : project the data into a lower dimensional space. ➢ Manifold: suppose that the data in the higher dimension is just a manifold of a representation of the data in the lower dimension.
  • 4. 4 1-Dimensionality reduction [By Amina Delali] ProjectionProjection ● Sometimes the degree of the variation of the data is diferent from one dimension to an other. So, for some features, the values can be very diverse, an for others, they can barely change. ● So we project the data into a lower dimension in order to keep only the most infuential information ==> we defne a mapping between the original data from the higher dimension to new data in a lower dimension. ● The most used technique to defne this mapping, is PCA (Principal Component Analysis) and its variations: ➢ Incremental PCA ➢ Randomized PCA ➢ Kernel PCA
  • 5. 5 1-Dimensionality reduction [By Amina Delali] ManifoldManifold ● Like we said earlier, we make the hypothesis that our data is created from a manifold of a data in a lower dimension. So, reducing it to this low dimension is like straightening up this manifold (or unrolling it). ● The diferent techniques used, are: ➢ MDS: Multidimensional Scaling. Tries to preserve the distances between instances. ➢ LLE: Locally Linear Embedding. Tries to preserve the relationship between a sample and its closets points. ➢ Isomap: the samples will represent nodes of a graph. These nodes are connected to their closets neighbors. The algorithm tries to preserve the number of nodes in the shortest path connecting two nodes.
  • 6. 6 2-SomeMath [By Amina Delali] Singular value decompositionSingular value decomposition ● It is the the decomposition of a matrix M (m,n) into 3 matrices: U(m,m) , S(m,n) , and V(n,n) . Considering only real values, we have the following characteristics: ➢ M = U .S . VT ( VT is the transpose matrix of V : value at i,j becomes at j,i ) ➢ U . UT = UT . U = I(m,m) (the identity matrix) ➢ V . VT = VT . VT = I (n,n) ➢ The diagonal (values with the same row and column indices) of S are the Singular values of M ➔ Singular values are the square roots of eigenvalues ➔ The other values of S are zeros. ➢ The columns of U are the eigenvectors of M . MT . ➢ The columns of V are the eigenvectors of MT . M .
  • 7. 7 2-SomeMath [By Amina Delali] Eigenvectors, EigenvaluesEigenvectors, Eigenvalues ● Given A (n,n) a square matrix: ➔ If A . V(n) = . V(n) then: V is an eigenvector and is its corresponding eigenvalue. ➔ The above equation can be rewritten as follow: (A- I). V= 0 ➔ Several can solve the equation. For each lambda value, an eigenvector is computed. ● Example: ➢ If ● Its eigenvalues will be: 1 , 3 ● And their corresponding eigenvectors will be: and λλ λ λλ λ λ λ A=[2 1 1 2] [ 1 −1] [1 1]
  • 8. 8 2-SomeMath [By Amina Delali] Standard DeviationStandard Deviation ● The standard deviation measures how data is spread (or distant from the mean) . It is the square root of the variance. ● The variance is computed as follow: ➢ ● And the standard deviation: ● To project data on new axis, we select the axis that preserve the maximum possible variance of the data. This way, most of the information is preserved. variance= ∑ i=1 N (xi−μ)2 N σ σ=√variance
  • 9. 9 3-PCAinscikit-learn [By Amina Delali] DefnitionDefnition ● It is a linear dimensionality reduction technique that project data using orthogonal axes (components) that preserve the maximum variance possible. One of the method used is singular value decomposition of the mean centered training data. ● As stated before the decomposition leads to 3 matrices. The vectors of the matrix VT will be used to project the data. They are the “principal components”. ● Each component will conserve a certain amount of variance. The variance obtained after projection is the accumulation of the variances obtained by each component ● To project, we select a sufcient number of component to preserve the maximum of variance, then we apply the transformation (the projection), using only this number of vectors. ● The number of vectors will determine the dimension of the projection.
  • 10. 10 3-PCAinscikit-learn [By Amina Delali] ExampleExample ● Center the data to the mean, before applying the decomposition The decomposition To project, we multiply the centered data by the first selected component==> we will have a 3 dimensions projection
  • 11. 11 3-PCAinscikit-learn [By Amina Delali] ResultsResults ● Since our data was originally labeled (we don’t use those label for decomposition), we used them for colorizing the data. And what is obvious, is that the data is clustered according to its classes. Which proofs: ● that the clustering can in certain cases classify data. ● the decomposition preserved the most important amount of information. 3D projection 2D projection
  • 12. 12 4-ProcessingData [By Amina Delali] With matplotlibWith matplotlib ● It tells to only center the data, and to not standardize It will drop all the axis with variance ratio < minfrac. In this case, it will only keep 2 axis. Same results as in our previous implementation
  • 13. 13 4-ProcessingData [By Amina Delali] With sklearnWith sklearn ● We have to select the number of components before transforming the data Comparing with matplotlib we see that the directions are inverted The reason of this inversion is that sklearn flip the eigenvector’s sign before the projection : it apply the method svd_flip on the vectors U and V in the fitting methods As in matplotlib, we don’t have to center the data
  • 14. 14 4-ProcessingData [By Amina Delali] Explained variance ratioExplained variance ratio ● The correct number of components can be defned by the explained variance ratio of each component. ● It is computed by the value of explained variance divided by the sum of all variances. ● The ratio of each component are summed up until a certain percentage is obtained. ● The variances can be computed from the square of the singular values in S
  • 15. 15 5-ManifoldLearning: LLE [By Amina Delali] AlgorithmAlgorithm ● LLE for Locally Linear Embeeding. The algorithm consist of 3 major steps: ● Step 1 - identifying the neighbors for each sample xi from the data X(N,D) (for N samples and D features) : ➢ Compute the distances of the other samples from xi ➢ Select the k smallest distances. ● Step 2 - for each sample xi compute its neighbors weights: ➢ Create the matrix Z(k,D) with the k samples rows from X(N,D) corresponding to the neighbors of xi ➢ Subtract xi values from each row of Z(k,D) ➢ Compute C(k,k) = Z(k,D) . ZT (D,k) ( in the original page it is inverted because of X and Z are transposed) ➢ Compute the row i of the matrix W(N,N) with: ➔ Compute the weights in the one column vector w(k,1) that solve the equation C(k,k) . w(k,1) = 1(k,1) (1 is a column vector with only 1 as values)
  • 16. 16 5-ManifoldLearning: LLE [By Amina Delali] Algorithm (Suite)Algorithm (Suite) ➔ For the samples j that do not belong to each xi, neighbors, set the weights to 0. ➔ For each neighbor b of xi set the weight to: w(p) /sum(w(k,1) ). Where p is the indices in w corresponding to the b neighbor of xi. ● Step 3 – reduce the dimensionality to d < D in a new matrix Y(N,d) : ➢ Compute the matrix M(N,N) = ( I(N,N) – W(N,N) )T . (I(N,N) – W(N,N) ) ➢ Select the d+1 eigenvectors of M(N,N) corresponding to the d+1 smallest eigenvalues. Order these eigenvectors according to the corresponding eigenvalues sorted in a decreasing order. ➢ For each column q in Y set the values equal to the values of the q+1 smallest eigenvector counting from the bottom (to discard the last eigenvector corresponding to the eigenvalue 0)
  • 18. 18 6-PolynomialRegression [By Amina Delali] AlgorithmAlgorithm ● There are two types of Multidimensional Scaling: classical (or metric) that tries to reproduce the original distances. The second one is non-metric (NMDS) that tries to reproduces only the rank of the distances. ● We will describe the algorithm of the classical method using the euclidean distance: ➢ Compute the distances between all points, and form a matrix of those distances in a matrix D. ➢ Compute the matrix A as follow: A(i,j) = -1/2 * D(i,j)2 ➢ Compute the matrix B as follow: B(i,j)= A(i,j)- A(i,.) - A(.,j) +A(.,.) where: A(i,.) is the average of all A(i,j) for a selected i A(.,j) is the average of all A(.,j) for a selected j A(.,.) is the average of all values of A ➢ Find the p (the new dimension, lesser than the original dimension ) largest eigenvalues of B: and their corresponding normalized eigenvectors L1 ,L2 , …, L p so that Li T . Li = λ1>λ2>...>λp λi
  • 19. 19 6-PolynomialRegression [By Amina Delali] Algorithm (suite)Algorithm (suite) ➢ Form the matrix L as follow: L = (L1 , L2 , …, Lp ). The new values (coordinates) are the rows of L. ● This method minimizes the value of the Stress ● The stress is a measure that can be used to fnd the optimal lower dimension. It is computed as follow: ● stress = ● where: is the matrix of the distances of the new matrix L ➢ A stress with a value < 0.05 is acceptable, below 0.01 is considered to be good. √ ∑ i< j (D(i, j)−Δ(i, j))2 ∑ i< j D(i , j) 2 Δ
  • 20. 20 6-PolynomialRegression [By Amina Delali] Example in Scikit-learnExample in Scikit-learn ● The results are completely different from the previous manifold technique. We see here, the goal is to keep the same original distances values as much as possible.
  • 21. References ● Aurélien Géron. Hands-on machine learning with Scikit-Learn and Tensor-Flow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media, Inc, 2017. ● J. D. Hunter. Matplotlib: A 2d graphics environment. Computing In Science & Engineering, 9(3):90–95, 2007. ● NCSS Statistical Software. Multidimensional Scaling, ncss, llc edition. ● Scikit-learn.org. scikit-learn, machine learning in python. On-line at https://scikit-learn.org/stable/. Accessed on 03-11-2018. ● Jake VanderPlas. Python data science handbook: essential tools for working with data. O’Reilly Media, Inc, 2017. ● web.mit.edu. Singular value decomposition (svd) tutorial. On-line at https://web.mit.edu/be.400/www/SVD/Singular_Value_Decompositio n.htm. Accessed on 28-12-2018. ● wikipedia.org. Wikipedia, the free encyclopedia. On-line at https://www.wikipedia.org/. Accessed on 25-12-2018.