Machine Learning : Feature Engineering
Dr.M.Pyingkodi
Dept of MCA
Kongu Engineering College
Erode, Tamilnadu, India
Basics of Feature Engineering(FE)
• the process of translating a data set into features such that these features represent
the data set more effectively and result in better learning performance.
• refers to manipulation (addition, deletion, combination, mutation) of the data set to improve
machine learning model training, leading to better performance and greater accuracy.
• a part of the data preparation activities
• It is responsible for taking raw input data and converting it into well-aligned features which are
ready to be used by the machine learning models.
• FE encapsulates various data engineering techniques such as selecting relevant features, handling
missing data, encoding the data, and normalizing it.
• It has two major elements
❖ feature transformation
❖ feature subset selection
Feature
• A feature is an attribute of a data set that is used in a machine learning process.
• selection of the subset of features which are meaningful for machine learning is a sub-area of
feature engineering.
• The features in a data set are also called its dimensions.
• a data set having ‘n’ features is called an n-dimensional data set.
Example: a data set in which ‘Species’ is the class variable and the remaining attributes are predictor
variables; with five attributes in total, it is a 5-dimensional data set.
Feature transformation
Feature transformation applies a mathematical formula to a particular column (feature) and transforms the
values into a form that is more useful for further analysis.
It creates new features from existing features, which may help in improving the model performance.
It transforms a set of features (m) into a new feature set (n) while retaining as much information as possible.
It has two major elements:
1. feature construction
2. feature extraction
Both are sometimes known as feature discovery
There are two distinct goals of feature transformation:
1. Achieving best reconstruction of the original features in the data set
2. Achieving highest efficiency in the learning task
Feature Construction
Involves transforming a given set of input features to generate a new set of more powerful features.
The data set has three features –
apartment length,
apartment breadth, and
price of the apartment.
If it is used as an input to a regression problem, such data can be training data for the regression model.
So given the training data, the model should be able to predict the price of an apartment whose price is not
known or which has just come up for sale.
However, instead of using the length and breadth of the apartment as predictors, it is more convenient and
makes more sense to use the area of the apartment, which is not an existing feature of the data set.
So such a feature, namely apartment area, can be added to the data set.
In other words, we transform the three-dimensional data set to a four-dimensional data set,
with the newly ‘discovered’ feature apartment area being added to the original data set.
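As a minimal sketch of this construction (assuming a pandas DataFrame with hypothetical columns length, breadth, and price), the new feature can be derived directly from the existing ones:

```python
import pandas as pd

# Hypothetical apartment data: length and breadth in metres, price in lakhs
df = pd.DataFrame({
    "length": [10.0, 12.5, 9.0],
    "breadth": [8.0, 7.0, 6.5],
    "price": [55.0, 62.0, 41.5],
})

# Feature construction: derive 'area' from the existing length and breadth
df["area"] = df["length"] * df["breadth"]

print(df)  # the data set is now four-dimensional (length, breadth, price, area)
```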
Feature Construction
• when features have categorical value and machine learning needs numeric value inputs
• when features have numeric (continuous) values that need to be converted to ordinal values
• when text-specific feature construction needs to be done
Ordinal data are discrete integers that can be ranked or sorted
Encoding categorical (nominal) variables
FIG. 4.3 Feature construction (encoding nominal variables)
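A small sketch of encoding a nominal variable (the city column below is hypothetical); pandas get_dummies performs the one-hot expansion into dummy features:

```python
import pandas as pd

# Hypothetical data set with a nominal (categorical) feature
df = pd.DataFrame({"city": ["Chennai", "Erode", "Chennai", "Coimbatore"]})

# Feature construction: one dummy (0/1) feature per category
encoded = pd.get_dummies(df, columns=["city"])
print(encoded)
```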
Encoding categorical (ordinal) variables
• The grade is an ordinal variable with values A, B, C, and D.
• To transform this variable to a numeric variable, we can create a feature num_grade mapping a numeric
value against each ordinal value.
• The ordinal values are mapped to values 1, 2, 3, and 4 in the transformed variable.
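A minimal sketch of this mapping (assuming a hypothetical grade column and using the A→1 … D→4 order implied above):

```python
import pandas as pd

df = pd.DataFrame({"grade": ["A", "C", "B", "D", "A"]})

# Explicit mapping preserves the order of the ordinal values
grade_map = {"A": 1, "B": 2, "C": 3, "D": 4}
df["num_grade"] = df["grade"].map(grade_map)

print(df)
```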
Transforming numeric (continuous) features to categorical features
• Consider real estate price category prediction, which is a classification problem.
• In that case, we can ‘bin’ the numerical data into multiple categories based on the data range.
• In the context of the real estate price prediction example, the original data set has a numerical
feature apartment_price.
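A sketch of such binning with pandas; the bin edges and category labels below are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"apartment_price": [35.0, 48.5, 72.0, 95.0, 120.0]})

# Bin the continuous price into categories based on the data range
df["price_category"] = pd.cut(
    df["apartment_price"],
    bins=[0, 50, 100, float("inf")],
    labels=["low", "medium", "high"],
)

print(df)
```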
Text-specific feature construction
Text is arguably the most predominant medium of communication.
Text mining is an important area of research because of the unstructured nature of the data.
All machine learning models need numerical data as input.
So the text data in the data sets needs to be transformed into numerical features.
EX:
In Facebook, micro-blogging channels like Twitter, emails, or short messaging services such as
WhatsApp, text plays a major role in the flow of information.
Vectorization:
Turning text into vectors/arrays, i.e. into integer (or boolean, or floating-point) vectors.
Vectors are lists with n positions.
Vectorization
I want to turn my text into data.
One-hot encoding for “I want to turn my text into data”
One-hot encoding for “I want my data”.
One-hot encoding only treats values as “present” and “not present”.
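A minimal sketch of this idea using scikit-learn's CountVectorizer with binary=True, so each position only records present/not present, applied to the two example sentences:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I want to turn my text into data", "I want my data"]

# binary=True gives one-hot style 'present / not present' values per token
vectorizer = CountVectorizer(binary=True, token_pattern=r"\b\w+\b")
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
print(X.toarray())  # one row per sentence, one column per token
```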
Three major steps
1. Tokenize
In order to tokenize a corpus, the blank spaces and punctuations are used as delimiters to separate
out the words, or tokens
A corpus is a collection of authentic text or audio organized into datasets.
2. Count
Then the number of occurrences of each token is counted, for each document.
3. Normalize
Tokens are weighted with reducing importance when they occur in the majority of the documents.
A matrix is then formed with each token representing a column and a specific document of the
corpus representing each row.
Each cell contains the count of occurrence of the token in a specific document.
This matrix is known as a document-term matrix / term-document matrix.
A typical document-term matrix forms an input to a machine learning model.
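These three steps map onto scikit-learn's text vectorizers: CountVectorizer tokenizes and counts, and TfidfTransformer applies the normalization that down-weights tokens occurring in most documents. A small sketch with a hypothetical three-document corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]

# Steps 1 & 2: tokenize on whitespace/punctuation and count occurrences per document
counts = CountVectorizer().fit_transform(corpus)

# Step 3: normalize, reducing the weight of tokens that appear in most documents
dtm = TfidfTransformer().fit_transform(counts)

print(dtm.toarray())  # rows = documents, columns = tokens (document-term matrix)
```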
Feature Extraction
New features are created from a combination of original features.
Operators for combining the original features include
1. For Boolean features:
Conjunctions, Disjunctions, Negation, etc.
2. For nominal features:
Cartesian product, M of N, etc.
3. For numerical features:
Min, Max, Addition, Subtraction, Multiplication, Division, Average, Equivalence, Inequality, etc.
Feature Extraction
Principal Component Analysis
PCA is a dimensionality-reduction method.
A data set has multiple attributes or dimensions, many of which might have similarity with each other.
EX: If the height is more, generally the weight is more, and vice versa.
In PCA, a new set of features is extracted from the original features; these new features are quite dissimilar in nature.
PCA transforms a large set of variables into a smaller one.
It reduces the number of variables of a data set, while preserving as much information as possible.
Objective:
1. The new features are distinct, i.e. the covariance between the new features (the principal components) is 0.
2. The principal components are generated in order of the variability in the data that they capture. Hence, the
first principal component should capture the maximum variability, the second principal component should
capture the next highest variability, and so on.
3. The sum of variance of the new features or the principal components should be equal to the sum of
variance of the original features.
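A brief sketch that checks these objectives with scikit-learn's PCA (the random data below is purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # illustrative 5-dimensional data

pca = PCA(n_components=5)
Z = pca.fit_transform(X)                 # the principal components

# Objective 1: components are uncorrelated (off-diagonal covariances ~ 0)
print(np.round(np.cov(Z, rowvar=False), 3))
# Objective 2: variance captured is in decreasing order
print(pca.explained_variance_)
# Objective 3: total variance is preserved
print(np.isclose(pca.explained_variance_.sum(), np.cov(X, rowvar=False).trace()))
```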
Principal Component Analysis
converts the observations of correlated features into a set of linearly uncorrelated features with the
help of orthogonal transformation
These new transformed features are called the Principal Components.
PCA retains the important variables and drops the least important ones.
Examples:
image processing, movie recommendation system, optimizing the power allocation in various
communication channels.
Eigenvectors are the principal components of the data set.
Eigenvectors and eigenvalues exist in pairs: every eigenvector has a corresponding eigenvalue.
An eigenvector gives the direction of the line (vertical, horizontal, 45 degrees, etc.).
An eigenvalue is a number telling you how much variance there is in the data in that direction,
i.e. how spread out the data is along that line.
PCA algorithm Terms
Dimensionality
It is the number of features or variables present in the given dataset.
Correlation
It signifies how strongly two variables are related to each other: if one changes, the other
variable also changes. The correlation value ranges from -1 to +1.
Here, -1 occurs if variables are inversely proportional to each other, and +1 indicates that variables are
directly proportional to each other.
Orthogonal
It means that variables are not correlated with each other, and hence the correlation between a pair of
variables is zero.
Eigenvectors
Column vectors; the eigenvectors of the covariance matrix give the directions of the principal components.
Eigenvalues
An eigenvalue can be referred to as the strength of the transformation in the direction of its eigenvector.
Covariance Matrix
A matrix containing the covariance between each pair of variables is called the covariance matrix.
Steps for PCA algorithm
1. Standardizing the data
Without standardization, variables with larger ranges will dominate over those with small ranges (for example,
a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1).
2. Covariance Matrix Calculation
The covariance matrix shows how the variables of the input data set vary from the mean with respect to each other.
If a covariance is positive: the two variables increase or decrease together (correlated).
If a covariance is negative: one increases when the other decreases (inversely correlated).
Steps for PCA algorithm
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal
components.
Principal components are new variables that are constructed as linear combinations or mixtures of
the initial variables.
These combinations are done in such a way that the new variables (i.e., principal components) are
uncorrelated and most of the information within the initial variables is squeezed or compressed
into the first components.
principal components represent the directions of the data that explain a maximal amount of
variance
4. The eigenvector having the next highest eigenvalue represents the direction in which the data has
the highest remaining variance and is also orthogonal to the first direction. This helps in
identifying the second principal component.
5. Like this, identify the top ‘k’ eigenvectors having top ‘k’ eigenvalues so as to get the ‘k’ principal
components.
The eigenvectors of the covariance matrix are actually the directions of the axes where there is the
most variance (most information), and these directions are what we call the principal components.
The eigenvalues are simply the coefficients attached to the eigenvectors, and they give the amount of
variance carried in each principal component.
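A compact NumPy sketch of these steps: standardize, compute the covariance matrix, take its eigendecomposition, and project onto the top k eigenvectors (the data and the choice of k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))           # illustrative data: 100 instances, 4 features
k = 2                                   # number of principal components to keep

# Step 1: standardize the data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix
cov = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric

# Steps 4-5: sort by decreasing eigenvalue and keep the top k eigenvectors
order = np.argsort(eigvals)[::-1]
top_k = eigvecs[:, order[:k]]

# Project the data onto the k principal components
X_reduced = X_std @ top_k
print(X_reduced.shape)                  # (100, 2)
```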
Singular Value Decomposition
• The singular value decomposition (SVD) of a matrix is a factorization of that matrix into three matrices.
• It is widely used in analysing linear transformations.
• SVD of a matrix A (m × n) is a factorization of the form A = U Σ V^T:
• two orthogonal (orthonormal) matrices U and V and a rectangular diagonal matrix Σ, where
• U is an m × m unitary matrix,
• V is an n × n unitary matrix, and
• Σ is an m × n rectangular diagonal matrix.
• The diagonal entries of Σ are known as the singular values of matrix A.
• The columns of U and V are called the left-singular and right-singular vectors of matrix A.
• The singular values are the square roots of the eigenvalues of A^T A.
SVD of a data matrix: properties
1. Patterns in the attributes are captured by the right-singular vectors, i.e. the columns of V.
2. Patterns among the instances are captured by the left-singular vectors, i.e. the columns of U.
3. The larger a singular value, the larger the part of the matrix A that it and its associated vectors account for.
4. A new data matrix with k attributes is obtained using the equation D' = D × [v1, v2, …, vk].
Thus, the dimensionality gets reduced to k.
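A brief NumPy sketch of the factorization and of the dimensionality reduction in property 4 (the data matrix and k below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
D = rng.normal(size=(50, 6))            # illustrative data matrix: 50 instances, 6 attributes
k = 2

# SVD: D = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Columns of V (rows of Vt) are the right-singular vectors v1, v2, ...
V_k = Vt[:k].T                          # keep the top-k right-singular vectors

# New data matrix D' = D x [v1, v2, ..., vk]  -> dimensionality reduced to k
D_reduced = D @ V_k
print(D_reduced.shape)                  # (50, 2)
```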
Linear Discriminant Analysis
The objective of PCA is to capture the data set variability; PCA calculates the eigenvalues of the
covariance matrix of the data set.
LDA, in contrast, focuses on class separability.
It reduces the number of features to a more manageable number before classification.
It is commonly used for supervised classification problems.
It separates the features based on class separability so as to avoid over-fitting of the machine
learning model.
LDA calculates eigenvalues and eigenvectors of the intra-class and inter-class scatter matrices.
Linear Discriminant Analysis
1. Calculate the mean vectors for the individual classes.
2. Calculate intra-class and inter-class scatter matrices.
3. Calculate eigenvalues and eigenvectors for S_W and S_B, where S_W is the intra-class scatter matrix and S_B
is the inter-class scatter matrix:
S_W = Σ_i Σ_{x in class i} (x − m_i)(x − m_i)^T and S_B = Σ_i N_i (m_i − m)(m_i − m)^T,
where m_i is the sample mean (mean vector) of each class, m is the overall mean of the data set,
and N_i is the sample size of each class.
4. Identify the top ‘k’ eigenvectors having the top ‘k’ eigenvalues.
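A short sketch of LDA as a supervised dimensionality-reduction step using scikit-learn (the synthetic two-class data and the choice of one component are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two illustrative classes with different means
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
               rng.normal(2.0, 1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

# LDA projects the data onto directions that maximize class separability
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)   # (100, 1): features reduced before classification
```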