302049: Artificial Intelligence &
Machine Learning
UNIT 2
FEATURE EXTRACTION AND SELECTION
Feature
● A feature is defined as a function of the basic measurement variables or attributes that specifies some quantifiable property of an object and is useful for classification and/or pattern recognition.
● Obtaining a good data representation is a very domain-specific task, and it is closely related to the available measurements.
The following are typical examples.
● A model for predicting the risk of cardiac disease may have features such as age, gender, weight, whether the person smokes, whether the person suffers from diabetes, etc.
● A model for predicting whether a person is suitable for a job may have features such as educational qualification, number of years of experience, experience working in the field, etc.
● A model for predicting the size of a shirt for a person may have features such as age, gender, etc.
Feature extraction
● Feature extraction is a technique used to reduce a large input data set to a smaller set of relevant features. It relies on dimensionality reduction to transform large input data into smaller, meaningful groups for processing.
● Feature extraction is a process that extracts a set of new features from the original features through some functional mapping.
● Assuming there are n features or attributes A1, A2, ..., An, after feature extraction we have another set of m new features B1, B2, ..., Bm (typically m < n), each defined as a function of the original attributes.
Feature extraction - Example
(Approach 1 and Approach 2 were shown as worked figures in the original slides.)
● The technique of extracting features is useful when you have a large data set and need to reduce the resources required for processing without losing any important or relevant information.
● Feature extraction helps to reduce the amount of redundant data in the data set.
● In the end, the reduction of the data helps to build the model with less computational effort and also increases the speed of the learning and generalization steps in the machine learning process.
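To make the idea of a functional mapping concrete, the short Python sketch below (not part of the original slides; the data and the choice of summary statistics are illustrative assumptions) reduces 100 raw measurements per object to 4 extracted features.

import numpy as np

# Minimal sketch: map each raw 100-sample signal (100 "attributes")
# to 4 extracted features via simple functional mappings.
rng = np.random.default_rng(0)
raw_signals = rng.normal(size=(5, 100))      # 5 objects, 100 raw measurements each

def extract_features(signal):
    """Functional mapping from raw measurements to a smaller feature set."""
    return np.array([signal.mean(),          # average level
                     signal.std(),           # variability
                     signal.min(),           # lowest value
                     signal.max()])          # highest value

features = np.vstack([extract_features(s) for s in raw_signals])
print(features.shape)                        # (5, 4): 100 raw attributes reduced to 4 features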
Feature vector and Feature space
● A feature vector is an n-dimensional vector of numerical features that represents some object.
● Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis.
● When representing images, the feature values might correspond to the pixels of the image, while when representing texts the features might be the frequencies of occurrence of textual terms.
● The vector space associated with these vectors is often called the feature space. The feature space is the n-dimensional space in which your feature variables live.
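As an illustrative sketch of the text example above (assuming scikit-learn's CountVectorizer; the documents are made up), each text below becomes a term-frequency feature vector, and the vocabulary terms define the axes of the feature space.

from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical documents; each is mapped to one feature vector of term frequencies.
docs = ["the cat sat on the mat",
        "the dog chased the cat"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)           # sparse matrix of term counts

print(vectorizer.get_feature_names_out())    # the dimensions (axes) of the feature space
print(X.toarray())                           # one n-dimensional feature vector per document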
Feature construction
● Feature construction is a process that discovers missing information about the relationships between features and augments the space of features by inferring or creating additional features.
● Assuming there are n features A1, A2, ..., An, after feature construction we may have m additional features An+1, An+2, ..., An+m. All newly constructed features are defined in terms of the original features, so no inherently new information is added through feature construction.
● Feature construction attempts to increase the expressive power of the original features. Usually the dimensionality of the new feature set is expanded and is bigger than that of the original feature set. Intuitively, there can be many combinations of the original features, and not all combinations are necessary or useful.
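A minimal sketch of feature construction, assuming scikit-learn's PolynomialFeatures and hypothetical values for two original features A1 and A2: the constructed features A1², A1·A2 and A2² are defined purely in terms of the originals, expanding the feature set from 2 to 5 dimensions.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Original features A1, A2 for four samples (hypothetical values).
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 5.0],
              [4.0, 7.0]])

# Construct additional features as combinations of the originals:
# A1^2, A1*A2, A2^2 are appended to A1, A2.
constructor = PolynomialFeatures(degree=2, include_bias=False)
X_new = constructor.fit_transform(X)

print(X_new.shape)   # (4, 5): expanded feature set, defined purely from A1 and A2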
Characteristics of a good feature
● Features must be found in most of the data samples: a good feature applies across different types of data samples and is not limited to just one sample, e.g. the shape of an apple.
● Features must be unique and should not be found prevalent in other (different) forms: a feature should be characteristic of the object and should not apply equally to other objects, e.g. the hardness of sugarcane.
● Features in reality: there can be attributes that are accidental in nature and are not features at all when the whole population is considered, e.g. the presence of a kite in the picture of a tree.
Underfitting, Overfitting, and Optimum fitting of a model
Underfitting is a scenario in data science where a model is unable to capture the relationship between the input and output variables accurately, generating a high error rate on both the training set and unseen data, because the model is not complex enough to describe the underlying data. In this example, the data has a second-order relation but the model is a linear model, so it will not be able to perform well.
Overfitting
● Overfitting is the opposite case: the model is too complex (a higher-order model) and captures even the noise in the data.
● An overfit model has essentially memorized the data set it has seen and is unable to generalize what it has learned to an unseen data set.
● Therefore, in this case one would observe a very low training error; however, the model would fail to generalise to both the validation and test sets.
Optimum fitting
An optimally fitted model is complex enough to capture the underlying relationship in the data but not so complex that it also models the noise, so it achieves low error on both the training set and unseen data. Plotted against model complexity, the training error keeps decreasing as complexity grows, while the validation error first decreases and then rises again once the model starts to overfit.
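The sketch below (illustrative only, using scikit-learn and synthetic second-order data) fits polynomial models of increasing degree to the same data; comparing training and test errors shows the underfitting, near-optimum, and overfitting regimes discussed above.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with a second-order relation, as in the underfitting example.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 60)).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + rng.normal(scale=0.5, size=60)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 2, 15):          # underfit, roughly optimum, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    test_err = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")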
Principal Component Analysis (PCA)
● Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning.
● It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation.
● These new transformed features are called the principal components.
● PCA is a technique for reducing the dimensionality of high-dimensional datasets, increasing interpretability while at the same time minimizing information loss.
● PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data.
● So, to sum up, the idea of PCA is simple: reduce the number of variables of a data set while preserving as much information as possible.
Some common terms used in the PCA algorithm
● Dimensionality: the number of features or variables present in the given dataset; more simply, the number of columns in the dataset.
● Correlation: signifies how strongly two variables are related to each other, i.e. if one changes, the other variable also changes. The correlation value ranges from -1 to +1: -1 occurs when the variables are inversely proportional to each other, and +1 indicates that the variables are directly proportional to each other.
● Orthogonal: indicates that the variables are not correlated with each other, and hence the correlation between the pair of variables is zero.
● Eigenvectors: given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a scalar multiple of v, i.e. Mv = λv for some scalar λ.
● Covariance matrix: a matrix containing the covariances between every pair of variables is called the covariance matrix.
The number of these principal components (PCs) is less than or equal to the number of original features present in the dataset.
Some properties of these principal components are given below:
● Each principal component must be a linear combination of the original features.
● The components are orthogonal, i.e. the correlation between any pair of components is zero.
● The importance of the components decreases when going from 1 to n: the 1st PC has the most importance (it captures the most variance) and the nth PC has the least.
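A minimal PCA sketch, assuming scikit-learn's PCA class and a made-up dataset of three correlated features: the data are projected onto two principal components, and the explained variance ratio shows the importance decreasing from PC1 to PC2.

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 100 samples, 3 features, two of which are strongly correlated.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a,
                     2 * a + rng.normal(scale=0.1, size=100),
                     rng.normal(size=100)])

# PCA: centre the data, then project onto the directions (eigenvectors of the
# covariance matrix) that preserve the most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (100, 2): 3 features reduced to 2 PCs
print(pca.explained_variance_ratio_)   # importance decreases from PC1 to PC2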
