302049: Artificial Intelligence &
Machine Learning
UNIT 2
FEATURE EXTRACTION AND SELECTION
Feature
● A feature is defined as a function of the basic measurement variables or attributes that specifies some quantifiable property of an object and is useful for classification and/or pattern recognition.
● Obtaining a good data representation is a very domain-specific task, and it is closely related to the available measurements.
The following are typical examples.
● A model for predicting the risk of cardiac disease may have features such as age, gender, weight, whether the person smokes, whether the person suffers from diabetes, etc.
● A model for predicting whether a person is suitable for a job may have features such as educational qualification, number of years of experience, experience working in the field, etc.
● A model for predicting the size of a shirt for a person may have features such as age, gender, etc.
Feature extraction
● Feature extraction is a technique used to reduce a large input data set to a smaller set of relevant features. It relies on dimensionality reduction to transform large input data into smaller, meaningful groups for processing.
● Feature extraction is a process that extracts a set of new features from the original features through some functional mapping.
● Assuming there are n features or attributes A1, A2, ..., An, after feature extraction we have another set of m new features B1, B2, ..., Bm (typically m < n), each defined as a function of the original attributes.
Feature extraction - Example
(Approach 1 and Approach 2 were shown as worked figures in the original slides.)
● The technique of extracting features is useful when you have a large data set and need to reduce the resources required for processing without losing any important or relevant information.
● Feature extraction helps to reduce the amount of redundant data in the data set.
● In the end, the reduction of the data helps to build the model with less computational effort and also increases the speed of the learning and generalization steps in the machine learning process.
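To make the idea of a functional mapping concrete, the short Python sketch below (not part of the original slides; the data and the choice of summary statistics are illustrative assumptions) reduces 100 raw measurements per object to 4 extracted features.

import numpy as np

# Minimal sketch: map each raw 100-sample signal (100 "attributes")
# to 4 extracted features via simple functional mappings.
rng = np.random.default_rng(0)
raw_signals = rng.normal(size=(5, 100))      # 5 objects, 100 raw measurements each

def extract_features(signal):
    """Functional mapping from raw measurements to a smaller feature set."""
    return np.array([signal.mean(),          # average level
                     signal.std(),           # variability
                     signal.min(),           # lowest value
                     signal.max()])          # highest value

features = np.vstack([extract_features(s) for s in raw_signals])
print(features.shape)                        # (5, 4): 100 raw attributes reduced to 4 features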
Feature vector and Feature space
● A feature vector is an n-dimensional vector of numerical features that represents some object.
● Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis.
● When representing images, the feature values might correspond to the pixels of the image, while when representing texts the features might be the frequencies of occurrence of textual terms.
● The vector space associated with these vectors is often called the feature space. The feature space is the n-dimensional space in which your feature variables live.
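As an illustrative sketch of the text example above (assuming scikit-learn's CountVectorizer; the documents are made up), each text below becomes a term-frequency feature vector, and the vocabulary terms define the axes of the feature space.

from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical documents; each is mapped to one feature vector of term frequencies.
docs = ["the cat sat on the mat",
        "the dog chased the cat"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)           # sparse matrix of term counts

print(vectorizer.get_feature_names_out())    # the dimensions (axes) of the feature space
print(X.toarray())                           # one n-dimensional feature vector per document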
Feature construction
● Feature construction is a process that discovers missing information about the relationships between features and augments the space of features by inferring or creating additional features.
● Assuming there are n features A1, A2, ..., An, after feature construction we may have m additional features An+1, An+2, ..., An+m. All newly constructed features are defined in terms of the original features, so no inherently new information is added through feature construction.
● Feature construction attempts to increase the expressive power of the original features. Usually the dimensionality of the new feature set is expanded and is bigger than that of the original feature set. Intuitively, there can be many combinations of the original features, and not all combinations are necessary or useful.
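A minimal sketch of feature construction, assuming scikit-learn's PolynomialFeatures and hypothetical values for two original features A1 and A2: the constructed features A1², A1·A2 and A2² are defined purely in terms of the originals, expanding the feature set from 2 to 5 dimensions.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Original features A1, A2 for four samples (hypothetical values).
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 5.0],
              [4.0, 7.0]])

# Construct additional features as combinations of the originals:
# A1^2, A1*A2, A2^2 are appended to A1, A2.
constructor = PolynomialFeatures(degree=2, include_bias=False)
X_new = constructor.fit_transform(X)

print(X_new.shape)   # (4, 5): expanded feature set, defined purely from A1 and A2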
Characteristics of a good feature
● Features must be found in most of the data samples: a good feature applies across different types of data samples and is not limited to just one sample, e.g. the shape of an apple.
● Features must be unique and should not be found prevalent in other (different) forms: a feature should be characteristic of the object and should not apply equally to other objects, e.g. the hardness of sugarcane.
● Features in reality: there can be attributes that are accidental in nature and are not features at all when the whole population is considered, e.g. the presence of a kite in the picture of a tree.
Underfitting, Overfitting, and Optimum fitting of a model
Underfitting is a scenario in data science where a model is unable to capture the relationship between the input and output variables accurately, generating a high error rate on both the training set and unseen data, because the model is not complex enough to describe the underlying data. In this example, the data has a second-order relation but the model is a linear model, so it will not be able to perform well.
Overfitting
● Overfitting is the opposite case: the model is too complex (a higher-order model) and captures even the noise in the data.
● An overfit model has essentially memorized the data set it has seen and is unable to generalize what it has learned to an unseen data set.
● Therefore, in this case one would observe a very low training error; however, the model would fail to generalise to both the validation and test sets.
Optimum fitting
An optimally fitted model is complex enough to capture the underlying relationship in the data but not so complex that it also models the noise, so it achieves low error on both the training set and unseen data. Plotted against model complexity, the training error keeps decreasing as complexity grows, while the validation error first decreases and then rises again once the model starts to overfit.
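The sketch below (illustrative only, using scikit-learn and synthetic second-order data) fits polynomial models of increasing degree to the same data; comparing training and test errors shows the underfitting, near-optimum, and overfitting regimes discussed above.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with a second-order relation, as in the underfitting example.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 60)).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + rng.normal(scale=0.5, size=60)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 2, 15):          # underfit, roughly optimum, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    test_err = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")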
Principal Component Analysis (PCA)
● Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning.
● It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation.
● These new transformed features are called the principal components.
● PCA is a technique for reducing the dimensionality of high-dimensional datasets, increasing interpretability while at the same time minimizing information loss.
● PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data.
● So, to sum up, the idea of PCA is simple: reduce the number of variables of a data set while preserving as much information as possible.
Some common terms used in the PCA algorithm
● Dimensionality: the number of features or variables present in the given dataset; more simply, the number of columns in the dataset.
● Correlation: signifies how strongly two variables are related to each other, i.e. if one changes, the other variable also changes. The correlation value ranges from -1 to +1: -1 occurs when the variables are inversely proportional to each other, and +1 indicates that the variables are directly proportional to each other.
● Orthogonal: indicates that the variables are not correlated with each other, and hence the correlation between the pair of variables is zero.
● Eigenvectors: given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a scalar multiple of v, i.e. Mv = λv for some scalar λ.
● Covariance matrix: a matrix containing the covariances between every pair of variables is called the covariance matrix.
The number of these principal components (PCs) is less than or equal to the number of original features present in the dataset.
Some properties of these principal components are given below:
● Each principal component must be a linear combination of the original features.
● The components are orthogonal, i.e. the correlation between any pair of components is zero.
● The importance of the components decreases when going from 1 to n: the 1st PC has the most importance (it captures the most variance) and the nth PC has the least.
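A minimal PCA sketch, assuming scikit-learn's PCA class and a made-up dataset of three correlated features: the data are projected onto two principal components, and the explained variance ratio shows the importance decreasing from PC1 to PC2.

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 100 samples, 3 features, two of which are strongly correlated.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a,
                     2 * a + rng.normal(scale=0.1, size=100),
                     rng.normal(size=100)])

# PCA: centre the data, then project onto the directions (eigenvectors of the
# covariance matrix) that preserve the most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (100, 2): 3 features reduced to 2 PCs
print(pca.explained_variance_ratio_)   # importance decreases from PC1 to PC2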
