Machine Learning:LEC-2 (Unit 1)
Course Code: PCCAIML-502
Prof D. Chakraborty (PhD, J.U.)
Dept: Computer Sc. & Engg.
Asansol Engineering College, WB
FEATURE ENGINEERING
• Feature engineering is the pre-processing step of machine learning that
transforms raw data into features suitable for building a predictive model
with machine learning or statistical modelling.
• Since around 2016, automated feature engineering has also appeared in
machine learning software, helping to extract features from raw data
automatically. Feature engineering in ML mainly comprises four processes:
Feature Creation, Transformations, Feature Extraction, and Feature Selection.
• Feature engineering in machine learning aims to improve the performance of
models. In this topic, we will cover feature engineering in machine learning
in detail. But first, let's understand what a feature is and why feature
engineering is needed.
FEATURE ENGINEERING
• What is a feature?
• Generally, machine learning algorithms take input data to generate
output. The input data is usually in tabular form, consisting of rows
(instances or observations) and columns (variables or attributes), and these
attributes are often called features. For example, in computer vision an
image is an instance, and a line in the image could be a feature.
Similarly, in NLP a document can be an observation, and a word count
could be a feature. So a feature is an attribute that affects a
problem or is useful for solving it.
EXAMPLE OF DATASET
IMAGE DATASET
CONTD..
FEATURE ENGINEERING
• Feature Creation: Feature creation is finding the most useful
variables to use in a predictive model. The process is subjective,
and it requires human creativity and intervention. New features
are created by combining existing features through operations such as
addition, subtraction, and ratios, which gives great flexibility.
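Feature creation as described above can be sketched in a few lines of Python. The record fields and derived features here (BMI, debt-to-income) are illustrative examples of ratio features, not from the slides:

```python
# A minimal sketch of feature creation: deriving new variables by
# combining existing ones with arithmetic (addition, subtraction, ratios).
records = [
    {"weight_kg": 70, "height_m": 1.75, "income": 50000, "debt": 10000},
    {"weight_kg": 60, "height_m": 1.60, "income": 40000, "debt": 20000},
]

for r in records:
    # Ratio feature: body-mass index from weight and height
    r["bmi"] = r["weight_kg"] / r["height_m"] ** 2
    # Ratio feature: debt relative to income
    r["debt_to_income"] = r["debt"] / r["income"]

print(records[0]["bmi"])  # ≈ 22.86
```

Which combinations are worth creating is the subjective, human-driven part; the arithmetic itself is trivial.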
• Transformations: The transformation step of feature engineering
involves adjusting the predictor variables to improve the accuracy
and performance of the model. For example, it ensures that the
model can flexibly take a variety of input data and that
all the variables are on the same scale, making the model easier to
understand. It improves the model's accuracy and keeps all
the features within an acceptable range to avoid
computational errors.
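A common transformation that puts all variables on the same scale is standardization (zero mean, unit variance). A minimal sketch, using illustrative age data:

```python
# Standardize a feature: subtract the mean, divide by the standard deviation,
# so every variable ends up on the same scale.
def standardize(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]
scaled = standardize(ages)
print(scaled)  # centered on 0, with unit spread
```

In practice a library routine (e.g. a standard scaler fitted on training data only) would be used, but the arithmetic is exactly this.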
CONTD..
• Feature Extraction: Feature extraction is an automated feature engineering process
that generates new variables by extracting them from the raw data. The main aim of
this step is to reduce the volume of data so that it can be easily used and managed
for data modelling. Feature extraction methods include cluster analysis, text
analytics, edge detection algorithms, and principal component analysis (PCA).
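One of the extraction methods named above, PCA, can be sketched with NumPy alone: project two correlated variables onto their first principal component, reducing the data volume while keeping most of the variance. The synthetic data here is illustrative:

```python
import numpy as np

# PCA sketch: two correlated columns -> one extracted feature.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=100)])

Xc = X - X.mean(axis=0)                  # center the data
cov = np.cov(Xc, rowvar=False)           # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                     # direction of maximum variance
reduced = Xc @ pc1                       # 100 values instead of 100x2
```

Because the second column is almost a multiple of the first, the first component captures nearly all the variance, so one extracted feature stands in for two raw ones.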
• Feature Selection: While developing a machine learning model, only a few
variables in the dataset are useful for building it; the remaining features are
either redundant or irrelevant. Feeding the model all of these redundant and
irrelevant features may reduce its overall performance and
accuracy. Hence it is very important to identify and select the most
appropriate features from the data and remove the irrelevant or less important
ones, which is done with the help of feature selection in machine
learning: "Feature selection is a way of selecting the subset of the most relevant
features from the original feature set by removing the redundant, irrelevant, or
noisy features."
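A simple filter-style selection illustrates the idea: drop features whose variance falls below a threshold, since near-constant columns carry little information. The column names and threshold are illustrative:

```python
# Minimal variance-threshold feature selection.
def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

columns = {
    "age":       [21, 35, 48, 52, 29],
    "height_cm": [170, 172, 171, 170, 171],  # nearly constant
    "all_same":  [1, 1, 1, 1, 1],            # carries no information
}

selected = [name for name, col in columns.items() if variance(col) > 1.0]
print(selected)  # ['age']
```

Real selection methods also consider relevance to the target and redundancy between features, but the shape of the procedure (score each feature, keep a subset) is the same.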
Machine Learning Paradigms
• Machine learning is commonly separated into three main
learning paradigms: supervised learning, unsupervised
learning, and reinforcement learning.
1. Supervised Learning
Supervised learning is the most common learning paradigm. In
supervised learning, the computer learns from a set of input-output
pairs, which are called labeled examples.
CONTD..
CONTD..
• Our goal is to predict the weight of an animal from its other
characteristics, so we rewrite this dataset as a set of input-output
pairs:
CONTD..
• The input variables (here, age and gender) are generally
called features, and the set of features representing an
example is called a feature vector. From this dataset, we can
learn a predictor in a supervised way.
Predicted output: 3.65 kg
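Learning a predictor from labeled pairs can be sketched with a least-squares line fit. The (age, weight) pairs below are a toy dataset, not the one on the slides:

```python
# Supervised learning sketch: fit a least-squares line to (input, output)
# pairs, then use it as a predictor for new inputs.
pairs = [(1, 2.0), (2, 3.1), (3, 3.9), (4, 5.2)]  # (age, weight_kg)

n = len(pairs)
sx = sum(a for a, _ in pairs)
sy = sum(w for _, w in pairs)
sxy = sum(a * w for a, w in pairs)
sxx = sum(a * a for a, _ in pairs)

slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def predict(age):
    return slope * age + intercept

print(round(predict(2.5), 2))  # 3.55
```

The fitted line is the "predictor learned in a supervised way": it generalizes from the labeled examples to inputs it has never seen.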
SUPERVISED LEARNING
CONTD..
• Unsupervised Learning
• Unsupervised learning is the second most used learning
paradigm. It is not used as much as supervised learning, but it
unlocks different types of applications. In unsupervised
learning, the data is not split into inputs and outputs; it is just
a set of examples:
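A typical unsupervised task on such a set of examples is clustering. A bare-bones 1-D k-means sketch, with toy data and naive initialization:

```python
# Unsupervised learning sketch: group unlabeled examples into two clusters
# with a minimal k-means loop (1-D data for brevity).
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]

centers = [data[0], data[3]]  # naive initialization from two examples
for _ in range(10):
    groups = [[], []]
    for x in data:
        # Assign each example to its nearest center
        idx = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
        groups[idx].append(x)
    # Move each center to the mean of its assigned examples
    centers = [sum(g) / len(g) for g in groups]

print(sorted(round(c, 2) for c in centers))  # [1.0, 8.07]
```

No labels are ever supplied; the structure (two well-separated groups) is discovered from the examples alone.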
UNSUPERVISED LEARNING
REINFORCEMENT LEARNING
• The third classic learning paradigm is called reinforcement learning,
which is a way for autonomous agents to learn. Reinforcement learning is
fundamentally different from supervised and unsupervised learning in the
sense that the data is not provided as a fixed set of examples. Rather, the
data to learn from is obtained by interacting with an external system
called the environment. The name "reinforcement learning" originates
from behavioral psychology, but it could just as well be called "interactive
learning."
• Reinforcement learning is often used to teach agents, such as robots, to
learn a given task. The agent learns by taking actions in the environment
and receiving observations from this environment:
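This act-then-observe loop can be sketched with tabular Q-learning on a tiny corridor environment (states 0..4, reward only at the right end). The environment and parameters are illustrative, not from the slides:

```python
import random

# Reinforcement learning sketch: the agent is not given a fixed dataset;
# it generates experience by acting in the environment and observing
# the resulting state and reward.
N_STATES, ACTIONS = 5, [-1, +1]          # actions: move left or right
alpha, gamma, eps = 0.5, 0.9, 0.1        # learning rate, discount, exploration
Q = [[1.0, 1.0] for _ in range(N_STATES)]  # optimistic init drives exploration

random.seed(0)
for _ in range(500):                     # episodes of interaction
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action choice
        a = random.randrange(2) if random.random() < eps \
            else max((0, 1), key=lambda i: Q[s][i])
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update from the observed transition (s, a, r, s2)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The greedy policy learned from interaction (1 = move right)
policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(policy)
```

Note that no (input, output) pairs exist here: every update uses a transition the agent itself produced by acting, which is exactly what separates this paradigm from the previous two.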
Reinforcement learning