Module 3
Advanced Feature Engineering and Feature Selection
Introduction to Feature Engineering
Feature engineering is the process of using domain knowledge to select and transform the most relevant variables in raw data into features that better represent the underlying problem, thereby improving a model’s accuracy.
Feature Engineering
● Feature Transformation
▪ Missing value imputation
▪ Handling categorical features
▪ Outlier detection
▪ Feature scaling
● Feature Construction
● Feature Selection
● Feature Extraction
Missing Value Imputation
Handling Categorical Features
Outlier Detection
Interquartile range (IQR) = Upper Quartile – Lower Quartile = Q3 – Q1
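Values falling more than 1.5 × IQR below Q1 or above Q3 are conventionally flagged as outliers. A minimal sketch with numpy; the sample values and the 1.5 × IQR fences are assumptions for illustration:

```python
import numpy as np

values = np.array([10, 12, 14, 13, 11, 15, 95, 12, 13, 14])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1                                   # interquartile range = Q3 - Q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # conventional outlier fences

outliers = values[(values < lower) | (values > upper)]
print(outliers)  # values outside the fences are flagged as outliers
```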
Feature Scaling
Why do we need feature scaling?
Feature scaling techniques fall into two categories: Standardization and Normalization.
Standardization (Z-score normalization)
Assume our dataset contains random numeric values in the range 1 to 95,000, in random order. For illustration, consider a small dataset of barely 10 values drawn from this range in randomized order.
Looking at these values, their range is so large that training a model with 10,000 such values would take a lot of time.
Standardization solves this problem by:
● Rescaling the values to a common scale, centred at 0 with unit standard deviation.
● Keeping the relative spacing between the values intact.
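A minimal sketch of standardization, z = (x − μ) / σ, using scikit-learn's StandardScaler; the sample values are assumptions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1], [250], [3_000], [12_500], [95_000]], dtype=float)

scaler = StandardScaler()
X_std = scaler.fit_transform(X)   # result has mean 0 and standard deviation 1
print(X_std.ravel().round(2))
```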
Normalization
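Normalization commonly refers to min-max scaling, x' = (x − min) / (max − min), which maps values into the range [0, 1]. A minimal sketch with scikit-learn's MinMaxScaler; the sample values are assumptions:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1], [250], [3_000], [12_500], [95_000]], dtype=float)

X_norm = MinMaxScaler().fit_transform(X)  # smallest value -> 0, largest -> 1
print(X_norm.ravel().round(3))
```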
Feature Selection Techniques
Feature selection is a crucial step in the machine learning pipeline, involving
the selection of a subset of relevant features (variables, predictors) for use in
model construction. Effective feature selection can improve model
performance, reduce overfitting, and decrease training time.
The role of feature selection in machine learning is to:
1. Reduce the dimensionality of the feature space.
2. Speed up a learning algorithm.
3. Improve the predictive accuracy of a classification algorithm.
There are several techniques for feature selection:
Filter Methods
▪ In Filter Method, features are selected on the basis of statistics measures.
▪ This method does not depend on the learning algorithm and chooses the features as a pre-
processing step.
▪ These methods are faster and less computationally expensive than wrapper methods.
▪ When dealing with high-dimensional data, it is computationally cheaper to use filter
methods.
▪ Very good for removing duplicated, correlated, redundant features but these methods do
not remove multicollinearity.
Information Gain
Information gain is defined as the amount of information a feature provides for identifying the target value; it measures the reduction in entropy obtained by splitting on that feature. The information gain of each attribute is calculated with respect to the target variable, and the highest-scoring features are selected.
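A minimal sketch using scikit-learn's mutual_info_classif, which estimates the closely related mutual information between each feature and the target; the dataset is an assumption:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True, as_frame=True)

# Estimate how much information each feature carries about the target.
scores = mutual_info_classif(X, y, random_state=0)
ranking = pd.Series(scores, index=X.columns).sort_values(ascending=False)
print(ranking)  # higher score = more informative feature
```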
Chi-square Test
The chi-square test is a technique for determining the relationship between categorical variables. The chi-square value is calculated between each feature and the target variable, and the desired number of features with the best chi-square values is selected.
Chi-square Test Example
Steps:
1. Define the null and alternative hypotheses:
Null hypothesis: there is no significant association between the two categorical variables.
Alternative hypothesis: there is a significant association between the two categorical variables.
2. Calculate the contingency table.
3. Calculate the expected values.
4. Calculate the chi-square value: χ² = Σ (Observed − Expected)² / Expected.
5. Compare the chi-square value with the critical value to accept or reject the null hypothesis.
Degrees of freedom = (r − 1)(c − 1)
Significance level = 0.05
Therefore, income level is a relevant feature for predicting subscription status.
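A minimal sketch of the steps above using scipy; the contingency table values are made up for illustration, since the slide's actual table is not reproduced here:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = income level, columns = subscribed yes/no.
observed = np.array([[30, 10],
                     [20, 25],
                     [10, 35]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.4f}")

# Reject the null hypothesis at the 0.05 significance level if p < 0.05,
# i.e. income level and subscription status are associated.
print("Relevant feature" if p_value < 0.05 else "Not relevant")
```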
Fisher’s Score
Fisher score is one of the most widely used supervised feature selection methods. The algorithm returns the ranks of the variables based on their Fisher scores in descending order.
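A minimal sketch of computing Fisher scores by hand with numpy, assuming the common definition of the score for a feature as its between-class scatter divided by its within-class scatter; the dataset is an assumption:

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
overall_mean = X.mean(axis=0)

numerator = np.zeros(X.shape[1])    # between-class scatter per feature
denominator = np.zeros(X.shape[1])  # within-class scatter per feature
for cls in np.unique(y):
    Xc = X[y == cls]
    n_c = Xc.shape[0]
    numerator += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
    denominator += n_c * Xc.var(axis=0)

fisher_scores = numerator / denominator
print(np.argsort(fisher_scores)[::-1])  # feature indices ranked by descending score
```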
Missing Value Ratio
The missing value ratio can be used to evaluate each feature against a threshold value. It is computed as the number of missing values in a column divided by the total number of observations. Variables whose missing value ratio exceeds the threshold can be dropped.
Missing Value Ratio:
1. Calculate the missing value ratio for each feature by dividing the number of missing values by the total number of instances in the dataset.
2. Set a threshold for the acceptable missing value ratio (e.g., 0.8, meaning a feature may have at most 80% of its values missing to be kept).
3. Filter out features whose missing value ratio is above the threshold.
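A minimal sketch of this filter with pandas; the toy DataFrame and the 0.8 threshold are assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, np.nan, 40, np.nan, 31],
    "income": [50_000, 60_000, np.nan, np.nan, np.nan],
    "city":   ["A", "B", "A", "C", "B"],
})

threshold = 0.8
missing_ratio = df.isna().mean()                        # fraction of missing values per column
keep = missing_ratio[missing_ratio <= threshold].index  # columns at or below the threshold
df_filtered = df[keep]
print(missing_ratio, df_filtered.columns.tolist(), sep="\n")
```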
Advanced Feature Selection
Wrapper Methods
Wrapper methods, also referred to as greedy algorithms, train the learning algorithm on a subset of features in an iterative manner.
Based on the conclusions drawn from the previously trained model, features are added to or removed from the subset.
Stopping criteria for selecting the best subset are usually pre-defined by the person training the model, such as when the performance of the model decreases or when a specific number of features has been reached.
The main advantage of wrapper methods over filter methods is that they provide an optimal set of features for training the model, resulting in better accuracy than filter methods, but they are computationally more expensive.
Forward selection
Forward selection is an iterative process that begins with an empty set of features. In each iteration it adds a feature and evaluates the model to check whether performance improves. The process continues until adding a new variable/feature no longer improves the performance of the model.
Backward elimination
Backward elimination is also an iterative approach, but it is the opposite of forward selection. The technique begins by considering all the features and removes the least significant feature in each iteration. This elimination process continues until removing features no longer improves the performance of the model.
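A minimal sketch of both forward selection and backward elimination using scikit-learn's SequentialFeatureSelector; the dataset, estimator, and the target of 5 features are assumptions:

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Forward selection: start from an empty set and add the most helpful feature each round.
forward = SequentialFeatureSelector(estimator, n_features_to_select=5,
                                    direction="forward", cv=5)
forward.fit(X, y)

# Backward elimination: start from all features and drop the least useful each round.
backward = SequentialFeatureSelector(estimator, n_features_to_select=5,
                                     direction="backward", cv=5)
backward.fit(X, y)

print("forward: ", forward.get_support(indices=True))
print("backward:", backward.get_support(indices=True))
```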
Recursive Feature Elimination
Recursive feature elimination is a recursive greedy optimization approach in which features are selected by recursively considering smaller and smaller subsets of features. An estimator is trained on each set of features, and the importance of each feature is determined via the coef_ attribute or the feature_importances_ attribute.
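A minimal sketch of RFE with scikit-learn; the dataset, estimator, and the target of 5 features are assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # helps the linear model converge

# RFE repeatedly fits the estimator and drops the weakest features
# (smallest coef_ / feature_importances_) until 5 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5, step=1)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of selected features
print(rfe.ranking_)   # 1 = selected; larger values were eliminated earlier
```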
Exhaustive Feature Selection
Exhaustive feature selection is one of the most thorough feature selection methods: it evaluates every feature set by brute force. The method tries each possible combination of features and returns the best-performing feature set.
How Exhaustive Feature Selection Works
1. Generate all possible feature subsets: for a dataset with n features, this means evaluating 2^n subsets (including the empty set).
2. Evaluate each subset: train and evaluate a model using each subset of features. The evaluation metric could be accuracy, precision, recall, F1 score, etc.
3. Select the best subset: identify the subset of features that provides the best performance according to the chosen evaluation metric.
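A minimal brute-force sketch of these three steps using itertools; the dataset, estimator, and scoring are assumptions (libraries such as mlxtend also offer a ready-made exhaustive selector):

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -np.inf, None
# Try every non-empty combination of features (2^n - 1 subsets).
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, subset], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(best_subset, round(best_score, 3))  # best-performing feature subset
```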
Embedded Methods
1. Regularization
This method adds a penalty on the parameters of the machine learning model to avoid overfitting.
▪ Lasso Regression (L1 Regularization): Adds an L1 penalty (the absolute value of
the magnitude of coefficients) to the loss function. This can shrink some coefficients
to zero, effectively performing feature selection.
▪ Ridge Regression (L2 Regularization): Adds an L2 penalty (the square of the
magnitude of coefficients) to the loss function. While it does not perform feature
selection by shrinking coefficients to zero, it helps in reducing overfitting and
improving model generalization.
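A minimal sketch of embedded selection via the L1 penalty; the dataset and the alpha value are assumptions:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)  # L1 penalties assume comparable feature scales

# The L1 penalty shrinks some coefficients exactly to zero,
# so the non-zero coefficients identify the selected features.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print(lasso.coef_.round(2))
print("selected feature indices:", selected)
```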
2. Tree-based methods
Decision Trees:
Decision Trees split the data into subsets based on the value of input features, and
the splits that provide the best separation (based on criteria like Gini impurity or
information gain) indicate the most important features.
The depth of the tree and the features selected for splits at various levels provide
insights into feature importance.
Random Forests:
Random Forests are ensembles of decision trees. They provide feature importance
by averaging the importance measures of each feature across all the trees.
Feature importance in Random Forests is typically calculated from the decrease in impurity (e.g., Gini impurity) attributable to each feature, averaged across all trees.
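A minimal sketch of impurity-based importance from a Random Forest; the dataset and hyperparameters are assumptions:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ averages each feature's impurity decrease over all trees.
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```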
Automated Feature Engineering
Automated feature engineering aims to simplify and speed up the process of creating features from raw data by leveraging algorithms and tools. This approach reduces manual effort and can uncover complex patterns and interactions that might otherwise be missed.
Benefits of Automated Feature Engineering
● Speed: Quickly generates and evaluates a large number of features.
● Complexity Handling: Captures complex interactions and transformations that might be
difficult to manually specify.
● Consistency: Applies feature engineering techniques uniformly across different datasets and
tasks.
● Performance: Often improves model performance by discovering useful features that
enhance predictive power.
EvalML: an AutoML library to automate Feature Engineering
EvalML is an open-source Python library designed to automate and streamline the machine learning workflow, with a particular focus on end-to-end model development.
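A minimal sketch based on EvalML's documented AutoMLSearch usage; the demo dataset, the split, and the binary problem type are assumptions, and the automated preprocessing/feature engineering happens inside the pipelines that the search evaluates:

```python
import evalml
from evalml.automl import AutoMLSearch

# Load a demo dataset and split it (problem_type is required by EvalML).
X, y = evalml.demos.load_breast_cancer()
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(
    X, y, problem_type="binary")

# Search over candidate pipelines, each with its own preprocessing steps.
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type="binary")
automl.search()

print(automl.best_pipeline)  # best pipeline found, including its preprocessing components
```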
Feature Engineering for Specific Data Types
1. Numerical Data
▪ Feature Scaling
▪ Power Transformations
2. Categorical Data
▪ One-hot Encoding
▪ Label Encoding
▪ Target Encoding
3. Text Data
▪ Bag of Words (BoW)
▪ TF-IDF (Term Frequency-Inverse Document Frequency)
▪ Word Embeddings
4. Time-Series Data
▪ Lag Features
▪ Fourier Transforms
▪ Time-Based Features
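A minimal sketch of a few of the type-specific transformations listed above (one-hot encoding, lag and time-based features, and TF-IDF); the toy DataFrame, column names, and corpus are assumptions:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.DataFrame({
    "date":  pd.date_range("2024-01-01", periods=5, freq="D"),
    "sales": [100, 120, 130, 125, 140],
    "city":  ["A", "B", "A", "C", "B"],
})

# Categorical data: one-hot encoding.
df = pd.get_dummies(df, columns=["city"], prefix="city")

# Time-series data: lag and time-based features.
df["sales_lag_1"] = df["sales"].shift(1)
df["day_of_week"] = df["date"].dt.dayofweek

# Text data: TF-IDF on a tiny corpus.
corpus = ["feature engineering is fun", "feature selection reduces features"]
tfidf = TfidfVectorizer().fit_transform(corpus)

print(df, tfidf.toarray().round(2), sep="\n\n")
```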