Feature
Engineering
AGENDA
• Feature engineering
• Feature selection
• Dealing with categorical data
Feature Scaling
Why Should We Use Feature Scaling?
• A dataset may have multiple features spanning varying degrees of magnitude, range, and units. This is a
significant obstacle, as some machine learning algorithms are highly sensitive to these differences in scale.
Feature Scaling
• Normalization
• Standardization
Normalization: Min-Max scaling
• Normalization is a scaling technique in which values are shifted and
rescaled so that they end up ranging between 0 and 1. It is also
known as Min-Max scaling.
• Here’s the formula for normalization:
• X' = (X - X_min) / (X_max - X_min)
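A minimal Python sketch of Min-Max scaling, assuming scikit-learn is available; the small example array is made up for illustration:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative data: two features with very different ranges
X = np.array([[100.0, 0.2],
              [250.0, 0.5],
              [400.0, 0.9]])

scaler = MinMaxScaler()           # rescales each feature to the [0, 1] range
X_norm = scaler.fit_transform(X)  # X' = (X - X_min) / (X_max - X_min), per column
print(X_norm)                     # first column becomes [0. , 0.5, 1. ]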
Standardization – Z score normalization
• Standardization is another scaling technique where the values are
centered around the mean with a unit standard deviation.
• This means that the mean of the attribute becomes zero and the
resultant distribution has a unit standard deviation (equals 1).
• Here’s the formula for standardization:
• X' = (X - mean) / std, where mean and std are the mean and standard deviation of the feature.
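A matching sketch of standardization with scikit-learn's StandardScaler, reusing the same illustrative array:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[100.0, 0.2],
              [250.0, 0.5],
              [400.0, 0.9]])

scaler = StandardScaler()        # centers each feature at mean 0 with unit standard deviation
X_std = scaler.fit_transform(X)  # X' = (X - mean) / std, per column
print(X_std.mean(axis=0))        # approximately [0, 0]
print(X_std.std(axis=0))         # approximately [1, 1]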
Standardization (Z-score Normalization)
Feature Selection
Creating Features
• “Good” features are the key to accurate generalization
• Domain knowledge can be used to generate a feature set
  - Medical example: results of blood tests, age, smoking history
  - Game-playing example: number of pieces on the board, control of the center of the board
• Data might not be in vector form
  - Example: spam classification
  - “Bag of words”: throw out order, keep a count of how many times each word appears
  - Sequence: one feature for the first letter in the email, one for the second letter, etc.
  - N-grams: one feature for every unique string of n characters (or n words)
What is feature selection?
Reducing the feature space by throwing out some of
the features
Feature Selection
Without feature selection:
• Increased complexity of the model, making it harder to interpret.
• Increased time complexity for training the model.
• A weaker model with inaccurate or less reliable predictions.
With feature selection, we look for the smallest set of features, which results in:
• Training a machine learning algorithm faster.
• Reducing the complexity of a model and making it easier to interpret.
• Building a sensible model with better prediction power.
• Reducing over-fitting by selecting the right set of features.
Reasons for Feature Selection
• We want to find which features are relevant
  - A domain specialist may not be sure which factors are predictive of disease
  - Common practice: throw in every feature you can think of, and let feature selection get rid of the useless ones
• We want to maximize accuracy by removing irrelevant and noisy features
  - For spam, we could create a feature for each of ~10^5 English words
  - Training with all features is computationally expensive
  - Irrelevant features hurt generalization
• Features have associated costs; we want to optimize accuracy with the least expensive features
  - Embedded systems with limited resources
  - Voice recognition on a cell phone
  - Branch prediction in a CPU (4K code limit)
Terminology
Univariate method: considers one variable (feature) at a time
Multivariate method: considers subsets of variables (features) together
Filter method: ranks features or feature subsets independently of the
predictor (classifier)
Wrapper method: uses a classifier to assess features or feature subsets
Types of Feature Selection:
• Filter Methods
• Wrapper Methods
• Embedded Methods
Feature Selection Methods
Filter: All Features → Filter (Score) → Selected Features → Supervised Learning Algorithm → Classifier
Wrapper: All Features → Search ↔ Feature Evaluation Criterion (Feature Subset / Criterion Value) → Selected Features → Supervised Learning Algorithm → Classifier
Filter method: these methods evaluate the intrinsic characteristics of features independently of the model. Common filter techniques:
• Constant removal (Variance Threshold)
• Correlation-based
• Chi-Square Test (for categorical features)
• ANOVA (Analysis of Variance)
• Information Gain
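As a hedged illustration of one of these filters, scikit-learn's SelectKBest can rank features with a univariate score such as the chi-square test (chi2 requires non-negative features); the built-in iris dataset is used here purely for demonstration:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)             # 4 non-negative numeric features

selector = SelectKBest(score_func=chi2, k=2)  # keep the 2 highest-scoring features
X_selected = selector.fit_transform(X, y)

print(selector.scores_)   # chi-square score per feature
print(X_selected.shape)   # (150, 2)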
Constant removal: the goal of constant removal is to identify and eliminate features that exhibit no variation or have constant values across all data points in a dataset. The steps are:
1. Calculate the variance or standard deviation for each feature
2. Set a threshold for the variance
3. Remove features below the threshold
1) Calculate variance or standard deviation for each feature
The variance of a set of data points measures how far each data point is from the mean (average) of the data. A very low variance indicates that a feature's values are relatively constant across different instances in the dataset. Features with zero variance (or very low variance) are considered constant.
2) Set a threshold
Define a threshold value for the variance; features with variance below this threshold are flagged for removal. Considerations for choosing an appropriate threshold:
1. Impact on model performance
2. Domain knowledge
3. Balance between information loss and noise reduction
4. Dataset size
3) Remove constant features
Eliminate the identified constant features from the dataset. The remaining features are considered more informative and are retained for further analysis or modeling.
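A small sketch of these three steps using scikit-learn's VarianceThreshold; the toy matrix and the 0.01 threshold are assumptions for illustration:

import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Column 0 is constant, column 1 is nearly constant, column 2 varies
X = np.array([[1.0, 5.00, 10.0],
              [1.0, 5.01, 20.0],
              [1.0, 5.00, 30.0]])

selector = VarianceThreshold(threshold=0.01)  # drop features with variance <= 0.01
X_reduced = selector.fit_transform(X)

print(selector.variances_)  # variance of each original feature
print(X_reduced)            # only the third column survives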
Filter method benefits:
• Improved computational efficiency
• Improved model performance
• Faster training times
• Reduced noise in the dataset
• Reduced overfitting
Recursive Feature Elimination (RFE)
Recursive Feature Elimination algorithm:
1. Rank the importance of all features using the chosen machine learning algorithm.
2. Eliminate the least important feature.
3. Build a model using the remaining features.
4. Repeat steps 1-3 until the desired number of features is reached.
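A hedged sketch of this loop using scikit-learn's RFE wrapper around a logistic regression estimator; the choice of estimator, dataset, and the target of 2 features are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

estimator = LogisticRegression(max_iter=1000)
rfe = RFE(estimator=estimator, n_features_to_select=2, step=1)  # eliminate 1 feature per iteration
rfe.fit(X, y)

print(rfe.support_)  # boolean mask of the selected features
print(rfe.ranking_)  # 1 = selected; higher ranks were eliminated earlier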
Categorical
Data
Encoding Categorical Data
• There are different techniques to encode categorical features as numeric quantities:
1) Label encoding
2) One-Hot encoding
Label Encoding
• Label encoding converts each value in a column to a number. Numerical labels are always between 0 and n_categories - 1.
Label Encoding Example
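A minimal label-encoding sketch with scikit-learn's LabelEncoder; the "size" column and its values are made up for illustration:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

encoder = LabelEncoder()
df["size_encoded"] = encoder.fit_transform(df["size"])  # labels in 0 .. n_categories - 1

print(list(encoder.classes_))  # ['large', 'medium', 'small'] -> codes 0, 1, 2 (alphabetical)
print(df)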
One-Hot Encoding
• The basic strategy is to
convert each category
value into a new
column and assign a 1
or 0 (True/False) value
to the column.
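A matching one-hot sketch with pandas.get_dummies, reusing the hypothetical "size" column:

import pandas as pd

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# One new indicator column per category value
one_hot = pd.get_dummies(df, columns=["size"], prefix="size")
print(one_hot)  # columns: size_large, size_medium, size_small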
Classification
Metrics
Confusion Matrix
TP (true positives), TN (true negatives), FP (false positives), FN (false negatives)
Evaluation of classification models from confusion matrix
• Accuracy
• Precision
• Recall (sensitivity)
• F1 Score
• Specificity
Evaluation of classification models: Accuracy
Accuracy simply measures how often the classifier makes the correct prediction. It is the ratio between the number of correct predictions and the total number of predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Evaluation of classification models: Precision
Precision is a measure of the correctness of the positive predictions. In simple words, it tells us how many of the observations predicted as positive are actually positive:
Precision = TP / (TP + FP)
Evaluation of classification models: Recall
Recall (Sensitivity) is a measure of the actual positive observations that are predicted correctly, i.e. how many observations of the positive class are actually predicted as positive:
Recall = TP / (TP + FN)
Evaluation of classification models: F1 Score
F1 score is the harmonic mean of precision and recall. It takes both false positives and false negatives into account:
F1 = 2 * (Precision * Recall) / (Precision + Recall)
Evaluation of classification models: Specificity
Specificity is the proportion of actual negative observations that are correctly predicted as negative:
Specificity = TN / (TN + FP)
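A hedged sketch computing these metrics from a confusion matrix with scikit-learn; the y_true and y_pred vectors are made-up illustrative labels:

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy   :", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("Precision  :", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall     :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1 score   :", f1_score(y_true, y_pred))
print("Specificity:", tn / (tn + fp))                   # TN / (TN + FP)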
Regression
Metrics
Evaluation of Regression models
• Mean Absolute Error (MAE),
• Mean Squared Error (MSE),
Evaluation of Regression models: Mean Squared Error
Mean Squared Error (MSE) is the most popular metric used for regression problems. It finds the average of the squared differences between the target values and the values predicted by the regression model:
MSE = (1/N) * Σ_j (y_j - y_hat_j)²
Where:
• y_j: actual value
• y_hat_j: predicted value from the regression model
• N: number of samples
Evaluation of Regression models: Mean Absolute Error
Mean Absolute Error (MAE) is the average of the absolute differences between the ground truth and the predicted values. Mathematically, it is represented as:
MAE = (1/N) * Σ_j |y_j - y_hat_j|
Where:
• y_j: actual value
• y_hat_j: predicted value from the regression model
• N: number of samples
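A short sketch of MSE and MAE, computed both with scikit-learn and directly with NumPy; the y and y_hat vectors are illustrative:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y = np.array([3.0, 5.0, 2.5, 7.0])      # actual values
y_hat = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values

print(mean_squared_error(y, y_hat))   # MSE = mean((y - y_hat)^2) = 0.875
print(mean_absolute_error(y, y_hat))  # MAE = mean(|y - y_hat|)   = 0.75

# Equivalent manual computation
print(np.mean((y - y_hat) ** 2))
print(np.mean(np.abs(y - y_hat)))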