Feature
Engineering
AGENDA
• Feature engineering
• Feature selection
• Dealing with categorical data
Feature Scaling
Why Should We Use Feature Scaling?
• A dataset may have multiple features spanning varying degrees of magnitude, range, and units. This is a
significant obstacle, as some machine learning algorithms are highly sensitive to these differences in scale.
Feature Scaling
• Normalization
• Standardization
Normalization: Min-Max scaling
• Normalization is a scaling technique in which values are shifted and
rescaled so that they end up ranging between 0 and 1. It is also
known as Min-Max scaling.
• Here’s the formula for normalization:
• X' = (X - X_min) / (X_max - X_min)
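A minimal Python sketch of Min-Max scaling, assuming scikit-learn is available; the small example array is made up for illustration:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative data: two features with very different ranges
X = np.array([[100.0, 0.2],
              [250.0, 0.5],
              [400.0, 0.9]])

scaler = MinMaxScaler()           # rescales each feature to the [0, 1] range
X_norm = scaler.fit_transform(X)  # X' = (X - X_min) / (X_max - X_min), per column
print(X_norm)                     # first column becomes [0. , 0.5, 1. ]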
Standardization – Z score normalization
• Standardization is another scaling technique where the values are
centered around the mean with a unit standard deviation.
• This means that the mean of the attribute becomes zero and the
resultant distribution has a unit standard deviation (equals 1).
• Here’s the formula for standardization:
• X' = (X - mean) / std, where mean and std are the mean and standard deviation of the feature.
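A matching sketch of standardization with scikit-learn's StandardScaler, reusing the same illustrative array:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[100.0, 0.2],
              [250.0, 0.5],
              [400.0, 0.9]])

scaler = StandardScaler()        # centers each feature at mean 0 with unit standard deviation
X_std = scaler.fit_transform(X)  # X' = (X - mean) / std, per column
print(X_std.mean(axis=0))        # approximately [0, 0]
print(X_std.std(axis=0))         # approximately [1, 1]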
Standardization (Z-score Normalization)
Feature Selection
Creating Features
• “Good” features are the key to accurate generalization
• Domain knowledge can be used to generate a feature set
  - Medical example: results of blood tests, age, smoking history
  - Game-playing example: number of pieces on the board, control of the center of the board
• Data might not be in vector form
  - Example: spam classification
  - “Bag of words”: throw out order, keep a count of how many times each word appears
  - Sequence: one feature for the first letter in the email, one for the second letter, etc.
  - N-grams: one feature for every unique string of n characters (or n words)
What is feature selection?
Reducing the feature space by throwing out some of
the features
Feature Selection
Without feature selection:
• Increased complexity of the model, making it harder to interpret.
• Increased time complexity for training the model.
• A weaker model with inaccurate or less reliable predictions.
With feature selection, we look for the smallest set of features, which results in:
• Training a machine learning algorithm faster.
• Reducing the complexity of a model and making it easier to interpret.
• Building a sensible model with better prediction power.
• Reducing over-fitting by selecting the right set of features.
Reasons for Feature Selection
• We want to find which features are relevant
  - A domain specialist may not be sure which factors are predictive of disease
  - Common practice: throw in every feature you can think of, and let feature selection get rid of the useless ones
• We want to maximize accuracy by removing irrelevant and noisy features
  - For spam, we could create a feature for each of ~10^5 English words
  - Training with all features is computationally expensive
  - Irrelevant features hurt generalization
• Features have associated costs; we want to optimize accuracy with the least expensive features
  - Embedded systems with limited resources
  - Voice recognition on a cell phone
  - Branch prediction in a CPU (4K code limit)
Terminology
Univariate method: considers one variable (feature) at a time
Multivariate method: considers subsets of variables (features) together
Filter method: ranks features or feature subsets independently of the
predictor (classifier)
Wrapper method: uses a classifier to assess features or feature subsets
Types of Feature Selection:
• Filter Methods
• Wrapper Methods
• Embedded Methods
Feature Selection Methods
Filter: All Features → Filter (Score) → Selected Features → Supervised Learning Algorithm → Classifier
Wrapper: All Features → Search ↔ Feature Evaluation Criterion (Feature Subset / Criterion Value) → Selected Features → Supervised Learning Algorithm → Classifier
Filter method: these methods evaluate the intrinsic characteristics of features independently of the model. Common filter techniques:
• Constant removal (Variance Threshold)
• Correlation-based
• Chi-Square Test (for categorical features)
• ANOVA (Analysis of Variance)
• Information Gain
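As a hedged illustration of one of these filters, scikit-learn's SelectKBest can rank features with a univariate score such as the chi-square test (chi2 requires non-negative features); the built-in iris dataset is used here purely for demonstration:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)             # 4 non-negative numeric features

selector = SelectKBest(score_func=chi2, k=2)  # keep the 2 highest-scoring features
X_selected = selector.fit_transform(X, y)

print(selector.scores_)   # chi-square score per feature
print(X_selected.shape)   # (150, 2)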
Constant removal: the goal of constant removal is to identify and eliminate features that exhibit no variation or have constant values across all data points in a dataset. The steps are:
1. Calculate the variance or standard deviation for each feature
2. Set a threshold for the variance
3. Remove features below the threshold
1) Calculate variance or standard deviation for each feature
The variance of a set of data points measures how far each data point is from the mean (average) of the data. A very low variance indicates that a feature's values are relatively constant across different instances in the dataset. Features with zero variance (or very low variance) are considered constant.
2) Set a threshold
Define a threshold value for the variance; features with variance below this threshold are flagged for removal. Considerations for choosing an appropriate threshold:
1. Impact on model performance
2. Domain knowledge
3. Balance between information loss and noise reduction
4. Dataset size
3) Remove constant features
Eliminate the identified constant features from the dataset. The remaining features are considered more informative and are retained for further analysis or modeling.
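A small sketch of these three steps using scikit-learn's VarianceThreshold; the toy matrix and the 0.01 threshold are assumptions for illustration:

import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Column 0 is constant, column 1 is nearly constant, column 2 varies
X = np.array([[1.0, 5.00, 10.0],
              [1.0, 5.01, 20.0],
              [1.0, 5.00, 30.0]])

selector = VarianceThreshold(threshold=0.01)  # drop features with variance <= 0.01
X_reduced = selector.fit_transform(X)

print(selector.variances_)  # variance of each original feature
print(X_reduced)            # only the third column survives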
Filter method benefits:
• Improved computational efficiency
• Improved model performance
• Faster training times
• Reduced noise in the dataset
• Reduced overfitting
Recursive Feature Elimination (RFE)
Recursive Feature Elimination algorithm:
1. Rank the importance of all features using the chosen machine learning algorithm.
2. Eliminate the least important feature.
3. Build a model using the remaining features.
4. Repeat steps 1-3 until the desired number of features is reached.
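A hedged sketch of this loop using scikit-learn's RFE wrapper around a logistic regression estimator; the choice of estimator, dataset, and the target of 2 features are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

estimator = LogisticRegression(max_iter=1000)
rfe = RFE(estimator=estimator, n_features_to_select=2, step=1)  # eliminate 1 feature per iteration
rfe.fit(X, y)

print(rfe.support_)  # boolean mask of the selected features
print(rfe.ranking_)  # 1 = selected; higher ranks were eliminated earlier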
Categorical
Data
Encoding Categorical Data
• There are different techniques to encode categorical features as numeric quantities:
1) Label encoding
2) One-Hot encoding
Label Encoding
• Label encoding converts each value in a column to a number. Numerical labels are always between 0 and n_categories - 1.
Label Encoding Example
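A minimal label-encoding sketch with scikit-learn's LabelEncoder; the "size" column and its values are made up for illustration:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

encoder = LabelEncoder()
df["size_encoded"] = encoder.fit_transform(df["size"])  # labels in 0 .. n_categories - 1

print(list(encoder.classes_))  # ['large', 'medium', 'small'] -> codes 0, 1, 2 (alphabetical)
print(df)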
One-Hot Encoding
• The basic strategy is to
convert each category
value into a new
column and assign a 1
or 0 (True/False) value
to the column.
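A matching one-hot sketch with pandas.get_dummies, reusing the hypothetical "size" column:

import pandas as pd

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# One new indicator column per category value
one_hot = pd.get_dummies(df, columns=["size"], prefix="size")
print(one_hot)  # columns: size_large, size_medium, size_small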
Classification
Metrics
Confusion Matrix
TP (true positives), TN (true negatives), FP (false positives), FN (false negatives)
Evaluation of classification models from confusion matrix
• Accuracy
• Precision
• Recall (sensitivity)
• F1 Score
• Specificity
Evaluation of classification models: Accuracy
Accuracy simply measures how often the classifier makes the correct prediction. It is the ratio between the number of correct predictions and the total number of predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Evaluation of classification models: Precision
Precision is a measure of the correctness of the positive predictions. In simple words, it tells us how many of the observations predicted as positive are actually positive:
Precision = TP / (TP + FP)
Evaluation of classification models: Recall
Recall (Sensitivity) is a measure of the actual positive observations that are predicted correctly, i.e. how many observations of the positive class are actually predicted as positive:
Recall = TP / (TP + FN)
Evaluation of classification models: F1 Score
F1 score is the harmonic mean of precision and recall. It takes both false positives and false negatives into account:
F1 = 2 * (Precision * Recall) / (Precision + Recall)
Evaluation of classification models: Specificity
Specificity is the proportion of actual negative observations that are correctly predicted as negative:
Specificity = TN / (TN + FP)
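A hedged sketch computing these metrics from a confusion matrix with scikit-learn; the y_true and y_pred vectors are made-up illustrative labels:

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy   :", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("Precision  :", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall     :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1 score   :", f1_score(y_true, y_pred))
print("Specificity:", tn / (tn + fp))                   # TN / (TN + FP)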
Regression
Metrics
Evaluation of Regression models
• Mean Absolute Error (MAE),
• Mean Squared Error (MSE),
Evaluation of Regression models: Mean Squared Error
Mean Squared Error (MSE) is the most popular metric used for regression problems. It finds the average of the squared differences between the target values and the values predicted by the regression model:
MSE = (1/N) * Σ_j (y_j - y_hat_j)²
Where:
• y_j: actual value
• y_hat_j: predicted value from the regression model
• N: number of samples
Evaluation of Regression models: Mean Absolute Error
Mean Absolute Error (MAE) is the average of the absolute differences between the ground truth and the predicted values. Mathematically, it is represented as:
MAE = (1/N) * Σ_j |y_j - y_hat_j|
Where:
• y_j: actual value
• y_hat_j: predicted value from the regression model
• N: number of samples
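A short sketch of MSE and MAE, computed both with scikit-learn and directly with NumPy; the y and y_hat vectors are illustrative:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y = np.array([3.0, 5.0, 2.5, 7.0])      # actual values
y_hat = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values

print(mean_squared_error(y, y_hat))   # MSE = mean((y - y_hat)^2) = 0.875
print(mean_absolute_error(y, y_hat))  # MAE = mean(|y - y_hat|)   = 0.75

# Equivalent manual computation
print(np.mean((y - y_hat) ** 2))
print(np.mean(np.abs(y - y_hat)))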