Feature Scaling
by
Gautam Kumar
What is Feature Scaling?
Feature Scaling is a technique to standardize the independent features of the data to a fixed range. It is used to handle highly varying magnitudes, values, or units.
Why feature scaling (Standardization)?
It is a step of Data Pre-Processing which is applied to the independent variables, or features, of the data. It basically helps to normalize the data within a particular range. Sometimes it also helps in speeding up the calculations in an algorithm.
When to do scaling?
• Feature scaling matters when we use these algorithms:
  • K-Nearest Neighbors (KNN)
  • K-Means
  • Principal Component Analysis (PCA)
  • Gradient Descent
• Feature scaling is not required for these algorithms:
  • Algorithms that rely on rules
  • CART
  • Random Forests
  • Gradient Boosted Decision Trees
Distance calculation using different techniques:
• Euclidean Distance: the square root of the sum of squared differences between the coordinates (feature values), where X is the data point, Y is the centroid, and K is the number of features.
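In this notation, with x_k and y_k the k-th feature values of the data point X and the centroid Y:

d(X, Y) = \sqrt{\sum_{k=1}^{K} (x_k - y_k)^2}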
Continued…
• Manhattan Distance: calculated as the sum of absolute differences between the coordinates (feature values) of the data point and the centroid of each class.
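In the same notation:

d(X, Y) = \sum_{k=1}^{K} |x_k - y_k|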
Continued…
• Minkowski Distance: a generalization of the above two methods.
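With an order parameter p, it reduces to the Manhattan distance at p = 1 and the Euclidean distance at p = 2:

d(X, Y) = \left( \sum_{k=1}^{K} |x_k - y_k|^p \right)^{1/p}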
Feature Scaling Techniques
• Min-Max Normalization
• Standardization
• Max Abs Scaling
• Robust Scaling
• Quantile Transformer Scaling
• Power Transformer Scaling
• Unit Vector Scaling
Min-Max Normalization
• This technique re-scales a feature or observation value to a distribution between 0 and 1, or another given range.
• Min-Max shrinks the data to the range -1 to 1 if there are negative values; the range can be set explicitly, e.g. [0, 1], [0, 5], or [-1, 1].
• This technique responds well when the standard deviation is small and the distribution is not Gaussian.
• sklearn.preprocessing.MinMaxScaler
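A minimal sketch of typical usage (the array X is made-up example data, not from the slides):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X = np.array([[1.0, -10.0], [2.0, 0.0], [3.0, 10.0]])  # made-up example data
    scaler = MinMaxScaler(feature_range=(0, 1))  # default range is [0, 1]
    X_minmax = scaler.fit_transform(X)  # per column: (x - min) / (max - min)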
Standardization
• This technique re-scales a feature value so that it has a distribution with mean 0 and variance equal to 1.
• Scaling happens independently on each feature, by computing the relevant statistics on the samples in the training set.
• If the data is not normally distributed, this is not the best scaler to use.
• sklearn.preprocessing.StandardScaler
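A minimal sketch, reusing the example array X from above; each column is transformed as z = (x - mean) / std:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)  # per column: (x - mean) / std
    # scaler.mean_ and scaler.scale_ hold the fitted training-set statistics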
Max Abs Scaling
• Scales each feature by its maximum absolute value.
• This technique scales each feature individually so that the maximal absolute value of that feature in the training set is 1.0; training values therefore fall in the range [-1, 1]. The data is not shifted or centered.
• On positive-only data, this scaler behaves similarly to the Min-Max Scaler.
• sklearn.preprocessing.MaxAbsScaler
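A minimal sketch, again on the example array X:

    from sklearn.preprocessing import MaxAbsScaler

    scaler = MaxAbsScaler()
    X_maxabs = scaler.fit_transform(X)  # per column: x / max(|x|); no centering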
Robust Scaling
• This scaling technique is robust to outliers: if our data contains many outliers, scaling with the mean and standard deviation of the data won't work well.
• This technique removes the median and scales the data according to a quantile range (defaults to the IQR: Interquartile Range).
• sklearn.preprocessing.robust_scale
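A minimal sketch using the class form of the same scaler, sklearn.preprocessing.RobustScaler:

    from sklearn.preprocessing import RobustScaler

    scaler = RobustScaler(quantile_range=(25.0, 75.0))  # default quantile range: the IQR
    X_robust = scaler.fit_transform(X)  # per column: (x - median) / IQR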
Quantile Transformer Scaling
• This technique transforms the features to follow a uniform or a normal distribution.
• A quantile transform maps a variable's probability distribution to another probability distribution.
• The transformation is applied to each feature independently. First, an estimate of the cumulative distribution function of a feature is used to map the original values to a uniform distribution. The obtained values are then mapped to the desired output distribution using the associated quantile function.
• The Quantile Transformer can also map the data distribution to a Gaussian and standardize the result, centering the values on a mean of 0 with a standard deviation of 1.0.
• sklearn.preprocessing.quantile_transform
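A minimal sketch using the class form, sklearn.preprocessing.QuantileTransformer; n_quantiles is lowered here only because the example array X has just three rows:

    from sklearn.preprocessing import QuantileTransformer

    qt = QuantileTransformer(output_distribution='normal', n_quantiles=3, random_state=0)
    X_quantile = qt.fit_transform(X)  # features mapped toward a standard normal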
Power Transformer Scaling
• The power transformer is a family of parametric, monotonic transformations applied to make data more Gaussian-like.
• This is useful for modeling issues related to the variability of a variable being unequal across its range.
• The power transform finds the optimal scaling factor for stabilizing variance and minimizing skewness through maximum likelihood estimation.
• sklearn.preprocessing.power_transform
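A minimal sketch using the class form, sklearn.preprocessing.PowerTransformer; the Yeo-Johnson method is chosen here because, unlike Box-Cox, it also accepts zero and negative values:

    from sklearn.preprocessing import PowerTransformer

    pt = PowerTransformer(method='yeo-johnson', standardize=True)
    X_power = pt.fit_transform(X)  # per-feature lambdas fitted by maximum likelihood: pt.lambdas_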
Unit Vector Scaling
• This scaling technique is done by considering the whole feature vector to be of unit length.
• Unit vector scaling means dividing each component by the Euclidean length of the vector (the L2 norm).
• For non-negative features, this technique produces values in the range [0, 1], which is quite useful when dealing with features with hard boundaries. For example, with image data the colors can range only from 0 to 255.
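A minimal sketch; the slide does not name a scikit-learn class, but sklearn.preprocessing.Normalizer performs this per-sample L2 scaling. Note that, unlike the scalers above, it works row by row (per sample) rather than per feature:

    from sklearn.preprocessing import Normalizer

    normalizer = Normalizer(norm='l2')
    X_unit = normalizer.fit_transform(X)  # each row now has Euclidean norm 1.0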
Any Questions?
Contact: Gautam.kmr2893@outlook.com
Thank You