Feature Scaling
by
Gautam Kumar
What is Feature Scaling?
Feature Scaling is a technique to standardize the independent features of the data to a fixed range. It is used to handle highly varying magnitudes, values, or units.
Why feature scaling (Standardization)?
It is a step of Data Pre-Processing which is applied to the independent variables, or features, of the data. It basically helps to normalize the data within a particular range. Sometimes it also helps in speeding up the calculations in an algorithm.
When to do scaling?
• Feature scaling matters when we use these algorithms:
  • K-Nearest Neighbors (KNN)
  • K-Means
  • Principal Component Analysis (PCA)
  • Gradient Descent
• Feature scaling is not required for these algorithms:
  • Algorithms that rely on rules
  • CART
  • Random Forests
  • Gradient Boosted Decision Trees
Distance calculation using different techniques:
• Euclidean Distance: the square root of the sum of squared differences between the coordinates (feature values), where X is the data point, Y is the centroid, and K is the number of features.
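In this notation, with x_k and y_k the k-th feature values of the data point X and the centroid Y:

d(X, Y) = \sqrt{\sum_{k=1}^{K} (x_k - y_k)^2}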
Continued…
• Manhattan Distance: calculated as the sum of absolute differences between the coordinates (feature values) of the data point and the centroid of each class.
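In the same notation:

d(X, Y) = \sum_{k=1}^{K} |x_k - y_k|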
Continued…
• Minkowski Distance: a generalization of the above two methods.
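With an order parameter p, it reduces to the Manhattan distance at p = 1 and the Euclidean distance at p = 2:

d(X, Y) = \left( \sum_{k=1}^{K} |x_k - y_k|^p \right)^{1/p}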
Feature Scaling Techniques
• Min-Max Normalization
• Standardization
• Max Abs Scaling
• Robust Scaling
• Quantile Transformer Scaling
• Power Transformer Scaling
• Unit Vector Scaling
Min-Max Normalization
• This technique re-scales a feature or observation value to a distribution between 0 and 1, or another given range.
• Min-Max shrinks the data to the range -1 to 1 if there are negative values; the range can be set explicitly, e.g. [0, 1], [0, 5], or [-1, 1].
• This technique responds well when the standard deviation is small and the distribution is not Gaussian.
• sklearn.preprocessing.MinMaxScaler
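A minimal sketch of typical usage (the array X is made-up example data, not from the slides):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X = np.array([[1.0, -10.0], [2.0, 0.0], [3.0, 10.0]])  # made-up example data
    scaler = MinMaxScaler(feature_range=(0, 1))  # default range is [0, 1]
    X_minmax = scaler.fit_transform(X)  # per column: (x - min) / (max - min)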
Standardization
• This technique re-scales a feature value so that it has a distribution with mean 0 and variance equal to 1.
• Scaling happens independently on each feature, by computing the relevant statistics on the samples in the training set.
• If the data is not normally distributed, this is not the best scaler to use.
• sklearn.preprocessing.StandardScaler
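A minimal sketch, reusing the example array X from above; each column is transformed as z = (x - mean) / std:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)  # per column: (x - mean) / std
    # scaler.mean_ and scaler.scale_ hold the fitted training-set statistics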
Max Abs Scaling
• Scales each feature by its maximum absolute value.
• This technique scales each feature individually so that the maximal absolute value of that feature in the training set is 1.0; training values therefore fall in the range [-1, 1]. The data is not shifted or centered.
• On positive-only data, this scaler behaves similarly to the Min-Max Scaler.
• sklearn.preprocessing.MaxAbsScaler
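A minimal sketch, again on the example array X:

    from sklearn.preprocessing import MaxAbsScaler

    scaler = MaxAbsScaler()
    X_maxabs = scaler.fit_transform(X)  # per column: x / max(|x|); no centering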
Robust Scaling
• This scaling technique is robust to outliers: if our data contains many outliers, scaling with the mean and standard deviation of the data won't work well.
• This technique removes the median and scales the data according to a quantile range (defaults to the IQR: Interquartile Range).
• sklearn.preprocessing.robust_scale
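A minimal sketch using the class form of the same scaler, sklearn.preprocessing.RobustScaler:

    from sklearn.preprocessing import RobustScaler

    scaler = RobustScaler(quantile_range=(25.0, 75.0))  # default quantile range: the IQR
    X_robust = scaler.fit_transform(X)  # per column: (x - median) / IQR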
Quantile Transformer Scaling
• This technique transforms the features to follow a uniform or a normal distribution.
• A quantile transform maps a variable's probability distribution to another probability distribution.
• The transformation is applied to each feature independently. First, an estimate of the cumulative distribution function of a feature is used to map the original values to a uniform distribution. The obtained values are then mapped to the desired output distribution using the associated quantile function.
• The Quantile Transformer can also map the data distribution to a Gaussian and standardize the result, centering the values on a mean of 0 with a standard deviation of 1.0.
• sklearn.preprocessing.quantile_transform
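A minimal sketch using the class form, sklearn.preprocessing.QuantileTransformer; n_quantiles is lowered here only because the example array X has just three rows:

    from sklearn.preprocessing import QuantileTransformer

    qt = QuantileTransformer(output_distribution='normal', n_quantiles=3, random_state=0)
    X_quantile = qt.fit_transform(X)  # features mapped toward a standard normal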
Power Transformer Scaling
• The power transformer is a family of parametric, monotonic transformations applied to make data more Gaussian-like.
• This is useful for modeling issues related to the variability of a variable being unequal across its range.
• The power transform finds the optimal scaling factor for stabilizing variance and minimizing skewness through maximum likelihood estimation.
• sklearn.preprocessing.power_transform
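A minimal sketch using the class form, sklearn.preprocessing.PowerTransformer; the Yeo-Johnson method is chosen here because, unlike Box-Cox, it also accepts zero and negative values:

    from sklearn.preprocessing import PowerTransformer

    pt = PowerTransformer(method='yeo-johnson', standardize=True)
    X_power = pt.fit_transform(X)  # per-feature lambdas fitted by maximum likelihood: pt.lambdas_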
Unit Vector Scaling
• This scaling technique is done by considering the whole feature vector to be of unit length.
• Unit vector scaling means dividing each component by the Euclidean length of the vector (the L2 norm).
• For non-negative features, this technique produces values in the range [0, 1], which is quite useful when dealing with features with hard boundaries. For example, with image data the colors can range only from 0 to 255.
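A minimal sketch; the slide does not name a scikit-learn class, but sklearn.preprocessing.Normalizer performs this per-sample L2 scaling. Note that, unlike the scalers above, it works row by row (per sample) rather than per feature:

    from sklearn.preprocessing import Normalizer

    normalizer = Normalizer(norm='l2')
    X_unit = normalizer.fit_transform(X)  # each row now has Euclidean norm 1.0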
Any Questions?
Contact: Gautam.kmr2893@outlook.com
Thank You