3. What is Feature Scaling?
▪ A method to rescale numeric features to the same scale or
range (e.g., -1 to 1 or 0 to 1).
▪ This is the last step of the feature engineering pipeline.
▪ We apply feature scaling to the independent variables.
▪ We fit the scaler on the training data, then transform both
the training and test data.
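The fit-on-train, transform-both pattern above can be sketched as follows; this assumes scikit-learn's MinMaxScaler, since the slides do not name a library, and the Age/Salary values are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical [age, salary] rows
X_train = np.array([[25, 40000], [35, 60000], [45, 80000]], dtype=float)
X_test = np.array([[30, 50000]], dtype=float)

scaler = MinMaxScaler()                         # rescales each feature to [0, 1]
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from train data only
X_test_scaled = scaler.transform(X_test)        # reuse the train min/max on test data
```

Fitting on the training data only matters: computing the min/max on the full data set would leak information from the test set into the preprocessing step.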
4. What is Feature Scaling?
▪ In general, a data set contains different types of variables with
different magnitudes and units (weight in kilograms or grams, age in
years, salary in thousands, etc.).
▪ The significant issue is that these variables can differ greatly in
their range of values.
▪ A feature with a large range of values will start to dominate the
other variables.
▪ Models can become biased towards those high-range features.
▪ To overcome this problem, we do feature scaling.
▪ The goal of feature scaling is to put the features on almost the
same scale, so that each feature is equally important and the model
is easier to train.
5. Why and how high range of features
impact model performance?
▪ In the table, Age and Salary have very different ranges of values.
▪ When we train a model, it might give high importance to the Salary
column simply because of its large range of values.
▪ However, that may not actually be the case: both columns may have an
equal or nearly equal impact on the target variable, e.g., whether a
person will buy a house based on their age and salary.
▪ In the house-buying case, age and salary have equal importance.
▪ So we need to do feature scaling.
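The dominance effect is easy to see with a distance calculation (the basis of KNN and K-means). The Age/Salary values below are made up for illustration:

```python
import numpy as np

# Hypothetical [age, salary] rows for two people
a = np.array([25.0, 50000.0])
b = np.array([45.0, 52000.0])

diff = b - a                       # [20, 2000]: ages differ a lot, salaries little
dist = np.sqrt((diff ** 2).sum())  # Euclidean distance
# Salary contributes 2000**2 = 4,000,000 to the squared distance,
# age only 20**2 = 400, so salary alone decides the distance.
```

Even though the ages differ substantially and the salaries barely differ, the salary column dominates the distance purely because of its scale.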
8. Normalization
▪ Normalization is also known as min-max normalization or
min-max scaling.
▪ Normalization rescales values into the range 0 to 1.
▪ Normalization is a good choice when your data does not
follow a Normal distribution.
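Min-max normalization computes x' = (x - min) / (max - min) per feature. A minimal NumPy sketch, with made-up [age, salary] values:

```python
import numpy as np

X = np.array([[20.0, 30000.0],
              [30.0, 50000.0],
              [40.0, 90000.0]])

X_min = X.min(axis=0)  # per-column minimum
X_max = X.max(axis=0)  # per-column maximum
X_norm = (X - X_min) / (X_max - X_min)
# Each column now spans exactly 0 to 1.
```

Note that min and max are sensitive to outliers: a single extreme value compresses all the other values into a narrow sub-range.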
11. Standardization
▪ Standardization, also called Z-score normalization, is a
feature scaling technique in which features are transformed
by subtracting the mean and dividing by the standard
deviation.
▪ The resulting data has a mean of 0 and a standard
deviation of 1.
▪ Standardization can be helpful when the data follows a
Normal distribution.
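Z-score standardization computes x' = (x - mean) / std per feature. A minimal NumPy sketch with the same made-up [age, salary] values:

```python
import numpy as np

X = np.array([[20.0, 30000.0],
              [30.0, 50000.0],
              [40.0, 90000.0]])

# Subtract the per-column mean and divide by the per-column std
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# X_std now has mean ~0 and standard deviation ~1 in each column.
```

Unlike min-max scaling, the result is not bounded to a fixed range, but it is less sensitive to outliers because it does not depend solely on the extreme values.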
16. Which ML algorithms require feature
scaling?
▪ KNN
▪ K-means
▪ SVM
▪ PCA
▪ Gradient-descent-based algorithms (linear regression,
logistic regression, neural networks)
Editor's Notes
#2: A comparison is only fair between entities on a similar scale; otherwise it is biased. The same logic applies to machine learning. Feature scaling brings features to the same scale before we perform any comparison or build a model. Normalization and Standardization are the two most frequently used feature scaling techniques in machine learning.