Feature Scaling
By Asma Qaiser
Question?
▪ Can we compare Virat Kohli’s batting and Shaheen Afridi’s batting?
What is Feature Scaling?
▪ A method for bringing numeric features onto the same scale or range (for example, -1 to 1 or 0 to 1).
▪ It is usually one of the last steps of the feature engineering pipeline.
▪ We apply feature scaling to the independent variables.
▪ We fit the scaler on the training data only, then use it to transform both the training and the test data.
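The fit-on-train, transform-on-both rule can be sketched in plain Python. This is a minimal illustration, not a production implementation: the class name `SimpleStandardScaler` and the salary values are made up for the example. Libraries such as scikit-learn follow the same fit/transform pattern.

```python
from statistics import mean, pstdev


class SimpleStandardScaler:
    """Hypothetical minimal scaler: fit() learns mean/std, transform() applies them."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Fall back to 1.0 so a constant feature does not cause division by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Made-up salary values (in thousands), already split into train and test.
train = [30.0, 40.0, 50.0, 60.0, 70.0]
test = [45.0, 90.0]

scaler = SimpleStandardScaler().fit(train)  # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)        # reuses train statistics, no refit

print(train_scaled)
print(test_scaled)
```

Because the test data never influences the learned mean and standard deviation, scaled test values can fall outside the range seen in training, which is expected.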
What is Feature Scaling?
▪ In general, a data set contains different types of variables with different magnitudes and units (kilograms, grams, age in years, salary in thousands, etc.).
▪ The significant issue is that these variables may differ widely in their range of values.
▪ A feature with a large range of values will then start to dominate the other variables.
▪ Models could be biased towards those high-range features.
▪ To overcome this problem, we apply feature scaling.
▪ The goal of applying feature scaling is to make sure features are on almost the same scale, so that each feature is equally important and make it easier
Why and how does a high range of feature values impact model performance?
▪ In the table, Age and Salary have different ranges of values.
▪ So when we train a model, it might give high importance to the Salary column just because of its large range of values.
▪ However, that may not reflect reality: both columns could have equal or nearly equal impact on the target variable, for example whether a person will buy a house or not.
▪ So in the case of buying a house, both age and salary have equal importance.
▪ We need to apply feature scaling.
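The effect described above can be sketched with a distance calculation, which is what distance-based models rely on. The age and salary values below are made up for illustration (they are not the values from the slide's table), and the helper names are hypothetical.

```python
import math

# Hypothetical (age in years, yearly salary) rows; the values are made up.
person_a = (25, 50_000)
person_b = (60, 51_000)  # very different age, similar salary
person_c = (26, 58_000)  # similar age, different salary


def euclidean(p, q):
    """Straight-line distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def min_max_scale(rows):
    """Rescale every column to [0, 1]; assumes no column is constant."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


raw_ab = euclidean(person_a, person_b)
raw_ac = euclidean(person_a, person_c)

scaled_a, scaled_b, scaled_c = min_max_scale([person_a, person_b, person_c])
scaled_ab = euclidean(scaled_a, scaled_b)
scaled_ac = euclidean(scaled_a, scaled_c)

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw values, person_b looks far closer to person_a than person_c does, because salary differences swamp age differences. After scaling, the two distances are nearly equal, so age and salary contribute comparably.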
Feature Scaling Techniques
There are two commonly used feature scaling techniques.
▪ Normalization
▪ Standardization
Normalization
▪ Normalization is also known as min-max normalization or min-max scaling.
▪ Normalization rescales values into the range 0 to 1.
▪ Normalization is a good choice when your data does not follow a Normal distribution.
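As a minimal sketch, min-max normalization maps each value x to (x - min) / (max - min), where min and max are taken over the feature. The function name and the age values below are made up for illustration, and returning 0.0 for a constant feature is an assumption of this example rather than a fixed convention.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0 (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 60]  # made-up ages in years
normalized_ages = min_max_normalize(ages)
print(normalized_ages)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, so a single extreme outlier compresses all other values into a narrow band.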
Standardization
▪ Standardization, also called Z-score normalization, is a feature scaling technique in which each feature is transformed by subtracting the mean and dividing by the standard deviation.
▪ The resulting data will have a mean of 0 and a standard deviation of 1.
▪ Standardization can be helpful in cases where the data follows a Normal distribution.
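As a minimal sketch, standardization maps each value x to z = (x - mean) / std. The function name and salary values are made up for illustration. This example uses the population standard deviation (`pstdev`), which is an assumption here; using the sample standard deviation would give slightly different values.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; map every value to 0.0 (assumption).
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


salaries = [30.0, 40.0, 50.0, 60.0, 70.0]  # made-up salaries in thousands
z_scores = standardize(salaries)

print(z_scores)
print(mean(z_scores), pstdev(z_scores))  # approximately 0 and 1
```

Unlike min-max normalization, the result is not confined to a fixed range: a value far from the mean simply gets a large positive or negative z-score.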
Which ML algorithms require feature scaling?
▪ KNN
▪ K-means
▪ SVM
▪ PCA
▪ Gradient-descent-based algorithms (linear regression, logistic regression, neural networks)



Editor's Notes

  • #2: Comparisons should be made between similar entities; otherwise they will be biased. The same logic applies to machine learning. Feature scaling brings features onto the same scale before we make any comparison or build a model. Normalization and standardization are the two most frequently used feature scaling techniques.