Lesson 5
Scaling and Normalization Kush Kulshrestha
Topics
1. Scaling of Data
2. Data Normalization
3. Difference between Scaling and Normalization
4. Min-max scaler and Box-Cox Transformation
Scaling of data
• Variable scaling involves taking values that span a specific range and representing them in another range.
• The standard method is to scale variables to [0,1].
• This may introduce various distortions or biases into the data, but the distribution or shape remains the same.
• Depending on the modeling tool, scaling variable ranges can be beneficial or sometimes even required.
One way of doing this is:
Linear Scaling Transform
• The first task in this scaling is to determine the minimum and maximum values of the variable.
• Then apply the transform:
y = (x - min{x1, …, xN}) / (max{x1, …, xN} - min{x1, …, xN})
• This introduces no distortion to the variable distribution.
• There is a one-to-one relationship between the original and scaled values.
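A minimal NumPy sketch of the linear scaling transform described above (the array values are illustrative):

import numpy as np

def linear_scale(x):
    # Scale a 1-D array to [0, 1] using its observed minimum and maximum.
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

values = np.array([3.0, 10.0, 7.5, 12.0, 5.0])
print(linear_scale(values))  # [0.  0.7778  0.5  1.  0.2222]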
Scaling of data
Out of Range values
• In data preparation, the data used is only a sample of the population.
• Therefore, it is not certain that the actual minimum and maximum values of the variable have been
discovered when scaling the ranges.
• If some values that turn up later in the mining process are outside of the limits discovered in the sample,
they are called out-of-range values.
Scaling of data
Dealing with Out of Range values
• After range scaling, all variables should be in the range of [0,1].
• Out-of-range values, however, take values like -0.2 or 1.1, which can cause unwanted behavior.
Solution 1: Ignore that the range has been exceeded.
• Most modeling tools have (at least) some capacity to handle numbers outside the scaling range.
• Important question to ask: Does this affect the quality of the model?
Solution 2: Exclude the out of range instances.
• One problem is that reducing the number of instances reduces the confidence that the sample represents
the population.
• Another problem is the introduction of bias: out-of-range values may occur with a certain pattern, and
ignoring these instances removes samples according to that pattern, distorting the sample.
Scaling of data
Dealing with Out of Range values
Solution 3: Clip the out of range values
• If a value is greater than 1, assign 1 to it; if less than 0, assign 0 (see the sketch below).
• This approach assumes that out-of-range values are somehow equivalent to the range-limit values.
• Therefore, the information content at the limits is distorted by projecting multiple values onto a single
value.
• This also introduces some bias.
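A one-line sketch of this clipping approach with NumPy (the scaled array below is illustrative):

import numpy as np

scaled = np.array([-0.2, 0.4, 0.9, 1.1])   # contains out-of-range values
clipped = np.clip(scaled, 0.0, 1.0)        # clip to the [0, 1] range limits
print(clipped)                             # [0.  0.4 0.9 1. ]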
Solution 4: Making room for out of range values
• The linear scaling transform provides an undistorted normalization but suffers from out-of-range values.
• Therefore, we should modify it so that it can also accommodate values that fall out of range.
• Most of the population is inside the range, so for these values the normalization should remain linear.
• The solution is to reserve some part of the range for the out-of-range values.
• The amount of space reserved depends on the confidence level of the sample:
e.g., at 98% confidence the linear part is [0.01, 0.99]
Scaling of data
Dealing with Out of Range values
Squashing the out of range values
• Now the problem reduces to fitting the out-of-range values into the space left for them.
• The greater the difference between a value and the range limit, the less likely such a value is to be found.
• Therefore, the transformation should be such that, as the distance from the range grows, the scaled value
increases towards one (or decreases towards zero) ever more slowly.
• One possibility is to use functions of the form y = 1/x and attach them to the ends of the linear part.
It is awkward to carry out the scaling in pieces depending on where each data point falls.
All of the above steps can be done with one function, called softmax scaling.
Scaling of data
Dealing with Out of Range values
Softmax Scaling
• The extent of the linear part can be controlled by one parameter.
• The space assigned to out-of-range values can be controlled by the level of uncertainty in the sample.
• Non-identical values always map to different normalized values.
Softmax scaling is based on the logistic function:
y = 1 / (1 + e^(-x))
where y is the scaled value and x is the input value.
• The logistic function transforms the original range of (-∞, ∞) to [0,1] and has an approximately linear
region in the middle of the transform.
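The slide does not give the full formula, but one common formulation (following Pyle's data-preparation treatment) first standardizes x, with λ controlling the extent of the linear region in standard deviations, and then applies the logistic function. A sketch under that assumption:

import numpy as np

def softmax_scale(x, lam=2.0):
    # Standardize around the mean; lam sets the width of the roughly
    # linear region in standard deviations (an assumed formulation).
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / (lam * x.std() / (2 * np.pi))
    return 1.0 / (1.0 + np.exp(-z))   # logistic squashing into (0, 1)

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100.0 is far out of range
print(softmax_scale(values))  # the outlier nears 1 but stays distinct, no clipping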
Scaling of data
Min-Max Scaler in scikit-learn
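The code from this slide did not survive extraction; a minimal equivalent with scikit-learn's MinMaxScaler might look like this (the arrays are illustrative):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.0], [5.0], [10.0]])
X_test = np.array([[0.0], [12.0]])          # values outside the training range

scaler = MinMaxScaler()                     # default feature_range=(0, 1)
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)    # fit on training data only
print(X_train_scaled.ravel())  # [0.         0.44444444 1.        ]
print(X_test_scaled.ravel())   # [-0.11111111  1.22222222]  <- out of range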
Data Normalization
• Normalization applies a function to each data point z_i in the data: y_i = f(z_i)
• Unlike scaling, normalization changes not only the values of the data but also its shape or distribution.
• The point of normalization is to change your observations so that they can be described as a normal
distribution.
• In general, you'll only want to normalize your data if you're going to be using a machine learning or
statistics technique that assumes your data is normally distributed.
• Or before using any technique containing “Gaussian” in its name.
Data Normalization
Data Standardization
• Also called z-scores.
• The terms normalization and standardization are
sometimes used interchangeably.
• Standardization transforms data to have a mean of
zero and a standard deviation of 1.
• Done by subtracting the mean and dividing by the
standard deviation for each data point.
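A minimal z-score sketch in NumPy (scikit-learn's StandardScaler does the same per column):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
z = (x - x.mean()) / x.std()      # subtract the mean, divide by the std
print(z)                          # [-1.3416 -0.4472  0.4472  1.3416]
print(z.mean(), z.std())          # ~0.0 and 1.0, as required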
Data Normalization
Log Transformation
• The log transformation can be used to make highly skewed distributions less skewed.
• This can be valuable both for making patterns in the data more interpretable and for helping to meet the
assumptions of inferential statistics.
• The figure shows an example of how a log transformation can make patterns more visible. Both graphs plot
the brain weight of animals as a function of their body weight. The raw weights are shown in the left
panel; the log-transformed weights are plotted in the right panel.
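A small sketch of the transform itself (the skewed values are illustrative; np.log1p is a common choice when the data contains zeros):

import numpy as np

skewed = np.array([1.0, 2.0, 5.0, 10.0, 100.0, 1000.0, 10000.0])
log_transformed = np.log10(skewed)   # compresses the four-order-of-magnitude range
print(log_transformed)               # [0.      0.30103 0.69897 1.      2.      3.      4.     ]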
Data Normalization
Box-Cox Transformation
The Box-Cox transformation of the variable x is indexed by λ and is defined as:

x^(λ) = (x^λ - 1) / λ, if λ ≠ 0
x^(λ) = log(x), if λ = 0

• Box-Cox transformations cannot handle negative values.
• One way to deal with this is to shift all data points by a constant (for example, the absolute value of the
minimum plus a small offset) so that every value becomes positive.
• The same λ estimated on the training set of the Box-Cox transformation should be reused on the test
set.
• The λ = 0 case is the continuous limit of the general formula: as λ → 0, (x^λ - 1)/λ → log(x).
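A sketch using scipy.stats.boxcox, reusing the training-set λ on the test set as noted above (the lognormal data is just an illustrative stand-in for skewed, positive values):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
train = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # positive, right-skewed
test = rng.lognormal(mean=0.0, sigma=1.0, size=200)

train_bc, lam = stats.boxcox(train)       # lambda estimated by maximum likelihood
test_bc = stats.boxcox(test, lmbda=lam)   # reuse the training lambda
print(f"estimated lambda: {lam:.3f}")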
Data Normalization
Box-Cox Transformation
Notice that, with this definition of x^(λ), x = 1 always maps to the point 0 for all values of λ. To see how
the transformation works, look at the examples in the figure.
Data Normalization
Box-Cox Transformation
In the top row, the choice λ = 1 simply shifts x to the value x−1, which is a straight line. In the bottom row (on
a semi-logarithmic scale), the choice λ = 0 corresponds to a logarithmic transformation, which is now a
straight line. We superimpose a larger collection of transformations on a semi-logarithmic scale in Figure 2.
Data Normalization
Box-Cox Transformation
(Figure 2: a family of Box-Cox transformations x^(λ) superimposed on a semi-logarithmic scale.)
Data Normalization
Why do we need normalization?
1. Some algorithms need normally distributed data as input; otherwise the results are not reliable.
2. Easy comparison of values.
3. Interpretation of results makes more sense.
4. The objective function converges faster in some cases if the data is normalized, i.e., the speed of the
algorithm increases.
5. Normalized data can be used to create complex features that may improve the model (or make it non-linear).