Honey, I Shrunk the Target Variable!
Florian Wilhelm
Common pitfalls when transforming the target variable and
how to exploit transformations
Berlin, April 12th 2022
Mathematical Modelling
Data Science to Production & MLOps
Personalisation & RecSys
Uncertainty Quantification & Causality
Python Data Stack
Creator of PyScaffold
@FlorianWilhelm
FlorianWilhelm
FlorianWilhelm.info
Dr. Florian Wilhelm
Head of Data Science @ inovex
inovex is an IT project house
with focus on digital transformation
› Product Discovery · Product Ownership
› Web · UI/UX · Replatforming · Microservices
› Mobile · Apps · Smart Devices · Robotics
› Big Data & Business Intelligence Platforms
› Data Science · Data Products · Search · Deep Learning
› Data Center Automation · DevOps · Cloud · Hosting
› Agile Training · Technology Training · Coaching
Karlsruhe · Pforzheim · Stuttgart · München · Köln · Hamburg
www.inovex.de/en
Using technology to inspire our clients.
And ourselves.
Recap about
Metrics
Choosing the Right Metric
› (R)MSE is most often
used in practice
› Scikit-Learn’s
regressors mostly use
MSE by default
In which Use-Cases does (R)MSE make sense?
Little Recap about Metrics
             Quadratic   Absolute
Difference   (R)MSE      MAE
Relation     RMSPE       MAPE
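For reference, the four metrics used throughout this talk ((R)MSE, MAE, MAPE, RMSPE) can be written out in their usual textbook form. The notation is an assumption made here, not taken from the slide: $y_i$ is the actual value, $\hat{y}_i$ the prediction and $n$ the number of samples.

```latex
\begin{align*}
\mathrm{MSE}   &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}} \\
\mathrm{MAE}   &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{MAPE}  &= \frac{1}{n}\sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \\
\mathrm{RMSPE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{y_i} \right)^2}
\end{align*}
```

(R)MSE and MAE look at the difference between actual and predicted value, whereas MAPE and RMSPE look at that difference relative to the actual value.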
Our Use-Case
How much should
I sell my car for?
Model fitted on
many sold cars
and their features
could provide a
fair market value
Our Use-Case Setting
1. take the used-cars database from Kaggle with 370k cars having
features: vehicle type, model, registration date, gearbox,
powerPS, mileage, fuel type, brand and price
2. build a model to estimate the price based on these features
and treat this as a fair market value
3. decide what’s a good/fair/bad price based on this fair
market value
Source code: https://github.com/FlorianWilhelm/used-cars-log-trans/
Question 1:
What’s worse? Selling 10 equal cars
with an actual price of 50,000 € and
1. getting the actual price for 9
but only 40,000 € for the last car or
2. getting 49,000 € for every car?
● For (R)MSE option 1 is much worse
● For MAE both options are equally good/bad
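The arithmetic behind Question 1 can be checked in a few lines. This is a minimal sketch; the helper names `rmse` and `mae` are chosen here for illustration only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error of two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [50_000] * 10                 # ten equal cars, each worth 50,000 €
option_1 = [50_000] * 9 + [40_000]     # nine exact prices, one car 10,000 € too low
option_2 = [49_000] * 10               # every car 1,000 € too low

results = {
    "option 1": (rmse(actual, option_1), mae(actual, option_1)),
    "option 2": (rmse(actual, option_2), mae(actual, option_2)),
}
for name, (r, m) in results.items():
    print(f"{name}: RMSE = {r:,.0f} €, MAE = {m:,.0f} €")
# option 1: RMSE = 3,162 €, MAE = 1,000 €
# option 2: RMSE = 1,000 €, MAE = 1,000 €
```

The single large miss in option 1 is squared before averaging, which is why (R)MSE punishes it more than ten small misses of the same total size.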
Question 2:
Which one is worse?
Getting 1,000 € less if your
car’s actual value is
1. 100,000 € or
2. 10,000 €?
● For RMSE & MAE this makes no difference
● For RMSPE & MAPE option 2 is much worse
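Question 2 comes down to absolute versus relative error. The sketch below uses a single observation per car, in which case RMSE and MAE both equal the absolute error and RMSPE and MAPE both equal the percentage error; the variable names are illustrative.

```python
cars = {"100,000 € car": 100_000, "10,000 € car": 10_000}
shortfall = 1_000  # both cars sell for 1,000 € less than their actual value

errors = {}
for name, value in cars.items():
    price = value - shortfall
    abs_err = abs(value - price)   # what RMSE and MAE see
    pct_err = abs_err / value      # what RMSPE and MAPE see
    errors[name] = (abs_err, pct_err)
    print(f"{name}: absolute error = {abs_err:,} €, percentage error = {pct_err:.0%}")
# 100,000 € car: absolute error = 1,000 €, percentage error = 1%
# 10,000 € car: absolute error = 1,000 €, percentage error = 10%
```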
Learning 1:
The right metric depends on the
use-case and will affect your results!
What does minimizing (R)MSE
actually Mean?
Minimizing MSE
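Let $Y$ be a continuous random variable with density $f$. We look for the constant $\hat{y}$ that minimizes
$$\mathrm{MSE}(\hat{y}) = \mathbb{E}\big[(Y - \hat{y})^2\big] = \int (y - \hat{y})^2 f(y)\,dy.$$
Differentiate with respect to $\hat{y}$ and set to 0:
$$-2 \int (y - \hat{y})\, f(y)\,dy = 0 \quad\Longrightarrow\quad \hat{y} = \int y\, f(y)\,dy = \mathbb{E}[Y],$$
so the minimizer is actually the mean!
An analogous proof shows that the median minimizes MAE.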
For the Math Skeptics…
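For skeptics, the claim can also be checked numerically. The following is a minimal sketch, not the author's original demonstration: it draws a skewed sample (a lognormal one, chosen here so that mean and median clearly differ) and brute-forces the best constant prediction under MSE and under MAE.

```python
import random
import statistics

random.seed(0)
# Skewed sample, so that mean and median are far apart (assumption for illustration).
sample = [random.lognormvariate(0.0, 1.0) for _ in range(2_000)]


def mse(y_hat):
    """Mean squared error of the constant prediction y_hat on the sample."""
    return sum((y - y_hat) ** 2 for y in sample) / len(sample)


def mae(y_hat):
    """Mean absolute error of the constant prediction y_hat on the sample."""
    return sum(abs(y - y_hat) for y in sample) / len(sample)


# Brute-force search over constant predictions 0.01, 0.02, ..., 5.00.
grid = [i / 100 for i in range(1, 501)]
best_mse = min(grid, key=mse)
best_mae = min(grid, key=mae)

print(f"MSE minimizer: {best_mse:.2f}  vs. sample mean:   {statistics.mean(sample):.2f}")
print(f"MAE minimizer: {best_mae:.2f}  vs. sample median: {statistics.median(sample):.2f}")
```

Up to the grid resolution of 0.01, the MSE minimizer should coincide with the sample mean and the MAE minimizer with the sample median.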
Learning 2:
The mean (expected value) minimizes (R)MSE
and the median minimizes MAE.
Shrinking the Prices with Log
Distribution of Prices
Distribution of Prices and LogNormal Fit
Not perfectly lognormal,
which will be important later
Minimizing (R)MSE with log(price)
What we are going to do:
1. Take log(price) as target variable
2. Minimize (R)MSE to find ŷ
3. Transform ŷ back with exp(ŷ)
Minimizing (R)MSE with log(price) is …
… the Median?!?
Mathematically, in case of a
lognormal residual distribution:
› taking the log, minimizing
(R)MSE and transforming back
with exp leads to the median.
› if we want the mean, we need
to correct the result by adding
$\sigma^2/2$ (half the variance on the log scale) before applying exp.
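A small simulation illustrates the effect. This sketch assumes exactly lognormal prices with made-up parameters `mu` and `sigma` (not the talk's data) and relies on the standard lognormal identities median $= e^{\mu}$ and mean $= e^{\mu + \sigma^2/2}$.

```python
import math
import random
import statistics

random.seed(1)
mu, sigma = 8.5, 0.6  # hypothetical log-scale parameters, not fitted to the car data
prices = [random.lognormvariate(mu, sigma) for _ in range(100_000)]
log_prices = [math.log(p) for p in prices]

# The constant that minimizes (R)MSE on the log scale is the mean of log(price).
y_hat = statistics.fmean(log_prices)
log_var = sum((v - y_hat) ** 2 for v in log_prices) / len(log_prices)

naive = math.exp(y_hat)                    # plain back-transformation
corrected = math.exp(y_hat + log_var / 2)  # back-transformation with the mean correction

print(f"exp(y_hat)             = {naive:,.0f}   sample median = {statistics.median(prices):,.0f}")
print(f"exp(y_hat + sigma^2/2) = {corrected:,.0f}   sample mean   = {statistics.fmean(prices):,.0f}")
```

On exactly lognormal data the plain back-transformation should land on the median and the corrected one on the mean. On real data that is only roughly lognormal, the match will be approximate.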
On our data (not perfectly lognormal), the corrected estimate comes out
a bit higher than the “actual” mean of 6807.
Image source: https://www.pinterest.de/pin/494973815284951824/
(uploaded by Jittanisa Sukaphatana)
And there is much more…
Correction terms when applying log to a target variable
with lognormal residuals and minimizing (R)MSE, depending on the metric you actually care about:
(R)MSE · MAE · MAPE · RMSPE
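The correction terms can be sanity-checked numerically. The values in `theory` below are derived here from standard lognormal identities under the assumption of exactly lognormal data, with the optimal prediction written as $e^{\mu + c\,\sigma^2}$; they are not copied from the slide, so compare them against the linked proofs. The sketch computes the exact empirical minimizer of each metric on a simulated sample and converts it back into a factor $c$.

```python
import math
import random

random.seed(2)
mu, sigma = 0.0, 0.5  # hypothetical log-scale parameters
n = 200_000
y = sorted(random.lognormvariate(mu, sigma) for _ in range(n))


def weighted_median(sorted_values, weights):
    """Smallest value at which the cumulative weight reaches half the total weight."""
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in zip(sorted_values, weights):
        cumulative += weight
        if cumulative >= half:
            return value


# Best constant prediction on the sample for each metric.
minimizers = {
    "(R)MSE": sum(y) / n,                                         # mean
    "MAE": weighted_median(y, [1.0] * n),                         # median
    "MAPE": weighted_median(y, [1.0 / v for v in y]),             # median weighted by 1/y
    "RMSPE": sum(1 / v for v in y) / sum(1 / v ** 2 for v in y),  # E[1/Y] / E[1/Y^2]
}

# Factor c in exp(mu + c * sigma^2), derived under the lognormal assumption.
theory = {"(R)MSE": 0.5, "MAE": 0.0, "MAPE": -1.0, "RMSPE": -1.5}
empirical = {k: (math.log(v) - mu) / sigma ** 2 for k, v in minimizers.items()}

for name in theory:
    print(f"{name:7s} c = {empirical[name]:+.2f}  (derived: {theory[name]:+.1f})")
```

Read this way, minimizing (R)MSE on the log scale and back-transforming gives $c = 0$, which is the MAE-optimal prediction. Any other metric needs its own shift of $c\,\sigma^2$ before applying exp.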
Proofs under https://www.inovex.de/de/blog/honey-i-shrunk-the-target-variable/
Learning 3:
Transforming your target might change the
metric you are actually minimizing!
Transforming the Target Variable
for Fun & Profit
What To Do If Your Metric Is Not Supported?
Imagine you want to optimise for RMSPE and your data has
a lognormal residual distribution, but the ML library you
are using only supports (R)MSE.
One More Time. Instead of doing…
model fit with (R)MSE
1. Fitting a model using (R)MSE as loss/metric
2. Evaluating our predictions with another
metric, e.g. MAD, MAPE, RMSPE
… We Do for Our Use-Case…
transform → model fit with (R)MSE → correction & transform
1. Log transformation
2. Fitting a model using (R)MSE as loss/metric
3. Correction & back-transformation
4. Evaluating our predictions with another
metric, e.g. MAD, MAPE, RMSPE
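The four steps above can be sketched end to end on synthetic data. Everything below is an assumption for illustration: the data-generating process, the one-feature least-squares helper `fit_line` standing in for "a model fitted with (R)MSE", and the RMSPE shift of $-\tfrac{3}{2}\sigma^2$, which follows from lognormal identities rather than from the slide text.

```python
import math
import random

random.seed(3)


def fit_line(x, t):
    """Least-squares (MSE-minimizing) fit of t = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_t = sum(x) / n, sum(t) / n
    b = sum((xi - mean_x) * (ti - mean_t) for xi, ti in zip(x, t)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    return mean_t - b * mean_x, b


def rmspe(actual, predicted):
    """Root mean squared percentage error."""
    return math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Synthetic data: log(price) is linear in one feature plus normal noise.
n = 5_000
x = [random.uniform(0, 10) for _ in range(n)]
price = [math.exp(9.0 - 0.15 * xi + random.gauss(0, 0.5)) for xi in x]

# Baseline: fit on the raw target with MSE.
a_raw, b_raw = fit_line(x, price)
pred_raw = [a_raw + b_raw * xi for xi in x]

# 1. Log transformation
log_price = [math.log(p) for p in price]
# 2. Fit with MSE on the log scale
a, b = fit_line(x, log_price)
resid_var = sum((t - (a + b * xi)) ** 2 for xi, t in zip(x, log_price)) / n
# 3. Correction & back-transformation (shift for RMSPE: -3/2 * sigma^2)
pred_log = [math.exp(a + b * xi) for xi in x]
pred_log_corrected = [math.exp(a + b * xi - 1.5 * resid_var) for xi in x]
# 4. Evaluate with the metric we actually care about (in-sample, for brevity)
scores = {
    "raw target": rmspe(price, pred_raw),
    "log target": rmspe(price, pred_log),
    "log target + correction": rmspe(price, pred_log_corrected),
}
for name, score in scores.items():
    print(f"{name:24s} RMSPE = {score:.3f}")
```

With this setup, RMSPE should drop from the raw-target fit to the log-target fit and drop again once the correction is applied. A real evaluation would use a held-out set instead of in-sample scores.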
Let’s Apply This In Our Use-Case
Improvements over the raw target when using a log transformation & correction
and evaluating the final prediction under a given metric, e.g. MAPE, …
(negative numbers mean improvement).
In the case of the Kaggle competition, the
transformation was key for winning.
Want to know more?
blog.inovex.de
https://www.inovex.de/de/blog/honey-i-shrunk-the-target-variable/
Thank you!
Florian Wilhelm
Head of Data Science
inovex GmbH
Schanzenstraße 6-20
Kupferhütte 1.13
51063 Köln
florian.wilhelm@inovex.de
Linear Models
&
Normal Distribution
Recap: Linear Model
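$$y_i = \underbrace{\sum_j w_j\,\phi_j(x_i)}_{\text{true latent (unknown) outcome}} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$
› $x_i$: raw features
› $\phi_j$: (non-linear) functions, feature engineering
› $w_j$: weights to fit
› $\varepsilon_i$: noise, following a normal distribution
› $i$: index over observations/samples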
Cathedral Distribution
Linear model with a single, binary feature variable x and random noise.
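A quick simulation shows why the target of such a model need not look normal at all. This is a sketch with made-up numbers (effect size 10, unit noise). It uses excess kurtosis as a rough normality indicator: about 0 for a normal distribution, strongly negative for a two-peaked one.

```python
import random

random.seed(4)

n = 10_000
x = [random.randint(0, 1) for _ in range(n)]      # single binary feature
y = [10.0 * xi + random.gauss(0, 1) for xi in x]  # linear model plus normal noise


def excess_kurtosis(values):
    """Sample excess kurtosis: about 0 for normal data, negative for bimodal data."""
    m = sum(values) / len(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m4 = sum((v - m) ** 4 for v in values) / len(values)
    return m4 / m2 ** 2 - 3.0


# Least-squares fit of y = a + b * x
mean_x, mean_y = sum(x) / n, sum(y) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"target:    excess kurtosis = {excess_kurtosis(y):+.2f}")
print(f"residuals: excess kurtosis = {excess_kurtosis(residuals):+.2f}")
```

The target has two separate peaks, one per feature value, while the residuals of the fitted model are close to normal.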
Appendix Learning:
The residuals of a linear model should be
normally distributed, not the target variable.

