Retail Demand Forecasting with Machine Learning
Ronald P. (Ron) Menich
mlconf NYC 27 Mar 2015
GO, TEAM!
▪ Syrine Besbes
▪ Wafa Hwess
▪ Rihab Ben Aicha
▪ Abhijit Oka
▪ Mark Tabladillo
▪ Ahmed Yassine Khaili
▪ Nikolaos Vasiloglou
▪ Eugene Kamarchik
▪ Kurt Stirewalt
▪ Andy Dean
▪ Firas Aloui
▪ Molham Aref
▪ Rafael Gonzalez-Coloni
Forgive me if I’ve missed someone
PREDICTIX’ CORE RETAIL DECISION SUPPORT OFFERINGS
▪ Planning
  ▪ Assortment Planning
  ▪ Merchandise Financial Planning
  ▪ Item Planning
▪ Forecasting
  ▪ Machine-learning models
  ▪ All demand drivers
    ▪ Internal (promo, price, etc.)
    ▪ External (weather, competition, events, etc.)
▪ Supply Chain Optimization
  ▪ Network flow optimization
  ▪ Optimize for profit
GETTING DEMAND FORECASTING RIGHT TRANSLATES TO $$$
▪ Size of the problem
  ▪ 62 billion weekly forecasts (150K active SKUs × 8,000 stores × 52 weeks)
  ▪ Many TBs of data
  ▪ 3,000 computing cores elastically provisioned
▪ Forecast accuracy
  ▪ Measured 25% to 50% reduction in MAPE
  ▪ The harder the problem, the better the improvement
  ▪ Measured reduction of bias in forecasts
▪ Benefits
  ▪ $125M from inventory reductions alone
  ▪ 20% ongoing benefit
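As a quick check of the problem size, and as a reminder of what MAPE measures, here is a minimal Python sketch. The SKU, store, and week counts come from the slide. The `mape` helper, its handling of zero actuals, and the sample numbers are assumptions made for illustration, not figures from the talk.

```python
# Problem size from the slide: active SKUs x stores x weeks.
active_skus = 150_000
stores = 8_000
weeks = 52
weekly_forecasts = active_skus * stores * weeks  # 62,400,000,000


def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent.

    Periods with zero actual demand are skipped because the percentage
    error is undefined there (an assumption of this sketch).
    """
    errors = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts) if a != 0]
    if not errors:
        raise ValueError("MAPE is undefined when every actual is zero")
    return 100.0 * sum(errors) / len(errors)


# Hypothetical numbers, for illustration only.
actuals = [100, 80, 0, 50]
baseline = [120, 60, 5, 70]
improved = [110, 70, 2, 60]

print(f"{weekly_forecasts:,} weekly forecasts")
print(f"baseline MAPE: {mape(actuals, baseline):.1f}%")
print(f"improved MAPE: {mape(actuals, improved):.1f}%")
```

Skipping zero-demand periods is one common workaround. It is also one reason MAPE is awkward for sparse retail data, where most observations are zero.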
IN THE BEGINNING, DEMAND FORECASTING SEEMED SIMPLE...
Time-series forecasting
…BUT THEN EVER GREATER COMPLEXITY AROSE
A Last year’s sales
B Manual partitioning of data, different TS models for different partitions
C Croston’s for sparse, Winters for dense
D Forecast at aggregate levels, spread down
J if/then/else assignment of different TS algorithms
...
N Have user manually map a new SKU to an existing one
...
O Have user manually inject local market knowledge
L Linear regression for promotions
Alarm Clock: Demand forecasts. But are they really “simple”?
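For context on item C, here is a minimal sketch of Croston's method for sparse (intermittent) demand. It smooths the non-zero demand sizes and the intervals between them separately, then forecasts their ratio. The function name, the default `alpha`, and the initialization from the first non-zero demand are choices made for this sketch. Implementations differ on these details, and this is not code from the talk.

```python
def croston(demand, alpha=0.1):
    """Return Croston's per-period demand-rate forecast after the last period.

    demand : sequence of non-negative per-period demand values
    alpha  : smoothing constant in (0, 1]
    """
    size = None        # smoothed size of non-zero demands
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand

    for d in demand:
        if d > 0:
            if size is None:
                # Initialize from the first non-zero observation.
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1

    if size is None:
        return 0.0  # no demand ever observed
    return size / interval


# A regular pattern: 4 units every third period gives a rate of 4/3 per period.
rate = croston([0, 0, 4, 0, 0, 4, 0, 0, 4])
print(rate)
```

Note that the method only updates when demand occurs, so it says nothing about why the zero periods were zero.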
…AND SO NOW WE ASK THE QUESTION
A Last year’s sales
B Manual partitioning of data, different TS models for different partitions
C Croston’s for sparse demand, Winters for dense
D Forecast at different hierarchical levels, spread down
J Automated if/then/else assignment of different TS algorithms
...
N Have user manually map a new SKU to an existing one
...
O Have user manually inject local market knowledge
L Linear regression for promo
Alarm Clock: Demand forecasts. But are they really “simple”?
REALLY?
Machine learning can provide a modern, simpler, theoretically sound and more extensible alternative for retail demand forecasting
CAUSAL FACTORS DRIVE RETAIL DEMAND
How much additional demand was generated for Post Cereals because these were on promotion?
How much does the $4 in-store coupon contribute to the total uplift?
Does the table highlighting the $1.50 coupon and the final offer price drive any additional uplift?
Competition
Weather
SO AN ATTRIBUTE-BASED FORECASTING APPROACH IS APT
Inputs include:
• Product Attributes (including text descriptions, e.g. reviews)
• Hierarchies
• Competitor Data
• Promotions
• Pricing
• Display
• Store Attributes
• Local events
• Weather
• Customer data
• ...
CLOUD ELASTICITY
Machine Learning:
• 2-way interactions
• 3-way
• 4-way
Predictive Analytics
What If on price/promo/display changes
Demand Forecasts
▪ Basic products
▪ New products
▪ Short lifecycle
▪ Customer specific
▪ ...
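One way to picture the attribute-based inputs is as a sparse feature vector per sales observation, with one indicator column for each categorical attribute value and one column for each numeric attribute. The sketch below is a minimal illustration. The `encode` helper and the attribute names and values are hypothetical, not taken from the talk.

```python
def encode(observation, feature_index):
    """Map an attribute dict to a sparse {column: value} feature vector.

    Categorical (and boolean) attributes get one indicator column per
    (name, value) pair; numeric attributes keep their value in a single
    column per name. New columns are added to feature_index on first use.
    """
    x = {}
    for name, value in observation.items():
        is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_numeric:
            key, val = name, float(value)
        else:
            key, val = f"{name}={value}", 1.0
        col = feature_index.setdefault(key, len(feature_index))
        x[col] = val
    return x


# Hypothetical observations, for illustration only.
feature_index = {}
x1 = encode(
    {"sku": "cereal-123", "store": "atl-07", "on_promo": True,
     "price": 2.99, "weather": "rain"},
    feature_index,
)
x2 = encode(
    {"sku": "cereal-123", "store": "nyc-01", "on_promo": False,
     "price": 3.49, "weather": "sun"},
    feature_index,
)
print(x1)
print(x2)
```

Because a new SKU shares attribute columns (brand, category, price band, and so on) with existing SKUs, an encoding like this lets a model say something about products it has never seen sold.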
POSSIBLE SUPERVISED LEARNING MODELS
Random forests
Restricted Boltzmann machines
Deep learning
We chose factorization machines for several reasons
● Linear regression heritage of market mix modeling
● SGD/online suitability for handling large data sets
● Trend can be modeled
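To make the "linear regression heritage" concrete, here is a minimal sketch of the standard second-order factorization machine prediction: a global bias, a linear term, and pairwise interactions whose weights are inner products of per-feature latent vectors. The function and parameter names and the toy numbers are illustrative. The talk does not show its own implementation, and the higher-order (3-way, 4-way) interactions it mentions are not covered here.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction for a sparse vector x.

    x  : dict mapping column index -> feature value
    w0 : global bias
    w  : list of per-feature linear weights
    V  : list of per-feature latent factor vectors, each of length k

    The pairwise term sum_{i<j} <V[i], V[j]> * x_i * x_j is computed with the
    identity 0.5 * sum_f ((sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2),
    which is linear in the number of non-zero features.
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Toy parameters: three features, two latent factors.
w0 = 1.0
w = [0.5, 0.0, -0.2]
V = [[1.0, 2.0], [0.0, 0.0], [3.0, 0.5]]
x = {0: 1.0, 2: 1.0}  # features 0 and 2 are active

# bias 1.0 + linear (0.5 - 0.2) + interaction <V[0], V[2]> = 3.0 + 1.0
y_hat = fm_predict(x, w0, w, V)
print(y_hat)
```

With the latent vectors set to zero the model reduces to ordinary linear regression, and every parameter has a simple gradient, which is what makes SGD and online training practical.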
ZERO-FILLING --- KNOWING WHY DEMAND DID AND DIDN’T OCCUR AND WHEN
● Unlike product recommender systems, retail forecasting must predict the timing of when demand will happen (not just the rating whenever it happens)
● An observation of sales might have a (sku, store, day) primary key
  ○ Was the product on the shelf and available to be sold?
  ○ How much was sold, if any?
● In many retail contexts, the vast majority of observations have zero sales
  ○ Recent example: zero-sales observations account for >97.5% of the training set
  ○ It is important to know why demand was zero
Extreme case: demand only occurs when there’s a discount
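The zero-filling idea can be sketched as follows: point-of-sale data usually records only the days on which something sold, so the training set has to be expanded with an explicit zero for every (sku, store, day) on which the product was available but did not sell. The `zero_fill` helper, the availability input, and the sample data are assumptions for illustration. The talk does not describe its data pipeline.

```python
from datetime import date, timedelta


def zero_fill(sales, availability):
    """Expand sparse sales records into one row per available (sku, store, day).

    sales        : dict {(sku, store, day): units_sold}, non-zero sales only
    availability : iterable of (sku, store, day) keys where the product was
                   on the shelf and could have been sold
    Returns a list of (sku, store, day, units) rows with explicit zeros.
    Days on which the product was unavailable are left out entirely, since
    zero sales on those days say nothing about demand.
    """
    return [
        (sku, store, day, sales.get((sku, store, day), 0))
        for sku, store, day in availability
    ]


# Hypothetical data: five days, product off the shelf on 3 March.
days = [date(2015, 3, 1) + timedelta(days=i) for i in range(5)]
availability = [("sku-1", "store-1", d) for d in days if d != date(2015, 3, 3)]
sales = {("sku-1", "store-1", date(2015, 3, 2)): 3}

rows = zero_fill(sales, availability)
for row in rows:
    print(row)
```

Joining promotion, price, and weather attributes onto each zero row is what lets a model learn why demand was zero, for example that a product only sells when discounted.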
EXAMPLE FORECASTS - TOYS
[Figure: example forecasts for toys, with the training set and test set marked]
EXAMPLE FORECASTS - SEASONAL GROCERY ITEM
[Figure: example forecasts for a seasonal grocery item. Training on the left and middle; one month of holdout / test at the very right]
EXAMPLE FORECASTS - QUICK SERVICE RESTAURANT
For very dense data (few zeros), almost unbiased forecasts with WAPE values below 12.5% can be achieved
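For reference, WAPE (weighted absolute percentage error) divides total absolute error by total actual demand, so it stays well defined when individual periods have zero sales. The sketch below pairs it with a simple bias measure. The bias definition used here (total forecast minus total actual, as a share of total actual) is one common convention and an assumption, since the talk does not define it. The sample numbers are hypothetical.

```python
def wape(actuals, forecasts):
    """Weighted absolute percentage error, in percent."""
    total = sum(abs(a) for a in actuals)
    if total == 0:
        raise ValueError("WAPE is undefined when total actual demand is zero")
    abs_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return 100.0 * abs_error / total


def bias(actuals, forecasts):
    """Signed forecast bias, in percent; positive means over-forecasting."""
    total = sum(actuals)
    if total == 0:
        raise ValueError("bias is undefined when total actual demand is zero")
    return 100.0 * (sum(forecasts) - total) / total


# Hypothetical dense series, for illustration only.
actuals = [100, 120, 90, 110]
forecasts = [95, 130, 85, 112]

print(f"WAPE: {wape(actuals, forecasts):.2f}%")
print(f"bias: {bias(actuals, forecasts):+.2f}%")
```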
NEW SKUS CAN READILY BE FORECASTED
REPLACEMENT SKUS CAN BE READILY FORECASTED
CHALLENGES / ONGOING WORK
● Zero-filling / training set cardinality control using weighted least squares
● Global effects and 2-way interactions are easily trainable, but 3-way and higher-order interactions require judicious feature engineering
● Parallel learning / consensus of learners
● Visualization / explanation of hidden factors used for interaction modeling
● Automated pruning of non-important attributes
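The first bullet names weighted least squares as a way to control training set cardinality but gives no details. One plausible reading, sketched below purely as an assumption, is to keep every non-zero observation, keep only a random fraction of the zero-sales observations, and weight each kept zero by the inverse of its sampling probability so the weighted squared-error loss still estimates the full-data loss. The function names, the sampling scheme, and the sample data are hypothetical.

```python
import random


def downsample_zeros(rows, keep_prob, seed=0):
    """Keep every non-zero row (weight 1) and a random fraction of zero rows.

    rows      : iterable of (features, units) pairs
    keep_prob : probability of keeping each zero-sales row, 0 < keep_prob <= 1
    Each kept zero row gets weight 1 / keep_prob, so the weighted sum of
    squared errors is an unbiased estimate of the sum over all rows.
    """
    if not 0 < keep_prob <= 1:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = random.Random(seed)
    weighted = []
    for features, units in rows:
        if units != 0:
            weighted.append((features, units, 1.0))
        elif rng.random() < keep_prob:
            weighted.append((features, units, 1.0 / keep_prob))
    return weighted


def weighted_sse(weighted_rows, predict):
    """Weighted sum of squared errors for a prediction function."""
    return sum(w * (units - predict(features)) ** 2
               for features, units, w in weighted_rows)


# Hypothetical data: 1,000 zero-sales rows and one non-zero row.
rows = [({"i": i}, 0) for i in range(1000)] + [({"i": -1}, 5)]
sample = downsample_zeros(rows, keep_prob=0.25)

print(len(rows), "rows reduced to", len(sample))
print(weighted_sse(sample, lambda features: 0.0))
```

With `keep_prob=1` the sample is the full data set with unit weights, so the scheme degrades gracefully to ordinary least squares.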
THANK YOU.
Ronald Menich, Chief Data Scientist, Predictix, LLC at MLconf NYC
