Business Forecasting in Real Life
Ganes Kesari and Thanoj Kattamanchi
Nov 2021
BUSINESS APPLICATIONS
DEEP DIVE: PRICE FORECASTING
CHALLENGES AND LEARNINGS
BUSINESS APPLICATIONS
80% of analytics insights will not deliver
business outcomes through 2022.
- Gartner
Source: Gartner, 2019
The biggest challenge for data scientists
is not mastering the techniques…
…IT IS IN THE RIGHT APPLICATION OF TECHNIQUES TO PROBLEMS
Approach to data: example insight
• Insights as recommendations: What actions will help me convert my detractors into promoters?
• Machine learning models: What will be my promoter score next quarter?
• Statistics, explorations: What led to lower satisfaction in North America?
• Simple summaries: Did I improve on customer satisfaction?
ABOUT A QUARTER OF OUR CLIENTS ASK FOR FORECASTING
This is a portfolio of some of the analytics work we’ve done. The forecasting projects are highlighted.
Customer Feedback Analysis
TV Show Content Analysis
Earnings Transcript Analysis
What do parties focus on?
Duplicate Candidate Names
TV Show Drivers
Securities Clustering
Geo-demographic segments
Advertiser Clustering
Energy Fraud Detection
Directorship Network Analysis for Fraud
Restaurant Sales Correlation
Restaurant Revenue Analysis
Telecom Customer Clusters
Sentiment Carry-Forward
Brand Association Mapping
TV Viewership Forecasting
Budget Forecasting
Impact of Age and Number of Contestants on Votes
Price Forecasting
Telecom Churn Prediction
Server Demand & Capacity
Restaurant Sales Forecasting
Bank Disbursement Forecast
Retail Sales Forecasting
Bot forecasting
Cargo Delay Optimization
Service Request Workflow
Poultry Mortality Drivers
Student Performance Drivers
Factors Affecting Attrition
Predicting Job Failure
Factors Driving Performance
Route Optimization
Product Recommendation
Descriptive Diagnostic Predictive Prescriptive
BUDGET FORECASTING FOR THE INDIAN MINISTRY OF FINANCE
Problem: A national government wanted to forecast its annual spend and deficit at the lowest level of granularity: individual accounts in each department.
Approach: Only 3 years of annual historical data were available, so simpler forecasting techniques were used. Regression-based techniques delivered reliable estimates.
Outcome: Multi-level forecasts of budgeted and actual expenditure helped government officials plan outlays efficiently.
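With only three annual data points, a straight-line trend fitted by least squares is about as much structure as the data can support. The sketch below illustrates that idea under stated assumptions: the deck does not say which regression specification was used, the account names and amounts are hypothetical, and the "multi-level" roll-up is assumed to be a simple bottom-up sum of account forecasts into department totals.

```python
from collections import defaultdict

def linear_trend_forecast(history, steps_ahead=1):
    """Fit y = a + b*t by ordinary least squares (t = 0, 1, 2, ...) and extrapolate."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)

def bottom_up(accounts):
    """Forecast each account, then sum the forecasts into department totals.

    `accounts` maps (department, account) -> list of annual values (hypothetical layout).
    """
    account_fc = {key: linear_trend_forecast(vals) for key, vals in accounts.items()}
    dept_fc = defaultdict(float)
    for (dept, _account), fc in account_fc.items():
        dept_fc[dept] += fc
    return account_fc, dict(dept_fc)

# Hypothetical three years of spend per account (arbitrary units).
accounts = {
    ("Health", "Salaries"): [100.0, 110.0, 120.0],
    ("Health", "Equipment"): [40.0, 38.0, 45.0],
    ("Education", "Grants"): [200.0, 210.0, 215.0],
}
account_fc, dept_fc = bottom_up(accounts)
print(account_fc)
print(dept_fc)
```

Summing account-level forecasts keeps the department totals consistent with the detail by construction, which is one simple way to deliver forecasts at several levels at once.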
BUDGET FORECASTING FOR THE INDIAN MINISTRY OF FINANCE
Source: Gramener – Budget planning & Cash management
DEMAND & CAPACITY MANAGEMENT OF SERVERS FOR A LEADING PHARMA COMPANY
Problem: A leading pharma company wanted to plan IT server capacity (CPU, RAM, disk space) by forecasting server utilization at a daily level. This would help infrastructure operations plan consumption needs and the procurement of server resources.
Approach: Weekly and monthly utilization for ~15,000 servers was forecast using a variety of time series models. Servers were classified, and Smart Assistance Tags were placed to monitor server performance and utilization.
Outcome: CPU, RAM and disk utilization were forecast with 80%+ accuracy, much more accurate than the client's existing methods. A custom application visualized the utilization forecasts and the continuous tag-based monitoring, providing higher visibility.
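The deck does not name the models used for the ~15,000 servers. As one illustrative possibility, the sketch below applies simple exponential smoothing to each server's utilization history and flags servers whose forecast crosses a capacity threshold. Simple exponential smoothing is a swapped-in technique, not necessarily one the team used; the smoothing weight `alpha`, the 80% threshold and the server data are all hypothetical.

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * level.

    The final level is the flat forecast for all future periods.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def flag_servers(utilization, threshold=80.0, alpha=0.3):
    """Return {server: forecast} for servers whose forecast utilization (%) exceeds threshold."""
    flagged = {}
    for server, history in utilization.items():
        fc = ses_forecast(history, alpha)
        if fc > threshold:
            flagged[server] = round(fc, 1)
    return flagged

# Hypothetical weekly CPU utilization (%) for three servers.
cpu = {
    "srv-001": [55, 58, 60, 61, 63, 62],
    "srv-002": [78, 82, 85, 88, 90, 93],
    "srv-003": [20, 22, 19, 21, 20, 22],
}
print(flag_servers(cpu))
```

A flat smoothed level reacts slowly to steady growth (srv-002's last reading is 93%, but its forecast is about 87.5%); a trend-aware method such as Holt's linear smoothing would be a natural next step for capacity planning.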
SERVER FAILURE PREDICTION: PROCESS & OUTCOME
PRICE FORECASTING FOR AN ASIAN AGRICULTURAL ENTERPRISE
Problem: A leading agricultural enterprise wanted price forecasts for its products in order to time inventory release and optimize revenue. Incorrect timing was leading either to lost revenue or to unsold inventory.
Approach: Gramener applied a suite of price forecasting models based on internal and external factors. The models were evaluated on multiple test datasets, and the one that minimized the median absolute deviation was selected.
Outcome: The model forecast the price with 88% accuracy. Within the first quarter of deployment, the revenue uplift attributable directly to pricing was +3.2%.
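Selecting the model that minimizes median absolute deviation across several test datasets can be sketched as follows. Here "median absolute deviation" is taken to mean the median of |actual - forecast| within each test set, and the per-set medians are averaged to rank models. Both choices, the model names and the numbers are assumptions for illustration.

```python
from statistics import mean, median

def median_abs_deviation(actual, forecast):
    """Median of absolute forecast errors on one test set."""
    return median(abs(a - f) for a, f in zip(actual, forecast))

def select_model(test_sets):
    """Pick the model with the lowest average median absolute deviation.

    `test_sets` is a list of dicts: {"actual": [...], "forecasts": {model_name: [...]}}.
    """
    scores = {}
    for ts in test_sets:
        for name, fc in ts["forecasts"].items():
            scores.setdefault(name, []).append(median_abs_deviation(ts["actual"], fc))
    avg = {name: mean(vals) for name, vals in scores.items()}
    best = min(avg, key=avg.get)
    return best, avg

# Two hypothetical test windows with forecasts from two candidate models.
test_sets = [
    {"actual": [100, 102, 101, 105],
     "forecasts": {"model_a": [99, 103, 100, 104], "model_b": [95, 108, 100, 110]}},
    {"actual": [110, 108, 112, 115],
     "forecasts": {"model_a": [111, 107, 113, 113], "model_b": [110, 109, 111, 116]}},
]
best, scores = select_model(test_sets)
print(best, scores)
```

A median-based score is less sensitive to a few extreme misses than the mean absolute error, and scoring on several windows guards against picking a model that only happened to fit one period.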
A COMPARISON OF PRICE FORECAST ACCURACY OF PURE MODELS
DEEP DIVE: PRICE FORECASTING
PREDICT THE PRICE OF VISCOSE FIBER
1. UNDERSTAND THE DOMAIN
2. EXPLORE THE DATA
3. SELECT VARIABLES
4. FORECAST RESULTS
5. ITERATE & DEPLOY
WHAT’S VISCOSE? A QUICK OVERVIEW
Wood → Dissolving Pulp → Viscose (Dissolving Pulp + Caustic Soda + Coal) → Hygiene products like diapers, Yarn, <????>
Viscose is manufactured in 3 varieties, based on purity.
Most of the client’s buyers were from China.
FACTORS THAT INFLUENCE VISCOSE PRICE
• Demand
• Supply
• Substitutes
• Competition
• Market
2. EXPLORE THE DATA
AROUND 80 VARIABLES WERE AVAILABLE TO FORECAST VISCOSE PRICE
Daily stock prices/indices, substitute and downstream prices:
• Shanghai & Shenzhen stock indices
• Competitor share prices
• Cotlook, Cotton, WTI & Crude Oil indices
• USD to RMB, IDR, TRY, BRL, EUR & PKR currency exchange rates
• CLP delivery & VFY prices, DWP Index, Hardwood and Softwood import prices
• Viscose median prices and list prices of competitors’ products
• Viscose competitor prices of other purities
• 32s cotton and polyester yarn prices
• 30s rayon yarn price, knitting & weaving prices, open end prices and MVS prices
• 40s rayon, SIRO, MVS prices & compact Siro prices
• 60s airjet rayon price, China grey fabric prices & China cotton spot prices
• …
Daily order intake, inventory & raw materials:
• Viscose order intake volume
• Viscose total order intake volumes
• Physical inventory
• Log inventory
• Coal daily prices
• …
Weekly operating rates and inventory variables:
• Operating rate
• Inventory levels
• Rayon operating rate
• Rayon yarn inventory
• …
Monthly raw material prices:
• China paper pulp rates
• Wuhu & Jiangxi prices of caustic soda
• SJX and L&M list prices of caustic soda
• …
Quarterly economic indicators:
• China nominal GDP
• China GDP constant price, cumulative prices, QoQ and YoY prices
• …
HERE’S WHAT THE DATA LOOKED LIKE
Data scope:
- Viscose price data available from Jan 2014 till August 2017 (~3.5 Years)
- Forecasting was done at a weekly level
- Aggregation/Disaggregation was done to a weekly level for all variables
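Bringing daily, monthly and quarterly inputs onto one weekly grid can be sketched with the standard library alone. The rules below are assumptions, since the deck does not say how each variable was aggregated: daily values are averaged within each Monday-to-Sunday week, and lower-frequency values (monthly, quarterly) are carried forward to every week that starts on or after the date they carry. The dates and values are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean

def week_start(d):
    """Monday of the week containing date d."""
    return d - timedelta(days=d.weekday())

def aggregate_daily_to_weekly(daily):
    """Average daily observations {date: value} within each week (keyed by Monday)."""
    buckets = {}
    for d, v in daily.items():
        buckets.setdefault(week_start(d), []).append(v)
    return {wk: mean(vals) for wk, vals in sorted(buckets.items())}

def disaggregate_to_weekly(sparse, weeks):
    """Carry each lower-frequency value {date: value} forward onto the weekly grid."""
    points = sorted(sparse.items())
    out, i, current = {}, 0, None
    for wk in weeks:
        # Advance to the latest observation dated on or before this week's Monday.
        while i < len(points) and points[i][0] <= wk:
            current = points[i][1]
            i += 1
        out[wk] = current  # None until the first observation appears
    return out

# Hypothetical inputs: a daily coal price and a monthly pulp price.
daily_coal = {date(2017, 7, 3) + timedelta(days=k): 600 + k for k in range(14)}
monthly_pulp = {date(2017, 7, 1): 5200, date(2017, 8, 1): 5300}

weekly_coal = aggregate_daily_to_weekly(daily_coal)
weekly_pulp = disaggregate_to_weekly(monthly_pulp, list(weekly_coal))
print(weekly_coal)
print(weekly_pulp)
```

Carrying values forward, rather than interpolating between releases, avoids leaking a future monthly or quarterly figure into earlier weeks.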
Missing values:
Output variable - Viscose price
- Only 1% of the values were missing
Input variables
- Checked for coverage of the entire time period
- Checked for the extent of missing values. ~30 variables had a high share of missing values and were dropped
- For the rest, imputation techniques such as back fill, forward fill,
neighborhood averages were applied
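The missing-value steps can be sketched as below: drop variables whose share of missing values exceeds a cut-off, then fill the remaining gaps. The 30% cut-off is a hypothetical value (the deck only says "high missing%"), and the order of the fills is an assumption: interior gaps get the average of their nearest observed neighbours, and gaps left at the start or end of a series are back-filled or forward-filled.

```python
def missing_share(series):
    """Fraction of entries that are None."""
    return sum(v is None for v in series) / len(series)

def impute(series):
    """Neighborhood average for interior gaps, then back fill and forward fill at the edges."""
    vals = list(series)
    known = [i for i, v in enumerate(vals) if v is not None]
    if not known:
        return vals
    for i, v in enumerate(vals):
        if v is None:
            prev = max((k for k in known if k < i), default=None)
            nxt = min((k for k in known if k > i), default=None)
            if prev is not None and nxt is not None:
                vals[i] = (vals[prev] + vals[nxt]) / 2  # neighborhood average
    # Back fill leading gaps from the first observation.
    first = known[0]
    for i in range(first):
        vals[i] = vals[first]
    # Forward fill trailing gaps from the last observation.
    last = known[-1]
    for i in range(last + 1, len(vals)):
        vals[i] = vals[last]
    return vals

def clean(variables, max_missing=0.3):
    """Drop variables with too many gaps; impute the rest."""
    return {name: impute(s) for name, s in variables.items() if missing_share(s) <= max_missing}

# Hypothetical weekly series with gaps (None).
variables = {
    "cotton_index": [None, 15.2, None, 15.6, 15.8, 16.0, 16.1, 16.3, 16.2, 16.4, None],
    "sparse_series": [None, None, 3.1, None, None, None],
}
print(clean(variables))
```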
WHAT ARE THE COMMON COMPONENTS IN TIME SERIES?
Source: https://www.otexts.org/fpp/6/1
Panels: Seasonality · Trend · Trend and Seasonality · Randomness
Or we could have combinations of these.
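A classical additive decomposition separates a series into these components. A centred moving average estimates the trend, averaging the detrended values at each position in the cycle estimates the seasonality, and what is left is the random remainder. The sketch assumes an odd seasonal period for simplicity (an even period needs a 2×m centred average), and the example series is synthetic.

```python
def decompose_additive(y, period):
    """Classical additive decomposition y = trend + seasonal + remainder (odd period only)."""
    assert period % 2 == 1, "this sketch handles odd periods only"
    n, half = len(y), period // 2
    # Trend: centred moving average; undefined (None) near the ends.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(y[t - half:t + half + 1]) / period
    # Seasonal: average detrended value at each position in the cycle, centred to sum to zero.
    by_pos = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_pos[t % period].append(y[t] - trend[t])
    raw = [sum(v) / len(v) for v in by_pos]
    offset = sum(raw) / period
    season = [s - offset for s in raw]
    seasonal = [season[t % period] for t in range(n)]
    remainder = [None if trend[t] is None else y[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, remainder

# Synthetic series: linear trend plus a repeating 3-step seasonal pattern.
pattern = [2.0, -1.0, -1.0]
y = [0.5 * t + pattern[t % 3] for t in range(24)]
trend, seasonal, remainder = decompose_additive(y, period=3)
print([round(s, 3) for s in seasonal[:3]])
```

Because the synthetic series has no noise, the recovered seasonal pattern matches `pattern` and the remainder is essentially zero; on real prices the remainder carries the randomness shown in the last panel.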
3. SELECT VARIABLES
CAUSALITY CHECKS WHETHER A VARIABLE AFFECTS VISCOSE PRICE
Perform a Granger causality test to determine whether one time series can help forecast another.
• 1 variable clearly affects the viscose price.
This is included in the model.
• 17 variables may affect the viscose price, or may be affected by it. Both are possible.
These are included in the model.
• 8 variables are indicators, not time series: their values are not available at a daily or weekly level.
These are included in the model.
• 5 variables are affected by the viscose price. We can use the viscose price to predict them, but not the other way round.
These are not included in the model.
• 15 variables show no causal relation with the viscose price.
These are not included in the model.
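A Granger test compares two regressions for the target series: one using only its own lags, and one that also adds lags of the candidate variable. If the extra lags reduce the residual sum of squares by more than chance would explain (an F-test), the candidate is said to Granger-cause the target. The sketch below implements this with ordinary least squares in plain Python and runs it in both directions to produce the four outcomes used on this slide. The lag order of 2 and the critical value of 3.0 (roughly the 5% point of an F distribution with 2 and a few hundred degrees of freedom) are assumptions; statsmodels' `grangercausalitytests` provides a tested implementation with p-values.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * t for r, t in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(X, y))

def granger_f(x, y, p=2):
    """F-statistic for the hypothesis 'x Granger-causes y' with p lags of each series."""
    restricted, unrestricted, target = [], [], []
    for t in range(p, len(y)):
        y_lags = [y[t - i] for i in range(1, p + 1)]
        x_lags = [x[t - i] for i in range(1, p + 1)]
        restricted.append([1.0] + y_lags)             # own lags only
        unrestricted.append([1.0] + y_lags + x_lags)  # own lags plus lags of x
        target.append(y[t])
    n = len(target)
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    return ((rss_r - rss_u) / p) / (rss_u / (n - 2 * p - 1))

def classify(candidate, price, p=2, f_crit=3.0):
    """Map test results in both directions to the slide's four categories."""
    drives_price = granger_f(candidate, price, p) > f_crit
    driven_by_price = granger_f(price, candidate, p) > f_crit
    if drives_price and driven_by_price:
        return "affect each other"
    if drives_price:
        return "affects viscose"
    if driven_by_price:
        return "affected by viscose"
    return "no causality"

# Synthetic check: y follows x with a one-step lag, x is pure noise.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, 300)]
print(round(granger_f(x, y), 1), round(granger_f(y, x), 2))
```

Granger causality only means "helps forecast"; it does not establish a physical cause, which is why the slide's middle category allows influence in both directions.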
CORRELATIONS CHECK ASSOCIATIONS WITH VISCOSE PRICES
Compute the correlation between each variable and the viscose price to decide its usefulness.
• 18 variables have a high correlation (> 60%).
But 3 of these are lagging indicators of the viscose price and are not included.
The remaining 15 are included in the model.
• 5 variables have a medium correlation (30 to 60%).
But 2 of these are lagging indicators of the viscose price and are not included.
The remaining 3 are included in the model.
• 23 variables have a low correlation (< 30%).
These are not included in the model.
• No variables have a negative correlation.
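The correlation screen can be sketched as a Pearson correlation followed by the slide's cut-offs (above 0.6 high, 0.3 to 0.6 medium, below 0.3 low). The deck does not state which correlation coefficient was used or whether lagged values were aligned first. Pearson correlation on same-week values is an assumption here, and the series are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def correlation_bucket(r):
    """Bucket a correlation using the slide's thresholds (negative values fall into Low)."""
    if r > 0.6:
        return "High"
    if r >= 0.3:
        return "Medium"
    return "Low"

price = [15.0, 15.2, 15.1, 15.5, 15.9, 15.8]
candidates = {
    "yarn_index": [30.1, 30.4, 30.3, 30.9, 31.6, 31.5],
    "fx_rate": [6.8, 6.7, 6.9, 6.8, 6.7, 6.9],
}
for name, series in candidates.items():
    r = pearson(series, price)
    print(name, round(r, 2), correlation_bucket(r))
```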
26 OF THE 46 VARIABLES HELP WITH FORECASTING
# | Category | Variable | Causality | Correlation
1 | Downstream Substitute prices | 32s polyester yarn index | Affects viscose | Medium
2 | Competitor Prices | <Competitor 1> list price | Affect each other | High
3 | Competitor Prices | <Competitor 2> list price | Affect each other | High
4 | Competitor Prices | <Competitor 3> price | Affect each other | High
5 | Competitor Prices | <Competitor 4> list price | Affect each other | High
6 | Competitor Prices | <Competitor 5> list price | Affect each other | High
7 | Competitor Prices | <Competitor 6> list price | Affect each other | High
8 | Downstream Substitute prices | 30s rayon yarn index | Affect each other | High
9 | Substitute Indices | Cotlook A cotton index | Affect each other | High
10 | Substitute Indices | Cotton future index | Affect each other | High
11 | Feedstock price | China cotton spot price index | Affect each other | High
12 | Feedstock price | Cotton Linter Ex-work Price | Affect each other | High
13 | Feedstock price | Import Hardwood DP index | Affect each other | High
14 | Downstream Substitute prices | 32s cotton yarn index | Affect each other | High
15 | Further Downstream Prices | China Grey Fabric Cotton 32S weaving | Affect each other | High
16 | Further Downstream Prices | China Grey Fabric Rayon 30S weaving | Affect each other | High
17 | Substitute Prices | CLP delivery Price | Affect each other | Medium
18 | Substitute Indices | China cotton spot price index | Affect each other | Medium
19 | Economic variables | China Nominal GDP (Billion USD) | NA | Low
20 | Economic variables | China GDP Constant Price Cumulative | NA | Low
21 | Economic variables | China GDP Constant Price YoY | NA | Low
22 | Economic variables | China GDP Constant Price QoQ | NA | Low
23 | Economic variables | China GDP Constant Price | NA | Low
24 | Raw Material Prices | SJX Caustic Soda list price | NA | Low
25 | Raw Material Prices | Wuhu Caustic Soda list price | NA | Low
26 | Raw Material Prices | China Paper Pulp price index | NA | Low
We evaluated each variable on 2 parameters:
1. Causality. Does the variable cause a change in price? Or do they potentially cause changes in each other? Or are they unrelated?
2. Correlation. When the variable rises, does the price also rise? By how much?
• Likely influencers: 16 (strong)
• May be influencers: 2 (moderate)
• Indicators: 8 (unknowns)
THE REMAINING 20 VARIABLES WILL WORSEN THE MODEL
# | Category | Variable | Causality | Correlation
27 | Competitor Prices | <Competitor 7> list price | Affected by Viscose | High
28 | Competitor Prices | <Competitor 8> list price | Affected by Viscose | High
29 | Competitor Prices | <Competitor 9> list price | Affected by Viscose | High
30 | Competitor Prices | <Competitor 10> list price | Affected by Viscose | Medium
31 | Competitor Prices | <Competitor 11> list price | Affected by Viscose | Medium
32 | Share Index | Shanghai Stock Exchange Index | No causality | Low
33 | Share Index | Shenzhen Stock Exchange Index | No causality | Low
34 | Competitor Share prices | <Competitor 1> share price | No causality | Low
35 | Competitor Share prices | <Competitor 2> share price | No causality | Low
36 | Competitor Share prices | <Competitor 3> share price | No causality | Low
37 | Competitor Share prices | <Competitor 4> share price | No causality | Low
38 | Competitor Share prices | <Competitor 5> share price | No causality | Low
39 | Indices | WTI crude oil index | No causality | Low
40 | Indices | Brent crude oil index | No causality | Low
41 | Exchange Rates | RMB exchange rate | No causality | Low
42 | Exchange Rates | Euro exchange rate | No causality | Low
43 | Exchange Rates | Indonesian Rupiah exchange rate | No causality | Low
44 | Exchange Rates | Turkish Lira exchange rate | No causality | Low
45 | Exchange Rates | Brazilian Real exchange rate | No causality | Low
46 | Exchange Rates | Pakistani Rupee exchange rate | No causality | Low
These variables were rejected for one of two reasons:
1. Reverse causality. Some of them are affected by the viscose price: their price changes after the viscose price changes, so they are lagging indicators.
2. No causality. The data shows no evidence that these variables cause a change in the viscose price, or vice versa.
• Affected by viscose: 5 variables
• Don’t affect viscose: 15 variables
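The inclusion rule applied across the last few slides reduces to a short filter. Keep variables that affect the viscose price, that may affect it in both directions, or that are low-frequency indicators. Drop those that only follow the price or show no causality. The category labels below mirror the slide wording, and the variable names are placeholders. The per-category counts are the ones reported on the causality slide, so the filter reproduces the 26 kept and 20 rejected variables.

```python
from collections import Counter

KEEP = {"affects viscose", "affect each other", "indicator"}
DROP = {"affected by viscose", "no causality"}

def select_variables(causality):
    """Split {variable: causality label} into kept and rejected lists."""
    kept = [v for v, label in causality.items() if label in KEEP]
    rejected = [v for v, label in causality.items() if label in DROP]
    return kept, rejected

# Counts per category as reported on the causality slide; variable names are placeholders.
counts = {"affects viscose": 1, "affect each other": 17, "indicator": 8,
          "affected by viscose": 5, "no causality": 15}
causality = {f"{label}_{i}": label for label, n in counts.items() for i in range(n)}

kept, rejected = select_variables(causality)
print(len(causality), len(kept), len(rejected))
print(Counter(causality[v] for v in rejected))
```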
4. FORECAST RESULTS
SOME OF THE MODELING TECHNIQUES APPLIED
• Univariate time series models
  • Moving averages: simple smoothing that averages the most recent values to project the next period
  • ETS (error, trend, seasonal): exponential smoothing that extrapolates the series using weights that decay exponentially with the age of each observation
  • TBATS: a state space model that handles multiple seasonal patterns (trigonometric seasonality, Box-Cox transformation, ARMA errors, trend and seasonal components)
  • Auto regression: the output depends linearly on its own previous values plus a stochastic term
  • ARIMA: combines autoregression, differencing (to remove trend) and a moving-average term on past forecast errors
  • Neural nets: a neural network with lagged historical values as inputs
  • Ensemble: combines the outputs of all or many of the above models
• Multivariate time series models
  • Vector autoregression (VAR): each variable is regressed linearly on its own lags and on the lags of the other variables
  • Vector error correction (VECM): a VAR for non-stationary series that are cointegrated, with an error-correction term that pulls the series back toward their long-run relationship
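The two simplest univariate models in the list can be sketched in a few lines. The autoregressive model is fitted by ordinary least squares on lagged values; this is one of several estimation methods, and Yule-Walker and maximum likelihood are common alternatives. The moving average forecasts the next period as the mean of the last `order` values. The default order of 4 mirrors the parameter reported for both models in the model comparison table; the synthetic AR(2) data are only there to check the fit.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def fit_ar(y, order=4):
    """Least-squares AR(order) with intercept: y_t = c + sum_i phi_i * y_{t-i} + e_t."""
    X = [[1.0] + [y[t - i] for i in range(1, order + 1)] for t in range(order, len(y))]
    target = y[order:]
    k = order + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, target)) for i in range(k)]
    return solve(XtX, Xty)  # [c, phi_1, ..., phi_order]

def forecast_ar(y, coefs):
    """One-step-ahead AR forecast from the most recent values."""
    c, phis = coefs[0], coefs[1:]
    return c + sum(phi * y[-i] for i, phi in enumerate(phis, start=1))

def forecast_moving_average(y, order=4):
    """Next-period forecast as the mean of the last `order` observations."""
    return sum(y[-order:]) / order

# Synthetic AR(2) series: y_t = 0.6 y_{t-1} - 0.2 y_{t-2} + noise.
random.seed(1)
y = [0.0, 0.0]
for _ in range(2000):
    y.append(0.6 * y[-1] - 0.2 * y[-2] + random.gauss(0, 1))

coefs = fit_ar(y, order=2)
print([round(v, 2) for v in coefs], round(forecast_ar(y, coefs), 3))
print(forecast_moving_average([1.0, 2.0, 3.0, 4.0, 5.0]))  # → 3.5, the mean of 2, 3, 4, 5
```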
UNIVARIATE: AUTO REGRESSIVE MODEL HAS 95% ACCURACY
Date | Actual | Forecast | Error
10th-14th July | 15,788 | 15,503 | -1.8%
17th-21st July | 15,946 | 15,529 | -2.6%
24th-28th July | 15,939 | 15,547 | -2.5%
31st Jul-5th Aug | 15,930 | 15,545 | -2.4%
(Chart: original vs. fitted series)
MULTI-VARIATE: VECTOR ERROR CORRECTION MODEL HAS 96% ACCURACY
(Chart: original vs. fitted series)
Date | Actual | Forecast | Error
10th-14th July | 15,788 | 15,587 | -1.3%
17th-21st July | 15,946 | 15,778 | -1.1%
24th-28th July | 15,939 | 15,961 | 0.1%
31st Jul-5th Aug | 15,930 | 16,097 | 1.0%
Inputs include the rayon yarn index, the cotton yarn index and other selected variables.
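The Error column in both tables is the forecast's deviation relative to the actual price, (forecast - actual) / actual, and the snippet below reproduces it from the four weeks shown. It also computes one common definition of accuracy, 100% minus the mean absolute percentage error (MAPE). The deck does not define its accuracy metric, and applied to only these four weeks this calculation does not reproduce the headline 95% and 96% figures, which were presumably measured over a different evaluation window. Treat it as illustrative.

```python
def pct_error(actual, forecast):
    """Signed percentage error, (forecast - actual) / actual * 100."""
    return (forecast - actual) / actual * 100

def accuracy_from_mape(actuals, forecasts):
    """Accuracy defined as 100 minus the mean absolute percentage error."""
    errors = [abs(pct_error(a, f)) for a, f in zip(actuals, forecasts)]
    return 100 - sum(errors) / len(errors)

actual = [15788, 15946, 15939, 15930]
ar_forecast = [15503, 15529, 15547, 15545]
vecm_forecast = [15587, 15778, 15961, 16097]

print([round(pct_error(a, f), 1) for a, f in zip(actual, ar_forecast)])    # [-1.8, -2.6, -2.5, -2.4]
print([round(pct_error(a, f), 1) for a, f in zip(actual, vecm_forecast)])  # [-1.3, -1.1, 0.1, 1.0]
print(round(accuracy_from_mape(actual, ar_forecast), 2), round(accuracy_from_mape(actual, vecm_forecast), 2))
```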
THE TOP 3 MODELS ACHIEVED OVER 95% ACCURACY
# | Model | Accuracy | Parameters and significant variables | Considered
1 | VECM models | 96.03% | 10 variables | Yes
2 | Auto regression | 95.70% | Order = 4 | Yes
3 | Neural networks | 95.54% | (4,2,1) | Yes
4 | State space models | 95.48% | | No
5 | Exponential smoothing models | 95.35% | | No
6 | Ensemble models | 95.31% | ARIMA + exponential smoothing + state space model + neural network | No
7 | ARIMA | 94.00% | | No
8 | Moving averages | 93.99% | Order = 4 | No
MID-POINT: 96% FORECAST ACCURACY WITH 26 VARIABLES
Timeline: Aug 2017 · Sep 2017 · Oct 2017 · Nov 2017
Phases: Data Exploration · Forecasting · Visualization · Optimization
“But a 4% price variation will NOT help me make a decision.”
- Key decision maker on the client team
Midway through the project, we had to go back to the drawing board and rethink the modelling approach.
5. ITERATE & DEPLOY
THE MODEL NOW FORECASTS THE CHANGE IN DIRECTION, NOT PRICE
Date range | Week | Actual | SVM (1-week change) | Random Forest (1-week change)
2017-06-05 to 2017-06-11 | 23 | Rise | Fall | Fall
2017-06-12 to 2017-06-18 | 24 | Rise | Rise | Rise
2017-06-19 to 2017-06-25 | 25 | Rise | Rise | Rise
2017-06-26 to 2017-07-02 | 26 | Rise | Rise | Rise
2017-07-03 to 2017-07-09 | 27 | Rise | Rise | Rise
2017-07-10 to 2017-07-16 | 28 | Rise | Rise | Rise
2017-07-17 to 2017-07-23 | 29 | Fall | Rise | Rise
2017-07-24 to 2017-07-30 | 30 | Fall | Rise | Rise
Week 23: both models wrong. Weeks 24 to 28: both models right. Weeks 29 and 30: both models wrong.
Reported model accuracies: 81% and 74%.
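Reframing the target as direction means turning the price series into Rise/Fall labels h weeks ahead and scoring classifiers by the share of weeks they call correctly. The sketch builds those labels (horizons of 1 to 6 weeks would correspond to the 1- to 6-week models on the next slide) and scores the eight weeks in the table above. Treating an unchanged price as "Fall" is an assumption, the example prices are hypothetical, and the SVM and Random Forest classifiers themselves are not reproduced here.

```python
def direction_labels(prices, horizon=1):
    """Label each week 'Rise' if the price `horizon` weeks later is higher, else 'Fall'.

    The last `horizon` weeks have no future price and get no label.
    """
    return ["Rise" if prices[t + horizon] > prices[t] else "Fall"
            for t in range(len(prices) - horizon)]

def hit_rate(actual, predicted):
    """Share of periods where the predicted direction matches the actual direction."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical weekly prices, labelled for a 1-week and a 3-week horizon.
prices = [100, 102, 101, 103, 106, 104, 104, 107]
print(direction_labels(prices, 1))
print(direction_labels(prices, 3))

# Weeks 23 to 30 from the table: actual direction versus both models' calls.
actual = ["Rise"] * 6 + ["Fall"] * 2
svm = ["Fall"] + ["Rise"] * 7
random_forest = ["Fall"] + ["Rise"] * 7
print(hit_rate(actual, svm), hit_rate(actual, random_forest))  # 5 of 8 weeks correct
```

Over just these eight weeks both models score 5 of 8; the reported accuracies are presumably measured over a longer evaluation period.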
THE MODELS SHOW FAIR STABILITY
SVM
Evaluated for | 1-week model | 2-week model | 3-week model | 4-week model | 5-week model | 6-week model
Aug-17 | 74% | 75% | 78% | 68% | 66% | 52%
Jul-17 | 72% | 74% | 75% | 68% | 63% | 54%
Jun-17 | 75% | 76% | 82% | 76% | 70% | 57%
May-17 | 74% | 75% | 78% | 74% | 60% | 54%
Apr-17 | 75% | 74% | 78% | 69% | 63% | 52%
Mar-17 | 75% | 76% | 79% | 68% | 64% | 49%
Random Forest
Evaluated for | 1-week model | 2-week model | 3-week model | 4-week model | 5-week model | 6-week model
Aug-17 | 81% | 79% | 81% | 78% | 70% | 57%
Jul-17 | 76% | 78% | 78% | 78% | 69% | 60%
Jun-17 | 76% | 81% | 82% | 81% | 73% | 63%
May-17 | 74% | 79% | 79% | 74% | 70% | 58%
Apr-17 | 74% | 76% | 79% | 74% | 70% | 58%
Mar-17 | 74% | 78% | 79% | 74% | 66% | 54%
MODEL ACCURACY: 80% TREND ACCURACY WITH 40 VARIABLES
Timeline: Aug 2017 · Sep 2017 · Oct 2017 · Nov 2017
Phases: Data Exploration · Forecasting · Visualization · Optimization
CHALLENGES AND LEARNINGS
GUIDELINES TO APPLY DATA SCIENCE TO BUSINESS PROBLEMS
1. Understand the business. How will the audience use your forecast?
• “I want to trade on the basis of forecasts”
• “I want to push or hold inventory on the basis of forecasts”
2. Understand the metric to forecast. What approaches are relevant?
• Will the price rise or fall? Will we make a profit from the trade?
• Do we need to buy additional capacity, or can we rent it out?
3. Apply multiple techniques and tune them
• New models and measures are evolving. Start with the simplest ones
• Set up an infrastructure to rapidly forecast and compare
4. Take feedback on actionability of insights. Iterate rapidly
• We ran multiple iterations every week, across 4 months
• Building an accelerator or a library helps a lot
5. All models are wrong. Machine learning is a perpetual work in progress
• Set up processes to monitor the models in production
• Plan ongoing refresh, maintenance and set expectations with clients
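Guideline 3's "infrastructure to rapidly forecast and compare" can start as a small rolling-origin backtest. Every candidate forecaster is refit on the data up to each cut-off and scored on the next period, so all models are compared on identical windows. The forecasters (naive, moving average, drift), the scoring by mean absolute percentage error and the price series are illustrative assumptions. Re-running the same loop on fresh data is also one simple way to act on guideline 5 and notice when a production model starts to degrade.

```python
from statistics import mean

def naive(history):
    """Forecast the next value as the last observed value."""
    return history[-1]

def moving_average(history, order=4):
    """Forecast the next value as the mean of the last `order` values."""
    return mean(history[-order:])

def drift(history):
    """Extend the average change between the first and last observations."""
    return history[-1] + (history[-1] - history[0]) / (len(history) - 1)

def rolling_backtest(series, forecasters, min_train=8):
    """One-step-ahead rolling-origin evaluation; returns MAPE (%) per forecaster."""
    errors = {name: [] for name in forecasters}
    for cut in range(min_train, len(series)):
        history, actual = series[:cut], series[cut]
        for name, fc in forecasters.items():
            errors[name].append(abs(fc(history) - actual) / actual * 100)
    return {name: mean(errs) for name, errs in errors.items()}

prices = [100, 101, 103, 104, 106, 107, 109, 110, 112, 113, 115, 116]
scores = rolling_backtest(prices, {"naive": naive, "moving_average": moving_average, "drift": drift})
for name, mape in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAPE {mape:.2f}%")
```

Adding a new model only means adding one entry to the `forecasters` dictionary, which keeps each weekly iteration cheap.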
Thank You!
/gkesari
gramener.com
/thanoj-kumar-reddy-kattamanchi
Please help us improve the session by answering the feedback survey:
https://forms.gle/jwV4iKR2JfkCk13p9


Penn State Guest Lecture: Business Forecasting in Real Life

  • 1. Business Forecasting in Real Life Ganes Kesari Nov 2021 Thanoj Kattamanchi
  • 4. 4 80% of analytics insights will not deliver business outcomes through 2022. - Gartner Source: Gartner, 2019
  • 5. 5 The biggest challenge for data scientists is not with the mastery of techniques..
  • 6. 6 …IT IS IN THE RIGHT APPLICATION OF TECHNIQUES TO PROBLEMS Approach to data Insights Insights as recommendations What actions will help me convert my detractors into promoters? Machine learning models What will be my promoter score next quarter? Statistics, Explorations What led to lower satisfaction in North America? Simple summaries Did I improve on customer satisfaction?
  • 7. 7 ABOUT 1/4TH OF OUR CLIENTS ASK FOR FORECASTING This is a portfolio of some of the analytics work we’ve done. The forecasting projects are highlighted. Customer Feedback Analysis TV Show Content Analysis Earnings Transcript Analysis What do parties focus on? Duplicate Candidate Names TV Show Drivers Securities Clustering Geo-demographic segments Advertiser Clustering Energy Fraud Detection Directorship Network Analysis for Fraud Restaurant Sales Correlation Restaurant Revenue Analysis Telecom Customer Clusters Sentiment Carry-Forward Brand Association Mapping TV Viewership Forecasting Budget Forecasting Impact of Age and number of Contestants on Votes Price Forecasting Telecom Churn Prediction Server Demand & Capacity Restaurant Sales Forecasting Bank Disbursement Forecast Retail Sales Forecasting Bot forecasting Cargo Delay Optimization Service Request Workflow Poultry Mortality Drivers Student Performance Drivers Factors Affecting Attrition Predicting Job Failure Factors Driving Performance Route Optimization Product Recommendation Descriptive Diagnostic Predictive Prescriptive
  • 8. Problem Approach Outcome BUDGET FORECASTING FOR THE INDIAN MINISTRY OF FINANCE A national government wanted to forecast the annual spend and deficit. This was needed at the lowest level of granularity, individual accounts in each department. With only 3 years of historical data available at an annual level, simpler forecasting techniques were used. Regression-based techniques delivered reliable estimates. Multi-level forecasts of budget and actual expenditure helped the government officials plan outlays efficiently.
  • 9. 9 BUDGET FORECASTING FOR THE INDIAN MINISTRY OF FINANCE Source: Gramener – Budget planning & Cash management
  • 10. Problem Approach Outcome DEMAND & CAPACITY MANAGEMENT OF SERVERS FOR A LEADING PHARMA COMPANY A leading pharma company wanted to plan IT server capacities (CPU, RAM, Disk Space) thru forecasting of server utilization at a daily level. This would help the Infrastructure operations better plan consumption needs and in procurement of server resources Weekly, Monthly Level Utilizations for ~15000 Servers were forecasted using a variety of time series models. Servers were classified and Smart Assistance Tags were placed to monitor server performance & utilization CPU, RAM, Disk Utilization were forecasted at an accuracy of 80%+, much more accurate then client methods. Through a custom application, utilization forecasts and continuous monitoring thru tags could be visualized providing higher visibility
  • 11. 11 SERVER FAILURE PREDICTION: PROCESS & OUTCOME
  • 12. Problem Approach Outcome PRICE FORECASTING FOR AN ASIAN AGRICULTURAL ENTERPRISE A leading agricultural enterprise wanted price forecasts for their products in order to plan inventory release to optimize revenue. Incorrect timing was leading either to loss of revenue or unsold inventory. Gramener applied a suite of price forecasting models based on internal and external factors. The models were evaluated on multiple test datasets to select one that minimized median absolute deviation. The model was able to forecast the price to an accuracy of 88%. Within the first quarter of deploying the model, the revenue uplift attributable directly to pricing was +3.2%.
  • 13. 13 A COMPARISON OF PRICE FORECAST ACCURACY OF PURE MODELS
  • 15. 2. EXPLORE THE DATA 3. SELECT VARIABLES 4. FORECAST RESULTS 5. ITERATE & DEPLOY PREDICT THE PRICE OF VISCOSE FIBER 1. UNDERSTAND THE DOMAIN
  • 16. 16 WHAT’S VISCOSE? A QUICK OVERVIEW Wood Dissolving Pulp Viscose (Dissolving Pulp + Caustic Soda + Coal) Hygiene products like Diapers Yarn <????> Viscose is manufactured in 3 varieties, based on purity. The client’s buyers were majorly from China.
  • 18. 2. EXPLORE THE DATA 3. SELECT VARIABLES 4. FORECAST RESULTS 5. ITERATE & DEPLOY PREDICT THE PRICE OF VISCOSE FIBER 1. UNDERSTAND THE DOMAIN
  • 19. 19 • Shanghai & Shenzhen Stock Indices • Competitor share prices • Cotlook, Cotton, WTI & Crude Oil indices • USD to RMB, IDR, TRY, BRL, EUR & PKR currency exchange rates • CLP delivery & VFY prices, DWP Index, Hardwood and Softwood import prices • Viscose median prices and list prices of competitor’s products • Viscose competitor prices of other purities • 32s cotton and polyester yarn prices • 30s rayon yarn price, knitting & weaving prices, open end prices and MVS prices. • 40s rayon, SIRO, MVS prices & compact Siro prices. • 60s airjet rayon price, China grey fabric prices & China cotton spot prices • …. AROUND 80 VARIABLES WERE AVAILABLE TO FORECAST VISCOSE PRICE Daily Stock Prices/Indices, Substitute, Downstream prices Daily Order Intake, inventory & raw materials Weekly Operating rates and Inventory variables Monthly raw material prices Quarterly economic indicators • Viscose Order intake volume • Viscose total order intake volumes • Physical inventory • Log inventory • Coal daily prices • …. • Operating rate • Inventory levels • Rayon Operating rate • Rayon yarn inventory • …. • China Paper pulp rates • Wuhu & Jiangxi prices of Caustic soda • SJX and L&M list prices of Caustic soda • …. • China nominal GDP • China GDP Constant price, Cumulative prices, QoQ and YoY prices • ….
  • 20. 20 HERE’S WHAT THE DATA LOOKED LIKE Data scope: - Viscose price data available from Jan 2014 till August 2017 (~3.5 Years) - Forecasting was done at a weekly level - Aggregation/Disaggregation was done to a weekly level for all variables Missing values: Output variable - Viscose price - Only 1% of the values were missing Input variables - Checked for coverage of the entire time period - Checked for the extent of missing values. ~30 variables had high missing%. They were dropped - For the rest, imputation techniques such as back fill, forward fill, neighborhood averages were applied
  • 21. 21 WHAT ARE THE COMMON COMPONENTS IN TIME SERIES? Source: https://www.otexts.org/fpp/6/1 Seasonality Trend Trend and Seasonality Randomness Or we could have combinations..
  • 22. PREDICT THE PRICE OF VISCOSE FIBER
1. UNDERSTAND THE DOMAIN
2. EXPLORE THE DATA
3. SELECT VARIABLES
4. FORECAST RESULTS
5. ITERATE & DEPLOY
  • 23. CAUSALITY CHECKS WHETHER A VARIABLE AFFECTS VISCOSE PRICE
We performed a Granger causality test, which checks whether the past values of one time series help forecast another. Results for the 46 variables:
- 1 variable certainly affects viscose prices. It is included in the model.
- 17 variables may affect viscose prices, or may be affected by them; both are possible. These are included in the model.
- 8 variables are indicators, not time series: their values are not available at a daily or weekly level. These are still considered in the model.
- 5 variables are affected by viscose prices. Viscose price can be used to predict them, but not the other way around. These are not included in the model.
- 15 variables have no relation to viscose prices. These are not included in the model.
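The Granger test compares two regressions of the target on its own past values, one without and one with the candidate variable's past values, and asks (via an F statistic) whether adding the candidate's lags reduces the residual error enough to matter. The NumPy sketch below computes that F statistic by hand and maps the two test directions onto the slide's categories. It is an illustration, not the project's code: the lag count and the `critical` threshold are assumptions (in practice the cutoff would come from the F(lags, df) distribution, for example via `scipy.stats.f.ppf`, or one would use `grangercausalitytests` from `statsmodels.tsa.stattools`).

```python
import numpy as np


def lagged_columns(series, lags):
    """Matrix whose column k-1 holds the series lagged by k steps (k = 1..lags)."""
    n = len(series)
    return np.column_stack([series[lags - k: n - k] for k in range(1, lags + 1)])


def granger_f_stat(target, candidate, lags):
    """F statistic for 'candidate Granger-causes target' using `lags` past values."""
    y = np.asarray(target, dtype=float)
    x = np.asarray(candidate, dtype=float)
    response = y[lags:]
    intercept = np.ones((len(response), 1))
    restricted = np.hstack([intercept, lagged_columns(y, lags)])     # own lags only
    unrestricted = np.hstack([restricted, lagged_columns(x, lags)])  # plus candidate lags

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, response, rcond=None)
        residuals = response - design @ beta
        return float(residuals @ residuals)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_den = len(response) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / df_den)


def classify(f_var_to_price, f_price_to_var, critical):
    """Map the two test directions onto the categories used on the slide."""
    var_causes = f_var_to_price > critical
    price_causes = f_price_to_var > critical
    if var_causes and price_causes:
        return "affect each other"
    if var_causes:
        return "affects viscose"
    if price_causes:
        return "affected by viscose"
    return "no causality"
```

Running the test in both directions (`granger_f_stat(price, var, p)` and `granger_f_stat(var, price, p)`) is what distinguishes "affects viscose" from "affected by viscose".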
  • 24. CORRELATIONS CHECK ASSOCIATIONS WITH VISCOSE PRICES
We computed the correlation between each variable and viscose price to decide its usefulness:
- 18 variables have a high correlation (> 60%), but 3 of these are lagging indicators of viscose price and are not included. The remaining 15 are included in the model.
- 5 variables have a medium correlation (30-60%), but 2 of these are lagging indicators of viscose price and are not included. The remaining 3 are included in the model.
- 23 variables have a low correlation (< 30%). These are not included in the model.
- No variable has a negative correlation.
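The screening rule on this slide can be sketched in plain Python: compute the Pearson correlation of each candidate with viscose price, bucket it with the slide's thresholds (high above 0.6, medium 0.3-0.6, low below 0.3), and drop low-correlation variables and known lagging indicators. The variable names are hypothetical, the set of lagging indicators is assumed to come from the causality step, and bucketing on the absolute value is an assumption (the slide reports no negative correlations).

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_band(r):
    """Bucket a correlation using the slide's thresholds."""
    strength = abs(r)  # assumption: sign is ignored
    if strength > 0.6:
        return "high"
    if strength >= 0.3:
        return "medium"
    return "low"


def select_by_correlation(price, candidates, lagging):
    """Keep medium/high-correlation candidates that are not lagging indicators."""
    selected = {}
    for name, series in candidates.items():
        band = correlation_band(pearson(series, price))
        if band != "low" and name not in lagging:
            selected[name] = band
    return selected
```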
  • 25. 26 OF THE 46 VARIABLES HELP WITH FORECASTING
We evaluated each variable on two parameters:
1. Causality: does the variable cause a change in price, do the two potentially cause changes in each other, or are they unrelated?
2. Correlation: when the variable rises, does the price also rise, and by how much?
# | Category | Variable | Causality | Correlation
1 | Downstream Substitute prices | 32s polyester yarn index | Affects viscose | Medium
2 | Competitor Prices | <Competitor 1> list price | Affect each other | High
3 | Competitor Prices | <Competitor 2> list price | Affect each other | High
4 | Competitor Prices | <Competitor 3> price | Affect each other | High
5 | Competitor Prices | <Competitor 4> list price | Affect each other | High
6 | Competitor Prices | <Competitor 5> list price | Affect each other | High
7 | Competitor Prices | <Competitor 6> list price | Affect each other | High
8 | Downstream Substitute prices | 30s rayon yarn index | Affect each other | High
9 | Substitute Indices | Cotlook A cotton index | Affect each other | High
10 | Substitute Indices | Cotton future index | Affect each other | High
11 | FeedStock price | China cotton spot price index | Affect each other | High
12 | FeedStock price | Cotton Linter Ex-work Price | Affect each other | High
13 | FeedStock price | Import Hardwood DP index | Affect each other | High
14 | Downstream Substitute prices | 32s cotton yarn index | Affect each other | High
15 | Further Downstream Prices | China Grey Fabric Cotton 32S weaving | Affect each other | High
16 | Further Downstream Prices | China Grey Fabric Rayon 30S weaving | Affect each other | High
17 | Substitute Prices | CLP delivery Price | Affect each other | Medium
18 | Substitute Indices | China cotton spot price index | Affect each other | Medium
19 | Economic variables | China Nominal GDP (Billion USD) | NA | Low
20 | Economic variables | China GDP Constant Price Cumulative | NA | Low
21 | Economic variables | China GDP Constant Price YoY | NA | Low
22 | Economic variables | China GDP Constant Price QoQ | NA | Low
23 | Economic variables | China GDP Constant Price | NA | Low
24 | Raw Material Prices | SJX Caustic Soda list price | NA | Low
25 | Raw Material Prices | Wuhu Caustic Soda list price | NA | Low
26 | Raw Material Prices | China Paper Pulp price index | NA | Low
Legend: Likely influencers / May be influencers / Indicators. Counts: 16 strong, 2 moderate, 8 unknowns.
  • 26. THE REMAINING 20 VARIABLES WILL WORSEN THE MODEL
# | Category | Variable | Causality | Correlation
27 | Competitor Prices | <Competitor 7> list price | Affected by viscose | High
28 | Competitor Prices | <Competitor 8> list price | Affected by viscose | High
29 | Competitor Prices | <Competitor 9> list price | Affected by viscose | High
30 | Competitor Prices | <Competitor 10> list price | Affected by viscose | Medium
31 | Competitor Prices | <Competitor 11> list price | Affected by viscose | Medium
32 | Share Index | Shanghai Stock Exchange Index | No causality | Low
33 | Share Index | Shenzhen Stock Exchange Index | No causality | Low
34 | Competitor Share prices | <Competitor 1> share price | No causality | Low
35 | Competitor Share prices | <Competitor 2> share price | No causality | Low
36 | Competitor Share prices | <Competitor 3> share price | No causality | Low
37 | Competitor Share prices | <Competitor 4> share price | No causality | Low
38 | Competitor Share prices | <Competitor 5> share price | No causality | Low
39 | Indices | WTI crude oil index | No causality | Low
40 | Indices | Brent crude oil index | No causality | Low
41 | Exchange Rates | RMB exchange rate | No causality | Low
42 | Exchange Rates | Euro exchange rate | No causality | Low
43 | Exchange Rates | Indonesian Rupiah exchange rate | No causality | Low
44 | Exchange Rates | Turkish Lira exchange rate | No causality | Low
45 | Exchange Rates | Brazilian Real exchange rate | No causality | Low
46 | Exchange Rates | Pakistani Rupee exchange rate | No causality | Low
These variables were rejected for one of two reasons:
1. Reverse causality (5 variables, affected by viscose): their prices change after the viscose price changes, so they are lagging indicators.
2. No causality (15 variables, don't affect viscose): we found no evidence in the data that these variables cause a change in viscose price, or vice versa.
  • 28. SOME OF THE MODELING TECHNIQUES APPLIED
Univariate time series models:
- Moving averages: simple smoothing that averages recent historical values to project the next period
- ETS: exponential smoothing extrapolates the time series by constructing a smoothed curve in which older observations get exponentially smaller weights
- TBATS: a state space model that detects multiple seasonal patterns in the data and assigns them weights to forecast
- Autoregression: the output depends linearly on its own previous values and on a stochastic term
- ARIMA: combines autoregression on past values with a moving-average term on past forecast errors, after differencing the series to remove trend
- Neural nets: a neural network model with the historical values as inputs
- Ensemble: combines the outputs of all or many of the above models
Multivariate time series models:
- Vector autoregression (VAR): a linear regression of each variable on its own lags and on the lags of the other variables
- Vector error correction (VECM): a VAR model for non-stationary time series that are cointegrated
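The two simplest univariate techniques in this list can be written in a few lines. This sketch shows a one-step moving-average forecast (order 4, matching the order quoted for moving averages in the model comparison) and simple exponential smoothing, the most basic member of the ETS family with no trend or seasonal terms. The smoothing weight `alpha=0.5` is an arbitrary assumption; in practice it would be fitted to the data.

```python
def moving_average_forecast(history, order=4):
    """Forecast the next period as the mean of the last `order` observations."""
    window = history[-order:]
    return sum(window) / len(window)


def simple_exp_smoothing_forecast(history, alpha=0.5):
    """Simple exponential smoothing: the final smoothed level is the next-period forecast.

    alpha near 1 tracks recent values closely; alpha near 0 smooths heavily.
    """
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level
```

The moving average weights the last four weeks equally and ignores everything older, while exponential smoothing uses the full history with geometrically decaying weights.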
  • 29. UNIVARIATE: AUTOREGRESSIVE MODEL HAS 95% ACCURACY
Date | Actual | Forecast | Error
10th-14th July | 15,788 | 15,503 | -1.8%
17th-21st July | 15,946 | 15,529 | -2.6%
24th-28th July | 15,939 | 15,547 | -2.5%
21st Jul-5th Aug | 15,930 | 15,545 | -2.4%
Chart: fitted vs. original values.
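An autoregressive model of order p regresses each week's price on an intercept and the previous p weeks' prices; forecasts are then produced recursively, feeding each prediction back in as the newest lag. The NumPy sketch below fits the coefficients by ordinary least squares, which is one standard estimator but not necessarily the one used in the project. Order 4 is the value listed for autoregression in the model comparison; the price series itself is not available here.

```python
import numpy as np


def fit_ar(series, order):
    """Least-squares AR(order) fit; returns [intercept, phi_1, ..., phi_order]."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Design matrix: a column of ones, then the series lagged by 1..order steps
    design = np.column_stack(
        [np.ones(n - order)] + [y[order - k: n - k] for k in range(1, order + 1)]
    )
    coef, *_ = np.linalg.lstsq(design, y[order:], rcond=None)
    return coef


def forecast_ar(series, coef, steps):
    """Recursive multi-step forecast: each prediction becomes the newest lag."""
    history = [float(v) for v in series]
    order = len(coef) - 1
    forecasts = []
    for _ in range(steps):
        lags = history[-1:-order - 1:-1]  # most recent value first, matching phi_1..phi_p
        next_value = coef[0] + float(np.dot(coef[1:], lags))
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts
```

With weekly prices in a list, the call would be `forecast_ar(prices, fit_ar(prices, 4), steps=4)` (hypothetical usage).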
  • 30. MULTIVARIATE: VECTOR ERROR CORRECTION MODEL HAS 96% ACCURACY
Inputs include the rayon yarn index, the cotton yarn index and other selected variables.
Date | Actual | Forecast | Error
10th-14th July | 15,788 | 15,587 | -1.3%
17th-21st July | 15,946 | 15,778 | -1.1%
24th-28th July | 15,939 | 15,961 | 0.1%
21st Jul-5th Aug | 15,930 | 16,097 | 1.0%
Chart: fitted vs. original values.
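A VECM builds on the vector autoregression (VAR): every variable is regressed on lags of itself and of all the other variables. The sketch below fits a VAR by least squares with NumPy to show that building block; it does not implement the error-correction (cointegration) term of a VECM, for which a library such as statsmodels (`statsmodels.tsa.vector_ar.vecm.VECM`) would normally be used. The lag order and the data layout (one column per variable, one row per week) are assumptions.

```python
import numpy as np


def fit_var(data, order=1):
    """Least-squares VAR(order) fit.

    data: array of shape (T, k), one column per variable.
    Returns B of shape (1 + k * order, k): row 0 holds the intercepts, then one
    block of k rows per lag, so that Z_t is approximately [1, Z_{t-1}, ...] @ B.
    """
    z = np.asarray(data, dtype=float)
    t_len = z.shape[0]
    lagged = [z[order - j: t_len - j] for j in range(1, order + 1)]
    design = np.hstack([np.ones((t_len - order, 1))] + lagged)
    coef, *_ = np.linalg.lstsq(design, z[order:], rcond=None)
    return coef


def forecast_var_one_step(data, coef, order=1):
    """One-step-ahead forecast of every variable from the latest `order` rows."""
    z = np.asarray(data, dtype=float)
    regressors = np.concatenate([[1.0]] + [z[-j] for j in range(1, order + 1)])
    return regressors @ coef
```

Because each equation sees the other variables' lags, a move in (for example) a yarn index can feed into next week's viscose forecast, which is the property the multivariate models exploit.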
  • 31. THE TOP 3 MODELS ACHIEVED OVER 95% ACCURACY
S. No | Model Name | Accuracy | Parameters and Significant Variables | Considered
1 | VECM models | 96.03% | 10 variables | Yes
2 | Auto regression | 95.70% | Order = 4 | Yes
3 | Neural networks | 95.54% | (4,2,1) | Yes
4 | State space models | 95.48% | | No
5 | Exponential smoothing models | 95.35% | | No
6 | Ensemble models | 95.31% | Models: ARIMA + exponential smoothing + state space model + neural network | No
7 | ARIMA | 94.00% | | No
8 | Moving averages | 93.99% | Order = 4 | No
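The Error columns in the two forecast tables are consistent with a signed percentage error, (forecast - actual) / actual. The slides do not define how the headline accuracy is computed; 100% minus the mean absolute percentage error (MAPE) over a validation window is one common convention and is assumed in this sketch, so it may not reproduce the exact figures above.

```python
def pct_error(actual, forecast):
    """Signed percentage error: negative when the model under-forecasts."""
    return 100.0 * (forecast - actual) / actual


def accuracy(actuals, forecasts):
    """Assumed accuracy definition: 100% minus the mean absolute percentage error."""
    errors = [abs(pct_error(a, f)) for a, f in zip(actuals, forecasts)]
    return 100.0 - sum(errors) / len(errors)
```

For example, the first autoregressive row gives (15,503 - 15,788) / 15,788 = -1.8%, and the first VECM row gives (15,587 - 15,788) / 15,788 = -1.3%, matching the tables.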
  • 32. MID-POINT: 96% FORECAST ACCURACY WITH 26 VARIABLES
Project timeline (Aug-Nov 2017): Data Exploration, Forecasting, Visualization, Optimization
  • 33. "But a 4% price variation will NOT help me make a decision." – Key decision maker on the client team
A 96% accurate forecast still leaves an error band of roughly 4% around each predicted price. Midway through the project, we had to go back to the drawing board and rethink the modelling approach.
  • 35. THE MODEL NOW FORECASTS THE CHANGE IN DIRECTION, NOT PRICE
Date Range | Week | Actual | SVM (1-week change) | Random Forest (1-week change)
2017-06-05 to 2017-06-11 | 23 | Rise | Fall | Fall
2017-06-12 to 2017-06-18 | 24 | Rise | Rise | Rise
2017-06-19 to 2017-06-25 | 25 | Rise | Rise | Rise
2017-06-26 to 2017-07-02 | 26 | Rise | Rise | Rise
2017-07-03 to 2017-07-09 | 27 | Rise | Rise | Rise
2017-07-10 to 2017-07-16 | 28 | Rise | Rise | Rise
2017-07-17 to 2017-07-23 | 29 | Fall | Rise | Rise
2017-07-24 to 2017-07-30 | 30 | Fall | Rise | Rise
Both models were wrong in week 23, both were right in weeks 24-28, and both were wrong in weeks 29-30.
1-week model accuracy: Random Forest 81%, SVM 74%.
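Reframing the problem as direction forecasting means turning the price series into Rise/Fall labels and scoring models by how often they call the direction correctly. The sketch below shows only that labeling and scoring step; the classifiers themselves (for example scikit-learn's `SVC` and `RandomForestClassifier`) would be trained on these labels with the selected input variables as features, which is not shown. Treating an unchanged price as "Fall" is an assumption.

```python
def direction_labels(prices, horizon=1):
    """Label each week 'Rise' or 'Fall' by comparing with the price `horizon` weeks later."""
    # Assumption: an unchanged price is labelled 'Fall' (i.e. "did not rise")
    return ["Rise" if prices[t + horizon] > prices[t] else "Fall"
            for t in range(len(prices) - horizon)]


def hit_rate(actual, predicted):
    """Share of weeks where the predicted direction matches the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

Setting `horizon` to 2 through 6 produces the targets for the 2- to 6-week models.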
  • 36. THE MODELS SHOW FAIR STABILITY
Accuracy of the 1- to 6-week direction models, evaluated in each month from March to August 2017. Accuracy stays between 72% and 82% for 1- to 3-week horizons and drops to 49-63% at 6 weeks.
SVM
Evaluated for | 1 Week Model | 2 Week Model | 3 Week Model | 4 Week Model | 5 Week Model | 6 Week Model
Aug-17 | 74% | 75% | 78% | 68% | 66% | 52%
Jul-17 | 72% | 74% | 75% | 68% | 63% | 54%
Jun-17 | 75% | 76% | 82% | 76% | 70% | 57%
May-17 | 74% | 75% | 78% | 74% | 60% | 54%
Apr-17 | 75% | 74% | 78% | 69% | 63% | 52%
Mar-17 | 75% | 76% | 79% | 68% | 64% | 49%
Random Forest
Evaluated for | 1 Week Model | 2 Week Model | 3 Week Model | 4 Week Model | 5 Week Model | 6 Week Model
Aug-17 | 81% | 79% | 81% | 78% | 70% | 57%
Jul-17 | 76% | 78% | 78% | 78% | 69% | 60%
Jun-17 | 76% | 81% | 82% | 81% | 73% | 63%
May-17 | 74% | 79% | 79% | 74% | 70% | 58%
Apr-17 | 74% | 76% | 79% | 74% | 70% | 58%
Mar-17 | 74% | 78% | 79% | 74% | 66% | 54%
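A stability table like this one can be produced with a walk-forward (expanding-window) backtest: at each week, the model sees only the data known up to that week, predicts the direction `horizon` weeks ahead, and is scored against what actually happened. The sketch below assumes that evaluation scheme (the slides do not say exactly how the monthly figures were computed) and uses a hypothetical momentum rule as a stand-in for the trained SVM and Random Forest models.

```python
def momentum_predictor(history, horizon):
    """Hypothetical baseline: predict that the last `horizon`-week move continues."""
    return "Rise" if history[-1] > history[-1 - horizon] else "Fall"


def walk_forward_accuracy(prices, horizon, start, predict):
    """Expanding-window backtest of a direction model for one forecast horizon."""
    hits = total = 0
    for t in range(start, len(prices) - horizon):
        history = prices[: t + 1]  # only data known at week t
        predicted = predict(history, horizon)
        actual = "Rise" if prices[t + horizon] > prices[t] else "Fall"
        hits += predicted == actual
        total += 1
    return hits / total


def horizon_table(prices, start, predict, horizons=range(1, 7)):
    """Accuracy for each horizon, like one row of the stability table."""
    return {h: walk_forward_accuracy(prices, h, start, predict) for h in horizons}
```

Restricting the scored weeks to a single month would give one row of the table per evaluation month; `start` must be at least the longest horizon for the momentum baseline.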
  • 37. MODEL ACCURACY: 80% TREND ACCURACY WITH 40 VARIABLES
Project timeline (Aug-Nov 2017): Data Exploration, Forecasting, Visualization, Optimization
  • 39. GUIDELINES TO APPLY DATA SCIENCE TO BUSINESS PROBLEMS
1. Understand the business. How will the audience use your forecast?
- "I want to trade on the basis of forecasts"
- "I want to push or hold inventory on the basis of forecasts"
2. Understand the metric to forecast. Which approaches are relevant?
- Will the price rise or fall? Will we make a profit from the trade?
- Do we need to buy additional capacity, or can we rent it out?
3. Apply multiple techniques and tune them
- New models and measures keep evolving. Start with the simplest ones
- Set up an infrastructure to rapidly forecast and compare
4. Take feedback on the actionability of insights. Iterate rapidly
- We ran multiple iterations every week, across 4 months
- Building an accelerator or a library helps a lot
5. All models are wrong. Machine learning is a perpetual work in progress
- Set up processes to monitor the models in production
- Plan ongoing refresh and maintenance, and set expectations with clients
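Monitoring a direction model in production can start as simply as tracking its hit rate over the most recent weeks and flagging it for a refresh when that rate falls clearly below the accuracy measured at deployment. The sketch below is one minimal way to do that; the baseline, tolerance and window size are hypothetical parameters, not values from the project.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling hit-rate tracker that flags when a model needs a refresh."""

    def __init__(self, baseline, tolerance=0.05, window=8):
        self.baseline = baseline    # accuracy measured when the model was deployed (assumed)
        self.tolerance = tolerance  # allowed drop before flagging (assumed)
        self.recent = deque(maxlen=window)  # keeps only the last `window` outcomes

    def record(self, predicted, actual):
        """Store whether this week's predicted direction was correct."""
        self.recent.append(predicted == actual)

    def rolling_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else None

    def needs_refresh(self):
        """True once a full window of outcomes is below baseline minus tolerance."""
        if len(self.recent) < self.recent.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance
```

Waiting for a full window before flagging avoids triggering a refresh on one or two unlucky weeks right after deployment.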
  • 40. Thank You!
/gkesari · gramener.com · /thanoj-kumar-reddy-kattamanchi
Please help us improve the session by answering the feedback survey: https://forms.gle/jwV4iKR2JfkCk13p9