- Pro . Sid r Cha r
si h h _c a d @s i .ac.in
Forecasting Techniques
● Hypothesis Testing
● Alpha and Critical Values
● Errors in Hypothesis Testing
● Independent and dependent t-tests
● Chi-Square Tests
● Goodness of Fit test
● Test of Independence
● Anova - one-way ANOVA, two-way ANOVA
Recap
Today's Specials
● Introduction
● Components
● Errors
● Moving Average
● Exponential Smoothing
● Regression
● ARIMA
● Tests
● Theil's coefficient
Lecture_18 hypothesis testing and probability
● One of the most important and frequently used applications of
predictive analytics
● Supports long-range and short-range planning for the organization
● Forecast demand for products and services is an important
input for both kinds of planning
● Manpower planning, machine capacity, warehouse
capacity, and materials requirements planning (MRP) depend
on the forecasted demand for the product/service
Forecasting: Introduction
● Trend (Tt) → consistent long-term upward or downward movement of data over a period of time
● Seasonality (St) → repetitive upward/downward movement from the trend that occurs within a year (seasons, quarters, months, etc.)
● Cyclical component (Ct) → fluctuation around the trend line due to changes such as recession, unemployment, etc.
● Irregular component (It) → white noise or random uncorrelated changes that follow a normal distribution with mean 0 and constant variance
Components of Time-Series Data
Components of Time-Series Data
Additive time-series: Yt = Tt + St + Ct + It
Multiplicative time-series: Yt = Tt × St × Ct × It
● Mean Absolute Error
● Mean Absolute Percentage Error
● Mean Square Error
● Root Mean Square Error
Errors in Forecasting
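The four error measures listed above can be sketched in a few lines of Python. The series and forecasts below are made-up numbers for illustration only.

```python
def forecast_errors(actual, forecast):
    """Compute MAE, MAPE (%), MSE and RMSE for a forecast."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100
    mse = sum(e * e for e in errors) / n
    rmse = mse ** 0.5
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse}

actual = [112, 118, 132, 129]       # hypothetical observed demand
forecast = [110, 120, 130, 131]     # hypothetical forecasts
print(forecast_errors(actual, forecast))
```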
Moving Average
● Simple Moving Average
● one of the simplest forecasting techniques; forecasts the future value of a time-series
● uses the average of the past 'N' observations
● Weighted Moving Average
● Wk → weight given to the value of Y at time k (Yk)
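Both moving-average forecasts can be sketched as below; the demand series and the weights are invented for illustration (the weights are assumed to sum to 1, most-recent first).

```python
def simple_moving_average(y, n):
    """Forecast the next value as the mean of the last n observations."""
    return sum(y[-n:]) / n

def weighted_moving_average(y, weights):
    """Forecast the next value as a weighted sum of recent observations;
    weights[k] applies to the k-th most recent value."""
    return sum(w * v for w, v in zip(weights, reversed(y)))

demand = [20, 22, 25, 24, 26, 28]
print(simple_moving_average(demand, 3))              # mean of 24, 26, 28
print(weighted_moving_average(demand, [0.5, 0.3, 0.2]))
```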
Exponential Smoothing
● Assign differential weights to past observations
● SES (Simple ES) → weights assigned to past data decline
exponentially; the most recent observations get the highest weights
Ft+1 = αYt + (1−α)Ft
Substituting Ft recursively:
Ft+1 = αYt + α(1−α)Yt−1 + α(1−α)²Yt−2 + ... + α(1−α)^(t−1)Y1 + (1−α)^t F1
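The recursion above is a one-line update per observation. The sketch below seeds F1 with the first observation, which is a common convention; the slides leave the choice of F1 open.

```python
def ses_forecast(y, alpha):
    """Simple exponential smoothing: return the one-step-ahead forecast."""
    f = y[0]                                # F1 = Y1 (assumed initialisation)
    for obs in y:
        f = alpha * obs + (1 - alpha) * f   # F_{t+1} = alpha*Y_t + (1-alpha)*F_t
    return f

series = [10, 12, 11, 13, 12]               # hypothetical data
print(ses_forecast(series, 0.4))
```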
Exponential Smoothing
Strengths:
1. Uses all the historic data, unlike MA, to predict the future value
2. Assigns progressively decreasing weights to older data
Limitations:
1. Increasing 'n' makes the forecast less sensitive to changes in the data
2. Always lags behind a trend, as it is based on past observations
3. Forecast bias and systematic errors occur when observations exhibit strong trend or seasonal patterns
● If the data is smooth, a higher value of α may be chosen
● If the data is fluctuating, a lower value of α is preferred
● Optimal value: solve a nonlinear optimization problem (e.g., minimize the sum of squared one-step-ahead forecast errors)
Optimal 𝛂 in Exponential Smoothing
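A brute-force sketch of that optimization: evaluate the sum of squared one-step-ahead errors (SSE) over a grid of α values and keep the best. A real implementation might use a numerical optimizer such as scipy.optimize instead; the series here is hypothetical.

```python
def sse_for_alpha(y, alpha):
    """Sum of squared one-step-ahead SES forecast errors for a given alpha."""
    f, sse = y[0], 0.0
    for obs in y[1:]:
        sse += (obs - f) ** 2               # error before updating the forecast
        f = alpha * obs + (1 - alpha) * f
    return sse

def best_alpha(y, step=0.01):
    """Grid search over alpha in (0, 1) minimising the SSE."""
    candidates = [round(step * i, 2) for i in range(1, 100)]
    return min(candidates, key=lambda a: sse_for_alpha(y, a))

series = [10, 12, 11, 13, 12, 14, 13, 15]
print(best_alpha(series))
```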
● SES does not do well in the presence of trend
● Introduce an additional equation to capture the trend in time-series data
● 2 equations for forecasting:
○ Level (short-term average)
○ Trend
Double ES - Holt’s method
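Holt's two-equation update can be sketched as below. The smoothing constants and the initialisation of level and trend are illustrative choices, not prescribed by the slides.

```python
def holt_forecast(y, alpha, beta, h=1):
    """Double exponential smoothing (Holt): h-step-ahead forecast."""
    level, trend = y[0], y[1] - y[0]        # assumed initialisation
    for obs in y[1:]:
        last_level = level
        level = alpha * obs + (1 - alpha) * (last_level + trend)   # level equation
        trend = beta * (level - last_level) + (1 - beta) * trend   # trend equation
    return level + h * trend                # forecast = level + h * trend

series = [10, 12, 14, 16, 18]               # perfectly linear hypothetical data
print(holt_forecast(series, 0.8, 0.8, h=1))
```

On a perfectly linear series the level and trend track the data exactly, so the one-step-ahead forecast continues the line (20 here).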
● MA, SES, and DES do not handle the seasonality component
● Fitted errors show systematic error patterns due to seasonality
● TES → used when data has trend as well as seasonality
● 3 equations for forecasting:
○ Level
○ Trend
○ Seasonal
Triple ES - Holt-Winter method
● More appropriate in presence of predictor variables
Here Ft is the forecasted value of Yt, and X1t, X2t, etc. are the predictor variables measured at time t
Regression
Forecasting in presence of seasonality
● The initial ARMA & ARIMA models ⇒ Box & Jenkins in 1970
● Auto-regression ⇒ regression of a variable on itself measured at different time periods
● AR model assumption: the time-series is a stationary process
○ The mean values of Yt at different values of t are constant
○ The variances of Yt at different time periods are constant
○ Covariances of Yt & Yt−k for different lags depend only on k
● Non-stationary data must be made stationary before applying AR
AR, MA and ARMA
● Auto-regressive model with lag 1, AR(1), is given by:
Yt = β0 + β1Yt−1 + εt
AR models
β can be estimated using OLS
● Auto-regressive model with lag 1, AR(1): Yt = β0 + β1Yt−1 + εt
● Auto-regressive model with p lags, AR(p): Yt = β0 + β1Yt−1 + β2Yt−2 + ... + βpYt−p + εt
AR models (contd)
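Estimating the AR(1) coefficients by OLS amounts to regressing Yt on Yt−1, which has a closed form for simple regression. The series below is synthetic, generated from an exact relation Yt = 1 + 0.5·Yt−1 so that OLS recovers the coefficients almost exactly.

```python
def fit_ar1(y):
    """OLS estimates (beta0, beta1) for Y_t = beta0 + beta1*Y_{t-1} + e_t."""
    x, t = y[:-1], y[1:]                    # pairs (Y_{t-1}, Y_t)
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    beta1 = (sum((a - mx) * (b - mt) for a, b in zip(x, t))
             / sum((a - mx) ** 2 for a in x))
    beta0 = mt - beta1 * mx
    return beta0, beta1

y = [4.0]
for _ in range(20):
    y.append(1 + 0.5 * y[-1])               # noiseless AR(1) relation
b0, b1 = fit_ar1(y)
print(b0, b1)
```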
● Q: How to identify the value of 'p' (number of lags)?
● Ans: Auto-correlation function (ACF) & partial ACF
● Auto-correlation ⇒ memory of a process
● Auto-correlation at lag k (correlation between Yt and Yt−k)
● A plot of auto-correlation for different values of k ⇒ ACF
● Partial auto-correlation of lag k (ρpk) ⇒ correlation between Yt & Yt−k without the influence of all intermediate values (Yt−1, Yt−2, ..., Yt−k+1)
● Plot of partial auto-correlation for different values of k → PACF
AR model identification: ACF & PACF
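The sample autocorrelation at lag k uses the usual estimator ρk = Σ(Yt − Ȳ)(Yt−k − Ȳ) / Σ(Yt − Ȳ)²; plotting ρk against k gives the ACF described above. The series here is hypothetical.

```python
def autocorrelation(y, k):
    """Sample autocorrelation of the series y at lag k."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

series = [2, 4, 6, 8, 10, 8, 6, 4, 2, 4, 6, 8]
print([round(autocorrelation(series, k), 3) for k in range(1, 4)])
```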
AR model identification: ACF & PACF
The null hypothesis is rejected when ρk > 1.96/√n and ρpk > 1.96/√n
Thumb-rule: the number of lags is 'p' when:
● The partial autocorrelation, ρpk > 1.96/√n for the first p values, then cuts off to 0
● The auto-correlation function (ACF), ρk, decreases exponentially
● Past residuals are used for forecasting future values of the time-series data
● The MA process is different from the moving-average technique
● MA process of lag 1, MA(1), is given by: Yt = μ + εt + θ1εt−1
● MA process with q lags, MA(q), is given by: Yt = μ + εt + θ1εt−1 + ... + θqεt−q
MA Process MA(q)
ARMA(p, q) process
● Can be used only when the time-series data is non-stationary
● ARIMA has the following three components:
○ Auto-regressive component with p lags AR(p)
○ Integration component I(d)
○ Moving average with q lags, MA(q)
● Integration component: non-stationary ⇒ stationary
● A slow decrease in the ACF ⇒ non-stationary process
● In addition to the ACF plot, the Dickey−Fuller or augmented Dickey−Fuller tests can check for stationarity
ARIMA process
● Consider the AR(1) process: Yt+1 = βYt + εt+1
● The AR(1) process can become very large when β > 1 and is non-stationary when |β| = 1
● DF is a hypothesis test with H0: β = 1 (unit root) and HA: β < 1
● AR(1) ⇒ subtracting Yt from both sides gives ΔYt+1 = (β − 1)Yt + εt+1, so the test reduces to testing whether the coefficient of Yt is zero
Tests: Dickey Fuller Test
● The DF test is valid only when the residual εt+1 follows a white-noise process
● When εt+1 is not white noise ⇒ the series may not be AR(1)
● To address this, augment the regression with p lags of the dependent variable Y
Tests: Augmented DF Test
● 1st step in ARIMA → identify the order of differencing (d)
● Factors for non-stationarity: trend & seasonality
● Trend stationarity: fit a trend line and subtract it from the time series
● Difference stationarity: difference the original time-series
○ 1st difference (d = 1): ∇Yt = Yt − Yt−1
○ 2nd difference (d = 2): ∇²Yt = ∇(∇Yt) = Yt − 2Yt−1 + Yt−2
Non-stationary ⇒ Stationary process
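Differencing to the definitions above is a one-liner applied d times. A quadratic trend, for example, becomes constant after the second difference.

```python
def difference(y, d=1):
    """Apply the difference operator d times to the series y."""
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y

series = [1, 4, 9, 16, 25]          # quadratic trend (perfect squares)
print(difference(series, 1))        # first difference
print(difference(series, 2))        # second difference: constant
```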
ARIMA(p,d,q) model building
● Stage 1: Model Identification
○ Refer flowchart
● Stage 2: Parameter Estimation & Model Selection
○ Estimate coefficients in AR & MA components using OLS
○ Model selection criteria: RMSE, MAPE, AIC, BIC
AIC & BIC ⇒ penalised goodness-of-fit measures for comparing candidate models
AIC = −2LL + 2K, BIC = −2LL + K ln(n), where LL is the log-likelihood, K the number of estimated parameters, and n the sample size
● Stage 3: Model Validation
○ Should satisfy all the assumptions of regression
○ The residual should be white noise
ARIMA(p,d,q) model building
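The AIC and BIC formulas above can be evaluated directly from model residuals, here under a Gaussian assumption for the log-likelihood; the residuals below are made-up numbers.

```python
import math

def gaussian_loglik(residuals):
    """Gaussian log-likelihood of residuals with MLE variance."""
    n = len(residuals)
    var = sum(e * e for e in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def aic_bic(residuals, k):
    """Return (AIC, BIC) given residuals and k estimated parameters."""
    n = len(residuals)
    ll = gaussian_loglik(residuals)
    return -2 * ll + 2 * k, -2 * ll + k * math.log(n)

residuals = [0.5, -0.3, 0.2, -0.4, 0.1, -0.2, 0.3, -0.1]
aic, bic = aic_bic(residuals, k=2)
print(aic, bic)
```

Note that BIC exceeds AIC whenever K·ln(n) > 2K, i.e. for n > e² ≈ 7.4, so BIC penalises extra parameters more heavily on all but tiny samples.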
● Comparison between naïve forecasting & the developed model
● Naïve forecasting model: Ft+1 = Yt
● Theil's coefficient (U-statistic) is the ratio of the model's forecast error to the naïve model's forecast error
● U < 1 ⇒ the forecasting model is better than the naïve model
● U > 1 ⇒ the forecasting model is not better than the naïve model
Power of Forecasting Model: Theil's coeff
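One common convention for Theil's U (the statistic has several published variants) is the ratio of the model's RMSE to the RMSE of the naïve forecast Ft+1 = Yt; the data below are hypothetical.

```python
def theils_u(actual, forecast):
    """Theil's U as RMSE(model) / RMSE(naive forecast F_{t+1} = Y_t)."""
    n = len(actual)
    model_mse = sum((a - f) ** 2
                    for a, f in zip(actual[1:], forecast[1:])) / (n - 1)
    naive_mse = sum((b - a) ** 2
                    for a, b in zip(actual, actual[1:])) / (n - 1)
    return (model_mse / naive_mse) ** 0.5

actual = [100, 104, 103, 107, 110]
good_model = [100, 103.5, 103.2, 106.5, 109.6]
print(theils_u(actual, good_model))   # well below 1: beats the naive rule
```

Feeding the naïve forecast itself into the function returns exactly U = 1, which is a quick sanity check of the implementation.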
Recap
● Introduction
● Components
● Errors
● Moving Average
● Exponential Smoothing
● Regression
● ARIMA
● Tests
● Power of Forecasting model: Theil's coefficient
