National Taipei University of Nursing and Health Sciences (NTUNHS)
Forecasting Model
Orozco Hsu
2021-05-20
About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data/ ML/ AIOT/ AI Columnist
"How can you not get romantic about baseball?"
Tutorial Content
• What is forecasting
• Random Walk for Time-Series
• Hidden Markov Model for stock prediction
• Multi-variables or Periodic attributes Time-Series prediction (WEKA)
• Homework
Code
• Download code
• https://github.com/orozcohsu/ntunhs_2020/tree/master/alg_20210520
What is forecasting
• Key terms and differences: forecasting versus predictive modeling
• Forecasting is a process of predicting or estimating the future based
on past and present data.
• Examples:
• How many passengers can we expect on a given flight?
• How many customer calls can we expect in the next hour? (often modeled with a Poisson distribution)
• Weather forecasting
• Stock market forecasting
• Example: passenger count forecasting for the next year (time series)
• Predictive modeling is used to make predictions at a more granular level.
• Complex variable test
• Data sampling
• Feature selection
• Data visualization
• Examples:
• Which customers are likely to buy a product next month?
• Then take action accordingly.
• Some useful tools
• Moving average forecasting (MA)
• Exponential smoothing forecasting (simple exponential smoothing, SES; Holt-Winters method for trend and seasonality)
• Stepwise Autoregressive forecasting
• Autoregressive Integrated Moving Average model forecasting (ARIMA)
• Autoregressive conditional heteroskedasticity forecasting (ARCH)
• Value at Risk
• Hidden Markov Model forecasting (HMM)
• Multi-variables or Periodic attributes Time-Series forecasting
• Random or non-random series?
• Yes: it is a random series, so we cannot forecast anything from it.
• No: it is a non-random series, but a chaotic one. The purpose of chaos theory is to reveal the simple laws that may be hidden behind seemingly random series.
• The plotted series is a real mess. It looks nothing like a regular time series, and it is also NOT a random walk.
• Chaotic time series: it looks like a random series, but it is generated by a specific function. Refer to chaotic time series analysis.
• Question: a series of random numbers is generated by a computer from a seed. Is it RANDOM or NON-RANDOM?
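The two ideas above can be tried in a few lines. This is only a sketch: the logistic map is an assumed example of a chaotic function (the slides do not name one), and `logistic_map` and `seeded_series` are hypothetical helper names. It shows a deterministic rule that produces a random-looking series, and a seeded generator that reproduces the same "random" numbers every time.

```python
import random


def logistic_map(x0, r, n):
    """Iterate x -> r * x * (1 - x); for r = 4 the output looks random but is deterministic."""
    series = [x0]
    for _ in range(n - 1):
        series.append(r * series[-1] * (1.0 - series[-1]))
    return series


def seeded_series(seed, n):
    """Draw n pseudo-random numbers from a generator initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Assumed example values: x0 = 0.2 and r = 4.0 are illustrative choices.
chaotic = logistic_map(x0=0.2, r=4.0, n=10)
print([round(v, 3) for v in chaotic])

# The same seed always reproduces the same series.
print(seeded_series(42, 5) == seeded_series(42, 5))   # True
```

This suggests one possible answer to the question: a seeded series is fully determined by its seed, even though it passes for random.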
• Stationary or non-stationary series
• We assume that all observations come from the same distribution; a series with this property is called stationary.
• This means the mean and standard deviation of the series do NOT change over time, so the series has a certain regularity.
• A time series can often be made stationary by removing its seasonality and trend.
• From this point of view, we discuss correlation.
• Correlation
• A time series can be analyzed by checking how each observation correlates with the past few observations, that is, the relationship between X(t), X(t-1), X(t-2), ..., X(t-n).
• The autocorrelation function (ACF) gives the degree of correlation between the series and its earlier values. It can be used to judge whether the series is stationary, seasonal, or periodic.
• ACF values lie in [-1, 1]. 1: positive correlation, -1: negative correlation, 0: no correlation.
• In the ACF plot, the values for lag 0 to lag 24 are all above the statistical significance threshold (the fixed value drawn on the plot), which means the series is non-stationary.
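As a rough illustration of the ACF check, the sketch below computes sample autocorrelations with the standard library only. The helper name `acf`, the synthetic series, and the 1.96/sqrt(n) band (a common approximation of the 95% significance threshold) are assumptions for this example, not taken from the course notebooks.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


rng = random.Random(0)                              # seed only for reproducibility
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]   # stationary white noise
trend = [0.1 * t + e for t, e in enumerate(noise)]  # non-stationary: the mean drifts upward

bound = 1.96 / math.sqrt(len(noise))                # approximate 95% significance band
for name, series in (("white noise", noise), ("trending", trend)):
    above = sum(abs(r) > bound for r in acf(series, 24)[1:])
    print(f"{name}: {above} of 24 lags exceed the band")
```

For the trending series every lag stays far above the band, which is the pattern the slide reads as non-stationary.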
Random Walk for Time-Series
• If the autocorrelation of the differenced series (the steps) is close to zero, the series behaves like a random walk.
• A random walk is unpredictable; it cannot reasonably be predicted.
• Why?
• The next value is the previous value plus a random step, so the best available forecast is simply the previous observation.
• This is called a naive forecast, or a persistence model.
• Many time series behave like random walks, particularly security prices over time.
• The random walk hypothesis is the theory that stock market prices follow a random walk and cannot be predicted.
• A random walk is different from a list of random numbers because the next value in the sequence is a modification of the previous value in the sequence.
• Step 1: Start with a random number of either -1 or 1.
• Step 2: Randomly select a -1 or 1 and add it to the observation from the previous time step.
• Step 3: Repeat step 2 for as long as you like.
• This dependence gives consistency from step to step, rather than the large jumps that a series of independent random numbers produces.
y(t) = B0 + B1*X(t-1) + e(t)
• y(t): the next value in the series
• B0: a coefficient (constant term)
• B1: a coefficient to weight the previous time step
• X(t-1): the observation at the previous time step
• e(t): white noise (random function)
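The construction steps can be sketched directly. This is a minimal version, assuming B0 = 0, B1 = 1, and e(t) drawn from {-1, 1}; the function name `random_walk` and the seed are illustrative choices.

```python
import random


def random_walk(n_steps, seed=None):
    """Build a random walk: start at -1 or 1, then add -1 or 1 at every step."""
    rng = random.Random(seed)
    walk = [rng.choice((-1, 1))]                      # step 1: random start
    for _ in range(n_steps - 1):
        walk.append(walk[-1] + rng.choice((-1, 1)))   # step 2: previous value + random step
    return walk


walk = random_walk(1000, seed=1)
steps = [b - a for a, b in zip(walk, walk[1:])]

print(walk[:10])
print(all(abs(s) == 1 for s in steps))   # True: neighbours always differ by exactly 1
```

Because each value is built on the previous one, the series moves in small steps instead of jumping like independent random numbers.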
• Random Walk and Autocorrelation
• Calculate the correlation between each observation and the observations at previous time steps.
• Given how a random walk is constructed, we expect a strong autocorrelation with the previous observation and a roughly linear fall-off over earlier lags.
• In the autocorrelation plot (x-axis: lag), the autocorrelation is high at the first lags, because the current observation is a random step from the previous observation.
• Random Walk and Stationarity (ADF)
• A stationary time series is one whose values are NOT a function of time.
• Use the Augmented Dickey-Fuller (ADF) test to check this. Its null hypothesis is that the series is non-stationary (has a unit root); a strongly negative test statistic, with a small p-value, rejects it.
• Predicting a Random Walk
• A random walk is unpredictable; it cannot reasonably be predicted.
• Making a random walk stationary
• Take the first difference (lag 1).
• After differencing, all autocorrelations are small, close to zero.
• Notebooks: random_walk_demo.ipynb, predicting_a_random_walk.ipynb
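A small sketch of the differencing and persistence ideas, using only the standard library. It is not the content of the course notebooks: the helper `autocorr` and the seed are assumptions, and the ADF test itself is left out because it needs an external statistics package.

```python
import random


def autocorr(series, lag):
    """Sample autocorrelation of `series` at a single lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / sum(d * d for d in dev)


rng = random.Random(1)                      # seed only for reproducibility
walk = [rng.choice((-1, 1))]
for _ in range(999):
    walk.append(walk[-1] + rng.choice((-1, 1)))

# First difference (lag 1): recovers the independent -1/+1 steps.
diff = [b - a for a, b in zip(walk, walk[1:])]

print("lag-1 autocorrelation, walk:", round(autocorr(walk, 1), 3))
print("lag-1 autocorrelation, diff:", round(autocorr(diff, 1), 3))

# Persistence (naive) forecast: predict each value with the previous one.
errors = [actual - predicted for predicted, actual in zip(walk, walk[1:])]
mse = sum(e * e for e in errors) / len(errors)
print("persistence MSE:", mse)   # every error is -1 or 1, so the MSE is 1.0
```

The persistence error equals the variance of the random step, which is why no model is expected to beat it on a true random walk.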
Hidden Markov Model for stock prediction
• A hidden Markov model (HMM) describes a hidden process with unknown parameters.
• The most difficult part is determining the hidden parameters from the observations.
• Definition
• Roll the dice 10 times; the observed series [1 6 3 5 2 7 3 5 2 4] is the "visible state chain".
• The "invisible state chain" is the sequence of dice that produced it, for example [D6 D8 D8 D6 D4 D8 D6 D6 D4 D8].
• Visible states
• State emission probability
• Invisible states
• State transition probability; together these form an HMM
• Human view: state chain. Computer view: transition probability.
• Questions
• Known: the types of dice and the transition probabilities.
• Find, from the visible results, which die was used for each roll.
• Known: the types of dice and the transition probabilities.
• Find, from the visible results, the probability of getting those results.
• Known: only the types of dice.
• Find, from the visible results, the transition probabilities.
• Find the probability
• For the hidden chain [D6 D8 D8] and the visible chain [1 6 3]:
• P = P(D6)*P(D6 to 1)*P(D6 to D8)*P(D8 to 6)*P(D8 to D8)*P(D8 to 3)
• => 1/3 * 1/6 * 1/3 * 1/8 * 1/3 * 1/8 = 1/10368
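The product above can be checked with exact fractions. In this sketch the names `DICE`, `emission`, and `joint_probability` are hypothetical; the uniform 1/3 start and transition probabilities follow the factors used in the slide, and each die is assumed to be fair.

```python
from fractions import Fraction

DICE = {"D4": 4, "D6": 6, "D8": 8}       # die name -> number of faces


def emission(die, face):
    """P(die shows face): uniform over the faces the die has, otherwise 0."""
    return Fraction(1, DICE[die]) if 1 <= face <= DICE[die] else Fraction(0)


def joint_probability(hidden, visible):
    """P(hidden chain and visible chain) with uniform start and transition probabilities."""
    start = trans = Fraction(1, len(DICE))            # assumption: every die equally likely
    p = start * emission(hidden[0], visible[0])
    for die, face in zip(hidden[1:], visible[1:]):
        p *= trans * emission(die, face)
    return p


print(joint_probability(["D6", "D8", "D8"], [1, 6, 3]))   # 1/10368
```

Using `Fraction` keeps the result exact, so it can be compared term by term with the hand calculation.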
• Find the probability (most probable die at each step)
• The visible states are the series [1 6 3 5 2 7 3 5 2 4].
• We have three types of dice: D4, D6, and D8.
• P1(D4): P(D4 to 1) = 1/4, so a roll of 1 most probably comes from D4.
• P2(D6) = P(D4)*P(D4 to 1)*P(D4 to D6)*P(D6 to 6) => 1/3 * 1/4 * 1/3 * 1/6 = 1/216
• P3(D4) = P2(D6)*P(D6 to D4)*P(D4 to 3) => 1/216 * 1/3 * 1/4 = 1/2592
• Sum all of the probabilities
• We have 3 types of dice. Pick one at random and roll it: the probability of rolling a 1 is 1/3 * (1/4 + 1/6 + 1/8) = 13/72, about 18%.
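Summing over all hidden chains is what the forward algorithm does. The sketch below is one way to write it for the dice example, assuming fair dice and uniform 1/3 start and transition probabilities; `forward`, `emission`, `START`, and `TRANS` are hypothetical names.

```python
from fractions import Fraction

DICE = {"D4": 4, "D6": 6, "D8": 8}        # die name -> number of faces
START = TRANS = Fraction(1, len(DICE))    # assumption: uniform, matching the 1/3 factors


def emission(die, face):
    """P(die shows face): uniform over the faces the die has, otherwise 0."""
    return Fraction(1, DICE[die]) if 1 <= face <= DICE[die] else Fraction(0)


def forward(visible):
    """Probability of the visible chain, summed over all hidden chains."""
    alpha = {d: START * emission(d, visible[0]) for d in DICE}
    for face in visible[1:]:
        # New alpha for die d: reach d from any previous die, then emit `face`.
        alpha = {
            d: sum(alpha[prev] * TRANS for prev in DICE) * emission(d, face)
            for d in DICE
        }
    return sum(alpha.values())


p_one = forward([1])
print(p_one, round(float(p_one), 2))   # 13/72 0.18
```

Calling `forward` on a longer list of rolls extends the same sum one roll at a time without enumerating every hidden chain.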
• How to calculate the most probable invisible state chain
• Brute force: compute the probability of every combination of hidden states, as in the dice example above (slides 21 to 23).
• Viterbi algorithm: build the hidden state chain step by step from the visible states.
• Notation: x = hidden states, y = visible states, a = transition probabilities, b = emission probabilities. In the dice example, the hidden states X1, X2, X3 are the dice.
• Homework 1: continue the calculation for the visible series of dice [1 6 3 5 2 7 3 5 2 4], from P4 to P10.
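A compact Viterbi sketch for the dice example follows. It assumes fair dice and uniform 1/3 start and transition probabilities, and the names `viterbi`, `emission`, `START`, and `TRANS` are hypothetical. Only the first three rolls are run here, so the result can be compared with P1 to P3 from the slides.

```python
from fractions import Fraction

DICE = {"D4": 4, "D6": 6, "D8": 8}        # die name -> number of faces
START = TRANS = Fraction(1, len(DICE))    # assumption: uniform, matching the 1/3 factors


def emission(die, face):
    """P(die shows face): uniform over the faces the die has, otherwise 0."""
    return Fraction(1, DICE[die]) if 1 <= face <= DICE[die] else Fraction(0)


def viterbi(visible):
    """Return (most probable hidden chain, joint probability of that chain)."""
    # best[d] = (probability of the best chain ending in die d, that chain)
    best = {d: (START * emission(d, visible[0]), [d]) for d in DICE}
    for face in visible[1:]:
        step = {}
        for d in DICE:
            # Pick the previous die that maximises the probability of reaching d.
            prev = max(DICE, key=lambda s: best[s][0] * TRANS)
            p = best[prev][0] * TRANS * emission(d, face)
            step[d] = (p, best[prev][1] + [d])
        best = step
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


path, prob = viterbi([1, 6, 3])
print(path, prob)   # ['D4', 'D6', 'D4'] 1/2592
```

Unlike brute force, the work grows linearly with the number of rolls, because only the best chain ending in each die is kept at every step.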
• FB stock price prediction (using the Markov property, not a pure time-series model)
• Download historical data from
• https://www.nasdaq.com/market-activity/stocks/fb/historical
• Notebook: hmm_makrtket_stock_prediction.ipynb
• The stock data is not treated as a random walk time series here; check the code!
• Extract features
Multi-variables or Periodic attributes Time-Series prediction (WEKA)
• WEKA
• The workbench for machine learning.
• It is widely used for teaching, research, and industrial applications, contains a plethora of built-in tools for standard machine learning tasks, and additionally gives transparent access to well-known toolboxes such as scikit-learn and R.
• WEKA installation and package import
• Demo
• https://www.cs.waikato.ac.nz/ml/weka/
• https://sourceforge.net/projects/weka/
• Sales order prediction
• Observation data
• Overlay data
• Rows of data: remove or not
• WEKA requirements
• Input data must be in ARFF format
• You need to convert your CSV file
• Convert to WEKA ARFF format
• https://pulipulichen.github.io/jieba-js/weka/spreadsheet2arff/index.html
• Download the ARFF files
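The conversion can also be done with a short script instead of the web tool. This is a rough sketch: `csv_to_arff` is a hypothetical helper, the sample columns (`date`, `orders`) are made up for illustration, and every column other than the date is assumed to be numeric.

```python
import csv
import io


def csv_to_arff(csv_text, relation, date_column=None, date_format="yyyy-MM-dd"):
    """Convert CSV text to ARFF text; all columns numeric except an optional date column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for name in header:
        if name == date_column:
            lines.append(f'@attribute {name} date "{date_format}"')
        else:
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample; the real sales order columns are in the course files.
sample = "date,orders\n2021-01-01,12\n2021-01-02,15\n"
arff_text = csv_to_arff(sample, relation="sales_order", date_column="date")
print(arff_text)
```

Nominal or string columns would need their own `@attribute` declarations, which this sketch does not handle.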
• Load data
• Find the ARFF file
• The loaded data has 4 columns
• Basic configuration
• Number of time units to forecast = 25
• Periodicity = Daily
• Copy the content of skip_list-sales_order_1_8_p9.txt and paste it into the skip list
• Check "Perform evaluation"
• Base learner
• LinearRegression
• MultilayerPerceptron
• SMOreg (SVM)
• Homework 2: continue to improve the performance of the WEKA sales order prediction
• Click "periodic attributes"
• Check "Customize"
• Load the file periodics_set-sales_order_1_8_p9.periodics
• Output
• Check "Graph target at steps"
• Press the Start button
• In the output, predicted values are marked with an asterisk
• The model is a linear regression on those variables
• Output: prediction at steps
• The result is NOT so good. Is a prediction model a good way to forecast future values?
• Output: future prediction
Homework
• Calculate the invisible states from the visible series of dice [1 6 3 5 2 7 3 5 2 4]
• Continue to improve performance in WEKA sales order prediction
• Change algorithms
• Add more observation data
• Check what happens if we don't use periodic attributes or overlay data
