Machine Learning & Time Series Analysis, Finlab CTO 韓承佑

Solutions
Machine learning
Time Series Analysis
Challenges
Finlab CTO 韓承佑
2019 / 07 / 05

Outline
• Introduction
• Background
• Motivation
• Proposed Method
• Conclusion

Human Evolution
Agricultural revolution
Industrial revolution
Information revolution
14000 years

Human vs Computer at Go
4300 years
200 years

Gartner Hype Cycle
For emerging technologies 2018
Deep Learning

Gartner Hype Cycle
For emerging technologies 2015
Machine Learning

History of Trading
1948 Technical Indicators
1970 Algorithmic Trading
1980 Personal Computer
1990 High frequency trading

Machine Learning
• Mimics the ability to see and hear
• Extracts rules automatically from data
• Spots patterns in high-dimensional data

Live trading result
[Figure: profit versus date]

What is AI, ML, DL?
AI (Artificial Intelligence)
ML (Machine Learning)
DL (Deep Learning)
Dog or cat?
Categories   Probability
Dog          0.9
Cat          0.1

Supervised Machine Learning
Training (features: Color, Weight, Age; label: Category) → ML Model
Color   Weight   Age   Category
        3.2 kg   2     cat
        4.2 kg   5     cat
        6.2 kg   4     dog
Testing (features → ML Model → prediction, compared with the true answer)
Color   Weight   Age   True Answer   Prediction
        3.2 kg   2     cat           cat
        4.2 kg   5     cat           dog
        6.2 kg   4     dog           dog
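The testing step above compares predictions with true answers. As a minimal sketch, the accuracy for the three test rows can be computed as follows; the function name `accuracy` is illustrative and not from the slides.

```python
# True answers and model predictions, copied from the testing table above.
true_answers = ["cat", "cat", "dog"]
predictions = ["cat", "dog", "dog"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true answer."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Two of the three predictions match, so the accuracy is 2/3.
print(accuracy(true_answers, predictions))
```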

Outline
• Machine Learning Models
• Training & Testing
• Evaluation
• Feature Engineering
• Data Preprocessing

Feature Engineering & Data Preprocessing

Feature Source
Fundamental data
• Difficult to confirm the data release date
• Missing data is often backfilled
• Values may be corrected multiple times after release
• May be useful in combination with other data types
Market data
• Trading book
• Market participants leave characteristic footprints
• Massive amounts of data are generated in one day
Alternative data
• News, trends, web, satellites…
• Primary data source
• Hard to process; difficult to confirm consistency

Challenges of labeling the data
Fixed time horizon
Time t   KD   RSI   MACD   Category
1                          -1
2                          -1
…                          0
t                          1
(KD, RSI, MACD: features; Category: labels)
[Figure: price versus time; label regions 1, 0, -1; annotations p_{t+w} + τ, p_{t+w} − τ, p_{t+w}, p_t]
• A popular method in the literature
• τ is a constant regardless of the volatility
• Does not have stop-loss limits
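A minimal sketch of fixed-time-horizon labeling, assuming the label is set by comparing the relative return over the next w steps with a constant threshold τ. The slide may instead compare price levels directly. The function name, the sample prices, and the parameter values are illustrative.

```python
def fixed_horizon_labels(prices, w, tau):
    """Label each time t by the return over the next w steps.

    +1 if the return exceeds tau, -1 if it falls below -tau, else 0.
    The last w points get no label because prices[t + w] is not observed.
    """
    labels = []
    for t in range(len(prices) - w):
        ret = prices[t + w] / prices[t] - 1.0
        if ret > tau:
            labels.append(1)
        elif ret < -tau:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


prices = [100.0, 101.0, 103.0, 102.0, 99.0, 99.5]
print(fixed_horizon_labels(prices, w=2, tau=0.02))  # [1, 0, -1, -1]
```

Because τ is the same at every t, a quiet market yields mostly 0 labels and a volatile one mostly ±1, which is the weakness noted above.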

Label generation for financial prices
• Triple barrier [Prado 2018]
• Continuous trading signals [Dash 2016]
• Trading point decision [Chang 2009]
[Prado 2018] Advances in Financial Machine Learning
[Tsantekidis 2017] Using Deep Learning to Detect Price Change Indications in Financial Markets
[Dash 2016] A hybrid stock trading framework integrating technical analysis with machine learning techniques
[Chang 2009] Integrating a Piecewise Linear Representation Method and a Neural Network Model for Stock Trading Points Prediction

Triple barrier [Prado 2018]
[Figure: price versus time; label regions 1, 0, -1; horizontal barriers annotated p_{t+w} + τ1 and p_{t+w} − τ2]
• Labels each observation according to the first of the three barriers touched
• Horizontal barriers are defined by profit-taking and stop-loss limits
• τ1 and τ2 are dynamic, set according to estimated volatility
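A minimal sketch of triple-barrier labeling under some assumptions: barriers are expressed as fractions of the entry price, the per-time widths `tau_up` and `tau_down` are supplied by the caller (for example from a volatility estimate), and the label is 0 when the time limit is reached first. All names and sample values are illustrative and not taken from [Prado 2018].

```python
def triple_barrier_labels(prices, w, tau_up, tau_down):
    """Label each t by the first barrier touched within the next w steps.

    Upper barrier (profit-taking): prices[t] * (1 + tau_up[t])   -> +1
    Lower barrier (stop-loss):     prices[t] * (1 - tau_down[t]) -> -1
    Vertical barrier (time limit): neither touched within w steps -> 0
    """
    labels = []
    for t in range(len(prices) - w):
        upper = prices[t] * (1.0 + tau_up[t])
        lower = prices[t] * (1.0 - tau_down[t])
        label = 0
        for s in range(t + 1, t + w + 1):
            if prices[s] >= upper:
                label = 1
                break
            if prices[s] <= lower:
                label = -1
                break
        labels.append(label)
    return labels


prices = [100.0, 101.0, 100.5, 101.5, 104.0, 100.0, 97.0]
tau = [0.02] * len(prices)  # constant here; in practice scaled by volatility
print(triple_barrier_labels(prices, w=2, tau_up=tau, tau_down=tau))
# [0, 0, 1, 1, -1]
```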

Continuous trading signals [Dash 2016]
[Figure: price versus time, annotated with the window maximum p^max_{t,t+w} and minimum p^min_{t,t+w}; signal values 0.5, 1, 0.5, 0]
• Uses the momentum of the stock price
• The labels y(t) are continuous
• Provides more detailed information
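One way to read the figure is that the price is min-max scaled within the window [t, t+w] and mapped to [0.5, 1] in an uptrend and [0, 0.5] in a downtrend. The sketch below follows that reading, which is an assumption and not a verified transcription of [Dash 2016]. The trend flags are supplied by the caller, and the handling of a flat window is an arbitrary choice.

```python
def continuous_signal(prices, w, uptrend):
    """Continuous label in [0, 1] for each t with a full look-ahead window.

    uptrend[t] is True when t is considered part of an uptrend.
    """
    signals = []
    for t in range(len(prices) - w):
        window = prices[t : t + w + 1]
        lo, hi = min(window), max(window)
        # A flat window carries no information; use the midpoint (assumption).
        scaled = 0.5 if hi == lo else (prices[t] - lo) / (hi - lo)
        base = 0.5 if uptrend[t] else 0.0
        signals.append(base + 0.5 * scaled)
    return signals


prices = [10.0, 12.0, 11.0, 13.0, 9.0, 10.0]
uptrend = [True, True, True, False]
print(continuous_signal(prices, w=2, uptrend=uptrend))
# [0.5, 0.75, 0.75, 0.5]
```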

Trading point decision [Chang 2009]
• Find the local minimum and maximum points
• Divide the time series into subsegments
• Threshold value d controls the length of the detected trends
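The first two steps can be sketched as below. This is a simplified illustration that finds strict local extrema and cuts the series at them. It does not implement the piecewise linear representation of [Chang 2009] and omits the threshold d. Function names and sample prices are illustrative.

```python
def turning_points(prices):
    """Indices of strict local minima and maxima (endpoints excluded)."""
    points = []
    for i in range(1, len(prices) - 1):
        if prices[i] > prices[i - 1] and prices[i] > prices[i + 1]:
            points.append((i, "max"))
        elif prices[i] < prices[i - 1] and prices[i] < prices[i + 1]:
            points.append((i, "min"))
    return points


def split_segments(prices):
    """Split the series into (start, end) index pairs at each turning point."""
    cuts = [0] + [i for i, _ in turning_points(prices)] + [len(prices) - 1]
    return list(zip(cuts, cuts[1:]))


prices = [10.0, 12.0, 11.0, 9.0, 10.0, 13.0, 12.0]
print(turning_points(prices))  # [(1, 'max'), (3, 'min'), (5, 'max')]
print(split_segments(prices))  # [(0, 1), (1, 3), (3, 5), (5, 6)]
```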

Neural Network [McCulloch 1943]
• Built to model the human brain
• Designed to recognize patterns
• Interpret numeric data through a kind of machine perception
[Figure: human neuron structure beside a single neuron model with inputs 1, x1, x2, weights w0, w1, w2, weighted sum v, and activation g(v)]
y1 = g(w1x1 + w2x2 + w0)
[McCulloch 1943] A Logical Calculus of Ideas Immanent in Nervous Activity
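A minimal sketch of the single neuron model: a weighted sum of the inputs plus the bias weight w0, passed through an activation g. The sigmoid used for g is an assumption, since the slide does not name the activation function.

```python
import math


def sigmoid(v):
    """Logistic activation, mapping any real v into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron(x1, x2, w0, w1, w2, g=sigmoid):
    """Single neuron: y1 = g(w0 + w1*x1 + w2*x2)."""
    v = w0 + w1 * x1 + w2 * x2
    return g(v)


# v = -1 + 0.5*1 + 0.25*2 = 0, and sigmoid(0) = 0.5
print(neuron(x1=1.0, x2=2.0, w0=-1.0, w1=0.5, w2=0.25))  # 0.5
```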

Neural network [McCulloch 1943]
Single node in neural network
[Figure: inputs 1, x1, x2 with weights w0, w1, w2, summed to v and passed through g(v) to give y1]

Simplified expression
[Figure: inputs 1, x1, x2 with weights w0, w1, w2 connected to output y1]

A layer contains multiple neurons
[Figure: inputs 1, x1, x2 connected to outputs y1, y2, y3, y4]

Deep Neural network
Multi-layer deep neural network
[Figure: inputs 1, x1, x2; outputs y1, y2, y3]
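Stacking layers gives the multi-layer network. The sketch below runs a forward pass through a small two-layer network. The layer sizes, weights, and activation choices (ReLU, then sigmoid) are assumptions for illustration and do not match the figure.

```python
import math


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer(inputs, weights, biases, g):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        g(b + sum(w * x for w, x in zip(row, inputs)))
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed x through each (weights, biases, activation) layer in turn."""
    for weights, biases, g in layers:
        x = layer(x, weights, biases, g)
    return x


network = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden: 2 inputs -> 2 neurons
    ([[1.0, -1.0]], [0.0], sigmoid),                # output: 2 neurons -> 1 output
]
print(forward([2.0, 1.0], network))
```

With these weights the hidden layer outputs [1.0, 1.5], so the final output is sigmoid(-0.5).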

Deep Neural Network Training Result
Asset: Taiwan Capitalization Weighted Stock Index
Features: Scaled Technical Indicators
Labels: Fixed time horizon
Data split: Train 2006 ~ 2014, Validate 2015, Backtest 2016 ~ 2019-3-1
[Figure: backtest versus benchmark, 2018-1-1 to 2019-7-1]

Long short-term memory neural network (LSTM)
[Hochreiter 1997]
• Can process sequences of data
• LSTM mitigates the exploding and vanishing gradient problems
Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
[Figure: the same neural network applied to input x and output y at t = 1, t = 2, t = 3; LSTM cell with input gate, forget gate, output gate, tanh, and an additive (+) state update]
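A minimal sketch of one LSTM step with scalar input and state, using the commonly taught formulation with input, forget, and output gates. The parameter layout (a dict of `(w_x, w_h, b)` tuples) and the numeric values are assumptions for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step. p maps each gate name to a (w_x, w_h, b) tuple."""

    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much old cell state to keep
    o = sigmoid(pre("output"))       # how much cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # additive cell-state update
    h = o * math.tanh(c)
    return h, c


params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.4, 0.1, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [0.1, -0.2, 0.3]:  # process a short sequence, carrying (h, c) forward
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The additive update of `c` is what lets gradients flow across many time steps without repeatedly passing through a squashing nonlinearity.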

Long short-term memory neural network (LSTM)
[Hochreiter 1997]
Asset: Taiwan Capitalization Weighted Stock Index
Features: Scaled Technical Indicators
Labels: Fixed time horizon
Data split: Train 2006 ~ 2014, Validate 2015, Backtest 2016 ~ 2019-3-1
[Figure: loss and profit versus epoch]

Convolutional Neural Network [Krizhevsky 2012]
• Commonly applied to analyzing visual imagery
• Weight sharing reduces the parameter count, which helps prevent overfitting
Convolutional Layer
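The core operation of a convolutional layer is a small kernel slid across the input, with the same weights reused at every position. A minimal one-dimensional sketch follows; the function name and sample data are illustrative.

```python
def conv1d(signal, kernel):
    """Valid-mode sliding dot product (cross-correlation), as used in CNN layers."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


prices = [1.0, 2.0, 4.0, 7.0, 11.0]
# The kernel [-1, 1] computes first differences of the series.
print(conv1d(prices, [-1.0, 1.0]))  # [1.0, 2.0, 3.0, 4.0]
```

Only two weights are learned here regardless of the input length, which is the parameter saving that weight sharing provides.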

Time series to Image conversion Approach
[Sezer 2018]
[Sezer 2018] Algorithmic Financial Trading with Deep Convolutional Neural Networks: Time Series to Image Conversion Approach

Which time series is a random walk?
[Figure: six time-series panels, numbered 1 to 6]

Generative Adversarial Networks (GAN)
[Figure: GAN diagram; input: historical price data; output: Real/Generated]
• The Generator is trained to generate data that looks like historical price
• The Discriminator is trained to tell the difference between generated and real data
https://github.com/nmharmon8/StockMarketGAN

StockMarketGAN
Discriminator features are useful for predicting the direction of the price
[Figure: historical price data → fixed weights (new features) → Rise/Fall]
https://github.com/nmharmon8/StockMarketGAN

Backtest
• Survivorship bias, look-ahead bias, training, transaction costs, outliers
• Finding the lottery tickets that won the last game
• Machine learning overfitting
• Solutions
• Develop models for entire asset classes
• Use bootstrap aggregating (bagging)
• Record every backtest conducted
• K-fold cross validation

K-fold Cross Validation
• Estimates the generalization error of an ML algorithm
• Helps prevent overfitting
• Assumes the training set and the testing set are IID
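A minimal sketch of how K-fold cross validation partitions the data, assuming contiguous, unshuffled folds so that time order is preserved within each block. The function name is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, and the model is retrained k times on the remaining folds.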

Drawback
[Figure: price series split into Train, Test, Train segments]
• The training set contains information that also appears in the testing set
• Observations cannot be assumed to be drawn from an IID process
• Multiple testing and selection bias

Purged K-fold Cross Validation [Prado 2018]
[Figure: price series with train and test segments, before purging and after purging]
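A minimal sketch of purging, assuming each label at time i is built from prices in the window [i, i + w], as in fixed-time-horizon labeling. Training samples whose window overlaps the window of any test sample are dropped. The function name and parameters are illustrative and not the implementation from [Prado 2018].

```python
def purged_k_fold_indices(n_samples, k, w):
    """Like plain k-fold, but purge training samples that overlap the test set.

    The label of sample i uses prices in [i, i + w]; test labels for the block
    [start, stop) therefore span [start, stop - 1 + w].
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [
            i
            for i in range(n_samples)
            if i + w < start or i > stop - 1 + w  # keep only non-overlapping samples
        ]
        yield train, test
        start = stop


for train, test in purged_k_fold_indices(12, 3, w=2):
    print("train:", train, "test:", test)
```

With 12 samples, 3 folds, and w = 2, the middle fold tests on 4..7 and trains only on 0, 1, 10, 11, since samples 2, 3, 8, 9 share price information with the test labels.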

Feature Importance [Liaw 2002]
• Understand which features contribute to performance
• Add features that strengthen the predictive power
• Opens up the proverbial black box
• How to deal with selection bias
• Evaluate features across multiple assets
• Use different tree-based classifiers/regressors
• Use a random feature as a baseline to distinguish powerful features

Mean decrease impurity (MDI)
• Adds up the information gain contributed by each feature
• Simple and efficient to calculate
• Inflates the importance of continuous features
[Figure: example splits on Humidity (Rain / Sunny day) versus Throw a coin (Rain / Sunny day)]
[Figure: importance by indicator ID; technical indicators of TXF1 versus technical indicators of a random-walk time series; correlation coefficient 0.94]
[Louppe 2013] Understanding variable importances in forests of randomized trees.

Permutation Importance
• Much more computationally expensive
• Results are more reliable
Color   Weight   Age
        3.2 kg   2
        4.2 kg   5
        6.2 kg   4
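import numpy as np  # assumes X is a 2-D NumPy array and model is a fitted estimator with .score
rng = np.random.default_rng(0)
baseline = model.score(X, y)
importance = {}
for j in range(X.shape[1]):                    # for each column in X
    saved = X[:, j].copy()
    X[:, j] = rng.permutation(saved)           # permute the column randomly
    reduced_score = model.score(X, y)
    X[:, j] = saved                            # recover the column
    importance[j] = baseline - reduced_score   # importance of the feature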
[Figure: importance by indicator ID; technical indicators of TXF1 versus technical indicators of a random-walk time series; correlation coefficient -0.04]
[Louppe 2013] Understanding variable importances in forests of randomized trees

Conclusion
• Feature engineering
• Labeling
• Fixed time horizon, triple barrier, Continuous signal
• Multi-asset feature selection
• MDI, MDA
• Model
• Neural network
• LSTM
• CNN
• Cross validation, backtest
• Reality check

Future work
Feature enumeration
Stationarize features
Feature selection
Preprocessing
Training
Trading strategies development
Reality check
Multiple assets
Strategies
Strategy management
Live trading

Q&A
Facebook, AI Course, Finlab Blog, AI Diagnose
