Machine Learning for Trading

ML for Trading
ML/DL /
BU
(BU Head) Larry Guo
larryguo5078@gmail.com, FB: ( )
May 16th, 2018
: Marketing, Sales, Technical Marketing, Legal / ,
BU Head, .

Disclaimer
• talk, ...
or .....
•
• /
( , )
• / Google image

Agenda
•Machine Learning/Deep Learning ( /
• ML for Trading
• ,, the path to AI

vs AI
Rule-based: If ... then buy / hold / sell
Learning-based: data + learning → model
$\hat{\theta} = \arg\min_{\theta} \mathrm{Loss}(f_{\theta}(X))$, giving the model $f_{\hat{\theta}}(X)$

?
• ??
✴ ( , )
✴(vs , , label, ground truth)
✴ ( ) (Loss/objective function)
✴
•
• ( )
• Learn from Error ( )
Find $\theta^{*}$ for $f(X, \theta)$ given $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$
s.t. $f_{\theta^{*}}(x_i) \to y_i$

Types of Learning
Supervised Learning: learn from paired data (with labels)
Unsupervised Learning: learn from unpaired data (without labels)
Reinforcement Learning: learn from (action, reward)

Supervised Learning
Classification
: (1,2,3) vs. (1,5,6)
Regression
vs: %

Reinforcement Learning
State
Action
Reward
Environment
Agent
State’
Agent's Objective:
Find the best policy: $\max \mathbb{E}\left[\sum_i \gamma^i R_i\right]$

(Supervised Learning)
• Classification ( / , , Buy/Hold/Sell): $P(y|X)$
e.g. $p(\hat{y}_1 = \text{cat} \mid x_1) > p(\hat{y}_1 = \text{dog} \mid x_1) \Rightarrow$ cat
• Regression ( )
(X, y) paired data set: $f(X) = \hat{y} \to y$
X = features, y = label

(Unsupervised Learning)
•Clustering ,
•PCA
•Generating ( )
P(X)

Learn from Loss
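min(loss)
y: loss; $\theta$: parameter
$\theta := \theta - \frac{dy}{d\theta}$
With learning rate $\eta$: $\theta := \theta - \eta \cdot \frac{dy}{d\theta}$
$\nabla$: Gradient, hence Gradient Descent:
$\theta := \theta - \eta \cdot \nabla_{\theta}\, y$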

Robust
Underfitting
Overfitting
Generalization
Train/Test Split
No Data Leak!
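For time-ordered market data, one common way to respect the "no data leak" rule is to split chronologically rather than at random, so the test period is strictly later than the training period. The sketch below assumes the rows of X are already sorted by date; the helper name `chronological_split` and the toy data are illustrative, not from the talk.

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.2):
    """Split time-ordered samples so the test set is strictly later than the training set."""
    n_test = int(round(len(X) * test_fraction))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Toy data: 10 "days" with one feature each (values are made up).
X = np.arange(10).reshape(-1, 1)
y = np.arange(10) % 2

X_train, X_test, y_train, y_test = chronological_split(X, y)
print(X_train.ravel(), X_test.ravel())
```

Any scaling or feature statistics would then be fitted on the training part only and reused on the test part.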

Mean Squared Error
$\hat{y} = a x_1 + b x_2 + c x_3$
$L = \frac{1}{N} \sum \|\hat{y} - y\|^2$
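A minimal sketch of fitting the linear model above by gradient descent on the mean squared error. The data, the "true" coefficients, the learning rate, and the step count are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # columns play the role of x1, x2, x3
true_coef = np.array([1.5, -2.0, 0.5])   # made-up values for a, b, c
y = X @ true_coef                        # noise-free targets, for simplicity

theta = np.zeros(3)                      # current guess for (a, b, c)
eta = 0.1                                # learning rate (assumed value)
for _ in range(500):
    y_hat = X @ theta
    grad = 2.0 / len(X) * X.T @ (y_hat - y)   # gradient of the MSE w.r.t. theta
    theta = theta - eta * grad                # theta := theta - eta * gradient

mse = np.mean((X @ theta - y) ** 2)
print(theta, mse)
```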

Cross Entropy
Entropy: $H(p, p) = H(p) = -\sum_x p(x) \log p(x) = \sum_x p(x) \log \frac{1}{p(x)} \ge 0$
KL divergence: $D_{KL}(p\|q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$
How close is q to p?
If p = q, $D_{KL}(p\|q) = 0$

Cross Entropy
$D_{KL}(p\|q) = \sum_x p(x) \log \frac{1}{q(x)} - \left(\sum_x p(x) \log \frac{1}{p(x)}\right)$
$D_{KL}(p\|q) = H(p, q) - H(p)$
(if given p) $\min D_{KL}(p\|q) \sim \min H(p, q)$
Cross entropy: $H(p, q) = -\sum_x p(x) \log q(x)$, or $-\sum_i y_i \log \hat{y}_i$, or $-\log \hat{y}$ for binary
Over all samples: $-\sum_n \sum_i y_i^{n} \log \hat{y}_i^{n}$, n = #sample
min (- log likelihood) ~ max (likelihood)
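The identity $D_{KL}(p\|q) = H(p, q) - H(p)$ can be checked numerically. This is a small sketch with two made-up distributions; it assumes all probabilities are strictly positive so the logarithms are defined.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_x p(x) log p(x)."""
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x)."""
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) log(p(x) / q(x))."""
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.3, 0.2])   # made-up "true" distribution
q = np.array([0.4, 0.4, 0.2])   # made-up model distribution

print(kl_divergence(p, q), cross_entropy(p, q) - entropy(p))
```

Because H(p) does not depend on q, minimizing the cross entropy over q is the same as minimizing the KL divergence.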

X: Feature( )
K_D RSI ROE
Day1
Day2
..
DayN
Time dependency: Day1 .. DayN are used to predict Day(N+1) ??
Domain Knowledge!!

y: labeling
• Encode categorical data as numeric labels (or even One-Hot Encoding)
$y_i \in D \subset \{A, B, C, D, E, F\}$
A = 0, B = 1, C = 2, D = 3, E = 4, F = 5

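y: One-Hot
X → Model $f_{\hat{\theta}}(X)$ → Softmax! → $\hat{y}$ (probabilities p, sum = 1)
$\hat{y}$ to estimate y
y = [0, 0, 0, 1, 0, 0, 0]
$\hat{y}$ = [0.1, 0.05, 0.05, 0.748, 0.05, 0.001, 0.001]
Maximize $\hat{y}_i$ at the true class; minimize it at the other classes.
loss = $-\sum_i y_i \log(\hat{y}_i) = -\log(0.748)$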
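A short sketch of the softmax and cross-entropy computation from this slide. The predicted vector is the one shown on the slide; the one-hot label is placed at the class with predicted probability 0.748 so that the loss equals -log(0.748). The logits passed to `softmax` are made up.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    z = z - np.max(z)          # shift for numerical stability; the result is unchanged
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(y_onehot, y_hat):
    """loss = -sum_i y_i log(y_hat_i); only the true class contributes."""
    return -np.sum(y_onehot * np.log(y_hat))

y_hat = np.array([0.1, 0.05, 0.05, 0.748, 0.05, 0.001, 0.001])
y = np.array([0, 0, 0, 1, 0, 0, 0])
loss = cross_entropy(y, y_hat)

probs = softmax(np.array([2.0, 1.0, 0.1]))   # made-up logits
print(loss, probs)
```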

ML Frameworks
•Regression (Logistic Regression)
•SVM (Kernel Trick.) , LibSVM
•Decision Tree, Random Forest, XGBoost
•Deep Learning

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X (features), y (labels): 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)

clf = DecisionTreeClassifier()   # decision tree; clf = classifier
clf.fit(X_train, y_train)        # train on the training set
clf.score(X_test, y_test)        # accuracy on the test set
clf.predict(new_data)            # predict labels for new data
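The snippet above leaves X, y and new_data undefined. Here is a self-contained variant that runs end to end on synthetic data; the data, the labeling rule, and the `random_state` values are assumptions added only to make the example reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))              # 300 samples, 4 made-up features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic binary label

# 80% training, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)       # mean accuracy on the held-out test set
predictions = clf.predict(X_test[:5])      # predicted labels for five samples
print(accuracy, predictions)
```

With real price series, `shuffle=True` mixes future and past samples, so a chronological split is usually the safer choice.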

Deep Learning
W1
W2
W3
X1
X2
X3
Neural Network
$\hat{y} = \sigma\left(\sum_i W_i X_i\right)$
$\hat{\hat{y}} = \sigma\left(\sum_i U_i \hat{y}_i\right) \to y$
$\sigma$: activation, e.g. 0, 1

FCN, fully connected, dense
Neural Network
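$$\begin{bmatrix} h^{(1)}_1 \\ h^{(1)}_2 \\ h^{(1)}_3 \\ h^{(1)}_4 \end{bmatrix} = \sigma\left( \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & \cdot & \cdot \\ w_{31} & \cdot & \cdot \\ w_{41} & w_{42} & w_{43} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)$$
$h^{(1)}_i = \sigma\left(\sum_j w_{ij} x_j\right)$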
Neural Network,
” ”(tensor)
( : ) !
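A minimal sketch of one fully connected layer with 3 inputs and 4 hidden units, computed once as a matrix-vector product and once unit by unit. The sigmoid activation, the random weights, and the input values are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # weights w_ij: 4 hidden units, 3 inputs
x = np.array([1.0, 2.0, 3.0])    # inputs x1, x2, x3 (made-up values)

h = sigmoid(W @ x)               # whole layer as one matrix-vector product

# Same result computed unit by unit: h_i = sigmoid(sum_j w_ij * x_j)
h_loop = np.array([sigmoid(sum(W[i, j] * x[j] for j in range(3)))
                   for i in range(4)])
print(h, h_loop)
```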

Activation Function
ReLU: y = max(x, 0)
Sigmoid: output between 0.0 and 1.0
tanh: output between -1.0 and 1.0
activation, e.g. 0, 1
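The three activation functions named on this slide, sketched with numpy and evaluated on a few made-up inputs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)          # y = max(x, 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashes to the range (0, 1)

x = np.array([-2.0, 0.0, 2.0])
relu_out = relu(x)
sigmoid_out = sigmoid(x)
tanh_out = np.tanh(x)                  # squashes to the range (-1, 1)
print(relu_out, sigmoid_out, tanh_out)
```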

Back Propagation
Forward Pass
Backward Pass loss

Dot Product
???
$\vec{v}_i \cdot \vec{u} = \|\vec{v}_i\| \|\vec{u}\| \cos\theta$
cosine similarity
target

2D Convolution
1 0 1
0 1 0
1 0 1
Kernel, filter
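A sketch of sliding the 3x3 kernel from this slide over a small image, with no padding and stride 1. The 5x5 binary image and the function name `conv2d_valid` are made up for illustration. As in most CNN libraries, the kernel is not flipped; this kernel is symmetric, so flipping would not change the result.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

Each output value is a dot product between the kernel and one image patch, so it is large where the patch resembles the kernel's pattern.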

Filters
horizontal filter
vertical filter
Filter

Filter on Image
Filter ?
detect
detect pattern 
detect

ImageNet Competition
100 1000
’12 AlexNet, DL
VGG 19, Google Inception (NiN) , ResNet 152
: Transfer Learning,

Hidden Representation
hidden representation

Convolution Auto Encoder
dim: 1024 x 768
dim: 300
input output
Representation Learning
DNA
Neural Network

Train your own classifier
Neural Style Transfer
Face Recognition

Recurrent Neural Network
RNN, LSTM, GRU
This movie is good, really not bad
This movie is bad, really not good

Sequence Modeling
When the recurrent network is trained to perform a task that requires predicting
the future from the past, the network typically learns to use h(t) as a kind of lossy
summary of the task-relevant aspects of the past sequence of inputs up to t. This
summary is in general necessarily lossy, since it maps an arbitrary length sequence
(x(t), x(t-1), x(t-2), . . . , x(2), x(1)) to a fixed length vector h(t). Depending on the
training criterion, this summary might selectively keep some aspects of the past
sequence with more precision than other aspects. For example, if the RNN is used
in statistical language modeling, typically to predict the next word given previous
words, it may not be necessary to store all of the information in the input sequence
up to time t, but rather only enough information to predict the rest of the sentence.
The most demanding situation is when we ask h(t) to be rich enough to allow
one to approximately recover the input sequence, as in autoencoder frameworks
(chapter 14).
[Figure 10.2: circuit diagram (left) and unfolded graph (right) over states h(t-1), h(t), h(t+1) and inputs x(t-1), x(t), x(t+1), connected by f]
Figure 10.2: A recurrent network with no outputs. This recurrent network just processes
information from the input x by incorporating it into the state h that is passed forward
through time. (Left)Circuit diagram. The black square indicates a delay of a single time
step. (Right)The same network seen as an unfolded computational graph, where each
node is now associated with one particular time instance.
Equation 10.5 can be drawn in two different ways. One way to draw the RNN
is with a diagram containing one node for every component that might exist in a
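$h^{(t)} = f(h^{(t-1)}, x^{(t)}, \theta)$
[Figure: unrolled RNN with inputs $x_{t-1}, x_t, x_{t+1}$, hidden states $h_{t-1}, h_t, h_{t+1}$, and outputs $\hat{y}_{t-1}, \hat{y}_t, \hat{y}_{t+1}$]
$P(\hat{y}_t = \text{buy} \mid x_1, x_2, \dots, x_{t-1}, x_t)$
$L = \sum_t L_t = -\sum_t \log P(y_t \mid x_1, x_2, \dots, x_t)$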
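A minimal sketch of the recurrence h(t) = f(h(t-1), x(t), θ), with the same parameters reused at every time step. The choice of tanh for f, the layer sizes, and the random inputs are assumptions; LSTM and GRU cells replace this simple update with gated versions.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One step of h(t) = f(h(t-1), x(t), theta), with f chosen as tanh (an assumption)."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

rng = np.random.default_rng(0)
hidden, features, steps = 4, 3, 5
W_h = rng.normal(scale=0.5, size=(hidden, hidden))
W_x = rng.normal(scale=0.5, size=(hidden, features))
b = np.zeros(hidden)

xs = rng.normal(size=(steps, features))   # e.g. 5 days of 3 features (made up)
h = np.zeros(hidden)                      # initial state
for x_t in xs:
    h = rnn_step(h, x_t, W_h, W_x, b)     # same parameters reused at every step
print(h)
```

The final h is the fixed-length, lossy summary of the whole input sequence that a classifier layer would read.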

RNN Various Type
Image Classification (MNIST) | Sentiment Analysis | Image Captioning ("A baby eating a piece of paper")

RNN Various Type
Character RNN
Deep Shakespeare
Deep Math
Project Magenta
NLP, Translation
Seq2Seq
Google Translation
representation

"Deep Math"
source: The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy

Q-learning
action: discrete finite set
a: epsilon greedy, $a = \begin{cases} \text{random}, & \text{prob}(\epsilon) \\ \arg\max_x Q(s, x), & \text{prob}(1 - \epsilon) \end{cases}$
state s, action a, state s', reward r; a': action in state s'
$Q(s, a) \leftarrow Q(s, a) + \alpha \cdot [r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$ (new ← old + learning rate × error)
Goal: $r + \gamma \max_{a'} Q(s', a') - Q(s, a) \to 0$
Value Iteration
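A small sketch of tabular Q-learning: an epsilon-greedy action choice and one update of Q(s, a). The table size, the learning rate α, the discount γ, and the single transition are made-up values for illustration.

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng):
    """Random action with probability epsilon, otherwise the action with the highest Q(s, .)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error

rng = np.random.default_rng(0)
Q = np.zeros((2, 3))                  # 2 states, 3 actions (sell, hold, buy)
td = q_update(Q, s=0, a=2, r=1.0, s_next=1)
greedy_action = epsilon_greedy(Q, 0, epsilon=0.0, rng=rng)
print(Q, td, greedy_action)
```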

Deep Q-Learning
Neural Network as Q function: state → [Q(s,a0), Q(s,a1), Q(s,a2)]
a0: sell, a1: hold, a2: buy
Q-learning: $r + \gamma \max_{a'} Q(s', a') - Q(s, a) \to 0$
$\mathrm{Loss}(w) = \left(r + \gamma \max_{a'} Q_w(s', a') - Q_w(s, a)\right)^2$
$w := w - \eta \cdot \nabla_w \mathrm{Loss}(w)$
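A sketch of the squared TD loss and one gradient step, using a linear function in place of the neural network so the gradient can be written by hand. The target term is treated as a constant when differentiating (a common simplification, assumed here). The weights, states, reward, and step size are made up.

```python
import numpy as np

def q_values(w, state):
    """Linear stand-in for the network: one row of weights per action."""
    return w @ state

def dqn_loss_and_grad(w, s, a, r, s_next, gamma=0.9):
    target = r + gamma * np.max(q_values(w, s_next))   # treated as a constant
    error = target - q_values(w, s)[a]
    loss = error ** 2
    grad = np.zeros_like(w)
    grad[a] = -2.0 * error * s       # only the row of action a affects Q(s, a)
    return loss, grad

w = np.zeros((3, 2))                 # 3 actions (sell, hold, buy), 2 state features
s, s_next = np.array([1.0, 0.5]), np.array([0.2, 1.0])
eta = 0.1                            # learning rate (assumed value)

loss_before, grad = dqn_loss_and_grad(w, s, a=2, r=1.0, s_next=s_next)
w = w - eta * grad                   # w := w - eta * gradient of Loss(w)
loss_after, _ = dqn_loss_and_grad(w, s, a=2, r=1.0, s_next=s_next)
print(loss_before, loss_after)
```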

Modeling
Price Based/
Technical Analysis
Factor Model Event/Text

Input: K (D1, D2, D3, D4, D5)
$P(S_{D6} \mid D_1^K, D_2^K, D_3^K, \dots, D_5^K)$, with $S_{D6} \in \{1, 0, -1\}$
K (D1 .. D5) → LSTM or GRU (RNN) → D6 ?
buy/sell signal

RNN(LSTM)
label: [1,0,0] ( /long day 6 close - day 5 close > threshold)
label: [0,0,1] ( /short day 6 close - day 5 close < - threshold)
Input: K (D1, D2, D3, D4, D5), where D1: (K ), D2: (K ), ..., D5: (K )
Input → LSTM → fully connected → $\hat{y}_0, \hat{y}_1, \hat{y}_2$, with $\sum_i \hat{y}_i = 1$
cross entropy between $\hat{y}$ and the one-hot label y
(Keras: 10
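A sketch of how the one-hot labels could be built from closing prices with a 5-day input window. The slide only defines the long and short labels; the middle class [0,1,0] for "neither" is an assumption, as are the prices, the threshold value, and the function name `make_label`.

```python
import numpy as np

def make_label(day5_close, day6_close, threshold):
    """One-hot label from the day-6 vs day-5 closing price difference."""
    diff = day6_close - day5_close
    if diff > threshold:
        return np.array([1, 0, 0])   # long
    if diff < -threshold:
        return np.array([0, 0, 1])   # short
    return np.array([0, 1, 0])       # neither: assumed "hold" class

# Made-up closing prices; each window uses days 1-5 as input and day 6 for the label.
closes = np.array([100.0, 101.0, 100.5, 102.0, 101.0, 103.5, 103.6, 101.0])
window = 5
labels = np.array([
    make_label(closes[i + window - 1], closes[i + window], threshold=1.0)
    for i in range(len(closes) - window)
])
print(labels)
```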

Technical Index as Features
K D RSV MA5 MA20 OBV MACD
D1
D2
D3
D4
D5

Factor Model as Features
Rev G% EPS PE ROE
M1
M2
M3
M4
M5
{Preprocess
/

Using CNN
Input: K (D1, D2, D3, D4, D5), where D1: (K ), D2: (K ), ..., D5: (K )
Input → CNN → fully connected → $\hat{y}_0, \hat{y}_1, \hat{y}_2$, with $\sum_i \hat{y}_i = 1$
cross entropy between $\hat{y}$ and the label y
CNN: sliding window, self-defined features
label: [1,0,0] ( /long day 6 close - day 5 close > threshold)
label: [0,0,1] ( /short day 6 close - day 5 close < - threshold)

Consider Candle Chart as
Image (Using CNN)
source: DEEP STOCK REPRESENTATION LEARNING: FROM CANDLESTICK CHARTS TO INVESTMENT DECISIONS
Candlestick image Convolution Auto Encoder, VGG
512D Representation
Clustering Method: Network Modularity
Find Best Stock in
each cluster
¯µ

Using Reinforcement Learning
state: K (D1, D2, D3, D4, D5) + Position Sizing, Asset
Or, use LSTM/CNN to get a representation of the state
Neural Network as Q function: state → [Q(s,a0), Q(s,a1), Q(s,a2)]
a0: sell, a1: hold, a2: buy
Reward ?
Fix a time T as an episode
for all t <= T: R_t = log return of day t, $R_t = \log \frac{A_t}{A_{t-1}}$ ($A_t$: asset value on day t)

1: Day Trading
•
Long/Short
features
•No perfect answer
• Features ?
• Label ? loss ?
•

2: Using RL for Portfolio
Optimization
Asset: [S1, S2]
S1, S2: stocks with predicted growth
Use RL to optimize the portfolio
$A = w_1 S_1 + w_2 S_2$, with $w_1 + w_2 = 1$
Consider action $w_1 \in \{0.10, 0.25, 0.5, 0.75, 0.9\}$
Reward: $R_t = \log \frac{A_t}{A_{t-1}}$
How about n stocks?
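A sketch of the reward for the two-stock setup, following the slide's definition A = w1 S1 + w2 S2 with w2 = 1 - w1. The prices and the function names are made up; comparing all five actions on one known price move is only an illustration of the reward, not an RL training loop.

```python
import numpy as np

ACTIONS = [0.10, 0.25, 0.5, 0.75, 0.9]   # candidate values of w1 from the slide

def portfolio_value(w1, s1, s2):
    """A = w1 * S1 + w2 * S2 with w2 = 1 - w1."""
    return w1 * s1 + (1.0 - w1) * s2

def reward(w1, s1_prev, s2_prev, s1_now, s2_now):
    """R_t = log(A_t / A_{t-1}) for a fixed weight w1."""
    a_prev = portfolio_value(w1, s1_prev, s2_prev)
    a_now = portfolio_value(w1, s1_now, s2_now)
    return np.log(a_now / a_prev)

# Made-up one-day move: S1 rises from 100 to 110, S2 stays at 100.
rewards = [reward(w1, 100.0, 100.0, 110.0, 100.0) for w1 in ACTIONS]
best_w1 = ACTIONS[int(np.argmax(rewards))]
print(rewards, best_w1)
```

For n stocks the action becomes a weight vector that sums to 1, so a discrete action set grows quickly with n.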

3: Inspired from Project
Magenta
source: Tuning Recurrent Neural Networks with Reinforcement Learning
: note , as input of character RNN
RL: (reward)
Character RNN
RL ? 
(min DD, ..or ....)

Path
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
~ Robert Frost

Generation Busy
•
• po
•
•
•

: Overwhelming
• MOOC, CS231n, CS224n, …
Arxiv, Youtube Udacity, Coursera,
•Open Source Package…github,
•
•

,
• Excel
•Learn to be focused: Leverage, to be Focused, for at
least 45 minutes , Flora, )
• 30
✴ 30 10 ...
✴ 30 , )
• / .

•ML/DL: Coursera: Andrew Ng, Udacity
•ML/DL: ,
•CNN: CS231n
•NLP: CS224n
•RL: CS294, CS234 (no video), David Silver
•book: Hands-On Machine Learning & Deep Learning with TensorFlow
•book: Deep Learning ( )
•RL: David Silver
