Recurrent Neural Networks
Viacheslav Khomenko, Ph.D.
Contents
• Recap: feed-forward artificial neural network
• Temporal dependencies
• Recurrent neural network architectures
• RNN training
• New RNN architectures
• Practical considerations
• Neural models for locomotion
• Application of RNNs
RECAP: FEED-FORWARD
ARTIFICIAL NEURAL
NETWORK
Feed-forward network
W. McCulloch and W. Pitts, 1940s: abstract mathematical model of a brain cell
F. Rosenblatt, 1958: perceptron for classification
P. Werbos, 1975: multi-layer artificial neural network
[Figure: feed-forward network for Iris flower classification; input features (petals, sepal, yellow patch, veins) feed the input layer, then hidden layer(s), then an output layer whose decisions are Iris / ¬Iris]
Feed-forward network
Decisions are based on current inputs:
• No memory about the past
• No future scope
[Figure: simplified representation; input x → hidden layer(s) with activation A → decision output ŷ]
Vector of input features: x
Vector of predicted values: ŷ
Neural activation:
A : some activation function (tanh, etc.)
w, b : network parameters
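To make the recap concrete, here is a minimal NumPy sketch of a single feed-forward pass of the form y = A(w·x + b) implied by the notation above; the layer sizes, the tanh choice and the random weights are illustrative assumptions, not values from the slides.

```python
import numpy as np

def feed_forward(x, W_h, b_h, W_o, b_o, A=np.tanh):
    """One input -> hidden -> output pass; A is the activation function."""
    h = A(W_h @ x + b_h)          # hidden layer activation
    y = A(W_o @ h + b_o)          # output layer activation (decision scores)
    return y

# Illustrative sizes: 4 input features (petals, sepal, yellow patch, veins),
# 3 hidden units, 2 outputs (Iris / not-Iris).
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W_h, b_h = rng.normal(size=(3, 4)), np.zeros(3)
W_o, b_o = rng.normal(size=(2, 3)), np.zeros(2)
print(feed_forward(x, W_h, b_h, W_o, b_o))
```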
TEMPORAL
DEPENDENCIES
Temporal dependencies
Analyzing temporal dependencies
Frame 0: Stem seen, Petals hidden → P(Iris) = 0.10, P(¬Iris) = 0.90
Frame 1: Stem seen, Petals hidden → P(Iris) = 0.11, P(¬Iris) = 0.89
Frame 2: Stem seen, Petals partial → P(Iris) = 0.20, P(¬Iris) = 0.80
Frame 3: Stem partial, Petals partial → P(Iris) = 0.45, P(¬Iris) = 0.55
Frame 4: Stem hidden, Petals seen → P(Iris) = 0.90, P(¬Iris) = 0.10
Decision on the sequence of observations → improved decisions
Reber Grammar
A synthetic problem that cannot be solved without memory.
For each state, learn to predict the next possible edges.
Transitions have equal probabilities:
P(1→2) = P(1→3) = 0.5
[Figure: Reber Grammar graph with states (nodes) and transitions (edges)]
One-hot encoding of a sample word (B P T T T T T V P X T T T T V V E):

Word | Step | Current node (Begin 1 2 3 4 5 6) | Possible paths (1 2 3 4 5 6 End)
B    | 0    | 1 0 0 0 0 0 0                    | 1 0 0 0 0 0 0
P    | 1    | 0 1 0 0 0 0 0                    | 0 1 1 0 0 0 0
T    | 2    | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T    | 3    | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T    | 4    | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T    | 5    | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T    | 6    | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
V    | 7    | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
P    | 8    | 0 0 0 0 0 1 0                    | 0 0 0 1 0 1 0
X    | 9    | 0 0 0 0 1 0 0                    | 0 0 1 0 0 1 0
T    | 10   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T    | 11   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T    | 12   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T    | 13   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
V    | 14   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
V    | 15   | 0 0 0 0 0 1 0                    | 0 0 0 1 0 1 0
E    | 16   | 0 0 0 0 0 0 1                    | 0 0 0 0 0 0 1
For example, at time t = 2 the current-node vector is the network input x and the possible-paths vector is the target output y.
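A minimal sketch of how one table row could be turned into a training pair: the input is the one-hot current-node vector and the target is the multi-hot vector of possible next nodes. The small transition map below is a hypothetical placeholder for illustration; the actual graph is the one drawn on the slide.

```python
import numpy as np

# Hypothetical transition map: node -> set of possible next nodes.
# Placeholder only; the real Reber graph is defined by the slide's figure.
NODES = ["Begin", "1", "2", "3", "4", "5", "6", "End"]
TRANSITIONS = {"Begin": {"1"}, "1": {"2", "3"}}   # remaining nodes omitted

def encode_state(current_node):
    """Return (one-hot input over nodes, multi-hot target of possible next nodes)."""
    x = np.zeros(len(NODES))
    x[NODES.index(current_node)] = 1.0
    y = np.zeros(len(NODES))
    for nxt in TRANSITIONS.get(current_node, ()):
        y[NODES.index(nxt)] = 1.0
    return x, y

x, y = encode_state("1")
print(x)   # current node "1"
print(y)   # possible next nodes "2" and "3", as in the table row for step 1
```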
Memory is important → Reasoning relies on
experience
Time-delay neural network
• An FFNN with delayed inputs
• No internal state
Pro: captures dependencies between features at different timestamps.
Cons:
• Limited input history (< 10 timestamps)
• Delay values must be set explicitly
• Not general; cannot solve complex tasks (such as the Reber Grammar)
[Figure: time-delay neural network; delayed inputs x(t), x(t−1), x(t−2), x(t−3) feed a hidden layer that produces y(t)]
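A minimal sketch of the time-delay idea, assuming a window of four delayed inputs x(t) … x(t−3) that are concatenated and fed to an ordinary feed-forward layer; sizes and weights are illustrative.

```python
import numpy as np

def tdnn_step(x_history, W, b, A=np.tanh):
    """x_history: list [x(t), x(t-1), x(t-2), x(t-3)] of equal-length vectors."""
    x_window = np.concatenate(x_history)   # stack delayed inputs into one vector
    return A(W @ x_window + b)             # plain feed-forward layer on the window

rng = np.random.default_rng(0)
dim, delays, hidden = 3, 4, 5
xs = [rng.normal(size=dim) for _ in range(delays)]       # x(t) ... x(t-3)
W, b = rng.normal(size=(hidden, dim * delays)), np.zeros(hidden)
print(tdnn_step(xs, W, b))
```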
RECURRENT NEURAL
NETWORK
ARCHITECTURES
Introducing recurrence
Naïve attempt: simple recurrence, feeding the output back to the input through a 1-step delay.
But… it does not work, because it is not stable: there is no control over the feedback loop, and the obtained output ŷ diverges from the expected one.
[Figure: input x(t) → hidden layer → output y(t), with the past output state fed back to the input through a 1-step delay; plots of expected vs. obtained ŷ]
Jordan recurrent network (M.I. Jordan, 1986)
Output-to-hidden connections: the past output state is stored in a context layer through a 1-step delay and fed back into the hidden layer.
[Figure: input layer → hidden layer → output layer y(t), with a context layer holding the delayed output]
Pro: Fast to train, because it can be parallelized in time.
Cons:
• The output transforms the hidden state → nonlinear effects, information is distorted
• The output dimension may be too small → information in the hidden states is truncated
• Limited short-term memory
Elman recurrent network (J.L. Elman, 1990)
Often referenced as the basic RNN structure and called the “vanilla” RNN.
Hidden-to-hidden connections make the system Turing-complete.
[Figure: input layer → hidden layer → output layer y(t), with a context layer holding the delayed hidden state (1-step delay)]
• Must see the complete sequence to be trained
• Cannot be parallelized across timestamps
• Has some important training difficulties…
W_ih : weight matrix from input to hidden
W_o : weight matrix from hidden to output
x_t : input (feature) vector at time t
y_t : network output vector at time t
h_t : network internal (hidden) state vector at time t
U : weight matrix from hidden to hidden
b : bias parameter vector
h_t = σ(W_ih · x_t + U · h_{t-1} + b)
y_t = σ(W_o · h_t)
Vanilla RNN
Unfolding the network in time
Vanilla RNN
h_t = σ(W_ih · x_t + U · h_{t-1} + b)
y_t = σ(W_o · h_t)
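The two equations translate directly into a short forward pass. A minimal NumPy sketch, assuming a logistic sigmoid for σ and small random weights; shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, W_ih, U, b, W_o):
    """Unfold the vanilla RNN over the sequence xs; return hidden states and outputs."""
    h = np.zeros(U.shape[0])
    hs, ys = [], []
    for x_t in xs:                                   # one step per timestamp
        h = sigmoid(W_ih @ x_t + U @ h + b)          # h_t = sigma(W_ih x_t + U h_{t-1} + b)
        y = sigmoid(W_o @ h)                         # y_t = sigma(W_o h_t)
        hs.append(h)
        ys.append(y)
    return hs, ys

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 4, 8, 2, 5
xs = [rng.normal(size=n_in) for _ in range(T)]
W_ih = rng.normal(scale=0.1, size=(n_hid, n_in))
U    = rng.normal(scale=0.1, size=(n_hid, n_hid))
b    = np.zeros(n_hid)
W_o  = rng.normal(scale=0.1, size=(n_out, n_hid))
hs, ys = rnn_forward(xs, W_ih, U, b, W_o)
print(ys[-1])
```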
RNN TRAINING
Target: obtain the network parameters that optimize the cost function.
Cost functions: log loss, root mean squared error, etc.
Tasks:
• For each timestamp of the input sequence x, predict an output y (synchronously)
• For the input sequence x, predict a scalar value y (e.g., at the end of the sequence)
• For an input sequence x of length Lx, generate an output sequence y of a different length Ly
Methods:
• Back-propagation: reliable and controlled convergence; supported by most ML frameworks
• Research alternatives: evolutionary methods, expectation maximization, non-parametric methods, particle swarm optimization
RNN training
1. Unfold the network.
2. Repeat over the training data:
   1. Take an input sequence x.
   2. For t in 0 … N−1:
      1. Initialize the hidden state to its past value h_{t-1}.
      2. Forward-propagate and compute the next hidden state h_t.
   3. Obtain the output sequence ŷ.
   4. Calculate the error E(y, ŷ) between the targets y and the outputs ŷ.
   5. Back-propagate the error across the unfolded network.
   6. Average the weight updates.
h_t = σ(W_ih · x_t + U · h_{t-1} + b)
y_t = σ(W_o · h_t)
E.g., cross-entropy loss: E(y, ŷ) = − Σ_t y_t · log(ŷ_t)
Back-propagation through time
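A compact sketch of back-propagation through time for the vanilla RNN above, assuming a tanh hidden activation, a softmax output and the cross-entropy loss (a common choice, slightly different from the σ output written on the slide); gradients for W_ih, U, b and W_o are accumulated while walking the unfolded network backwards.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bptt(xs, targets, W_ih, U, b, W_o):
    """Forward unroll, then back-propagate through time. targets: class index per step."""
    T, n_hid = len(xs), U.shape[0]
    hs, ys = [np.zeros(n_hid)], []
    for x_t in xs:                                   # forward pass over the sequence
        h = np.tanh(W_ih @ x_t + U @ hs[-1] + b)
        hs.append(h)
        ys.append(softmax(W_o @ h))
    loss = -sum(np.log(ys[t][targets[t]]) for t in range(T))

    dW_ih, dU, db, dW_o = [np.zeros_like(m) for m in (W_ih, U, b, W_o)]
    dh_next = np.zeros(n_hid)
    for t in reversed(range(T)):                     # backward pass through time
        dy = ys[t].copy(); dy[targets[t]] -= 1.0     # dE/d(logits_t) for cross-entropy
        dW_o += np.outer(dy, hs[t + 1])
        dh = W_o.T @ dy + dh_next                    # gradient flowing into h_t
        dz = (1.0 - hs[t + 1] ** 2) * dh             # through the tanh nonlinearity
        dW_ih += np.outer(dz, xs[t])
        dU += np.outer(dz, hs[t])
        db += dz
        dh_next = U.T @ dz                           # pass gradient on to h_{t-1}
    return loss, (dW_ih, dU, db, dW_o)
```

In practice these gradients would be averaged over a mini-batch and applied with a gradient-descent update; gradient clipping (discussed below) guards against exploding gradients.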
Apply chain rule:
Back-propagation through time
For time t = 2, with θ denoting the network parameters:
∂E_2/∂θ = Σ_{k=0..2} (∂E_2/∂ŷ_2) · (∂ŷ_2/∂h_2) · (∂h_2/∂h_k) · (∂h_k/∂θ)
where, for example,
∂h_2/∂h_0 = (∂h_2/∂h_1) · (∂h_1/∂h_0)
[Figure: saturating activation function; in the saturated regions the gradient is close to 0]
Saturated neurons have gradients → 0, which drives the gradients of previous layers toward 0 (especially for far time-stamps).
• Smaller weight parameters lead to faster gradient vanishing.
• Very big initial parameters make gradient descent diverge fast (explode).
This is a known problem for deep feed-forward networks. For recurrent networks (even shallow ones), it makes learning long-term dependencies impossible!
∂h_t/∂h_0 = (∂h_t/∂h_{t-1}) · … · (∂h_3/∂h_2) · (∂h_2/∂h_1) · (∂h_1/∂h_0)
• The product decays exponentially
• The network stops learning and cannot update
• It becomes impossible to learn correlations between temporally distant events
Problem: vanishing gradients
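A small numerical illustration of the product above: for a tanh RNN, ∂h_t/∂h_{t−1} = diag(1 − h_t²)·U, and chaining these Jacobians over many steps shrinks their norm roughly exponentially when the recurrent weights are small. The sizes and weight scales below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hid, T = 16, 50
U = rng.normal(scale=0.1, size=(n_hid, n_hid))       # small recurrent weights
W_ih = rng.normal(scale=0.1, size=(n_hid, n_hid))

h = np.zeros(n_hid)
J = np.eye(n_hid)                                    # accumulated d h_t / d h_0
for t in range(T):
    x_t = rng.normal(size=n_hid)
    h = np.tanh(W_ih @ x_t + U @ h)
    J = (np.diag(1.0 - h ** 2) @ U) @ J              # chain one more Jacobian factor
    if t % 10 == 0:
        print(f"t={t:2d}  ||d h_t / d h_0|| = {np.linalg.norm(J):.2e}")
```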
The network cannot converge and the weight parameters do not stabilize.
Diagnostics: NaNs; large fluctuations of the cost function; a large increase in the norm of the gradient during training.
Pascanu R. et al., On the difficulty of training recurrent neural networks. arXiv (2012)
Problem: exploding gradients
Solutions:
• Use gradient clipping
• Try reducing the learning rate
• Change the loss function by setting constraints on the weights (L1/L2 norms)
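A minimal sketch of gradient clipping by global norm, one of the remedies listed above; the threshold is an arbitrary assumption.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-12)
        grads = [g * scale for g in grads]
    return grads

# Example: clip the gradients returned by a BPTT step before the weight update.
grads = [np.random.default_rng(0).normal(scale=10.0, size=(4, 4)) for _ in range(3)]
clipped = clip_by_global_norm(grads, max_norm=5.0)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))   # <= 5.0
```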
Deep network training difficulties:
• Vanishing gradients
• Exploding gradients
Possible solutions:
• One of the previously proposed remedies, or
• Unsupervised pre-training → difficult to implement, and the unsupervised solution sometimes differs greatly from the supervised one, or
• Improve the network architecture!
Fundamental deep learning problem
NEW RNN ARCHITECTURES
Echo State Network (Herbert Jaeger, 2001)
Only the readout neurons are trained!
In practice:
• Easy to over-fit (the model learns by heart) and then gives good results on the training data only
• Optimization of the reservoir hyper-parameters is not straightforward
Reservoir computing
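A minimal echo-state-network sketch: a fixed random reservoir is driven by the input and only a linear readout is fitted, here with ridge regression on a toy next-value prediction task. The reservoir size, spectral-radius scaling and ridge penalty are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, T = 1, 100, 500

# Fixed random input and reservoir weights (never trained).
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W_res = rng.normal(size=(n_res, n_res))
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))    # scale spectral radius below 1

# Toy task: predict the next value of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, T + 1))
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in @ u[t:t + 1] + W_res @ x)       # drive the reservoir
    states[t] = x

# Fit only the readout with ridge regression: w_out = (S^T S + lam*I)^-1 S^T y
y_target = u[1:T + 1]
lam = 1e-6
W_out = np.linalg.solve(states.T @ states + lam * np.eye(n_res), states.T @ y_target)
pred = states @ W_out
print("train MSE:", np.mean((pred - y_target) ** 2))
```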
Liquid state machine
Similar to the ESN, but uses more biologically plausible neuron models → spiking (dynamic) neurons.
In practice:
• Still mostly a research area
• Requires special hardware to be computationally efficient
[Image credits: Daniel Brunner; Tal Dahan and Astar Sade]
Reservoir computing
Long short-term memory (S. Hochreiter & J. Schmidhuber, 1997)
Due to its gating (routing) mechanism, it can be efficiently trained to learn LONG-TERM dependencies.
Variants:
• No Input Gate
• No Forget Gate
• No Output Gate
• No Input Activation Function
• No Output Activation Function
• No Peepholes
• Coupled Input and Forget Gate
• Full Gate Recurrence
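For reference, a minimal single-step sketch of the standard LSTM cell (the full variant with input, forget and output gates, no peepholes); packing the gate weights into one matrix is an implementation choice, not something prescribed by the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*n_hid, n_in + n_hid): rows for i, f, o, g."""
    n_hid = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0 * n_hid:1 * n_hid])        # input gate
    f = sigmoid(z[1 * n_hid:2 * n_hid])        # forget gate
    o = sigmoid(z[2 * n_hid:3 * n_hid])        # output gate
    g = np.tanh(z[3 * n_hid:4 * n_hid])        # input activation (cell candidate)
    c = f * c_prev + i * g                     # new cell state
    h = o * np.tanh(c)                         # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h)
```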
Has context in both directions, at any timestamp
Bidirectional RNN
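A minimal sketch of the bidirectional idea: one RNN runs forward in time, a second one runs over the reversed sequence, and their hidden states are concatenated at every timestamp.

```python
import numpy as np

def run_rnn(xs, W, U, b):
    """Simple tanh RNN over a sequence; returns the list of hidden states."""
    h, hs = np.zeros(U.shape[0]), []
    for x_t in xs:
        h = np.tanh(W @ x_t + U @ h + b)
        hs.append(h)
    return hs

def bidirectional(xs, fwd_params, bwd_params):
    """Concatenate forward-in-time and backward-in-time hidden states per timestamp."""
    hs_f = run_rnn(xs, *fwd_params)
    hs_b = run_rnn(xs[::-1], *bwd_params)[::-1]      # reverse back to original order
    return [np.concatenate([hf, hb]) for hf, hb in zip(hs_f, hs_b)]

rng = np.random.default_rng(0)
n_in, n_hid, T = 3, 4, 6
make = lambda: (rng.normal(scale=0.1, size=(n_hid, n_in)),
                rng.normal(scale=0.1, size=(n_hid, n_hid)),
                np.zeros(n_hid))
states = bidirectional([rng.normal(size=n_in) for _ in range(T)], make(), make())
print(states[0].shape)   # each timestamp sees context from both directions
```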
Testing the capacity to maintain long-term dependencies: the (last−1) symbol must equal the (first+1) symbol, e.g.
BPXXXXXPE
BTXXXXXXXXTE
Correct cases: BT ….. TE, BP ….. PE
Incorrect cases: BT ….. PE, BP ….. TE
The system must be able to learn to compare the (first+1) symbol with the (last−1) symbol.
Embedded Reber Grammar
PRACTICAL
CONSIDERATIONS
Masking the input (output)
Inputs (outputs) within a data batch have variable lengths.
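A minimal sketch of how variable-length sequences might be padded into a batch together with a boolean mask, so that padded positions can later be excluded from the loss; shapes and the padding value are assumptions.

```python
import numpy as np

def pad_and_mask(sequences, pad_value=0.0):
    """Pad a list of (length_i, dim) arrays to a (batch, max_len, dim) tensor plus mask."""
    max_len = max(len(s) for s in sequences)
    dim = sequences[0].shape[1]
    batch = np.full((len(sequences), max_len, dim), pad_value)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
        mask[i, :len(s)] = True          # True marks real (non-padded) timestamps
    return batch, mask

rng = np.random.default_rng(0)
seqs = [rng.normal(size=(L, 3)) for L in (5, 2, 4)]
batch, mask = pad_and_mask(seqs)
print(batch.shape, mask.sum(axis=1))     # (3, 5, 3), real lengths [5 2 4]
```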
Length of input ≠ length of output
• CTC loss function
• Encoder-decoder architecture
CTC transforms the network outputs into a conditional probability distribution over label sequences.
Example labelling with blanks: - C - A - T -   (where '-' is the BLANK symbol)
Result decoding
Raw output: -----CCCC---AA-TTTT---
1) Remove repeating symbols: -C-A-T-
2) Remove blanks: CAT
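The two decoding steps above correspond to greedy (best-path) CTC decoding; a minimal sketch, assuming '-' stands for the BLANK symbol:

```python
from itertools import groupby

def ctc_greedy_decode(raw, blank="-"):
    """Collapse repeated symbols, then drop blanks (best-path CTC decoding)."""
    collapsed = [symbol for symbol, _ in groupby(raw)]   # 1) remove repeating symbols
    return "".join(s for s in collapsed if s != blank)   # 2) remove blanks

print(ctc_greedy_decode("-----CCCC---AA-TTTT---"))       # -> "CAT"
```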
NEURAL MODELS FOR
LOCOMOTION
Locomotion principles in nature
[S.Roland et al., 2004]
Locomotion: movement or
the ability to move from
one place to another
Manipulation ≠ Locomotion
[Figure: aperiodic series of motions (stable) vs. periodic motion gaits (quasi-stable); A. Ijspeert et al., 2007]
[Figure: wheeled locomotion on soft ground; S. Roland et al., 2004]
Locomotion efficiency
Nature: no “pure” wheeled locomotion
Reason: variety of surfaces, rough terrain, adaptation is necessary
Biological locomotion exploits patterns
The number of legs influences
• Mechanical complexity
• Control complexity
• Generated patterns (for k = 6 legs: N = (2k−1)! = 11! = 39 916 800)
[S.Roland 2004]
Locomotion efficiency
• Gait control is on “automatic pilot”
• Automatic gait is energy efficient
• Perturbation introduces modification
Not fully nature's way (weak adaptation, no decisions)
How does nature deal with locomotion?
- Initiate motion by injecting energy
- Passive stage
- Generate
- Control for stability
- Repeat
- Brain?
- Nervous system?
- Spinal cord?
Inconceivable automation
Complexity of the phenomena involved in motor control
Central Nervous System, Motor Nervous System, Neuromuscular Junction
• Models of the musculoskeletal system …
• Models of the Motor Nervous System
Excerpts: Univ. du Québec (ÉTS Montréal) course; Collège de France (L. Damn)
Excerpt: Univ. Paris 8, Licence course L.612
Spinal cord
[P. Hénaff 2013]
Biological motor control
Motor unit
A motor unit (MU) aggregates the muscular fibers innervated by a common motor neuron; contraction of these fibers is thus simultaneous.
[Figure: spinal reflex pathways: sensory nerve, motor nerve, dorsal root, posterior horn, anterior horn, ventral root, neuromuscular fiber]
Reflexes: pathways
Muscle contraction as a response to its own elongation
Muscle contraction as a response to external stimuli
[P. Hénaff 2013]
Central Pattern Generator
• Automatic activity is controlled by spinal centers
• The CPG (Central Pattern Generator) is a network of synaptic connections that generates rhythmic motions
• The spinal pattern-generating networks do not require sensory input, but are nevertheless strongly regulated by input from limb proprioceptors
Sensory-motor architecture for locomotion
[McCrea 2006]
Biological sensory-motor architecture
models
Muscular contraction patterns are put in place during embryonic life or after birth:
• Insects can walk immediately upon birth
• Most mammals require several
minutes to stand
• Humans require more than a
year to walk on two legs
How learning occurs
[ejjack2]
Mathematical modeling of CPG
[J. Nassour et al., 2010]  [P.F. Rowat, A.I. Selverston, 1997]
Mathematical modeling of CPG
[Figure panels: CPG approximation; limit-cycle behavior; gait matrix; coupling of different CPGs; sensory feedback; Hopf oscillator]
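As an illustration of limit-cycle behavior, a minimal Euler integration of a standard Hopf oscillator; the exact CPG formulations and parameters used in the cited works may differ, so this is only a generic sketch.

```python
import numpy as np

def hopf_step(x, y, mu=1.0, omega=2.0 * np.pi, dt=1e-3):
    """One Euler step of the standard Hopf oscillator (limit cycle of radius sqrt(mu))."""
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

x, y = 0.1, 0.0                                  # start near the unstable fixed point
for step in range(1, 5001):
    x, y = hopf_step(x, y)
    if step % 1000 == 0:
        print(f"t = {step * 1e-3:.1f} s, radius = {np.hypot(x, y):.3f}")  # -> ~1.0
```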
Neural controllers
Neural-network-based CPG controller for biped locomotion [Taga 1995]:
• CPG of the trunk, with ipsilateral and contralateral connections (Matsuoka model)
• 1 CPG per joint
• 2 coupled neurons per CPG
• Contralateral and ipsilateral inhibitions
• Sensory-motor integration
Excerpt from Taga 1995 (Biol. Cyb.)
[Figure: internal coupling of the network; articular sensory inputs (speeds, forces, ground contact); model of neuron i (Matsuoka 1985)]
[P. Hénaff 2013]
Compensation of articulation defects (ROBIAN biped, LISV, UVSQ)
Temporal evolution of the frequency components of the sagittal acceleration of the robot's pelvis:
• Automatically determines the robot's natural frequencies
• Continuously adapts to the evolution of defects
[Figure: phase portraits of the oscillator, without coupling vs. with coupling (learning, synchronous)]
[V. Khomenko, 2013, LISV, UVSQ, France]
APPLICATION OF
RECURRENT NEURAL
NETWORKS
• Human-computer interaction
– Speech and handwriting recognition
– Music composition
– Activity recognition
• Identification and control
– Identification and control of dynamic systems by learning
– Biologically inspired robotics for adaptive locomotion
– Study of the formation and evaluation of biological pattern structures
Application of RNNs