Introductory Presentation on RNN, LSTM and Seq-2-Seq
Models
by Jayeol Chun and Sang-Hyun Eun
1. Brief Overview of Theory behind RNN
Q: What is an RNN?
Feed-Forward vs. Feed-Back : Static vs. Dynamic
As opposed to a feed-forward architecture such as the Convolutional Neural Network (CNN), where there are no cycles, the Recurrent Neural Network (RNN) maintains the Persistence of Information by feeding the outputs of previous computations into later computations. This makes it well suited for processing sequences of characters, naturally making it an ideal tool in NLP.
Basic RNN Computation in Theory
import numpy as np

class RNN:
    # ... (weight matrices W_hh, W_xh, W_hy and hidden state h are set up in __init__)
    def step(self, x):
        # update the hidden state from the previous state and the new input
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector from the new hidden state
        y = np.dot(self.W_hy, self.h)
        return y

# main instruction
rnn = RNN()
y = rnn.step(x)  # x is an input vector, y is the RNN's output vector
~ Point to Take Away: Quite Simple!
Challenge
Unstable Gradient Problem
-> "The gradient in deep neural networks is unstable, tending to either explode or vanish in earlier layers."
In at least some deep neural networks, the gradient tends to get smaller as we move backward through the
hidden layers. This means that neurons in the earlier layers learn much more slowly than neurons in later layers.
Question: the more hidden layers, the better?
6/23/2016 Presentation Final
file:///Users/jayeolchun/Downloads/Presentation+Final%20(1).html 3/11
"Backpropagation computes gradients by chain rule -> This has the effect of multiplying n of these small
numbers to compute gradients of the front layers in an n-layer network, meaning that the gradient (error signal)
decreases exponentially with n and the front layers train very slowly."
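To see the scale of the effect, here is a tiny toy computation (our own illustrative numbers, not from the quote above): the sigmoid's derivative is at most 0.25, so with weights around 0.9 the per-layer factor is about 0.225, multiplied once per layer.

# Toy illustration (hypothetical numbers): sigmoid'(z) <= 0.25, assumed weight ~0.9
for n in [2, 5, 10, 20]:
    grad_factor = (0.25 * 0.9) ** n  # per-layer factor, multiplied n times
    print(n, grad_factor)
# 2 -> ~5.1e-02, 5 -> ~5.8e-04, 10 -> ~3.3e-07, 20 -> ~1.1e-13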
2. Long Short Term Memory Network (LSTM)
The most commonly used type of RNN, which addresses the above challenge
Can learn to recognize context-sensitive languages
The key is the cell state. It runs straight down the entire chain, with only minor linear interactions
It updates its information through structures called gates
There are 3 main types of gates (combined in the sketch below):
Forget Gate Layer - a sigmoid layer that chooses what information to forget
Input Gate Layer - chooses which values to update and what new values to add
Output Gate Layer - based on the cell state, filters it to decide which values to output
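Putting the three gates together, a minimal single-step sketch in NumPy (our own illustration: weight matrices W_f, W_i, W_c, W_o are assumed given, and bias terms are omitted for brevity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_f, W_i, W_c, W_o):
    z = np.concatenate([h_prev, x])   # previous hidden state + current input
    f = sigmoid(np.dot(W_f, z))       # forget gate: what to erase from the cell state
    i = sigmoid(np.dot(W_i, z))       # input gate: which values to update
    c_cand = np.tanh(np.dot(W_c, z))  # candidate values to add
    c = f * c_prev + i * c_cand       # new cell state: forget, then add
    o = sigmoid(np.dot(W_o, z))       # output gate: filter the cell state
    h = o * np.tanh(c)                # new hidden state / output
    return h, c

Note how the cell state c is touched only by elementwise multiplication and addition (the "minor linear interactions" above), which is what lets gradients flow through long chains.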
3. Sequence to Sequence Model
The Seq-2-Seq model consists of two RNNs: an encoder that processes the input and maps it to a vector, and a decoder that generates the output sequence of symbols from that vector representation. Specifically, the encoder maps a variable-length source sequence to a fixed-length vector, and the decoder maps the vector representation back to a variable-length target sequence of symbols. The two networks are trained jointly to maximize the conditional probability of the target sequence given the source sequence.
Each box in the picture above represents a cell of the RNN.
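As a rough end-to-end sketch of the encode-then-decode flow, here is a toy NumPy version (our own illustration with made-up names and random, untrained weights; real models use LSTM cells and learned parameters):

import numpy as np

np.random.seed(0)
hidden, vocab = 8, 4
# hypothetical parameters: one plain tanh RNN each for encoder and decoder
W_e, U_e = np.random.randn(hidden, hidden), np.random.randn(hidden, vocab)
W_d, U_d = np.random.randn(hidden, hidden), np.random.randn(hidden, vocab)
V = np.random.randn(vocab, hidden)  # projects a hidden state to symbol scores

def one_hot(i):
    v = np.zeros(vocab)
    v[i] = 1.0
    return v

def encode(source_ids):
    # encoder: fold the variable-length source into one fixed-length vector
    h = np.zeros(hidden)
    for i in source_ids:
        h = np.tanh(np.dot(W_e, h) + np.dot(U_e, one_hot(i)))
    return h

def decode(h, start_id, max_len):
    # decoder: emit one symbol per step, feeding each prediction back in
    out, prev = [], start_id
    for _ in range(max_len):
        h = np.tanh(np.dot(W_d, h) + np.dot(U_d, one_hot(prev)))
        prev = int(np.argmax(np.dot(V, h)))
        out.append(prev)
    return out

print(decode(encode([0, 1, 2, 2]), start_id=0, max_len=4))  # meaningless until trained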
Example
Sample RNN / Seq-2-Seq Code
In [1]:
import tensorflow as tf
from tensorflow.models.rnn import rnn, rnn_cell
import numpy as np

char_rdic = ['h', 'e', 'l', 'o']                    # id -> char
char_dic = {w: i for i, w in enumerate(char_rdic)}  # char -> id
sample = [char_dic[c] for c in "hello"]             # "hello" -> indices

x_data = np.array([[1, 0, 0, 0],   # h
                   [0, 1, 0, 0],   # e
                   [0, 0, 1, 0],   # l
                   [0, 0, 1, 0]],  # l
                  dtype='f')

# Configuration
char_vocab_size = len(char_dic)
rnn_size = 4        # = char_vocab_size: one-hot coding (one of 4)
time_step_size = 4  # 'hell' -> predict 'ello'
batch_size = 1      # one sample

# RNN model ('cell' rather than 'rnn_cell', to avoid shadowing the module)
cell = rnn_cell.BasicRNNCell(rnn_size)
state = tf.zeros([batch_size, cell.state_size])
X_split = tf.split(0, time_step_size, x_data)
outputs, state = tf.nn.seq2seq.rnn_decoder(X_split, state, cell)
print(state)
print(outputs)

# logits: list of 2D Tensors of shape [batch_size x num_decoder_symbols].
# targets: list of 1D batch-sized int32 Tensors of the same length as logits.
# weights: list of 1D batch-sized float-Tensors of the same length as logits.
logits = tf.reshape(tf.concat(1, outputs), [-1, rnn_size])
targets = tf.reshape(sample[1:], [-1])
weights = tf.ones([time_step_size * batch_size])

loss = tf.nn.seq2seq.sequence_loss_by_example([logits], [targets], [weights])
cost = tf.reduce_sum(loss) / batch_size
train_op = tf.train.RMSPropOptimizer(0.01, 0.9).minimize(cost)

# Launch the graph in a session
with tf.Session() as sess:
    # you need to initialize all variables
    tf.initialize_all_variables().run()
    for i in range(100):
        sess.run(train_op)
        result = sess.run(tf.arg_max(logits, 1))
        print(result, [char_rdic[t] for t in result])
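The two print calls above show the final decoder state and the list of per-step output tensors; the training loop then prints the argmax prediction after each of the 100 steps, converging step by step to 'e', 'l', 'l', 'o':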
Tensor("rnn_decoder/BasicRNNCell_3/Tanh:0", shape=(1, 4), dtype=float3
2)
[<tf.Tensor 'rnn_decoder/BasicRNNCell/Tanh:0' shape=(1, 4) dtype=float3
2>, <tf.Tensor 'rnn_decoder/BasicRNNCell_1/Tanh:0' shape=(1, 4) dtype=f
loat32>, <tf.Tensor 'rnn_decoder/BasicRNNCell_2/Tanh:0' shape=(1, 4) dt
ype=float32>, <tf.Tensor 'rnn_decoder/BasicRNNCell_3/Tanh:0' shape=(1,
4) dtype=float32>]
(array([2, 0, 0, 0]), ['l', 'h', 'h', 'h'])
(array([2, 0, 0, 0]), ['l', 'h', 'h', 'h'])
(array([2, 0, 3, 0]), ['l', 'h', 'o', 'h'])
(array([2, 0, 3, 0]), ['l', 'h', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 2, 2]), ['l', 'l', 'l', 'l'])
(array([2, 2, 2, 2]), ['l', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
4. RNN Applications
- Language Modeling
- Conversation Modeling / Question Answering
- Machine Translation
- Speech Recognition
- Image / Video Captioning
- Image / Music Generation
5. References
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- https://en.wikipedia.org/wiki/Convolutional_neural_network
- https://en.wikipedia.org/wiki/Recurrent_neural_network
- http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
- https://github.com/hunkim/ml
- https://www.tensorflow.org/versions/r0.9/tutorials/index.html
- http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- http://arxiv.org/pdf/1409.3215v3.pdf
- http://arxiv.org/pdf/1406.1078v3.pdf
Code References:
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/rnn_test.py
- https://github.com/hans/ipython-notebooks/blob/master/tf/TF%20tutorial.ipynb
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/ptb/ptb_word_lm.py
- https://gist.github.com/karpathy/d4dee566867f8291f086#file-min-char-rnn-py
