Introductory Presentation on RNN, LSTM and Seq-2-Seq
Models
by Jayeol Chun and Sang-Hyun Eun
1. Brief Overview of Theory behind RNN
Q: What is an RNN?
Feed-Forward vs. Feed-Back : Static vs. Dynamic
As opposed to a feed-forward architecture such as the Convolutional Neural Network (CNN), where there are no cycles, the Recurrent Neural Network (RNN) maintains the Persistence of Information by feeding the outputs of previous computations into later computations. This makes it well suited for processing sequences of characters, naturally making it an ideal tool in NLP.
Basic RNN Computation in Theory
import numpy as np

class RNN:
    # ... (weight matrices W_hh, W_xh, W_hy and hidden state h are set up in __init__)
    def step(self, x):
        # update the hidden state from the previous state and the new input
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector from the new hidden state
        y = np.dot(self.W_hy, self.h)
        return y

# main instruction
rnn = RNN()
y = rnn.step(x)  # x is an input vector, y is the RNN's output vector
~ Point to Take Away: Quite Simple!
Challenge
Unstable Gradient Problem
-> "The gradient in deep neural networks is unstable, tending to either explode or vanish in earlier layers."
In at least some deep neural networks, the gradient tends to get smaller as we move backward through the
hidden layers. This means that neurons in the earlier layers learn much more slowly than neurons in later layers.
Question: the more hidden layers, the better?
6/23/2016 Presentation Final
file:///Users/jayeolchun/Downloads/Presentation+Final%20(1).html 3/11
"Backpropagation computes gradients by chain rule -> This has the effect of multiplying n of these small
numbers to compute gradients of the front layers in an n-layer network, meaning that the gradient (error signal)
decreases exponentially with n and the front layers train very slowly."
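To see the scale of the effect, here is a tiny toy computation (our own illustrative numbers, not from the quote above): the sigmoid's derivative is at most 0.25, so with weights around 0.9 the per-layer factor is about 0.225, multiplied once per layer.

# Toy illustration (hypothetical numbers): sigmoid'(z) <= 0.25, assumed weight ~0.9
for n in [2, 5, 10, 20]:
    grad_factor = (0.25 * 0.9) ** n  # per-layer factor, multiplied n times
    print(n, grad_factor)
# 2 -> ~5.1e-02, 5 -> ~5.8e-04, 10 -> ~3.3e-07, 20 -> ~1.1e-13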
2. Long Short Term Memory Network (LSTM)
The most commonly used type of RNN, which addresses the above challenge
Can learn to recognize context-sensitive languages
The key is the cell state. It runs straight down the entire chain, with only minor linear interactions
It updates its information through structures called gates
There are 3 main types of gates (combined in the sketch below):
Forget Gate Layer - a sigmoid layer that chooses what information to forget
Input Gate Layer - chooses which values to update and what new values to add
Output Gate Layer - based on the cell state, filters it to decide which values to output
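Putting the three gates together, a minimal single-step sketch in NumPy (our own illustration: weight matrices W_f, W_i, W_c, W_o are assumed given, and bias terms are omitted for brevity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_f, W_i, W_c, W_o):
    z = np.concatenate([h_prev, x])   # previous hidden state + current input
    f = sigmoid(np.dot(W_f, z))       # forget gate: what to erase from the cell state
    i = sigmoid(np.dot(W_i, z))       # input gate: which values to update
    c_cand = np.tanh(np.dot(W_c, z))  # candidate values to add
    c = f * c_prev + i * c_cand       # new cell state: forget, then add
    o = sigmoid(np.dot(W_o, z))       # output gate: filter the cell state
    h = o * np.tanh(c)                # new hidden state / output
    return h, c

Note how the cell state c is touched only by elementwise multiplication and addition (the "minor linear interactions" above), which is what lets gradients flow through long chains.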
3. Sequence to Sequence Model
The Seq-2-Seq model consists of two RNNs: an encoder that processes the input and maps it to a vector, and a decoder that generates the output sequence of symbols from that vector representation. Specifically, the encoder maps a variable-length source sequence to a fixed-length vector, and the decoder maps the vector representation back to a variable-length target sequence of symbols. The two networks are trained jointly to maximize the conditional probability of the target sequence given the source sequence.
Each box in the picture above represents a cell of the RNN.
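As a rough end-to-end sketch of the encode-then-decode flow, here is a toy NumPy version (our own illustration with made-up names and random, untrained weights; real models use LSTM cells and learned parameters):

import numpy as np

np.random.seed(0)
hidden, vocab = 8, 4
# hypothetical parameters: one plain tanh RNN each for encoder and decoder
W_e, U_e = np.random.randn(hidden, hidden), np.random.randn(hidden, vocab)
W_d, U_d = np.random.randn(hidden, hidden), np.random.randn(hidden, vocab)
V = np.random.randn(vocab, hidden)  # projects a hidden state to symbol scores

def one_hot(i):
    v = np.zeros(vocab)
    v[i] = 1.0
    return v

def encode(source_ids):
    # encoder: fold the variable-length source into one fixed-length vector
    h = np.zeros(hidden)
    for i in source_ids:
        h = np.tanh(np.dot(W_e, h) + np.dot(U_e, one_hot(i)))
    return h

def decode(h, start_id, max_len):
    # decoder: emit one symbol per step, feeding each prediction back in
    out, prev = [], start_id
    for _ in range(max_len):
        h = np.tanh(np.dot(W_d, h) + np.dot(U_d, one_hot(prev)))
        prev = int(np.argmax(np.dot(V, h)))
        out.append(prev)
    return out

print(decode(encode([0, 1, 2, 2]), start_id=0, max_len=4))  # meaningless until trained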
Example
Sample RNN / Seq-2-Seq Code
In [1]:
import tensorflow as tf
from tensorflow.models.rnn import rnn, rnn_cell
import numpy as np

char_rdic = ['h', 'e', 'l', 'o']                    # id -> char
char_dic = {w: i for i, w in enumerate(char_rdic)}  # char -> id
sample = [char_dic[c] for c in "hello"]             # "hello" -> indices

x_data = np.array([[1, 0, 0, 0],   # h
                   [0, 1, 0, 0],   # e
                   [0, 0, 1, 0],   # l
                   [0, 0, 1, 0]],  # l
                  dtype='f')

# Configuration
char_vocab_size = len(char_dic)
rnn_size = 4        # = char_vocab_size: one-hot coding (one of 4)
time_step_size = 4  # 'hell' -> predict 'ello'
batch_size = 1      # one sample

# RNN model ('cell' rather than 'rnn_cell', to avoid shadowing the module)
cell = rnn_cell.BasicRNNCell(rnn_size)
state = tf.zeros([batch_size, cell.state_size])
X_split = tf.split(0, time_step_size, x_data)
outputs, state = tf.nn.seq2seq.rnn_decoder(X_split, state, cell)
print(state)
print(outputs)

# logits: list of 2D Tensors of shape [batch_size x num_decoder_symbols].
# targets: list of 1D batch-sized int32 Tensors of the same length as logits.
# weights: list of 1D batch-sized float-Tensors of the same length as logits.
logits = tf.reshape(tf.concat(1, outputs), [-1, rnn_size])
targets = tf.reshape(sample[1:], [-1])
weights = tf.ones([time_step_size * batch_size])

loss = tf.nn.seq2seq.sequence_loss_by_example([logits], [targets], [weights])
cost = tf.reduce_sum(loss) / batch_size
train_op = tf.train.RMSPropOptimizer(0.01, 0.9).minimize(cost)

# Launch the graph in a session
with tf.Session() as sess:
    # you need to initialize all variables
    tf.initialize_all_variables().run()
    for i in range(100):
        sess.run(train_op)
        result = sess.run(tf.arg_max(logits, 1))
        print(result, [char_rdic[t] for t in result])
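The two print calls above show the final decoder state and the list of per-step output tensors; the training loop then prints the argmax prediction after each of the 100 steps, converging step by step to 'e', 'l', 'l', 'o':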
Tensor("rnn_decoder/BasicRNNCell_3/Tanh:0", shape=(1, 4), dtype=float3
2)
[<tf.Tensor 'rnn_decoder/BasicRNNCell/Tanh:0' shape=(1, 4) dtype=float3
2>, <tf.Tensor 'rnn_decoder/BasicRNNCell_1/Tanh:0' shape=(1, 4) dtype=f
loat32>, <tf.Tensor 'rnn_decoder/BasicRNNCell_2/Tanh:0' shape=(1, 4) dt
ype=float32>, <tf.Tensor 'rnn_decoder/BasicRNNCell_3/Tanh:0' shape=(1,
4) dtype=float32>]
(array([2, 0, 0, 0]), ['l', 'h', 'h', 'h'])
(array([2, 0, 0, 0]), ['l', 'h', 'h', 'h'])
(array([2, 0, 3, 0]), ['l', 'h', 'o', 'h'])
(array([2, 0, 3, 0]), ['l', 'h', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 0]), ['l', 'l', 'o', 'h'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 3, 2]), ['l', 'l', 'o', 'l'])
(array([2, 2, 2, 2]), ['l', 'l', 'l', 'l'])
(array([2, 2, 2, 2]), ['l', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 2]), ['e', 'l', 'l', 'l'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
(array([1, 2, 2, 3]), ['e', 'l', 'l', 'o'])
4. RNN Applications
- Language Modeling
- Conversation Modeling / Question Answering
- Machine Translation
- Speech Recognition
- Image / Video Captioning
- Image / Music Generation
5. References
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- https://en.wikipedia.org/wiki/Convolutional_neural_network
- https://en.wikipedia.org/wiki/Recurrent_neural_network
- http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
- https://github.com/hunkim/ml
- https://www.tensorflow.org/versions/r0.9/tutorials/index.html
- http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- http://arxiv.org/pdf/1409.3215v3.pdf
- http://arxiv.org/pdf/1406.1078v3.pdf
Code References:
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/rnn_test.py
- https://github.com/hans/ipython-notebooks/blob/master/tf/TF%20tutorial.ipynb
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/ptb/ptb_word_lm.py
- https://gist.github.com/karpathy/d4dee566867f8291f086#file-min-char-rnn-py
