Recurrent Neural Networks
Sang Jun Lee
Ph.D. candidate, POSTECH
Email: lsj4u0208@postech.ac.kr
EECE695J Special Topics in Electronic and Electrical Engineering J (Deep learning fundamentals and applications to steel processes) – LECTURE 7 (2017. 11. 10)
2
▣ Lecture 6: Convolutional Neural Network
1-page Review
Convolution layer: convolve a 5x5x3 filter over all spatial locations of a 32x32x3 image to produce activation maps
Pooling layer: max pool each depth slice with 2x2 filters and stride 2
“Parameters are shared on the spatial domain”
3
Introduction to recurrent neural network
Vanilla neural network
A naive idea for handling sequential data
 We usually want to predict a vector y_t at each time step for time-domain data x_t
 x: concatenated data of x_1, x_2, x_3, ⋯ fed to a single network y = h(x; W), with W = [W_1; W_2; W_3; ⋯]
[Figure: a vanilla feed-forward network mapping the concatenated input x through a hidden layer h to the output y]
4
Introduction to recurrent neural network
Recurrent neural network (RNN)
 Assume that the relation between x_{t-1} and x_t is similar to the relation between x_t and x_{t+1}
→ Parameter sharing for W_hh
 Identical feature extraction from inputs
→ Parameter sharing for W_xh
[Figure: the same network unrolled over time; every step maps x_t and h_{t-1} to h_t and y_t using the shared weights W_xh, W_hh, W_hy]
5
Introduction to recurrent neural network
Recurrent neural network (RNN)
 Multiple copies of the same network (same function and same parameters)
 h_t: a hidden state that consists of a vector
h_t = f(h_{t-1}, x_t)
h_t = tanh(W_hh ⋅ h_{t-1} + W_xh ⋅ x_t)
y_t = W_hy ⋅ h_t
 h_0 is usually set to 0
[Figure: input layer → RNN cell → output layer (RNN feature) → fully-connected layer]
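To make the recurrence concrete, the following is a minimal NumPy sketch of the forward pass above (the array shapes, random initialization, and sequence length are illustrative assumptions, not code from the lecture):

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    # h_t = tanh(W_hh . h_{t-1} + W_xh . x_t),  y_t = W_hy . h_t
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

# toy dimensions: 3-dim input, 5-dim hidden state, 2-dim output
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(5, 3))
W_hh = rng.normal(size=(5, 5))
W_hy = rng.normal(size=(2, 5))

h = np.zeros(5)                               # h_0 is usually set to 0
xs = [rng.normal(size=3) for _ in range(4)]   # a length-4 input sequence
for x_t in xs:                                # the same weights are reused at every step
    h, y = rnn_step(x_t, h, W_xh, W_hh, W_hy)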
6
Introduction to recurrent neural network
Various architectures of RNN
 Flexibility for handling various types of data
Vanilla neural network
7
Introduction to recurrent neural network
Various architectures of RNN
 Flexibility for handling various types of data
e.g. machine translation
(sequence of words
→ sequence of words)
8
Introduction to recurrent neural network
Limitations of vanilla RNN
 A vanilla RNN works well when only a few time steps separate the relevant input from the output
 However, the sensitivity to earlier input values decays over time in a standard RNN, so long-range dependencies are hard to learn
“the clouds are in the sky”
“I grew up in France
…
I speak fluent French.”
9
LSTM (long short-term memory)
 A standard RNN contains a single layer in the repeating module
10
LSTM (long short-term memory)
 A special kind of RNN for learning long-term dependencies
 Introduced by Hochreiter & Schmidhuber (1997)
11
LSTM (long short-term memory)
The key idea of LSTMs: the cell state
 The cell state is kind of like a conveyor belt
12
LSTM (long short-term memory)
Forget gate
 LSTMs have the ability to remove or add information to the cell state, carefully regulated by structures called gates
 The decision about which information to throw away from the cell state is made by a sigmoid layer called the forget gate layer
13
LSTM (long short-term memory)
Input gate layer
 Decide what new information we’re going to store in the cell state
 First, an input gate layer decides which values we’ll update
 Next, a tanh layer creates a vector of new candidate values
 Finally, the two are combined to create an update to the state
14
LSTM (long short-term memory)
Update
 Forget previous information and add new information to the cell state
Output
 The output is based on the (filtered) cell state
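For reference, the standard LSTM gate equations behind the forget, input, update, and output steps above (the notation follows the usual formulation; the bias terms b are assumptions since the slides omit them):

\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) && \text{input gate} \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) && \text{candidate values} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell-state update} \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state (output)}
\end{aligned}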
15
LSTM (long short-term memory)
16
Variants of RNN
Gated Recurrent Unit (GRU)
 Combine the forget and input gates into a single update gate
 Merge the cell state and hidden state
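For comparison, a minimal statement of the GRU update under the standard formulation (z_t is the merged update gate, r_t the reset gate; biases are omitted as in the slides):

\begin{aligned}
z_t &= \sigma(W_z \cdot [h_{t-1}, x_t]) && \text{update gate (merged forget/input)} \\
r_t &= \sigma(W_r \cdot [h_{t-1}, x_t]) && \text{reset gate} \\
\tilde{h}_t &= \tanh(W \cdot [r_t \odot h_{t-1}, x_t]) && \text{candidate state} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{merged cell/hidden state}
\end{aligned}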
17
Implementation of RNN
Manipulation of time series data
Split raw data into train, validation, and test dataset
def split_data(data, val_size=0.2, test_size=0.2):
    # hold out the last test_size fraction for test, then the last val_size
    # fraction of the remainder for validation; the rest is training data
    ntest = int(round(len(data) * (1 - test_size)))
    nval = int(round(len(data.iloc[:ntest]) * (1 - val_size)))
    df_train, df_val, df_test = data.iloc[:nval], data.iloc[nval:ntest], data.iloc[ntest:]
    return df_train, df_val, df_test

train, val, test = split_data(raw_data, val_size=0.2, test_size=0.2)
Raw data (100%) → Test (last 20%); the remaining 80% → Train (80%) / Validation (20%)
18
Implementation of RNN
Manipulation of time series data
Generate sequence pair (x, y)
import numpy as np

def rnn_data(data, time_steps, labels=False):
    """
    Creates a new data frame based on previous observations.
    * example:
        l = [1, 2, 3, 4, 5]
        time_steps = 2
        -> labels == False: [[1, 2], [2, 3], [3, 4]]
        -> labels == True:  [3, 4, 5]
    """
    rnn_df = []
    for i in range(len(data) - time_steps):
        if labels:
            # the label is the observation right after the window
            try:
                rnn_df.append(data.iloc[i + time_steps].as_matrix())
            except AttributeError:
                rnn_df.append(data.iloc[i + time_steps])
        else:
            # the input is a window of time_steps consecutive observations
            data_ = data.iloc[i: i + time_steps].as_matrix()
            rnn_df.append(data_ if len(data_.shape) > 1 else [[v] for v in data_])
    return np.array(rnn_df)
19
Implementation of RNN
Manipulation of time series data
Generate sequence pair (x, y)
time_steps = 10
train_x = rnn_data(df_train, time_steps, labels=False)
train_y = rnn_data(df_train, time_steps, labels=True)
Training data [1:10000] → sliding windows:
train_x                              train_y
x #01:   [1, 2, 3, …, 10]            y #01:   11
x #02:   [2, 3, 4, …, 11]            y #02:   12
…                                    …
x #9990: [9990, 9991, …, 9999]       y #9990: 10000
20
Implementation of RNN
Manipulation of time series data
Split each sample data
time_steps = 10
x_split = tf.unpack(x_data, time_steps, 1)   # split the placeholder along the time axis

[Figure: tf.unpack turns the placeholder x #01 = [1, 2, 3, …, 10] into a list of per-time-step tensors x_1, x_2, x_3, …, x_10]
21
Implementation of RNN
Choose a RNN cell

import tensorflow as tf
num_units = 100
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units)    # vanilla RNN
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)   # LSTM
rnn_cell = tf.nn.rnn_cell.GRUCell(num_units)         # GRU

Connect input and recurrent layer

rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
output, state = tf.nn.rnn(rnn_cell, x_split, dtype=tf.float32)   # dtype is required when no initial_state is given
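To complete the picture, here is a hedged sketch of how the last RNN output can be connected to a fully-connected layer and a regression loss for the time-series task above (the placeholder shapes, variable names, and the choice of mean squared error are illustrative assumptions, not the lecture's exact code):

# x_data: input windows, y_data: next-value labels (assumed shapes)
x_data = tf.placeholder(tf.float32, [None, time_steps, 1])
y_data = tf.placeholder(tf.float32, [None, 1])

x_split = tf.unpack(x_data, time_steps, 1)            # list of time_steps tensors of shape [batch, 1]
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
output, state = tf.nn.rnn(rnn_cell, x_split, dtype=tf.float32)

# regression on the RNN feature of the last time step
W_out = tf.Variable(tf.random_normal([num_units, 1]))
b_out = tf.Variable(tf.zeros([1]))
prediction = tf.matmul(output[-1], W_out) + b_out
loss = tf.reduce_mean(tf.square(prediction - y_data))          # mean squared error
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)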
22
Case study 1: MNIST classification
Hyper parameters for implementing a RNN
 Learning rate, training iteration, batch size, etc.
 Time step, the number of RNN neurons
Placeholder and variable tensor preparation
 Labels are one-hot encoded
“Sequential processing of non-sequence data”
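A minimal sketch of the placeholder and variable preparation this slide describes, treating each 28x28 image as 28 sequential rows of dimension 28 with 10 one-hot classes (the hyperparameter values and variable names are illustrative assumptions):

import tensorflow as tf

# hyper parameters (illustrative values)
learning_rate = 0.001
training_iters = 10000
batch_size = 128
time_steps = 28      # 28 rows of a 28x28 image, processed sequentially
n_input = 28         # each row is a 28-dimensional vector
num_units = 128      # number of RNN neurons
n_classes = 10       # one-hot encoded labels

x = tf.placeholder(tf.float32, [None, time_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# output-layer variables
W_out = tf.Variable(tf.random_normal([num_units, n_classes]))
b_out = tf.Variable(tf.random_normal([n_classes]))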
23
Case study 1: MNIST classification
Constructing the RNN cell
 Split each 28x28 sample into 28 28-dimensional vectors
 Vanilla RNN: rnn.rnn_cell.BasicRNNCell
Constructing the output layer
 Input dimension: the number of neurons in the RNN cell
 Output: the estimated probability of belonging to each category
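A hedged sketch of these two steps, continuing the tensors defined above (the pre-1.0 TensorFlow API used throughout the lecture is assumed):

# split each 28x28 sample into 28 vectors of dimension 28 (one per time step)
x_split = tf.unpack(x, time_steps, 1)

# vanilla RNN cell; BasicLSTMCell or GRUCell could be swapped in
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units)
outputs, states = tf.nn.rnn(rnn_cell, x_split, dtype=tf.float32)

# output layer: map the last RNN feature (num_units) to class scores (n_classes)
logits = tf.matmul(outputs[-1], W_out) + b_out
probabilities = tf.nn.softmax(logits)   # estimated probability of each category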
24
Case study 1: MNIST classification
Define loss and training operation
Open a tf.Session() and run train_op!
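A minimal sketch of the loss, training operation, and session loop, assuming the softmax classifier above and the MNIST input pipeline shipped with TensorFlow at the time (the data path and iteration count are illustrative):

from tensorflow.examples.tutorials.mnist import input_data

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(training_iters):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        batch_x = batch_x.reshape(batch_size, time_steps, n_input)   # 28 rows per image
        sess.run(train_op, feed_dict={x: batch_x, y: batch_y})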
25
Case study 2 (forecasting the 2017 summer peak electricity demand, 대한전기학회)
 Forecasting strategy
• Forecast the summer peak electricity demand by predicting the daily peak demand
 Algorithm overview
• Build a representative daily temperature series for Korea as a weighted sum of the average temperatures of the major cities, weighted by their shares of electricity demand
• Predict the daily peak electricity demand and temperature with a combined RNN/CNN model trained on historical electricity/temperature data
• Develop a deep learning algorithm that captures the weekly and seasonal periodicity characteristic of electricity demand data
26
Case study 2 (forecasting the 2017 summer peak electricity demand, 대한전기학회)
 Training data construction for the RNN model
• Predict the next 28 days of electricity/temperature from the previous 28 days of electricity/temperature data
 Vanilla RNN model
[Figure: the electricity (E) and temperature (T) series are split into training and test data; each training sample is a 28-day window (time step) and its label is the following 28 days (output dimension). Architecture: input layer → RNN cell → output layer (→ RNN feature) → fully-connected layer]
27
Case study 2 (forecasting the 2017 summer peak electricity demand, 대한전기학회)
 Seasonal data
• Data constructed so that seasonality can be reflected in training
 CNN model for capturing seasonality
[Figure: for the first sample of the training data, windows of the electricity (E) and temperature (T) series offset by one, two, and three seasonal periods (T, 2T, 3T) are stacked, giving inputs of size k x ts and 2k x ts (ts = time step); a convolution layer with a 2 x ts x 1 x CNN-depth kernel produces a k x CNN-depth map, and a fully-connected layer turns it into the CNN feature]
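A hedged sketch of the seasonal convolution described above, assuming the stacked seasonal windows form a 2k x ts single-channel image and a vertical stride of 2 so that the 2 x ts kernel yields a k x CNN-depth map (k, ts, CNN depth, and the output size are illustrative assumptions):

import tensorflow as tf

k, ts, cnn_depth = 2, 28, 200
seasonal_x = tf.placeholder(tf.float32, [None, 2 * k, ts, 1])          # stacked seasonal windows
conv_w = tf.Variable(tf.random_normal([2, ts, 1, cnn_depth]))          # 2 x ts x 1 x CNN depth kernel
conv = tf.nn.conv2d(seasonal_x, conv_w, strides=[1, 2, 1, 1], padding='VALID')
# conv has shape [batch, k, 1, cnn_depth] -> a k x CNN-depth feature map
flat = tf.reshape(conv, [-1, k * cnn_depth])
W_fc = tf.Variable(tf.random_normal([k * cnn_depth, 50]))
cnn_feature = tf.matmul(flat, W_fc)                                    # fully-connected layer -> CNN feature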
28
Case study 2 (forecasting the 2017 summer peak electricity demand, 대한전기학회)
 Combined RNN and CNN model
 Training
• Total loss = Loss_electricity + Loss_temperature
• Backpropagation via the Adam optimizer
[Figure: the electricity & temperature window feeds an RNN cell (200 units) that produces the RNN feature (200); the seasonal data feeds two convolution layers (2x28x1x200 and 5x1x200x50) that produce the CNN feature (50); the concatenated features pass through fully-connected layers (100) and two output layers giving the predicted electricity and the predicted temperature, each with its own loss]
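A hedged sketch of the training objective described above, assuming mean-squared-error losses on the two prediction heads (the tensor names pred_E, pred_T, true_E, true_T are illustrative, not the authors' code):

loss_E = tf.reduce_mean(tf.square(pred_E - true_E))     # electricity loss
loss_T = tf.reduce_mean(tf.square(pred_T - true_T))     # temperature loss
total_loss = loss_E + loss_T                             # total loss = Loss_E + Loss_T
train_op = tf.train.AdamOptimizer(0.001).minimize(total_loss)   # backpropagation via Adam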
29
Case study 2 (forecasting the 2017 summer peak electricity demand, 대한전기학회)
 Forecast of the 2017 summer peak electricity demand: 86,477 MW
(actual 2017 summer peak demand: 86,298 MW, error: 0.21%)
 Back testing
• Trained on data up to 2016.5.31 and tested on data from 2016.6.1 onward
• Average error rate: 2.37% / 2.81% (28-day / 56-day prediction)
30
Summary
Introduction to recurrent neural network
- Properties of RNN: parameter sharing
- Various architectures
- Limitations
LSTM (long short-term memory)
- Components of LSTM
- Forget gate, input gate, update, output
Implementation of RNN
Case studies
- MNIST classification
- Forecasting the 2017 summer peak electricity demand