1
Differentiable Neural
Computers
Presenter: Daewoo Kim
LANADA, KAIST
2
Papers
• Graves, Alex, Greg Wayne, and Ivo Danihelka. “Neural Turing Machines.” arXiv preprint arXiv:1410.5401 (2014).
• Graves, Alex, et al. “Hybrid computing using a neural network with dynamic external memory.” Nature 538 (2016): 471–476.
3
Can Neural Nets Learn Computer Programs?
• Computer program
• Differentiable Neural Computer
[Figure: a conventional program (control unit and arithmetic unit, input → output) next to a DNC (input → output)]
4
Example 1. Simple Program
• “Copy” program: read an input string and print the same string back.
Input: “Byunguk has boyfriend”
Output: “Byunguk has boyfriend”
[Figure: the characters of “Byunguk has boyfriend” written one by one into a memory array; a minimal sketch follows.]
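As a concrete reference, here is a minimal Python sketch of such a copy program; the function name and the explicit memory list are illustrative, not from the slides:

```python
def copy_program(input_string):
    # Write phase: store each character at the next free memory location.
    memory = []
    for ch in input_string:
        memory.append(ch)
    # Read phase: read the locations back in the order they were written.
    return "".join(memory)

print(copy_program("Byunguk has boyfriend"))  # prints: Byunguk has boyfriend
```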
5
Example 1. Simple Program
Video: https://thumbs.gfycat.com/WelllitInferiorAndeancondor-mobile.mp4
[Figure: the DNC’s input and output on the copy task]
6
Ex2. Reasoning Problem
Example
Input:
𝑖1. Byunguk has dinner with his boyfriend.
𝑖2. Byunguk has dinner with A.
Q. Who is Byunguk’s boyfriend? Answer: A
[Figure: memory in the brain. The working memory system uses memory smartly: it decides how to arrange information and where to write into / read from memory. Entries such as Byunguk, Breakfast, Lunch, Dinner, Boyfriend, Girlfriend, and A become linked as new information arrives.]
7
Ex2. Reasoning Problem
• DNC can learn to answer reasoning problems
• DNC emulates the “working memory system” of the human brain
• DNC is based on RNNs (Recurrent Neural Networks)
8
Neural Network
• Modeled on the human brain → Artificial Neural Network (ANN or NN)
[Figure: a feed-forward network with an input layer (Input #1, #2, #3), a hidden layer, and an output layer (Output #1, #2)]
9
Recurrent Neural Network (RNN)
• Learns from sequence data
• Passes a temporal summary from step to step (chain structure)
• Suffers from the problem of long-term dependencies
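The temporal summary is the hidden state $h_t$; in its simplest form the recurrence is

$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b),$$

so each step folds the new input $x_t$ into the summary of everything seen so far, which is why the influence of old inputs fades (the long-term dependency problem).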
10
LSTM (Long Short-Term Memory)
• LSTM is a special kind of RNN, capable of learning long-term dependencies.
• LSTMs have the same chain structure as RNNs, but the repeating module has a different structure.
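For reference, the standard LSTM update, using the gate names shown on the next slide ($\sigma$ is the logistic sigmoid, $\circ$ element-wise multiplication):

$$\begin{aligned}
i_t &= \sigma(W_i [x_t, h_{t-1}] + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f [x_t, h_{t-1}] + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o [x_t, h_{t-1}] + b_o) && \text{(output gate)} \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh(W_c [x_t, h_{t-1}] + b_c) && \text{(cell state)} \\
h_t &= o_t \circ \tanh(c_t) && \text{(output)}
\end{aligned}$$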
11
LSTM (Long Short-Term Memory)
[Figure: two LSTM cells unrolled over times t−1 and t. Each cell combines the input $x_t$ with the previous cell state $c_{t-1}$ through the input gate $i_t$, forget gate $f_t$, and output gate $o_t$, updates the cell (memory) state $c_t$, and emits the output $h_t$.]
12
Memory-Augmented RNN
• The memory-augmented RNN was proposed in the Neural Turing Machine
• It extends the network’s memory with an external memory
• The central question is how to read from and write to that memory
– This is what emulates “working memory” in the DNC
[Figure: the LSTM cell of the previous slide, extended with write and read connections to an external memory]
13
Concept of DNC
• Modern computers separate computation from external memory (RAM)
– Computation is performed by the CPU
– The CPU accesses addressable memory through virtual memory
• Benefits
– Extensible storage
– The contents of memory can be treated as variables, giving algorithm generality: the same procedure can be performed on one datum or another
[Figure: a CPU’s control unit and arithmetic unit]
14
Concept of DNC
• DNC consists of 3 parts:
(i) Controller, (ii) Memory interface, (iii) External memory
[Figure: overall architecture of DNC]
15
Memory Structure
• $M_t$ is the $N \times M$ memory matrix at time $t$ (the papers write the word size as $W$)
• The memory has $N$ locations; each location holds $M$ blocks
• A block is the minimum unit for reading and writing data
[Figure: the memory matrix, with locations as rows and blocks as columns]
16
Read Mechanism
• Which locations are read is determined by a weighting over locations
• Read vector: the output of a read
– Given a read weighting $w_t^r \in \Delta_N$, the read vector is

$$r_t = M_t^\top w_t^r,$$

a weighted sum of the rows of memory.
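A minimal NumPy sketch of this read, assuming the $N \times M$ memory matrix of the previous slide (names are illustrative):

```python
import numpy as np

def read(memory, w_r):
    """Read vector r_t = M_t^T w_r: a weighted sum of the memory rows."""
    return memory.T @ w_r                                # shape (M,)

N, M = 8, 4
memory = np.arange(N * M, dtype=float).reshape(N, M)     # stand-in for M_t
w_r = np.zeros(N)
w_r[2] = 1.0                        # a one-hot weighting reads row 2 exactly
print(read(memory, w_r))            # -> row 2 of the memory
```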
17
Write Mechanism
• Writing involves (i) an erase step and (ii) an add step
– Taking inspiration from the forget and input gates of the LSTM
• Erase step:

$$M_t = M_{t-1} \circ (E - w_t^w e_t^\top)$$

– Erase vector: $e_t \in [0,1]^W$
– Write weighting: $w_t^w \in \Delta_N$
– $E$ is the matrix of ones and $\circ$ denotes element-wise multiplication
18
Write Mechanism
• Add step
– The add vector $v_t \in \mathbb{R}^W$ is added to the memory after the erase step:

$$M_t = M_{t-1} \circ (E - w_t^w e_t^\top) + w_t^w v_t^\top$$

– The add vector (the data) and the weightings are produced by the controller
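Both steps of this update, as a hedged NumPy sketch (variable names are illustrative):

```python
import numpy as np

def write(memory, w_w, e, v):
    """Erase then add: M_t = M_{t-1} * (E - w_w e^T) + w_w v^T."""
    E = np.ones_like(memory)
    erased = memory * (E - np.outer(w_w, e))   # erase step
    return erased + np.outer(w_w, v)           # add step

N, W = 8, 4
memory = np.zeros((N, W))
w_w = np.zeros(N)
w_w[0] = 1.0                         # write entirely at location 0
e = np.ones(W)                       # fully erase the old content there
v = np.array([1.0, 2.0, 3.0, 4.0])   # the data to add
print(write(memory, w_w, e, v)[0])   # -> [1. 2. 3. 4.]
```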
19
Controller Network
• How the memory is controlled
• At every time step, the controller:
– Receives an input vector $x_t$ from the environment
– Emits an output vector $y_t$ that parameterizes a predictive distribution for a target vector $z_t$
– Reads $R$ vectors $r_{t-1}^1, r_{t-1}^2, \ldots, r_{t-1}^R$ from memory
– Emits an interface vector $\xi_t$, which includes the key vector $k_t$, key strength $\beta_t$, and read policy $\pi_t$
[Figure: the controller with input $x_t$ and output $y_t$; a schematic sketch follows.]
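Schematically, one controller step might look like the sketch below. The feed-forward controller (the slides use an LSTM), the layer sizes, the toy interface layout, and the untrained random weights are all assumptions for illustration, not the paper’s parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
X, H, W, R = 6, 16, 4, 2   # input size, hidden size, word size, read heads
XI = W + 1 + 3             # toy interface: key k_t (W), strength beta_t (1), policy pi_t (3)

# Illustrative, untrained weights; a real DNC learns these end to end.
W_h = rng.normal(size=(H, X + R * W)) * 0.1
W_y = rng.normal(size=(2, H)) * 0.1
W_xi = rng.normal(size=(XI, H)) * 0.1

def controller_step(x_t, reads_prev):
    """Consume x_t and last step's R read vectors; emit y_t and xi_t."""
    h_t = np.tanh(W_h @ np.concatenate([x_t, reads_prev.ravel()]))
    y_t = W_y @ h_t     # parameterizes the predictive distribution for z_t
    xi_t = W_xi @ h_t   # parsed by the memory interface into k_t, beta_t, pi_t
    return y_t, xi_t

y_t, xi_t = controller_step(rng.normal(size=X), np.zeros((R, W)))
```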
20
Controller Network
• The controller of DNC is based on an LSTM
[Figure: the LSTM controller unrolled over times t−1 and t. Besides $h_t$, it emits the output $y_t$ and the interface vector $\xi_t$; the interface vector sets the write weighting and the read weightings over the external memory (locations × blocks), which return the read vectors $r_t^1, \ldots, r_t^R$.]
21
Interface and Addressing Mechanism
• How are the write and read weightings determined?
22
Interface and Addressing Mechanism
• Content-based addressing
– Writes similar (related) data at similar locations
– Each head produces a key vector $k_t$ of length $M$, included in the interface vector $\xi_t$
– A weighting $w_t$ is generated from a similarity measure between $k_t$ and each memory row, sharpened by the ‘key strength’ $\beta_t$
[Figure: the resulting weightings over memory locations]
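A sketch of this content weighting, assuming cosine similarity sharpened by $\beta_t$ inside a softmax (the form used in the papers):

```python
import numpy as np

def content_weighting(memory, key, beta):
    """Softmax over beta * cosine_similarity(key, each memory row)."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    scores = beta * (memory @ key) / norms   # beta sharpens the distribution
    scores -= scores.max()                   # numerical stability
    w = np.exp(scores)
    return w / w.sum()

memory = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(content_weighting(memory, key=np.array([1.0, 0.0]), beta=5.0))
# the first row, most similar to the key, receives the largest weight
```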
23
Interface and Addressing Mechanism
• Read policy: three read modes
1. Content-based reading
2. The read head iterates over memory locations in the order they were written
3. The read head iterates in the reverse order
[Figure: temporal links remember the order in which locations were written]
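In the Nature paper’s notation, the read policy $\pi_t$ is a distribution over these three modes, and the final read weighting is their mixture:

$$w_t^r = \pi_t[1]\, b_t + \pi_t[2]\, c_t + \pi_t[3]\, f_t,$$

where $c_t$ is the content-based weighting and $b_t$, $f_t$ follow the temporal links backward and forward from the previously read location.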
24
Summary
25
Experiment 1
• Synthetic question answering on the bAbI dataset
– DNC trained on 20 question types with 10,000 instances
– Error rate: 3.8% (previous result: 7.5%)
Example:
1 John is in the playground.
2 John picked up the football.
3 Where is the football? playground
4 Sheep are afraid of wolves.
5 Gertrude is a sheep.
6 Mice are afraid of cats.
7 What is Gertrude afraid of? Wolves
[Figure: memory links, e.g. John “is in” the playground and holds the football]
26
Experiment 2: Graph Problem
27
Experiment 3: Graph Problem
Editor's Notes
  • #3: (p. 2) DeepMind published these two papers. There are two reasons I chose them; the second is that the authors say the DNC can solve some graph problems, so I hope the DNC can serve as a tool for the problems discussed in our lab.
  • #4: (p. 3) The main question of these papers is the one on the slide. To answer it, let’s see how a computer program works: there is source code, the CPU controls the memory and carries out the arithmetic, and given an input it emits an output. What these papers want to do is learn such a program with neural networks: the DNC learns the program by training on its inputs and outputs, without any further information.
  • #5: (p. 4) Here is a simple example: the copy program, which prints back whatever input string it is given. In the program I wrote, characters are written into memory incrementally and the string is printed by reading from memory in the same order; this is the simplest way I can think of to do it.
  • #6: (p. 5) This shows how the DNC works: based on input/output training pairs, it updates the parameters of the neural network. The video shows the memory map used by the DNC during the experiment. The DNC is never told that this is a copy program, yet it behaves exactly like the program I wrote.
  • #7: (p. 6) Another example showing a feature of the DNC: a reasoning problem. It is very easy for a human to answer, yet hard to write a program for, mainly because of how the human brain memorizes information. Faced with this kind of problem, humans think and memorize using the working memory system, a term from neuroscience. The example shows how it works: there is data in the brain’s memory, and when new information arrives, the brain re-encodes the data and writes it into the multiple related areas.
  • #8: (p. 7) How to use memory smartly. There is a long story about why this system adopts an RNN, which I will skip; instead, I will briefly explain what an RNN is, to place these papers in context.
  • #9: (p. 8) Everybody knows neural networks and backpropagation. To build a system with human-like performance, we want a machine learning model that mimics the human brain.
  • #13: We can say the DNC is an advanced version of the RNN. Usage weight: http://www.modulabs.co.kr/DeepLAB_library/11115. Random access.