1
Differentiable Neural
Computers
Presenter: Daewoo Kim
LANADA, KAIST
2
Papers
• Graves, Alex, Greg Wayne, and Ivo Danihelka. “Neural Turing Machines.” arXiv preprint arXiv:1410.5401 (2014).
• Graves, Alex, et al. “Hybrid computing using a neural network with dynamic external memory.” Nature 538 (2016): 471–476.
3
Can Neural Nets Learn Computer Programs?
• Computer program
• Differentiable Neural Computer
[Figure: a conventional program (control unit and arithmetic unit, input → output) next to a DNC (input → output)]
4
Example 1. Simple Program
• “Copy” program: read an input string and print the same string back.
Input: “Byunguk has boyfriend”
Output: “Byunguk has boyfriend”
[Figure: the characters of “Byunguk has boyfriend” written one by one into a memory array; a minimal sketch follows.]
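As a concrete reference, here is a minimal Python sketch of such a copy program; the function name and the explicit memory list are illustrative, not from the slides:

```python
def copy_program(input_string):
    # Write phase: store each character at the next free memory location.
    memory = []
    for ch in input_string:
        memory.append(ch)
    # Read phase: read the locations back in the order they were written.
    return "".join(memory)

print(copy_program("Byunguk has boyfriend"))  # prints: Byunguk has boyfriend
```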
5
Example 1. Simple Program
Video: https://thumbs.gfycat.com/WelllitInferiorAndeancondor-mobile.mp4
[Figure: the DNC’s input and output on the copy task]
6
Ex2. Reasoning Problem
Example
Input:
𝑖1. Byunguk has dinner with his boyfriend.
𝑖2. Byunguk has dinner with A.
Q. Who is Byunguk’s boyfriend? Answer: A
[Figure: memory in the brain. The working memory system uses memory smartly: it decides how to arrange information and where to write into / read from memory. Entries such as Byunguk, Breakfast, Lunch, Dinner, Boyfriend, Girlfriend, and A become linked as new information arrives.]
7
Ex2. Reasoning Problem
• DNC can learn to answer reasoning problems
• DNC emulates the “working memory system” of the human brain
• DNC is based on RNNs (Recurrent Neural Networks)
8
Neural Network
• Modeled on the human brain → Artificial Neural Network (ANN or NN)
[Figure: a feed-forward network with an input layer (Input #1, #2, #3), a hidden layer, and an output layer (Output #1, #2)]
9
Recurrent Neural Network (RNN)
• Learns from sequence data
• Passes a temporal summary from step to step (chain structure)
• Suffers from the problem of long-term dependencies
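The temporal summary is the hidden state $h_t$; in its simplest form the recurrence is

$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b),$$

so each step folds the new input $x_t$ into the summary of everything seen so far, which is why the influence of old inputs fades (the long-term dependency problem).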
10
LSTM (Long Short-Term Memory)
• LSTM is a special kind of RNN, capable of learning long-term dependencies.
• LSTMs have the same chain structure as RNNs, but the repeating module has a different structure.
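For reference, the standard LSTM update, using the gate names shown on the next slide ($\sigma$ is the logistic sigmoid, $\circ$ element-wise multiplication):

$$\begin{aligned}
i_t &= \sigma(W_i [x_t, h_{t-1}] + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f [x_t, h_{t-1}] + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o [x_t, h_{t-1}] + b_o) && \text{(output gate)} \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh(W_c [x_t, h_{t-1}] + b_c) && \text{(cell state)} \\
h_t &= o_t \circ \tanh(c_t) && \text{(output)}
\end{aligned}$$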
11
LSTM (Long Short-Term Memory)
[Figure: two LSTM cells unrolled over times t−1 and t. Each cell combines the input $x_t$ with the previous cell state $c_{t-1}$ through the input gate $i_t$, forget gate $f_t$, and output gate $o_t$, updates the cell (memory) state $c_t$, and emits the output $h_t$.]
12
Memory-Augmented RNN
• The memory-augmented RNN was proposed in the Neural Turing Machine
• It extends the network’s memory with an external memory
• The central question is how to read from and write to that memory
– This is what emulates “working memory” in the DNC
[Figure: the LSTM cell of the previous slide, extended with write and read connections to an external memory]
13
Concept of DNC
• Modern computers separate computation from external memory (RAM)
– Computation is performed by the CPU
– The CPU accesses addressable memory through virtual memory
• Benefits
– Extensible storage
– The contents of memory can be treated as variables, giving algorithm generality: the same procedure can be performed on one datum or another
[Figure: a CPU’s control unit and arithmetic unit]
14
Concept of DNC
• DNC consists of 3 parts:
(i) Controller, (ii) Memory interface, (iii) External memory
[Figure: overall architecture of DNC]
15
Memory Structure
• $M_t$ is the $N \times M$ memory matrix at time $t$ (the papers write the word size as $W$)
• The memory has $N$ locations; each location holds $M$ blocks
• A block is the minimum unit for reading and writing data
[Figure: the memory matrix, with locations as rows and blocks as columns]
16
Read Mechanism
• Which locations are read is determined by a weighting over locations
• Read vector: the output of a read
– Given a read weighting $w_t^r \in \Delta_N$, the read vector is

$$r_t = M_t^\top w_t^r,$$

a weighted sum of the rows of memory.
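A minimal NumPy sketch of this read, assuming the $N \times M$ memory matrix of the previous slide (names are illustrative):

```python
import numpy as np

def read(memory, w_r):
    """Read vector r_t = M_t^T w_r: a weighted sum of the memory rows."""
    return memory.T @ w_r                                # shape (M,)

N, M = 8, 4
memory = np.arange(N * M, dtype=float).reshape(N, M)     # stand-in for M_t
w_r = np.zeros(N)
w_r[2] = 1.0                        # a one-hot weighting reads row 2 exactly
print(read(memory, w_r))            # -> row 2 of the memory
```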
17
Write Mechanism
• Writing involves (i) an erase step and (ii) an add step
– Taking inspiration from the forget and input gates of the LSTM
• Erase step:

$$M_t = M_{t-1} \circ (E - w_t^w e_t^\top)$$

– Erase vector: $e_t \in [0,1]^W$
– Write weighting: $w_t^w \in \Delta_N$
– $E$ is the matrix of ones and $\circ$ denotes element-wise multiplication
18
Write Mechanism
• Add step
– The add vector $v_t \in \mathbb{R}^W$ is added to the memory after the erase step:

$$M_t = M_{t-1} \circ (E - w_t^w e_t^\top) + w_t^w v_t^\top$$

– The add vector (the data) and the weightings are produced by the controller
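Both steps of this update, as a hedged NumPy sketch (variable names are illustrative):

```python
import numpy as np

def write(memory, w_w, e, v):
    """Erase then add: M_t = M_{t-1} * (E - w_w e^T) + w_w v^T."""
    E = np.ones_like(memory)
    erased = memory * (E - np.outer(w_w, e))   # erase step
    return erased + np.outer(w_w, v)           # add step

N, W = 8, 4
memory = np.zeros((N, W))
w_w = np.zeros(N)
w_w[0] = 1.0                         # write entirely at location 0
e = np.ones(W)                       # fully erase the old content there
v = np.array([1.0, 2.0, 3.0, 4.0])   # the data to add
print(write(memory, w_w, e, v)[0])   # -> [1. 2. 3. 4.]
```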
19
Controller Network
• How the memory is controlled
• At every time step, the controller:
– Receives an input vector $x_t$ from the environment
– Emits an output vector $y_t$ that parameterizes a predictive distribution for a target vector $z_t$
– Reads $R$ vectors $r_{t-1}^1, r_{t-1}^2, \ldots, r_{t-1}^R$ from memory
– Emits an interface vector $\xi_t$, which includes the key vector $k_t$, key strength $\beta_t$, and read policy $\pi_t$
[Figure: the controller with input $x_t$ and output $y_t$; a schematic sketch follows.]
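Schematically, one controller step might look like the sketch below. The feed-forward controller (the slides use an LSTM), the layer sizes, the toy interface layout, and the untrained random weights are all assumptions for illustration, not the paper’s parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
X, H, W, R = 6, 16, 4, 2   # input size, hidden size, word size, read heads
XI = W + 1 + 3             # toy interface: key k_t (W), strength beta_t (1), policy pi_t (3)

# Illustrative, untrained weights; a real DNC learns these end to end.
W_h = rng.normal(size=(H, X + R * W)) * 0.1
W_y = rng.normal(size=(2, H)) * 0.1
W_xi = rng.normal(size=(XI, H)) * 0.1

def controller_step(x_t, reads_prev):
    """Consume x_t and last step's R read vectors; emit y_t and xi_t."""
    h_t = np.tanh(W_h @ np.concatenate([x_t, reads_prev.ravel()]))
    y_t = W_y @ h_t     # parameterizes the predictive distribution for z_t
    xi_t = W_xi @ h_t   # parsed by the memory interface into k_t, beta_t, pi_t
    return y_t, xi_t

y_t, xi_t = controller_step(rng.normal(size=X), np.zeros((R, W)))
```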
20
Controller Network
• The controller of DNC is based on an LSTM
[Figure: the LSTM controller unrolled over times t−1 and t. Besides $h_t$, it emits the output $y_t$ and the interface vector $\xi_t$; the interface vector sets the write weighting and the read weightings over the external memory (locations × blocks), which return the read vectors $r_t^1, \ldots, r_t^R$.]
21
Interface and Addressing Mechanism
• How are the write and read weightings determined?
22
Interface and Addressing Mechanism
• Content-based addressing
– Writes similar (related) data at similar locations
– Each head produces a key vector $k_t$ of length $M$, included in the interface vector $\xi_t$
– A weighting $w_t$ is generated from a similarity measure between $k_t$ and each memory row, sharpened by the ‘key strength’ $\beta_t$
[Figure: the resulting weightings over memory locations]
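A sketch of this content weighting, assuming cosine similarity sharpened by $\beta_t$ inside a softmax (the form used in the papers):

```python
import numpy as np

def content_weighting(memory, key, beta):
    """Softmax over beta * cosine_similarity(key, each memory row)."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    scores = beta * (memory @ key) / norms   # beta sharpens the distribution
    scores -= scores.max()                   # numerical stability
    w = np.exp(scores)
    return w / w.sum()

memory = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(content_weighting(memory, key=np.array([1.0, 0.0]), beta=5.0))
# the first row, most similar to the key, receives the largest weight
```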
23
Interface and Addressing Mechanism
• Read policy: three read modes
1. Content-based reading
2. The read head iterates over memory locations in the order they were written
3. The read head iterates in the reverse order
[Figure: temporal links remember the order in which locations were written]
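In the Nature paper’s notation, the read policy $\pi_t$ is a distribution over these three modes, and the final read weighting is their mixture:

$$w_t^r = \pi_t[1]\, b_t + \pi_t[2]\, c_t + \pi_t[3]\, f_t,$$

where $c_t$ is the content-based weighting and $b_t$, $f_t$ follow the temporal links backward and forward from the previously read location.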
24
Summary
25
Experiment 1
• Synthetic question answering on the bAbI dataset
– DNC trained on 20 question types with 10,000 instances
– Error rate: 3.8% (previous result: 7.5%)
Example:
1 John is in the playground.
2 John picked up the football.
3 Where is the football? playground
4 Sheep are afraid of wolves.
5 Gertrude is a sheep.
6 Mice are afraid of cats.
7 What is Gertrude afraid of? Wolves
[Figure: memory links, e.g. John “is in” the playground and holds the football]
26
Experiment 2: Graph Problem
27
Experiment 3: Graph Problem
Editor's Notes
  • #3: (p. 2) DeepMind published these two papers. There are two reasons I chose them; the second is that the authors say the DNC can solve some graph problems, so I hope the DNC can serve as a tool for the problems discussed in our lab.
  • #4: (p. 3) The main question of these papers is the one on the slide. To answer it, let’s see how a computer program works: there is source code, the CPU controls the memory and carries out the arithmetic, and given an input it emits an output. What these papers want to do is learn such a program with neural networks: the DNC learns the program by training on its inputs and outputs, without any further information.
  • #5: (p. 4) Here is a simple example: the copy program, which prints back whatever input string it is given. In the program I wrote, characters are written into memory incrementally and the string is printed by reading from memory in the same order; this is the simplest way I can think of to do it.
  • #6: (p. 5) This shows how the DNC works: based on input/output training pairs, it updates the parameters of the neural network. The video shows the memory map used by the DNC during the experiment. The DNC is never told that this is a copy program, yet it behaves exactly like the program I wrote.
  • #7: (p. 6) Another example showing a feature of the DNC: a reasoning problem. It is very easy for a human to answer, yet hard to write a program for, mainly because of how the human brain memorizes information. Faced with this kind of problem, humans think and memorize using the working memory system, a term from neuroscience. The example shows how it works: there is data in the brain’s memory, and when new information arrives, the brain re-encodes the data and writes it into the multiple related areas.
  • #8: (p. 7) How to use memory smartly. There is a long story about why this system adopts an RNN, which I will skip; instead, I will briefly explain what an RNN is, to place these papers in context.
  • #9: (p. 8) Everybody knows neural networks and backpropagation. To build a system with human-like performance, we want a machine learning model that mimics the human brain.
  • #13: We can say the DNC is an advanced version of the RNN. Usage weight: http://www.modulabs.co.kr/DeepLAB_library/11115. Random access.