Reasoning, Attention and Memory
toward differentiable reasoning machines
Name: Julien Perez
Team: Machine Learning and Optimization
CONTENTS
1. Introduction
   1. The differentiable programming paradigm
   2. Memory and attention for reasoning
2. Language and reasoning tasks
   1. Machine reading
   2. Dialog state tracking
   3. End-to-end dialog learning
3. Further work
1. Toward differentiable machines
Differentiable machines
Fig1: Architecture of a program (Input → Process → Output, running on a Backend)
Computer program: “A sequence of instructions,
written to perform a specified task with a computer.”
Wikipedia
Place of Machine Learning within Computer Science
1. The application is too complex for people to
manually design the algorithm
2. The application requires that the software customize
itself to its operational environment after being
fielded
Definition
• Input space(s)
• Output space(s)
• Topology // Parameters
• Optimizable // Differentiable decision function
Properties
• Identifiable capabilities
• Evaluated using error measurement on tasks
• Each discipline has by now developed its own models using this paradigm (Computer Vision, NLP, ASR, …)
Differentiable machines
Fig2: Architecture of a differentiable program (Input → Differentiable function → Output, running on a Backend)
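To make the paradigm concrete, here is a minimal sketch of a differentiable program in the sense of Fig2, assuming PyTorch as the backend: the "Process" box of a classical program is replaced by a parameterized, differentiable function whose parameters are fitted by gradient descent on an error measured on the task. The tiny model and the toy task are illustrative assumptions, not part of the talk.

```python
import torch
import torch.nn as nn

# The "Differentiable function" box of Fig2: a small parameterized model.
model = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

# Illustrative data: 2-D points labeled by the sign of their coordinate sum.
x = torch.randn(256, 2)
y = (x.sum(dim=1, keepdim=True) > 0).float()

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)   # error measurement on the task
    loss.backward()               # gradients flow through the whole "program"
    opt.step()                    # parameters are optimized end-to-end
```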
Why attention and memory?
Fig2: Architecture of a differentiable program (Input → Differentiable function → Output, running on a Backend)
Long-term memories - attending to memories
• Dealing with the vanishing gradient problem
• Building sufficient decision support
Overcoming computational limits for large data
• Focusing only on relevant parts of the inputs
• Scalability independent of the size of the inputs
Adds interpretability to the models (see the attention sketch below)
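As a rough illustration of "attending to memories", the sketch below shows the basic soft-attention read used throughout the rest of the talk: the model scores every memory slot against a query, normalizes the scores, and reads a weighted sum. The normalized weights are what make the decision inspectable. Pure NumPy; names and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(query, memories):
    """query: (d,), memories: (n, d) -> (read vector (d,), attention weights (n,))."""
    scores = memories @ query      # one relevance score per stored memory
    weights = softmax(scores)      # attention distribution: focus only on relevant parts
    return weights @ memories, weights

memories = np.random.randn(50, 64)       # n = 50 memory slots of dimension d = 64
query = np.random.randn(64)
read, weights = attend(query, memories)  # weights are also directly interpretable
```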
Applications
1. Machine reading
2. Dialog state tracking
3. End-to-End dialog learning
Fig2: Architecture of a differentiable program (Input → Differentiable function → Output, running on a Backend)
2. Machine Reading
Machine Reading
Definition
“A machine comprehends a passage of text if, for any question regarding that text
that can be answered correctly by a majority of native speakers, the machine can
provide a string which human readers would agree both
1. answers that question, and
2. does not contain information irrelevant to that question.”
Applications
• Information extraction from collections of documents
• Social media opinion mining
• Security/surveillance on the web
• End-to-End Dialog systems
Document
James was always getting in trouble. His aunt Jane tried as
hard as she could to keep him out of trouble, but he was
sneaky and got into lots of trouble behind her back. He
went to the grocery store and pulled all the pudding off the
shelves and ate two jars. Then he walked to the fast food
restaurant and ordered 15 bags of fries. He didn't pay, and
instead headed home.
Question: Where did James go after he went to the
grocery store?
• his deck
• his freezer
• a fast food restaurant
• his home
Document
The BBC producer allegedly struck by Jeremy
Clarkson will not press charges against the “Top
Gear” host, his lawyer said Friday. Clarkson, who
hosted one of the most-watched television shows in
the world, was dropped by the BBC Wednesday after
an internal investigation by the British broadcaster
found he had subjected producer Oisin Tymon to an
unprovoked physical and verbal attack.
Question: Producer X will not press charges
against Jeremy Clarkson, his lawyer says
Answer: Oisin Tymon
Machine Reading
as Ranking and Cloze-style queries
[1] Teaching Machines to Read and Comprehend, Hermann et al, 2015
[2] Text as knowledge bases, Manning et al, 2016
Machine Reading
CNN dataset
[3] Teaching Machines to Read and Comprehend, Hermann et al, 2015
The CNN and Daily Mail websites
provide paraphrase summary
sentences for each full news story.
Articles were collected from April 2007
for CNN and June 2010 for the Daily
Mail, until the end of April 2015.
Validation data is from March 2015, test
data from April 2015.
Machine Reading
Deep Long Short Term Memory readers
Machine Reading
Attention Sum Reader Network
[5] Text Understanding with the Attention Sum Reader Network, Kadlec et al, 2016
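The distinctive step of the Attention Sum Reader [5] is pointer-sum aggregation: attention is computed over document positions, and a candidate answer's score is the sum of the attention mass over every position where that candidate occurs. A hedged NumPy sketch of just that step; the bidirectional recurrent encoders of the paper are abstracted away and the toy numbers are illustrative.

```python
import numpy as np

def answer_scores(doc_token_ids, position_attention, candidates):
    """doc_token_ids: (T,) token ids; position_attention: (T,) softmax over positions;
    candidates: iterable of token ids. Returns {candidate: summed attention}."""
    return {c: float(position_attention[doc_token_ids == c].sum()) for c in candidates}

doc = np.array([3, 7, 3, 9, 7, 3])              # toy document as token ids
att = np.array([0.1, 0.2, 0.3, 0.1, 0.1, 0.2])  # attention over the 6 positions
print(answer_scores(doc, att, candidates=[3, 7, 9]))
# ~ {3: 0.6, 7: 0.3, 9: 0.1} -> candidate 3 wins by aggregating its repeated mentions
```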
Machine Reading
Competent statistical NLP
Feature-based Logistic Regression
• Whether e is in the passage
• Whether e is in the question
• Frequency of e in passage
• First position of e in passage
• n-gram exact match
• Syntactic dependency around e
• The required level of reasoning and inference can be limited
• There isn’t much room left for improvement
• However, the scale and ease of data production is appealing
• Not yet proven whether NNs can do more challenging reading comprehension (RC) tasks
[6] Texts as Knowledge Bases, Manning et al, 2016
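The bullets above are per-entity features; a hedged sketch of how such features could be extracted for one candidate entity e is shown below. The feature choices follow the list above, the helper itself is hypothetical, and the resulting vectors would then feed a standard logistic regression.

```python
def entity_features(entity, passage_tokens, question_tokens):
    """Simple per-candidate features in the spirit of the feature-based reader of [6]."""
    positions = [i for i, tok in enumerate(passage_tokens) if tok == entity]
    return [
        1.0 if positions else 0.0,                   # whether e is in the passage
        1.0 if entity in question_tokens else 0.0,   # whether e is in the question
        float(len(positions)),                       # frequency of e in the passage
        float(positions[0]) if positions else -1.0,  # first position of e in the passage
    ]

passage = "the producer oisin tymon will not press charges".split()
question = "producer X will not press charges".split()
print(entity_features("producer", passage, question))  # [1.0, 1.0, 1.0, 1.0]
```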
Beyond structure extraction
– Much of the world’s information comes in the form of
unstructured text which cannot easily be
searched, mined, visualized or, ultimately,
acted upon.
– Textual data can specify reasoning capabilities
– Goal: build machines that can "understand"
textual information, i.e. convert it into
interpretable structured knowledge that can be
leveraged by humans and other machines
alike.
– Reasoning capability is a frontier of current
ML approaches
[7] Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, Weston et al, 2015
Memory Networks
• A class of models that combines a large memory with a learning
component that can read from and write to it.
• Most current deep learning models have only the limited memory needed for
“low-level” task completion, e.g. object detection.
• Incorporate reasoning with attention over memory (RAM).
[8] End-To-End Memory Networks, Sukhbaatar et al, 2015
End-to-End Memory Network
Optimization task
• Categorical cross-entropy
• Stochastic Gradient Descent with gradient clipping
• Grid-searched hyperparameters
Deterministic controller update (see the sketch below)
[8] End-To-End Memory Networks, Sukhbaatar et al, 2015
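Concretely, one hop of an End-to-End Memory Network attends over the embedded memories with the controller state, reads a weighted sum, and applies the deterministic controller update u ← u + o. A minimal NumPy sketch, assuming the embedding matrices have already been learned and using illustrative dimensions; compare it with the gated variant on the next slide.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def memory_hop(u, input_memory, output_memory):
    """u: (d,) controller state; input/output_memory: (n, d) embedded memories."""
    p = softmax(input_memory @ u)   # attention over the n memory slots
    o = p @ output_memory           # read vector from the output embeddings
    return u + o                    # deterministic controller update

n, d = 20, 32
u = np.random.randn(d)
m_in, m_out = np.random.randn(n, d), np.random.randn(n, d)
for _ in range(3):                  # three hops of "reasoning" over the memory
    u = memory_hop(u, m_in, m_out)
# The answer distribution would then be softmax(W @ u) for a learned output matrix W.
```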
Gated End-to-End Memory Network
Properties
• End-to-End memory access regulation
• Close to Highway Networks and Residual Networks
[9] Gated End-to-End Memory Networks, Fei Liu and Julien Perez, EACL 2017
Gated controller update (see the sketch below)
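The gated variant replaces the deterministic update with a learned, Highway-style gate that regulates how much of the memory read enters the next controller state. A hedged NumPy sketch of that update alone; W_T and b_T would be learned end-to-end and are random placeholders here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_update(u, o, W_T, b_T):
    """u: controller state (d,), o: memory read (d,); W_T: (d, d), b_T: (d,)."""
    t = sigmoid(W_T @ u + b_T)      # transform gate, one value in [0, 1] per dimension
    return o * t + u * (1.0 - t)    # regulated mix of memory read and carried-over state

d = 32
u, o = np.random.randn(d), np.random.randn(d)
W_T, b_T = 0.1 * np.random.randn(d, d), np.zeros(d)
u_next = gated_update(u, o, W_T, b_T)
```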
Benchmark results
20 bAbI tasks (Naver Labs Europe results)
3. Dialog State Tracking
Dialog systems design
Dialog state tracking
• Central module of a dialog system
• Requires a large volume of annotations
• Provides interpretability to the dialog policy
Limitations
• Longer context handling
• Looser supervision schema
• Reasoning capability
[10] The Dialog State Tracking Challenge Series: A Review, Williams et al, 2016
End-to-End Memory Network for dialog
Dialog state tracking as machine reading
On the “one supporting fact” task (DSTC-2 dataset) we obtained 83% accuracy vs. 79%
for the current state of the art.
[11] Dialog State Tracking, a machine reading approach, Julien Perez and Fei Liu, EACL 2017
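Reading the dialog state off the conversation can reuse the same machinery: each dialog turn is stored as a memory, and each slot is tracked by attending to the history with a question-like representation of that slot. A hedged NumPy sketch; the turn and slot vectors would come from the model's learned sentence encoder, which is abstracted away here, and the example values are placeholders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def track_slot(slot_query_vec, turn_vecs, turn_texts):
    """Return the dialog turn that most supports the current value of a slot."""
    weights = softmax(turn_vecs @ slot_query_vec)   # attention over the dialog history
    best = int(np.argmax(weights))
    return turn_texts[best], weights                # supporting turn + inspectable weights

turn_texts = ["hello", "i want cheap thai food", "in the north part of town"]
turn_vecs = np.random.randn(len(turn_texts), 16)    # stand-in for encoded turns
slot_query = np.random.randn(16)                    # stand-in for a "cuisine?" slot query
supporting_turn, weights = track_slot(slot_query, turn_vecs, turn_texts)
```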
4. End-to-End Dialog Learning
Learning dialog from dialogs
• 5 dialog tasks with/without OOV
• 1 DSTC-2 end-to-end dialog task
Goal oriented dialog
• Learn on synthetic + real dialogs
• Backed with a Knowledge Base
Memory Networks
• End-to-End learnable and flexible
• Non-parametric memory due to attention
• Decisions supported by KB facts and utterances
• Dialog as a Machine Reading task
End-to-End Memory Network for dialog
FAIR Dialog tasks
[12] Learning End-to-End Goal-Oriented Dialog, Bordes and Weston, 2016
[13] Dialog State Tracking Challenge 6, task 1, Boureau, Perez and Bordes, 2017
Gated End-to-End Memory Network
End-to-End dialog management
October 16, 2017
Performance on FAIR End-to-End dialog tasks
Naver Labs Europe results
Gated End-to-End Memory Network
visualizations
• Memory access patterns can be visualized
• Attention as a tool to interpret the model’s decisions
Gated Memory Access Regulation
Conclusion
Toward automation of repetitive cognitive tasks
– End-to-end training of the reasoning capability
– Open and very dynamic field of research
Lack of theoretical analysis
– Optimization algorithm convergence
– Nature of the loss surface with respect to parameters
– Learnability // Safety
Limitations of current learning procedures
– Active // Interactive Learning
– Curriculum learning
– Regularization strategies
Q & A
Thank you