Reasoning, Attention and Memory
toward differentiable reasoning machines
Name: Julien Perez
Team: Machine Learning and Optimization
CONTENTS
1. Introduction
   1. The differentiable programming paradigm
   2. Memory and attention for reasoning
2. Language and reasoning tasks
   1. Machine reading
   2. Dialog state tracking
   3. End-to-end dialog learning
3. Further work
1. Toward differentiable machines
Differentiable machines
Fig1: Architecture of a program (Input → Process → Output, running on a Backend)
Computer program: “A sequence of instructions,
written to perform a specified task with a computer.”
Wikipedia
Place of Machine Learning within Computer Science
1. The application is too complex for people to
manually design the algorithm
2. The application requires that the software customize
itself to its operational environment after being
fielded
Definition
• Input space(s)
• Output space(s)
• Topology // Parameters
• Optimizable // Differentiable decision function
Properties
• Identifiable capabilities
• Evaluated using error measurement on tasks
• Each discipline has by now developed its own models using this paradigm (Computer Vision, NLP, ASR, …)
Differentiable machines
Fig2: Architecture of a differentiable program (Input → Differentiable function → Output, running on a Backend)
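To make the paradigm concrete, here is a minimal sketch of a differentiable program in the sense of Fig2, assuming PyTorch as the backend: the "Process" box of a classical program is replaced by a parameterized, differentiable function whose parameters are fitted by gradient descent on an error measured on the task. The tiny model and the toy task are illustrative assumptions, not part of the talk.

```python
import torch
import torch.nn as nn

# The "Differentiable function" box of Fig2: a small parameterized model.
model = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

# Illustrative data: 2-D points labeled by the sign of their coordinate sum.
x = torch.randn(256, 2)
y = (x.sum(dim=1, keepdim=True) > 0).float()

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)   # error measurement on the task
    loss.backward()               # gradients flow through the whole "program"
    opt.step()                    # parameters are optimized end-to-end
```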
Why attention and memory?
Fig2: Architecture of a differentiable program (Input → Differentiable function → Output, running on a Backend)
Long-term memories - attending to memories
• Dealing with the vanishing gradient problem
• Building sufficient decision support
Overcoming computational limits for large data
• Focusing only on relevant parts of the inputs
• Scalability independent of the size of the inputs
Adds interpretability to the models (see the attention sketch below)
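As a rough illustration of "attending to memories", the sketch below shows the basic soft-attention read used throughout the rest of the talk: the model scores every memory slot against a query, normalizes the scores, and reads a weighted sum. The normalized weights are what make the decision inspectable. Pure NumPy; names and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(query, memories):
    """query: (d,), memories: (n, d) -> (read vector (d,), attention weights (n,))."""
    scores = memories @ query      # one relevance score per stored memory
    weights = softmax(scores)      # attention distribution: focus only on relevant parts
    return weights @ memories, weights

memories = np.random.randn(50, 64)       # n = 50 memory slots of dimension d = 64
query = np.random.randn(64)
read, weights = attend(query, memories)  # weights are also directly interpretable
```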
Applications
1. Machine reading
2. Dialog state tracking
3. End-to-End dialog learning
Fig2: Architecture of a differentiable program (Input → Differentiable function → Output, running on a Backend)
2. Machine Reading
Machine Reading
Definition
“A machine comprehends a passage of text if, for any question regarding that text
that can be answered correctly by a majority of native speakers, the machine can
provide a string which human readers would agree both
1. answers that question, and
2. does not contain information irrelevant to that question.”
Applications
• Information extraction from collections of documents
• Social media opinion mining
• Security/surveillance on the web
• End-to-End Dialog systems
Document
James was always getting in trouble. His aunt Jane tried as
hard as she could to keep him out of trouble, but he was
sneaky and got into lots of trouble behind her back. He
went to the grocery store and pulled all the pudding off the
shelves and ate two jars. Then he walked to the fast food
restaurant and ordered 15 bags of fries. He didn't pay, and
instead headed home.
Question: Where did James go after he went to the
grocery store?
• his deck
• his freezer
• a fast food restaurant
• his home
Document
The BBC producer allegedly struck by Jeremy
Clarkson will not press charges against the “Top
Gear” host, his lawyer said Friday. Clarkson, who
hosted one of the most-watched television shows in
the world, was dropped by the BBC Wednesday after
an internal investigation by the British broadcaster
found he had subjected producer Oisin Tymon to an
unprovoked physical and verbal attack.
Question: Producer X will not press charges
against Jeremy Clarkson, his lawyer says
Answer: Oisin Tymon
Machine Reading
as Ranking and Cloze-style queries
[1] Teaching Machines to Read and Comprehend, Hermann et al, 2015
[2] Text as knowledge bases, Manning et al, 2016
Machine Reading
CNN dataset
[3] Teaching Machines to Read and Comprehend, Hermann et al, 2015
The CNN and Daily Mail websites
provide paraphrase summary
sentences for each full news story.
Articles were collected from April 2007
for CNN and June 2010 for the Daily
Mail, until the end of April 2015.
Validation data is from March 2015, test
data from April 2015.
Machine Reading
Deep Long Short Term Memory readers
Machine Reading
Attention Sum Reader Network
[5] Text Understanding with the Attention Sum Reader Network, Kadlec et al, 2016
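The distinctive step of the Attention Sum Reader [5] is pointer-sum aggregation: attention is computed over document positions, and a candidate answer's score is the sum of the attention mass over every position where that candidate occurs. A hedged NumPy sketch of just that step; the bidirectional recurrent encoders of the paper are abstracted away and the toy numbers are illustrative.

```python
import numpy as np

def answer_scores(doc_token_ids, position_attention, candidates):
    """doc_token_ids: (T,) token ids; position_attention: (T,) softmax over positions;
    candidates: iterable of token ids. Returns {candidate: summed attention}."""
    return {c: float(position_attention[doc_token_ids == c].sum()) for c in candidates}

doc = np.array([3, 7, 3, 9, 7, 3])              # toy document as token ids
att = np.array([0.1, 0.2, 0.3, 0.1, 0.1, 0.2])  # attention over the 6 positions
print(answer_scores(doc, att, candidates=[3, 7, 9]))
# ~ {3: 0.6, 7: 0.3, 9: 0.1} -> candidate 3 wins by aggregating its repeated mentions
```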
Machine Reading
Competent statistical NLP
Feature-based Logistic Regression
• Whether e is in the passage
• Whether e is in the question
• Frequency of e in passage
• First position of e in passage
• n-gram exact match
• Syntactic dependency around e
• The required level of reasoning and inference can be limited
• There isn’t much room left for improvement
• However, the scale and ease of data production is appealing
• Not yet proven whether NNs can do more challenging reading comprehension (RC) tasks
[6] Texts as Knowledge Bases, Manning et al, 2016
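The bullets above are per-entity features; a hedged sketch of how such features could be extracted for one candidate entity e is shown below. The feature choices follow the list above, the helper itself is hypothetical, and the resulting vectors would then feed a standard logistic regression.

```python
def entity_features(entity, passage_tokens, question_tokens):
    """Simple per-candidate features in the spirit of the feature-based reader of [6]."""
    positions = [i for i, tok in enumerate(passage_tokens) if tok == entity]
    return [
        1.0 if positions else 0.0,                   # whether e is in the passage
        1.0 if entity in question_tokens else 0.0,   # whether e is in the question
        float(len(positions)),                       # frequency of e in the passage
        float(positions[0]) if positions else -1.0,  # first position of e in the passage
    ]

passage = "the producer oisin tymon will not press charges".split()
question = "producer X will not press charges".split()
print(entity_features("producer", passage, question))  # [1.0, 1.0, 1.0, 1.0]
```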
Beyond structure extraction
– Much of the world’s information comes in the form of
unstructured text which cannot easily be
searched, mined, visualized or, ultimately,
acted upon.
– Textual data can specify reasoning capabilities
– Goal: build machines that can "understand"
textual information, i.e. convert it into
interpretable structured knowledge that can be
leveraged by humans and other machines
alike.
– Reasoning capability is a frontier of current
ML approaches
[7] Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, Weston et al, 2015
Memory Networks
• A class of models that combines a large memory with a learning
component that can read from and write to it.
• Most current deep learning models have only the limited memory needed for
“low-level” task completion, e.g. object detection.
• Incorporate reasoning with attention over memory (RAM).
[8] End-To-End Memory Networks, Sukhbaatar et al, 2015
End-to-End Memory Network
Optimization task
• Categorical cross-entropy
• Stochastic Gradient Descent with gradient clipping
• Grid-searched hyperparameters
Deterministic controller update (see the sketch below)
[8] End-To-End Memory Networks, Sukhbaatar et al, 2015
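Concretely, one hop of an End-to-End Memory Network attends over the embedded memories with the controller state, reads a weighted sum, and applies the deterministic controller update u ← u + o. A minimal NumPy sketch, assuming the embedding matrices have already been learned and using illustrative dimensions; compare it with the gated variant on the next slide.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def memory_hop(u, input_memory, output_memory):
    """u: (d,) controller state; input/output_memory: (n, d) embedded memories."""
    p = softmax(input_memory @ u)   # attention over the n memory slots
    o = p @ output_memory           # read vector from the output embeddings
    return u + o                    # deterministic controller update

n, d = 20, 32
u = np.random.randn(d)
m_in, m_out = np.random.randn(n, d), np.random.randn(n, d)
for _ in range(3):                  # three hops of "reasoning" over the memory
    u = memory_hop(u, m_in, m_out)
# The answer distribution would then be softmax(W @ u) for a learned output matrix W.
```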
Gated End-to-End Memory Network
Properties
• End-to-End memory access regulation
• Close to Highway Networks and Residual Networks
[9] Gated End-to-End Memory Networks, Fei Liu and Julien Perez, EACL 2017
Gated controller update (see the sketch below)
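The gated variant replaces the deterministic update with a learned, Highway-style gate that regulates how much of the memory read enters the next controller state. A hedged NumPy sketch of that update alone; W_T and b_T would be learned end-to-end and are random placeholders here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_update(u, o, W_T, b_T):
    """u: controller state (d,), o: memory read (d,); W_T: (d, d), b_T: (d,)."""
    t = sigmoid(W_T @ u + b_T)      # transform gate, one value in [0, 1] per dimension
    return o * t + u * (1.0 - t)    # regulated mix of memory read and carried-over state

d = 32
u, o = np.random.randn(d), np.random.randn(d)
W_T, b_T = 0.1 * np.random.randn(d, d), np.zeros(d)
u_next = gated_update(u, o, W_T, b_T)
```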
Benchmark results
20 bAbI tasks (Naver Labs Europe results)
3. Dialog State Tracking
Dialog systems design
Dialog state tracking
• Central module of a dialog system
• Requires a large volume of annotations
• Provides interpretability to the dialog policy
Limitations
• Longer context handling
• Looser supervision schema
• Reasoning capability
[10] The Dialog State Tracking Challenge Series: A Review, Williams et al, 2016
End-to-End Memory Network for dialog
Dialog state tracking as machine reading
On the “one supporting fact” task (DSTC-2 dataset) we obtained 83% accuracy vs. 79%
for the current state of the art.
[11] Dialog State Tracking, a machine reading approach, Julien Perez and Fei Liu, EACL 2017
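Reading the dialog state off the conversation can reuse the same machinery: each dialog turn is stored as a memory, and each slot is tracked by attending to the history with a question-like representation of that slot. A hedged NumPy sketch; the turn and slot vectors would come from the model's learned sentence encoder, which is abstracted away here, and the example values are placeholders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def track_slot(slot_query_vec, turn_vecs, turn_texts):
    """Return the dialog turn that most supports the current value of a slot."""
    weights = softmax(turn_vecs @ slot_query_vec)   # attention over the dialog history
    best = int(np.argmax(weights))
    return turn_texts[best], weights                # supporting turn + inspectable weights

turn_texts = ["hello", "i want cheap thai food", "in the north part of town"]
turn_vecs = np.random.randn(len(turn_texts), 16)    # stand-in for encoded turns
slot_query = np.random.randn(16)                    # stand-in for a "cuisine?" slot query
supporting_turn, weights = track_slot(slot_query, turn_vecs, turn_texts)
```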
4. End-to-End Dialog Learning
Learning dialog from dialogs
• 5 dialog tasks with/without OOV
• 1 DSTC-2 end-to-end dialog task
Goal oriented dialog
• Learn on synthetic + real dialogs
• Backed with a Knowledge Base
Memory Networks
• End-to-End learnable and flexible
• Non-parametric memory due to attention
• Decisions supported by KB facts and utterances
• Dialog as a Machine Reading task
End-to-End Memory Network for dialog
FAIR Dialog tasks
[12] Learning End-to-End Goal-Oriented Dialog, Bordes and Weston, 2016
[13] Dialog State Tracking Challenge 6, task 1, Boureau, Perez and Bordes, 2017
Gated End-to-End Memory Network
End-to-End dialog management
October 16, 2017
Performance on FAIR End-to-End dialog tasks
Naver Labs Europe results
Gated End-to-End Memory Network
visualizations
• Memory access patterns can be visualized
• Attention as a tool to interpret the model’s decisions
Gated Memory Access Regulation
Conclusion
Toward automation of repetitive cognitive tasks
– End-to-end training of the reasoning capability
– Open and very dynamic field of research
Lack of theoretical analysis
– Optimization algorithm convergence
– Nature of the loss surface with respect to parameters
– Learnability // Safety
Limitations of current learning procedures
– Active // Interactive Learning
– Curriculum learning
– Regularization strategies
Q & A
Thank you