Interactive Learning of
Task-Oriented Dialog Systems
Bing Liu
Research Scientist, Facebook Conversational AI
Rasa Developer Summit - 2019
PhD, Carnegie Mellon University
Task-Oriented Dialog System
❖ Dialog systems
➢ Chit-chat bot, QA bot, task-oriented dialog system, ...
❖ Get stuff done - assist users in completing specific tasks
➢ Personal assistants (e.g. Siri, Alexa, Google Assistant, Hey Portal)
➢ Voice command in vehicle and smart home
➢ Customer service; Sales and marketing
Modular Dialog System Architecture
Task-Oriented Dialog System
❖ Highly handcrafted
❖ Process interdependent
❖ Data-driven end-to-end (E2E) systems
➢ [Wen et al., 2016]: E2E supervised training of a neural dialog model
➢ [Bordes and Weston, 2017]: E2E model with memory network
➢ [Madotto et al., 2018]: Mem2Seq for incorporating knowledge into an E2E system
❖ Interactive learning for E2E system with less human supervision
Why Learn through Interactions?
❖ Task-oriented dialog as a sequential decision-making process over multiple steps
❖ State space grows exponentially with the number of dialog turns
❖ Extremely hard to
➢ Design all possible dialog paths
➢ Collect a dialog corpus that is large enough to cover all dialog scenarios
→ Continuously learn through interaction with users and improve over time
How can we learn an end-to-end task-oriented dialog system effectively through interaction with users?
End-to-End Task-Oriented Dialog Modeling
❖ Dialog context modeling with hierarchical RNN
B Liu, et al, "Dialogue Learning with Human Teaching and Feedback in End-To-End Trainable Task-Oriented Dialogue Systems", NAACL 2018.
End-to-End Task-Oriented Dialog Modeling
End-to-End Modeling of SLU, DST, and Dialog Policy
Supervised Pre-training
❖ Supervised model pre-training on dialog corpus with MLE
➢ Objective function: linear interpolation of cross-entropy losses for
■ Dialog state tracking, i.e. user goal estimation, and
■ Dialog policy, i.e. system action prediction
➢ Optimization: Stochastic gradient descent, Adam
[Equation not recovered: a sum of two terms, the loss for user goal estimation and the loss for system action prediction]
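A minimal sketch of the interpolated pre-training objective, assuming one softmax output per tracked slot for the user goal and one softmax over system actions. The function names and the weights `slot_weight` and `action_weight` are illustrative assumptions; the slides do not give the exact weighting.

```python
import math


def cross_entropy(probs, target_index):
    """Cross-entropy for one example: negative log-probability of the true class."""
    return -math.log(probs[target_index])


def pretraining_loss(slot_probs, slot_targets, action_probs, action_target,
                     slot_weight=0.5, action_weight=0.5):
    """Weighted sum of the state-tracking loss and the policy loss for one turn.

    slot_probs:   dict mapping slot name -> predicted distribution over values
    slot_targets: dict mapping slot name -> index of the true value
    The weights are hypothetical interpolation coefficients.
    """
    # Loss for user goal estimation, summed over the tracked slots.
    goal_loss = sum(cross_entropy(slot_probs[slot], slot_targets[slot])
                    for slot in slot_targets)
    # Loss for system action prediction.
    action_loss = cross_entropy(action_probs, action_target)
    return slot_weight * goal_loss + action_weight * action_loss


# Toy turn with two slots and two candidate system actions (made-up numbers).
example_loss = pretraining_loss(
    slot_probs={"movie": [0.5, 0.5], "date": [0.25, 0.75]},
    slot_targets={"movie": 0, "date": 1},
    action_probs=[0.1, 0.9],
    action_target=1,
)
```

In a real model both terms would be averaged over a mini-batch and minimized with an optimizer such as Adam.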
Learn Interactively from User Feedback
❖ Interactive dialog learning with user feedback
[Figure: the agent is pre-trained with supervised learning on human-human dialog corpora; users then provide feedback for policy optimization]
Learn Interactively from User Feedback
❖ Use user feedback as dialog reward
❖ Introduce step penalty to encourage shorter dialog for task completion
❖ Optimize dialog model end-to-end with policy gradient RL:
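A REINFORCE-style sketch of how user feedback and a step penalty could be combined into a policy gradient objective. The reward values, the discount factor `gamma`, and all function names are assumptions made for illustration; the slide's own equation was not recovered.

```python
def turn_rewards(num_turns, final_feedback, step_penalty=-0.05):
    """Per-turn rewards: a small penalty every turn, user feedback on the last turn."""
    rewards = [step_penalty] * num_turns
    rewards[-1] += final_feedback
    return rewards


def discounted_returns(rewards, gamma=0.95):
    """Return G_t = r_t + gamma * G_{t+1} for every turn t."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(action_log_probs, returns):
    """Surrogate loss whose gradient is the REINFORCE policy gradient estimate."""
    return -sum(log_prob * g for log_prob, g in zip(action_log_probs, returns))


# A successful 2-turn dialog earns a higher initial return than a successful
# 5-turn dialog, which is the effect the step penalty is meant to have.
short_return = discounted_returns(turn_rewards(2, final_feedback=1.0))[0]
long_return = discounted_returns(turn_rewards(5, final_feedback=1.0))[0]
```

In an end-to-end model, `action_log_probs` would come from the network's action distribution, so minimizing this loss updates all components jointly.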
Learn Interactively from User Feedback
❖ Policy optimization with RL can be slow due to sparse reward
❖ Dialog state distribution mismatch between offline training and interactive learning leads to compounding errors
❖ Agent may learn to recover from a bad state with RL, but the search process can be very inefficient
→ Ask the user for correction/demonstration when the agent fails at a task, and learn to act
Learn Interactively from User Teaching
❖ Interactive dialog learning with user teaching
[Figure: new dialogs are driven by the agent's own policy; the user corrects mistakes and demonstrates the desired dialog agent behavior; the corrected dialogs are added to the existing human-human dialog corpora used for supervised pre-training]
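The teaching loop can be sketched as data aggregation: the agent acts with its own policy, the user supplies the desired action wherever the agent errs, and the corrections join the training corpus. This is a toy sketch in which a lookup table stands in for the neural dialog model; the state names, action names, and helper functions are hypothetical.

```python
def retrain(corpus):
    """Toy stand-in for supervised training: the latest label per state wins."""
    return {state: action for state, action in corpus}


def collect_corrections(agent_policy, user_teacher, visited_states):
    """Record the user's desired action for every state the agent handled wrongly."""
    corrections = []
    for state in visited_states:
        agent_action = agent_policy.get(state)  # None if the state was never seen
        desired_action = user_teacher(state)
        if agent_action != desired_action:
            corrections.append((state, desired_action))
    return corrections


# Supervised pre-training on an initial corpus that misses one situation.
corpus = [("greeting", "request_movie"), ("movie_given", "request_date")]
agent = retrain(corpus)

# States reached while the agent follows its own policy with a user.
visited = ["greeting", "movie_given", "all_slots_filled"]
teacher = {"greeting": "request_movie",
           "movie_given": "request_date",
           "all_slots_filled": "confirm_booking"}.get

# Add the user's corrections to the existing corpus and retrain.
corpus = corpus + collect_corrections(agent, teacher, visited)
agent = retrain(corpus)
```

Because the corrected states are ones the agent itself reached, the added samples target the state distribution mismatch noted on the previous slide.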
Evaluation
❖ Movie booking domain simulation (M2M)
Slots: theatre name, movie, date, time, num of people
SL: Supervised pre-training model
IL: Imitation learning with user teaching
RL: Reinforcement learning with user feedback
Table: Human evaluation results. Mean and standard deviation of crowd worker scores (1-5)
B Liu, et al, "Dialogue Learning with Human Teaching and Feedback in End-To-End Trainable Task-Oriented Dialogue Systems", NAACL 2018.
What if a user does not provide any feedback? Can we still learn anything from the interaction?
Can we learn a dialog reward function?
❖ User feedback serves as reward to RL optimization
❖ Task completion based reward requires prior knowledge of user’s goal
→ NOT usually accessible in real world user interactions
❖ In practice, user feedback can be inconsistent and is NOT always available
Adversarial Dialog Learning
❖ Reward a machine-agent for conducting task-oriented dialog in a way that is indistinguishable from the way human-agents do it.
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
Discriminative Reward Model
[Figure labels: User’s Turn, Agent’s Turn, External Entity Info]
❖ Input:
➢ Sequence of dialog turns
❖ Representation:
➢ BiLSTM with max-pooling
❖ Output:
➢ Prob. of a dialog being successfully completed by a human agent
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
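A sketch of the pooling and output steps of such a reward model. It assumes the per-turn BiLSTM hidden states are already computed and passed in as plain lists; the encoder itself is omitted, and the single linear output layer with `weights` and `bias` is an assumption for illustration.

```python
import math


def max_pool_over_turns(turn_states):
    """Element-wise maximum across the per-turn hidden state vectors."""
    return [max(values) for values in zip(*turn_states)]


def success_probability(turn_states, weights, bias=0.0):
    """Map the pooled dialog representation to a probability with a sigmoid."""
    pooled = max_pool_over_turns(turn_states)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Three dialog turns, each with a 2-dimensional hidden state (made-up numbers).
turn_states = [[0.1, -0.3], [0.4, -0.9], [0.2, -0.5]]
reward = success_probability(turn_states, weights=[1.0, 1.0])
```

Max-pooling gives a fixed-size dialog representation regardless of how many turns the dialog has, which is why the output layer can stay the same for short and long dialogs.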
Model Training
❖ Supervised pre-training with an initial set of pos & neg samples
➢ Pre-train dialog agent G on positive dialog samples with MLE
➢ Pre-train discriminative reward function D on pos & neg samples
❖ Interactive learning cycle
➢ Collect new dialog sample(s) between agent G and users
➢ Update dialog agent G with RL using the reward produced by D
➢ Update reward function D using the newly collected sample(s)
➢ Continue for next learning cycle
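The interactive learning cycle above can be sketched as a loop over four callables. The names `collect_dialogs`, `reward_fn`, `update_agent`, and `update_reward_fn` are hypothetical placeholders for the agent G, the reward function D, and their update steps; the sketch assumes every cycle collects at least one dialog.

```python
def interactive_learning(collect_dialogs, reward_fn, update_agent,
                         update_reward_fn, num_cycles):
    """Alternate between updating the agent G and the reward function D."""
    mean_rewards = []
    for _ in range(num_cycles):
        # Collect new dialog samples between the agent G and users.
        dialogs = collect_dialogs()
        # Score each dialog with the current reward function D.
        rewards = [reward_fn(dialog) for dialog in dialogs]
        # Update the agent G with RL using the rewards produced by D.
        update_agent(dialogs, rewards)
        # Update the reward function D using the newly collected samples.
        update_reward_fn(dialogs)
        mean_rewards.append(sum(rewards) / len(rewards))
    return mean_rewards
```

Tracking the mean reward per cycle is a convenience added here for monitoring; it is not part of the procedure on the slide.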
Evaluation
❖ Comparing different reward functions
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
Summary
❖ The multi-turn nature of task-oriented dialogs makes it especially important for a system to learn through interaction with users
❖ Learning task-oriented dialog model end-to-end with user teaching and feedback
❖ Adversarial dialog learning to address the challenges with missing or inconsistent user feedback with less human supervision
Thanks!
Q & A
