End-to-End Task-Completion
Neural Dialogue Systems
Xiujun Li Yun-Nung (Vivian) Chen Lihong Li Jianfeng Gao Asli Celikyilmaz
The 8th International Joint Conference on Natural Language Processing
https://github.com/MiuLab/TC-Bot
1
Dialogue Interactions
“I want to talk” Chit-chat
“I have a question” Information consumption
“I need to get this done” Task completion
Task-oriented
Non task-oriented
2
Dialogue Interactions
“I want to talk” Chit-chat
“I have a question” Information consumption
“I need to get this done” Task completion
• What is the paper review schedule?
• Which room is the dialogue tutorial in?
• When is the IJCNLP 2017 conference?
• What does NLP stand for?
3
Dialogue Interactions
“I want to talk” Chit-chat
“I have a question” Information consumption
“I need to get this done” Task completion
• Book me the flight from Seattle to Taipei
• Reserve a table at Din Tai Fung for 5 people, 7PM tonight
• Purchase two movie tickets for Thor 3 at 9PM tonight
4
Goal: navigate users to complete tasks (transactions, API calls, etc.)
Challenge: may involve multiple rounds of information consumption
Task-Completion Dialogues
• Crowdsourced data
– Domain: Movie
– Dialogue act: intent & slot-value pairs
• Problem setting
– User wants to achieve a specific task, e.g., buying a movie ticket
– Agent tries to understand user’s goal via dialogue before booking the ticket
Turn | Speaker | Intent & Slots | Utterance
1 | User | greeting(greeting=hi) | Hi
1 | User | request(moviename; city=Houma; state=Louisiana; date=this week; genre=foreign) | Are there any good movies showing around Houma Louisiana this week?
2 | Agent | request(date) | What date would you like me to look for a reservation
3 | User | inform(date=9th) | How about the 9th
4 | Agent | inform(result={}; genre=foreign) | Unfortunately there are no foreign movies playing at this time
5
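The intent and slot-value annotation in the table above can be held in a simple data structure; a minimal Python sketch (the dict fields and `make_act` helper are illustrative, not TC-Bot's exact schema):

```python
# Minimal sketch of a dialogue act: an intent plus slot-value pairs.
# Field names here are illustrative, not the exact TC-Bot schema.

def make_act(intent, request_slots=None, inform_slots=None):
    """Build a dialogue act as a plain dict."""
    return {
        "intent": intent,
        "request_slots": request_slots or {},   # slots whose values are requested
        "inform_slots": inform_slots or {},     # slots with known values
    }

# Turn 1 of the example dialogue: the user requests a movie name
# while informing several constraints.
user_act = make_act(
    "request",
    request_slots={"moviename": "UNK"},
    inform_slots={"city": "Houma", "state": "Louisiana",
                  "date": "this week", "genre": "foreign"},
)

# The agent's follow-up request for a missing slot.
agent_act = make_act("request", request_slots={"date": "UNK"})

print(user_act["intent"], sorted(user_act["inform_slots"]))
```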
Rule-Based Agent
• Gather all information via a
series of requests (form-filling)
• Issues
1) No notion about what requests
can be skipped
User goal – Two tickets for “Deadpool” tomorrow
9PM at AMC Pacific Place 11 theater, Seattle.
User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?
Agent: Which theater do you want?
User: AMC pacific place 11 theater, please!
Agent: How many tickets do you want?
User: Two!
Agent: What date?
User: Tomorrow!
Agent: Which city do you want?
User: Seattle!
Agent: Do you have any specification for zip code?
User: No, I do not care.
Agent: Do you have any distance constraint?
User: No, I do not care.
Agent: How about video format? 2d? Imax 3d?
User: Any kind is okay.
Agent: Do you have any constraint to price?
User: No!
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at
AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
Wasteful! (very likely
the user does not care)
6
Rule-Based Agent
• Gather all information via a
series of requests (form-filling)
• Issues
1) No notion about what requests
can be skipped
2) No principled way to deal with
errors/uncertainty in language
understanding
User goal – Two tickets for “Hail Caesar” tomorrow
7:15PM at AMC Lowes Oak tree 6, Seattle.
User: Do you any ticket for movie ‘Hail Caesar’ at theater
AMC Lowes Oak Tree 6?
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: How many tickets do you want?
User: Two
Agent: Okay, I have issued two tickets for movie ‘Hail Caeser’
tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Oh, no.
Agent: Thanks!
Task failed!
(cannot correct errors from the preceding component)
7
Rule-Based Agent
• Gather all information via a
series of requests (form-filling)
• Issues
1) No notion about what requests
can be skipped
2) No principled way to deal with
errors/uncertainty in language
understanding
3) Do not know how to
recommend options if the
user’s goal is not achievable
8
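The form-filling behavior criticized above can be sketched as a loop over a fixed slot-priority list; a hypothetical, simplified agent (slot names are examples):

```python
# Sketch of a rule-based form-filling agent: request every slot in a
# fixed priority order, regardless of whether the user cares about it.
# Slot names are illustrative.

PRIORITY_SLOTS = ["moviename", "theater", "numberofpeople", "date",
                  "city", "zip", "distance", "video_format", "price"]

def next_action(filled_slots):
    """Return the next request, or a booking action when all slots are filled."""
    for slot in PRIORITY_SLOTS:
        if slot not in filled_slots:
            return ("request", slot)   # asks even 'do-not-care' slots: wasteful
    return ("book", dict(filled_slots))

# Simulate a user who only cares about four slots; the agent still
# asks about all nine before booking.
user_goal = {"moviename": "Deadpool", "theater": "AMC Pacific Place 11",
             "numberofpeople": 2, "date": "tomorrow"}
filled, turns = {}, 0
while True:
    act, payload = next_action(filled)
    if act == "book":
        break
    turns += 1
    filled[payload] = user_goal.get(payload, "do not care")
print(turns)  # one request per slot, including the five the user ignores
```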
Task-Oriented Dialogue System Framework
[Pipeline diagram]
Text Input: “Book 5 tickets for movie Star Wars”
→ Language Understanding (LU)
→ Semantic Frame: request_ticket(moviename=Star Wars; numberofpeople=5)
→ Dialogue Management (DM): Dialogue State Tracking (DST) → State Representation → Dialogue Policy Learning (with Knowledge Database)
→ System Action/Policy: request(theater)
→ Natural Language Generation (NLG)
→ Text Response: “Which theater do you prefer?”
9
Task-Oriented Dialogue System Framework
[Same pipeline at the dialogue-act level]
User: Natural Language → LU → Dialogue Act
→ DM: DST → State Representation → Dialogue Policy Learning (with Knowledge Database)
→ Dialogue Act → NLG → Natural Language → User
10
User Simulation in Frame-Level Semantics
[Diagram: user simulation at the frame level]
User Model / User Simulation (User Dialogue Act) ↔ Error Model Controller (simulating recognition errors and LU errors) ↔ DM: DST → State Representation → Dialogue Policy Learning (with Knowledge Database) → System Dialogue Act
11
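The Error Model Controller above can be sketched as a function that corrupts a clean user dialogue act at configurable intent/slot error rates; a minimal, hypothetical version (only random intent swaps and slot deletions are shown; the act format and names are ours):

```python
import random

# Sketch of an error model controller: corrupt a clean user dialogue act
# with intent and slot errors at configurable rates, mimicking noisy
# recognition/LU output. Names and act format are illustrative.

INTENTS = ["greeting", "thanks", "inform", "request"]

def corrupt(act, intent_err=0.1, slot_err=0.1, rng=random):
    """Return a noisy copy of {'intent': ..., 'inform_slots': {...}}."""
    noisy = {"intent": act["intent"],
             "inform_slots": dict(act["inform_slots"])}
    if rng.random() < intent_err:                  # intent error: random swap
        noisy["intent"] = rng.choice(INTENTS)
    for slot in list(noisy["inform_slots"]):
        if rng.random() < slot_err:                # slot error: deletion
            del noisy["inform_slots"][slot]
    return noisy

clean = {"intent": "request",
         "inform_slots": {"city": "Seattle", "date": "tomorrow"}}
noisy = corrupt(clean, intent_err=0.5, slot_err=0.5,
                rng=random.Random(0))
print(noisy)
```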
User Simulation in Natural Language Level
[Diagram: user simulation at the natural-language level]
User Model / User Simulation → NLG → NL → LU → User Dialogue Act
→ DM: DST → State Representation → Dialogue Policy Learning (with Knowledge Database) → System Dialogue Act
12
Task-Completion Neural Dialogue Systems
• Language Understanding
– Joint semantic frame parsing by BLSTM (Hakkani-Tur et al., 2016)
• Dialogue State Tracking
– Available results returned by the formed symbolic query
– Latest user dialogue action
• Dialogue Policy Learning
– Reinforcement learning policy (Mnih et al., 2015)
• Natural Language Generation
– Template-based
– Model-based: semantically-conditioned LSTM generation (SC-LSTM)
(Wen et al., 2015)
13
(Training: LU, DST, and NLG are each trained in a supervised way, then fine-tuned with reinforcement learning; the dialogue policy is trained by reinforcement learning.)
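The template-based NLG variant mentioned above can be sketched as a lookup from a system dialogue act to a canned surface form; the templates and `generate` helper here are illustrative, not the deck's actual inventory:

```python
# Sketch of template-based NLG: map a system dialogue act to a surface
# form by filling a canned template. Templates are illustrative.

TEMPLATES = {
    ("request", "theater"): "Which theater do you prefer?",
    ("request", "date"): "What date would you like?",
    ("inform", "moviename"): "I found the movie {moviename} for you.",
}

def generate(intent, slot, values=None):
    """Render the template for (intent, slot), filling in slot values."""
    template = TEMPLATES.get((intent, slot))
    if template is None:
        return "Sorry, I did not understand."   # fallback for unknown acts
    return template.format(**(values or {}))

print(generate("request", "theater"))
print(generate("inform", "moviename", {"moviename": "Star Wars"}))
```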
End-to-End Neural Dialogue Systems
• LU, DST (neural dialogue system), and NLG (user simulation)
are trained in a supervised way
• End-to-end training for dialogue policy learning
[System diagram: the user simulation (User Goal → User Model → NLG: w0 w1 w2 … EOS) converses in natural language with the neural dialogue system (LU tags words wi with slot labels and the EOS position with the intent → state st → DST over states s1 … sn → Dialogue Policy over actions a1 … ak), backed by the Knowledge Database; Dialogue Policy Learning closes the loop]
14
Reinforcement Learning Agent
• Dialogue policy learning
– Deep Q-network: estimate Q-value given the state and action pair
• Reward
– Success: the agent answers all the requested slots based on the user’s
constraints and books the movie tickets within the max number of turns
– Failure: 1) the agent finds no matching movie based on the user’s constraints,
or 2) the dialogue exceeds the max number of turns
15
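The reward scheme above can be sketched directly, using the values given on the Experiments slide (+2 × max_turn for success, -max_turn for failure, -1 per turn); the function shape and the MAX_TURN value are illustrative, not TC-Bot's code:

```python
# Sketch of the reward signal described above: +2*max_turn on success,
# -max_turn on failure, and a -1 per-turn penalty. The function shape
# and MAX_TURN value are illustrative.

MAX_TURN = 40  # illustrative cap on dialogue length

def turn_reward(done, success):
    """Reward for one dialogue turn."""
    if not done:
        return -1                    # per-turn penalty: encourages short dialogues
    return 2 * MAX_TURN if success else -MAX_TURN

# A successful 10-turn dialogue: nine -1 penalties, then the success bonus.
total = sum(turn_reward(False, None) for _ in range(9))
total += turn_reward(True, True)
print(total)  # 9*(-1) + 80 = 71
```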
Experiments
• Rule-based agent
– Actions
• ask question (request)
• answer question (inform)
• give multiple_choice to user
• say “confirm_answer” to user
• say “closing”, “thanks”
– The agent asks for slots in a priority
order, with some randomness within
a sliding window (e.g., size = 2 or 3).
• RL agent
– Model: Deep Q-Network
– Rewards
• Success: +2 × max_turn
• Failure: −max_turn
• −1 per-turn penalty
– Actions: 45 actions
• e.g., request(starttime), inform(moviename),
confirm(question), etc.
– State transition tuples (sₜ, aₜ, rₜ, sₜ₊₁)
– Experience replay (Schaul et al., 2015)
• Priority, dynamic pool
• Starts with a pool of rule-based tuples
16
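The transition tuples and replay pool above can be sketched with a tabular stand-in for the Deep Q-Network; this toy version samples uniformly (the deck uses prioritized replay and a neural Q-function), and all state/action names are illustrative:

```python
import random
from collections import defaultdict, deque

# Sketch of experience replay with a one-step Q-learning update.
# The deck uses a Deep Q-Network; here a dict-backed Q-table stands in
# for the network so the mechanics stay visible and dependency-free.

GAMMA, ALPHA = 0.9, 0.5
ACTIONS = ["request(starttime)", "inform(moviename)", "confirm(question)"]

Q = defaultdict(float)                  # Q[(state, action)] -> value
replay = deque(maxlen=1000)             # pool of (s, a, r, s') tuples; could be
                                        # warm-started from rule-based dialogues,
                                        # as in the deck

def q_update(s, a, r, s_next):
    """One-step temporal-difference update toward r + gamma * max Q(s', .)."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

# Store transitions, then learn from a sampled minibatch.
replay.append(("ask_slots", "request(starttime)", -1, "ask_slots"))
replay.append(("ask_slots", "inform(moviename)", 80, "done"))
for s, a, r, s_next in random.sample(replay, k=2):
    q_update(s, a, r, s_next)

print(Q[("ask_slots", "inform(moviename)")])
```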
End-to-End Reinforcement Learning
• Frame-level semantics → Natural language
The RL agent learns how to interact with users to complete tasks more
efficiently and effectively, and outperforms the rule-based agent.
17
Language Understanding Impact
• Different error rates in LU
18
Language understanding quality is crucial to dialogue-level performance
Intent Error Analysis
• Intent error type
– I0: random
– I1: within group
– I2: between group
• Intent error rate
– I3: 0.00
– I4: 0.10
– I5: 0.20
Intent errors slightly influence the RL system performance
Group 1: greeting(), thanks(), etc.
Group 2: inform(xx)
Group 3: request(xx)
Between-group intent errors degrade the system performance more
request_moviename(actor=Robert Downey Jr) → request_year
19
Slot Error Analysis
• Slot error type
– S0: random
– S1: slot deletion
– S2: value substitution
– S3: slot substitution
• Slot error rate
– S4: 0.00
– S5: 0.10
– S6: 0.20
Slot errors significantly degrade the RL system performance
Value substitution has the largest impact on the system performance
request_moviename(actor=Robert Downey Jr)
(e.g., slot substitution: actor → director; value substitution: Robert Downey Jr → Robert Downey Sr)
20
Error Comparison
• Intent error rate • Slot error rate
The RL agent has better robustness to intent errors for dialogue-level performance
Slot filling is more important than intent detection in language understanding
21
Human Evaluation
• Users are presented with a predefined user goal
• User rating: 1 (worst) to 5 (best) based on both naturalness
and coherence of the dialogue
22
User goal – Two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.
RULE BASED SYSTEM
User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?
Agent: Which theater do you want?
User: AMC pacific place 11 theater, please!
Agent: How many tickets do you want?
User: Two!
Agent: What date?
User: Tomorrow!
Agent: Which city do you want?
User: Seattle!
Agent: Do you have any specification for zip code?
User: No, I do not care.
Agent: Do you have any distance constraint?
User: No, I do not care.
Agent: How about video format? 2d? Imax 3d?
User: Any kind is okay.
Agent: Do you have any constraint to price?
User: No!
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC
pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
REINFORCEMENT LEARNING SYSTEM
User: I want two tickets for movie ‘Deadpool’.
Agent: What is the starttime you prefer?
User: 9:00 PM
Agent: Which day?
User: Tomorrow
Agent: Which city?
User: Seattle
Agent: Which theater do you prefer?
User: AMC Pacific Place 11 theater.
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm
at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
Dialogue Examples
Skip the requests the user may not care about to improve efficiency
Issue 1: no notion about what
requests can be skipped
23
User goal – Two tickets for “Hail Caesar” tomorrow 7:15PM at AMC Lowes Oak tree 6, Seattle.
Dialogue Examples
Automatically learn when and what to confirm for better robustness
REINFORCEMENT LEARNING SYSTEM
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: Okay, I will issue you tickets for movie ‘Hail Caeser’
tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Oh, no.
Agent: It is 7:50 pm, right?
User: Two tickets for 7:15 pm, please.
Agent: Okay, I will issue you two tickets for movie ‘Hail Caeser’
tomorrow 7:15 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Thanks!
Agent: Thanks!
RULE BASED SYSTEM
User: Do you any ticket for movie ‘Hail Caesar’ at theater
AMC Lowes Oak Tree 6?
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: How many tickets do you want?
User: Two
Agent: Okay, I have issued two tickets for movie ‘Hail
Caeser’ tomorrow 7:50 pm at theater AMC Lowes Oak
Tree 6, Seattle.
User: Oh, no.
Agent: Thanks!
Issue 2: no principle way to deal
with LU errors/uncertainty
24
Conclusions
• Our neural dialogue system demonstrates
– Robustness
• selecting actions under uncertainty and confusion via RL
• systematic analysis of how language understanding errors affect dialogue
system performance
– slot-level errors affect system performance more than intent-level errors
– slot value replacement degrades the performance most
– Flexibility
• first neural dialogue system that allows user-initiated behaviors during
conversations
– Reproducibility
• demonstrate how to evaluate RL dialogue agents using crowdsourced
datasets and simulated users in an end-to-end fashion
• guaranteeing reproducibility and consistent comparisons of competing
methods in an identical setting
25
Thanks for Your Attention!
Q & A
https://github.com/MiuLab/TC-Bot

More Related Content

PPTX
INTRODUCTION TO NLP, RNN, LSTM, GRU
PDF
Reinventing Deep Learning
 with Hugging Face Transformers
PPTX
인공지능, 기계학습 그리고 딥러닝
PDF
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
PPTX
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
PPTX
Introduction to RAG (Retrieval Augmented Generation) and its application
PPTX
Natural language processing (NLP)
PDF
Deep Natural Language Processing for Search and Recommender Systems
INTRODUCTION TO NLP, RNN, LSTM, GRU
Reinventing Deep Learning
 with Hugging Face Transformers
인공지능, 기계학습 그리고 딥러닝
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Introduction to RAG (Retrieval Augmented Generation) and its application
Natural language processing (NLP)
Deep Natural Language Processing for Search and Recommender Systems

What's hot (20)

PPTX
Natural Language Processing.pptx
PPTX
Fine tune and deploy Hugging Face NLP models
PDF
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
PDF
A Review of Deep Contextualized Word Representations (Peters+, 2018)
PDF
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
PDF
Deep Learning for Natural Language Processing: Word Embeddings
PPTX
Large Language Models | How Large Language Models Work? | Introduction to LLM...
PDF
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
PDF
Introduction to Transformers for NLP - Olga Petrova
PDF
Deep Generative Models
PPTX
Generative adversarial networks
PPTX
[Paper review] BERT
PPTX
Language Model (N-Gram).pptx
PDF
Beyond Churn Prediction : An Introduction to uplift modeling
PPTX
Introduction to Named Entity Recognition
PDF
Scaling Instruction-Finetuned Language Models
PPTX
PPTX
Attention Mechanism in Language Understanding and its Applications
PPT
Game balancing
PDF
Skip gram and cbow
Natural Language Processing.pptx
Fine tune and deploy Hugging Face NLP models
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
A Review of Deep Contextualized Word Representations (Peters+, 2018)
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deep Learning for Natural Language Processing: Word Embeddings
Large Language Models | How Large Language Models Work? | Introduction to LLM...
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Introduction to Transformers for NLP - Olga Petrova
Deep Generative Models
Generative adversarial networks
[Paper review] BERT
Language Model (N-Gram).pptx
Beyond Churn Prediction : An Introduction to uplift modeling
Introduction to Named Entity Recognition
Scaling Instruction-Finetuned Language Models
Attention Mechanism in Language Understanding and its Applications
Game balancing
Skip gram and cbow
Ad

Similar to End-to-End Task-Completion Neural Dialogue Systems (20)

PDF
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
PDF
2017 Tutorial - Deep Learning for Dialogue Systems
PDF
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
PPTX
Deep Learning for Dialogue Modeling - NTHU
PDF
Deep Learning for Dialogue Systems
PDF
PDF
SXSW 2015 Shredding Wireframes: Intro to Rapid Prototyping
PDF
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
PDF
W6L1_LARGE LANGUAGE MODELS: METHODS AND APPLICATIONS - Chatbots and AI Agents
PPTX
Games for Analysis of Technologies in Human-Intensive Systems - Dr. Tim Dasey...
PPT
AAMAS-2006 TANDEM Design Method (poster format)
PPTX
CarolinaCon Presentation on Streaming Analytics
PPT
Identifying MMORPG Bots: A Traffic Analysis Approach
PDF
W1L2_11-667 - Building Blocks of Modern LLMs 2: Pretraining Tasks
PDF
Pretraining Task - Auto-Regressive LM, Transformer Encoder-Decoders
PDF
One Day for Bot 一天搞懂聊天機器人
PDF
[系列活動] 一天搞懂對話機器人
PDF
CrowdQ: Crowdsourced Query Understanding
PPTX
PDF
UX STRAT Europe 2019: Zhaochang He, VMware
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
2017 Tutorial - Deep Learning for Dialogue Systems
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
Deep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Systems
SXSW 2015 Shredding Wireframes: Intro to Rapid Prototyping
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
W6L1_LARGE LANGUAGE MODELS: METHODS AND APPLICATIONS - Chatbots and AI Agents
Games for Analysis of Technologies in Human-Intensive Systems - Dr. Tim Dasey...
AAMAS-2006 TANDEM Design Method (poster format)
CarolinaCon Presentation on Streaming Analytics
Identifying MMORPG Bots: A Traffic Analysis Approach
W1L2_11-667 - Building Blocks of Modern LLMs 2: Pretraining Tasks
Pretraining Task - Auto-Regressive LM, Transformer Encoder-Decoders
One Day for Bot 一天搞懂聊天機器人
[系列活動] 一天搞懂對話機器人
CrowdQ: Crowdsourced Query Understanding
UX STRAT Europe 2019: Zhaochang He, VMware
Ad

More from Yun-Nung (Vivian) Chen (15)

PPTX
How the Context Matters Language and Interaction in Dialogues
PPTX
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
PDF
Chatbot的智慧與靈魂
PPTX
Language Empowering Intelligent Assistants (CHT)
PPTX
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
PPTX
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
PPTX
Automatic Key Term Extraction from Spoken Course Lectures
PPTX
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
PPTX
Automatic Key Term Extraction and Summarization from Spoken Course Lectures
PPTX
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
PPTX
An Intelligent Assistant for High-Level Task Understanding
PPTX
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
PPTX
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
PPTX
Statistical Learning from Dialogues for Intelligent Assistants
PPTX
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
How the Context Matters Language and Interaction in Dialogues
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Chatbot的智慧與靈魂
Language Empowering Intelligent Assistants (CHT)
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Automatic Key Term Extraction from Spoken Course Lectures
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Automatic Key Term Extraction and Summarization from Spoken Course Lectures
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
An Intelligent Assistant for High-Level Task Understanding
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
Statistical Learning from Dialogues for Intelligent Assistants
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...

Recently uploaded (20)

PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Electronic commerce courselecture one. Pdf
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Spectroscopy.pptx food analysis technology
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Approach and Philosophy of On baking technology
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
cuic standard and advanced reporting.pdf
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Machine learning based COVID-19 study performance prediction
MIND Revenue Release Quarter 2 2025 Press Release
Reach Out and Touch Someone: Haptics and Empathic Computing
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Understanding_Digital_Forensics_Presentation.pptx
Electronic commerce courselecture one. Pdf
Review of recent advances in non-invasive hemoglobin estimation
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
MYSQL Presentation for SQL database connectivity
Spectroscopy.pptx food analysis technology
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Approach and Philosophy of On baking technology
Encapsulation_ Review paper, used for researhc scholars
Mobile App Security Testing_ A Comprehensive Guide.pdf
NewMind AI Weekly Chronicles - August'25 Week I
Building Integrated photovoltaic BIPV_UPV.pdf
cuic standard and advanced reporting.pdf
Diabetes mellitus diagnosis method based random forest with bat algorithm
Machine learning based COVID-19 study performance prediction

End-to-End Task-Completion Neural Dialogue Systems

  • 1. End-to-End Task-Completion Neural Dialogue Systems Xiujun Li Yun-Nung (Vivian) Chen Lihong Li Jianfeng Gao Asli Celikyilmaz The 8th International Joint Conference on Natural Language Processing https://github.com/MiuLab/TC-Bot 1
  • 2. Dialogue Interactions “I want to talk” Chit-chat “I have a question” Information consumption “I need to get this done” Task completion Task-oriented Non task-oriented 2
  • 3. Dialogue Interactions “I want to talk” Chit-chat “I have a question” Information consumption “I need to get this done” Task completion • What is the paper review schedule? • Which room is the dialogue tutorial in? • When is the IJCNLP 2017 conference? • What does NLP stand for? 3
  • 4. Dialogue Interactions “I want to talk” Chit-chat “I have a question” Information consumption “I need to get this done” Task completion • Book me the flight from Seattle to Taipei • Reserve a table at Din Tai Fung for 5 people, 7PM tonight • Purchase two movie tickets about Thor 3 at 9PM tonight 4 Goal: navigating users to complete tasks (transactions, API calls, etc.) Challenges: including multiple information consumption
  • 5. Task-Completion Dialogues • Crowdsourced data – Domain: Movie – Dialogue act: intent & slot-value pairs • Problem setting – User wants to achieve a specific task, i.e. buying a movie ticket – Agent tries to understand user’s goal via dialogue before booking the ticket Turn Speaker Intent & Slots Utterance 1 User greeting(greeting=hi) Hi 1 User request(moviename; city=Houma; state=Louisiana; date=this week; genre=foreign) Are there any good movies showing around Houma Louisiana this week? 2 Agent request(date) What date would you like me to look for a reservation 3 User inform(date=9th) How about the 9th 4 Agent inform(result={}; genre=foreign) Unfortunately here are no foreign movies playing at this time 5
  • 6. Rule-Based Agent • Gather all information via a series of requests (form-filling) • Issues 1) No notion about what requests can be skipped User goal – Two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle. User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm? Agent: Which theater do you want? User: AMC pacific place 11 theater, please! Agent: How many tickets do you want? User: Two! Agent: What date? User: Tomorrow! Agent: Which city do you want? User: Seattle! Agent: Do you have any specification for zip code? User: No, I do not care. Agent: Do you have any distance constraint? User: No, I do not care. Agent: How about video format? 2d? Imax 3d? User: Any kind is okay. Agent: Do you have any constraint to price? User: No! Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’. User: Okay, thanks. Agent: Thanks! Wasteful! (very likely the user does not care) 6
  • 7. Rule-Based Agent • Gather all information via a series of requests (form-filling) • Issues 1) No notion about what requests can be skipped 2) No principle way to deal with errors/uncertainty in language understanding User goal – Two tickets for “Hail Caesar” tomorrow 7:15PM at AMC Lowes Oak tree 6, Seattle. User: Do you any ticket for movie ‘Hail Caesar’ at theater AMC Lowes Oak Tree 6? Agent: What start time do you prefer? User: 7:15 pm Tomorrow Agent: Which city you are looking for? User: Seattle Agent: How many tickets do you want? User: Two Agent: Okay, I have issued two tickets for movie ‘Hail Caeser’ tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle. User: Oh, no. Agent: Thanks! Task failed! (cannot correct the errors from proceeding component) 7
  • 8. Rule-Based Agent • Gather all information via a series of requests (form-filling) • Issues 1) No notion about what requests can be skipped 2) No principle way to deal with errors/uncertainty in language understanding 3) Do not know how to recommend options if the user’s goal is not achievable 8
  • 9. Task-Oriented Dialogue System Framework Language Understanding (LU) Natural Language Generation (NLG) Semantic Frame request_ticket (moviename=Star Wars; numberofpeople=5) System Action/ Policy request(theater) Text Response “Which theater do you prefer?” Text Input “Book 5 tickets for movie Star Wars” Knowledge Database Dialogue State Tracking (DST) Dialogue Policy Learning Dialogue Management (DM) 9 State Representation
  • 10. Task-Oriented Dialogue System Framework Language Understanding (LU) Natural Language Generation (NLG) Dialogue Act Natural Language Natural Language Knowledge Database Dialogue State Tracking (DST) Dialogue Policy Learning Dialogue Management (DM) Dialogue Act 10 State Representation User System
  • 11. User Simulation in Frame-Level Semantics User Dialogue Act Knowledge Database Dialogue State Tracking (DST) Dialogue Policy Learning Dialogue Management (DM) Error Model Controller • Recognition error • LU error User Model User Simulation System Dialogue Act User Dialogue Act 11 State Representation
  • 12. User Simulation in Natural Language Level Natural Language Generation (NLG) User Model User Simulation System Dialogue Act Language Understanding (LU) NL 12 Knowledge Database Dialogue State Tracking (DST) Dialogue Policy Learning Dialogue Management (DM) State RepresentationUser Dialogue Act
  • 13. Task-Completion Neural Dialogue Systems • Language Understanding – Joint semantic frame parsing by BLSTM (Hakkani-Tur et al., 2016) • Dialogue State Tracking – Available results returned by the formed symbolic query – Latest user dialogue action • Dialogue Policy Learning – Reinforcement learning policy (Mnih et al., 2015) • Natural Language Generation – Template-based – Model-based: semantically-conditioned LSTM generation (SC-LSTM) (Wen et al., 2015) 13 Supervised Supervised Supervised Reinforcement + Reinforcement (fine-tuning) + Reinforcement (fine-tuning) + Reinforcement (fine-tuning)
  • 14. wi <slot> wi+1 O EOS <intent> wi <slot> wi+1 O EOS <intent> End-to-End Neural Dialogue Systems • LU, DST (neural dialogue system), and NLG (user simulation) are trained in supervised way • End-to-end training for dialogue policy learning Knowledge Database Neural Dialogue System User Model User Simulation Dialogue Policy Natural Language w 0 w1 w2 NLG EOS User Goal wi <slot> wi+1 O EOS <intent> LU 𝑠𝑡 DST 𝑠1 𝑠2 𝑠 𝑛 𝑎1 𝑎2 𝑎 𝑘 …… … Dialogue Policy Learning 14
  • 15. Reinforcement Learning Agent • Dialogue policy learning – Deep Q-network: estimate Q-value given the state and action pair • Reward – Success: agent answers all the requested slots based on the user’s constraints, and book the movie tickets within the max turns – Failure: 1) agent finds no matching movie based on the user’s constraints or 2) exceed the max turns 15
  • 16. Experiments • Rule-based agent – Actions • ask question (request) • answer question (inform) • give multiple_choice to user • say “confirm_answer” to user • say “closing”, “thanks” – Agent asks the slot in a priority order, but within a sliding window (say, size = 2 or 3), there is randomness. • RL agent – Model: Deep Q-Network – Rewards • Success: 2 × max_turn • Fail: - max_turn • -1 for per turn penalty – Actions: 45 actions • i.e. request(starttime), inform(moviename), confirm(question), etc. – State transition tuples ( 𝑠𝑡, 𝑎 𝑡, 𝑟𝑡, 𝑠𝑡+1) – Experience replay (Schaul et al., 2015) • Priority, dynamic pool • Starts with a pool of rule-based tuples 16
  • 17. • Frame-level semantics  Natural language The RL agent is able to learn how to interact with users to complete tasks more efficiently and effectively, and outperforms the rule-based agent. End-to-End Reinforcement Learning17
  • 18. Language Understanding Impact • Different error rates in LU 18 Language understanding results are crucial in terms of dialogue-level performance
  • 19. Intent Error Analysis • Intent error type – I0: random – I1: within group – I2: between group • Intent error rate – I3: 0.00 – I4: 0.10 – I5: 0.20 Intent errors slightly influence the RL system performance Group 1: greeting(), thanks(), etc Group 2: inform(xx) Group 3: request(xx) Between-group intent errors degrade the system performance more request_moviename(actor=Robert Downey Jr) request_year 19
  • 20. Slot Error Analysis • Slot error type – I0: random – I1: slot deletion – I2: value substitution – I3: slot substitution • Slot error rate – S4: 0.00 – S5: 0.10 – S6: 0.20 Slot errors significantly degrade the RL system performance Value substitution has the largest impact on the system performance request_moviename (actor=Robert Downey Jr) director Robert Downey Sr20
  • 21. Error Comparison • Intent error rate • Slot error rate The RL agent has better robustness to intent errors for dialogue-level performance Slot filling is more important than intent detection in language understanding 21
  • 22. Human Evaluation • Users presented with a predefined user goal • User rating: 1 (worst) to 5 (best) based on both naturalness and coherence of the dialogue 22
  • 23. User goal – Two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle. RULE BASED SYSTEM User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm? Agent: Which theater do you want? User: AMC pacific place 11 theater, please! Agent: How many tickets do you want? User: Two! Agent: What date? User: Tomorrow! Agent: Which city do you want? User: Seattle! Agent: Do you have any specification for zip code? User: No, I do not care. Agent: Do you have any distance constraint? User: No, I do not care. Agent: How about video format? 2d? Imax 3d? User: Any kind is okay. Agent: Do you have any constraint to price? User: No! Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’. User: Okay, thanks. Agent: Thanks! REINFORCEMENT LEARNING SYSTEM User: I want two tickets for movie ‘Deadpool’. Agent: What is the starttime you prefer? User: 9:00 PM Agent: Which day? User: Tomorrow Agent: Which city? User: Seattle Agent: Which theater do you prefer? User: AMC Pacific Place 11 theater. Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’. User: Okay, thanks. Agent: Thanks! Dialogue Examples Skip the requests the user may not care about to improve efficiency Issue 1: no notion about what requests can be skipped 23
  • 24. Dialogue Examples
    User goal: two tickets for "Hail Caesar" tomorrow 7:15PM at AMC Lowes Oak Tree 6, Seattle.

    RULE-BASED SYSTEM
    User: Do you have any ticket for movie 'Hail Caesar' at theater AMC Lowes Oak Tree 6?
    Agent: What start time do you prefer?
    User: 7:15 pm Tomorrow
    Agent: Which city you are looking for?
    User: Seattle
    Agent: How many tickets do you want?
    User: Two
    Agent: Okay, I have issued two tickets for movie 'Hail Caeser' tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
    User: Oh, no.
    Agent: Thanks!

    REINFORCEMENT LEARNING SYSTEM
    Agent: What start time do you prefer?
    User: 7:15 pm Tomorrow
    Agent: Which city you are looking for?
    User: Seattle
    Agent: Okay, I will issue you tickets for movie 'Hail Caeser' tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
    User: Oh, no.
    Agent: It is 7:50 pm, right?
    User: Two tickets for 7:15 pm, please.
    Agent: Okay, I will issue you two tickets for movie 'Hail Caeser' tomorrow 7:15 pm at theater AMC Lowes Oak Tree 6, Seattle.
    User: Thanks!
    Agent: Thanks!

    Issue 2: the rule-based agent has no principled way to deal with LU errors/uncertainty; the RL agent automatically learns when and what to confirm, for better robustness.
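The "learning when to confirm" behavior above falls out of the RL policy itself: a DQN agent picks the dialogue act with the highest estimated Q-value, so a confirm act is chosen exactly when the learned values rank it best (e.g., after a likely LU error). A minimal sketch of that action selection, with an illustrative action space that only approximates the system's real dialogue acts:

```python
import random

# Illustrative dialogue action space; the real agent's acts and state
# features differ. This sketches epsilon-greedy selection over Q-values
# as used by a DQN dialogue policy.
ACTIONS = ['request(starttime)', 'request(city)', 'request(theater)',
           'confirm(starttime)', 'inform(taskcomplete)']

def select_action(q_values, epsilon):
    """Epsilon-greedy policy: explore a random act with probability
    epsilon, otherwise take the act with the highest estimated Q-value."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best]
```

With epsilon annealed toward zero, the agent converges to exploiting its learned values, so confirmation turns appear only where the policy found them to pay off during training.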
  • 25. Conclusions — our neural dialogue system demonstrates:
    • Robustness
      – selects actions under uncertainty and confusion via RL
      – systematic analysis of how language understanding errors affect dialogue system performance
        → slot-level errors affect system performance more than intent-level errors
        → slot value replacement degrades performance the most
    • Flexibility
      – first neural dialogue system that allows user-initiated behaviors during conversations
    • Reproducibility
      – demonstrates how to evaluate RL dialogue agents using crowdsourced datasets and simulated users in an end-to-end fashion
      – guarantees reproducibility and consistent comparison of competing methods in an identical setting
  • 26. Thanks for your attention! Q & A — https://github.com/MiuLab/TC-Bot

Editor's Notes

  • #3: General goal, of course involves treating natural language as a knowledge representation language. So MR involves deriving structured information from free text, and then doing something with it. But the work that’s been done in this space represents a tangle of different agendas. Let’s unpack them a bit. Where does it make sense for MS to focus its efforts in this space? For 1, doesn’t drive mainstream AI research. Hobby strand, though maybe good for PR purposes. Not a good driver of research/product. For 2 & 3 the user already has some idea of what the information need is. For 4, the user is overwhelmed by the complexity of the problem/solution space, needs help navigating.