DSTC6 – Dialogue System Technology Challenges
An Introduction
Sogang University Natural Language Processing Lab
허광호
2017.07.12
DSTC6 Tracks
• Track 1 – End-to-End Goal Oriented Dialog Learning
• Y-Lan Boureau et al. (Facebook AI Research)
• Track 2 – End-to-End Conversation Modeling
• Chiori Hori et al. (Mitsubishi Electric Research Laboratories)
• Track 3 – Dialog Breakdown Detection
• Ryuichiro Higashinaka et al. (NTT)
Track 3 – Dialog Breakdown Detection
NB: Not a breakdown
PB: Possible breakdown
B: Breakdown
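Because each system response is labeled by several annotators, it can be described either by a single label or by the distribution of labels across annotators. A minimal sketch, assuming the single label is chosen by majority vote with ties broken in NB, PB, B order (an assumption, not necessarily the challenge's rule); `label_distribution` and `majority_label` are hypothetical helper names:

```python
from collections import Counter

LABELS = ("NB", "PB", "B")  # not a breakdown, possible breakdown, breakdown


def label_distribution(annotations):
    """Return the fraction of annotators who chose each label."""
    counts = Counter(annotations)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if not annotations:
        raise ValueError("need at least one annotation")
    total = len(annotations)
    return {label: counts[label] / total for label in LABELS}


def majority_label(annotations):
    """Most frequent label; max() keeps the first maximum, so ties go NB > PB > B
    (hypothetical tie-breaking rule)."""
    dist = label_distribution(annotations)
    return max(LABELS, key=lambda label: dist[label])
```

For example, four annotations `["NB", "B", "B", "PB"]` give the distribution NB 0.25, PB 0.25, B 0.5 and the majority label B.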
Track 3 – Dialog Breakdown Detection
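• Motivation
• Voice agent services are being released commercially, but they
• still cannot converse as naturally as two humans.
• The biggest problem is that voice agents sometimes generate inappropriate utterances that cause a dialogue breakdown.
• Applications
• Breakdown detection is useful wherever keeping the conversation going matters, as in chat-oriented dialogue.
• It can also be used for error recovery in dialogue systems.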
Track 3 – Dialog Breakdown Detection
• Dataset
• 100 chat-oriented dialogues (21 utterances per dialogue) – 24 annotators.
• 1000 chat-oriented dialogues – 2~3 annotators.
• 300 chat-oriented dialogues – 30 annotators.
• Unfortunately, the data above are in Japanese.
• An additional 100 dialogues in English will reportedly be collected and released.
• Evaluation
• Classification-related metrics – accuracy, precision, recall, F-measure
• Distribution-related metrics – Jensen-Shannon (JS) divergence and mean squared error (MSE)
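The two metric families can be sketched as follows. The assumptions here are not taken from the challenge's official scorer: distributions are lists of probabilities in NB, PB, B order, JS divergence uses base-2 logarithms, MSE is averaged over the three labels, and precision/recall/F-measure treat B as the positive class. All function names are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) between two label distributions."""
    def kl(a, m):
        # Terms with a_i == 0 contribute 0 by convention; a_i > 0 implies m_i > 0.
        return sum(ai * math.log2(ai / mi) for ai, mi in zip(a, m) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mse(p, q):
    """Mean squared error between two distributions, averaged over labels (assumed)."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)


def classification_scores(gold, pred, positive="B"):
    """Accuracy over all labels plus precision/recall/F1 for the `positive` class."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Base-2 logs keep the JS divergence between 0 (identical distributions) and 1 (disjoint ones), which makes scores easy to compare across systems.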
Track 3 – Dialog Breakdown Detection
• Results of the breakdown detection challenge on Japanese data (LREC 2016)
• Baseline: CRF-based method
• Team1: LSTM-RNN-based method
• Features: Word2Vec + co-occurrence freq. vector + Sent2Vec vector
• Team2: LSTM-RNN-based method (Word2Vec)
• Team3: Rule-based method (keywords extracted from system utterances)
• Team4: SVM-based method (Word frequency vector)
• Team5: DNN-based method
• Features: dialogue act of the system and previous user utterance.
• Team6: LSTM-RNN-based method
• Features: word vectors encoded with an NCM (Neural Conversation Model), an LSTM, bag-of-words embeddings, and an extended NCM.
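Team4's SVM uses word frequency vectors as features. A minimal sketch of building such vectors, assuming whitespace tokenization (Japanese text would first need morphological segmentation, e.g. with MeCab) and leaving out the SVM itself; `build_vocabulary` and `word_frequency_vector` are hypothetical names:

```python
from collections import Counter


def build_vocabulary(utterances):
    """Map each lowercased whitespace token seen in training data to a column index."""
    vocab = {}
    for utt in utterances:
        for tok in utt.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def word_frequency_vector(utterance, vocab):
    """Count occurrences of each vocabulary word; out-of-vocabulary words are dropped."""
    vec = [0] * len(vocab)
    for tok, count in Counter(utterance.lower().split()).items():
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = count
    return vec
```

Each system utterance (optionally concatenated with its context) becomes one fixed-length row, which is the input shape a linear SVM expects.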
Track 3 – Dialog Breakdown Detection
Classification-Related Metrics
[Figure: classification-related metric scores for Baseline (CRF), Team1 (LSTM-RNN), Team2 (LSTM-RNN), Team3 (rule-based), Team4 (SVM), Team5 (DNN), Team6 (LSTM-RNN)]
Source: The Dialogue Breakdown Detection Challenge - Task Description, Datasets, and Evaluation Metrics
Track 3 – Dialog Breakdown Detection
Distribution-Related Metrics
[Figure: distribution-related metric scores for Baseline (CRF), Team1 (LSTM-RNN), Team2 (LSTM-RNN), Team3 (rule-based), Team4 (SVM), Team5 (DNN), Team6 (LSTM-RNN)]
Source: The Dialogue Breakdown Detection Challenge - Task Description, Datasets, and Evaluation Metrics
Track 1 – End-to-End Goal Oriented Dialog Learning
• Goal-oriented Dialog Learning
• Goal-oriented dialogue requires skills beyond language modeling:
• Asking questions to clearly define a user request.
• Querying knowledge bases (KBs).
• Interpreting query results to display options to users or to complete a transaction.
• Dialogue domain
• Restaurant reservation system
• Corpus: an open resource released by Facebook AI Research (Bordes et al. 2017)
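Querying the KB and interpreting its results can be pictured as a filter over restaurant records followed by a ranking step. This is a toy sketch under stated assumptions: the `Restaurant` record, its fields, and the response template are hypothetical, and offering options from highest to lowest rating is assumed as the presentation policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restaurant:
    # Hypothetical KB record; field names are illustrative only.
    name: str
    cuisine: str
    location: str
    price: str
    rating: int


def query_kb(kb, cuisine, location, price):
    """Return restaurants matching every constraint, highest rating first."""
    hits = [r for r in kb
            if r.cuisine == cuisine and r.location == location and r.price == price]
    return sorted(hits, key=lambda r: r.rating, reverse=True)


def propose_option(results, rejected):
    """Offer the best-rated restaurant the user has not yet turned down."""
    for r in results:
        if r.name not in rejected:
            return f"what do you think of this option: {r.name}"  # hypothetical template
    return None  # nothing left to propose
```

Keeping the KB lookup outside the dialogue model mirrors the split in the bullets above: the model decides *when* to query and *what* to say, while the lookup itself is deterministic.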
Track 1 – End-to-End Goal Oriented Dialog Learning
• Task structure
• The capabilities a goal-oriented dialogue system needs are split into sub-tasks, each evaluated separately.
• Task 1: Issuing API calls
• Task 2: Updating API calls
• Task 3: Displaying options
• Task 4: Providing extra information
• Task 5: Conducting full dialogs
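Tasks 1 and 2 can be pictured as slot filling: ask for a missing slot, emit an API call once every slot is known, and re-emit it when the user changes a constraint. A sketch assuming four slots (cuisine, location, party size, price range) serialized as `api_call <cuisine> <location> <party_size> <price>`; the slot set, order, and function names are assumptions for illustration.

```python
# Assumed slot set and serialization order for the API call.
SLOTS = ("cuisine", "location", "party_size", "price")


def issue_api_call(slots):
    """Task 1 sketch: return ("ask", slot) for the first missing slot,
    or ("call", "api_call ...") once all slots are filled."""
    for name in SLOTS:
        if not slots.get(name):
            return ("ask", name)
    return ("call", "api_call " + " ".join(slots[name] for name in SLOTS))


def update_api_call(slots, changes):
    """Task 2 sketch: apply the user's corrections (without mutating the input)
    and re-issue the call."""
    updated = {**slots, **changes}
    return updated, issue_api_call(updated)
```

In the track itself these decisions are learned end to end from dialogs rather than hand-coded; the sketch only shows what a correct output looks like at each step.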
Track 1 Example
Track 1 – End-to-End Goal Oriented Dialog Learning
• Dataset composition
• 10,000 examples per task
• Each example: the correct utterance plus a set of candidate utterances
• Evaluation
• Rather than generating language, systems rank the candidate utterances.
• This setup is called Next-Utterance Classification (Lowe et al. 2016).
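Next-utterance classification reduces to ranking: score every candidate against the dialogue context and check where the correct response lands. A sketch with a pluggable scorer; `overlap_score` is a deliberately naive word-overlap stand-in, not a model used in the track, and Recall@k (top-k hit rate, where k=1 is plain accuracy) is shown as one common way to score the ranking.

```python
def rank_candidates(context, candidates, score):
    """Sort candidate responses from most to least plausible under `score`.
    sorted() is stable, so tied candidates keep their original order."""
    return sorted(candidates, key=lambda c: score(context, c), reverse=True)


def overlap_score(context, candidate):
    """Toy scorer: fraction of candidate words that also appear in the context."""
    ctx = set(context.lower().split())
    words = candidate.lower().split()
    return sum(w in ctx for w in words) / len(words) if words else 0.0


def recall_at_k(examples, score, k=1):
    """Fraction of (context, correct_response, candidates) examples whose
    correct response is ranked within the top k."""
    hits = 0
    for context, correct, candidates in examples:
        hits += correct in rank_candidates(context, candidates, score)[:k]
    return hits / len(examples)
```

Framing evaluation as ranking avoids the difficulty of scoring free-form generated text: a response is either ranked first or it is not.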
Track 1 – End-to-End Goal Oriented Dialog Learning
Results published at ICLR 2017
Source: Learning End-to-End Goal-Oriented Dialog – A. Bordes et al. 2017
Track 2 – End-to-End Conversation Modeling
• Given the dialogue history, the system must generate a sentence that responds to the user's input; it may draw on external knowledge from the web.
Track 2 – End-to-End Conversation Modeling
• Dataset
• Training data: OpenSubtitles/Twitter (≈1M dialogs, 2.2M utterances)
• Test data: 500–1,000 dialogs
Track 2 – End-to-End Conversation Modeling
• Baseline System
• LSTM-based seq2seq generation system and a pre-trained model will be
provided.
• Evaluation
• Objective measures: perplexity, BLEU, etc.
• Subjective measure: human ratings collected via crowdsourcing.
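The two objective measures can be sketched directly. Perplexity here is the exponential of the mean negative natural-log probability per token, and BLEU is an unsmoothed, single-reference, sentence-level variant with uniform weights up to 4-grams. The track's official scorer may tokenize, smooth, or aggregate at the corpus level differently, so treat this as an illustration only.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """exp of the average negative log-likelihood (natural log) per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence BLEU against a single reference (whitespace tokens)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0:
            return 0.0  # any zero precision makes unsmoothed BLEU zero
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)
```

For example, a model that assigns probability 0.25 to every token has perplexity 4, i.e. it is as uncertain as a uniform choice among four words.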
Track 2 – End-to-End Conversation Modeling
DSTC6 Tracks Conclusion
• Track 1 – End-to-End Goal Oriented Dialog Learning
• Next-Utterance Classification Task
• Track 2 – End-to-End Conversation Modeling
• Language Generation Task
• Track 3 – Dialog Breakdown Detection
• Label Classification Task