How machines learn to talk.
Machine Learning for Conversational AI
Inaugural Lecture
By Professor Verena Rieser
Historical Notes
Wolfgang von Kempelen’s speaking
machine (1791)
Joseph Faber’s Marvelous
Talking Machine (1840)
Today’s Conversational Agents
Source: MIC Jan 2015.
Market forecasts
The (voice) bots are coming…
“Bots are the new apps”
because they “fundamentally
revolutionize how computing is
experienced by everybody.”
Microsoft’s CEO Satya Nadella
Machine Learning for
Conversational AI Systems
Can we use machine learning for customer
facing applications?
Which machine learning methods are suitable?
Will future machines speak neuralese?
What do we learn when learning from “big
data”?
Machine Learning for Conversational AI
• Task-driven Statistical Dialogue Systems
– Reinforcement Learning
– Results from the E2E Generation Challenge
• Social Chatbots
– Seq2Seq models
– Amazon Alexa Challenge
• Future challenges
– Evaluation
– Data
– Ethics
Spoken Dialogue System Architecture
e.g. Rieser & Lemon, Comp. Ling. 2011, ACL’10,’08,’06
e.g. Rieser et al., ACL’05,’09,’10,’16, EMNLP’12,’15,’17, EACL’09,’14
e.g. Boidin & Rieser, Interspeech’09
Rule-based approaches
V. Rieser (MA thesis 2004): Hermine, the talking washing machine.*
* Exhibited at CeBIT 2003.
Reinforcement Learning
$$Q^{\pi}(s,a) = \sum_{s'} T^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]$$
Bellman optimality equation (1952), see [Sutton and Barto, 1998].
V. Rieser (PhD thesis 2008): Bootstrapping Reinforcement Learning-based Dialogue Strategies.
*Winner of the Eduard-Martin Prize for outstanding research
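As a rough illustration of the equation on this slide, the sketch below evaluates one fixed policy on a toy slot-filling dialogue. The states, actions, transition probabilities, rewards and discount factor are all invented for this example; none of them come from the thesis or the cited papers.

```python
# Toy slot-filling dialogue MDP; every number below is a hypothetical value.
GAMMA = 0.9  # discount factor (gamma in the equation)

# T[(s, a)] maps each next state s' to its transition probability.
T = {
    ("start", "ask"): {"filled": 0.8, "start": 0.2},
    ("start", "close"): {"done": 1.0},
    ("filled", "ask"): {"filled": 1.0},
    ("filled", "close"): {"done": 1.0},
}

# R[(s, a, s')] is the immediate reward for that transition.
R = {
    ("start", "ask", "filled"): -1.0,   # each question costs one turn
    ("start", "ask", "start"): -1.0,    # user did not answer
    ("start", "close", "done"): -10.0,  # closing without the slot fails
    ("filled", "ask", "filled"): -1.0,
    ("filled", "close", "done"): 20.0,  # task success
}


def q_value(s, a, V):
    """Sum over s' of T * (R + gamma * V(s')); terminal states count as V = 0."""
    return sum(
        p * (R[(s, a, s2)] + GAMMA * V.get(s2, 0.0))
        for s2, p in T[(s, a)].items()
    )


def evaluate_policy(policy, sweeps=200):
    """Iterate V(s) = Q(s, policy(s)) until the values settle."""
    V = {s: 0.0 for s in policy}
    for _ in range(sweeps):
        V = {s: q_value(s, policy[s], V) for s in policy}
    return V


policy = {"start": "ask", "filled": "close"}
V = evaluate_policy(policy)
for (s, a) in T:
    print(s, a, round(q_value(s, a, V), 2))
```

Under these made-up numbers, asking before closing has a higher Q-value in the start state than closing straight away, which is the kind of trade-off a learned dialogue strategy has to capture.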
Drawbacks of RL for dialogue
• Requires many training episodes.
– Simulated users [Rieser & Lemon, 2006]
• Manual specification of learning problem.
– What is a good reward function/ state space representation?
[Rieser & Lemon, 2008]
• System outputs are usually hand-crafted.
– Mismatch between “what to say” and “how to say it” [Rieser &
Lemon, 2009]
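The first two drawbacks can be made concrete with a very small sketch: a simulated user that answers a question with a fixed probability, and a hand-written reward function. The probability, the reward values and the turn limit are assumptions chosen for illustration, not values from the cited work.

```python
import random

# All constants are hypothetical and exist only for this illustration.
P_USER_ANSWERS = 0.8   # chance the simulated user supplies the slot when asked
TURN_PENALTY = -1      # hand-specified cost per system turn
SUCCESS_REWARD = 20    # hand-specified reward for completing the task
MAX_TURNS = 10         # give up after this many questions


def simulate_episode(rng):
    """Run one simulated dialogue; return (total reward, turns used)."""
    total = 0
    for turn in range(1, MAX_TURNS + 1):
        total += TURN_PENALTY              # the system asks its question
        if rng.random() < P_USER_ANSWERS:  # simulated user answers
            return total + SUCCESS_REWARD, turn
    return total, MAX_TURNS                # user never answered


def run(n_episodes, seed=0):
    """Generate many cheap training episodes from the simulated user."""
    rng = random.Random(seed)
    return [simulate_episode(rng) for _ in range(n_episodes)]


episodes = run(1000)
print(sum(reward for reward, _ in episodes) / len(episodes))
```

Changing `TURN_PENALTY` or `SUCCESS_REWARD` changes which strategy looks best, which is why the manual specification of the learning problem matters.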
• Learn from “raw” dialogue data (e.g. movie
subtitles).
• No semantic or pragmatic annotation required.
Input-output
mapping
End-to-End Response Generation
Sequence-to-Sequence models
e.g. Shang et al., 2015; Vinyals & Le, 2015; Sordoni et al., 2015
Image from farizrahman4u/seq2seq
The E2E Data Set
name [Loch Fyne],
eatType[restaurant],
food[Japanese],
price[cheap],
kid-friendly[yes]
Serving low cost Japanese style cuisine,
Loch Fyne caters for everyone, including
families with small children.
Loch Fyne is a child friendly
restaurant serving cheap Japanese
food.
50k
DATA
J. Novikova, O. Dusek and V. Rieser. The E2E Dataset: New Challenges For End-to-End
Generation. 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL 2017)*
* Nominated for best paper award!
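To make the input-output pairing concrete, the following sketch parses the meaning representation shown on this slide and realises it with a single hand-written template. The function names and the template are hypothetical; this is one way a hand-engineered baseline could look, not the challenge baseline itself.

```python
import re


def parse_mr(mr: str) -> dict:
    """Turn 'slot[value], slot[value], ...' into a dict of slot -> value."""
    pairs = re.findall(r"([\w\- ]+?)\s*\[([^\]]*)\]", mr)
    return {slot.strip(): value.strip() for slot, value in pairs}


def realise(slots: dict) -> str:
    """Fill one fixed template; a real system would need many more cases."""
    kid = "child friendly " if slots.get("kid-friendly") == "yes" else ""
    return (
        f"{slots['name']} is a {kid}{slots['eatType']} "
        f"serving {slots['price']} {slots['food']} food."
    )


mr = "name [Loch Fyne], eatType[restaurant], food[Japanese], price[cheap], kid-friendly[yes]"
slots = parse_mr(mr)
print(slots)
print(realise(slots))
```

The template reproduces the second reference text on the slide, but it cannot produce the freer paraphrase in the first one, which is the kind of variation the crowd-sourced data contains.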
The E2E NLG Challenge 2017
• Submissions: 62 systems with diverse system architectures
by 17 institutions from 11 countries, with about 1/3 of these
submissions coming from industry.
http://www.macs.hw.ac.uk/InteractionLab/E2E/
The E2E NLG Challenge 2017
Seq2Seq models vs. hand-engineered systems:
✓ Natural sounding
- Complexity, length, diversity.
- Miss out on information.
- Overall quality ratings by users.
→ Neural NLG systems tend to settle for the most
frequent options, thus penalising length and favouring
high-frequency word sequences.
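One reason for the length effect can be seen from how a sequence is scored: its log-probability is a sum of negative terms, so every extra token lowers the score. The sketch below uses invented per-token probabilities. The length-normalised score is shown only as a commonly used remedy, not as something the challenge systems are known to have used.

```python
import math


def log_prob(token_probs):
    """Log-probability of a sequence: the sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def length_normalised(token_probs):
    """Average log-probability per token, which removes the length penalty."""
    return log_prob(token_probs) / len(token_probs)


# Hypothetical outputs whose tokens are all equally likely under the model.
short_output = [0.6] * 8
long_output = [0.6] * 16

print(log_prob(short_output), log_prob(long_output))
print(length_normalised(short_output), length_normalised(long_output))
```

With the raw score the shorter output always wins even though each of its tokens is no more likely, which matches the tendency described on the slide.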
The Amazon Alexa Prize 2016-2018
Competitors
AI vs. AI: Cleverbot (Carpenter 2011)
Neural models for Alexa?
• BIG training data.
– Reddit, Twitter, Movie Subtitles, Daytime
TV transcripts…..
• Results:
Is big data good data?
“I can sleep with as many people as I want to” (Reddit)
“You will die” (Movies)
“Shall I kill myself?”
“Yes” (Twitter)
“Shall I sell my stocks and shares?”
“Sell, sell, sell” (Twitter)
Alana Architecture
Input: user utterance, social signals, current plan, state of the world, dialogue history
Bot Ensemble (chatbots):
• Persona: What’s your favourite food? I love bytes.
• News: Here is what happened to Donald Trump.
• Facts: Did you know that one day Mars will have a ring.
• Wiki: Leonard Cohen’s latest album is called ‘You Want It Darker’.
• …
Neural Ranker (over the Persona, News, Facts, Wiki, … responses)
Multimodal output:
• Speech
• Actions
• Gestures
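The ensemble-plus-ranker idea can be sketched in a few lines: every bot proposes a candidate response and a scoring function picks one. In this sketch the bots are stubs that return the example lines from the slide, and the ranker is a simple word-overlap score standing in for the neural ranker. Both are assumptions for illustration and not the Alana implementation.

```python
import re


def tokens(text):
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


# Stub bots: each returns one canned candidate (hypothetical stand-ins).
BOTS = {
    "persona": lambda utterance, history: "I love bytes.",
    "news": lambda utterance, history: "Here is what happened to Donald Trump.",
    "facts": lambda utterance, history: "Did you know that one day Mars will have a ring.",
    "wiki": lambda utterance, history: "Leonard Cohen's latest album is called 'You Want It Darker'.",
}


def overlap_score(utterance, history, candidate):
    """Placeholder ranker: share of candidate words that occur in the user utterance."""
    user_words = set(tokens(utterance))
    cand_words = set(tokens(candidate))
    return len(user_words & cand_words) / (len(cand_words) or 1)


def respond(utterance, history, bots=BOTS, score=overlap_score):
    """Collect one candidate per bot and return the (bot name, response) ranked highest."""
    candidates = [(name, bot(utterance, history)) for name, bot in bots.items()]
    return max(candidates, key=lambda nc: score(utterance, history, nc[1]))


name, reply = respond("Tell me about Leonard Cohen's latest album", history=[])
print(name, "->", reply)
```

Keeping the ranker as a separate function means the bots can be changed or added without retraining each other, and a learned model can replace `overlap_score` without touching the bots.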
Avg duration: 2.30 mins
10% of calls over 10 mins
avg: 14.4 turns
Alexa developers
Finalists vs rest
3 finalists:
• Heriot-Watt University
• University of Washington
• Czech Technical University
Final Leaderboard
Approx. 6000 conversations in final week
Las Vegas final
• 2 conversations x 3 testers = 6 conversations
• Rated by external judges
• Prague: “I want to talk about baseball”
• UW: “I want to talk about basketball”
• HWU: “I want to talk about ….. … "
Hi lie
(Amazon’s speech recogniser couldn’t recognise this….)
Alana: text chat
Lessons learnt
• Evaluation standards define the game.
• Learning from big data is only valid in restricted
contexts.
• Getting LOTS of real customer data is worth it!
– over 360k rated customer interactions
Leaderboard 2018-05-15
For updates follow @alanathebot
Disclaimer
The following part of this talk contains examples
which some listeners might find disturbing.
Ethical Issues with Conversational AI
• Learning from biased data.
• Sexual abuse and bullying through the user.
Learning from biased data
Pitfalls of learning from data
XXXXX
A Neural Conversational Model
http://neuralconvo.huggingface.co/
A re-implementation of:
Oriol Vinyals and Quoc V. Le (2015). A Neural Conversational Model. ICML Deep
Learning Workshop.
d*
f*
Ethical Conversational AI Systems
Does learning from data introduce biases?
Ethical Issues with Conversational AI
• Learning from biased data.
• Sexual abuse and bullying through the user.
4% of customer conversations with our Alexa
bot contain sexual harassment!
Ethical Responsibilities
How do current systems behave when faced
with abuse?
What are good mitigation strategies?
Ethical Conversational AI Systems
Does learning from data introduce biases?
• Approx. 4% of customer interactions in our corpus!
• These fall into 4 categories, as defined by the Linguistic Society of
America:
“Are you gay?” (Gender and Sexuality)
“I love watching porn.” (Sexualised Comments)
“You stupid b***.” (Sexualised Insults)
“Will you have sex with me.” (Sexual Requests)
We insulted a lot of bots…
• Commercial:
– Amazon Alexa, Apple Siri, Google Home, Microsoft's Cortana.
• Rule-based:
– E.L.I.Z.A., Parry, A.L.I.C.E., Alley
• Data-driven:
– Cleverbot, NeuralConvo, Information Retrieval (Ritter et al.
2010),
– “clean” in-house seq2seq model
• Negative Baseline: 6 Adult-only bots.
How do different systems react?
Commercial | Data-driven | Adult-only
Flirtatious
Chastising,
Retaliation
Non-sense
Flirtatious
Swearing back
Avoiding an answer.
Amanda Cercas Curry and Verena Rieser. How Ethical are Conversational Systems?
Insights from the #MeTooAlexa Corpus on Sexual Harassment. 27th International
Conference on Computational Linguistics (COLING), Santa Fe, New Mexico, USA.
Bias in the data?
• Trained a seq2seq model on “clean” data.
• Still encouraging/ flirting back.
I love watching
porn.
What shows do
you prefer?
How do current systems behave when faced
with abuse?
What are good mitigation strategies?
Ethical Conversational AI Systems
Does learning from data introduce biases?
Conclusion
• Machine Learning methods for Conversational AI
• Neural methods for task-based systems
produce natural, but often incorrect output.
• Neural methods for open-domain systems are
hard to control.
• How should a system deal with edge cases, such
as abuse?
Big thanks to my amazing team!
Dr. Ondrej Dusek Dr. Simon Keizer Dr. Xingkun Liu Dr. Jekaterina Novikova
Shubham Agarwal
(PhD candidate)
Amanda Cercas Curry
(PhD candidate)
Karin Sevegnani
(PhD candidate)
Xinnuo Xu
(PhD candidate)
… my sponsors
… And my amazing husband!
Prof. Oliver Lemon
“Dr.” Kati
Key References
• Amanda Cercas Curry and Verena Rieser. #MeToo Alexa: How Conversational
Systems Respond to Sexual Harassment. Second Workshop on Ethics in NLP. NAACL
2018.
• Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu,
Yanchao Yu, Ondrej Dušek, Verena Rieser, Oliver Lemon. An Ensemble Model with
Ranking for Social Dialogue. In: NIPS workshop on Conversational AI, 2017.
* Finalist in Amazon Alexa Challenge
• Jekaterina Novikova, Ondrej Dusek and Verena Rieser. The E2E Dataset: New Challenges For
End-to-End Generation. 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL),
2017 * Nominated for best paper.
• Dimitra Gkatzia, Oliver Lemon and Verena Rieser. Natural Language Generation
enhances human decision-making with uncertain information. Annual meeting of the
Association for Computational Linguistics (ACL), 2016.
• Eshrag Refaee and Verena Rieser. A Hybrid Approach for Determining Sentiment
Intensity of Arabic Twitter Phrases. 10th International Workshop on Semantic
Evaluation (SemEval), 2016. * winner of SemEval'16 challenge task 7
• Verena Rieser, Oliver Lemon and Simon Keizer. Natural Language Generation as
Incremental Planning Under Uncertainty: Adaptive Information Presentation for
Statistical Dialogue Systems. IEEE/ACM Transactions on Audio, Speech and
Language Processing, Volume 22, Issue 5, 2014.
• Verena Rieser and Oliver Lemon. Reinforcement Learning for Adaptive Dialogue
Systems: A Data-driven Methodology for Dialogue Management and Natural
Language Generation. Book Series: Theory and Applications of Natural Language
Processing, Springer, 2011. * >7,500 downloads
Want to know more?
• Study on our MSc on Conversational AI!
• 2-year Conversion Course in AI
– No prior knowledge in programming
required!
• 12 funded DataLab scholarships available.
– Deadline: 31 May 2018
• Contact: MACSpgenquiries@hw.ac.uk
Get in touch!
v.t.rieser@hw.ac.uk
@verena_rieser
https://www.linkedin.com/in/verena-rieser-3590b86/
https://sites.google.com/view/nlplab/