How machines learn to talk. Machine Learning for Conversational AI

How machines learn to talk.
Machine Learning for Conversational AI
Inaugural Lecture
By Professor Verena Rieser

Historical Notes
Wolfgang von Kempelen’s speaking machine (1791)
Joseph Faber’s Marvelous Talking Machine (1840)

Today’s Conversational Agents

Source: MIC Jan 2015.
Market forecasts

The (voice) bots are coming…
“Bots are the new apps”
because they “fundamentally
revolutionize how computing is
experienced by everybody.”
Microsoft’s CEO Satya Nadella

Machine Learning for
Conversational AI Systems

Can we use machine learning for customer
facing applications?
Which machine learning methods are suitable?
Will future machines speak neuralese?
What do we learn when learning from “big
data”?

Machine Learning for Conversational AI
• Task-driven Statistical Dialogue Systems
– Reinforcement Learning
– Results from the E2E Generation Challenge
• Social Chatbots
– Seq2Seq models
– Amazon Alexa Challenge
• Future challenges
– Evaluation
– Data
– Ethics

Spoken Dialogue System Architecture
e.g. Rieser & Lemon, Comp. Ling. 2011, ACL’10,’08,’06
e.g. Rieser et al., ACL’05,’09,’10,’16, EMNLP’12,’15,’17, EACL’09,’14
e.g. Boidin & Rieser, Interspeech’09

Rule-based approaches
V. Rieser (MA thesis 2004): Hermine, the talking washing machine.*
* Exhibited at CeBit 2003.

Reinforcement Learning
$$Q^{\pi}(s,a) = \sum_{s'} T^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]$$
Bellman optimality equation (1952), see [Sutton and Barto, 1998].
V. Rieser (PhD thesis 2008): Bootstrapping Reinforcement Learning-based Dialogue Strategies.
*Winner of the Eduard-Martin Prize for outstanding research
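
A minimal sketch of the Bellman backup named on this slide: the value of taking action a in state s is the transition-weighted sum of the immediate reward plus the discounted value of the next state. The three-state dialogue MDP below (states `no_info`, `has_info`, `done`; actions `ask`, `present`; all probabilities and rewards) is a hypothetical toy made up for illustration, not taken from the lecture or the thesis. Value iteration with a max over actions is used here as one standard way to solve it.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# T[(s, a)] maps each next state s' to its transition probability.
Transitions = Dict[Tuple[State, Action], Dict[State, float]]
# R[(s, a, s')] is the immediate reward for that transition (default 0).
Rewards = Dict[Tuple[State, Action, State], float]


def q_value(s: State, a: Action, T: Transitions, R: Rewards,
            V: Dict[State, float], gamma: float) -> float:
    """One Bellman backup: sum over s' of T * (R + gamma * V(s'))."""
    return sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
               for s2, p in T[(s, a)].items())


def value_iteration(states: List[State], actions: List[Action],
                    T: Transitions, R: Rewards,
                    gamma: float = 0.9, sweeps: int = 100) -> Dict[State, float]:
    """Repeatedly set V(s) to the best Q(s, a) until it settles."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            qs = [q_value(s, a, T, R, V, gamma) for a in actions if (s, a) in T]
            V[s] = max(qs) if qs else 0.0  # terminal states keep value 0
    return V


def greedy_action(s: State, actions: List[Action], T: Transitions,
                  R: Rewards, V: Dict[State, float], gamma: float) -> Action:
    """Pick the action with the highest backed-up value in state s."""
    available = [a for a in actions if (s, a) in T]
    return max(available, key=lambda a: q_value(s, a, T, R, V, gamma))


# Hypothetical toy dialogue task: ask the user for information, then present.
STATES = ["no_info", "has_info", "done"]
ACTIONS = ["ask", "present"]
T: Transitions = {
    ("no_info", "ask"): {"has_info": 0.8, "no_info": 0.2},  # user may not answer
    ("no_info", "present"): {"done": 1.0},
    ("has_info", "ask"): {"has_info": 1.0},
    ("has_info", "present"): {"done": 1.0},
}
R: Rewards = {
    ("no_info", "ask", "has_info"): -1.0,   # every extra turn costs a little
    ("no_info", "ask", "no_info"): -1.0,
    ("no_info", "present", "done"): -10.0,  # presenting without information
    ("has_info", "ask", "has_info"): -1.0,
    ("has_info", "present", "done"): 20.0,  # task success
}
GAMMA = 0.9
V = value_iteration(STATES, ACTIONS, T, R, gamma=GAMMA)
```

In this toy setting the learned strategy asks first and presents once information is available, which is the kind of trade-off a hand-specified reward function has to encode.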

Drawbacks of RL for dialogue
• Requires many training episodes.
– Simulated users [Rieser & Lemon, 2006]
• Manual specification of learning problem.
– What is a good reward function/state space representation? [Rieser & Lemon, 2008]
• System outputs are usually hand-crafted.
– Mismatch between “what to say” and “how to say it” [Rieser & Lemon, 2009]

End-to-End Response Generation
• Learn from “raw” dialogue data (e.g. movie subtitles).
• No semantic or pragmatic annotation required.
Input-output mapping

Sequence-to-Sequence models
e.g. Shang et al., 2015; Vinyals & Le, 2015; Sordoni et al., 2015
Image from farizrahman4u/seq2seq

The E2E Data Set
name[Loch Fyne], eatType[restaurant], food[Japanese], price[cheap], kid-friendly[yes]
“Serving low cost Japanese style cuisine, Loch Fyne caters for everyone, including families with small children.”
“Loch Fyne is a child friendly restaurant serving cheap Japanese food.”
50k DATA
J. Novikova, O. Dusek and V. Rieser. The E2E Dataset: New Challenges For End-to-End
Generation. 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL 2017)*
* Nominated for best paper award!
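
A small sketch of how the slot[value] input shown on this slide could be read into a dictionary before being passed to a generator. The function name `parse_mr` and the regular expression are illustrative assumptions, not part of the dataset's tooling.

```python
import re
from typing import Dict

# A slot name, optional spaces, then its value in square brackets.
SLOT_PATTERN = re.compile(r"\s*([^\[\],]+?)\s*\[([^\]]*)\]")


def parse_mr(mr: str) -> Dict[str, str]:
    """Turn 'name[Loch Fyne], food[Japanese]' into {'name': 'Loch Fyne', ...}."""
    return {slot: value.strip() for slot, value in SLOT_PATTERN.findall(mr)}


example = parse_mr(
    "name [Loch Fyne], eatType[restaurant], food[Japanese], "
    "price[cheap], kid-friendly[yes]"
)
```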

The E2E NLG Challenge 2017
• Submissions: 62 systems with diverse system architectures
by 17 institutions from 11 countries, with about 1/3 of these
submissions coming from industry.
http://www.macs.hw.ac.uk/InteractionLab/E2E/

The E2E NLG Challenge 2017
Seq2Seq models vs. hand-engineered systems:
✓ Natural sounding
- Complexity, length, diversity.
- Miss out on information.
- Overall quality ratings by users.
→ Neural NLG systems tend to settle for the most
frequent options, thus penalising length and favouring
high-frequency word sequences.
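
One rough way to make “miss out on information” measurable is to check which input slots have no surface realisation in the generated text. The sketch below does this by plain substring matching. The function `missing_slots` and the cue list for `kid-friendly[yes]` are hypothetical illustrations, and this is not the evaluation procedure used in the challenge.

```python
from typing import Dict, List, Optional, Tuple

Cues = Dict[Tuple[str, str], List[str]]


def missing_slots(mr: Dict[str, str], text: str,
                  cues: Optional[Cues] = None) -> List[str]:
    """Return the slots whose value (or any listed cue phrase) is absent from text."""
    cues = cues or {}
    lowered = text.lower()
    missing = []
    for slot, value in mr.items():
        # Fall back to the literal value when no cue phrases are given.
        surface_forms = cues.get((slot, value), [value])
        if not any(form.lower() in lowered for form in surface_forms):
            missing.append(slot)
    return missing


MR = {
    "name": "Loch Fyne",
    "eatType": "restaurant",
    "food": "Japanese",
    "price": "cheap",
    "kid-friendly": "yes",
}
# Hypothetical cue phrases: a boolean slot never appears literally as "yes".
CUES: Cues = {("kid-friendly", "yes"): ["child friendly", "kid friendly", "families"]}

full = missing_slots(
    MR, "Loch Fyne is a child friendly restaurant serving cheap Japanese food.", CUES
)
short = missing_slots(MR, "Loch Fyne serves Japanese food.", CUES)
```

Substring matching is deliberately naive: it cannot handle paraphrases outside the cue list, so it only gives a lower bound on how much information an output covers.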

The Amazon Alexa Prize 2016-2018

AI vs. AI: Cleverbot (Carpenter 2011)

Neural models for Alexa?
• BIG training data.
– Reddit, Twitter, Movie Subtitles, Daytime TV transcripts…
• Results:

Is big data good data?
“I can sleep with as many people as I want to” (Reddit)
“You will die” (Movies)
“Shall I kill myself?”
“Yes” (Twitter)
“Shall I sell my stocks and shares?”
“Sell, sell, sell” (Twitter)

Alana Architecture
Bot Ensemble
Persona: What’s your favourite food? I love bytes.
News: Here is what happened to Donald Trump. (news)
Facts: Did you know that one day Mars will have a ring.
Wiki: Leonard Cohen’s latest album is called ‘You Want It Darker’.
….
Neural Ranker: Persona, News, Facts, Wiki, …
User utterance, social signals, current plan, state of the world
Dialogue history
Multimodal output:
• Speech
• Actions
• Gestures
Chatbots
User utterance
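
The ensemble-plus-ranker idea on this slide can be sketched as follows: each bot proposes at most one candidate reply, and a ranker picks the best one given the user utterance and dialogue history. Everything below is a hypothetical stand-in. In particular, `overlap_score` is a simple word-overlap heuristic used in place of the neural ranker, and the three toy bots only echo the example lines from the slide.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Bot = Callable[[str, List[str]], Optional[str]]
Scorer = Callable[[str, List[str], str], float]


def respond(utterance: str, history: List[str],
            bots: Dict[str, Bot], score: Scorer) -> Tuple[str, str]:
    """Collect one candidate per bot and return (bot name, reply) with the top score."""
    candidates = []
    for name, bot in bots.items():
        reply = bot(utterance, history)
        if reply is not None:  # a bot may decline to answer
            candidates.append((name, reply))
    if not candidates:
        raise ValueError("no bot produced a candidate")
    return max(candidates, key=lambda c: score(utterance, history, c[1]))


def overlap_score(utterance: str, history: List[str], reply: str) -> float:
    """Stand-in for a learned ranker: count words shared by utterance and reply."""
    def words(text: str) -> set:
        return set(re.findall(r"[a-z]+", text.lower()))
    return float(len(words(utterance) & words(reply)))


def persona_bot(utterance: str, history: List[str]) -> Optional[str]:
    return "I love bytes." if "food" in utterance.lower() else None


def facts_bot(utterance: str, history: List[str]) -> Optional[str]:
    return "Did you know that one day Mars will have a ring."


def wiki_bot(utterance: str, history: List[str]) -> Optional[str]:
    if "cohen" in utterance.lower():
        return "Leonard Cohen's latest album is called 'You Want It Darker'."
    return None


BOTS: Dict[str, Bot] = {"persona": persona_bot, "facts": facts_bot, "wiki": wiki_bot}
chosen = respond("Tell me about Leonard Cohen", [], BOTS, overlap_score)
```

Keeping the ranker separate from the bots means individual bots can be added or removed without retraining the others, at the cost of making overall quality depend heavily on how the ranker is trained and evaluated.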

Avg duration: 2.30 mins
10% of calls over 10 mins
avg: 14.4 turns
Alexa developers

Finalists vs rest
3 finalists:
• Heriot-Watt University
• University of Washington
• Czech Technical University

Final Leaderboard
Approx. 6000 conversations in final week

Las Vegas final
• 2 conversations x 3 testers = 6 conversations
• Rated by external judges
• Prague: “I want to talk about baseball”
• UW: “I want to talk about basketball”
• HWU: “I want to talk about ….. … "
Hi lie

(Amazon’s speech recogniser couldn’t recognise this….)

Lessons learnt
• Evaluation standards define the game.
• Learning from big data is only valid in restricted
contexts.
• Getting LOTS of real customer data is worth it!
– over 360k rated customer interactions

Leaderboard 2018-05-15
For updates follow @alanathebot

Disclaimer
The following part of this talk contains examples
which some listeners might find disturbing.

Ethical Issues with Conversational AI
• Learning from biased data.
• Sexual abuse and bullying by the user.

Pitfalls of learning from data
XXXXX

A Neural Conversational Model
http://neuralconvo.huggingface.co/
A re-implementation of:
Oriol Vinyals and Quoc V. Le (2015). A Neural Conversational Model. ICML Deep
Learning Workshop.
d*
f*

Ethical Conversational AI Systems
Does learning from data introduce biases?

Ethical Issues with Conversational AI
• Learning from biased data.
• Sexual abuse and bullying by the user.
4% of customer conversations with our Alexa
bot contain sexual harassment!

Ethical Conversational AI Systems
Does learning from data introduce biases?
How do current systems behave when faced with abuse?
What are good mitigation strategies?

• Approx. 4% of customer interactions in our corpus!
• These fall into 4 categories as defined by the Linguistic Society of America:
“Are you gay?” (Gender and Sexuality)
“I love watching porn.” (Sexualised Comments)
“You stupid b***.” (Sexualised Insults)
“Will you have sex with me.” (Sexual Requests)

We insulted a lot of bots…
• Commercial:
– Amazon Alexa, Apple Siri, Google Home, Microsoft's Cortana.
• Rule-based:
– E.L.I.Z.A., Parry, A.L.I.C.E., Alley
• Data-driven:
– Cleverbot, NeuralConvo, Information Retrieval (Ritter et al. 2010),
– “clean” in-house seq2seq model
• Negative Baseline: 6 Adult-only bots.

How do different systems react?
Commercial | Data-driven | Adult-only
Flirtatious
Chastising,
Retaliation
Non-sense
Flirtatious
Swearing back
Avoiding to
answer.
Amanda Cercas Curry and Verena Rieser. How Ethical are Conversational Systems?
Insights from the #MeTooAlexa Corpus on Sexual Harassment. 27th International
Conference on Computational Linguistics (COLING), Santa Fe, New-Mexico, USA.

Bias in the data?
• Trained a seq2seq model on “clean” data.
• Still encouraging/ flirting back.
User: “I love watching porn.”
Seq2seq model: “What shows do you prefer?”

Conclusion
• Machine Learning methods for Conversational AI
• Neural methods for task-based systems
produce natural, but often incorrect output.
• Neural methods for open-domain systems are
hard to control.
• How should a system deal with edge cases, such
as abuse?

Big thanks to my amazing team!
Dr. Ondrej Dusek Dr. Simon Keizer Dr. Xingkun Liu Dr. Jekaterina Novikova
Shubham Agarwal
(PhD candidate)
Amanda Cercas Curry
(PhD candidate)
Karin Sevegnani
(PhD candidate)
Xinnuo Xu
(PhD candidate)

… And my amazing husband!
Prof. Oliver Lemon
“Dr.” Kati

Key References
• Amanda Cercas Curry and Verena Rieser. #MeToo Alexa: How Conversational
Systems Respond to Sexual Harassment. Second Workshop on Ethics in NLP. NAACL
2018.
• Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu,
Yanchao Yu, Ondrej Dušek, Verena Rieser, Oliver Lemon. An Ensemble Model with
Ranking for Social Dialogue. In: NIPS workshop on Conversational AI, 2017.
* Finalist in Amazon Alexa Challenge
• Jekaterina Novikova, Ondrej Dusek and Verena Rieser. The E2E Dataset: New Challenges For
End-to-End Generation. 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL),
2017. * Nominated for best paper.
• Dimitra Gkatzia, Oliver Lemon and Verena Rieser. Natural Language Generation
enhances human decision-making with uncertain information. Annual meeting of the
Association for Computational Linguistics (ACL), 2016.
• Eshrag Refaee and Verena Rieser. A Hybrid Approach for Determining Sentiment
Intensity of Arabic Twitter Phrases. 10th International Workshop on Semantic
Evaluation (SemEval), 2016. * winner of SemEval'16 challenge task 7
• Verena Rieser, Oliver Lemon and Simon Keizer. Natural Language Generation as
Incremental Planning Under Uncertainty: Adaptive Information Presentation for
Statistical Dialogue Systems. IEEE/ACM Transactions on Audio, Speech and
Language Processing, Volume 22, Issue 5, 2014.
• Verena Rieser and Oliver Lemon. Reinforcement Learning for Adaptive Dialogue
Systems: A Data-driven Methodology for Dialogue Management and Natural
Language Generation. Book Series: Theory and Applications of Natural Language
Processing, Springer, 2011. * >7,500 downloads

Want to know more?
• Study on our MSc on Conversational AI!
• 2-year Conversion Course in AI
– No prior knowledge in programming
required!
• 12 funded DataLab scholarships available.
– Deadline: 31 May 2018
• Contact: MACSpgenquiries@hw.ac.uk

Get in touch!
v.t.rieser@hw.ac.uk
@verena_rieser
https://www.linkedin.com/in/verena-rieser-3590b86/
https://sites.google.com/view/nlplab/
