© 2017 Versay Solutions
Voice User Interface Design: Skills, Actions, And The Future
Crispin Reedy, Versay Solutions
@crispinTX crispinreedy.com
#BigD17
Disclaimer: This session was NOT sponsored by Domino’s
• Voice User Interface Designer
• 15+ years in the field
• Former coder; got interested in UX
• President of the Association for Voice
Interaction Design
• Consultant for Versay Solutions
@crispinTX
crispinreedy.com
Session Description
• Amazon Skills for Alexa, Google Actions for Home
– Should your company build a conversational
voice interface for one of these systems, and if
so, how?
• What are the differences between a voice user
interface and other types of UIs?
• What types of skills does a VUI designer need?
• What are some best practices for these VUIs?
• You’ll walk away with answers to the questions
“If, Why, and How” you might choose to explore
this interesting new area of design.
Easy Answer To #1
• If your company is involved in home
automation:
• Most likely Yes, and Yesterday
• Although how you do it will depend on your
platform
• More on that later!
• Everyone else
• Let’s keep talking!
Basic Terms
Terms & Technologies
•Speech Recognition
•Natural Language Understanding
•Voice Verification (Biometrics)
•Text to Speech
Speech Recognition “ASR”
“See the cat.”
Natural Language Understanding
•Extracting meaning from natural text
“Hello, yes, I’d like to pay my water bill. Can you help me with that?”
Intent = BillPay
Entity (Bill Type) = Water
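As a rough illustration of the intent and entity extraction shown above, here is a minimal Python sketch. The keyword rules, the `understand` function, and the list of bill types are assumptions made for this example; a production NLU engine would typically rely on a trained model rather than hand-written patterns.

```python
import re

# Hypothetical keyword rules, for illustration only.
INTENT_PATTERNS = {
    "BillPay": re.compile(r"\bpay\b.*\bbill\b"),
}
BILL_TYPES = ("water", "electric", "gas")  # assumed entity values


def understand(utterance):
    """Return the intent and bill-type entity found in an utterance."""
    text = utterance.lower()
    intent = next(
        (name for name, pattern in INTENT_PATTERNS.items() if pattern.search(text)),
        None,
    )
    bill_type = next(
        (b for b in BILL_TYPES if re.search(rf"\b{b}\b", text)),
        None,
    )
    return {
        "intent": intent,
        "bill_type": bill_type.capitalize() if bill_type else None,
    }


result = understand("Hello, yes, I'd like to pay my water bill. Can you help me with that?")
print(result)
```

The greeting and the closing question carry no meaning for the task, so the sketch simply ignores them and keeps only the intent and the entity.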
Voice Verification
“My voice is my password.”
“Authenticated. Welcome, Mr. Smith.” ✓
Text To Speech
Speech Recognition
• Hands-free command /
control
• Dictation
• Input text
• Small form factor
device, etc.
Text To Speech
• Output text dynamically
• Respond to input
• Useful when no
display is available
Natural Language
Understanding
• Necessary for all
language-based input
• Extract meaning
• Parse large volumes of
text
Voice Verification
• Security
Diagram: an Application linked to ASR, NLU, TTS, Verification (voice prints), and Data. Labeled steps:
• Sign-In
• Interaction
• Request
• Action
• Meaning
• Access Data
• Output
Speech Technology
Today
Speech Agents, Apps, and APIs
Speech Agents:
• Amazon Alexa
• Echo, Dot, Echo Show
• Google Assistant
• Pixel, Android, Google Home, iPhone app
• Apple’s Siri
• iPhone, iPad, macOS (Sierra), Apple TV
• Microsoft’s Cortana
• Windows 10, Windows Phone, Xbox, iPhone app
• Samsung’s Bixby
• Galaxy S8, Family Hub 2.0 Fridge
Speech Agents, Apps, and APIs
Speech Agents can be extended with
“Voice Apps”
• Alexa Skills
• Google Actions
• SiriKit
• Cortana SDK
Speech Agents, Apps, and APIs
Agent capabilities and apps are somewhat
determined by:
• Platform: Device
• Screen, keyboard, phone, mics, etc.
• Environment: Web site, apps that interact with
the agent
• Ecosystem: Underlying connections, technical
partnerships
Platforms
Environment
Google “Actions” or
“Apps”
• Curated
• Direct vs.
Conversational
Siri - Works via apps
Order Uber Order Lyft
New York Times
Speech Agents, Apps, and APIs
APIs: Allow you access to the underlying
technology
• Amazon
• AVS (Alexa Voice Service): Create an “Alexa” on your own device
• Amazon Lex, Amazon Polly
• Google
• Cloud Speech API
• API.ai
• Apple
• Apple Speech Framework
• Microsoft
• Bing Speech API
Ecobee Smart
Thermostat
Use Cases
Use Case “Bakeoff” from Tech Insider
•Travel
•Email
•Messaging
•Sports
•Music
•Weather
•Calendar
•Social
• Translation
• Basic tasks
• General knowledge
• Personality
http://www.businessinsider.com/siri-vs-google-assistant-cortana-alexa-2016-11/
Use Case “Bakeoff” from Tech Insider
• “wildly finicky when it comes to phrasing.”
• “Each assistant still feels like a fragile, thinly veiled
web of loosely connected services — because that's
what they are.”
• “incredibly uncomfortable to speak to an inanimate
thing in public.”
• “In Google Assistant's case, normalizing the need to
call on a brand ("OK Google") whenever you need a
hand is Orwellian.”
• “None of these things are at a place I could comfortably call "good.””
Personal Assistant vs. Home Assistant
The Google Pixel XL.
Hollis Johnson/Business Insider
Google.com
Getting Specific With
Alexa
“Layers” of Alexa
• Alexa Native Capabilities
• Come out of the box
• Require Alexa wake word (can be changed)
• Alexa Skills
• Alexa’s “Extensions” or “Add-Ons”
• Designed for and deployed on Echo Device
• Skills must be downloaded to Echo
• Require Alexa wake word + Skill name
• Alexa Voice Service (AVS)
• Add Alexa voice control to your own device
Alexa “Native” Capabilities
Alexa, what’s 3 + 5?
Alexa, set an alarm for 3 am.
Alexa, set a thirty second timer.
Alexa, what’s the weather?
Note: Mix of TTS & Pre-Recorded Audio
Note: “Hint”
Design Considerations
•Proactive “Hints”
• Similar to “Hover Help” or “Tool Tip”
• But less avoidable!
• Pro: Can teach user about other capabilities
• Con: Can be annoying!
• Guideline: If used, be sparing
• Develop rules for when and how frequently to
offer
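One way to turn "rules for when and how frequently to offer" into something testable is a small throttling policy. The sketch below is an assumption-laden illustration: the `HintPolicy` class, its thresholds, and the hint wording are all invented for this example and are not part of any platform API.

```python
from collections import Counter


class HintPolicy:
    """Decide when a proactive hint may be appended to a response."""

    def __init__(self, min_gap=5, max_repeats=2):
        self.min_gap = min_gap            # hint-free turns required between hints (assumed)
        self.max_repeats = max_repeats    # times any single hint may be offered (assumed)
        self.turns_since_hint = min_gap   # lets the very first turn qualify
        self.times_offered = Counter()

    def next_hint(self, candidates):
        """Call once per turn; return a hint to play, or None."""
        self.turns_since_hint += 1
        if self.turns_since_hint <= self.min_gap:
            return None
        for hint in candidates:
            if self.times_offered[hint] < self.max_repeats:
                self.times_offered[hint] += 1
                self.turns_since_hint = 0
                return hint
        return None


policy = HintPolicy(min_gap=2, max_repeats=1)
hints = ["You can also ask me to set a timer.", "You can also ask for the weather."]
offered = [policy.next_hint(hints) for _ in range(7)]
print(offered)
```

With these settings each hint is offered once, with at least two hint-free turns in between, and then the system goes quiet.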
Alexa Skills
Screenshots (Amazon.com): capabilities labeled “Native & Skill” or “Skill”
Source: David Attwater, EIG Inc.
Design Considerations
• Invoking Skills:
• Alexa, open Oprah Magazine
• Alexa, order a pizza from Domino’s
• Alexa, ask Cook Reference what’s the
safe temperature for chicken
• Syntax:
• Open <skill>
• Ask <skill> for (about, to, with, etc.) <action>
• Ask <skill> <question>
• Also: Search, Tell, Talk to, Launch, Start, Resume, Run, Load, Begin
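To make the invocation syntax concrete, here is a hedged Python sketch that splits an utterance into verb, skill name, and request. It is not how Alexa itself resolves invocations; the `parse_invocation` function, the verb lists, and the registry of known skill names are assumptions for illustration. The "order a pizza from <skill>" form is deliberately left unhandled.

```python
import re

KNOWN_SKILLS = ["Oprah Magazine", "Cook Reference"]  # assumed registry of enabled skills
LAUNCH_VERBS = ["open", "launch", "start", "resume", "run", "load", "begin", "talk to"]
REQUEST_VERBS = ["ask", "tell", "search"]
CONNECTORS = ["for", "about", "to", "with"]


def parse_invocation(utterance):
    """Split '<wake word>, <verb> <skill> [connector] <request>' into its parts."""
    text = re.sub(r"^\s*alexa[,\s]+", "", utterance.strip(), flags=re.IGNORECASE)
    lowered = text.lower()
    for verb in LAUNCH_VERBS + REQUEST_VERBS:
        if not lowered.startswith(verb + " "):
            continue
        rest = text[len(verb) + 1:]
        for skill in KNOWN_SKILLS:
            if rest.lower().startswith(skill.lower()):
                request = rest[len(skill):].strip()
                first, _, tail = request.partition(" ")
                if first.lower() in CONNECTORS:
                    request = tail  # drop "for", "about", etc.
                return {"verb": verb, "skill": skill, "request": request or None}
    return None


print(parse_invocation("Alexa, open Oprah Magazine"))
print(parse_invocation("Alexa, ask Cook Reference about roasting times"))
```

Note that the parse only succeeds when the skill name is already known, which mirrors the discoverability problem: the user has to know the exact name to say.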
Oprah Magazine
Design Considerations
•Skills can be “installed” on the
fly
•If the user knows the name of
the skill
•Skills that require account
information will need extra
steps
Cook Reference
Domino’s
Alexa App + Linking
Design Considerations
•Managing access to skills may become
difficult or confusing.
Design Considerations
•Attention (or lack of attention!) to technical details can become a “deal-killing” part of the overall experience
Domino’s
Really?
Dominos.com
No
Dominos.com
Design Considerations
• Confirmation
• System: “What’s the phone number?”
• User: “214-555-1235”
• System: “You said 214-555-1235. Is that correct?”
• User: “Yes”
• Note: The system confirmed the phone number but not the address
• Was the address really correct?
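A simple way to avoid confirming one value while silently accepting another is to drive the dialog from a list of required slots. The following is a minimal sketch, assuming a hypothetical `next_prompt` helper and slot names chosen for this example; the sample address is made up.

```python
REQUIRED_SLOTS = ["phone number", "address"]  # assumed slots for a delivery order


def next_prompt(collected, confirmed):
    """Return the next thing to say, or None once every slot is confirmed."""
    for slot in REQUIRED_SLOTS:
        if slot not in collected:
            return f"What's the {slot}?"
        if slot not in confirmed:
            return f"You said {collected[slot]}. Is that correct?"
    return None


collected = {"phone number": "214-555-1235", "address": "123 Main Street"}
print(next_prompt({}, set()))
print(next_prompt(collected, {"phone number"}))
```

Because every slot goes through the same collect-then-confirm step, the address cannot slip through unconfirmed.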
Dominos.com
Design Considerations
• “Would you like to place your Easy
Order, reorder your most recent
order, or start a new order?”
• If I’m not logged into my account on
the Alexa app, options 1 and 2 don’t
make much sense.
• “Would you like” is ambiguous – could
be used for Yes / No questions or for
multi-item questions
• First part of the sentence runs into the
choices
• Reuse of the word “order” just seems
odd (but may be unavoidable).
• Could have used more pauses (SSML)
Domino’s
Design Considerations: SSML
• Speech Synthesis Markup Language
• Can control the way your TTS playback sounds
• Very important if your output is mostly TTS
• Which is true of almost all platforms
• Should be supported by all types of TTS engines
• Amazon has platform-specific options
• Plan on using it to fine-tune your audio output
New Prompts & SSML Examples
• Note: TTS Samples with SSML created
with Amazon Polly, not Alexa
• “You can: Place your easy order.
Reorder your most recent order. Or,
start a *new* order.”
• <speak>You can: <break time="500ms"/> Place your easy order,
<break time="500ms"/> Reorder your <emphasis
level="moderate">most recent</emphasis> order, <break
time="500ms"/> Or, start a <emphasis
level="strong">new</emphasis> order.</speak>
• Placing an order, great! Choose from:
My easy order. My most recent. Or,
start a *new* order.
• <speak>Placing an order. <prosody
pitch="high">Great!</prosody> Choose from: My easy
order. My most recent. Or, start a <emphasis> <prosody
pitch="high">new</prosody> </emphasis>
order.</speak>
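Hand-writing SSML for every menu gets tedious, so prompts like the ones above are often generated. Below is a small sketch of that idea; the `list_prompt_ssml` helper and its 500 ms default are assumptions for this example, and it only uses the `<speak>` and `<break>` elements already shown.

```python
from xml.sax.saxutils import escape


def list_prompt_ssml(intro, options, pause_ms=500):
    """Build an SSML prompt that pauses before each option in a list."""
    parts = [escape(intro)]
    for i, option in enumerate(options):
        is_last = i == len(options) - 1 and len(options) > 1
        lead = "Or, " if is_last else ""
        parts.append(f'<break time="{pause_ms}ms"/>{lead}{escape(option)}.')
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = list_prompt_ssml(
    "You can:",
    ["Place your easy order", "Reorder your most recent order", "start a new order"],
)
print(ssml)
```

Escaping the text matters when dynamic data (for example, an item name containing "&") is dropped into the prompt, since unescaped characters can make the SSML invalid.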
Domino’s
Still Trying To Order That Pizza
• Start of the interaction has changed!
• Probably due to login
• “Would you like to place an order, or
track an order?”
• What just happened!!!?
•System was expecting me to say
“Start a new order” and I only said
“New Order.”
Domino’s
Design Considerations
• Make sure your input grammar covers all logically possible utterances (what the user can say)
• Don’t leave this stuff up to the programmers!
• Provide examples of coverage
• Coverage should match prompts
• Use some kind of markup to show coverage
• [] optional
• () grouping
• | or
• “Would you like to place your Easy Order,
reorder your most recent order, or start a new
order?”
• [place] [my | an] Easy Order
• [reorder] [my] most recent [order]
• [start a] new [order]
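The coverage markup above can be expanded mechanically, which makes it easy to check whether a given utterance is covered. Here is a minimal sketch of such an expander; the `expand` function is written for this example and assumes well-formed patterns using only `[]`, `()`, and `|`.

```python
import re
from itertools import product


def expand(pattern):
    """List every utterance covered by a pattern using [] optional, () grouping, | or."""
    tokens = re.findall(r"[\[\]()|]|[^\s\[\]()|]+", pattern)
    pos = 0

    def alternation():
        nonlocal pos
        results = sequence()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            results |= sequence()
        return results

    def sequence():
        nonlocal pos
        results = {()}
        while pos < len(tokens) and tokens[pos] not in ("|", "]", ")"):
            tok = tokens[pos]
            pos += 1
            if tok == "[":
                choices = alternation() | {()}  # optional: may also be empty
                pos += 1  # skip the closing "]"
            elif tok == "(":
                choices = alternation()
                pos += 1  # skip the closing ")"
            else:
                choices = {(tok,)}
            results = {a + b for a, b in product(results, choices)}
        return results

    return sorted(" ".join(words).lower() for words in alternation())


print(expand("[start a] new [order]"))
```

With this pattern, a bare "new order" is covered, which is exactly the utterance that failed in the pizza example.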
Design Considerations
•Reprompts:
• What do you do when the system didn’t understand what the user said?
• Probably don’t want to say “Sorry”
• This can be annoying
• But you CAN rephrase the prompt to make it
different
• Using the same prompt gives the user a sense
that something has gone wrong
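One common way to implement this is an escalating list of prompts, indexed by the number of failed attempts. The sketch below assumes a hypothetical `prompt_for_attempt` helper; the rephrased wording is illustrative only and is not taken from any real skill.

```python
# Illustrative wording only: the first prompt is the original question,
# later ones rephrase it more briefly and explicitly, without "Sorry".
PROMPTS = [
    "Would you like to place your Easy Order, reorder your most recent order, or start a new order?",
    "You can say: Easy Order, most recent order, or new order.",
    "Please say one of these: Easy Order. Most recent. Or, new order.",
]


def prompt_for_attempt(failed_attempts):
    """Pick a prompt based on how many times recognition has failed so far."""
    index = min(failed_attempts, len(PROMPTS) - 1)
    return PROMPTS[index]


print(prompt_for_attempt(0))
print(prompt_for_attempt(1))
```

Each reprompt differs from the one before it, so the user hears progress rather than a loop.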
Pizza Pizza Pizza
• Hey you didn’t really need to
explain about the phone number
since I saved it but OK….
• Address has been saved to profile,
great!
• And then boom
Domino’s
With Speech, you need to spend
a lot more time thinking about
what happens when things go
wrong.
I Didn’t Really Want to Order Pizza
But By Now I Am Hungry
And So Is Somebody Else
• Note “Easy Order” and Credit Card
cannot be set up on the website
unless you’re actually placing an order.
• Give people enough time to talk!
• There’s that grammar coverage issue
again
• Bell pepper = Green pepper
• What synonyms is your user likely to say?
• At some point couldn’t you just give me a list?
• Notice how they screwed up the article +
the item “… adding a parmesan bread
twists”
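Both the synonym problem and the article problem can be handled with a little explicit logic. This is a sketch under assumptions: the synonym table, `canonical_topping`, and `adding_phrase` are invented for illustration, and the menu data would come from the real ordering system.

```python
# Assumed synonym table: what users say -> what the menu calls it.
SYNONYMS = {
    "bell pepper": "green pepper",
    "bell peppers": "green pepper",
    "green peppers": "green pepper",
}


def canonical_topping(spoken):
    """Map a spoken topping name onto the menu's own name."""
    key = spoken.strip().lower()
    return SYNONYMS.get(key, key)


def adding_phrase(item, plural):
    """Build the confirmation phrase, using an article only for singular items."""
    if plural:
        return f"adding {item}"
    article = "an" if item[0].lower() in "aeiou" else "a"
    return f"adding {article} {item}"


print(canonical_topping("Bell pepper"))
print(adding_phrase("parmesan bread twists", plural=True))
```

The key design point is that the menu data carries a plural flag, so the prompt template never has to guess whether "a" belongs in front of the item.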
Meow
Domino’s
Design Considerations
• Confirm and correct
• “Do you want to add anything else?”
• “Yes, I want to add peppers.”
• Disambiguation
• “Olives”
• “Ok, we have two kinds of olives. Black olives, or
green olives.”
• A voice user interface is a time-based interface
• As a designer concerned with user experience, you’re going to be involved in details (such as pauses) that might not otherwise occur to you
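The disambiguation step above can be sketched as a lookup that returns either a single match or a follow-up question. The menu, the `resolve_item` function, and the fallback wording are assumptions for this example.

```python
MENU = ["black olives", "green olives", "green peppers", "ham"]  # hypothetical menu
NUMBER_WORDS = {2: "two", 3: "three", 4: "four"}


def join_options(options):
    """Join options as 'a, b, or c' for spoken playback."""
    if len(options) == 1:
        return options[0]
    return ", ".join(options[:-1]) + ", or " + options[-1]


def resolve_item(spoken):
    """Return (item, None) on a unique match, else (None, follow-up prompt)."""
    word = spoken.strip().lower()
    matches = [item for item in MENU if word == item or word in item.split()]
    if len(matches) == 1:
        return matches[0], None
    if len(matches) > 1:
        count = NUMBER_WORDS.get(len(matches), str(len(matches)))
        options = join_options(matches)
        return None, f"Ok, we have {count} kinds of {word}. {options[0].upper()}{options[1:]}."
    # No match: offer the list rather than a bare error.
    return None, f"You can choose from: {join_options(MENU)}."


print(resolve_item("Olives"))
```

When nothing matches, the sketch reads out the available items, which is one way to recover from an unrecognized ingredient.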
How Did Google Home Do?
•“OK Google, Order Dominos”
• “There are stores at….”
• Had to go find the right “App Name” online
•“OK Google, Talk to Dominos”
• “You can link to your Domino’s account…”
• Had a terrible time finding the “Google
Apps.”
How Did Google Home Do?
•Menu worked!
• System did not recognize “Ham” (Should
offer list of ingredients)
• System became very laggy
How Did Google Home Do?
• Edited for time
• Original was 3:35
• This is 2:15
• Use of “Dom” persona and male voice
• “Hand off”
• Playback of address:
• Alexa: “Eighty seven twenty three”
• Google: “Eight thousand seven hundred twenty
three”
• Same issue with “twists”
• “Your day just got cheesier”
Design Considerations
•Discoverability
• “OK Google, Order Dominos”
•Persona
• Google Home has more control over the
voice
• Branding considerations – “Dom” name and
male TTS
•Playback of Dynamic Data
• Attention to detail – don’t trust the platform
to do it the way you want it
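If you do not want to trust the platform with a street number, you can spell it out yourself before sending it to TTS. The sketch below reads four-digit numbers in pairs ("eighty seven twenty three"); the `speak_street_number` function and its fallback rules are assumptions for this example and cover only simple cases.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])


def speak_street_number(number):
    """Spell a street number the way people usually say it aloud."""
    if len(number) == 4 and number.isdigit() and number[0] != "0":
        first, second = int(number[:2]), int(number[2:])
        if second == 0:
            if first % 10 == 0:
                return f"{ONES[first // 10]} thousand"
            return f"{two_digits(first)} hundred"
        if second < 10:
            return f"{two_digits(first)} oh {ONES[second]}"
        return f"{two_digits(first)} {two_digits(second)}"
    # Fallback for other lengths: read digit by digit.
    return " ".join(ONES[int(d)] for d in number if d.isdigit())


print(speak_street_number("8723"))
```

The resulting words can be placed directly in the prompt text, so the playback no longer depends on how a given platform chooses to read "8723".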
Design Considerations
Maintaining State:
•Between dialogs
• “Who is Seth MacFarlane?”
• “Seth MacFarlane is…”
• “When’s his birthday?”
• “I’m not sure what you’re talking about.”
•From session to session
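Carrying state between dialog turns can be as simple as remembering the last entity discussed and substituting it for a pronoun. The following is a deliberately naive sketch; the `DialogContext` class and its regular expressions are assumptions for illustration, and real systems use far more robust reference resolution.

```python
import re

PRONOUNS = re.compile(r"\b(his|her|their)\b", re.IGNORECASE)
WHO_IS = re.compile(r"^who is (?P<name>.+?)\??$", re.IGNORECASE)


class DialogContext:
    """Remember the last person asked about so follow-ups can refer to them."""

    def __init__(self):
        self.last_person = None

    def rewrite(self, utterance):
        """Return the utterance with a possessive pronoun resolved, if possible."""
        match = WHO_IS.match(utterance.strip())
        if match:
            self.last_person = match.group("name")
            return utterance
        if self.last_person and PRONOUNS.search(utterance):
            return PRONOUNS.sub(lambda _: f"{self.last_person}'s", utterance, count=1)
        return utterance


context = DialogContext()
context.rewrite("Who is Seth MacFarlane?")
print(context.rewrite("When's his birthday?"))
```

Keeping the same context object alive across sessions, for example by saving it per user, is one way to approach the "from session to session" case.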
Oprah Magazine
Home Automation
•Onboarding issues are very similar to
“Skills,” but there is an additional layer of
complexity
• Companies are working to improve the
experience
• After setup, you get a lot of bang for the
buck
“Computer,
turn on the
library lights”
TP Link
Amazon
Design Considerations: Summary
• Managing access to Skills
(App, Store)
• Managing the Onboarding
Experience
• Discoverability
• Invoking Skills
• Hints
• Confirmation
• Asking Yes/No Questions vs.
Multi-Item Questions
• SSML
• Silences
• Reprompting
• Coverage (prompt vs.
possible input)
• Managing technical
errors
• Timing and Timeouts
• Article matching the
noun
• Confirm and Correct
• Disambiguation
• Persona
• Playback of Dynamic
Data
• Maintaining State
What Makes a Good VUI Designer?
•Concern with the overall experience
• All of the channels that go into making up
how something happens
•Attention to “small” technical details
• Pauses
• SSML
•Writing skills!
• Dialog, not tech doc
• English majors, screenwriters
Session Description
• Amazon Skills for Alexa, Google Actions for Home
– Should your company build a conversational
voice interface for one of these systems, and if
so, how?
• What are the differences between a voice user
interface and other types of UIs? ✔
• What types of skills does a VUI designer need? ✔
• What are some best practices for these VUIs? ✔
• You’ll walk away with answers to the questions
“If, Why, and How” you might choose to explore
this interesting new area of design. ✔
If, Why, How
•What are you trying to build?
•Existing guidelines / research
•User testing is key
• Especially if you’re trying to do something
complicated
If, Why, How: Beyond Skills
• Write an app (skill) for an agent such as Google Assistant / Alexa
• Use cloud APIs to add ASR / NLU to your app / device / page / gadget
• Download software and use full-featured capabilities for more robust recognition on a specific device
• Build your own
If, Why, How: What’s the Use Case?
•Enabling application
• User can’t do it any other way
• New tasks
•Enhancing application
• User can do it now
• But speech makes it better
• Faster
• Safer
• API-Based
• Device-Based
• Roll Your Own / Open-Source
•Flexibility
•Power
•Customization
•Time
•Difficulty
Existing Guidelines / Research
• Caveat: Best practices evolved in one
modality (e.g. voice-only) may not apply the
same way in another (e.g. combined voice +
touch)
• But they could be adapted
• Association for Voice Interaction Design
(AVIxD.org)
• Wiki
• Peer-Reviewed Journal
• Virtual “Brown Bags”
• Academic Sources, Books
AVIxD.org
CUI Working Group is actively recruiting!
@crispinTX
Crispin Reedy
Thank You!

Voice User Interface Design - Big Design 2017

  • 1. © 2017 Versay Solutions Voice User Interface Design: Skills, Actions, And The Future Crispin Reedy, Versay Solutions @crispinTX crispinreedy.com #BigD17
  • 2. © 2017 Versay Solutions Voice User Interface Design: Skills, Actions, And The Future Disclaimer: This session was NOT sponsored by Dominos
  • 3. © 2017 Versay Solutions • Voice User Interface Designer • 15+ years in the field • Former coder; got interested in UX • President of the Association for Voice Interaction Design • Consultant for Versay Solutions @crispinTX crispinreedy.com
  • 4. © 2017 Versay Solutions Session Description • Amazon Skills for Alexa, Google Actions for Home – Should your company build a conversational voice interface for one of these systems, and if so, how? • What are the differences between a voice user interface and other types of UIs? • What types of skills does a VUI designer need? • What are some best practices for these VUIs? • You’ll walk away with answers to the questions “If, Why, and How” you might choose to explore this interesting new area of design.
  • 5. © 2017 Versay Solutions Easy Answer To #1 • If your company is involved in home automation: • Mostly likely Yes, and Yesterday • Although how you do it will depend on your platform • More on that later! • Everyone else • Let’s keep talking!
  • 6. © 2017 Versay Solutions Basic Terms
  • 7. © 2017 Versay Solutions Terms & Technologies •Speech Recognition •Natural Language Understanding •Voice Verification (Biometrics) •Text to Speech
  • 8. © 2017 Versay Solutions Speech Recognition “ASR” “See the cat.”
  • 9. © 2017 Versay Solutions Natural Language Understanding •Extracting meaning from natural text “Hello, yes, I’d like to pay my water bill. Can you help me with that? Intent = BillPay Entity (Bill Type) = Water
  • 10. © 2017 Versay Solutions Voice Verification “My voice is my password.” “Authenticated. Welcome, Mr. Smith.” ✓
  • 12. © 2017 Versay Solutions Speech Recognition • Hands-free command / control • Dictation • Input text • Small form factor device, etc. Text To Speech • Output text dynamically • Respond to input • Useful when no display is available Natural Language Understanding • Necessary for all language-based input • Extract meaning • Parse large volumes of text Voice Verification • Security
  • 13. ASR Application Data • Sign-In • Interaction • Request • Action • Meaning • Access Data • Output TTS NLU Voice prints Verifi- cation
  • 14. © 2017 Versay Solutions Speech Technology Today
  • 15. © 2017 Versay Solutions Speech Agents, Apps, and APIs Speech Agents: • Amazon Alexa • Echo, Dot, Echo Show • Google Assistant • Pixel, Android, Google Home, iPhone app • Apple’s Siri • iPhone, iPad, MacOS (Sierra), AppleTV • Microsoft’s Cortana • Windows 10, Windows Phone, Xbox, iPhone app • Samsung’s Bixby • Galaxy S8, Family Hub 2.0 Fridge
  • 16. © 2017 Versay Solutions Speech Agents, Apps, and APIs Speech Agents can be extended with “Voice Apps” • Alexa Skills • Google Actions • SiriKit • Cortana SDK
  • 17. © 2017 Versay Solutions Speech Agents, Apps, and APIs Agent capabilities and apps are somewhat determined by: • Platform: Device • Screen, keyboard, phone, mics, etc. • Environment: Web site, apps that interact with the agent • Ecosystem: Underlying connections, technical partnerships
  • 18. © 2017 Versay Solutions Platforms
  • 19. © 2017 Versay Solutions Environment Google “Actions” or “Apps” • Curated • Direct vs. Conversational Siri - Works via apps Order Uber Order Lyft
  • 20. © 2017 Versay Solutions New York Times
  • 21. © 2017 Versay Solutions Speech Agents, Apps, and APIs APIs: Allow you access to the underlying technology • Amazon • AVS (Alexa Voice Service) Create an “Alexa” on your own device • Amazon Lex, Amazon Polly • Google • Cloud Speech API • API.ai • Apple • Apple Speech Framework • Microsoft • Bing Speech API Ecobee Smart Thermostat
  • 22. © 2017 Versay Solutions Use Cases
  • 23. Use Case “Bakeoff” from Tech Insider •Travel •Email •Messaging •Sports •Music •Weather •Calendar •Social • Translation • Basic tasks • General knowledge • Personality http://www.businessinsider.com/siri-vs-google-assistant-cortana-alexa-2016-11/
  • 24. © 2017 Versay Solutions Use Case “Bakeoff” from Tech Insider • “wildly finicky when it comes to phrasing.” • “Each assistant still feels like a fragile, thinly veiled web of loosely connected services — because that's what they are.” • “incredibly uncomfortable to speak to an inanimate thing in public.” • “In Google Assistant's case, normalizing the need to call on a brand ("OK Google") whenever you need a hand is Orwellian.” • “None of these things are at a place I could comfortably call "good.””
  • 25. © 2017 Versay Solutions Personal Assistant vs. Home Assistant The Google Pixel XL. Hollis Johnson/Business Insider Google.com
  • 26. © 2017 Versay Solutions Personal Assistant vs. Home Assistant
  • 27. © 2017 Versay Solutions Getting Specific With Alexa
  • 28. © 2017 Versay Solutions “Layers” of Alexa •Alexa Native Capabilities •Alexa Skills •Alexa Voice Services
  • 29. © 2017 Versay Solutions “Layers” of Alexa • Alexa Native Capabilities • Come out of the box • Require Alexa wake word (can be changed) • Alexa Skills • Alexa’s “Extensions” or “Add-Ons” • Designed for and deployed on Echo Device • Skills must be downloaded to Echo • Require Alexa wake word + Skill name • Alexa Voice Services • Add Alexa voice control to your own device
  • 30. © 2017 Versay Solutions Alexa “Native” Capabilities Alexa, what’s 3 + 5? Alexa, set an alarm for 3 am. Alexa, set a thirty second timer. Alexa, what’s the weather? Note: Mix of TTS & Pre-Recorded Audio Note: “Hint”
  • 31. © 2017 Versay Solutions Design Considerations •Proactive “Hints” • Similar to “Hover Help” or “Tool Tip” • But less avoidable! • Pro: Can teach user about other capabilities • Con: Can be annoying! • Guideline: If used, be sparing • Develop rules for when and how frequently to offer
  • 32. © 2017 Versay Solutions Amazon.com Native & Skill Skill Skill Skill Native & Skill Alexa Skills
  • 33. © 2017 Versay Solutions Source: David Attwater, EIG Inc.
  • 34. © 2017 Versay Solutions Amazon.com
  • 36. © 2017 Versay Solutions Amazon.com
  • 37. © 2017 Versay Solutions Design Considerations • Invoking Skills: • Alexa, open Oprah Magazine • Alexa, order a pizza from Domino’s • Alexa, ask Cook Reference what’s the safe temperature for chicken • Syntax: Open <skill> Ask <skill> for (about, to, with, etc.) <action> Ask <skill> <question> Also: Search, Tell, Talk to, Launch, Start, Resume, Run, Load, Begin Oprah Magazine
  • 38. © 2017 Versay Solutions Design Considerations •Skills can be “installed” on the fly •If the user knows the name of the skill •Skills that require account information will need extra steps Cook Reference Domino’s
  • 39. © 2017 Versay Solutions Alexa App + Linking
  • 40. © 2017 Versay Solutions Design Considerations •Managing access to skills may become difficult or confusing.
  • 41. © 2017 Versay Solutions Design Considerations •Attention (or lack of attention!) to technical details can become “deal- killing” part of overall experience Domino’s
  • 42. © 2017 Versay Solutions Really? Dominos.com
  • 43. © 2017 Versay Solutions No Dominos.com
  • 44. © 2017 Versay Solutions Design Considerations • Confirmation • What’s the phone number? • 214-555-1235 • You said 214-555-1235. Is that correct? • Yes • Note: System confirmed the phone number but not the address • Was the address really correct?
  • 45. © 2017 Versay Solutions Dominos.com
  • 46. © 2017 Versay Solutions Design Considerations • “Would you like to place your Easy Order, reorder your most recent order, or start a new order?” • If I’m not logged into my account on the Alexa app, options 1 and 2 don’t make much sense. • “Would you like” is ambiguous – could be used for Yes / No questions or for multi-item questions • First part of the sentence runs into the choices • Reuse of the word “order” just seems odd (but may be unavoidable). • Could have used more pauses (SSML) Domino’s
  • 47. © 2017 Versay Solutions Design Considerations: SSML • Speech Synthesis Markup Language • Can control the way your TTS playback sounds • Very important if your output is mostly TTS • Which is true of almost all platforms • Should be supported by all types of TTS engines • Amazon has platform-specific options • Plan on using it to fine-tune your audio output
  • 48. © 2017 Versay Solutions New Prompts & SSML Examples
      • Note: TTS samples with SSML created with Amazon Polly, not Alexa
      • “You can: Place your easy order. Reorder your most recent order. Or, start a *new* order.”
          • <speak>You can: <break time="500ms"/>Place your easy order, <break time="500ms"/>Reorder your <emphasis level="moderate">most recent</emphasis> order, <break time="500ms"/>Or, start a <emphasis level="strong">new</emphasis> order.</speak>
      • “Placing an order, great! Choose from: My easy order. My most recent. Or, start a *new* order.”
          • <speak>Placing an order. <prosody pitch="high">Great!</prosody> Choose from: My easy order. My most recent. Or, start a <emphasis><prosody pitch="high">new</prosody></emphasis> order.</speak>
      • Domino’s
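A minimal sketch of how the first prompt above could be assembled in code rather than typed by hand. The helper names (`pause`, `emphasize`, `build_menu_prompt`) are hypothetical; only the `<speak>`, `<break>`, and `<emphasis>` elements come from the slide. Options are passed in as ready-made SSML fragments, and the result is parsed once to confirm it is well-formed XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def pause(ms=500):
    """Return an SSML pause of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def emphasize(text, level="moderate"):
    """Wrap plain text in an SSML emphasis element."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def build_menu_prompt(intro, options, pause_ms=500):
    """Put a pause before each option and 'Or,' before the last one."""
    parts = [escape(intro)]
    last = len(options) - 1
    for i, option in enumerate(options):
        lead = "Or, " if i == last and last > 0 else ""
        end = "." if i == last else ","
        parts.append(f"{pause(pause_ms)}{lead}{option}{end}")
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = build_menu_prompt(
    "You can:",
    [
        "Place your easy order",
        f"Reorder your {emphasize('most recent')} order",
        f"start a {emphasize('new', level='strong')} order",
    ],
)
root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```

Generating the markup from a list keeps pause lengths consistent across prompts, which makes it easier to tune them in one place after listening to the output.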
  • 49. © 2017 Versay Solutions Still Trying To Order That Pizza • Start of the interaction has changed! • Probably due to login • “Would you like to place an order, or track an order?” • What just happened!!!? •System was expecting me to say “Start a new order” and I only said “New Order.” Domino’s
  • 50. © 2017 Versay Solutions Design Considerations
      • Make sure your input grammar covers all possible logical utterances (what the user can say)
          • Don’t leave this stuff up to the programmers!
      • Provide examples of coverage
          • Coverage should match prompts
      • Use some kind of markup to show coverage
          • [] optional
          • () grouping
          • | or
      • “Would you like to place your Easy Order, reorder your most recent order, or start a new order?”
          • [place] [my | an] Easy Order
          • [reorder] [my] most recent [order]
          • [start a] new [order]
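The coverage markup above is regular enough to expand mechanically, which gives designers a concrete list of utterances to hand to developers and testers. The sketch below is one possible expander for the three operators on the slide; the function names `tokenize` and `expand` are assumptions, and it expects well-formed patterns (balanced brackets).

```python
import re
from itertools import product


def tokenize(pattern):
    """Split a coverage pattern into brackets, bars, and words."""
    return re.findall(r"[\[\]()|]|[^\s\[\]()|]+", pattern)


def expand(pattern):
    """Return every utterance covered by a pattern, sorted alphabetically."""
    tokens = tokenize(pattern)
    pos = 0

    def parse_alternation(closer):
        # alternation := sequence ("|" sequence)*
        nonlocal pos
        results = parse_sequence(closer)
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            results |= parse_sequence(closer)
        return results

    def parse_sequence(closer):
        # sequence := (word | "[" alternation "]" | "(" alternation ")")*
        nonlocal pos
        phrases = {()}
        while pos < len(tokens) and tokens[pos] not in ("|", closer):
            tok = tokens[pos]
            pos += 1
            if tok == "[":
                choices = parse_alternation("]") | {()}  # optional: may be empty
                pos += 1  # skip the closing "]"
            elif tok == "(":
                choices = parse_alternation(")")
                pos += 1  # skip the closing ")"
            else:
                choices = {(tok,)}
            phrases = {a + b for a, b in product(phrases, choices)}
        return phrases

    return sorted(" ".join(words) for words in parse_alternation(None))


for utterance in expand("[start a] new [order]"):
    print(utterance)
```

The expansion of `[start a] new [order]` includes the bare "new order", so a coverage check built on a list like this would flag a grammar that only accepts "start a new order".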
  • 51. © 2017 Versay Solutions Design Considerations •Reprompts: • What do you do when you didn’t understand what the caller said? • Probably don’t want to say “Sorry” • This can be annoying • But you CAN rephrase the prompt to make it different • Using the same prompt gives the user a sense that something has gone wrong
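One simple way to avoid repeating the same prompt is to keep an ordered list of rephrasings and pick one by attempt number. This is a sketch under assumptions: the prompt wording and the name `prompt_for_attempt` are illustrative, not taken from any real skill.

```python
# Hypothetical wording: the first entry is the initial prompt,
# later entries are progressively more explicit rephrasings.
PROMPTS = [
    "You can: place your Easy Order, reorder your most recent order, or start a new order.",
    "Which would you like: your Easy Order, your most recent order, or a new order?",
    "You can say: Easy Order, most recent, or new order.",
]


def prompt_for_attempt(attempt, prompts=PROMPTS):
    """Return the prompt for a zero-based attempt, reusing the last one if we run out."""
    index = min(max(attempt, 0), len(prompts) - 1)
    return prompts[index]


for attempt in range(4):
    print(attempt, prompt_for_attempt(attempt))
```

None of the variants apologizes; each one just asks again in a different way, as the slide suggests.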
  • 52. © 2017 Versay Solutions Pizza Pizza Pizza • Hey you didn’t really need to explain about the phone number since I saved it but OK…. • Address has been saved to profile, great! • And then boom Domino’s
  • 53. © 2017 Versay Solutions With Speech, you need to spend a lot more time thinking about what happens when things go wrong.
  • 54. © 2017 Versay Solutions I Didn’t Really Want to Order Pizza But By Now I Am Hungry And So Is Somebody Else • Note “Easy Order” and Credit Card cannot be set up on the website unless you’re actually placing an order. • Give people enough time to talk! • There’s that grammar coverage issue again • Bell pepper = Green pepper • What synonyms is your user likely to say? • At some point couldn’t you just give me a list? • Notice how they screwed up the article + the item “… adding a parmesan bread twists” Meow Domino’s
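The "a parmesan bread twists" slip is what happens when a fixed carrier phrase is glued to a dynamic item name. A minimal sketch of one fix, assuming each menu item records whether its name is plural; the function names and the "an order of" wording are hypothetical choices for the example.

```python
def with_article(name, is_plural):
    """Return the item name with wording that agrees with it."""
    if is_plural:
        # Plural names cannot take "a"/"an"; rephrase instead.
        return f"an order of {name}"
    # Rough heuristic based on the first letter; words such as "hour" need overrides.
    article = "an" if name[0].lower() in "aeiou" else "a"
    return f"{article} {name}"


def confirm_added(name, is_plural):
    """Build the confirmation prompt for an added item."""
    return f"OK, adding {with_article(name, is_plural)}."


print(confirm_added("parmesan bread twists", is_plural=True))
print(confirm_added("extra large pizza", is_plural=False))
```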
  • 55. © 2017 Versay Solutions
  • 56. © 2017 Versay Solutions
  • 57. © 2017 Versay Solutions Design Considerations • Confirm and correct • “Do you want to add anything else?” • “Yes, I want to add peppers.” • Disambiguation • “Olives” • “Ok, we have two kinds of olives. Black olives, or green olives.” • A Voice User Interface design is a time-based interface • As a designer concerned with user experience you’re going to be involved in things (such as pauses) which may not occur to you
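The disambiguation turn above, together with the earlier "bell pepper = green pepper" point, can be sketched as a lookup that returns zero, one, or several menu matches. The menu contents, the synonym table, and the function names are assumptions for illustration only.

```python
# Hypothetical menu and synonym table.
MENU = ["black olives", "green olives", "green peppers", "ham"]
SYNONYMS = {"bell pepper": "green peppers", "bell peppers": "green peppers"}


def resolve_topping(heard):
    """Return every menu item that the recognized text could refer to."""
    term = heard.strip().lower()
    term = SYNONYMS.get(term, term)
    return [item for item in MENU if term == item or term in item.split()]


def respond(heard):
    """Confirm a single match, disambiguate several, or offer the list."""
    matches = resolve_topping(heard)
    if len(matches) == 1:
        return f"OK, adding {matches[0]}."
    if len(matches) > 1:
        choices = ", or ".join(matches)
        return f"OK, we have {len(matches)} kinds of {heard.strip().lower()}: {choices}."
    # No match: give the user the list instead of a bare error.
    return "You can choose from: " + ", ".join(MENU) + "."


print(respond("Olives"))
print(respond("bell pepper"))
print(respond("pineapple"))
```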
  • 58. © 2017 Versay Solutions How Did Google Home Do? •“OK Google, Order Dominos” • “There are stores at….” • Had to go find the right “App Name” online •“OK Google, Talk to Dominos” • “You can link to your Domino’s account…” • Had a terrible time finding the “Google Apps.”
  • 59. © 2017 Versay Solutions How Did Google Home Do? •Menu worked! • System did not recognize “Ham” (Should offer list of ingredients) • System became very laggy
  • 60. © 2017 Versay Solutions How Did Google Home Do? • Edited for time • Original was 3:35 • This is 2:15 • Use of “Dom” persona and male voice • “Hand off” • Playback of address: • Alexa: “Eighty seven twenty three” • Google: “Eight thousand seven hundred twenty three” • Same issue with “twists” • “Your day just got cheesier”
  • 61. © 2017 Versay Solutions Design Considerations •Discoverability • “OK Google, Order Dominos” •Persona • Google Home has more control over the voice • Branding considerations – “Dom” name and male TTS •Playback of Dynamic Data • Attention to detail – don’t trust the platform to do it the way you want it
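For playback of dynamic data, one option is to spell the value out yourself instead of trusting the platform's default reading. The sketch below reads a house number in pairs, the way the Alexa playback on the previous slide did ("eighty seven twenty three"). The function names and the handling of zeros ("oh five", "hundred") are assumptions, and real addresses will need more cases than this.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def speak_pair(pair):
    """Spell out a two-character digit string as part of a house number."""
    if pair == "00":
        return "hundred"
    if pair[0] == "0":
        return f"oh {ONES[int(pair[1])]}"
    return two_digits(int(pair))


def speak_house_number(number):
    """Read a house number in pairs of digits, e.g. 8723 as '87' then '23'."""
    digits = str(number)
    if not digits.isdigit():
        raise ValueError("expected a non-negative whole number")
    odd = len(digits) % 2 == 1
    head = [ONES[int(digits[0])]] if odd else []  # a lone leading digit
    rest = digits[1:] if odd else digits
    pairs = [rest[i:i + 2] for i in range(0, len(rest), 2)]
    return " ".join(head + [speak_pair(p) for p in pairs])


print(speak_house_number(8723))
print(speak_house_number(105))
```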
  • 62. © 2017 Versay Solutions Design Considerations Maintaining State: •Between dialogs • “Who is Seth MacFarlane?” • “Seth MacFarlane is…” • “When’s his birthday?” • “I’m not sure what you’re talking about.” •From session to session Oprah Magazine
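A toy sketch of maintaining state between turns: remember the last entity that was discussed and substitute it for a possessive pronoun in the follow-up question. The class name `DialogContext` and the pronoun list are assumptions; real systems use far more robust reference resolution, and this version only handles the possessive case shown on the slide.

```python
import re

# Possessive pronouns only; "her" can also be an object pronoun, which this ignores.
POSSESSIVES = re.compile(r"\b(his|her|their|its)\b", re.IGNORECASE)


class DialogContext:
    """Holds the most recently mentioned entity across turns."""

    def __init__(self):
        self.last_entity = None

    def remember(self, entity):
        self.last_entity = entity

    def resolve(self, utterance):
        """Replace a possessive pronoun with the remembered entity, if any."""
        if self.last_entity is None:
            return utterance
        return POSSESSIVES.sub(f"{self.last_entity}'s", utterance)


context = DialogContext()
context.remember("Seth MacFarlane")  # set after answering "Who is ...?"
print(context.resolve("When's his birthday?"))
```

Persisting the same object between sessions (for example, keyed by user) would cover the "from session to session" case, at the cost of deciding when remembered context should expire.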
  • 63. © 2017 Versay Solutions Home Automation •Onboarding issues are very similar to “Skills,” but there is an additional layer of complexity • Companies are working to improve the experience • After setup, you get a lot of bang for the buck
  • 64. © 2017 Versay Solutions “Computer, turn on the library lights”
  • 65. © 2017 Versay Solutions TP Link
  • 66. © 2017 Versay Solutions Amazon
  • 67. © 2017 Versay Solutions Design Considerations: Summary • Managing access to Skills (App, Store) • Managing the Onboarding Experience • Discoverability • Invoking Skills • Hints • Confirmation • Asking Yes/No Questions vs. Multi-Item Questions • SSML • Silences • Reprompting • Coverage (prompt vs. possible input) • Managing technical errors • Timing and Timeouts • Article matching the noun • Confirm and Correct • Disambiguation • Persona • Playback of Dynamic Data • Maintaining State
  • 68. © 2017 Versay Solutions What Makes a Good VUI Designer? •Concern with the overall experience • All of the channels that go into making up how something happens •Attention to “small” technical details • Pauses • SSML •Writing skills! • Dialog, not tech doc • English majors, screenwriters
  • 69. © 2017 Versay Solutions Session Description • Amazon Skills for Alexa, Google Actions for Home – Should your company build a conversational voice interface for one of these systems, and if so, how? • What are the differences between a voice user interface and other types of UIs? ✔ • What types of skills does a VUI designer need? ✔ • What are some best practices for these VUIs? ✔ • You’ll walk away with answers to the questions “If, Why, and How” you might choose to explore this interesting new area of design. ✔
  • 70. © 2017 Versay Solutions If, Why, How •What are you trying to build? •Existing guidelines / research •User testing is key • Especially if you’re trying to do something complicated
  • 71. © 2017 Versay Solutions If, Why, How: Beyond Skills • Write an app (skill) for an agent such as Google Assistant / Alexa • Use cloud APIs to add ASR / NLU to your app / device / page / gadget • Download software and use full-featured capabilities for more robust recognition on a specific device • Build your own
  • 72. © 2017 Versay Solutions If, Why, How: What’s the Use Case? •Enabling application • User can’t do it any other way • New tasks •Enhancing application • User can do it now • But speech makes it better • Faster • Safer
  • 73. © 2017 Versay Solutions API-Based, Device-Based, Roll Your Own / Open-Source •Flexibility •Power •Customization •Time •Difficulty
  • 74. © 2017 Versay Solutions Existing Guidelines / Research • Caveat: Best practices evolved in one modality (e.g. voice-only) may not apply the same way in another (e.g. combined voice + touch) • But they could be adapted • Association for Voice Interaction Design (AVIxD.org) • Wiki • Peer-Reviewed Journal • Virtual “Brown Bags” • Academic Sources, Books
  • 75. © 2017 Versay Solutions AVIxD.org CUI Working Group is actively recruiting!
  • 76. © 2017 Versay Solutions @crispinTX Crispin Reedy Thank You!

Editor's Notes

  • #4: DO NOT FORGET TO BRING THE MINI-SPEAKERS!!!
  • #9: “Speech to Text”: Spoken language → machine-readable format
  • #10: Not necessarily tied to speech recognition
  • #11: Also called voiceprints, biometrics, voice authentication, etc. Not going to discuss this one in a lot of detail today, but it’s important that you understand the difference between these technologies. Recognizes a person, not necessarily what they are saying. You can have ASR without Voice Verification, and vice versa.
  • #12: Human voice talent. Hundreds of hours of recording. Digitized phonemes: concatenative speech synthesis.
  • #36: Alexa, ask Capital One, what’s my current credit card balance?
  • #74: What do you need it for? What kind of device will you be running it on? Connectivity? Can you use cloud based ASR? How much control do you need over the application / user interface?