PAGE1
© 2015 Apio Systems, Inc. Confidential 1
Jared Sheehan @ Driversiti
Speech Recognition as a User Interface
PAGE2
Who am I
Glass explorer, speech recognition enthusiast and big Android nerd
Android Lead @Driversiti - driving safety for the mobile generation
Speech Recognition application for the Amazon Fire Phone
Suite of applications - AIM Android, Engadget Android, Distro Android, TechCrunch
Android, AOL HD, AIM BlackBerry
Meetup evangelist – “DC Android Meetup Group” – Join today!
PAGE3
Overview
What is voice/speech recognition?
What awesome stuff can you do with it?
How it works…
Demo!
Question and Answer
PAGE4
Hello Computer…
PAGE5
Definition
PAGE6
What can you do with SR?
Technology that allows spoken input into software systems.
You speak to your computer, tablet, phone or device and it uses what you said as input to
trigger some sort of action.
It can replace other input methods such as clicking, swiping, typing or selecting.
It is a means to make devices and software more user-friendly and to increase productivity.
It is used extensively as a form of accessibility assistance.
PAGE7
ASR - Dictation
Automatic speech recognition (ASR), also called dictation
Translates speech input into words, sentences and punctuation.
Audio is captured through a microphone and streamed to a recognizer (often a remote service)
The result is usually returned as a string with a confidence level
Integration with Android is easy, and there are two ways to do it.
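To make "a string with a confidence level" concrete, here is a minimal sketch of choosing among several recognition hypotheses. The `Hypothesis` and `BestHypothesis` names, and the assumption that confidence lies between 0 and 1, are illustrative only and not taken from any particular SDK.

```java
/** One recognition hypothesis: the text plus the recognizer's confidence in it. */
final class Hypothesis {
    final String text;
    final float confidence; // assumed to lie in [0, 1]; higher means more certain

    Hypothesis(String text, float confidence) {
        this.text = text;
        this.confidence = confidence;
    }
}

final class BestHypothesis {
    /**
     * Returns the text of the most confident hypothesis, or null when the
     * array is empty or the best candidate is below minConfidence.
     */
    static String pick(Hypothesis[] candidates, float minConfidence) {
        Hypothesis best = null;
        for (Hypothesis h : candidates) {
            if (best == null || h.confidence > best.confidence) {
                best = h;
            }
        }
        return (best != null && best.confidence >= minConfidence) ? best.text : null;
    }
}
```

Returning null below the threshold lets the caller ask the user to repeat the request instead of acting on a weak guess.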
PAGE8
How does it work?
A user speaks into a recording device of some sort
Speech recognition begins with the digital sampling of speech, followed by acoustic signal
processing of the audio.
Several techniques, including dynamic time warping (DTW), hidden Markov models (HMMs)
and neural networks (NNs), can map the processed signal to speech units
Most systems use language-specific knowledge to tune the models.
Next is the actual recognition of phonemes, groups of phonemes and words
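Of the techniques named above, dynamic time warping is the easiest to show in a few lines. The sketch below compares two one-dimensional feature sequences. Real recognizers work on multi-dimensional acoustic features, so the `Dtw` class and the absolute-difference cost are simplifying assumptions.

```java
final class Dtw {
    /** DTW distance between two 1-D feature sequences, using |x - y| as the local cost. */
    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        // cost[i][j] = cheapest alignment of the first i items of a with the first j items of b
        double[][] cost = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                cost[i][j] = Double.POSITIVE_INFINITY;
            }
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: diagonal match, stretching b, or stretching a.
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + best;
            }
        }
        return cost[n][m];
    }
}
```

Because the alignment may stretch either sequence, the same word spoken slowly or quickly can still score as a close match, which is why DTW suited early template-based recognizers.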
PAGE9
Speech Recognition system architecture
PAGE10
Into the weeds
Speaker dependence
Speaker independence
Isolated word
Continuous speech
How good is your system? Hint: Word Error Rate
Is that all it does?
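Word Error Rate is usually defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch, assuming whitespace tokenization and case-insensitive comparison (the `WordErrorRate` class name is illustrative):

```java
final class WordErrorRate {
    /** WER = word-level edit distance between reference and hypothesis / reference length. */
    static double compute(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // d[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;  // reference word missing from the hypothesis
                int insertion = d[i][j - 1] + 1; // extra word in the hypothesis
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }

    private static String[] tokenize(String s) {
        String t = s.trim().toLowerCase();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }
}
```

For the reference "send a message to mom" and the hypothesis "send the message to mom" there is one substitution in five words, so the WER is 0.2. Note that WER can exceed 1.0 when the hypothesis contains many extra words.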
PAGE11
Dictation is cool, but not that cool
Next step is understanding what the user wants to do
Then act on it
Generally, the ASR results are passed into an intent recognition system along with additional
contextual information
Contextual information can include where the utterance comes from (mobile phone,
computer), which app the user is in, their location, etc.
That information is used to determine the user’s intent and execute the request.
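A tiny sketch of why context matters: the same utterance can map to different intents depending on the app in the foreground. The app labels ("music", "reader") and intent names here are made up for illustration.

```java
final class ContextualIntent {
    /** Resolves an utterance to an intent name, using the foreground app as context. */
    static String resolve(String transcript, String foregroundApp) {
        String text = transcript.trim().toLowerCase();
        if (text.equals("next")) {
            // Same word, different meaning depending on where the user is.
            if ("music".equals(foregroundApp)) return "NEXT_TRACK";
            if ("reader".equals(foregroundApp)) return "NEXT_PAGE";
        }
        return "UNKNOWN";
    }
}
```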
PAGE12
Intent recognition
Recognizing speech is only part of the process. How does Google Now know that I want to
send an SMS message to a friend? How does Siri know when I want to know how tall
Kobe Bryant is?
ASR is only the first step in true speech as a user interface. To successfully help users
perform useful actions we must understand their intent. How do we do this?
Three systems: ASR, intent recognition (IR) and a dialog engine
The dialog engine takes the output from the IR system and sends responses and
actionable information to the caller.
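The hand-off between the last two systems can be sketched as follows. This is a toy, rule-based version under stated assumptions: production intent recognizers are typically statistical, and the `UserIntent`, `IntentRecognizer` and `DialogEngine` names, phrase prefixes and responses are all hypothetical.

```java
/** Output of intent recognition: what the user wants and what it applies to. */
final class UserIntent {
    final String action;   // e.g. "SEND_SMS"
    final String argument; // e.g. the recipient or the person asked about

    UserIntent(String action, String argument) {
        this.action = action;
        this.argument = argument;
    }
}

final class IntentRecognizer {
    private static final String SMS_PREFIX = "text ";
    private static final String HEIGHT_PREFIX = "how tall is ";

    /** Maps an ASR transcript to an intent using case-insensitive prefix rules. */
    static UserIntent recognize(String transcript) {
        String text = transcript.trim();
        if (text.regionMatches(true, 0, SMS_PREFIX, 0, SMS_PREFIX.length())) {
            return new UserIntent("SEND_SMS", text.substring(SMS_PREFIX.length()).trim());
        }
        if (text.regionMatches(true, 0, HEIGHT_PREFIX, 0, HEIGHT_PREFIX.length())) {
            return new UserIntent("LOOKUP_HEIGHT", text.substring(HEIGHT_PREFIX.length()).trim());
        }
        return new UserIntent("UNKNOWN", text);
    }
}

final class DialogEngine {
    /** Turns a recognized intent into the response sent back to the caller. */
    static String respond(UserIntent intent) {
        switch (intent.action) {
            case "SEND_SMS":
                return "What do you want to say to " + intent.argument + "?";
            case "LOOKUP_HEIGHT":
                return "Looking up the height of " + intent.argument + ".";
            default:
                return "Sorry, I did not understand that.";
        }
    }
}
```

Keeping the recognizer and the dialog engine separate means the response wording can change without touching the intent rules.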
PAGE13
Android Speech APIs
PAGE14
Android Speech APIs
http://developer.android.com/reference/android/speech/package-summary.html
Relatively easy implementation
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Two APIs: one that supplies a UI and one that does not
InputMethodService implementations (keyboards) use the no-UI version
PAGE15
RecognizerIntent
UI is supplied for you
Fire the intent and get a result
Again very easy to use
PAGE16
SpeechRecognizer
UI is not supplied for you
Results arrive through RecognitionListener callbacks, so they can be streamed directly into an EditText
Still “fairly” easy to use
PAGE17
Google Now – On to intent recognition systems…
PAGE18
Google Now – On tap
PAGE19
Apple – Siri
PAGE20
Amazon – Fire Phone, Fire TV and Echo
PAGE21
Microsoft – Cortana
PAGE22
Speech providers – Google, Nuance, IBM Watson
PAGE23
Google Voice Interaction API
PAGE24
Nuance Speech SDK
Dragon Mobile SDK – free up to 20k transactions per month
Upload custom vocabularies
Developer: uploads a new song and music vocabulary
Utterance: “Eminem” gets a higher probability than “M&M”
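One way to picture the effect of a custom vocabulary is as a score boost applied to candidates that appear in the uploaded word list. The sketch below is not the Nuance API or its actual algorithm. The `VocabularyBias` class, the additive boost and the example scores are assumptions chosen to illustrate the idea.

```java
final class VocabularyBias {
    /**
     * Returns the candidate with the highest score after adding boost to every
     * candidate found in the custom vocabulary, or null if there are no candidates.
     */
    static String pick(String[] candidates, double[] scores,
                       String[] customVocabulary, double boost) {
        if (candidates.length != scores.length) {
            throw new IllegalArgumentException("candidates and scores must have the same length");
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            double score = scores[i];
            for (String term : customVocabulary) {
                if (term.equalsIgnoreCase(candidates[i])) {
                    score += boost; // domain terms get a head start
                    break;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = candidates[i];
            }
        }
        return best;
    }
}
```

With raw scores of 0.55 for "M&M" and 0.45 for "Eminem", a music vocabulary containing "Eminem" and a boost of 0.2 flips the result to "Eminem".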
PAGE25
User Interface examples - Google Glass
PAGE26
User Interface examples - Google Glass continued…
PAGE27
User Interface examples - Google Glass continued…
PAGE28
Enough talk!
PAGE29
Show me code!
PAGE30
jared.sheehan@driversiti.com
http://www.meetup.com/DCAndroid/
Tweet: @jayroo5245
THANK YOU

Editor's Notes

  • #17: What else is there?