JavaScript Speech Recognition Applications
and maybe some other HTML5 goodness
Who is this guy?
PhoneGap core contributor
nutty about speech recognition
IBM'er
@macdonst
macdonst on GitHub
simonmacdonald.com
Why do I care about speech rec?
Here's a conversation between
two Cape Bretoners
P1: jeet?
P2: naw, jew?
P1: naw, t'rly t'eet bye.
And here's the translation
P1: jeet?
P1: Did you eat?
P2: naw, jew?
P2: No, did you?
P1: naw, t'rly t'eet bye.
P1: No, it's too early to eat buddy.
What is speech recognition?
Speech recognition is the process of translating the spoken word into text.
The process of speech rec includes:
1. Record and digitize the audio data
2. Split data into phonemes
3. Apply the phonemes to the recognition model
4. Analyze the results against the grammar
5. Return a confidence weighted result
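Step 5 can be sketched in JavaScript. This is a minimal illustration, assuming a result shaped like the list of alternatives the Speech API hands back, each with a `transcript` and a `confidence`. The helper name `bestAlternative` and the sample data are made up for this example.

```javascript
// Hypothetical helper: pick the alternative the recognizer is most sure about.
// `alternatives` is assumed to be array-like, with items shaped like
// { transcript: string, confidence: number between 0 and 1 }.
function bestAlternative(alternatives) {
  var best = null;
  for (var i = 0; i < alternatives.length; i++) {
    if (best === null || alternatives[i].confidence > best.confidence) {
      best = alternatives[i];
    }
  }
  return best; // null when there are no alternatives
}

// Made-up sample data, not real recognizer output.
var sample = [
  { transcript: "jeet", confidence: 0.21 },
  { transcript: "did you eat", confidence: 0.62 }
];
console.log(bestAlternative(sample).transcript);
```

Recognizers usually return alternatives already sorted by confidence, so in practice the first entry is often the best one; the loop just avoids relying on that.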
Basically...
PhoneGap Day US 2013 - Simon MacDonald: Speech Recognition
So how do we add speech rec to our app?
You may look at the W3C Speech API Specification,
but only Chrome on the desktop has implemented that spec
But that's okay!
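Because support is uneven, it helps to check for the API before using it. The sketch below assumes the constructor shows up either as `SpeechRecognition` (the name used in these slides) or as Chrome's prefixed `webkitSpeechRecognition`. The helper name `getSpeechRecognition` is hypothetical.

```javascript
// Hypothetical helper: return the recognition constructor if the environment
// exposes one, or null otherwise. `root` is the global object to inspect.
function getSpeechRecognition(root) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// In a browser or PhoneGap webview, `window` is the global object.
if (typeof window !== "undefined") {
  var Recognition = getSpeechRecognition(window);
  if (Recognition === null) {
    console.log("Speech recognition is not available here.");
  }
}
```

Passing the global object in as a parameter keeps the check easy to exercise outside a browser.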
The spec looks like this:
interface SpeechRecognition : EventTarget {
    // recognition parameters
    attribute SpeechGrammarList grammars;
    attribute DOMString lang;
    attribute boolean continuous;
    attribute boolean interimResults;
    attribute unsigned long maxAlternatives;
    attribute DOMString serviceURI;

    // methods to drive the speech interaction
    void start();
    void stop();
    void abort();
};
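The recognition parameters in the interface above are plain attributes that can be set before calling `start()`. Here is a small sketch, assuming a hypothetical helper `configureRecognition` and defaults (`"en-US"`, one alternative) that are this example's choice rather than anything mandated by the spec.

```javascript
// Hypothetical helper: copy a few recognition parameters onto a recognizer.
// The fallback values are assumptions made for this sketch.
function configureRecognition(recognition, options) {
  recognition.lang = options.lang || "en-US";
  recognition.continuous = Boolean(options.continuous);
  recognition.interimResults = Boolean(options.interimResults);
  recognition.maxAlternatives = options.maxAlternatives || 1;
  return recognition;
}

// A plain object stands in for `new SpeechRecognition()` so this runs anywhere.
var configured = configureRecognition({}, { lang: "en-CA", maxAlternatives: 3 });
console.log(configured.lang, configured.maxAlternatives);
```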
With additional event handler attributes to control behaviour:
attribute EventHandler onaudiostart;
attribute EventHandler onsoundstart;
attribute EventHandler onspeechstart;
attribute EventHandler onspeechend;
attribute EventHandler onsoundend;
attribute EventHandler onaudioend;
attribute EventHandler onresult;
attribute EventHandler onnomatch;
attribute EventHandler onerror;
attribute EventHandler onstart;
attribute EventHandler onend;
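To see when each of these handlers fires, one option is to attach a logger to all of them. The sketch below assumes a hypothetical helper named `traceRecognition`; the handler names come from the list above, and the `error` field read in `onerror` is the error code carried by the spec's error event.

```javascript
// Hypothetical helper: attach logging handlers for the lifecycle events.
// `log` is any function that accepts a string.
function traceRecognition(recognition, log) {
  ["start", "audiostart", "soundstart", "speechstart",
   "speechend", "soundend", "audioend", "nomatch", "end"]
    .forEach(function (name) {
      recognition["on" + name] = function () { log(name); };
    });
  recognition.onerror = function (event) {
    log("error: " + event.error);
  };
  return recognition;
}

// A plain object stands in for a real recognizer so this runs anywhere.
var seen = [];
var traced = traceRecognition({}, function (message) { seen.push(message); });
traced.onstart();
traced.onerror({ error: "no-speech" });
console.log(seen.join(", "));
```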
Let's recognize some
speech
Click to Speak
hello world
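// Assumes SpeechRecognition is available; desktop Chrome exposes it as webkitSpeechRecognition.
var recognition = new SpeechRecognition();
recognition.onresult = function(event) {
    if (event.results.length > 0) {
        var test1 = document.getElementById("test1");
        // results[0][0] is the first alternative of the first result
        test1.textContent = event.results[0][0].transcript;
    }
};
recognition.start();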
So that's
pretty cool...
...if taking
dictation gets
you going
But I want to
do something
more exciting
with the
result
Let's do something a little
less trivial
Click to Speak
recognition.onresult = function(event) {
    // Take the first alternative of the first result as the command
    var result = event.results[0][0].transcript;
    var music = document.getElementById("music");
    switch(result) {
        case "jazz":
            music.src = "jazz.mp3";
            music.play();
            break;
        case "rock":
            music.src = "rock.mp3";
            music.play();
            break;
        case "stop":
        default:
            // "stop" and anything unrecognized both pause playback
            music.pause();
    }
};
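Recognizers do not always return the exact lowercase word the switch expects, so it may be worth normalizing the transcript first. This is a sketch only: `toCommand`, `trackFor`, and the `tracks` table are hypothetical names that mirror the switch statement above.

```javascript
// Hypothetical helper: reduce a transcript to a canonical command word.
function toCommand(transcript) {
  return String(transcript).trim().toLowerCase();
}

// Commands mapped to audio files, mirroring the switch statement above.
var tracks = { jazz: "jazz.mp3", rock: "rock.mp3" };

// Returns the file to play, or null for "stop" and anything unrecognized.
function trackFor(transcript) {
  var command = toCommand(transcript);
  return Object.prototype.hasOwnProperty.call(tracks, command)
    ? tracks[command]
    : null;
}

console.log(trackFor(" Jazz "));
```

A lookup table also makes it easy to add new commands without growing the switch.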
Which seems
much cooler
to me
Let's ask the web a
question
Click to Speak
Q: what day is it today
A: Friday July 19th, 2013
Works pretty
good...
...but ugly!
Let's style our
button with
some CSS
+
=
<a class="speechinput">
    <img src="images/mic.png">
</a>
#speechinput input {
    cursor:pointer;
    margin:auto;
    margin:15px;
    color:transparent;
    background-color:transparent;
    border:5px;
    width:15px;
    -webkit-transform: scale(3.0, 3.0);
}
And we'll add some color using
Speech Bubbles (Pure-CSS-Speech-Bubbles)
by Nicolas Gallagher
Then pull it all
together!
what is steve jobs middle name
Steven Paul Jobs
But wait, why
am I using my
eyes like a
sucker
We'll output the answer
using SpeechSynthesis
The SpeechSynthesis
spec looks like this:
interface SpeechSynthesis {
    readonly attribute boolean pending;
    readonly attribute boolean speaking;
    readonly attribute boolean paused;

    void speak(SpeechSynthesisUtterance utterance);
    void cancel();
    void pause();
    void resume();
    SpeechSynthesisVoiceList getVoices();
};
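Speaking an answer with the interface above could look roughly like this. The helper name `say` is hypothetical, and cancelling whatever is already queued before speaking is a design choice made for this sketch, not something the spec requires.

```javascript
// Hypothetical helper: speak `text`, interrupting anything already queued.
// `synth` is expected to look like window.speechSynthesis and `Utterance`
// like the SpeechSynthesisUtterance constructor.
function say(synth, Utterance, text) {
  if (synth.speaking || synth.pending) {
    synth.cancel(); // drop queued and in-progress speech
  }
  var utterance = new Utterance();
  utterance.text = text;
  synth.speak(utterance);
  return utterance;
}

// Only runs where the synthesis API actually exists.
if (typeof window !== "undefined" && window.speechSynthesis) {
  say(window.speechSynthesis, window.SpeechSynthesisUtterance, "Chicago Blackhawks");
}
```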
The SpeechSynthesisUtterance spec looks like this:
interface SpeechSynthesisUtterance : EventTarget {
    attribute DOMString text;
    attribute DOMString lang;
    attribute DOMString voiceURI;
    attribute float volume;
    attribute float rate;
    attribute float pitch;
};
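The `volume`, `rate` and `pitch` attributes take numbers within fixed ranges in the Web Speech API spec: volume 0 to 1, rate 0.1 to 10, pitch 0 to 2, each defaulting to 1. A sketch of applying settings while staying inside those ranges follows; `clamp` and `applyVoiceSettings` are hypothetical helpers, and the `"en-US"` fallback is this example's assumption.

```javascript
// Keep `value` between `min` and `max`, inclusive.
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: apply voice settings to an utterance-like object,
// keeping each value inside the range the spec allows.
function applyVoiceSettings(utterance, settings) {
  var pick = function (value) { return value === undefined ? 1 : value; };
  utterance.lang = settings.lang || "en-US"; // fallback is an assumption
  utterance.volume = clamp(pick(settings.volume), 0, 1);
  utterance.rate = clamp(pick(settings.rate), 0.1, 10);
  utterance.pitch = clamp(pick(settings.pitch), 0, 2);
  return utterance;
}

// A plain object stands in for a real utterance so this runs anywhere.
var tuned = applyVoiceSettings({}, { volume: 3, rate: 0 });
console.log(tuned.volume, tuned.rate, tuned.pitch);
```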
With additional event handler attributes to control behaviour:
attribute EventHandler onstart;
attribute EventHandler onend;
attribute EventHandler onerror;
attribute EventHandler onpause;
attribute EventHandler onresume;
attribute EventHandler onmark;
attribute EventHandler onboundary;
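The `onend` and `onerror` handlers make it possible to wait until an answer has been spoken before doing something else, for example listening for the next question. A minimal sketch, assuming a hypothetical helper `whenFinished` with a Node-style callback:

```javascript
// Hypothetical helper: call `done` once the utterance finishes or fails.
// `done` receives null on success, or the error event on failure.
function whenFinished(utterance, done) {
  utterance.onend = function () { done(null); };
  utterance.onerror = function (event) { done(event); };
  return utterance;
}

// A plain object stands in for a real utterance so this runs anywhere.
var outcomes = [];
var pendingUtterance = whenFinished({}, function (err) {
  outcomes.push(err === null ? "finished" : "failed");
});
pendingUtterance.onend(); // simulate the engine reporting completion
console.log(outcomes[0]);
```

Waiting for `onend` before restarting recognition is one way to keep the app from listening to its own voice.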
who won the stanley cup this year
Chicago Blackhawks
Plugin repos
SpeechRecognitionPlugin -
https://github.com/macdonst/SpeechRecognitionPlugin
SpeechSynthesisPlugin -
https://github.com/macdonst/SpeechSynthesisPlugin
THE END
