Speech Recognition,
Text-To-Speech,
and Voice Interfaces
By:
Taryne Cahalin
Stephanie Sirico
Christiana Vasquez
Adelphi University - Mobile Learning, Fall 2013
What is Speech Recognition?
Instead of pressing buttons in response to an automated voice
recording, a person can speak specific words into a device and
give it commands with the help of a speech recognition program.
The Uses
Individuals With Disabilities – Assists those who have visual
impairments, limited hand mobility, dyslexia, etc.
Medical Transcription – Reduces delays in writing out
medical transcriptions.
Dictation – Converts spoken words to text in emails or other
documents (also helpful for English Language Learners).
Access Menu Commands – Opens files using voice commands.
Using Dragon Mobile
How does it work?
Speech recognition functions as a pipeline:
The pipeline converts PCM (pulse code modulation) digital audio
from a sound card into recognized speech.
Transforming PCM Digital Audio

The sound card delivers 16,000 PCM values per second, a “wavy line”
that continues for as long as the user speaks.

This information is converted into a form the program can recognize
more easily.

The fast Fourier transform (FFT) identifies the frequency components
of a specific sound.

This lets the program approximate how our ears distinguish the sound.
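As a minimal sketch of the step described above, the Python below builds 1/100th of a second of PCM values for a pure tone and recovers its frequency. It uses a naive discrete Fourier transform from the standard library, which yields the same values an FFT would, only more slowly. The function names, the test tone, and the amplitude are assumptions made for illustration; they do not come from the slides.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # PCM values per second, the figure given on the slide

def sine_pcm(freq_hz, duration_s, amplitude=10_000):
    """Return integer PCM values for a pure tone (illustrative input only)."""
    n = round(SAMPLE_RATE * duration_s)
    return [round(amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
            for i in range(n)]

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for the non-negative frequency bins."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples)))
            for k in range(n // 2 + 1)]

def dominant_frequency(samples):
    """Frequency in Hz of the strongest component, ignoring the DC bin."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * SAMPLE_RATE / len(samples)

tone = sine_pcm(1_000, 0.01)      # 160 PCM values = 1/100th of a second
print(len(tone))                  # 160
print(dominant_frequency(tone))   # 1000.0
```

With 160 values per slice, each frequency bin is 100 Hz wide, so this sketch can only tell frequencies apart in 100 Hz steps.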
Transforming PCM Digital Audio Using the Fast Fourier Transform
The fast Fourier transform analyzes each 1/100th of a second of audio
and converts it into frequency data.

Each 1/100th of a second produces an amplitude graph.
Reference graphs of this kind are stored in a database called a “codebook”.
Each sound is matched to the most similar entry in the codebook.
The sound is then given the number of that entry, called the “feature
number”.
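The codebook lookup described above can be sketched as a nearest-neighbor search. The Python below cuts a PCM stream into 1/100th-second frames and returns the index of the closest codebook entry as the feature number. The three-entry codebook, the four-band amplitude graphs, and the use of Euclidean distance are all assumptions for illustration; the slides do not say how similarity is measured or how large a real codebook is.

```python
import math

SAMPLE_RATE = 16_000
FRAME_SIZE = SAMPLE_RATE // 100   # 160 PCM values per 1/100th of a second

def split_frames(samples, frame_size=FRAME_SIZE):
    """Cut the PCM stream into consecutive, non-overlapping full frames."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def feature_number(spectrum, codebook):
    """Index of the codebook entry closest (Euclidean) to this spectrum."""
    return min(range(len(codebook)),
               key=lambda idx: math.dist(spectrum, codebook[idx]))

# Toy codebook: three made-up 4-band amplitude graphs.
CODEBOOK = [
    [9.0, 1.0, 0.5, 0.1],   # energy mostly in the lowest band
    [1.0, 8.0, 2.0, 0.3],   # energy mostly in the second band
    [0.2, 0.5, 3.0, 9.0],   # energy mostly in the highest band
]

frames = split_frames(list(range(400)))
print(len(frames))                                      # 2 (the partial tail is dropped)
print(feature_number([0.8, 7.5, 2.2, 0.1], CODEBOOK))   # 1
```

Replacing each amplitude graph with a single small integer is what lets the later stages of the pipeline work on a short sequence of numbers instead of raw audio.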
Two Categories

Small vocabulary/many users:
• Leaves room for variation in speech (e.g., accents)
• Only a limited, preset number of commands can be used

Large vocabulary/limited users:
• Best for business settings
• The system is trained to work with a small number of users
• Accuracy increases as the system learns its users
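One reason a small, preset command list tolerates variation between speakers is that the program only has to pick the closest of a few options. The sketch below illustrates that idea on text that has already been transcribed, using Python's standard `difflib` module. The command names and the 0.75 cutoff are hypothetical, and a real recognizer would compare sounds rather than spellings.

```python
import difflib

# Hypothetical preset commands for a small-vocabulary system.
COMMANDS = ["open file", "save file", "close file", "print file"]

def match_command(heard, cutoff=0.75):
    """Return the preset command closest to what was heard, or None."""
    matches = difflib.get_close_matches(heard.lower(), COMMANDS,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_command("open fail"))    # a near miss still maps to a command
print(match_command("play music"))   # nothing close enough in the list
```

Anything outside the preset list is rejected rather than guessed at, which is the trade-off the small-vocabulary category accepts.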
Discrete vs. Continuous Speech
Discrete
• Easier for program to understand
• Noticeable pause after each word
Continuous
• Allows speaking at conversational speed
• Used in most modern systems
Programs can now recognize accents and pronunciations better. In
earlier programs, accents, pronunciations, speed, and background noise
were all variables that made speech difficult for programs to understand.
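The pause after each word is what makes discrete speech easier to process: the program can find word boundaries simply by looking for silence. The sketch below assumes the loudness of each 1/100th-second frame has already been measured, and the threshold and minimum pause length are made-up values. It returns one frame range per word.

```python
def split_on_pauses(frame_energies, silence_threshold=0.1, min_pause_frames=3):
    """Return (start, end) frame index pairs, end exclusive, one per word."""
    words = []
    start = None      # first frame of the word being collected
    quiet_run = 0     # consecutive quiet frames seen inside that word
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_pause_frames:
                # The pause is long enough: close the word before the quiet run.
                words.append((start, i - quiet_run + 1))
                start, quiet_run = None, 0
    if start is not None:
        # The audio ended mid-word; trim any trailing quiet frames.
        words.append((start, len(frame_energies) - quiet_run))
    return words

energies = [0, 0.5, 0.6, 0.05, 0.7, 0, 0, 0, 0.9, 0.8, 0]
print(split_on_pauses(energies))   # [(1, 5), (8, 10)]
```

The single quiet frame at index 3 is shorter than the minimum pause, so it stays inside the first word. Continuous speech has no such reliable gaps, which is why it is harder for a program to segment.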
Using Talk – Text to Voice

This app lets you type text and then have the device speak what was
typed. In this case, the device pronounced Taryne as “Ta-reen” instead
of “Ta-rin”. This is an example of how text-to-speech programs still
need work, here in choosing which syllable to emphasize. The program
did not have Taryne in its codebook, so it was unable to pronounce her
name correctly.
The Future of Assistive Technology in Schools
Students who need assistance with writing because their oral
skills are stronger.
Students who were absent for a class, have poor memory, or
need assistance hearing the lesson.
Students who need assistance during Guided Reading.
Students who are English Language Learners.
Students with visual or hearing impairments, or with learning
disabilities that affect reading, spelling, or writing.
