Speech Recognition, Text to Speech, and Voice Interfaces

By:
Taryne Cahalin
Stephanie Sirico
Christiana Vasquez
Adelphi University - Mobile Learning, Fall 2013

What is Speech Recognition?
Instead of listening to an automated voice recording and pressing buttons,
a person can speak specific words into a device and issue commands with the
help of a speech recognition program.

The Uses
Individuals With Disabilities – Assists those who have a visual impairment, hand immobility, dyslexia, etc.
Medical Transcription – Reduces delays in writing out medical transcriptions.
Dictation – Converts spoken words to text in emails or other word-processing documents (also helpful for English Language Learners).
Access Menu Commands – Opens files using voice commands.

How does it work?
Speech recognition functions as a pipeline:
The pipeline converts PCM (pulse code modulation) digital audio from a
sound card into recognized speech.

Transforming PCM Digital Audio

16,000 PCM values per second, a “wavy line”, that repeat while the user speaks

Information is converted for better recognition in the program

Fast-Fourier transform identifies frequency components of a specific sound

The program can approximate how our ears distinguish the sound

Transform PCM digital audio using Fast-Fourier Transform
The Fast-Fourier Transform analyzes each 1/100th of a second of audio
and converts the audio data into its frequency components
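
The slicing and transform step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (split_into_frames, magnitude_spectrum, dominant_frequency) are hypothetical, the input is a synthetic 1,000 Hz tone rather than real microphone audio, and a direct (slow) Fourier sum stands in for a true Fast-Fourier Transform so the sketch needs no extra libraries. At 16,000 PCM values per second, 1/100th of a second is 160 values.

```python
import cmath
import math

SAMPLE_RATE = 16_000              # PCM values per second, as stated in the slides
FRAME_SIZE = SAMPLE_RATE // 100   # 160 values = 1/100th of a second


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Cut the PCM stream into consecutive, non-overlapping frames."""
    usable = len(samples) - len(samples) % frame_size
    return [samples[i:i + frame_size] for i in range(0, usable, frame_size)]


def magnitude_spectrum(frame):
    """Return the amplitude of each frequency bin for one frame.

    A real FFT produces the same values much faster; the direct sum is
    used here only to keep the sketch dependency-free.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(total) / n)
    return spectrum


def dominant_frequency(frame, sample_rate=SAMPLE_RATE):
    """Return the frequency (in Hz) of the strongest non-DC bin."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(frame)


# Synthetic input (assumption): 0.05 s of a 1,000 Hz tone.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
frames = split_into_frames(tone)
print(len(frames), dominant_frequency(frames[0]))
```

Each list returned by magnitude_spectrum plays the role of one "amplitude graph": one amplitude per frequency band for that 1/100th of a second.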

Each 1/100th of a second produces an amplitude graph.
Known graphs are stored in a database called a “codebook”.
Each sound is matched to the most similar entry in the codebook.
The sound is given a number that describes it, called the “feature number”.
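
The codebook lookup above can be sketched as a nearest-match search. Everything specific here is an assumption made for illustration: the CODEBOOK contents, the four-band graphs, and the choice of Euclidean distance as the measure of "most similar". A real codebook would hold far more entries and finer-grained graphs.

```python
import math

# Hypothetical codebook: each entry is a tiny 4-band amplitude graph.
# The position of an entry in the list serves as its feature number.
CODEBOOK = [
    [0.9, 0.1, 0.0, 0.0],  # feature number 0: mostly low frequencies
    [0.1, 0.8, 0.1, 0.0],  # feature number 1
    [0.0, 0.1, 0.8, 0.1],  # feature number 2
    [0.0, 0.0, 0.2, 0.8],  # feature number 3: mostly high frequencies
]


def distance(a, b):
    """Euclidean distance between two amplitude graphs of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_number(graph, codebook=CODEBOOK):
    """Return the index of the codebook entry most similar to the graph."""
    return min(range(len(codebook)), key=lambda i: distance(graph, codebook[i]))


# Three made-up graphs, one per 1/100th of a second of audio.
observed = [
    [0.85, 0.15, 0.0, 0.0],
    [0.05, 0.70, 0.2, 0.05],
    [0.0, 0.0, 0.3, 0.7],
]
features = [feature_number(g) for g in observed]
print(features)
```

The result is one feature number per 1/100th of a second, which is a much smaller description of the sound than the raw PCM values.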

Two Categories

Small Vocabulary/many-users:
• Leaves room for differences in speech (e.g., accents)
• Supports only a limited, preset number of commands

Large Vocabulary/limited-users:
• Best for business settings
• The system is trained to work with a small number of users
• Accuracy increases as the system learns its users

Discrete vs. Continuous Speech
Discrete
• Easier for program to understand
• Noticeable pause after each word
Continuous
• Allows speaking at conversational speed
• Used in most modern systems
Programs can now recognize accents and pronunciations better. In
earlier programs, accent, pronunciation, speaking speed, and background
noise were all variables that made sounds difficult to understand.
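
One reason discrete speech is easier for a program is that the pauses mark where each word starts and ends. The sketch below shows one simple way this could be done, by treating low-energy stretches as pauses. It is an assumption for illustration, not the method of any particular product: the function name split_on_pauses, the energy threshold, and the minimum pause length are all hypothetical choices.

```python
def split_on_pauses(samples, frame_size=160, threshold=0.01, min_pause_frames=3):
    """Return (start, end) sample ranges of words separated by pauses.

    A frame counts as speech when its average energy reaches the threshold.
    A word ends once min_pause_frames quiet frames follow it in a row.
    """
    words = []
    start = None   # sample index where the current word began
    quiet = 0      # consecutive quiet frames seen since the last loud frame
    n_frames = len(samples) // frame_size
    for f in range(n_frames):
        frame = samples[f * frame_size:(f + 1) * frame_size]
        energy = sum(s * s for s in frame) / frame_size
        if energy >= threshold:
            if start is None:
                start = f * frame_size
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                # The word ended where the run of quiet frames began.
                words.append((start, (f - quiet + 1) * frame_size))
                start = None
                quiet = 0
    if start is not None:
        # Audio ended mid-word; drop any trailing quiet frames.
        words.append((start, (n_frames - quiet) * frame_size))
    return words


# Made-up signal: a loud burst, a pause, then a second loud burst.
signal = [0.5, -0.5] * 160 + [0.0] * 640 + [0.5, -0.5] * 240
print(split_on_pauses(signal))
```

Continuous speech has no such reliable pauses, so word boundaries must be found some other way, which is part of why it is harder to recognize.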

Using Talk – Text to Voice

This app lets you type text and then have the device read it aloud. In
this case, instead of pronouncing Taryne as “Ta-rin”, the device
pronounced it as “Ta-reen”. This is an example of how speech programs
still need work, here because of which syllable is emphasized. The
codebook did not have Taryne in it, so the device was unable to
pronounce her name correctly.

The Future of Assistive Technology in Schools
Students who need assistance in their writing skills because they have stronger oral skills.
Students who were absent for a class, have poor memory, or need assistance hearing the lesson.
Students who need assistance during Guided Reading.
Students who are English Language Learners.
Students with visual/hearing impairments and learning disabilities regarding reading/spelling/writing.
