Speech user interface

Husain Firoz Master
(302093)
Guided by Prof. Vidya Patil

Outline
 Introduction
 Need for SUI
 Expectations from SUI
 Overview of speech recognition
 Voice feature extraction and its techniques
 Implementation of SUI
 Applications
 Future scope
 Shortcomings
 Conclusion

SPEECH USER INTERFACE (SUI)
 A user interface that works with human voice commands
 It offers truly hands-free, eyes-free interaction with computers
 It provides an interface for operating computers, taking the following into account:
 Technology support
 User category
 User support

NEED FOR A SPEECH USER INTERFACE
 SUIs offer truly hands-free, eyes-free interaction
 They offer high input throughput: speaking is faster than typing on a keyboard
 They are the only plausible interaction modality for illiterate users across the world, giving users in developing regions a feasible way to access computing
 However, they are not yet developed widely enough to support every type of user, language, or acoustic scenario

Expectations from a speech user interface (SUI)
 Recognize speech from any untrained user
 Understand the meaning of the spoken words
 Perform the action that corresponds to the extracted meaning
 Deal with multiple languages
 Handle large vocabularies
 Provide a good level of fault tolerance
 Provide help and messages to users during interaction
 Operate in real time

Speech Recognition
 Speech recognition is the translation of spoken words into text.
 It is the ability of a machine to understand natural spoken human language.
 There are two types of speech recognition: speaker-dependent and speaker-independent.
 Speaker-dependent: systems that require training
 Speaker-independent: systems that do not require training
 In essence, it is the process by which a computer maps an acoustic speech signal to text.

VOICE FEATURE EXTRACTION
 Voice feature extraction is also known as front-end processing.
 It is performed in both recognition and training modes.
 It converts the digital speech signal into sets of numerical descriptors called feature vectors.
 The feature vectors contain the key characteristics of the speech signal.
 The different types of features extracted from the voice are evaluated to determine their suitability for voice recognition.
 MFCC is one of the most widely used feature extraction techniques; HMMs are widely used to model the resulting feature sequences.

Feature Extraction and Recognition Techniques
 Mel-frequency Cepstral coefficients (MFCC)
 Hidden Markov model (HMM)
 Dynamic time warping (DTW)
 Fusion of HMM and DTW

Mel-frequency Cepstral coefficients (MFCC)
 The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on the mel frequency scale.
 Mel-frequency cepstral coefficients (MFCCs) are the coefficients that together make up the MFC; they summarize the energy in the frequency bands of an audio input.
 The frequency bands are equally spaced on the mel scale, so the MFCC representation approximates the human auditory system's response more closely than linearly spaced bands do.
 The use of about 20 MFCCs is common in ASR, although 10-12 coefficients are often considered sufficient for coding speech.
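
The MFCC steps above (power spectrum, mel-spaced bands, log, cosine transform) can be sketched for a single frame in plain Python. This is only an illustrative sketch: the function names, the 2595/700 mel formula variant, the Hamming window, and the defaults of 20 filters and 12 coefficients are assumptions chosen for the example, not details taken from the slides.

```python
import math

def hz_to_mel(hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but fine for one short frame)."""
    n = len(frame)
    # Hamming window to reduce spectral leakage at the frame edges.
    win = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    """MFCCs of one frame: log mel-band energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_e))
            for c in range(num_coeffs)]

# Example: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc(frame, sample_rate=8000)
print(len(coeffs))  # 12
```

A real front end would process overlapping frames and use an FFT; the naive DFT is used here only to keep the sketch free of external libraries.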

Hidden Markov model (HMM)
 HMMs are used to represent the possible symbol sequences underlying speech utterances.
 The states of an HMM represent basic linguistic units (e.g. phonemes or smaller phases of phonemes) that a speaker uses to pronounce a word.
 For each word, one or more complex HMMs exist, which model the probability of articulating the state sequence that represents the word.
 HMMs are usually trained on large sets of recorded, feature-analyzed samples.
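
As a rough illustration of how per-word HMMs can score an utterance, the sketch below uses the forward algorithm on two toy discrete HMMs. Everything specific here is assumed for the example: the words "yes" and "no", the two-state left-to-right topology, the hand-picked probabilities, and the idea that the feature vectors have already been quantized into the symbols 0 and 1. Real models are trained from data rather than written by hand.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n = len(start)
    # Initialise with the first observation.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    # Propagate through the remaining observations.
    for symbol in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
                 for s in range(n)]
    return sum(alpha)

# Hypothetical word models: (start probabilities, transitions, emissions).
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
WORD_MODELS = {
    "yes": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),  # mostly 0s, then 1s
    "no":  ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),  # mostly 1s, then 0s
}

def recognize(obs):
    """Return the word whose HMM assigns the observation sequence the highest likelihood."""
    return max(WORD_MODELS, key=lambda w: forward_likelihood(obs, *WORD_MODELS[w]))

print(recognize([0, 0, 1, 1]))  # yes
print(recognize([1, 1, 0, 0]))  # no
```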

Dynamic time warping (DTW)
 Dynamic time warping (DTW) is an algorithm for measuring the similarity between two spoken-word sequences that may vary in time or speed.
 It compares two speech sequences by aligning them in time, so the same word spoken faster or slower can still match.
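
A minimal sketch of the classic DTW recurrence is shown below. It assumes one-dimensional sequences and an absolute-difference local cost for readability; with MFCC feature vectors the local cost would typically be a vector distance instead.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best time alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# The same shape spoken "twice as slowly" still matches perfectly.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))                 # 2.0
```

A lower distance means the two sequences are more similar once differences in speed are warped away.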

Fusion of HMM and DTW
 In this method, HMM and DTW are combined.
 The results of HMM and DTW are combined as weighted mean vectors.
 DTW finds the similarity between two signals based on time.
 Meanwhile, the HMM is trained on clusters and moves between them iteratively, based on the likelihoods it learned during training.
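
The slides do not spell out how the weighted mean is formed, so the following is only one plausible sketch. It assumes that each recognizer produces one score per candidate word, that DTW distances are turned into similarities with 1 / (1 + d), that both score sets are normalized to sum to 1, and that a single hypothetical parameter `hmm_weight` controls the mix.

```python
def normalise(scores):
    """Scale a word -> score mapping so the scores sum to 1."""
    total = sum(scores.values())
    return {word: s / total for word, s in scores.items()}

def fuse(hmm_likelihoods, dtw_distances, hmm_weight=0.5):
    """Weighted mean of normalised HMM likelihoods and DTW similarities.

    hmm_likelihoods: word -> likelihood (higher is better)
    dtw_distances:   word -> DTW distance to the word template (lower is better)
    Returns the best word and the fused score for every word.
    """
    hmm = normalise(hmm_likelihoods)
    # Assumed conversion from a distance to a similarity.
    dtw = normalise({w: 1.0 / (1.0 + d) for w, d in dtw_distances.items()})
    fused = {w: hmm_weight * hmm[w] + (1.0 - hmm_weight) * dtw[w] for w in hmm}
    return max(fused, key=fused.get), fused

best, scores = fuse({"yes": 0.18, "no": 0.003}, {"yes": 1.0, "no": 4.0})
print(best)  # yes
```

When the two recognizers disagree, the weight decides the outcome, so in practice it would need to be tuned on held-out recordings.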

SUI IMPLEMENTATION (contd.)
 Recording speech
 Applying noise cancellation
 Endpoint detection
 Feature extraction: the MFCC algorithm is applied and the resulting parameters are separated out for use in the training stage.
 Normalization: the word length is calculated for all groups and an average is taken for each.
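
The endpoint detection step is not detailed in the slides; one simple and common approach is short-time energy thresholding, sketched below. The frame length of 160 samples and the threshold of 10% of the peak frame energy are assumptions for illustration, and the function name is hypothetical.

```python
def detect_endpoints(samples, frame_len=160, threshold_ratio=0.1):
    """Return (start, end) sample indices of the speech region, or None if silent.

    A frame counts as speech when its energy exceeds threshold_ratio times
    the energy of the loudest frame.
    """
    energies = [sum(s * s for s in samples[i:i + frame_len])
                for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not energies:
        return None
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    # First active frame starts the word; last active frame ends it.
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Silence, then a burst of signal, then silence again.
signal = [0.0] * 320 + [1.0, -1.0] * 240 + [0.0] * 320
print(detect_endpoints(signal))  # (320, 800)
```

Trimming the recording to this region before feature extraction keeps leading and trailing silence out of the MFCC frames.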

SUI IMPLEMENTATION (contd.)
 Training using HMM
 Fusion of HMM and DTW
 The recognized word or sentence is passed to the application
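
To illustrate the last step, here is a small sketch of an application consuming recognized text, using a speech-operated calculator as the example. The vocabulary, the three-word command pattern, and the function name `handle_utterance` are all hypothetical; the recognizer itself is assumed to have already produced the text.

```python
import operator

# Small, fixed vocabulary of distinct-sounding words (assumed for the example).
NUMBERS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
           "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
COMMANDS = {"plus": operator.add, "minus": operator.sub, "times": operator.mul}

def handle_utterance(text):
    """Map a recognized '<number> <command> <number>' utterance to its result."""
    words = text.lower().split()
    valid = (len(words) == 3 and words[0] in NUMBERS
             and words[1] in COMMANDS and words[2] in NUMBERS)
    if not valid:
        # Give the user a helpful message instead of failing silently.
        return "Sorry, I did not understand. Try: 'two plus three'."
    return str(COMMANDS[words[1]](NUMBERS[words[0]], NUMBERS[words[2]]))

print(handle_utterance("two plus three"))    # 5
print(handle_utterance("Seven times six"))   # 42
```

Keeping the command set small and fixed makes misrecognitions easier to detect and report back to the user.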

Applications
 Speech-operated calculator
 Voice dialing
 Intelligent voice assistants (personal agents) such as Apple Siri, Google voice talk, etc.
 Home and building automation systems using SUI
 Live subtitling on television
 Speech-to-text conversion or note-taking systems

Future Scope
 Speech user interfaces for learning foreign languages
 Dictation tools for the medical and legal professions

SHORTCOMINGS
 Train the speech recognition system in the implementation environment.
 Keep the vocabulary size small.
 Keep each speech input (word length) short.
 Use speech inputs that sound distinctly different from each other.
 Keep the user interface simple.
 Don't use speech to position objects.
 Use a command-based user interface.
 Allow users to turn the speech recognizer off and on quickly and easily.
 Use a highly directional, noise-canceling microphone.