Introduction to Automatic
Speech Recognition
Outline
Define the problem
What is speech?
Feature Selection
Models
 Early methods
 Modern statistical models
Current State of ASR
Future Work
The ASR Problem
There is no single ASR problem
The problem depends on many factors
 Microphone: Close-mic, throat-mic, microphone array, audio-visual
 Sources: band-limited, background noise, reverberation
 Speaker: speaker dependent, speaker independent
 Language: open/closed vocabulary, vocabulary size, read/spontaneous
speech
 Output: Transcription, speaker id, keywords
Performance Evaluation
Accuracy
 Percentage of tokens correctly recognized
Error Rate
 Complement of accuracy (see the word error rate formula below)
Token Type
 Phones
 Words*
 Sentences
 Semantics?
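For word tokens, the standard error metric is word error rate (WER). The usual formula, standard in the field though not spelled out on the slide, counts substitutions S, deletions D, and insertions I against the N words in the reference transcript:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

Because insertions are counted, WER can exceed 100%, so it is only loosely the complement of accuracy.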
What is Speech?
Analog signal produced by humans
The speech signal can be modeled as a source passed through a filter (the source-filter model)
The source is the vocal folds in voiced speech
The filter is the vocal tract and articulators
Acoustic Model
 For each frame of data, we need some way of describing the
likelihood of it belonging to any of our classes
 Two methods are commonly used
 Multilayer perceptron (MLP) gives the posterior probability of a class given the data
 Gaussian Mixture Model (GMM) gives the likelihood of the data given a class
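Concretely, per-frame GMM scoring looks like the following minimal sketch, assuming diagonal covariances; all parameters here are made-up toy values, not from the slides:

```python
import numpy as np

def gmm_log_likelihood(frame, weights, means, variances):
    """Log p(frame | class) under a diagonal-covariance Gaussian mixture.

    frame:     (D,) feature vector (e.g., MFCCs for one frame)
    weights:   (M,) mixture weights, summing to 1
    means:     (M, D) per-component means
    variances: (M, D) per-component diagonal variances
    """
    D = frame.shape[0]
    # Per-component log density of a diagonal Gaussian
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((frame - means) ** 2 / variances, axis=1)
    log_comp = np.log(weights) + log_norm + log_exp
    # Log-sum-exp over components for numerical stability
    m = np.max(log_comp)
    return m + np.log(np.sum(np.exp(log_comp - m)))

# Toy example: one 13-dim feature frame scored against a 2-component GMM
rng = np.random.default_rng(0)
frame = rng.normal(size=13)
weights = np.array([0.6, 0.4])
means = rng.normal(size=(2, 13))
variances = np.ones((2, 13))
print(gmm_log_likelihood(frame, weights, means, variances))
```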
Gaussian Distribution
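For reference, the density underlying the GMM components above is the multivariate Gaussian in its standard D-dimensional form:

```latex
\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma})
= \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)
```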
Pronunciation Model
 While the pronunciation model can be very complex, it is typically
just a dictionary
 The dictionary contains the valid pronunciations for each word
 Examples:
 Cat: k ae t
 Dog: d ao g
 Fox: f aa k s
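In code, a dictionary-style pronunciation model is literally a mapping from words to phone sequences; a toy sketch, with illustrative entries in loose ARPAbet:

```python
# A toy pronunciation dictionary: each word maps to a list of
# valid pronunciations, each pronunciation a list of ARPAbet phones.
PRONUNCIATIONS = {
    "cat": [["k", "ae", "t"]],
    "dog": [["d", "ao", "g"]],
    "fox": [["f", "aa", "k", "s"]],
    # Words can carry multiple variants, e.g. "the" before vowels/consonants
    "the": [["dh", "ah"], ["dh", "iy"]],
}

def phones_for(word):
    """Return all valid pronunciations for a word (empty list if OOV)."""
    return PRONUNCIATIONS.get(word.lower(), [])

print(phones_for("fox"))   # [['f', 'aa', 'k', 's']]
```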
Language Model
 Now we need some way of representing the likelihood of any given
word sequence
 Many methods exist, but n-grams are the most common
 N-gram models are trained by simply counting the occurrences of word sequences in a training set
N-grams
 A unigram is the probability of a word in isolation
 A bigram is the probability of a word given the previous word
 Higher-order n-grams continue in a similar fashion
 A backoff probability is used for any unseen data
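A minimal sketch of that counting procedure, with a deliberately crude unigram fallback standing in for real backoff (proper backoff schemes such as Katz or Kneser-Ney discount seen mass first):

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over a training set of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, total):
    """P(w | w_prev), falling back to the unigram estimate for unseen pairs."""
    if (w_prev, w) in bigrams:
        return bigrams[(w_prev, w)] / unigrams[w_prev]
    # Crude fallback: unseen bigram -> unigram probability.
    return unigrams.get(w, 0) / total

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
total = sum(uni.values())
print(bigram_prob("the", "cat", uni, bi, total))  # 0.5
print(bigram_prob("cat", "dog", uni, bi, total))  # backs off to unigram
```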
How do we put it together?
 We now have models to represent the three parts of our equation
 We need a framework to join these models together
 The standard framework used is the Hidden Markov Model (HMM)
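Written out, the "three parts" are the pieces of the standard noisy-channel decomposition; the equation itself does not appear in the surviving slide text, but this is the usual formulation:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} P(O \mid W)\, P(W),
\qquad
P(O \mid W) \approx \max_{Q} P(O \mid Q)\, P(Q \mid W)
```

Here O is the observed acoustic sequence and Q a phone (state) sequence, so P(O|Q) is the acoustic model, P(Q|W) the pronunciation model, and P(W) the language model.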
Markov Model
 A state model built on the Markov property
 The Markov property states that the future depends only on the present state
 Models the likelihood of transitions between the states of the model
 Given the model, we can determine the likelihood of any sequence of states
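A toy chain makes the property concrete; the numbers below are purely hypothetical:

```python
import numpy as np

# A toy two-state Markov chain (states: 0 = "rain", 1 = "sun").
initial = np.array([0.5, 0.5])          # P(first state)
transition = np.array([[0.7, 0.3],      # P(next | rain)
                       [0.2, 0.8]])     # P(next | sun)

def sequence_likelihood(states):
    """Likelihood of a state sequence under the chain: the Markov property
    means each step depends only on the immediately preceding state."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[prev, cur]
    return p

print(sequence_likelihood([0, 0, 1, 1]))  # 0.5 * 0.7 * 0.3 * 0.8 = 0.084
```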
Hidden Markov Model
 Similar to a Markov model, except the states are hidden
 We now have observations tied to the individual states
 We no longer know the exact state sequence given the data
 Allows for the modeling of an underlying unobservable process
HMMs for ASR
 First we build an HMM for each phone
 Next we combine the phone models based on the pronunciation model
to create word level models
 Finally, the word level models are combined based on the language
model
 We now have a giant network with potentially thousands or even
millions of states
Decoding
 Decoding searches this network for the most likely state sequence, frame by frame
 For each time frame we need to maintain two pieces of information
 The likelihood of being at any state
 The previous state for every state
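Those two pieces of bookkeeping are the trellis scores and backpointers of the Viterbi algorithm; a minimal sketch, assuming per-frame log-likelihoods already computed by the acoustic model (all arrays here are toy values):

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Most likely state sequence through an HMM.

    log_init:  (S,)   log P(state at t=0)
    log_trans: (S, S) log P(next state | state)
    log_obs:   (T, S) log P(frame_t | state), e.g. from a GMM or MLP
    """
    T, S = log_obs.shape
    score = log_init + log_obs[0]          # likelihood of being at any state
    backptr = np.zeros((T, S), dtype=int)  # previous state for every state
    for t in range(1, T):
        cand = score[:, None] + log_trans  # (prev, cur) candidate scores
        backptr[t] = np.argmax(cand, axis=0)
        score = np.max(cand, axis=0) + log_obs[t]
    # Trace back from the best final state.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy 2-state, 4-frame example with hypothetical log-probabilities.
log_init = np.log([0.6, 0.4])
log_trans = np.log([[0.7, 0.3], [0.4, 0.6]])
log_obs = np.log([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
print(viterbi(log_init, log_trans, log_obs))  # [0, 0, 1, 1]
```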
State of the Art
 What works well
 Constrained vocabulary systems
 Systems adapted to a given speaker
 Systems in anechoic environments without background noise
 Systems expecting read speech
 What doesn't work
 Large unconstrained vocabulary
 Noisy environments
 Conversational speech
Future Work
 Better representations of audio based on human auditory perception
 Better representation of acoustic elements based on articulatory
phonology
 Segmental models that do not rely on the simple frame-based
approach
Resources
 Hidden Markov Model Toolkit (HTK)
 http://htk.eng.cam.ac.uk/
 CHiME (a freely available dataset)
 http://spandh.dcs.shef.ac.uk/projects/chime/PCC/datasets.html
 Machine Learning Lectures
 http://www.stanford.edu/class/cs229/
 http://www.youtube.com/watch?v=UzxYlbK2c7E