Speech recognition final presentation

• What is speech recognition?

• Speech recognition technology has recently reached a higher level of performance and robustness, allowing people to communicate with computers by talking.
• Speech recognition is the process of decoding an acoustic speech signal, captured by a microphone or telephone, into a set of words.
• Using these words, the system recognizes the whole utterance word by word.

• There are two types of speaker models: speaker independent and speaker dependent.
• Speaker independent models recognize the speech patterns of a large group of people.
• Speaker dependent models recognize speech patterns from only one person. Both models use mathematical and statistical formulas to yield the best word match for speech. A third variation of speaker models is now emerging, called speaker adaptive.
• Speaker adaptive systems usually begin with a speaker independent model and adjust it more closely to each individual during a brief training period.

• Most natural form of communication
• Differently abled people
• Illiterate
• Helplines
• Cars

System architecture (figure): Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Display, Feedback
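The "Analog to Digital" stage (Step 2 below) amounts to sampling the signal at regular time points and quantizing each sample to an integer. The sketch below assumes 16 kHz, 16-bit audio and uses a synthetic sine wave in place of a real microphone. It also cuts the samples into short overlapping frames, a common preparation before any phoneme-level analysis; the 25 ms / 10 ms frame sizes are typical values assumed here, not taken from the slides.

```python
import math


def digitize(signal, duration_s, sample_rate=16000, bits=16):
    """Sample a continuous signal and quantize each sample to a signed integer.

    `signal` is any function of time (in seconds) returning values in [-1.0, 1.0].
    """
    max_level = 2 ** (bits - 1) - 1            # e.g. 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                    # sampling: take values at discrete times
        value = max(-1.0, min(1.0, signal(t)))  # clip to the representable range
        samples.append(round(value * max_level))  # quantization: nearest integer level
    return samples


def split_into_frames(samples, frame_len=400, hop=160):
    """Cut samples into overlapping frames (25 ms windows every 10 ms at 16 kHz)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def tone(t):
    """A 440 Hz sine wave standing in for the analog microphone signal."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.1)
frames = split_into_frames(pcm)
print(len(pcm), len(frames))  # 1600 samples, 8 frames
```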

• Step 1: User input
The system captures the user's voice in the form of an analog acoustic signal.
• Step 2: Digitization
The analog acoustic signal is digitized.
• Step 3: Phonetic breakdown
The signal is broken into phonemes.

• Step 4: Statistical modeling
Phonemes are mapped to their phonetic representation using a statistical model.
• Step 5: Matching
Using the grammar, the phonetic representation and the dictionary, the system returns an n-best list (i.e., candidate words, each with a confidence score).
• Grammar: the union of words or phrases that constrains the range of input or output in the voice application.
• Dictionary: the table mapping phonetic representations to words (e.g., thu, thee → the).
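The matching step can be sketched as a lookup followed by ranking. This is a toy under stated assumptions: DICTIONARY and GRAMMAR are hypothetical stand-ins for an application's dictionary and grammar, the scores are invented, and a real engine searches over whole word sequences rather than single words.

```python
# Hypothetical dictionary: phonetic representation -> word
# (the "thu"/"thee" entries follow the slide's example; the rest are invented).
DICTIONARY = {"thu": "the", "thee": "the", "kat": "cat", "kot": "cot"}

# Hypothetical grammar: the only words this voice application accepts.
GRAMMAR = {"the", "cat"}


def n_best(hypotheses, n=3):
    """Turn scored phonetic hypotheses into an n-best list of words.

    `hypotheses` is a list of (phonetic_representation, score) pairs, where the
    score (assumed to lie in [0, 1]) comes from the statistical model of Step 4.
    Returns up to `n` (word, confidence) pairs, best first.
    """
    best = {}
    for phones, score in hypotheses:
        word = DICTIONARY.get(phones)
        if word is None or word not in GRAMMAR:
            continue  # unknown pronunciation, or a word the grammar rules out
        best[word] = max(score, best.get(word, 0.0))  # keep the best score per word
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


print(n_best([("kat", 0.82), ("kot", 0.75), ("thu", 0.40), ("thee", 0.55)]))
# [('cat', 0.82), ('the', 0.55)]
```

Here "kot" maps to "cot", which the grammar rules out, and the two pronunciations of "the" are merged under their best score.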

Approaches to ASR
• Template based
• Statistics based

Template-based approach
• Store examples of units (words, phonemes), then find the example that most closely fits the input.
• Extract features from the speech signal; then it's "just" a complex similarity matching problem, using solutions developed for all sorts of applications.
• OK for discrete utterances and a single user.
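A classic way to find the stored example that most closely fits the input, while tolerating differences in speaking rate, is dynamic time warping (DTW). The sketch below assumes each utterance has already been turned into a sequence of feature vectors; the two templates and the input are invented toy data (real systems use richer per-frame features such as spectral coefficients).

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance: Euclidean distance between the two frames.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, templates):
    """Return the label of the stored template closest to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


# Toy one-dimensional "feature" sequences, invented for illustration.
templates = {"yes": [[1], [3], [5], [3]], "no": [[5], [5], [1], [1]]}
spoken = [[1], [1], [3], [5], [5], [3]]  # "yes" spoken more slowly
print(recognize(spoken, templates))      # yes
```

The slower input still aligns perfectly with the "yes" template because DTW lets one frame of the template match several input frames.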

• Hard to distinguish very similar templates
• Quickly degrades when the input differs from the templates
• Therefore needs techniques to mitigate this degradation:
o More subtle matching techniques
o Multiple templates which are aggregated
• Taken together, these suggested …
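The idea of multiple templates which are aggregated can be illustrated by averaging several recordings of each word into one representative template and matching the input against those. To stay short, the sketch makes a simplifying assumption: each utterance has already been reduced to a fixed-length feature vector, so plain Euclidean distance can stand in for time alignment. The words and numbers are invented.

```python
import math


def centroid(vectors):
    """Average several equal-length feature vectors into one aggregate template."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def nearest_word(features, examples):
    """Match the input against one aggregated template per word.

    `examples` maps each word to several feature vectors recorded for it
    (for instance from different speakers).
    """
    aggregated = {word: centroid(vectors) for word, vectors in examples.items()}
    return min(aggregated, key=lambda word: math.dist(features, aggregated[word]))


examples = {
    "yes": [[1.0, 3.0, 5.0], [1.2, 2.8, 5.2], [0.8, 3.2, 4.8]],
    "no": [[5.0, 1.0, 1.0], [4.6, 1.4, 0.9]],
}
print(nearest_word([1.1, 3.1, 4.9], examples))  # yes
```

Averaging is only one way to aggregate; keeping all templates and taking the minimum distance is a common alternative.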

Statistics-based approach
• Collect a large corpus of transcribed speech recordings.
• Train the computer to learn the correspondences ("machine learning").
• At run time, apply statistical processes to search through the space of all possible solutions, and pick the statistically most likely one.
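Picking "the statistically most likely one" is usually written as the decision rule below, where O is the sequence of acoustic observations and W a candidate word sequence. P(O | W) is supplied by the acoustic model and P(W) by the language model; P(O) can be dropped because it is the same for every candidate. The notation is the standard textbook one, not taken from the slides.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```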

Acoustic and Lexical Models
• Analyse training data in terms of relevant features
• Learn different possibilities from a large amount of data:
o different phone sequences for a given word
o different combinations of elements of the speech signal for a given phone/phoneme
• Combine these into a Hidden Markov Model expressing the probabilities
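For the lexical side, one way to turn "different phone sequences for a given word" into probabilities is to estimate P(pronunciation | word) from counts and weight each pronunciation's acoustic score by it. The sketch covers only that part; the phone labels, counts and acoustic scores are invented, and in a full HMM system each phone would itself be modelled by an HMM.

```python
def pronunciation_probs(counts):
    """Estimate P(pronunciation | word) from how often each phone sequence was seen."""
    total = sum(counts.values())
    return {phones: count / total for phones, count in counts.items()}


def word_likelihood(pron_probs, acoustic_likelihood):
    """P(O | word) = sum over pronunciations of P(pron | word) * P(O | pron)."""
    return sum(p * acoustic_likelihood.get(phones, 0.0)
               for phones, p in pron_probs.items())


# Invented training counts for two pronunciations of "tomato".
tomato = pronunciation_probs({"t ah m ey t ow": 30, "t ah m aa t ow": 10})
# Invented acoustic-model scores P(O | pron) for one input utterance.
scores = {"t ah m ey t ow": 0.2, "t ah m aa t ow": 0.6}
print(word_likelihood(tomato, scores))  # 0.75 * 0.2 + 0.25 * 0.6, about 0.3
```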

• Real-world structures and processes have (or produce) observable outputs:
o Usually sequential (process unfolds over time)
o Cannot see the event producing the output
Example: speech signals

HMM Overview
• Machine learning method
• Makes use of state machines
• Based on probabilistic model
• Can only observe output from states,
not the states themselves
– Example: speech recognition
• Observe: acoustic signals
• Hidden States: phonemes
(distinctive sounds of a language)
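Recovering the most likely hidden states from the observed outputs is commonly done with the Viterbi algorithm. The sketch below is a toy under invented assumptions: two hidden phoneme states ("s" and "iy", as in the word "see"), two coarse acoustic observation labels ("hiss" and "vowel"), and made-up probabilities. Real recognizers use continuous acoustic features and far larger models.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               the state that path came from at time t - 1)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        best.append({
            s: max((prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states)
            for s in states
        })
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Invented toy model: hidden phonemes emit coarse acoustic labels.
STATES = ("s", "iy")
START_P = {"s": 0.8, "iy": 0.2}
TRANS_P = {"s": {"s": 0.6, "iy": 0.4}, "iy": {"s": 0.1, "iy": 0.9}}
EMIT_P = {"s": {"hiss": 0.9, "vowel": 0.1}, "iy": {"hiss": 0.2, "vowel": 0.8}}
OBSERVATIONS = ["hiss", "hiss", "vowel", "vowel"]

print(viterbi(OBSERVATIONS, STATES, START_P, TRANS_P, EMIT_P))  # ['s', 's', 'iy', 'iy']
```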

HMM Components
• A set of states (x’s)
• A set of possible output symbols
(y’s)
• A state transition matrix (a’s):
probability of making transition from
one state to the next
• Output emission matrix (b's):
probability of emitting/observing a symbol at a particular state
• Initial probability vector:
o probability of starting at a
particular state
o Not shown, sometimes assumed
to be 1
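With these components, the probability that the model produces a given observation sequence can be computed with the forward algorithm. In this hypothetical sketch PI is the initial probability vector, A holds the transition probabilities (a's) and B the emission probabilities (b's); the two states, two symbols and all numbers are invented.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Probability of the observation sequence under the HMM (forward algorithm)."""
    # alpha[s] = P(observations seen so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][obs]
                 for s in states}
    return sum(alpha.values())


# Invented two-state model using the slide's names: states x, symbols y.
STATES = ("x1", "x2")
SYMBOLS = ("y1", "y2")
PI = {"x1": 0.6, "x2": 0.4}                                  # initial probabilities
A = {"x1": {"x1": 0.7, "x2": 0.3}, "x2": {"x1": 0.4, "x2": 0.6}}  # a's
B = {"x1": {"y1": 0.5, "y2": 0.5}, "x2": {"y1": 0.1, "y2": 0.9}}  # b's

print(forward(["y1", "y2", "y2"], STATES, PI, A, B))
```

A quick sanity check: summed over every possible sequence of a fixed length, these probabilities add up to 1.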

HMM Advantages
• Effective
• Can handle variations in record structure:
o Optional fields
o Varying field ordering

• Digitization
o Converting the analogue signal into a digital representation.
• Signal processing
o Separating speech from background noise.
• Phonetics
o Variability in human speech.
• Phonology
o Recognizing individual sound distinctions (similar phonemes).
• Lexicology and syntax
o Disambiguating homophones.
o Features of continuous speech.
• Syntax and pragmatics
o Interpreting features.
o Filtering of performance errors (disfluencies).
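One simple way to separate speech from background noise is an energy-based voice activity detector. This is a sketch of that technique only, not how any particular recognizer works: the noise_frames and factor parameters are hypothetical choices, and it assumes the recording starts with a short stretch of background noise.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def detect_speech(frames, noise_frames=5, factor=4.0):
    """Label each frame True (speech) or False (background).

    Assumes the first `noise_frames` frames contain background noise only; a frame
    counts as speech when its energy exceeds `factor` times that noise level.
    """
    leading = frames[:noise_frames]
    noise_level = sum(frame_energy(f) for f in leading) / len(leading)
    return [frame_energy(f) > factor * noise_level for f in frames]


quiet, loud = [1, -1, 1, -1], [10, -10, 10, -10]  # invented sample frames
print(detect_speech([quiet] * 5 + [loud] * 2 + [quiet]))
```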

Speech recognition is still a very difficult problem. The main problems are:
• Speaker variability
Two speakers, or even the same speaker, will pronounce the same word differently.
• Channel variability
The quality and position of the microphone and the background environment affect the output.
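Channel variability is often reduced with cepstral mean normalization, a technique the slides do not mention. A fixed channel (for example, a particular microphone) multiplies the speech spectrum by a roughly constant response, which becomes a roughly constant additive offset after taking the log spectrum and then the cepstrum; subtracting each coefficient's mean over the utterance removes that offset. The cepstral frames and the offset below are invented.

```python
def cepstral_mean_normalize(frames):
    """Subtract the utterance-level mean from every cepstral coefficient."""
    n = len(frames)
    means = [sum(column) / n for column in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]  # invented cepstral frames
channel_offset = [0.5, -1.5]                   # invented offset from a different microphone
recorded = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_recorded = cepstral_mean_normalize(recorded)
# After normalization the two versions agree (up to floating-point rounding).
print(max(abs(x - y)
          for fa, fb in zip(normalized_clean, normalized_recorded)
          for x, y in zip(fa, fb)))
```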

• Speech recognition applications include:
o Voice dialling (e.g., "Call home"),
o Call routing (e.g., "I would like to make a collect call"),
o Simple data entry (e.g., entering a credit card number),
o Preparation of structured documents (e.g., a radiology report),
o Speech-to-text processing (e.g., word processors or emails), and
o Use in aircraft cockpits (usually termed Direct Voice Input).

• Medical transcription
• Military
• Telephony and other domains
• Serving the disabled
Further Applications
• Home automation
• Automobile audio systems
• Telematics

• Faster than handwriting.
• Allows for better spelling, whether in text or documents.
• Helpful for people with a mental or physical disability.
• Hands-free capability.

• No program is 100% accurate.
• Factors that affect the accuracy of speech recognition include slang, homonyms, signal-to-noise ratio, and overlapping speech.
• Can be expensive, depending on the program.
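Signal-to-noise ratio is usually quoted in decibels as 10 · log10(P_signal / P_noise), where P is the average power (mean squared amplitude) of each signal. A minimal sketch with invented sample values:

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    def power(samples):
        # Average power = mean squared amplitude.
        return sum(x * x for x in samples) / len(samples)

    return 10 * math.log10(power(signal) / power(noise))


print(snr_db([10, -10, 10, -10], [1, -1, 1, -1]))  # 20.0
```

A signal ten times larger in amplitude than the noise has 100 times its power, i.e. 20 dB; lower values mean the recognizer has less clean speech to work with.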

