Speech recognition final presentation
• What is speech recognition?
• Speech recognition technology has recently reached a level of performance and robustness that allows a user to communicate with a system by talking.
• Speech recognition is the process of decoding an acoustic speech signal, captured by a microphone or telephone, into a set of words.
• With the help of these words, the whole utterance is recognized word by word.
• Speaker models come in two main types: speaker independent and speaker dependent.
• Speaker-independent models recognize the speech patterns of a large group of people.
• Speaker-dependent models recognize speech patterns from only one person. Both models use mathematical and statistical formulas to yield the best word match for speech. A third variation of speaker models, called speaker adaptive, is now emerging.
• Speaker-adaptive systems usually begin with a speaker-independent model and adjust it more closely to each individual during a brief training period.
• Most natural form of communication
• Differently abled people 
• Illiterate 
• Helplines 
• Cars
[Figure: block diagram of a speech recognition system. Labels: Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Display, Feedback]
• Step 1: User input
The system captures the user’s voice in the form of an analog acoustic signal.
• Step 2: Digitization
The analog acoustic signal is digitized.
• Step 3: Phonetic breakdown
The signal is broken into phonemes.
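Step 2 can be illustrated with a minimal sketch of sampling and quantization. The 16 kHz sample rate, 16-bit depth, and the 440 Hz test tone are assumptions chosen for illustration; the slides do not specify them, and `digitize` and `tone` are hypothetical names.

```python
import math

def digitize(signal, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous-time signal and quantize it to signed integers."""
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16-bit audio
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to [-1, 1]
        samples.append(round(amplitude * max_level))  # quantize
    return samples

def tone(t):
    """Stand-in for the analog microphone signal (assumed 440 Hz sine)."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)
print(len(pcm), pcm[:5])
```

A real front end would read samples from an audio device or file instead of a function, but the two operations are the same: pick sample times, then round each amplitude to a fixed number of levels.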
• Step 4: Statistical modeling
Phonemes are mapped to their phonetic representation using a statistical model.
• Step 5: Matching
Using the grammar, the phonetic representation, and the dictionary, the system returns an n-best list (i.e., candidate words, each with a confidence score).
• Grammar: the set of words or phrases that constrains the range of input or output in the voice application.
• Dictionary: the mapping table between phonetic representations and words (e.g., "thu" and "thee" both map to "the").
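The matching step can be sketched as a lookup followed by a ranking. Everything below is hypothetical: the dictionary entries, the grammar, the confidence values, and the function name `n_best` are made up for illustration and do not come from any particular speech engine.

```python
# Hypothetical pronunciation dictionary: phonetic representation -> word.
DICTIONARY = {"thu": "the", "thee": "the", "kawl": "call", "hohm": "home"}

# Hypothetical grammar: the words this voice application accepts.
GRAMMAR = {"call", "home", "the"}

def n_best(hypotheses, n=3):
    """Turn (phonetic_representation, confidence) pairs into an n-best word list."""
    best = {}
    for phones, confidence in hypotheses:
        word = DICTIONARY.get(phones)
        if word is None or word not in GRAMMAR:
            continue                      # not in the dictionary or not allowed
        # Keep the highest confidence seen for each word.
        best[word] = max(confidence, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

result = n_best([("thu", 0.62), ("thee", 0.71), ("hohm", 0.40), ("zzz", 0.90)])
print(result)  # [('the', 0.71), ('home', 0.4)]
```

Note how "zzz" is dropped despite its high score: the dictionary and grammar constrain the output to words the application can act on.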
Approaches to ASR
• Template based
• Statistics based
Template-based approach
• Store examples of units (words, phonemes), then find the example that most closely fits the input
• Extract features from the speech signal; then it’s “just” a complex similarity-matching problem, using solutions developed for all sorts of applications
• OK for discrete utterances and a single user
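Finding the stored example that most closely fits the input can be sketched with dynamic time warping (DTW), a classic matching technique for sequences spoken at different speeds. The slides do not name a specific matching method, so DTW is an assumption here, and the one-dimensional feature values and template words are invented for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[rows][cols]

def closest_template(features, templates):
    """Return the word whose stored template has the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Hypothetical stored templates, one per word.
TEMPLATES = {"yes": [1.0, 3.0, 4.0, 2.0], "no": [4.0, 4.0, 1.0, 0.0]}

# A slower utterance of "yes": same shape, some frames repeated.
spoken = [1.0, 1.0, 3.0, 4.0, 4.0, 2.0]
best_word = closest_template(spoken, TEMPLATES)
print(best_word)  # yes
```

Real systems compare multi-dimensional feature vectors per frame rather than single numbers, but the alignment recurrence is the same.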
• Hard to distinguish very similar templates
• Quickly degrades when the input differs from the templates
• Therefore needs techniques to mitigate this degradation:
o More subtle matching techniques
o Multiple templates which are aggregated
• Taken together, these suggested …
Statistics-based approach
• Collect a large corpus of transcribed speech recordings
• Train the computer to learn the correspondences (“machine learning”)
• At run time, apply statistical processes to search through the space of all possible solutions, and pick the statistically most likely one
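Picking the statistically most likely solution is commonly written as a Bayes decision rule. The symbols are assumptions introduced here, not taken from the slides: $X$ is the sequence of acoustic features and $W$ is a candidate word sequence.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid X)
        \;=\; \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        \;=\; \arg\max_{W} P(X \mid W)\, P(W)
```

The denominator $P(X)$ can be dropped because it does not depend on $W$. In this reading, $P(X \mid W)$ is supplied by the acoustic model and $P(W)$ by the language model.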
Acoustic and Lexical Models
• Analyse training data in terms of relevant features
• Learn the different possibilities from a large amount of data:
o different phone sequences for a given word
o different combinations of elements of the speech signal for a given phone/phoneme
• Combine these into a Hidden Markov Model expressing the probabilities
• The real world has structures and processes that have (or produce) observable outputs:
o Usually sequential (the process unfolds over time)
o The event producing the output cannot be seen
• Example: speech signals
HMM Overview 
• Machine learning method 
• Makes use of state machines 
• Based on a probabilistic model
• Can only observe output from states, 
not the states themselves 
– Example: speech recognition 
• Observe: acoustic signals 
• Hidden States: phonemes 
(distinctive sounds of a language)
HMM Components 
• A set of states (x’s) 
• A set of possible output symbols 
(y’s) 
• A state transition matrix (a’s): 
probability of making transition from 
one state to the next 
• Output emission matrix (b’s): probability of emitting/observing a symbol at a particular state
• Initial probability vector:
o probability of starting at a particular state
o Not shown; sometimes a fixed start state is assumed, with probability 1
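The components listed above can be put together in a small sketch that recovers the most likely hidden state sequence with the Viterbi algorithm. All numbers are invented for illustration: two hidden phoneme states ("s", "iy"), two output symbols ("hiss", "tone"), and hand-picked probabilities. A real recognizer learns these from data and works on continuous acoustic features.

```python
# Hypothetical toy HMM: hidden states are phonemes, outputs are coarse sounds.
STATES = ["s", "iy"]
START = {"s": 0.6, "iy": 0.4}                         # initial probability vector
TRANS = {"s": {"s": 0.7, "iy": 0.3},                  # state transition matrix (a's)
         "iy": {"s": 0.4, "iy": 0.6}}
EMIT = {"s": {"hiss": 0.8, "tone": 0.2},              # output emission matrix (b's)
        "iy": {"hiss": 0.1, "tone": 0.9}}

def viterbi(observations):
    """Return the most likely hidden state path and its probability."""
    # prob[s]: probability of the best path so far that ends in state s.
    prob = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    path = {s: [s] for s in STATES}
    for obs in observations[1:]:
        new_prob, new_path = {}, {}
        for cur in STATES:
            # Best predecessor for reaching state `cur` at this step.
            prev = max(STATES, key=lambda p: prob[p] * TRANS[p][cur])
            new_prob[cur] = prob[prev] * TRANS[prev][cur] * EMIT[cur][obs]
            new_path[cur] = path[prev] + [cur]
        prob, path = new_prob, new_path
    best = max(STATES, key=lambda s: prob[s])
    return path[best], prob[best]

best_path, best_prob = viterbi(["hiss", "hiss", "tone"])
print(best_path, best_prob)
```

Only the observations are passed in; the state sequence is inferred, which is the sense in which the states are "hidden".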
HMM Advantages
• Effective
• Can handle variations in record structure:
o Optional fields
o Varying field ordering
• Digitization
o Converting the analogue signal into a digital representation.
• Signal processing
o Separating speech from background noise.
• Phonetics
o Variability in human speech.
• Phonology
o Recognizing individual sound distinctions (similar phonemes).
• Lexicology and syntax
o Disambiguating homophones.
o Features of continuous speech.
• Syntax and pragmatics
o Interpreting features.
o Filtering of performance errors (disfluencies).
Speech recognition is still a very difficult problem. The main problems are:
• Speaker variability
Two speakers, or even the same speaker, will pronounce the same word differently.
• Channel variability
The quality and position of the microphone and the background environment will affect the output.
Speech recognition applications include:
• Voice dialling (e.g., "Call home")
• Call routing (e.g., "I would like to make a collect call")
• Simple data entry (e.g., entering a credit card number)
• Preparation of structured documents (e.g., a radiology report)
• Speech-to-text processing (e.g., word processors or emails)
• Aircraft cockpits (usually termed Direct Voice Input)
• Medical transcription
• Military
• Telephony and other domains
• Serving the disabled
Further Applications
• Home automation
• Automobile audio systems
• Telematics
Advantages
• Faster than handwriting.
• Allows for better spelling in any written text or document.
• Helpful for people with a mental or physical disability.
• Hands-free capability.
Disadvantages
• No program is 100% accurate.
• Factors that affect the accuracy of speech recognition include slang, homonyms, signal-to-noise ratio, and overlapping speech.
• Can be expensive, depending on the program.