Speech Recognition, by Charu Joshi
Introduction
What is speech recognition? Also known as automatic speech recognition or computer speech recognition, it means that the computer understands a spoken voice and performs the required task.
Where can it be used?
- Dictation
- System control/navigation
- Commercial/industrial applications
- Voice dialing
Recognition (block diagram; components shown: Voice Input, Analog to Digital, Speech Engine with Acoustic Model and Language Model, Display, Feedback)
Acoustic Model
An acoustic model is created by taking audio recordings of speech and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. A speech recognition engine uses it to recognize speech.
Language Model
Language modeling is used in many natural language processing applications, such as speech recognition. It tries to capture the properties of a language and to predict the next word in a speech sequence.
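The idea of predicting the next word in a sequence can be illustrated with a very small bigram model. This is only a sketch under stated assumptions: the toy corpus and the function names `train_bigram_model` and `predict_next` are invented for illustration, and real language models are trained on far larger corpora with smoothing.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word is followed by each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Toy corpus, invented for illustration only.
corpus = [
    "switch on the tv",
    "switch on the lamp",
    "switch off the tv",
    "turn on the tv",
]
model = train_bigram_model(corpus)
print(predict_next(model, "switch"))
print(predict_next(model, "the"))
```

A recognizer could use such counts to prefer word sequences that are likely in the language when the acoustic evidence alone is ambiguous.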
TYPES OF VOICE RECOGNITION
There are two types of speech recognition: speaker-dependent and speaker-independent. Speaker-dependent software is commonly used for dictation, while speaker-independent software is more commonly found in telephone applications.
Speaker-dependent software works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first "train" the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.
TYPES OF VOICE RECOGNITION (CONTD.)
Speaker-independent software is designed to recognize anyone's voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses can't ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software. Speaker-independent engines generally compensate by limiting the grammars they use: with a smaller list of recognized words, the speech engine is more likely to correctly recognize what a speaker said.
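The effect of a limited grammar can be sketched as follows. This is a rough illustration, not how a real engine scores audio: string similarity from Python's `difflib` stands in for acoustic scoring, and the word list `SMALL_GRAMMAR` and the function `snap_to_grammar` are hypothetical.

```python
import difflib

# Hypothetical grammar for a telephone banking menu (an assumption for this sketch).
SMALL_GRAMMAR = ["yes", "no", "balance", "transfer", "operator"]

def snap_to_grammar(hypothesis, grammar=SMALL_GRAMMAR, cutoff=0.6):
    """Map a noisy recognition hypothesis to the closest allowed word, or None."""
    matches = difflib.get_close_matches(hypothesis.lower(), grammar, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(snap_to_grammar("ballance"))
print(snap_to_grammar("operater"))
print(snap_to_grammar("xyzzy"))
```

With only five allowed words, a slightly wrong hypothesis still lands on the intended entry, while input that resembles nothing in the grammar is rejected.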
How do humans do it?
Articulation produces sound waves, which the ear conveys to the brain for processing.
How might computers do it?
- Digitization
- Acoustic analysis of the speech signal
- Linguistic interpretation
(Diagram: Acoustic waveform -> Acoustic signal -> Speech recognition)
DIFFERENT PROCESSES INVOLVED
- Digitization: converting the analogue signal into a digital representation
- Signal processing: separating speech from background noise
- Phonetics: variability in human speech
- Phonology: recognizing individual sound distinctions (similar phonemes). Phonology is the systematic use of sound to encode meaning in any spoken human language.
- Lexicology and syntax: lexicology is the part of linguistics that studies words, their nature and meaning, their elements, the relations between words, word groups, and the whole lexicon.
DIFFERENT PROCESSES INVOLVED (CONTD.)
- Syntax and pragmatics
- Semantics deals with meaning.
- Pragmatics is concerned with bridging the explanatory gap between sentence meaning and the speaker's meaning.
Digitization
- Analogue to digital conversion: sampling and quantizing
- Sampling converts a continuous signal into a discrete signal
- Quantizing approximates a continuous range of values with a finite set of discrete values
- Use filters to measure energy levels at various points on the frequency spectrum
- Knowing the relative importance of different frequency bands (for speech) makes this process more efficient
- E.g. high-frequency sounds are less informative, so they can be sampled using a broader bandwidth (log scale)
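Sampling and quantizing can be sketched in a few lines. The 440 Hz test tone, the 8000 Hz sample rate and the 8-bit depth are assumptions chosen for illustration, as are the function names. The `hz_to_mel` helper shows one common logarithmic frequency scale (the mel scale); the slides only say "log scale" and do not name a specific one.

```python
import math

def sample(signal, duration_s, sample_rate_hz):
    """Sampling: evaluate a continuous-time signal at evenly spaced instants."""
    n = round(duration_s * sample_rate_hz)
    return [signal(i / sample_rate_hz) for i in range(n)]

def quantize(samples, bits):
    """Quantizing: map values in [-1.0, 1.0] to signed integer levels."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in samples]

def hz_to_mel(freq_hz):
    """Mel scale: roughly linear at low frequencies, logarithmic at high ones."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def tone(t):
    """A 440 Hz sine wave standing in for the analogue input (an assumption)."""
    return math.sin(2 * math.pi * 440 * t)

samples = sample(tone, 0.01, 8000)  # 10 ms at 8 kHz
pcm = quantize(samples, 8)          # 8-bit signed levels
print(len(pcm), min(pcm), max(pcm))
```

On the mel scale, equal steps cover wider and wider frequency ranges as frequency rises, which matches the idea of using broader bands for the less informative high frequencies.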
Separating speech from background noise
- Noise-cancelling microphones
- Two mics, one facing the speaker, the other facing away
- Ambient noise is roughly the same for both mics
- Knowing which bits of the signal relate to speech
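The two-microphone idea can be sketched as a simple subtraction. This is an idealized illustration: it assumes the ambient noise is identical and perfectly time-aligned on both microphones and that the rear microphone picks up no speech, which real devices only approximate. The function name and the sample values are invented.

```python
def cancel_noise(front_mic, rear_mic):
    """Estimate speech by subtracting the rear (noise-only) signal from the front one."""
    if len(front_mic) != len(rear_mic):
        raise ValueError("both microphone signals must have the same length")
    return [f - r for f, r in zip(front_mic, rear_mic)]

# Invented sample values for illustration.
speech = [0.0, 0.5, -0.5, 0.25]
noise = [0.1, -0.2, 0.3, 0.0]
front = [s + n for s, n in zip(speech, noise)]  # speaker-facing mic: speech plus noise
rear = list(noise)                              # mic facing away: noise only
estimate = cancel_noise(front, rear)
print(estimate)
```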
EVOLUTION OF VOICE RECOGNITION
- Pattern Matching
- Interactive Voice Response (IVR)
- Dictation
- Speech Integration into Applications
- Hands Free, Eyes Free
Process of speech recognition
(Diagram: inputs S1, S2, SK, SN pass through three stages: Speaker Recognition, Speech Recognition, and parsing and arbitration. Example utterance: "Switch on Channel 9".)
- Speaker Recognition ("Authentication"): Who is speaking? Annie, David, Cathy
- Speech Recognition ("Understanding"): What is he saying? On, Off, TV, Fridge, Door
- Parsing and arbitration ("Inferring and execution"): What is he talking about? "Switch", "to", "channel", "nine"; Channel -> TV, Dim -> Lamp, On -> TV, Lamp
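The parsing and arbitration step can be sketched as narrowing down which device a command refers to. The keyword table below reuses the mappings shown on the slide (Channel -> TV, Dim -> Lamp, On -> TV, Lamp); the function name `arbitrate` and the set-intersection strategy are assumptions made for illustration, not the presenter's stated method.

```python
# Keyword-to-device mappings taken from the slide.
KEYWORD_TO_DEVICES = {
    "channel": {"TV"},
    "dim": {"Lamp"},
    "on": {"TV", "Lamp"},
}

def arbitrate(tokens):
    """Intersect the candidate devices of every known keyword in the utterance."""
    candidates = None
    for token in tokens:
        devices = KEYWORD_TO_DEVICES.get(token.lower())
        if devices is None:
            continue  # words such as "switch" or "nine" do not constrain the device
        candidates = set(devices) if candidates is None else candidates & devices
    return candidates or set()

print(arbitrate(["Switch", "on", "channel", "nine"]))
```

Here "on" alone is ambiguous between the TV and the lamp, and the additional keyword "channel" resolves the command to the TV.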
Framework of Voice Recognition
(Diagram: inputs S1, S2, SK, SN; stages shown: Face Recognition ("Authentication"), Gesture Recognition ("Understanding"), parsing and arbitration ("Inferring and execution"))
Speaker Recognition
Definition
- It is the method of recognizing a person based on their voice.
- It is one of the forms of biometric identification.
- It depends on speaker-specific characteristics.
Generic Speaker Recognition System
(Diagram: Speech signal -> Preprocessing -> analysis frames -> Feature Extraction -> feature vector -> Pattern Matching -> score. A second path, Preprocessing -> Feature Extraction -> Speaker Model, supplies the model used for pattern matching.)
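The stages named in the diagram (preprocessing into frames, feature extraction, pattern matching against a speaker model, score) can be sketched end to end. This is a toy illustration under heavy assumptions: frame energy and zero-crossing rate stand in for the richer features real systems use, each "speaker" is a synthetic sine tone, and all function names are hypothetical. The names Annie and David are borrowed from the earlier slide.

```python
import math

def frames(signal, size):
    """Preprocessing: cut the signal into non-overlapping analysis frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def features(frame):
    """Feature extraction: (mean energy, zero-crossing rate) for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(frame) - 1))

def mean_vector(signal, size=40):
    """Average the per-frame feature vectors; used as a very simple speaker model."""
    vecs = [features(f) for f in frames(signal, size)]
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))

def score(model, signal):
    """Pattern matching: negative Euclidean distance, so higher means closer."""
    return -math.dist(model, mean_vector(signal))

def identify(models, signal):
    """Return the enrolled speaker whose model scores highest."""
    return max(models, key=lambda name: score(models[name], signal))

def tone(freq_hz, n=400, rate=8000):
    """Synthetic stand-in for a voice: a sine wave at the given pitch."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

# Enrollment with synthetic "voices" (assumed pitches, for illustration only).
models = {"Annie": mean_vector(tone(220)), "David": mean_vector(tone(110))}
print(identify(models, tone(215)))
print(identify(models, tone(115)))
```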
ADVANTAGES
- People with disabilities
- Organizations: increases productivity, reduces costs and errors
- Lower operational costs
- Advances in technology will allow consumers and businesses to implement speech recognition systems at a relatively low cost.
- Cell-phone users can dial pre-programmed numbers by voice command.
- Users can trade stocks through a voice-activated trading system.
- Speech recognition technology can also replace touch-tone dialing, making it possible to target customers who speak different languages.
DISADVANTAGES
- It is difficult to build a perfect system.
- Conversation involves more than just words (non-verbal communication, stutters, etc.).
- Every human being differs in voice, mouth, and speaking style.
- Filtering background noise is a task that can be difficult even for humans.
Future of Speech Recognition
- Accuracy will become better and better.
- Dictation speech recognition will gradually become accepted.
- Small hand-held writing tablets for computer speech recognition dictation and data entry will be developed, as faster processors and more memory become available.
- Greater use will be made of "intelligent systems" which will attempt to guess what the speaker intended to say, rather than what was actually said, as people often misspeak and make unintentional mistakes.
- Microphone and sound systems will be designed to adapt more quickly to changing background noise levels and different environments, with better recognition of extraneous material to be discarded.
