Introduction
• Speech recognition is the process of converting an acoustic signal, captured
by a microphone or a telephone, to a set of words.
• The recognized words can be an end in themselves, as in applications such
as command and control, data entry, and document preparation.
• They can also serve as the input to further linguistic processing in order to
achieve speech understanding.
• It is also known as automatic speech recognition (ASR), computer speech
recognition, or speech to text (STT).
History
• Around since the 1960s, ASR has seen steady, incremental improvement
over the years.
• It has benefited greatly from increased processing speed of computers in
the last decade, entering the marketplace in the mid-2000s.
• Early systems were acoustic phonetics-based and worked with small
vocabularies to identify isolated words.
• Over the years, vocabularies have grown while ASR systems have become
statistics-based.
• They now have large vocabularies and can recognize continuous speech.
Basic Structure
Digital Sampling
• When you speak, you create vibrations in the air. The analog-to-digital
converter (ADC) translates this analog wave into digital data that the
computer can understand.
• To do this, it samples, or digitizes, the sound by taking precise
measurements of the wave at frequent intervals.
• The system filters the digitized sound to remove unwanted noise, and
sometimes to separate it into different bands of frequency.
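The sampling step above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular ADC or recognizer works: the function name `sample_and_quantize`, the 1 kHz test tone, the 8 kHz sample rate and the 8-bit resolution are all assumptions chosen to keep the numbers small.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bits):
    """Sample a unit-amplitude sine wave and round each sample to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1                  # 127 for 8-bit samples
    n_samples = round(sample_rate_hz * duration_s)  # number of measurements taken
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of the n-th measurement
        amplitude = math.sin(2 * math.pi * freq_hz * t)
        codes.append(round(amplitude * max_code))   # quantize to the nearest integer code
    return codes

# One millisecond of a 1 kHz tone, measured 8000 times per second (assumed values).
samples = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000, duration_s=0.001, bits=8)
print(samples)  # [0, 90, 127, 90, 0, -90, -127, -90]
```

Each number is one "precise measurement of the wave"; a higher sample rate or more bits per sample gives a closer copy of the original sound at the cost of more data.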
Acoustic model
• Next the signal is divided into small segments as short as a few
hundredths of a second, or even thousandths in the case of plosive
consonant sounds -- consonant stops produced by obstructing airflow
in the vocal tract -- like "p" or "t."
• The program then matches these segments to known phonemes in
the appropriate language.
• A phoneme is the smallest unit of sound in a language that can
distinguish one word from another -- a representation of the sounds we
make and put together to form meaningful expressions.
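Dividing the signal into short segments might look like the sketch below. The 25 ms segment length and 10 ms step are assumed values, picked only because they fall in the "few hundredths of a second" range mentioned above; the helper name `split_into_frames` is hypothetical, and the phoneme matching itself is not shown.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=25, hop_ms=10):
    """Cut a list of samples into overlapping fixed-length segments (frames)."""
    frame_len = sample_rate_hz * frame_ms // 1000   # samples per frame
    hop_len = sample_rate_hz * hop_ms // 1000       # samples between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames

# Placeholder input: one second of silence sampled at 16 kHz (assumed rate).
signal = [0] * 16000
frames = split_into_frames(signal, sample_rate_hz=16000)
print(len(frames), len(frames[0]))  # 98 400
```

Overlapping the segments is a design choice in this sketch so that a short sound such as a plosive is not split awkwardly across a boundary.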
Language model
• The program examines phonemes in the context of the other
phonemes around them.
• It runs this contextual phoneme plot through a complex statistical
model and compares it to a large library of known words, phrases
and sentences.
• The program then determines what the user was probably saying and
either outputs it as text or issues a computer command.
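As a rough illustration of "determining what the user was probably saying", the sketch below scores two candidate word sequences that sound alike and keeps the more probable one. It works on words rather than phonemes for brevity, and every probability in `BIGRAM_PROB` is a made-up toy value, not a measurement from any real system.

```python
import math

# Hypothetical probabilities P(next word | previous word); "<s>" marks the sentence start.
BIGRAM_PROB = {
    ("<s>", "recognize"): 0.02,
    ("recognize", "speech"): 0.30,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.20,
    ("a", "nice"): 0.01,
    ("nice", "beach"): 0.005,
}
UNSEEN_PROB = 1e-6  # assumed fallback for word pairs missing from the table

def log_score(words):
    """Sum of log-probabilities of each word given the word before it."""
    score = 0.0
    prev = "<s>"
    for word in words:
        score += math.log(BIGRAM_PROB.get((prev, word), UNSEEN_PROB))
        prev = word
    return score

candidates = [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]
best = max(candidates, key=log_score)
print(" ".join(best))  # recognize speech
```

Log-probabilities are added instead of multiplying raw probabilities so that long sequences do not underflow to zero.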
Speech Recognition Technology
Statistical Modeling Systems
• These systems use probability and mathematical functions to
determine the most likely outcome.
• The two models that dominate the field today are the Hidden Markov
Model and Neural Networks.
• These methods involve complex mathematical functions, but
essentially, they take the information known to the system to figure
out the information hidden from it.
Hidden Markov Model (HMM)
• In this model, each phoneme is like a link in a chain, and the
completed chain is a word.
• The chain branches off in different directions as the program
attempts to match the digital sound with the phoneme that's most
likely to come next.
• During this process, the program assigns a probability score to each
phoneme, based on its built-in dictionary and user training.
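The chain-following idea can be illustrated with the Viterbi algorithm, a standard way to find the most likely hidden state sequence in an HMM. The slides do not say which decoding method a given product uses, so this is only a sketch; the three phoneme states, the three observation labels and all probabilities below are invented for the example.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable sequence of hidden states for the observations."""
    # best[t][s] = (probability of the best path ending in state s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Walk backwards from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path

# Toy model (all values are assumptions for illustration).
states = ["p", "a", "t"]
start_p = {"p": 0.6, "a": 0.1, "t": 0.3}
trans_p = {
    "p": {"p": 0.3, "a": 0.6, "t": 0.1},
    "a": {"p": 0.1, "a": 0.4, "t": 0.5},
    "t": {"p": 0.2, "a": 0.3, "t": 0.5},
}
emit_p = {
    "p": {"burst_low": 0.7, "voiced": 0.1, "burst_high": 0.2},
    "a": {"burst_low": 0.1, "voiced": 0.8, "burst_high": 0.1},
    "t": {"burst_low": 0.2, "voiced": 0.1, "burst_high": 0.7},
}
observed = ["burst_low", "voiced", "voiced", "burst_high"]
decoded = viterbi(observed, states, start_p, trans_p, emit_p)
print(decoded)  # ['p', 'a', 'a', 't']
```

Here the acoustic segments are the known information and the phonemes are the hidden information; the repeated "a" reflects a vowel that lasts for two segments.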
Markov Model
Neural Networks
A class of statistical models may be called "neural" if they

• consist of sets of adaptive weights, i.e. numerical parameters that are
tuned by a learning algorithm, and
• are capable of approximating non-linear functions of their inputs.
The adaptive weights are conceptually connection strengths between
neurons, which are activated during training and prediction.
Each circular node represents an artificial neuron and an arrow represents a
connection from the output of one neuron to the input of another.
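A minimal sketch of such a network, with one hidden layer, is shown below. The layer sizes, the tanh and sigmoid activation functions and the weight values are all assumptions for illustration; a learning algorithm (not shown) would adjust the weights during training.

```python
import math

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Compute the output of a tiny network: inputs -> hidden neurons -> one output neuron."""
    # Each hidden neuron takes a weighted sum of the inputs, then applies a non-linear function.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron does the same with the hidden activations as its inputs.
    z = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes the result into the range (0, 1)

# Two inputs, two hidden neurons, one output; the weights are arbitrary example values.
output = forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[0.8, -0.2], [-0.5, 0.9]],
    hidden_biases=[0.1, -0.3],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(output)
```

Each entry in the weight lists corresponds to one arrow in a diagram of this kind, and each call to `tanh` or the sigmoid corresponds to one node.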
Program Training
• Recognition is more complicated for phrases and sentences than for
isolated words -- the system has to figure out where each word stops and starts.
• The statistical systems need lots of representative training data to reach
their optimal performance, sometimes on the order of thousands of hours of
human-transcribed speech and hundreds of megabytes of text.
• The training data are used to create acoustic models of words, word lists
and multi-word probability networks.
• The details can make the difference between a well-performing system and
a poorly-performing system -- even when using the same basic algorithm.
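One simple reading of "multi-word probability networks" is a table of how often each word follows another in the training text. The sketch below builds such a table from three made-up transcripts; the function name `train_bigram_model` and the transcripts are hypothetical, and real systems use far more data and add smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Estimate P(next word | previous word) from word-pair counts in the sentences."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()   # "<s>" marks the sentence start
        for prev, nxt in zip(words, words[1:]):
            pair_counts[prev][nxt] += 1
    model = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        model[prev] = {word: count / total for word, count in counter.items()}
    return model

# Tiny placeholder corpus of human-transcribed commands (invented for the example).
transcripts = ["turn the light on", "turn the radio off", "turn the light off"]
model = train_bigram_model(transcripts)
print(model["the"])    # "light" is twice as likely as "radio" after "the"
print(model["light"])  # "on" and "off" are equally likely after "light"
```

With only three sentences the estimates are crude, which is one way to see why large amounts of transcribed speech and text are needed.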
Applications
• Transcription: dictation, information retrieval
• Command and control: data entry, device control, navigation, call routing
• Information access: airline schedules, stock quotes, directory assistance
• Problem solving: travel planning, logistics
Weaknesses and Flaws
• Low signal-to-noise ratio: The program needs to "hear" the words
spoken distinctly, and any extra noise introduced into the sound will
interfere with this.
• Overlapping speech: Current systems have difficulty separating
simultaneous speech from multiple users.
• Intensive use of computer power.
• Homophones (words that sound alike), e.g. "there" and "their," "air"
and "heir," "be" and "bee."
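Signal-to-noise ratio is usually expressed in decibels as ten times the base-10 logarithm of signal power over noise power. The sketch below applies that definition to two short made-up sample lists; the function name `snr_db` and the sample values are assumptions for illustration.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from the average power of each sample list."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)

speech = [0.5, -0.5, 0.5, -0.5]      # placeholder speech samples
hiss = [0.05, -0.05, 0.05, -0.05]    # placeholder background noise, ten times quieter
print(round(snr_db(speech, hiss), 1))  # 20.0
```

A lower figure means the noise is closer in strength to the speech, which makes the words harder for the program to pick out.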
Major Challenges
• Making a system that can flawlessly handle roadblocks like
slang, dialects, accents and background noise.
• The different grammatical structures used by languages can also pose
a problem. For example, Arabic sometimes uses single words to
convey ideas that are entire sentences in English.
The Future of Speech Recognition
• The Defense Advanced Research Projects Agency (DARPA) has three teams
of researchers working on Global Autonomous Language Exploitation
(GALE), a program that will take in streams of information from foreign
news broadcasts and newspapers and translate them.
• It hopes to create software that can instantly translate between two
languages with at least 90 percent accuracy.
• DARPA is also funding an R&D effort called TRANSTAC to enable
soldiers to communicate more effectively with civilian populations in
non-English-speaking countries.
Conclusion
At some point in the future, speech recognition may become speech
understanding.
The statistical models that allow computers to decide what a person
just said may someday allow them to grasp the meaning behind the
words.
Although it is a huge leap in terms of computational power and
software sophistication, some researchers argue that speech
recognition development offers the most direct line from the
computers of today to true artificial intelligence.