SPEECH TO TEXT
CONVERSION
Why this project?
• Speech recognition is one of the fastest-growing engineering technologies.
• Nearly 20% of the world's people live with some form of disability; many of
them are blind or unable to use their hands effectively. Voice input lets
them operate a computer and share information with other people.
• Our project recognizes speech and converts the input audio into text; it
also lets a user perform operations such as opening Calculator, WordPad, or
Notepad, and logging off the computer.
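The voice-command side of the project can be pictured as a lookup from recognized text to an operating system command. The following is only a minimal sketch, not the project's actual code: the `COMMANDS` table and the function names are hypothetical, and it assumes a Windows machine where `calc.exe`, `write.exe` (the WordPad launcher), `notepad.exe`, and `shutdown /l` are available, which varies by Windows version.

```python
import subprocess

# Hypothetical phrase-to-command table (an assumption for illustration).
# The executables are standard Windows programs; "shutdown /l" logs the user off.
COMMANDS = {
    "open calculator": ["calc.exe"],
    "open wordpad": ["write.exe"],
    "open notepad": ["notepad.exe"],
    "log off": ["shutdown", "/l"],
}


def command_for(recognized_text):
    """Return the command for the first known phrase in the text, or None."""
    # Normalize case and collapse repeated whitespace before matching.
    text = " ".join(recognized_text.lower().split())
    for phrase, argv in COMMANDS.items():
        if phrase in text:
            return argv
    return None


def run_voice_command(recognized_text):
    """Launch the matching program; return False if no phrase matched."""
    argv = command_for(recognized_text)
    if argv is None:
        return False
    subprocess.Popen(argv)  # starts the program without waiting for it to exit
    return True
```

Keeping the lookup (`command_for`) separate from the launch step makes the mapping easy to check without actually starting programs or logging off.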
APPLICATIONS
• In-car systems
• Health care
• Military
• Training air traffic controllers
• Telephony and other domains
• Education and daily life
PERFORMANCE
The performance of speech recognition systems is usually
evaluated in terms of accuracy and speed. Accuracy is
usually rated with word error rate (WER), whereas speed is
measured with the real-time factor (RTF). Other measures of
accuracy include Single Word Error Rate (SWER)
and Command Success Rate (CSR).
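Both headline measures are simple to compute. The sketch below assumes the standard definitions: WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript, and the real-time factor is processing time divided by audio duration. The function names are chosen here for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1.0 means recognition runs faster than real time."""
    return processing_seconds / audio_seconds


# One substitution ("the" -> "a") and one deletion ("now") over 4 reference words.
print(word_error_rate("open the calculator now", "open a calculator"))  # 0.5
print(real_time_factor(2.0, 10.0))  # 0.2
```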
Accuracy
The accuracy of speech recognition varies
with the following:
• Vocabulary size and confusability
• Speaker dependence vs. independence
• Isolated, discontinuous, or continuous speech
• Task and language constraints
• Read vs. spontaneous speech
SYSTEM BLOCK DIAGRAM
Acoustic Model
An acoustic model is created by taking audio recordings of speech, and their text
transcriptions, and using software to create statistical representations of the sounds that
make up each word. It is used by a speech recognition engine to recognize speech.
Language Model
A language model is a file containing the probabilities of sequences of words. Language
models are used for dictation applications, whereas grammars are used in desktop
command and control or telephony interactive voice response (IVR) type applications.
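To make "probabilities of sequences of words" concrete, here is a minimal sketch of a bigram language model estimated from a tiny made-up corpus. It is an illustration only: real language models are trained on far more text and use smoothing so that unseen word pairs do not get zero probability. The function names and the `<s>` start marker are assumptions of this example.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate P(next_word | word) from counts, with <s> marking sentence start."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        context_counts.update(words[:-1])            # every word that has a successor
        bigram_counts.update(zip(words, words[1:]))  # adjacent word pairs
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


def sequence_probability(model, sentence):
    """Multiply bigram probabilities; unseen pairs get 0.0 (no smoothing)."""
    words = ["<s>"] + sentence.lower().split()
    probability = 1.0
    for pair in zip(words, words[1:]):
        probability *= model.get(pair, 0.0)
    return probability


corpus = ["open the notepad", "open the calculator", "close the notepad"]
model = train_bigram_model(corpus)
print(sequence_probability(model, "open the notepad"))  # likely word order
print(sequence_probability(model, "notepad the open"))  # 0.0, never seen
```

A recognizer can use such scores to prefer a word sequence that is plausible in the language when the audio alone is ambiguous.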
Speech Engine
A speech engine is software that lets a computer work with spoken language. A
text-to-speech (TTS) engine plays back text in a spoken voice; a speech recognition
engine does the reverse and converts spoken audio into text.
HMM (HIDDEN MARKOV MODEL)
Hidden Markov models (HMMs) are statistical models that output a
sequence of symbols or quantities. HMMs are used in speech recognition
because a speech signal can be viewed as a piecewise stationary signal
or a short-time stationary signal. Over a short time scale (e.g., 10
milliseconds), speech can be approximated as a stationary process.
Speech can be thought of as a Markov model for many stochastic purposes.
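As an illustration of how an HMM is used once it exists, the sketch below runs Viterbi decoding on a toy two-state model. Everything in it is an assumption made for the example: the states ("silence" and "speech"), the "low"/"high" energy symbols standing in for quantized short-time frames, and all the probabilities. A real recognizer uses many more states, typically per phoneme, and acoustic feature vectors.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence and its probability."""
    # Each column maps state -> (best path probability ending here, previous state).
    columns = [
        {s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}
    ]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, prev = max(
                (columns[-1][p][0] * trans_p[p][s], p) for p in states
            )
            column[s] = (prob * emit_p[s][obs], prev)
        columns.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: columns[-1][s][0])
    best_prob = columns[-1][last][0]
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best_prob


# Toy model: all numbers below are made up for illustration.
states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.3, "high": 0.7},
}

path, prob = viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['silence', 'speech', 'speech']
```

The hidden states are never observed directly; the decoder infers them from the sequence of frame-level observations, which is what "hidden" refers to.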
HMM Codebook
HMM Speech Process
Speech to text conversion
Advantages
• Text can be entered through both the keyboard and voice input.
• Recognizes voice commands for Notepad such as open, save, and clear.
• Opens different Windows applications based on voice input.
• Lower operational costs.
• Provides significant help for people with disabilities.
• Reduces the time needed to write text.
WEAKNESS
Homonyms:
Words that are spelled differently and have different meanings but sound the
same, for example “there” and “their”, or “be” and “bee”. It is a challenge for a
computer to distinguish between such words that sound alike.
Speeches:
A second challenge is understanding the speech uttered by different users;
current systems have difficulty separating simultaneous speech from
multiple users.
Noise factor:
The program needs to hear the words uttered by a human distinctly and clearly.
Any extra sound can create interference, so the system should be placed away from
noisy environments and the user should speak clearly; otherwise the machine will
get confused and mix up the words.
FUTURE SCOPE
• Accuracy will keep improving.
• Dictation by speech recognition will gradually become accepted.
• Greater use will be made of “intelligent systems” that attempt to guess
what the speaker intended to say, rather than what was actually said, as
people often misspeak and make unintentional mistakes.
• Microphones and sound systems will be designed to adapt more quickly
to changing background noise levels and different environments, with
better recognition of extraneous material to be discarded.
