SPEECH USER INTERFACE
Husain Firoz Master
(302093)
Guided by Prof. Vidya Patil
Outline
• Introduction
• Need for SUI
• Expectations from SUI
• Overview of speech recognition
• Voice feature extraction and its techniques
• Implementation of SUI
• Applications
• Future scope
• Shortcomings
• Conclusion
SPEECH USER INTERFACE (SUI)
• A user interface that works with human voice commands
• It offers truly hands-free, eyes-free interaction with computers
• It provides an interface for operating computers, with the following considerations:
  • Technology support
  • User category
  • User support
NEED FOR A SPEECH USER INTERFACE
• SUIs offer truly hands-free, eyes-free interaction
• They have unmatched throughput rates
• They are the only plausible interaction modality for illiterate users across the world
• Speech is faster than typing on a keyboard
• They present opportunities for illiterate users in developing regions, giving them a feasible way to access computing
• However, they are not yet developed widely enough to support every type of user, language, or acoustic scenario
Expectations from a speech user interface (SUI)
• Recognize speech from any untrained user
• Understand the meaning of the spoken words
• Act on the meaning extracted from the words
• Handle multiple languages
• Support large vocabularies
• Provide a good level of fault tolerance
• Provide help and messages to users during interaction
• Operate in real time
Speech Recognition
• The translation of spoken words into text
• It is the ability of machines to understand natural spoken human language
• There are two types of speech recognition, speaker-dependent and speaker-independent:
  • Speaker-dependent: systems that require training on each user's voice
  • Speaker-independent: systems that do not require per-user training
• In essence, it is the process by which a computer maps an acoustic speech signal to text
SPEECH RECOGNITION MODEL
VOICE FEATURE EXTRACTION
• Voice feature extraction is also known as front-end processing
• It is performed in both recognition and training modes
• It converts the digital speech signal into sets of numerical descriptors called feature vectors
• The feature vectors contain the key characteristics of the speech signal
• The different types of features extracted from the voice are evaluated to determine their suitability for voice recognition
• MFCC is one of the most widely used feature extraction techniques; HMM and DTW are widely used to model and match the resulting feature sequences
Feature Extraction and Recognition Techniques
• Mel-frequency cepstral coefficients (MFCC)
• Hidden Markov model (HMM)
• Dynamic time warping (DTW)
• Fusion of HMM and DTW
Mel-frequency cepstral coefficients (MFCC)
• The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency
• Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up an MFC
• The frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than linearly spaced frequency bands
• About 20 MFCCs are commonly used in ASR, although 10-12 coefficients are often considered sufficient for coding speech
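The MFCC steps described above can be sketched with NumPy alone. This is a minimal illustration, not the implementation used in the presentation: the frame length, hop size, FFT size, filter count, and the common 2595 * log10(1 + f/700) mel formula are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    # One widely used mel-scale formula (an assumption; the slides do not name one).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Filter edges are equally spaced on the mel scale, not in Hz.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    signal = np.asarray(signal, dtype=float)
    # 1. Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Short-term power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Log energy in each mel-spaced band.
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # 4. Cosine transform (DCT-II) of the log band energies; keep the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energy @ basis.T   # shape: (n_frames, n_coeffs)
```

Each row of the returned array is one feature vector. With the default values, every 25 ms frame of 16 kHz audio yields 13 coefficients.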
Hidden Markov model (HMM)
• HMMs are used to represent the possible symbol sequences underlying speech utterances
• The states of an HMM represent basic spoken linguistic units (e.g. phonemes, or smaller phases of phonemes) that a speaker uses to pronounce a word
• For each word there are one or more composite HMMs, which model the probability of articulating the state sequence that represents the word
• HMMs are usually trained on large sets of recorded, feature-analyzed samples
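To make the idea of one HMM per word concrete, the sketch below scores an observation sequence against each word model with the scaled forward algorithm and picks the most likely word. It is a toy illustration under stated assumptions: observations are discrete symbols (for example, quantized feature vectors), and the two models for the hypothetical words "on" and "off" are hand-written rather than trained.

```python
import numpy as np

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    alpha = start * emit[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()
        log_lik += np.log(scale)   # -inf if the sequence is impossible under the model
        alpha = alpha / scale      # rescale to avoid numerical underflow
    return log_lik

# Hypothetical two-state left-to-right models: (start, transition, emission).
LEFT_TO_RIGHT = np.array([[0.7, 0.3],
                          [0.0, 1.0]])
MODELS = {
    "on":  (np.array([1.0, 0.0]), LEFT_TO_RIGHT, np.array([[0.9, 0.1], [0.1, 0.9]])),
    "off": (np.array([1.0, 0.0]), LEFT_TO_RIGHT, np.array([[0.1, 0.9], [0.9, 0.1]])),
}

def recognize(obs):
    # Choose the word whose model assigns the highest likelihood to the observations.
    return max(MODELS, key=lambda word: forward_log_likelihood(obs, *MODELS[word]))
```

In a real system the model parameters would be estimated from recorded samples (typically with Baum-Welch training), and the emissions would usually be continuous densities over feature vectors instead of a symbol table.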
Dynamic time warping (DTW)
• Dynamic time warping (DTW) is an algorithm for measuring the similarity between two spoken-word sequences that may vary in time or speed
• It compares two speech sequences by finding the time alignment between them that minimizes the accumulated distance
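A minimal sketch of the classic DTW recurrence follows. It assumes each sequence is a list of numbers or an array of feature vectors (one row per frame) and uses Euclidean distance between frames; the function name is hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated distance of the best time alignment between sequences a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]   # treat a list of scalars as one-dimensional frames
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j],      # a advances, b repeats a frame
                                 cost[i, j - 1],      # b advances, a repeats a frame
                                 cost[i - 1, j - 1])  # both advance
    return cost[n, m]
```

Because frames may be repeated, the same word spoken at half speed still matches closely: `dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])` is `0.0`.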
Fusion of HMM and DTW
• In this method, HMM and DTW are combined
• The results of HMM and DTW are combined as weighted mean vectors
• DTW finds the similarity between two signals based on their alignment in time
• Meanwhile, the HMM trains clusters and moves between them iteratively according to the likelihoods learned during training
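The slides do not specify how the two results are normalized before the weighted mean is taken. The sketch below shows one possible scheme, stated as an assumption: each recognizer's per-word scores are turned into a distribution with a softmax (HMM log-likelihoods directly, DTW distances negated so that smaller distances score higher), and the two distributions are then averaged with a weight. All names and the default weight are hypothetical.

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())   # subtract the max for numerical stability
    return e / e.sum()

def fuse_scores(hmm_log_likelihoods, dtw_distances, hmm_weight=0.5):
    """Weighted mean of normalized HMM and DTW scores, one entry per candidate word."""
    hmm_scores = softmax(hmm_log_likelihoods)                          # higher is better
    dtw_scores = softmax(-np.asarray(dtw_distances, dtype=float))      # lower distance is better
    return hmm_weight * hmm_scores + (1.0 - hmm_weight) * dtw_scores

def recognize(words, hmm_log_likelihoods, dtw_distances, hmm_weight=0.5):
    fused = fuse_scores(hmm_log_likelihoods, dtw_distances, hmm_weight)
    return words[int(np.argmax(fused))]
```

The weight decides which recognizer dominates when they disagree, so in practice it would be tuned on held-out recordings.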
SUI IMPLEMENTATION
SUI IMPLEMENTATION (contd.)
• Recording speech
• Applying noise cancellation
• Endpoint detection
• Feature extraction: the MFCC algorithm is used to extract the parameters that are later used in the training stage
• Normalization: the word length is calculated for all groups and averaged for each group
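The slides do not say which endpoint detection method was used. One simple and common approach, shown here purely as an assumed illustration, is to threshold the short-time energy of the recording: frames whose energy exceeds a fraction of the loudest frame are treated as speech. The frame sizes and threshold ratio are hypothetical defaults.

```python
import numpy as np

def detect_endpoints(signal, frame_len=400, hop=160, threshold_ratio=0.1):
    """Return (start, end) sample indices of the speech region, or None if none is found."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        return None   # recording is shorter than one frame
    # Short-time energy of each frame.
    energy = np.array([np.sum(signal[i * hop: i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    # Frames louder than a fraction of the loudest frame count as speech.
    active = np.flatnonzero(energy > threshold_ratio * energy.max())
    if active.size == 0:
        return None   # pure silence
    start = int(active[0] * hop)
    end = int(active[-1] * hop + frame_len)
    return start, end
```

Trimming the recording to `signal[start:end]` before feature extraction keeps leading and trailing silence out of the feature vectors. A fixed energy ratio is fragile in noisy rooms, which is one reason noise cancellation precedes this step.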
SUI IMPLEMENTATION (contd.)
• Training using HMM
• Fusion of HMM and DTW
• The recognized word or sentence is passed to the application
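Once a word sequence has been recognized, the application has to interpret it. As a hedged illustration of that last step, the sketch below shows how a speech-operated calculator might evaluate a recognized phrase. The vocabulary, the three-word "digit operator digit" grammar, and the function name are assumptions chosen to keep the command set small.

```python
import operator

# Hypothetical small vocabulary: spoken digits and three operators.
DIGITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
OPERATORS = {"plus": operator.add, "minus": operator.sub, "times": operator.mul}

def evaluate(recognized):
    """Evaluate a phrase of the form '<digit> <operator> <digit>'; return None if it does not fit."""
    words = recognized.lower().split()
    if len(words) != 3:
        return None
    left, op, right = words
    if left not in DIGITS or op not in OPERATORS or right not in DIGITS:
        return None   # reject anything outside the command vocabulary
    return OPERATORS[op](DIGITS[left], DIGITS[right])
```

Returning `None` for out-of-vocabulary input lets the interface ask the user to repeat the command instead of acting on a misrecognition.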
Applications
• Speech-operated calculator
• Voice dialing
• Intelligent voice assistants (personal agents) such as Apple Siri, Google voice talk, etc.
• Home and building automation systems using SUI
• Live subtitling on television
• Speech-to-text conversion and note-taking systems
Future Scope
• Speech user interfaces for learning foreign languages
• Dictation tools in the medical and legal professions
SHORTCOMINGS
Current recognizers are limited, so a practical SUI should follow these guidelines:
• Train the speech recognition system in the implementation environment
• Keep the vocabulary size small
• Keep each speech input short (word length)
• Use speech inputs that sound distinctly different from each other
• Keep the user interface simple
• Don't use speech to position objects
• Use a command-based user interface
• Allow users to quickly and easily turn the speech recognizer on and off
• Use a highly directional, noise-canceling microphone
CONCLUSION
