Visual-speech to text conversion applicable to telephone communication for deaf individuals
30TH APRIL 2013
INTRODUCTION 
• In lip-reading, speech is understood by interpreting the movements of the lips, face and tongue.
• The mapping between visible lip shapes and phonemes is not one-to-one.
• Phonemes that look alike on the lips are therefore impossible to distinguish using visual information alone.
• The Cued Speech system
• Developed by Cornett.
• Contains two components: the hand shape and the hand position relative to the face.
• Hand shapes code consonant phonemes.
• Hand positions code vowel phonemes.
• Improves speech perception to a large extent.
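To make the role of the hand cue concrete, here is a minimal toy sketch of how a hand shape can separate consonants that look the same on the lips. The grouping of /p/, /b/ and /m/ under one lip shape is standard; the hand-shape numbers and the function names are hypothetical and are not taken from the slides or from the actual Cued Speech chart.

```python
# Toy illustration only. The lip-shape grouping of /p/, /b/, /m/ is standard,
# but the hand-shape IDs below are hypothetical, not the real Cued Speech chart.
VISEME_OF = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
             "f": "labiodental", "v": "labiodental"}

# Hypothetical hand-shape IDs chosen so that look-alike consonants differ.
HAND_SHAPE_OF = {"p": 1, "b": 4, "m": 5, "f": 5, "v": 2}


def candidates_from_lips(viseme):
    """Return every consonant consistent with the lip shape alone."""
    return sorted(p for p, v in VISEME_OF.items() if v == viseme)


def candidates_from_cue(viseme, hand_shape):
    """Return consonants consistent with both the lip shape and the hand shape."""
    return [p for p in candidates_from_lips(viseme)
            if HAND_SHAPE_OF[p] == hand_shape]


lips_only = candidates_from_lips("bilabial")
with_cue = candidates_from_cue("bilabial", 4)
print(lips_only)  # ['b', 'm', 'p']
print(with_cue)   # ['b']
```

Note that two consonants may share a hand shape in this toy table (here /m/ and /f/) as long as their lip shapes differ, so the pair of lip shape and hand shape stays unambiguous.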
the Cued Speech system
AIM OF NEW SYSTEM 
• To investigate the design of a system able to automatically recognize Cued Speech and convert it to text.
• This would make it possible for deaf or speech-impaired individuals to communicate with each other and also with normal-hearing persons,
• using gestures captured by devices equipped with a camera.
METHODS 
• Corpus, feature extraction, and statistical modeling
• The speakers’ lips were painted blue, and color marks were placed on the speakers’ fingers.
• The data were derived from a video recording of the cuers pronouncing and coding in Cued Speech.
• Landmarks with different colors were placed on the fingers.
• The painted lips and colored landmarks allow a faster and more accurate image processing stage.
• The audio part of the video recording was synchronized with the image.
• An automatic image processing method was applied to the video to extract the lip parameters:
• lip width (A),
• lip aperture (B),
• lip area (S),
• pinching of the upper lip (Bsup),
• pinching of the lower lip (Binf).
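As a rough illustration of the first three lip parameters, the sketch below derives A, B and S from a closed lip contour. It assumes the contour is already available as an ordered list of (x, y) points and that width and aperture can be approximated by the horizontal and vertical extents; the slides do not give the exact definitions, and the pinching parameters (Bsup, Binf) are left out for the same reason. The function name `lip_parameters` is hypothetical.

```python
def lip_parameters(contour):
    """Compute width A, aperture B and area S of a closed lip contour.

    `contour` is a list of (x, y) points in order around the lips.
    """
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    width = max(xs) - min(xs)       # A: horizontal extent (assumed definition)
    aperture = max(ys) - min(ys)    # B: vertical extent (assumed definition)
    # S: polygon area via the shoelace formula.
    twice_area = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return {"A": width, "B": aperture, "S": abs(twice_area) / 2.0}


# Hypothetical diamond-shaped contour, in pixels.
params = lip_parameters([(0, 0), (20, 5), (40, 0), (20, -5)])
print(params)  # {'A': 40, 'B': 10, 'S': 200.0}
```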
• Concatenative feature fusion
• The system tracks the landmarks and extracts their xy coordinates at each time frame,
• and uses those values as features in the HMM modeling.
• It uses the concatenation of the synchronous lip shape and hand features as the joint feature vector, given by:
Terms of the joint feature vector equation:
• joint lip-hand feature vector,
• lip shape feature vector,
• hand feature vector,
• dimensionality of the joint feature vector.
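The equation itself did not survive as text. One common way to write concatenative fusion that matches the four terms named above is shown below. The symbols $O_t^{LH}$ (joint lip-hand feature vector at frame $t$), $O_t^{(L)}$ (lip shape feature vector), $O_t^{(H)}$ (hand feature vector) and $D$ (dimensionality of the joint vector) are assumed notation, not recovered from the slide.

```latex
O_t^{LH} = \left[ O_t^{(L)\top},\; O_t^{(H)\top} \right]^{\top} \in \mathbb{R}^{D},
\qquad D = D_L + D_H
```

Here $D_L$ and $D_H$ stand for the dimensionalities of the lip and hand vectors, again as assumed names.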
• Parameters used for lip shape modeling.
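The concatenative feature fusion described above can be sketched in a few lines. This is a minimal illustration, assuming both streams are already frame-synchronous; the function names and the feature values are hypothetical.

```python
def fuse_frame(lip_features, hand_features):
    """Concatenate synchronous lip and hand features into one joint vector."""
    return list(lip_features) + list(hand_features)


def fuse_sequence(lip_frames, hand_frames):
    """Fuse two frame-synchronous feature streams frame by frame."""
    if len(lip_frames) != len(hand_frames):
        raise ValueError("streams must have the same number of frames")
    return [fuse_frame(lip, hand) for lip, hand in zip(lip_frames, hand_frames)]


# Hypothetical values: 3 lip parameters (A, B, S) and one (x, y) hand landmark per frame.
lip_frames = [[40.0, 10.0, 200.0], [38.0, 12.0, 210.0]]
hand_frames = [[120.0, 85.0], [122.0, 83.0]]
joint = fuse_sequence(lip_frames, hand_frames)
print(len(joint), len(joint[0]))  # 2 5
```

The joint vectors would then serve as the observation sequence for the HMMs. Because fusion happens at the feature level, a single model sees both modalities at once, at the cost of requiring the two streams to be aligned frame by frame.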
RESULTS 
• Isolated word recognition
1. Recognition in normal-hearing subject
2. Recognition in deaf subject
3. Multi-speaker isolated word recognition: 
• Goal: investigate whether it is possible to train speaker-independent HMMs for Cued Speech recognition.
• The training data consisted of 750 words from the normal-hearing subject and 750 words from the deaf subject.
• For testing, 700 words from the normal-hearing subject and 700 words from the deaf subject were used.
• Each HMM state was modeled with a mixture of 4 Gaussian distributions.
• For lip shape and hand shape integration, concatenative feature fusion was used.
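To show what modeling a state with a mixture of 4 Gaussians means in practice, the sketch below evaluates the log-likelihood of one feature vector under such a mixture. Diagonal covariances are an assumption here (the slides do not state the covariance type), and all weights, means and variances are made-up values.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def log_gmm(x, weights, means, variances):
    """Log-likelihood of x under a Gaussian mixture (log-sum-exp for stability)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical 4-component mixture over a 2-dimensional feature vector.
weights = [0.25, 0.25, 0.25, 0.25]
means = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [0.0, 2.0]]
variances = [[1.0, 1.0]] * 4
score = log_gmm([0.0, 0.0], weights, means, variances)
print(score)
```

During decoding, a score of this kind would be computed for every state and every frame of the joint lip-hand feature sequence.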
4. Continuous phoneme recognition 
• Phoneme correct for continuous phoneme recognition in the case of a normal-hearing subject.
• Phoneme correct for continuous phoneme recognition in the case of a deaf subject.
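The slides report "phoneme correct" and "accuracy" without defining them. The sketch below assumes the definitions commonly used in HMM toolkits: percent correct ignores insertions, while percent accuracy also penalizes them. The counts in the example are hypothetical and are not taken from the experiments.

```python
def percent_correct(n_ref, deletions, substitutions):
    """Percent correct: share of reference units recognized, ignoring insertions."""
    return 100.0 * (n_ref - deletions - substitutions) / n_ref


def percent_accuracy(n_ref, deletions, substitutions, insertions):
    """Percent accuracy: like percent correct, but insertions are also penalized."""
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref


# Hypothetical counts: 1000 reference phonemes, 50 deleted, 150 substituted, 20 inserted.
print(percent_correct(1000, 50, 150))       # 80.0
print(percent_accuracy(1000, 50, 150, 20))  # 78.0
```

Under these definitions accuracy can never exceed percent correct, which matters when comparing figures reported with different metrics.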
CONCLUSION 
• Hand shapes and lip shapes were integrated using concatenative feature fusion, and HMM-based automatic recognition was conducted.
• For continuous phoneme recognition, 86% phoneme correct was achieved for the normal-hearing cuer and 82.7% phoneme correct for the deaf cuer.
• Isolated word recognition experiments in both normal-hearing and deaf subjects were also conducted, obtaining 94.9% and 89% accuracy, respectively.
• A multi-speaker experiment using data from both the normal-hearing and the deaf subject showed 89.6% word accuracy on average.
• This result indicates that training speaker-independent HMMs for Cued Speech using a large number of subjects should not face particular difficulties.
REFERENCES 
• G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, “Recent advances in the automatic recognition of audiovisual speech,” in Proceedings of the IEEE, vol. 91, issue 9, pp. 1306–1326, 2003.
• S. Nakamura, K. Kumatani, and S. Tamura, “Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition,” in Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI’02), p. 305, 2002.
• R. O. Cornett, “Cued speech,” American Annals of the Deaf, vol. 112, pp. 3–13, 1967.
• J. Leybaert, “Phonology acquired through the eyes and spelling in deaf children,” Journal of Experimental Child Psychology, vol. 75, pp. 291–318, 2000.
Thank you!
ANY QUESTIONS?

