Speech enhanced gesture based navigation for Google Maps
Speech Enhanced Gesture Based Navigation System for Google Maps
An exploration in Multimodal HCI
Under the Guidance of: Asst. Professor Manoj Majhi
Vikas Luthra | Himanshu Bansal | Maulishree Pandey
Goal of Our Journey
Abstract
• The conventional way of using the features of Google Maps on touch-based devices relies on
the touch gestures defined for those devices.
• For certain touch-based devices, such as public kiosks and large touch-screens, it is possible
to define in-air (3D) gestures.
• Coupled with basic speech commands, these gestures form a new group of interactions for
accessing Google Maps.
• However, the usability of this new group of interactions must be measured against the
conventional touch-based gestures before substitution is considered.
Final Destination: Aim
• Define the gestures and speech commands for the features of Google Maps, and evaluate them
against the existing interactions
• Compare and evaluate the usability of 3D gestures combined with speech against touch-based
gestures for using Google Maps on a large touchscreen
The Route to follow for our Journey: Methodology
Literature Research (Aug 1st week – Sept 1st week)
Background of the technologies
Multimodal HCI theory
Similar Works
System Definition and Design (Sept 2nd week – Oct 1st week)
Selection of case-study features of Google Maps
Use-case scenarios
Feature-wise gesture definition
Addition of voice commands where gesture control is not applicable
Prototype Development (Oct 2nd week – Nov 4th week)
Skeleton-Based Gesture Tracking System Development
Speech Recognition System Development
Debugging and Refinement
Comparative Study (Next Semester)
Experiments comparing two solutions that use different gestures and voice commands
Statistical analysis
Conclusion (Next Semester)
Inferences and Guidelines
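The slides do not say which statistical test is planned for the comparative study. As one possible sketch, assuming a within-subject design in which every participant completes the same task with both solutions, a paired t statistic on task completion times could be computed as below. The function name and the sample numbers are hypothetical placeholders, not measurements from this project.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples.

    Each index must refer to the same participant in both sequences.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two participants")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Placeholder completion times in seconds (made up for illustration only).
touch_times_s = [10.0, 12.0, 11.0, 13.0]
gesture_speech_times_s = [12.0, 15.0, 12.0, 15.0]

t_stat, dof = paired_t(touch_times_s, gesture_speech_times_s)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The t statistic would then be compared against the t distribution with the reported degrees of freedom. A different design (for example, separate participant groups per solution) would call for a different test.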
Mode of Transportation : Microsoft Kinect
Microsoft Kinect
• The Kinect sensor can build a 'depth map' of the area in front of it.
• This depth map is used to determine the distance of the various objects in front of the Kinect.
• One popular use is recognizing and tracking people standing in front of the sensor.
• The Kinect has four microphones to pick up audio.
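To make the depth-map idea concrete, here is a minimal Python sketch that is independent of any Kinect API. It assumes a depth frame is a grid of per-pixel distances in millimetres and that a value of 0 means "no reading"; both the data layout and the function names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

DepthFrame = List[List[int]]  # rows of per-pixel distances in millimetres


def nearest_point(frame: DepthFrame) -> Optional[Tuple[int, int, int]]:
    """Return (row, col, depth_mm) of the closest valid pixel, or None."""
    best: Optional[Tuple[int, int, int]] = None
    for r, row in enumerate(frame):
        for c, depth in enumerate(row):
            if depth <= 0:  # assumption: 0 marks a pixel with no reading
                continue
            if best is None or depth < best[2]:
                best = (r, c, depth)
    return best


def foreground_mask(frame: DepthFrame, near_mm: int, far_mm: int) -> List[List[bool]]:
    """Mark pixels whose distance lies inside [near_mm, far_mm]."""
    return [[near_mm <= depth <= far_mm for depth in row] for row in frame]


# Tiny made-up frame: a near object in the lower right, background around it.
frame = [
    [0, 3200, 3100],
    [2900, 1500, 1450],
    [3000, 1480, 0],
]

print(nearest_point(frame))
print(foreground_mask(frame, 1000, 2000))
```

Selecting pixels inside a distance band is a crude way to separate a person standing in front of the sensor from the background; real skeleton tracking is considerably more involved.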
Kinect for Windows SDK
• Microsoft provides this SDK free for use and experimentation, but it does not permit
commercial distribution. The SDK contains APIs that track people in front of the Kinect
and provide the coordinates of their body joints.
• There are APIs that recognize basic, common hand gestures such as grip and release.
• Speech APIs are provided to capture audio and use it in programs.
“We will use the Kinect for Windows SDK with a Kinect for Xbox 360 sensor to design gestures
and to recognize certain speech commands. Development will take place in Microsoft
Visual Studio 2010, using the C# programming language.”
Mode of Transportation : Speech Recognition
What is needed
1. Acoustic Model
Probabilistic models that try to build a connection between voice utterances and their
transcriptions in the training data
2. Language Model
Unigram, bigram, and trigram models
Not much is needed in our case
3. Mapping Dictionary
Maps graphemes (written letters) to phonemes (speech sounds)
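As a rough illustration of the language model and the mapping dictionary listed above, the Python sketch below trains a bigram model with add-one smoothing on a handful of command phrases and looks up word pronunciations in a small dictionary. The command phrases are hypothetical, since the slides do not list the actual vocabulary. The phoneme symbols are ARPAbet-style with stress markers omitted. Recognizer toolkits ship their own model and dictionary formats, which this sketch does not reproduce.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical command phrases (not taken from the slides).
COMMANDS = ["zoom in", "zoom out", "show traffic", "show satellite"]

# Hypothetical mapping dictionary: word -> phoneme sequence.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "zoom": ["Z", "UW", "M"],
    "in": ["IH", "N"],
    "out": ["AW", "T"],
    "show": ["SH", "OW"],
    "traffic": ["T", "R", "AE", "F", "IH", "K"],
    "satellite": ["S", "AE", "T", "AH", "L", "AY", "T"],
}


def train_bigrams(phrases: List[str]) -> Tuple[Counter, Counter, int]:
    """Count contexts and bigrams; also return the vocabulary size."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    vocab = {"</s>"}
    for phrase in phrases:
        words = ["<s>"] + phrase.split() + ["</s>"]
        vocab.update(words[1:])
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocab)


def phrase_score(phrase: str, contexts: Counter, bigrams: Counter, vocab_size: int) -> float:
    """Probability of a phrase under the add-one smoothed bigram model."""
    words = ["<s>"] + phrase.split() + ["</s>"]
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return score


def phonemes(phrase: str) -> List[str]:
    """Concatenate dictionary pronunciations; raises KeyError for unknown words."""
    return [p for word in phrase.split() for p in PRONUNCIATIONS[word]]


contexts, bigrams, vocab_size = train_bigrams(COMMANDS)
print(phrase_score("zoom in", contexts, bigrams, vocab_size))
print(phrase_score("in zoom", contexts, bigrams, vocab_size))
print(phonemes("zoom in"))
```

With a small, fixed command set the model mainly serves to rank valid word orders above invalid ones, which fits the slide's remark that little language modelling is needed here. Place names are a different matter, because every city or street has to appear in the dictionary with a pronunciation.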
Current Challenges
1. Large variability in accents
2. Variability across speaker gender
3. Surrounding noise
4. Very large number of city and place names
Development Tools
1. Microsoft Speech SDK 5.1
Preferable for working with Microsoft Kinect
2. CMU Sphinx 0.8
Open Source Toolkit For Speech Recognition
3. Dragon SDKs - Nuance
Discussions & Conclusion
1. Speech input is about 4 times faster than typing
2. Touch interaction on vertical screen can cause Gorilla Arm effect
3. Free-hand gestures have previously been used for navigation systems
4. We assume that integrating these two modalities will improve ease of use
5. The ASR system needs a training corpus for users with Indian accents
6. Need to define variables
Thank You for Listening
Picture abhi baaki hai mere dost (our journey still continues)……