Hands and Speech in Space
Mark Billinghurst
mark.billinghurst@hitlabnz.org
The HIT Lab NZ, University of Canterbury
May 28th 2014
2010 – Iron Man 2
To Make the Vision Real..
  Hardware/software requirements
 Contact lens displays
 Free space hand/body tracking
 Speech/gesture recognition
 Etc..
  Most importantly
 Usability/User Experience
Natural Hand Interaction
  Using bare hands to interact with AR content
  MS Kinect depth sensing
  Real-time hand tracking
  Physics-based simulation model
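As a rough illustration of this pipeline (not the original HIT Lab NZ code), the sketch below segments a hand from a Kinect-style depth frame in millimetres, using depth thresholds and a largest-connected-component filter; the threshold values are assumptions.

```python
# Hypothetical sketch of depth-based hand segmentation, in the spirit of the
# pipeline above; thresholds and names are illustrative assumptions.
import numpy as np
import cv2

def segment_hand(depth_mm: np.ndarray, near: int = 400, far: int = 800) -> np.ndarray:
    """Return a binary mask of the largest blob inside the interaction volume."""
    # Keep only pixels within the expected hand distance range.
    mask = ((depth_mm > near) & (depth_mm < far)).astype(np.uint8) * 255
    # Remove sensor speckle with a morphological opening.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # Keep the largest connected component, assumed to be the hand.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return np.zeros_like(mask)
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```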
Pros and Cons of Gesture Only Input
  Gesture-only good for
 Direct manipulation
 Selection, motion
 Rapid expressiveness
  Limitations
 Descriptions (e.g. temporal information)
 Operation on large numbers of objects
 Indirect manipulation, delayed actions
Multimodal Interaction
  Combined speech and gesture input
  Gesture and speech are complementary
  Speech: modal commands, quantities
  Gesture: selection, motion, qualities
  Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
  However, few multimodal AR interfaces
Wizard of Oz Study
  What speech and gesture input would people like to use?
  Wizard
  Perform speech recognition
  Command interpretation
  Domain
  3D object interaction/modelling
Lee, M., & Billinghurst, M. (2008). A Wizard of Oz study for an AR multimodal interface. In Proceedings of the 10th International Conference on Multimodal Interfaces (pp. 249-256). ACM.
System Architecture
System Setup
Key Results
  Most commands multimodal
  Multimodal (63%), Gesture (34%), Speech (4%)
  Most spoken phrases were short
  74% were phrases, averaging 1.25 words
  Full sentences (26%) averaged 3 words
  Main gestures deictic (65%), metaphoric (35%)
  In multimodal commands gesture was issued first
  94% of the time the gesture began before speech
Free Hand Multimodal Input
  Use free hand to interact with AR content
  Recognize simple gestures
 Open hand, closed hand, pointing (see the classification sketch below)
Gestures: Point, Move, Pick/Drop
Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
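A minimal sketch of how the three gestures might be distinguished, assuming a binary hand mask like the one produced by the segmentation step above: it counts deep convexity defects (valleys between extended fingers) as a crude proxy for hand pose. This is illustrative only, not the published implementation.

```python
# Hypothetical open/closed/pointing classifier based on convexity defects of
# the hand contour; thresholds are assumptions, not the published method.
import numpy as np
import cv2

def classify_gesture(hand_mask: np.ndarray) -> str:
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return "none"
    contour = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull)
    deep = 0
    if defects is not None:
        for start, end, far, depth in defects[:, 0]:
            if depth > 256 * 40:   # valleys deeper than ~40 px (8.8 fixed point)
                deep += 1
    if deep >= 3:                  # several finger valleys -> open hand
        return "open"
    if deep >= 1:                  # one extended finger -> pointing
        return "pointing"
    return "closed"                # no valleys -> fist (pick/drop state)
```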
Speech Input
  MS Speech + MS SAPI (> 90% accuracy)
  Single word speech commands
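To make the single-word command idea concrete, here is a hedged sketch of the dispatch logic only; the MS SAPI recognizer setup is not shown, and the vocabulary and `scene` object are hypothetical.

```python
# Illustrative single-word command vocabulary and dispatcher. The recognizer
# is assumed to deliver one word plus a confidence score; `scene` is a
# hypothetical scene-graph API, not part of the original system.
COLORS = {"red", "green", "blue"}
SHAPES = {"cube", "sphere", "cylinder"}

def on_speech_result(word: str, confidence: float, scene) -> None:
    word = word.lower()
    # A small, fixed vocabulary is what makes >90% recognition plausible;
    # reject low-confidence or out-of-vocabulary results outright.
    if confidence < 0.6 or word not in (COLORS | SHAPES):
        return
    if word in COLORS:
        scene.set_color(scene.selected_object, word)
    else:
        scene.set_shape(scene.selected_object, word)
```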
Multimodal Architecture
Multimodal Fusion
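One plausible fusion strategy, shaped by the earlier finding that gesture almost always precedes speech, is a time-window pairing: when a speech command arrives, match it with the most recent gesture event inside a short window. The window size and event fields below are assumptions, not the system's actual parameters.

```python
# Sketch of time-window multimodal fusion: speech commands are paired with
# the most recent gesture event inside a short window. Illustrative only.
import time
from collections import deque

FUSION_WINDOW_S = 1.5   # assumed window; tune from observed gesture-speech lag

class MultimodalFuser:
    def __init__(self):
        self.gestures = deque(maxlen=16)   # (timestamp, gesture, target)

    def on_gesture(self, gesture: str, target: object) -> None:
        self.gestures.append((time.time(), gesture, target))

    def on_speech(self, word: str) -> dict:
        now = time.time()
        # Walk back from the newest gesture, preferring the closest in time.
        for ts, gesture, target in reversed(self.gestures):
            if now - ts <= FUSION_WINDOW_S:
                return {"action": word, "gesture": gesture, "target": target}
        # No temporal match: fall back to a unimodal speech command.
        return {"action": word, "gesture": None, "target": None}
```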
Hand Occlusion
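Hand occlusion can be sketched as a per-pixel depth test between the sensor's depth map and the rendered depth buffer, so the real hand is drawn in front of virtual content it is actually closer than; array names and units here are assumptions.

```python
# Minimal sketch of depth-based hand occlusion: real pixels whose sensor
# depth is nearer than the rendered virtual surface are composited on top.
import numpy as np

def composite_with_occlusion(camera_rgb: np.ndarray,
                             virtual_rgb: np.ndarray,
                             virtual_depth_mm: np.ndarray,
                             sensor_depth_mm: np.ndarray) -> np.ndarray:
    out = virtual_rgb.copy()
    hand_in_front = sensor_depth_mm < virtual_depth_mm
    out[hand_in_front] = camera_rgb[hand_in_front]
    return out
```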
Experimental Setup
Change object shape and colour
User Evaluation
  Change object shape, colour and position
  Conditions
  (1) Speech only, (2) gesture only, (3) multimodal
  Measures
  performance time, errors, subjective survey
Results - Performance
  Average performance time
  Gesture: 15.44s
  Speech: 12.38s
  Multimodal: 11.78s
  Significant difference across conditions (p < 0.01)
  Gesture differed significantly from both speech and MMI
Subjective Results (Likert 1-7)
  User subjective survey
  Gesture significantly worse, MMI and Speech same
  MMI perceived as most efficient
  Preference
  70% MMI, 25% speech only, 5% gesture only
                 Gesture  Speech  MMI
Naturalness        4.60    5.60   5.80
Ease of Use        4.00    5.90   6.00
Efficiency         4.45    5.15   6.05
Physical Effort    4.75    3.15   3.85
Observations
  Significant difference in number of commands
  Gesture (6.14), Speech (5.23), MMI (4.93)
  MMI Simultaneous vs. Sequential commands
  79% sequential, 21% simultaneous
  Reaction to system errors
 Almost always repeated the same command
 In MMI users rarely changed modalities
Lessons Learned
  Multimodal interaction significantly better than gesture alone in AR interfaces for 3D tasks
  Shorter task time, more efficient
  Multimodal input was more natural, easier, and more effective than gesture or speech alone
  Simultaneous input rarely used
  More studies need to be conducted
  What gesture/speech patterns? Richer input
3D Gesture Tracking
  3Gear Systems
  Kinect/PrimeSense sensor
  Two hand tracking
  http://www.threegear.com
Skeleton Interaction + AR
  HMD AR View
  Viewpoint tracking
  Two hand input
  Skeleton interaction, occlusion
AR Rift Display
Conclusions
  AR experiences need new interaction methods
  Combined speech and gesture more powerful
  Complementary input modalities
  Natural user interfaces possible
  Free hand gesture, speech, intelligent interfaces
  Important research directions for the future
  What gesture/speech commands should be used?
  Relationship between speech and gesture?
More Information
•  Mark Billinghurst
–  Email: mark.billinghurst@hitlabnz.org
–  Twitter: @marknb00
•  Website
–  http://www.hitlabnz.org/