Hands and Speech in Space
Mark Billinghurst
mark.billinghurst@hitlabnz.org
The HIT Lab NZ, University of Canterbury
May 28th 2014
2010 – Iron Man 2
To Make the Vision Real..
  Hardware/software requirements
 Contact lens displays
 Free space hand/body tracking
 Speech/gesture recognition
 Etc..
  Most importantly
 Usability/User Experience
Natural Hand Interaction
  Using bare hands to interact with AR content
  MS Kinect depth sensing
  Real-time hand tracking
  Physics-based simulation model
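As a rough illustration of this pipeline (not the original HIT Lab NZ code), the sketch below segments a hand from a Kinect-style depth frame in millimetres, using depth thresholds and a largest-connected-component filter; the threshold values are assumptions.

```python
# Hypothetical sketch of depth-based hand segmentation, in the spirit of the
# pipeline above; thresholds and names are illustrative assumptions.
import numpy as np
import cv2

def segment_hand(depth_mm: np.ndarray, near: int = 400, far: int = 800) -> np.ndarray:
    """Return a binary mask of the largest blob inside the interaction volume."""
    # Keep only pixels within the expected hand distance range.
    mask = ((depth_mm > near) & (depth_mm < far)).astype(np.uint8) * 255
    # Remove sensor speckle with a morphological opening.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # Keep the largest connected component, assumed to be the hand.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return np.zeros_like(mask)
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```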
Pros and Cons of Gesture Only Input
  Gesture-only good for
 Direct manipulation
 Selection, motion
 Rapid expressiveness
  Limitations
 Descriptions (e.g. temporal information)
 Operation on large numbers of objects
 Indirect manipulation, delayed actions
Multimodal Interaction
  Combined speech and gesture input
  Gesture and speech are complementary
  Speech: modal commands, quantities
  Gesture: selection, motion, qualities
  Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
  However, few multimodal AR interfaces
Wizard of Oz Study
  What speech and gesture input would people like to use?
  Wizard
  Perform speech recognition
  Command interpretation
  Domain
  3D object interaction/modelling
Lee, M., & Billinghurst, M. (2008). A Wizard of Oz study for an AR multimodal interface. In Proceedings of the 10th International Conference on Multimodal Interfaces (pp. 249-256). ACM.
System Architecture
System Setup
Key Results
  Most commands multimodal
  Multimodal (63%), Gesture (34%), Speech (4%)
  Most spoken phrases were short
  74% were phrases, averaging 1.25 words
  Full sentences (26%) averaged 3 words
  Main gestures deictic (65%), metaphoric (35%)
  In multimodal commands gesture was issued first
  94% of the time the gesture began before speech
Free Hand Multimodal Input
  Use free hand to interact with AR content
  Recognize simple gestures
 Open hand, closed hand, pointing (see the classification sketch below)
Gestures: Point, Move, Pick/Drop
Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
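A minimal sketch of how the three gestures might be distinguished, assuming a binary hand mask like the one produced by the segmentation step above: it counts deep convexity defects (valleys between extended fingers) as a crude proxy for hand pose. This is illustrative only, not the published implementation.

```python
# Hypothetical open/closed/pointing classifier based on convexity defects of
# the hand contour; thresholds are assumptions, not the published method.
import numpy as np
import cv2

def classify_gesture(hand_mask: np.ndarray) -> str:
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return "none"
    contour = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull)
    deep = 0
    if defects is not None:
        for start, end, far, depth in defects[:, 0]:
            if depth > 256 * 40:   # valleys deeper than ~40 px (8.8 fixed point)
                deep += 1
    if deep >= 3:                  # several finger valleys -> open hand
        return "open"
    if deep >= 1:                  # one extended finger -> pointing
        return "pointing"
    return "closed"                # no valleys -> fist (pick/drop state)
```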
Speech Input
  MS Speech + MS SAPI (> 90% accuracy)
  Single word speech commands
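To make the single-word command idea concrete, here is a hedged sketch of the dispatch logic only; the MS SAPI recognizer setup is not shown, and the vocabulary and `scene` object are hypothetical.

```python
# Illustrative single-word command vocabulary and dispatcher. The recognizer
# is assumed to deliver one word plus a confidence score; `scene` is a
# hypothetical scene-graph API, not part of the original system.
COLORS = {"red", "green", "blue"}
SHAPES = {"cube", "sphere", "cylinder"}

def on_speech_result(word: str, confidence: float, scene) -> None:
    word = word.lower()
    # A small, fixed vocabulary is what makes >90% recognition plausible;
    # reject low-confidence or out-of-vocabulary results outright.
    if confidence < 0.6 or word not in (COLORS | SHAPES):
        return
    if word in COLORS:
        scene.set_color(scene.selected_object, word)
    else:
        scene.set_shape(scene.selected_object, word)
```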
Multimodal Architecture
Multimodal Fusion
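One plausible fusion strategy, shaped by the earlier finding that gesture almost always precedes speech, is a time-window pairing: when a speech command arrives, match it with the most recent gesture event inside a short window. The window size and event fields below are assumptions, not the system's actual parameters.

```python
# Sketch of time-window multimodal fusion: speech commands are paired with
# the most recent gesture event inside a short window. Illustrative only.
import time
from collections import deque

FUSION_WINDOW_S = 1.5   # assumed window; tune from observed gesture-speech lag

class MultimodalFuser:
    def __init__(self):
        self.gestures = deque(maxlen=16)   # (timestamp, gesture, target)

    def on_gesture(self, gesture: str, target: object) -> None:
        self.gestures.append((time.time(), gesture, target))

    def on_speech(self, word: str) -> dict:
        now = time.time()
        # Walk back from the newest gesture, preferring the closest in time.
        for ts, gesture, target in reversed(self.gestures):
            if now - ts <= FUSION_WINDOW_S:
                return {"action": word, "gesture": gesture, "target": target}
        # No temporal match: fall back to a unimodal speech command.
        return {"action": word, "gesture": None, "target": None}
```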
Hand Occlusion
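Hand occlusion can be sketched as a per-pixel depth test between the sensor's depth map and the rendered depth buffer, so the real hand is drawn in front of virtual content it is actually closer than; array names and units here are assumptions.

```python
# Minimal sketch of depth-based hand occlusion: real pixels whose sensor
# depth is nearer than the rendered virtual surface are composited on top.
import numpy as np

def composite_with_occlusion(camera_rgb: np.ndarray,
                             virtual_rgb: np.ndarray,
                             virtual_depth_mm: np.ndarray,
                             sensor_depth_mm: np.ndarray) -> np.ndarray:
    out = virtual_rgb.copy()
    hand_in_front = sensor_depth_mm < virtual_depth_mm
    out[hand_in_front] = camera_rgb[hand_in_front]
    return out
```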
Experimental Setup
Change object shape and colour
User Evaluation
  Change object shape, colour and position
  Conditions
  (1) Speech only, (2) gesture only, (3) multimodal
  Measures
  performance time, errors, subjective survey
Results - Performance
  Average performance time
  Gesture: 15.44s
  Speech: 12.38s
  Multimodal: 11.78s
  Significant difference across conditions (p < 0.01)
  Gesture differed significantly from both speech and MMI
Subjective Results (Likert 1-7)
  User subjective survey
  Gesture significantly worse, MMI and Speech same
  MMI perceived as most efficient
  Preference
  70% MMI, 25% speech only, 5% gesture only
                 Gesture  Speech  MMI
Naturalness        4.60    5.60   5.80
Ease of Use        4.00    5.90   6.00
Efficiency         4.45    5.15   6.05
Physical Effort    4.75    3.15   3.85
Observations
  Significant difference in number of commands
  Gesture (6.14), Speech (5.23), MMI (4.93)
  MMI Simultaneous vs. Sequential commands
  79% sequential, 21% simultaneous
  Reaction to system errors
 Almost always repeated the same command
 In MMI users rarely changed modalities
Lessons Learned
  Multimodal interaction significantly better than gesture alone in AR interfaces for 3D tasks
  Shorter task time, more efficient
  Multimodal input was more natural, easier, and more effective than gesture or speech alone
  Simultaneous input rarely used
  More studies need to be conducted
  What gesture/speech patterns? Richer input
3D Gesture Tracking
  3Gear Systems
  Kinect/PrimeSense sensor
  Two hand tracking
  http://www.threegear.com
Skeleton Interaction + AR
  HMD AR View
  Viewpoint tracking
  Two hand input
  Skeleton interaction, occlusion
AR Rift Display
Conclusions
  AR experiences need new interaction methods
  Combined speech and gesture more powerful
  Complementary input modalities
  Natural user interfaces possible
  Free hand gesture, speech, intelligent interfaces
  Important research directions for the future
  What gesture/speech commands should be used?
  Relationship between speech and gesture?
More Information
•  Mark Billinghurst
–  Email: mark.billinghurst@hitlabnz.org
–  Twitter: @marknb00
•  Website
–  http://www.hitlabnz.org/