Sensor-Aware Augmented Reality
Addressing Real-World HMI Challenges
Dr. Liu Ren
Global Head and Chief Scientist, HMI
Bosch Research North America
Palo Alto, CA
Sensor-Aware Augmented Reality
Research and Technology Center North America | Liu Ren | 12/20/2016
© 2016 Robert Bosch LLC and affiliates. All rights reserved.
Bosch Overview
Mobility Solutions Industrial Technology
Energy and Building
Technology
Consumer Goods
Bosch is one of the world’s leading international providers of technology and services
• 375,000¹ Bosch associates
• More than 440¹ subsidiary companies and regional subsidiaries in some 60¹ countries
• Including its sales and service partners, Bosch is represented in some 150¹ countries.
¹ As of Dec. 2015
Home Appliance
Personal
Assistant
Home Robots
Garden Tools Smart Home
Internet of Things (IoT)
Thermotechnology
Security
Systems
Smart Cities
Assembly Technology
Industry 4.0
Packaging Technology
Industrial
Robots
Car Infotainment
Concept Car
Autonomous
Driving
Automotive Aftermarket
Industry 4.0 | Smart Home | Robotics | Aftermarket Repair Shops
Car Infotainment
Highly Automated Driving
Human Machine Interaction (HMI) Research in Bosch
Key Success Factors of HMI Products
Intuitive Interactive Intelligent
Human Machine Interaction
• Global Head and Chief Scientist, HMI, Bosch Research
• Ph.D. and M.Sc. in Computer Science, Carnegie Mellon University
• B.Sc. in Computer Science, Zhejiang University, P.R. China
Global HMI Research Team
Renningen, Germany
Shanghai, China
Headquarters
Palo Alto, USA
Liu Ren Short Bio HMI teams in Bosch Research
Real-World HMI Challenges for (Wearable) AR
Hardware: Form Factor, Field-of-View, Comfort, Battery Life
Software: Context-Aware Visualization, Scalable & Easy Content Generation, Natural Interaction (Speech, Gesture, etc.), AI (Perception, Understanding, etc.)
Bosch Product: CAP (Common Augmented Reality Platform)
Software
Context-Aware
Visualization
Scalable & Easy
Content Generation
Natural Interaction
Bosch CAP enables implementation of complete enterprise AR solutions
• Integrates the production of visual and digital content directly into the authoring process.
• Reuses existing CAD, image, and video data, saving the expense of creating new content.
Bosch CAP
Production and manufacturing
Target/actual comparison and collision planning
Plant and system planning
Education and training
Maintenance, service and repair
Marketing, trade shows and distribution
Technical doc. and digital operating instructions
Our Sensor-Aware Solutions
Software
Context-Aware
Visualization
Scalable & Easy
Content Generation
Natural Interaction
(Speech, Gesture, etc.)
Dynamic occlusion handling
ISMAR 2016[1]
• Enhance realistic depth
perception
Robust visual tracking
ISMAR 2016[2]
• Improve tracking robustness
and accuracy
[1] Chao Du, Yen-Lin Chen, Mao Ye, and Liu Ren, “Edge Snapping-Based Depth Enhancement for Dynamic Occlusion Handling in
Augmented Reality”, IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2016.
[2] Benzun Wisely Babu, Soohwan Kim, Zhixin Yan, and Liu Ren, “σ-DVO: Sensor Noise Model Meets Dense Visual Odometry”, IEEE
International Symposium on Mixed and Augmented Reality (ISMAR) 2016.
Dynamic Occlusion Handling
Software
Context-Aware
Visualization
Scalable & Easy
Content Generation
Natural Interaction
(Speech, Gesture, etc.)
Robust visual tracking
ISMAR 2016[2]
• Improve tracking robustness
and accuracy
Dynamic occlusion handling
ISMAR 2016[1]
• Enhance realistic depth
perception
1. Compact setup: single sensor
2. Dynamic occlusion handling
Dynamic Occlusion Handling: Motivation
One near-range RGBD sensor
Optical see-through head-mounted display (HMD)
Challenges
• Performance requirements for real-time AR applications
• Limited computational resources (e.g., on tablet)
Goals
Dynamic Occlusion Handling: Our Sensor-Aware Solution
Target Object Boundary
Boundary from Depth
1. Align Object Boundary (Edge-Snapping)
Depth
Color
2. Enhance Depth Map
Raw Depth Map Enhanced Depth Map
• Use color images as guidance
• Snap object boundaries in depth data towards the
edges in color images
• Formulated as an optimization problem, efficiently
solved via dynamic programming
Edge Snapping-Based Algorithm
Depth data not reliable at the object boundary
• Structured Light/Stereo: matching is not accurate
at the boundary
• Time of Flight: light signals reaching the object boundary barely bounce back to the sensor
Knowledge on RGBD Sensor
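The edge-snapping step above can be sketched as a small dynamic program: each point on an ordered depth-boundary contour chooses an offset toward the strongest nearby color edge, with a smoothness penalty between neighboring points. This is an illustrative sketch under our own assumptions — the function name, the exact cost terms, and the constants are ours, not the paper's formulation.

```python
import numpy as np

def snap_boundary(color_grad, boundary, search=5, smooth=0.5):
    """Snap an ordered depth-boundary contour toward strong color edges.

    A 1-D dynamic program: each contour point (row, col) may shift its
    column by an integer offset in [-search, search]; the cost trades
    color-gradient strength (data term) against offset changes between
    neighboring points (smoothness term).
    """
    n = len(boundary)
    offsets = np.arange(-search, search + 1)
    k = len(offsets)
    # Data term: prefer candidate positions with a strong color gradient.
    data = np.empty((n, k))
    for i, (r, c) in enumerate(boundary):
        cols = np.clip(c + offsets, 0, color_grad.shape[1] - 1)
        data[i] = -color_grad[r, cols]
    # Forward DP pass with backpointers.
    cost = data[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        trans = cost[None, :] + smooth * np.abs(offsets[:, None] - offsets[None, :])
        back[i] = np.argmin(trans, axis=1)
        cost = data[i] + trans[np.arange(k), back[i]]
    # Backtrack the optimal offset sequence.
    best = np.empty(n, dtype=int)
    best[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):
        best[i - 1] = back[i, best[i]]
    return [(r, int(np.clip(c + offsets[j], 0, color_grad.shape[1] - 1)))
            for (r, c), j in zip(boundary, best)]
```

For example, a boundary sitting at column 5 next to a strong color edge at column 7 snaps onto column 7 for every row.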
Dynamic Occlusion Handling: Experimental Results
Robust Visual Tracking
Software
Context-Aware
Visualization
Scalable & Easy
Content Generation
Natural Interaction
(Speech, Gesture, etc.)
Dynamic occlusion handling
ISMAR 2016[1]
• Enhance realistic depth
perception
Robust visual tracking
ISMAR 2016[2]
• Improve tracking robustness
and accuracy
Robust Visual Tracking: Motivation
1. Visual tracking is an essential AR
component
• 6 DoF camera pose
• Correctly place virtual objects in real world
2. Markerless Visual Tracking:
• Visual SLAM (simultaneous localization
and mapping)
Background
Challenges
Textureless surfaces | Blurry images | Lighting condition changes
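The 6-DoF pose is exactly what AR placement needs: it moves a virtual point from world coordinates into camera coordinates, after which a pinhole projection gives the pixel where the virtual object should be drawn. A minimal sketch with assumed (illustrative) pinhole intrinsics:

```python
import numpy as np

def project_point(p_world, R, t, fx, fy, cx, cy):
    """Place a virtual 3-D point on screen given a 6-DoF camera pose.

    R (3x3 rotation) and t (3-vector translation) map world -> camera;
    (fx, fy, cx, cy) are illustrative pinhole intrinsics. Returns pixel
    coordinates (u, v), or None if the point is behind the camera.
    """
    p_cam = R @ np.asarray(p_world, float) + np.asarray(t, float)
    if p_cam[2] <= 0:          # behind the image plane: not visible
        return None
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return (u, v)
```

With an identity pose, a point 2 m straight ahead lands on the principal point — any tracking error in R or t shifts every rendered pixel, which is why robust pose estimation matters.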
Robust Visual Tracking: Our Sensor-Aware Solution (σ-DVO)
• Working well with textureless environments
• Less sensitive to lighting condition changes
• Noise of depth measurement grows quadratically as depth
increases
• Estimate the relative pose between two given frames based on
residuals (front-end of visual SLAM)
• Utilize all pixels from RGBD images
RGBD Dense Visual Odometry
Knowledge on RGBD Sensor
[Diagram: the current frame is warped by the relative camera pose and subtracted from the previous frame; the resulting color and depth residuals drive a non-linear optimization over the pose.]
Sensor-Aware Weighting
• Residuals are scaled by color and depth weights; the weight decreases as the noise of the depth measurement grows
• Incorporate sensor noise model to guide pose optimization
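The per-pixel front-end computation described above — back-project a pixel with its depth, transform by the relative pose, reproject into the current frame, and compare intensity and depth — could look like the following. This is an illustrative simplification (nearest-neighbor lookup, no bounds checks or bilinear interpolation):

```python
import numpy as np

def pixel_residuals(I1, Z1, I2, Z2, R, t, K, u, v):
    """Color and depth residuals for one pixel of the previous frame.

    I1/Z1 are the previous intensity/depth images, I2/Z2 the current
    ones; (R, t) is the relative camera pose; K = (fx, fy, cx, cy).
    """
    fx, fy, cx, cy = K
    z = Z1[v, u]
    p = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])  # back-project
    q = R @ p + t                                            # apply relative pose
    u2 = int(round(fx * q[0] / q[2] + cx))                   # reproject (nearest pixel)
    v2 = int(round(fy * q[1] / q[2] + cy))
    r_color = float(I2[v2, u2]) - float(I1[v, u])            # photometric residual
    r_depth = float(Z2[v2, u2]) - q[2]                       # geometric residual
    return r_color, r_depth
```

For identical frames and an identity pose, both residuals vanish; the optimizer searches for the (R, t) that makes them small over all pixels.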
Robust Visual Tracking: Our Sensor-Aware Solution (σ-DVO)
• For better robustness and accuracy, we formulated the optimization problem in a
Bayesian framework
Bayesian Framework
p(pose | residuals) ∝ p(residuals | pose) · p(pose)
• Assume a uniform distribution of residuals
→ All pixels share the same weight
Early approaches
• Find an empirical distribution via experiments
→ Weights depend only on residuals
The state-of-the-art approach (DVO[1])
• Explore the source of residuals, especially based on sensor
characteristics
• Develop a sensor noise model to generate distribution of
residuals
→ Decrease weights of pixels with either noisy sensor measurements or high residuals
• Easily incorporate sensor-specific noise model for different
sensors to customize pose optimization for best performance
Our sensor-aware approach (σ-DVO)
Sensor Measurement Noise
[Diagram: the noise model of a near-range RGBD sensor scales the color and depth weights used in the pose optimization.]
[1] Christian Kerl, Jürgen Sturm, and Daniel Cremers. "Dense visual SLAM for RGB-D cameras." 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2013.
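A minimal version of this sensor-aware weighting, assuming the slide's quadratic depth-noise model and a Student-t style robust factor (σ₀ and ν are illustrative constants, not values from the paper):

```python
def sigma_depth(z, sigma0=0.0012):
    """Predicted depth-noise std dev; grows quadratically with depth z (meters)."""
    return sigma0 * z * z

def sensor_aware_weight(residual, z, nu=5.0):
    """Weight for a depth residual, combining sensor noise and robustness.

    The residual is normalized by the noise the sensor model predicts at
    depth z; a Student-t style factor down-weights outliers, and an
    inverse-variance factor down-weights noisy (far) measurements.
    """
    s = sigma_depth(z)
    r_norm = residual / s
    robust = (nu + 1.0) / (nu + r_norm * r_norm)   # Student-t down-weighting
    return robust / (s * s)                        # inverse-variance scaling
```

The two assertions the slide makes fall out directly: for the same residual, a pixel at 3 m gets less weight than one at 1 m, and at a fixed depth a large (outlier) residual gets less weight than a small one.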
Visual SLAM *
Robust Visual Tracking: Experimental Results
Visual SLAM results — Absolute Tracking Error [m]

| Dataset    | RGB-D SLAM[2] | MRSMap[3] | Kintinuous[4] | ElasticFusion[5] | DVO SLAM[1] | Our SLAM approach |
| fr1/desk   | 0.023         | 0.043     | 0.037         | 0.020            | 0.021       | 0.019             |
| fr2/xyz    | 0.008         | 0.020     | 0.029         | 0.011            | 0.018       | 0.018             |
| fr3/office | 0.032         | 0.042     | 0.030         | 0.017            | 0.035       | 0.015             |
| fr1/360    | 0.079         | 0.069     | -             | -                | 0.083       | 0.061             |

Visual Odometry results — ATE: Absolute Tracking Error [m], RPE: Relative Pose Error [m/s]

| Dataset    | DVO[1] ATE | DVO[1] RPE | σ-DVO ATE | σ-DVO RPE |
| fr1/360    | 0.415      | 0.153      | 0.229     | 0.110     |
| fr1/desk   | 0.109      | 0.048      | 0.067     | 0.039     |
| fr1/desk2  | 0.261      | 0.074      | 0.088     | 0.065     |
| fr1/floor  | 0.242      | 0.070      | 0.226     | 0.053     |
| fr1/room   | 0.459      | 0.092      | 0.314     | 0.063     |
| fr1/rpy    | 0.216      | 0.065      | 0.072     | 0.046     |
| fr1/xyz    | 0.102      | 0.050      | 0.052     | 0.036     |
| fr2/desk   | 0.561      | 0.038      | 0.184     | 0.016     |
| fr2/large  | 4.370      | 0.240      | 0.724     | 0.134     |
| fr2/rpy    | 0.501      | 0.039      | 0.188     | 0.012     |
| fr2/xyz    | 0.497      | 0.030      | 0.188     | 0.010     |
| fr3/office | 0.485      | 0.044      | 0.164     | 0.014     |
| average    | 0.684      | 0.067      | 0.208     | 0.050     |
• Our σ-DVO outperforms DVO significantly on all datasets: on average, a 70% reduction in ATE and a 25% reduction in RPE
• σ-DVO SLAM outperforms state-of-the-art SLAM algorithms on most of the RGB-D datasets: on average, a 25% reduction in ATE
* σ-DVO is extended to σ-DVO SLAM by combining the front end (visual odometry)
with the backend (pose-graph optimization)
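The quoted averages can be sanity-checked directly from the table:

```python
# Average errors copied from the table above (DVO vs. our σ-DVO).
dvo_ate, ours_ate = 0.684, 0.208   # Absolute Tracking Error [m]
dvo_rpe, ours_rpe = 0.067, 0.050   # Relative Pose Error [m/s]

ate_reduction = 1 - ours_ate / dvo_ate   # ≈ 0.70 → "70% reduction in ATE"
rpe_reduction = 1 - ours_rpe / dvo_rpe   # ≈ 0.25 → "25% reduction in RPE"
print(f"ATE reduction: {ate_reduction:.0%}, RPE reduction: {rpe_reduction:.0%}")
```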
[1] Kerl, et al. "Dense visual SLAM for RGB-D cameras." 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[2] Endres, et al. "An evaluation of the RGB-D SLAM system." Robotics and Automation (ICRA), 2012.
[3] Stückler, et al. “Model Learning and Real-Time Tracking using Multi-Resolution Surfel Maps”. AAAI, 2012.
[4] Whelan, et al. "Kintinuous: Spatially extended kinectfusion." Proc. Workshop RGB-D, Adv. Reason. Depth Cameras, 2012.
[5] Whelan, et al. "ElasticFusion: Dense SLAM without a pose graph." Proc. Robotics: Science and Systems, 2015.
Robust Visual Tracking: Experimental Results
Deep Learning for Augmented Reality?
Software
Context-Aware
Visualization
Scalable & Easy
Content Generation
Natural Interaction
(Speech, Gesture, etc.)
• Requires semantic understanding of the environment and context
• Modern AI technologies, e.g., deep learning, could be effective approaches
Summary and Outlook
1 The three “I”s (Intuitive, Interactive, Intelligent) are key success factors of
Human Machine Interaction (HMI) solutions.
2 Sensor-aware approaches that leverage sensor knowledge and machine learning are effective in addressing real-world HMI challenges.
3 Use the right AI technology for the right problem: Deep Learning could be effective for core AR solutions.
We’re looking for good research scientists and interns!
(Send CV to rbvisualcomp@gmail.com)
Thank You
Liu Ren at AI Frontiers: Sensor-aware Augmented Reality

  • 1. Sensor-Aware Augmented Reality Addressing Real-World HMI Challenges Dr. Liu Ren Global Head and Chief Scientist, HMI Bosch Research North America Palo Alto, CA
  • 2. Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 2 Bosch Overview Mobility Solutions Industrial Technology Energy and Building Technology Consumer Goods Bosch is one of the world’s leading international providers of technology and services • 375,0001 Bosch associates • More than 4401 subsidiary companies and regional subsidiaries in some 601 countries • Including its sales and service partners, Bosch is represented in some 1501 countries. 1 As of Dec. 2015 Home Appliance Personal Assistant Home Robots Garden Tools Smart Home Internet of Things (IoT) Thermothenology Security Systems Smart Cities Assembly Technology Industry 4.0 Packaging Technology Industrial Robots Car Infotainment Concept Car Autonomous Driving Automotive Aftermarket
  • 3. Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 3 Industry 4.0Smart HomeRoboticsAftermarket Repair Shops Car Infotainment Highly Automated Driving Human Machine Interaction (HMI) Research in Bosch
  • 4. Industry 4.0Smart HomeRoboticsAftermarket Repair Shops Car Infotainment Highly Automated Driving Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 4 Key Success Factors of HMI Products Intuitive Interactive Intelligent Human Machine Interaction
  • 5. • Global Head and Chief Scientist, HMI, Bosch Research • Ph.D. and M.Sc. in Computer Science, Carnegie Mellon University • B.Sc. in Computer Science, Zhejiang University, P.R. China Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 5 Global HMI Research Team Renningen, Germany Shanghai, China Headquarters Palo Alto, USA Liu Ren Short Bio HMI teams in Bosch Research
  • 6. Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 6 Real-World HMI Challenges for (Wearable) AR Hardware Form Factor SoftwareField-of-View Comfort Battery Life Context-Aware Visualization Scalable & Easy Content Generation Natural Interaction (Speech, Gesture, etc.) AI (Perception, Understanding, etc.)
  • 7. Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 7 Bosch Product: CAP (Common Augmented Reality Platform) Software Context-Aware Visualization Scalable & Easy Content Generation Natural Interaction Bosch CAP enables implementation of complete enterprises AR solutions • Integrates the production of visual and digital content directly into the authoring process. • Existing CAD, image and video data were used and save the expense of creating new content. Bosch CAP Production and manufacturing Target/actual comparison and collision planning Plant and system planning Education and training Maintenance, service and repair Marketing, trade shows and distribution Technical doc. and digital operating instructions
  • 8. Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 8 Our Sensor-Aware Solutions Software Context-Aware Visualization Scalable & Easy Content Generation Natural Interaction (Speech, Gesture, etc.) Dynamic occlusion handling ISMAR 2016[1] • Enhance realistic depth perception Robust visual tracking ISMAR 2016[2] • Improve tracking robustness and accuracy [1] Chao Du, Yen-Lin Chen, Mao Ye, and Liu Ren, “Edge Snapping-Based Depth Enhancement for Dynamic Occlusion Handling in Augmented Reality”, IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2016. [2] Benzun Wisely Babu, Soohwan Kim, Zhixin Yan, and Liu Ren, “σ-DVO: Sensor Noise Model Meets Dense Visual Odometry”, IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2016.
  • 9. Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 9 Dynamic Occlusion Handling Software Context-Aware Visualization Scalable & Easy Content Generation Natural Interaction (Speech, Gesture, etc.) Robust visual tracking ISMAR 2016[2] • Improve tracking robustness and accuracy Dynamic occlusion handling ISMAR 2016[1] • Enhance realistic depth perception
  • 10. 1. Compact setup: single sensor 2. Dynamic occlusion handling Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 10 Dynamic Occlusion Handling: Motivation One near-range RGBD sensor Optical see-through head- mounted display (HMD) Challenges • Performance requirements for real-time AR applications • Limited computational resources (e.g., on tablet) Goals
  • 11. Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 11 Dynamic Occlusion Handling: Our Sensor-Aware Solution Target Object Boundary Boundary from Depth Align Object Boundary (Edge-Snapping)1 Depth Color Enhance Depth Map2 Raw Depth Map Enhanced Depth Map • Use color images as guidance • Snap object boundaries in depth data towards the edges in color images • Formulated as an optimization problem, efficiently solved via dynamic programming Edge Snapping-Based Algorithm Depth data not reliable at the object boundary • Structured Light/Stereo: matching is not accurate at the boundary • Time of Flight: light signal reaching object boundary barely bounce back to the sensor Knowledge on RGBD Sensor
  • 12. Sensor-Aware Augmented Reality Research and Technology Center North America | Liu Ren | 12/20/2016 © 2016 Robert Bosch LLC and affiliates. All rights reserved. 12 Dynamic Occlusion Handling: Experimental Results
  • 13. Robust Visual Tracking
    Section overview: robust visual tracking (ISMAR 2016 [2]) improves tracking robustness and accuracy within the sensor-aware software stack (Context-Aware Visualization; Scalable & Easy Content Generation; Natural Interaction).
  • 14. Robust Visual Tracking: Motivation
    Background:
      - Visual tracking is an essential AR component: it recovers the 6-DoF camera pose needed to correctly place virtual objects in the real world
      - Markerless visual tracking is typically done with visual SLAM (simultaneous localization and mapping)
    Challenges: textureless scenes, blurry images, and lighting condition changes
  • 15. Robust Visual Tracking: Our Sensor-Aware Solution (σ-DVO)
    RGBD dense visual odometry (the front end of visual SLAM):
      - Estimates the relative pose between two given frames based on residuals
      - Utilizes all pixels of the RGBD images, so it works well in textureless environments and is less sensitive to lighting condition changes
    Knowledge about the RGBD sensor: the noise of a depth measurement grows quadratically as depth increases
    Sensor-aware weighting: incorporate the sensor noise model to guide pose optimization; a pixel's weight decreases as the noise of its depth measurement grows
    Pipeline: warp the current frame toward the previous frame using the current pose estimate; compute color and depth residuals; multiply them by color and depth weights; refine the relative camera pose via non-linear optimization
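A minimal sketch of the sensor-aware weighting idea described above: per-pixel depth uncertainty grows quadratically with distance, and a pixel's weight shrinks both with that uncertainty and with the size of its residual. The quadratic model constants are illustrative placeholders (they must be calibrated per sensor), the Huber-style influence stands in for the paper's residual distribution, and the function names are hypothetical.

```python
import numpy as np

def depth_noise_sigma(depth_m, a=0.0012, b=0.0019):
    """Quadratic noise model for a structured-light RGBD sensor: depth
    uncertainty grows roughly with the square of the measured distance.
    The constants a, b are illustrative, sensor-specific placeholders."""
    return a + b * depth_m ** 2

def sensor_aware_weights(depth_residual, depth_m, k=1.345):
    """Down-weight pixels whose depth measurement is noisy (far away)
    or whose residual is large (likely outlier)."""
    sigma = depth_noise_sigma(depth_m)
    # Normalize residuals by per-pixel sensor noise, then apply a robust
    # Huber-like influence: weight ~ 1 for small normalized residuals,
    # decaying as residuals grow.
    r = depth_residual / sigma
    w = np.where(np.abs(r) <= k, 1.0, k / np.abs(r))
    # Scale by inverse noise variance so far, noisy pixels count less.
    return w / sigma ** 2
```

In a full system these weights multiply the squared residuals inside the non-linear pose optimization, so distant, noisy measurements contribute far less than accurate near-range ones.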
  • 16. Robust Visual Tracking: Our Sensor-Aware Solution (σ-DVO)
    For better robustness and accuracy, we formulated the optimization problem in a Bayesian framework:
      p(pose | residuals) ∝ p(residuals | pose) · p(pose)
    Early approaches: assume a uniform distribution of residuals, so all pixels share the same weight
    The state-of-the-art approach (DVO [1]): find an empirical residual distribution via experiments, so weights depend only on the residuals
    Our sensor-aware approach (σ-DVO): explore the sources of the residuals, especially sensor characteristics; develop a sensor noise model that generates the residual distribution, so pixels with noisy sensor measurements or high residuals are down-weighted; sensor-specific noise models can easily be incorporated to customize pose optimization for different sensors
    [1] Christian Kerl, Jürgen Sturm, and Daniel Cremers, "Dense Visual SLAM for RGB-D Cameras", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013.
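The weighted non-linear optimization loop behind these approaches can be illustrated with a toy iteratively-reweighted least-squares (IRLS) problem: a one-parameter "pose" (a scalar offset between two signals) is refined by recomputing robust weights from the residuals at each step, the same loop structure dense visual odometry uses for the full 6-DoF pose. Everything here (Huber weights, the scale estimate, the names) is an illustrative assumption, not σ-DVO's exact formulation.

```python
import numpy as np

def irls_offset(a, b, iters=10, k=1.345):
    """Toy IRLS: estimate the scalar offset t minimizing
    sum_i w_i * (a[i] + t - b[i])**2 with Huber weights recomputed
    from the residuals each iteration."""
    t = 0.0
    for _ in range(iters):
        r = a + t - b                                # residuals at current estimate
        # Robust scale via the median absolute deviation (guarded against 0)
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12
        u = np.abs(r) / s
        w = np.where(u <= k, 1.0, k / u)             # Huber influence weights
        t -= np.sum(w * r) / np.sum(w)               # weighted least-squares update
    return t
```

With uniform weights a single gross outlier would drag the estimate far from the truth; the reweighting step suppresses it, which is exactly the effect the residual distribution (empirical in DVO, sensor-derived in σ-DVO) is meant to achieve.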
  • 17. Robust Visual Tracking: Experimental Results
    Visual odometry (ATE: Absolute Tracking Error [m]; RPE: Relative Pose Error [m/s]):
      Dataset      DVO[1] ATE   DVO[1] RPE   σ-DVO ATE   σ-DVO RPE
      fr1/360      0.415        0.153        0.229       0.110
      fr1/desk     0.109        0.048        0.067       0.039
      fr1/desk2    0.261        0.074        0.088       0.065
      fr1/floor    0.242        0.070        0.226       0.053
      fr1/room     0.459        0.092        0.314       0.063
      fr1/rpy      0.216        0.065        0.072       0.046
      fr1/xyz      0.102        0.050        0.052       0.036
      fr2/desk     0.561        0.038        0.184       0.016
      fr2/large    4.370        0.240        0.724       0.134
      fr2/rpy      0.501        0.039        0.188       0.012
      fr2/xyz      0.497        0.030        0.188       0.010
      fr3/office   0.485        0.044        0.164       0.014
      average      0.684        0.067        0.208       0.050
    Visual SLAM* (Absolute Tracking Error [m]):
      Dataset      RGB-D SLAM[2]   MRSMap[3]   Kintinuous[4]   ElasticFusion[5]   DVO SLAM[1]   Our SLAM approach
      fr1/desk     0.023           0.043       0.037           0.020              0.021         0.019
      fr2/xyz      0.008           0.020       0.029           0.011              0.018         0.018
      fr3/office   0.032           0.042       0.030           0.017              0.035         0.015
      fr1/360      0.079           0.069       -               -                  0.083         0.061
    Findings:
      - Our σ-DVO outperforms DVO significantly on all datasets: on average, a 70% reduction in ATE and a 25% reduction in RPE
      - σ-DVO SLAM outperforms the state-of-the-art SLAM algorithms on most of the RGB-D datasets: on average, a 25% reduction in ATE
    * σ-DVO is extended to σ-DVO SLAM by combining the front end (visual odometry) with the back end (pose-graph optimization)
    [1] Kerl et al., "Dense Visual SLAM for RGB-D Cameras", IROS 2013.
    [2] Endres et al., "An Evaluation of the RGB-D SLAM System", ICRA 2012.
    [3] Stückler et al., "Model Learning and Real-Time Tracking Using Multi-Resolution Surfel Maps", AAAI 2012.
    [4] Whelan et al., "Kintinuous: Spatially Extended KinectFusion", Proc. Workshop on RGB-D: Advanced Reasoning with Depth Cameras, 2012.
    [5] Whelan et al., "ElasticFusion: Dense SLAM Without a Pose Graph", Robotics: Science and Systems, 2015.
  • 18. Robust Visual Tracking: Experimental Results
  • 19. Deep Learning for Augmented Reality?
    The remaining parts of the software stack (Context-Aware Visualization; Scalable & Easy Content Generation; Natural Interaction) require semantic understanding of the environment and context. Modern AI technologies such as deep learning could be effective approaches.
  • 20. Summary and Outlook
    1. The three "I"s (Intuitive, Interactive, Intelligent) are key success factors of Human Machine Interaction (HMI) solutions.
    2. Sensor-aware approaches that leverage sensor knowledge and machine learning effectively address real-world HMI challenges.
    3. Use the right AI technology to address the right problem: deep learning could be effective for core AR solutions.
  • 21. Thank You
    We’re looking for good research scientists and interns! (Send your CV to rbvisualcomp@gmail.com)