(Research Note) Model-Aided Monocular Visual-Inertial State Estimation and Dense Mapping

National Chung Cheng University, Taiwan
Robot Vision Laboratory
2018/03/22
Jacky Liu

About this work
Model-Aided Monocular Visual-Inertial State Estimation and Dense Mapping
Kejie Qiu (1), Shaojie Shen (1)
IROS 2017 - IEEE/RSJ International Conference on Intelligent Robots and Systems
(1) Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China

Related works (1/5)
Problem 1: Global localization
Global localization solutions based on place recognition [1][2] can only obtain topological localization, which is not accurate enough for closed-loop control.
[1] G. Schindler, M. Brown, and R. Szeliski, “City-scale location recognition,” in Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on. IEEE, 2007, pp. 1–7.
[2] M. Cummins and P. Newman, “Fab-map: Probabilistic localization and mapping in the space of appearance,” The International Journal of Robotics Research, vol. 27, no. 6, pp. 647–665, 2008.

Related works (2/5)
Problem 2: Drift
Odometry-based methods suffer from long-term drift, while SLAM-based approaches cannot guarantee global consistency before a major loop closure is detected.
Fusing odometry, SLAM and GNSS may resolve the localization problem in most cases, but it still does not guarantee drift-free localization at all times.

Related works (3/5)
Problem 3: Sensing range
Depth camera
• Its intrinsic detection limitations impede outdoor applications.
Stereo camera
• Its limited baseline constrains the detection range.
• Extrinsic calibration is a further obstacle to easy use.

Related works (4/5)
Method               Computation   Map
[10]                 high          Dense
LSD-SLAM             low           Semi-dense
DTAM                 high          Dense
REMODE               high          Dense (mono)
3D model based [15]  low           Dense

Related works (5/5)
MSCKF
The multi-state constraint Kalman filter (MSCKF) [6] is a lightweight filter-based solution for fusing visual odometry and IMU data.
[6] A. I. Mourikis and S. I. Roumeliotis, “A multi-state constraint Kalman filter for vision-aided inertial navigation,” in Robotics and automation, 2007 IEEE international conference on. IEEE, 2007, pp. 3565–3572.

Contributions (1/2)
Solutions
1. GPS not available => 3D model-based localization
2. Drift problem => 3D model-based localization
3. Sensing range => Monocular temporal stereo

Contributions (2/2)
1. Handling global localization and real-time dense mapping simultaneously, with minimal sensing and a rough prior 3D model.
2. Integrating the model-based global localization method with a tightly coupled visual-inertial fusion method to obtain all-the-time global localization with high local accuracy.
3. Implementing motion stereo with a depth prior rendered from a prior 3D model to achieve accurate environment awareness.

Overview

Method
1. Global pose fusion (MSCKF)
2. Fused state estimation
3. Semi-global matching

Global pose update for visual-inertial odometry
The original visual-inertial odometry (VIO) can already handle local-area autonomous navigation robustly.
The authors use the MSCKF (multi-state constraint Kalman filter), which is based on the Kalman filter, as the VIO implementation and treat the global pose update as an additional EKF update.
[Figure: system pipeline. Labels: VIO; 3D model; global pose; all-the-time globally consistent property; multi-view cost aggregation]
Pipeline stages: Global pose update, Cost aggregation, Semi-global matching
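To make the "additional EKF update" concrete, here is a minimal sketch of a generic EKF measurement update in Python with NumPy. It is not the authors' implementation: the function name `ekf_update` is hypothetical, and the additive state correction is a simplification (a real MSCKF corrects the orientation multiplicatively on the error state).

```python
import numpy as np

def ekf_update(x, P, r, H, R):
    """One generic EKF measurement update (illustrative sketch).

    x: (n,) state estimate, P: (n, n) state covariance,
    r: (m,) residual (measurement minus prediction),
    H: (m, n) measurement Jacobian, R: (m, m) measurement noise covariance.
    """
    S = H @ P @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ P).T          # Kalman gain K = P H^T S^-1
    x_new = x + K @ r                        # additive correction (assumption)
    I_KH = np.eye(len(x)) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T  # Joseph form keeps P symmetric
    return x_new, P_new

# Toy usage: state = [position (3), velocity (3)], and the model-based
# localization is assumed to deliver a global position measurement z.
x0 = np.zeros(6)
P0 = np.eye(6)
H = np.hstack([np.eye(3), np.zeros((3, 3))])
z = np.array([1.0, 0.0, 0.0])
x1, P1 = ekf_update(x0, P0, z - H @ x0, H, np.eye(3))
```

With equal prior and measurement uncertainty, the corrected position lands halfway between the prediction and the global measurement, and the position covariance halves.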

MSCKF parameters
[Equations: MSCKF state and update. Labels: rotation; position; velocity; bias of gyro and accelerometer; frame 1 to frame N; Kalman gain; residual]

Observation (sensing)
[Equations: MSCKF camera state and global observation]
The symbol ⊗ denotes quaternion multiplication.

EKF
http://ais.informatik.uni-freiburg.de/teaching/ws12/mapping/pdf/slam04-ekf-slam.pdf

Quaternion Multiplication
https://www.mathworks.com/help/aeroblks/quaternionmultiplication.html
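As an illustration of the ⊗ operator, the sketch below implements quaternion multiplication and a small-angle orientation residual. It assumes the scalar-first Hamilton convention ([w, x, y, z]); MSCKF papers often use the JPL convention instead, so the signs may differ from the slides. The function names are hypothetical.

```python
import numpy as np

def quat_mul(q, p):
    """Hamilton product q ⊗ p for quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quat_conj(q):
    """Conjugate, which equals the inverse for a unit quaternion."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def rotation_residual(q_meas, q_est):
    """Small-angle rotation error (radians) between two unit quaternions."""
    dq = quat_mul(q_meas, quat_conj(q_est))
    if dq[0] < 0:          # q and -q encode the same rotation
        dq = -dq
    return 2.0 * dq[1:]    # vector part is about half the rotation angle
```

A residual of this kind is one common way to compare a measured global orientation with the filter's estimate; whether the paper uses exactly this form is not stated in the slides.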

Update (prediction)

Monocular dense mapping with depth prior constraints
Unlike spatial stereo, where only two calibrated views are used for depth estimation, multiple temporal camera views are used, with a precise pose estimate for every camera frame.
The advantage of using multiple temporal camera views:
1. No baseline limitation (camera mounting distance limitation)
Therefore, the same depth estimation scheme can be used in both small indoor environments and large outdoor cases.
A) Multi-view cost aggregation

Every re-projection pixel is found by back-projecting a pixel in the reference frame
to a 3D point and re-projecting this 3D point into the measurement frame.
[Figure: reference image and measurement image]

Multiplying by the inverse of the camera matrix gives a ray along which the 3D point is located.
[Equation labels: depth; pixel intensity; world-frame rotation matrix; world-frame translation matrix]
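The back-projection and re-projection step can be sketched as follows. This is an illustrative pinhole-camera version, assuming both frames share the intrinsic matrix K and that each pose (R, t) maps camera coordinates to world coordinates; the function name `reproject` and these conventions are assumptions, not taken from the slides.

```python
import numpy as np

def reproject(u, v, depth, K, R_ref, t_ref, R_meas, t_meas):
    """Map reference pixel (u, v) at a depth hypothesis into the measurement frame.

    Returns the (u, v) pixel in the measurement image, or None if the
    3D point falls behind the measurement camera.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray with z = 1
    p_ref = depth * ray                             # 3D point, reference camera frame
    p_world = R_ref @ p_ref + t_ref                 # reference camera -> world
    p_meas = R_meas.T @ (p_world - t_meas)          # world -> measurement camera
    if p_meas[2] <= 0:
        return None
    uvw = K @ p_meas                                # perspective projection
    return uvw[:2] / uvw[2]

# Toy usage: focal length 100 px, measurement camera shifted 0.1 m along x.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]])
I3, zero = np.eye(3), np.zeros(3)
same = reproject(50, 40, 2.0, K, I3, zero, I3, zero)
shifted = reproject(50, 40, 2.0, K, I3, zero, I3, np.array([0.1, 0.0, 0.0]))
```

In the toy case the pixel shift equals focal length times baseline divided by depth (100 × 0.1 / 2 = 5 px), which is the usual stereo disparity relation.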

[Figure: Huber loss, with the thresholds -Δ and Δ marked]
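For reference, the standard Huber loss is quadratic for residuals inside [-Δ, Δ] and linear outside, which limits the influence of outliers. The sketch below uses the textbook form; the exact scaling in the paper may differ, and `aggregate_cost` (averaging over measurement frames) is a hypothetical helper added only for illustration.

```python
import numpy as np

def huber(r, delta):
    """Standard Huber loss: 0.5*r^2 if |r| <= delta, else delta*(|r| - 0.5*delta)."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def aggregate_cost(ref_intensity, meas_intensities, delta):
    """Mean Huber cost of the intensity differences over all measurement frames.

    Assumption: costs from different frames are averaged with equal weight.
    """
    diffs = np.asarray(meas_intensities, dtype=float) - ref_intensity
    return float(np.mean(huber(diffs, delta)))
```

For example, with Δ = 1 a residual of 0.5 costs 0.125, while a residual of 3 costs 2.5 instead of the 4.5 a pure quadratic would assign.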

B) Semi-global matching
The energy consists of:
1. Pixel-wise cost
2. Smoothness constraint (depth difference = 1)
3. Smoothness constraint (depth difference > 1)
Global minimization of this energy is an NP-complete problem, for which no polynomial-time algorithm is known.
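Semi-global matching sidesteps the global problem by minimizing along 1D paths and summing the path costs. The sketch below shows the aggregation along a single path, following the usual SGM recurrence with a small penalty P1 for a depth change of one level and a larger penalty P2 for bigger jumps. The function name and the array layout are assumptions for illustration.

```python
import numpy as np

def aggregate_path(cost, P1, P2):
    """SGM cost aggregation along one 1D path.

    cost: (n_pixels, n_depths) pixel-wise costs for consecutive pixels on the path.
    Returns the aggregated path costs with the same shape.
    """
    n, D = cost.shape
    L = np.empty_like(cost, dtype=float)
    L[0] = cost[0]
    for i in range(1, n):
        prev = L[i - 1]
        prev_min = prev.min()
        from_below = np.concatenate(([np.inf], prev[:-1])) + P1  # depth changed by +1
        from_above = np.concatenate((prev[1:], [np.inf])) + P1   # depth changed by -1
        jump = np.full(D, prev_min + P2)                         # depth changed by > 1
        best = np.minimum.reduce([prev, from_below, from_above, jump])
        L[i] = cost[i] + best - prev_min  # subtract prev_min to keep values bounded
    return L

# Toy usage: two pixels, three depth levels.
toy = np.array([[0.0, 5.0, 5.0], [5.0, 0.0, 5.0]])
L_toy = aggregate_path(toy, P1=1.0, P2=3.0)
```

A full pipeline would run this along several directions (commonly 4, 8 or 16), sum the results per pixel, and pick the depth level with the lowest total.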

C) 3D reconstruction
TSDF (truncated signed distance function)
Each 3D voxel contains:
1. TSDF depth
2. Photometric intensity
3. Confidence weight of the measurements
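A minimal sketch of how such a voxel might be updated is given below, using the common running weighted average for TSDF fusion. The class and function names, the truncation handling and the weight cap are assumptions for illustration; the slides do not specify the fusion rule used in the paper.

```python
from dataclasses import dataclass

@dataclass
class Voxel:
    tsdf: float = 1.0       # truncated signed distance, normalized to [-1, 1]
    intensity: float = 0.0  # photometric intensity
    weight: float = 0.0     # confidence weight of the measurements

def integrate(voxel, sdf, intensity, trunc, w=1.0, max_weight=100.0):
    """Fuse one measurement into a voxel with a running weighted average.

    sdf: signed distance from the voxel to the observed surface (same unit as trunc).
    """
    if sdf < -trunc:
        return voxel                      # far behind the surface: no update
    d = min(1.0, sdf / trunc)             # truncate and normalize
    total = voxel.weight + w
    voxel.tsdf = (voxel.weight * voxel.tsdf + w * d) / total
    voxel.intensity = (voxel.weight * voxel.intensity + w * intensity) / total
    voxel.weight = min(total, max_weight)  # cap so old data can still be revised
    return voxel

# Toy usage: two measurements fused into one voxel, truncation distance 0.1 m.
vox = Voxel()
integrate(vox, sdf=0.05, intensity=100.0, trunc=0.1)
integrate(vox, sdf=0.0, intensity=200.0, trunc=0.1)
```

After the two updates the voxel holds the average of both observations, with a weight of 2.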

Experimental results
• Implementation details
[Figure: hardware setup. Labels: 752x480, 12 Hz; 100 Hz; 160x108, 12 Hz; 3D model constructed by Altizure.com; MAV; Nvidia Tegra X1 (4 CPU cores, 256 GPU cores)]

Online SfM service Altizure.com

State estimation results
• Comparison of the position and orientation estimates of MSCKF and MSCKF+Model (ground truth from motion capture).

Mapping result
• The depth prior-based method (2nd row) has better mapping performance in texture-less areas, such as the green circular areas.
[Figure rows: motion stereo (1st row); depth prior-based (2nd row)]

Object removal (dynamic env.)

Conclusion
1. Global localization and dense mapping utilizing a known 3D textured model
2. Combining several techniques to achieve real-time onboard dense SLAM:
   1. Fast model view rendering
   2. Image stabilization
   3. Edge-based image alignment
   4. Global pose fusion
   5. Monocular dense mapping

Future work
1. Utilizing larger images to enhance mapping quality
2. Refining the 3D model online using onboard visual information to account for the differences between the model and the environment.

J. Delmerico and D. Scaramuzza, “A Benchmark Comparison of Monocular Visual-Inertial Odometry Algorithms for Flying Robots,” 2018.
