Multi-Sensor Calibration by
Deep Learning
Yu Huang
Sunnyvale, California
Yu.huang07@gmail.com
Outline
• RegNet: Multimodal Sensor Registration Using Deep Neural Networks
• CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial
Transformer Networks
• RGGNet: Tolerance Aware LiDAR-Camera Online Calibration with
Geometric Deep Learning and Generative Model
• CalibRCNN: Calibrating Camera and LiDAR by Recurrent Convolutional
Neural Network and Geometric Constraints
• LCCNet: LiDAR and Camera Self-Calibration using Cost Volume Network
• CFNet: LiDAR-Camera Registration Using Calibration Flow Network
RegNet: Multimodal Sensor Registration
Using Deep Neural Networks
• RegNet is a deep CNN that infers the 6-DoF extrinsic calibration between multimodal sensors,
exemplified with a scanning LiDAR and a monocular camera.
• Compared to existing approaches, RegNet casts all three conventional calibration steps (feature
extraction, feature matching, and global regression) into a single real-time-capable CNN.
• It does not require any human interaction and bridges the gap between classical offline and
target-less online calibration approaches as it provides both a stable initial estimation as well as a
continuous online correction of the extrinsic parameters.
• During training, the system is randomly decalibrated so that RegNet learns to infer the
correspondence between projected depth measurements and the RGB image and finally regress the
extrinsic calibration (see the sampling sketch below).
• Additionally, with iterative execution of multiple CNNs trained on different magnitudes of
decalibration, it compares favorably to state-of-the-art methods, achieving a mean calibration
error of 0.28◦ for the rotational and 6 cm for the translational components, even for large
decalibrations of up to 1.5 m and 20◦.
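A minimal NumPy sketch of this random-decalibration idea; the sampling ranges mirror the largest magnitudes quoted above, while the helper name and the composition convention are illustrative, not taken from the paper:

```python
import numpy as np

def random_decalibration(max_trans=1.5, max_rot_deg=20.0):
    """Sample a random 6-DoF perturbation H_phi; the ranges mirror the
    largest decalibration magnitudes quoted above."""
    t = np.random.uniform(-max_trans, max_trans, size=3)
    axis = np.random.randn(3)
    axis /= np.linalg.norm(axis)
    angle = np.deg2rad(np.random.uniform(-max_rot_deg, max_rot_deg))
    S = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    R = np.eye(3) + np.sin(angle) * S + (1 - np.cos(angle)) * (S @ S)  # Rodrigues
    H = np.eye(4)
    H[:3, :3], H[:3, 3] = R, t
    return H

# Training pair: points are projected with H_init = H_phi @ H_gt, and the
# network is supervised to regress the perturbation (the decalibration).
H_gt = np.eye(4)                       # placeholder ground-truth extrinsic
H_init = random_decalibration() @ H_gt
```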
RegNet: Multimodal Sensor Registration
Using Deep Neural Networks
It estimates the calibration between a depth and an RGB sensor. The depth points are projected onto the RGB image using
an initial calibration Hinit. The first and second parts of the network use NiN blocks to extract rich features for
matching. The final part regresses the decalibration by gathering global information using two fully connected layers.
During training, φdecalib is randomly varied, resulting in different projections of the depth points.
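For reference, a minimal sketch of the projection step the caption describes: LiDAR points are mapped into a sparse depth image with the initial extrinsic Hinit and the camera intrinsics K. The function name and the last-write-wins depth handling are illustrative choices:

```python
import numpy as np

def project_lidar_to_image(points, H_init, K, hw):
    """Project (N,3) LiDAR points into a sparse depth image of size hw=(H, W)
    using extrinsic H_init (4x4) and intrinsics K (3x3). Overlapping points
    are resolved last-write-wins here; a z-buffer would be more careful."""
    h, w = hw
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    cam = (H_init @ pts_h.T)[:3]                            # LiDAR -> camera frame
    cam = cam[:, cam[2] > 0.1]                              # keep points in front
    uvz = K @ cam
    u = (uvz[0] / uvz[2]).astype(int)
    v = (uvz[1] / uvz[2]).astype(int)
    depth = np.zeros((h, w), dtype=np.float32)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[valid], u[valid]] = uvz[2][valid]               # sparse depth map
    return depth
```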
CalibNet: Self-Supervised Extrinsic Calibration
using 3D Spatial Transformer Networks
• CalibNet: a self-supervised deep network capable of automatically estimating the 6-DoF rigid
body transformation between a 3D LiDAR and a 2D camera in real-time.
• CalibNet alleviates the need for calibration targets, thereby resulting in significant savings in
calibration efforts.
• During training, the network only takes as input a LiDAR point cloud, the corresponding monocular
image, and the camera calibration matrix K.
• At train time, no direct supervision is imposed (i.e., the network does not directly regress the
calibration parameters).
• Instead, the network is trained to predict calibration parameters that maximize the geometric and
photometric consistency of the input images and point clouds (a loss sketch follows this list).
• CalibNet learns to iteratively solve the underlying geometric problem and accurately predicts
extrinsic calibration parameters for a wide range of mis-calibrations, without requiring retraining
or domain adaptation.
• Code: https://github.com/epiception/CalibNet.
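One plausible instantiation of the two consistency terms above, sketched in PyTorch: an L2 photometric term on depth maps and a Chamfer-style point-cloud distance as the geometric term. The exact losses and weighting in CalibNet may differ; `alpha` is an illustrative tuning weight:

```python
import torch

def photometric_loss(depth_pred, depth_target, mask):
    """L2 error between the depth map rendered with the predicted
    transform and the target depth map, over valid (hit) pixels."""
    return ((depth_pred - depth_target)[mask] ** 2).mean()

def chamfer_distance(pts_a, pts_b):
    """Symmetric Chamfer distance between (N,3) and (M,3) clouds --
    one common choice of 3D point-cloud distance term."""
    d = torch.cdist(pts_a, pts_b)                 # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Illustrative total: loss = photometric_loss(...) + alpha * chamfer_distance(...)
```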
CalibNet: Self-Supervised Extrinsic Calibration
using 3D Spatial Transformer Networks
CalibNet takes an RGB image (a) and a raw LiDAR point cloud (b) as input, and outputs a transformation T that best aligns the
two inputs. (c) shows the colorized point cloud for a mis-calibrated setup, and (d) the output after calibration using CalibNet.
CalibNet: Self-Supervised Extrinsic Calibration
using 3D Spatial Transformer Networks
Network architecture
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration
with Geometric Deep Learning and Generative Model
• With the increasing popularity of deep learning (DL), a few recent efforts have
demonstrated the advantages of DL for feature extraction on this task.
• However, their reported performance is not yet satisfactory.
• One improvement is to formulate the problem with proper consideration of the
underlying geometry.
• Besides, existing online calibration methods focus on optimizing the
calibration error while overlooking the tolerance within the error bounds.
• To address this research gap, a DL-based LiDAR-camera calibration method,
named RGGNet, is proposed; it considers Riemannian geometry (an se(3) geodesic
error, sketched after this list) and utilizes a deep generative model to learn an
implicit tolerance model.
• The code is available at https://github.com/KleinYuan/RGGNet.
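Assuming the Riemannian formulation measures calibration error as the norm of the se(3) logarithm of the error transform (the "se3 error" quoted in the later slides), a minimal NumPy sketch:

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def so3_log(R):
    """Rotation vector (axis * angle) from a rotation matrix."""
    c = np.clip((np.trace(R) - 1) / 2, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-8:
        return np.zeros(3)
    return theta / (2 * np.sin(theta)) * np.array(
        [R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])

def se3_error(T_pred, T_gt):
    """Norm of the se(3) twist of the error transform T_pred^-1 @ T_gt."""
    T_err = np.linalg.inv(T_pred) @ T_gt
    R, t = T_err[:3, :3], T_err[:3, 3]
    w = so3_log(R)
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        u = t
    else:
        W = skew(w)
        # inverse left Jacobian of SO(3)
        V_inv = (np.eye(3) - 0.5 * W
                 + (1 / theta**2)
                 * (1 - theta * np.sin(theta) / (2 * (1 - np.cos(theta))))
                 * (W @ W))
        u = V_inv @ t
    return np.linalg.norm(np.concatenate([u, w]))
```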
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration
with Geometric Deep Learning and Generative Model
The architecture of the proposed RGGNet.
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration
with Geometric Deep Learning and Generative Model
The architecture of the Tolerance Regularizer. The VAE encoders contain four Conv-ReLU-BN layers with
depths 64/128/64/32, followed by an FC-ReLU-BN layer with 1024 units and one FC layer with 200
units; the decoders contain two FC-ReLU-BN layers with 200 and (HW)/8 units, three Deconv-ReLU-BN
layers with depths 32/64/128, and an output Deconv-Sigmoid layer with 5 units. A kernel size of 4 and a
stride of 2 are used for all convolution and deconvolution layers.
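A heavily hedged PyTorch sketch of a VAE with roughly this shape; the layer widths follow the caption loosely, while the 64x128 input resolution and the reshape sizes are illustrative assumptions not verified against the released code:

```python
import torch
import torch.nn as nn

class ToleranceVAE(nn.Module):
    """Minimal VAE sketch: a Conv-ReLU-BN encoder down to a 200-d latent
    and a mirrored deconv decoder ending in a 5-channel Sigmoid map."""
    def __init__(self, in_ch=5, z_dim=200):
        super().__init__()
        enc, chans = [], [in_ch, 64, 128, 64, 32]
        for cin, cout in zip(chans[:-1], chans[1:]):
            enc += [nn.Conv2d(cin, cout, 4, 2, 1), nn.ReLU(), nn.BatchNorm2d(cout)]
        self.encoder = nn.Sequential(*enc, nn.Flatten(), nn.LazyLinear(1024),
                                     nn.ReLU(), nn.BatchNorm1d(1024))
        self.to_mu = nn.Linear(1024, z_dim)
        self.to_logvar = nn.Linear(1024, z_dim)
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, 16 * 4 * 8), nn.ReLU(), nn.Unflatten(1, (16, 4, 8)),
            nn.ConvTranspose2d(16, 32, 4, 2, 1), nn.ReLU(), nn.BatchNorm2d(32),
            nn.ConvTranspose2d(32, 64, 4, 2, 1), nn.ReLU(), nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 128, 4, 2, 1), nn.ReLU(), nn.BatchNorm2d(128),
            nn.ConvTranspose2d(128, in_ch, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return self.decoder(z), kl   # reconstruction + KL form the tolerance signal
```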
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration
with Geometric Deep Learning and Generative Model
The intensity depth map and the point map are
projected from the point clouds with the
calibration parameters (K, Y). Each feature
map corresponds to an RGB image.
Examples of RGGNet predictions. Reference objects
for calibration are shown in bounding boxes.
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration
with Geometric Deep Learning and Generative Model
One bad prediction example with the same input from the T2b test. (a) initial de-calibrated input;
(b) the ground truth; (c) the prediction from RGGNet⁻, yielding a 0.0550 se(3) error; (d) the
prediction from RGGNet, yielding a 0.060 se(3) error. Images are cropped for demonstration.
RGGNet: Tolerance Aware LiDAR-Camera Online Calibration
with Geometric Deep Learning and Generative Model
Table I: Performance comparisons with DNN-based methods.
CalibRCNN: Calibrating Camera and LiDAR by Recurrent
Convolutional Neural Network and Geometric Constraints
• The Calibration Recurrent Convolutional Neural Network (CalibRCNN) infers a
6-degrees-of-freedom (DoF) rigid body transformation between a 3D LiDAR and a 2D
camera.
• Different from existing methods, the 3D-2D CalibRCNN not only uses an LSTM
network to extract temporal features between 3D point clouds and RGB
images of consecutive frames, but also uses the geometric and photometric
losses obtained from the interframe constraint to refine the calibration accuracy of
the predicted transformation parameters (see the epipolar sketch after this list).
• CalibRCNN aims at inferring the correspondence between the projected depth
image and the RGB image to learn the underlying geometry of 2D-3D calibration.
• Thus, the proposed calibration model generalizes well to unknown initial
calibration error ranges and to other 3D LiDAR and 2D camera pairs whose
intrinsic parameters differ from those of the training dataset.
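One plausible realization of the interframe geometric constraint, sketched below: the camera motion is derived from the predicted extrinsic and the LiDAR inter-frame pose via the hand-eye relation, and matched pixel pairs are scored by their epipolar residual. The helper names and the exact loss form are assumptions, not taken from the paper:

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def epipolar_loss(T_cl, T_lidar, K, pts1, pts2):
    """Hedged sketch of an interframe geometric term.
    T_cl:    predicted 4x4 LiDAR->camera extrinsic
    T_lidar: 4x4 LiDAR motion between consecutive frames (e.g., odometry)
    pts1/2:  (N,2) matched pixel coordinates in frames t and t+1."""
    # Hand-eye relation: camera motion induced by the LiDAR motion
    T_cam = T_cl @ T_lidar @ np.linalg.inv(T_cl)
    R, t = T_cam[:3, :3], T_cam[:3, 3]
    E = skew(t) @ R                                # essential matrix
    F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)  # fundamental matrix
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    # Mean absolute epipolar residual x2^T F x1 over all matches
    return np.abs(np.sum(x2 * (x1 @ F.T), axis=1)).mean()
```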
CalibRCNN: Calibrating Camera and LiDAR by Recurrent
Convolutional Neural Network and Geometric Constraints
Architecture of CalibRCNN. (a) RGB images of consecutive frames; (b) depth maps of
consecutive frames generated from the mis-calibrated point clouds. By projecting the 3D point
cloud onto the corresponding 2D RGB image, we obtain images similar to (c) and (d), where
the color of each projected point represents its depth value. (c) is a 3D-2D projection image
converted by the mis-calibration parameters, while (d) is the projection image after calibration
using our network. Red rectangles show the difference before and after calibration.
CalibRCNN: Calibrating Camera and LiDAR by Recurrent
Convolutional Neural Network and Geometric Constraints
Architecture of the proposed Calibration Recurrent Convolutional Neural Network.
CalibRCNN: Calibrating Camera and LiDAR by Recurrent
Convolutional Neural Network and Geometric Constraints
Pose transformation relationships among consecutive frames of LiDAR and camera data.
LCCNet: LiDAR and Camera Self-Calibration using
Cost Volume Network
• LCCNet is an online LiDAR-Camera Self-calibration Network that differs from previous CNN-based
methods.
• LCCNet can be trained end-to-end and predict the extrinsic parameters in real-time.
• LCCNet exploits a cost volume layer to express the correlation between the RGB image features and
the depth image projected from the point clouds (a cost-volume sketch follows this list).
• Besides using the smooth L1 loss on the predicted extrinsic calibration parameters as a supervised signal, an
additional self-supervised signal, a point cloud distance loss, is applied during training.
• Instead of directly regressing the extrinsic parameters, LCCNet predicts the decalibration deviation from the
initial calibration to the ground truth.
• The calibration error decreases further with iterative refinement and the temporal filtering approach in the
inference stage.
• The execution time of the calibration process is 24 ms per iteration on a single GPU.
• LCCNet achieves a mean absolute calibration error of 0.297 cm in translation and 0.017◦ in rotation with
miscalibration magnitudes of up to ±1.5 m and ±20◦ on the KITTI odometry dataset, outperforming the
state-of-the-art CNN-based calibration methods.
• The code is available at https://github.com/LvXudong-HIT/LCCNet.
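A minimal PyTorch sketch of a PWC-Net-style correlation cost volume between RGB and depth feature maps, in the spirit of the layer described above; the displacement range is an illustrative choice:

```python
import torch
import torch.nn.functional as F

def cost_volume(feat_rgb, feat_depth, max_disp=4):
    """Correlation cost volume between (B,C,H,W) feature maps: for every
    displacement (dy,dx) within max_disp, the per-pixel dot product of the
    RGB features with the shifted depth features."""
    B, C, H, W = feat_rgb.shape
    pad = F.pad(feat_depth, [max_disp] * 4)        # pad W and H by max_disp
    vols = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = pad[:, :, dy:dy + H, dx:dx + W]
            vols.append((feat_rgb * shifted).mean(dim=1, keepdim=True))
    return torch.cat(vols, dim=1)                  # (B, (2*max_disp+1)^2, H, W)
```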
LCCNet: LiDAR and Camera Self-Calibration using
Cost Volume Network
The proposed LCCNet takes the RGB and the
projected depth image as inputs to predict the
extrinsic parameters between the LiDAR and
the camera. The point clouds are reprojected by
the predicted extrinsic parameters. The
reprojected depth image and the RGB image
become the subsequent inputs of the LCCNet.
This process is called iterative refinement.
After five iterative refinements, we obtain
the final extrinsic parameter estimate.
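The refinement loop the caption describes, as a hedged sketch: `model` stands in for the trained LCCNet (mapping an RGB image and a projected depth map to a 4x4 residual transform), and `project_lidar_to_image` is the projection sketch from the RegNet slide above:

```python
import numpy as np

def refine_extrinsic(model, rgb, points, K, T_init, hw, n_iter=5):
    """Iterative refinement: re-project the cloud with the current estimate,
    let the network predict the remaining deviation, and left-compose the
    correction. Five iterations per the caption."""
    T_est = T_init.copy()
    for _ in range(n_iter):
        depth = project_lidar_to_image(points, T_est, K, hw)
        T_delta = model(rgb, depth)        # predicted residual 6-DoF deviation
        T_est = T_delta @ T_est
    return T_est
```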
LCCNet: LiDAR and Camera Self-Calibration using
Cost Volume Network
The network takes an RGB image from a calibrated camera and a projected sparse depth image from
a mis-calibrated LiDAR as input. The output is a 6-DoF rigid-body transformation that represents the
deviation between the initial extrinsic and the ground truth extrinsic.
LCCNet: LiDAR and Camera Self-Calibration using
Cost Volume Network
(a) Initial Calibration, (b) Ground truth, (c) Calibration results.
CFNet: LiDAR-Camera Registration Using
Calibration Flow Network
• An online LiDAR-camera extrinsic calibration algorithm that combines
deep learning and geometric methods.
• A two-channel image named calibration flow is defined to describe the
deviation from the initial projection to the ground truth.
• The EPnP algorithm within a RANSAC scheme is applied to estimate the
extrinsic parameters from 2D-3D correspondences constructed by the
calibration flow (see the sketch after this list).
• A semantic initialization is provided by introducing instance centroids
(ICs).
• The code is available at https://github.com/LvXudong-HIT/CFNet.
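A hedged sketch of the geometric back end: each projected LiDAR point is displaced by the predicted calibration flow, and the corrected 2D-3D correspondences are passed to OpenCV's EPnP solver inside RANSAC. The per-point data layout here is an assumption for illustration:

```python
import cv2
import numpy as np

def extrinsic_from_calibration_flow(pts3d, uv_init, flow, K):
    """pts3d:   (N,3) LiDAR points
       uv_init: (N,2) their projections under the initial calibration
       flow:    (N,2) predicted calibration flow sampled at those pixels
       Returns the 4x4 extrinsic estimated by EPnP within RANSAC."""
    uv_corrected = (uv_init + flow).astype(np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), uv_corrected, K.astype(np.float64),
        distCoeffs=None, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)            # rotation vector -> matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T, inliers
```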
CFNet: LiDAR-Camera Registration Using
Calibration Flow Network
(First Row) The 2D instance centroids (2D-IC)
extracted from images. (Second Row) The 3D
instance centroids (3D-IC) extracted from LiDAR
point clouds. (Third Row) The initial calibration
parameters provided by semantic initialization
(registration of 2D-IC and 3D-IC). (Fourth Row)
The final calibration parameters predicted by
CFNet. (Bottom Row) The three-dimensional map
generated by fusing the sensors using the
calibration values provided by our approach.
CFNet: LiDAR-Camera Registration Using
Calibration Flow Network
CFNet is an automatic online extrinsic calibration method that estimates
the transformation parameters between a 3D LiDAR and a 2D camera.
CFNet: LiDAR-Camera Registration Using
Calibration Flow Network
The architecture of the calibration network CFNet.