SLAM Workshop
Aug. 11-18, 2018
Dong-Won Shin
• Dong-Won Shin
• Gwangju Institute of Science and Technology
• PhD candidate in Computer Science Field
• SLAM Research Group KR manager
• (blog) dongwonshin.net
• (github) github.com/JustWon
• (E-mail) dongwonshin@gist.ac.kr
Speaker
2
• 1. Simultaneous Localization and Mapping
• 2. Brief review on SLAM: a historical perspective
• 3. Sparse SLAM
• 4. Lidar SLAM
• 5. Direct method
• 6. Dense SLAM
• (extra) Structure from Motion
• 7. SLAM research & job trend in CVPR 2018
• 8. Useful resources
Table of Contents
3
• What will you learn?
• What is SLAM and where can it be used?
• What are the underlying algorithms of a SLAM framework?
• How are they actually written in code?
• Who is this for?
• People who want to run Visual SLAM directly
• People who want to use Visual SLAM for their research projects
• People who want to study Visual SLAM
• Convention in This Presentation
• [Something]: code review in the code diagram
• Something: important keyword, detailed explanation from the next slide or below
Preface
4
1. Simultaneous
Localization and Mapping
5
• What is SLAM?
• Computational problem of constructing a map of an environment
while simultaneously keeping track of a robot’s location
• Application
Simultaneous Localization and Mapping
Augmented reality · Virtual reality · Robotics (indoor and outdoor)
https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
• Visual localization
• Under the inaccurate GPS
• GPS-denied environment
• Ex)
• Mapping
• Scenarios in which a prior map is not available and needs to be built.
• Map can inform path planning or provide an intuitive visualization for a human or robot.
• Ex)
Simultaneous Localization and Mapping
7
C. Cadena et al., “Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age,” IEEE Trans. Robot., vol. 32, no. 6, pp. 1309–1332, 2016.
Indoor environment · Skyscraper · Disaster area · Private room
• Do autonomous robots really need SLAM?
• Yes, since many applications implicitly or explicitly require a globally consistent map
• SLAM serves as a mechanism to compute a sufficient statistic that summarizes all past observations of the robot
• Is SLAM completely solved?
• Not yet; SLAM is such a broad topic that the question is well posed only for a given robot/environment/performance combination.
• Current SLAM algorithms can easily be induced to fail when either the motion of the robot or the environment is too challenging:
• Fast robot dynamics
• Highly dynamic environments
• Semantic fusion
• Collaborative mapping
Questions
8
Brief Review on SLAM:
a Historical Perspective
9
• Bayesian Filtering based SLAM
• The theoretical foundations and prototypes of the traditional Bayesian-filtering-based SLAM framework emerged in the 1990s.
• Ex) EKF SLAM, FastSLAM
• Visual odometry
• The process of estimating the ego-motion of a robot using
only the input of a single or multiple cameras attached to it.
• Ex) stereo VO, monocular VO
• Structure from motion
• Investigating the problem of recovering relative camera poses
and 3D structure from a set of camera images
• Off-line version of visual SLAM
Earlier Inspirations
10
• The solution to large scale map management
• 1) graph-based SLAM and loop closure detection
• 2) efficient map representation and refinement: sparse, dense, and semi-dense
• Graph based SLAM
• constructing a graph whose nodes represent robot poses or landmarks
• edge between nodes encodes a sensor measurement that constrains connected poses
• Loop closure detection
• Detecting loop closures in a map to give additional constraint for the consistent mapping
Effort towards Large Scale Mapping
11
• Sparse SLAM
• Only use a small selected subset of the pixels (features) from a monocular color camera
• Fast and real time on CPU but it produces a sparse map (point clouds)
• Landmark-based or feature-based representations
• ORB SLAM
• One of the SOTA frameworks in the sparse SLAM category
• Complete SLAM system for monocular camera
• Real-time on standard CPUs in a wide variety of environments
• small hand-held indoors
• drones flying in industrial environments
• cars driving around a city
Modern State of the Art Systems
12
• Dense SLAM
• Use most or all of the pixels in each received frame
• Or use depth images from a depth camera
• It produces a dense map but GPU acceleration is necessary for the real-time operation.
• Volumetric model or surfel-based representations
• InfiniTam
• One of the SOTA frameworks in the Dense SLAM category
• Multi-platform framework for real-time, large-scale depth fusion and tracking
• Densely reconstructed 3D scene
Modern State of the Art Systems
13
• Direct method (semi-dense SLAM)
• Make use of pixel intensities directly
• Enable using all information in the image
• It produces a semi-dense map
• Higher accuracy and robustness, in particular in environments with few keypoints
• LSD SLAM
• Highly cited SLAM framework in the direct method SLAM category
• Large-scale, consistent maps of the environment
• Accurate pose estimation based on direct image alignment
Modern State of the Art Systems
14
• Lidar SLAM
• Make use of the Lidar sensor input for the localization and mapping
• Autonomous driving purpose-oriented in outdoor environment
• Segmap
• Unified approach for Lidar SLAM based on the extraction of segments in 3D point clouds
• Real-time single- and multi-agent systems
Modern State of the Art Systems
15
Modern State of the Art Systems
16
ORB
SLAM
InfiniTam
LSD
SLAM
Segmap
Q&A
17
Sparse SLAM
18
ORB SLAM
19
https://www.youtube.com/watch?v=ufvPS5wJAx0
• Feature-based monocular SLAM system
• operates in real time, in small and large, indoor and outdoor environments
• URL: https://github.com/raulmur/ORB_SLAM2
• Main components
• Tracking
• Local mapping
• Loop closing
• Data structures
• Map points
• Keyframes
• Covisibility graph
• Essential graph
ORB SLAM
20
Map points & keyframes · Covisibility graph · Essential graph
R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” IEEE Trans. Robot., vol. 31, no. 5, pp. 1147–1163, 2015.
• Map points $p_i$
• 3D position $\mathbf{X}_{w,i}$ in the world coordinate system
• Viewing direction $\mathbf{n}_i$
• Representative ORB descriptor $\mathbf{D}_i$
• Keyframes $K_i$
• Camera pose $\mathbf{T}_{iw}$ (world-to-camera rigid-body transformation)
• Camera intrinsics (focal length and principal point)
• All the ORB features extracted in the frame
Data structures
21
Map points & keyframes
• Covisibility graph
• Undirected weighted graph
• Node: keyframe
• Edge: if two keyframes share observations of
the same map points (at least 15)
• Weight 𝜃: the number of common map points
• Essential graph
• Retain all the nodes but less edges
• Subset of edges from the covisibility graph with high covisibility
+ loop closure edges
Data structures
22
Covisibility graph
Essential	graph
• ORB feature extraction
• For tracking, mapping, and place recognition tasks
• Robust to rotation and scale
• Good invariance to camera auto-gain and auto-exposure, and illumination changes
• Fast to extract and match allowing for real-time operation
• Show good precision/recall performances in bag-of-word place recognition
• [Frame::Frame]
Tracking
23
E.	Rublee,	V.	Rabaud,	K.	Konolige,	and	G.	R.	Bradski,	“ORB:	An	efficient	alternative	to	SIFT	or	SURF.,”	ICCV,	pp.	2564–2571,	Jan.	2011.
Example	of	orb	features
• Initial pose estimation
• Case 1: if tracking was successful for the last frame
• constant velocity motion model
• Bundle adjustment in two view case
• [Tracking::TrackWithMotionModel]
Tracking
24
Minimizing	the	reprojection error
https://cs.nyu.edu/~fergus/teaching/vision/11_12_multiview.pdf
t1
t2
t3
• Initial pose estimation
• Case 2: if the tracking is lost
• global relocalization
• Bag-of-visual-words
• Converting an image into a representative descriptor
• Perspective-N-Point problem
• Finding 2D—3D correspondences
• [Tracking::Relocalization]
Tracking
25
• Training step
• A visual vocabulary is first created by clustering a large number of keypoint descriptors; the cluster centres form the visual words of the vocabulary.
• Estimation step
• The local keypoints of a given image are first detected and described.
• Each descriptor is vector-quantized.
• The histogram of the vector-quantized keypoint descriptors is used as the image
descriptor.
Bag-of-Visual-Words
26
D.	Gálvez-López and	J.	D.	Tardós,	“Bags	of	binary	words	for	fast	place	recognition	in	image	sequences,”	IEEE	Trans.	Robot.,	vol.	28,	no.	5,	pp.	1188–1197,	2012.
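To make the estimation step concrete, here is a minimal C++ sketch of the vector-quantization stage: each local descriptor is assigned to its nearest cluster centre, and the word histogram becomes the image descriptor. The flat vocabulary and float descriptors are illustrative simplifications; ORB-SLAM's DBoW2 uses binary ORB descriptors, Hamming distance, and a hierarchical vocabulary tree.

```cpp
#include <vector>
#include <limits>

// Minimal sketch: quantize local descriptors against a prebuilt visual
// vocabulary and accumulate a word histogram (the image descriptor).
using Descriptor = std::vector<float>;

std::vector<int> bowHistogram(const std::vector<Descriptor>& features,
                              const std::vector<Descriptor>& vocabulary) {
  std::vector<int> histogram(vocabulary.size(), 0);
  for (const auto& f : features) {
    int bestWord = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (size_t w = 0; w < vocabulary.size(); ++w) {
      float dist = 0.f;  // squared L2 distance to the cluster centre
      for (size_t k = 0; k < f.size(); ++k) {
        const float d = f[k] - vocabulary[w][k];
        dist += d * d;
      }
      if (dist < bestDist) { bestDist = dist; bestWord = static_cast<int>(w); }
    }
    ++histogram[bestWord];  // vector quantization: count the nearest word
  }
  return histogram;
}
```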
• Perspective-n-Point Problem
• 2D—3D correspondences
• To determine the position and orientation of a camera
• Given its intrinsic parameters and
• a set of n correspondences between 3D points and their 2D projections
• Procedure
• (1) Estimate point clouds from 2D projections
• (2) Compute the transformation between the estimated point clouds and the given point clouds
Perspective-N-Point Problem
27
V.	Lepetit,	F.	Moreno-Noguer,	and	P.	Fua,	“EPnP:	An	accurate	O(n)	solution	to	the	PnP	problem,”	Int.	J.	Comput.	Vis.,	vol.	81,	no.	2,	pp.	155–166,	2009.
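As an illustration of how such a pose can be recovered in practice, the sketch below uses OpenCV's EPnP solver inside a RANSAC loop. ORB-SLAM ships its own EPnP implementation, so this is a stand-in rather than the framework's actual code path; the function name and threshold values are illustrative.

```cpp
#include <opencv2/calib3d.hpp>
#include <vector>

// Minimal sketch: recover the camera pose from 2D-3D correspondences with
// EPnP inside RANSAC, as done during relocalization.
bool estimatePose(const std::vector<cv::Point3f>& points3d,
                  const std::vector<cv::Point2f>& points2d,
                  const cv::Mat& K,            // 3x3 intrinsic matrix
                  cv::Mat& rvec, cv::Mat& tvec) {
  std::vector<int> inliers;
  return cv::solvePnPRansac(points3d, points2d, K, cv::noArray(),
                            rvec, tvec, /*useExtrinsicGuess=*/false,
                            /*iterationsCount=*/100,
                            /*reprojectionError=*/4.0f,
                            /*confidence=*/0.99, inliers,
                            cv::SOLVEPNP_EPNP);
}
```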
• Single view geometry
• Mathematical relationship between the coordinates of a point in 3D space and its pr
ojection onto the image plane
Pinhole Camera Model
28
A.	Hartley,	R.I.	and	Zisserman,	Multiple	View	Geometry	in	Computer	Vision.	2003.
• (1) Estimate point clouds from 2D projections
• Barycentric coordinate property
• A 3D point can be represented by a weighted sum of 4 control points.
• 2D projections of the point
Perspective-N-Point Problem
29
https://en.wikipedia.org/wiki/Barycentric_coordinate_system
Stacking the 2D projection constraints of all correspondences yields a homogeneous linear system $M\mathbf{x} = \mathbf{0}$, where $\mathbf{x}$ stacks the camera-frame coordinates of the four control points.
• (1) Estimate point clouds from 2D projections
• The solution can be expressed as a linear combination of $\mathbf{v}_i$, the null-space eigenvectors of $\mathbf{M}^{T}\mathbf{M}$.
• Now, we know the positions of 4 control points.
• Then, we can compute the 3D reprojected points of the 2D projection points.
• (2) Compute the transformation
• between the estimated point clouds and the given point clouds
• Via the point-to-point iterative closest points method
Perspective-N-Point Problem
30
Iterative Closest Points (ICP)
31
• Widely used for geometric alignment of three-dimensional models
• Start with two meshes and an initial guess for their relative rigid-body transform
• Refine the transform by repeatedly generating pairs of corresponding points
• Related works
• Point to Point
• Point to Plane
• …
Point-to-Point ICP
32
• Original problem
$$\min_{R,t} \sum_{i=1}^{n} \big\| (R p_i + t) - q_i \big\|^2$$
• Centroids
Source: $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, $p_i' = p_i - \bar{p}$ (so $p_i = p_i' + \bar{p}$)
Target: $\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i$, $q_i' = q_i - \bar{q}$ (so $q_i = q_i' + \bar{q}$)
• Decoupling the translation
$$\min_{R,t} \sum_{i=1}^{n} \big\| R(p_i' + \bar{p}) + t - (q_i' + \bar{q}) \big\|^2 = \min_{R,t} \sum_{i=1}^{n} \big\| R p_i' + R\bar{p} + t - q_i' - \bar{q} \big\|^2$$
Assuming $t = \bar{q} - R\bar{p}$, the translation terms cancel and the problem reduces to
$$\min_{R} \sum_{i=1}^{n} \big\| R p_i' - q_i' \big\|^2$$
P. J. Besl and N. D. McKay, “A Method for Registration of 3-D Shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, 1992.
• Dual problem
• Expanding $\|R p_i' - q_i'\|^2$, the terms $\|p_i'\|^2$ and $\|q_i'\|^2$ do not depend on the rotation; maximizing the cross term $\sum_i {q_i'}^{T} R\, p_i'$ therefore minimizes the entire cost
• Solving the cost function
$$M = \sum_{i=1}^{n} p_i'\, {q_i'}^{T}, \qquad R = V U^{T}, \qquad t = \bar{q} - R\bar{p}$$
where $U$ and $V$ are the left and right singular vectors of $M = U \Sigma V^{T}$
Point-to-Point ICP
33
https://cs.gmu.edu/~kosecka/cs685/cs685-icp.pdf
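A minimal sketch of this closed-form alignment, assuming the correspondences are already known. Eigen is used for the SVD; the reflection guard is standard practice that the slide does not spell out.

```cpp
#include <Eigen/Dense>
#include <vector>

// Minimal sketch of the closed-form point-to-point step: center both clouds,
// accumulate M = sum p_i' q_i'^T, and recover R, t from the SVD of M.
void alignPointToPoint(const std::vector<Eigen::Vector3d>& p,
                       const std::vector<Eigen::Vector3d>& q,
                       Eigen::Matrix3d& R, Eigen::Vector3d& t) {
  Eigen::Vector3d pBar = Eigen::Vector3d::Zero(), qBar = Eigen::Vector3d::Zero();
  for (size_t i = 0; i < p.size(); ++i) { pBar += p[i]; qBar += q[i]; }
  pBar /= static_cast<double>(p.size());
  qBar /= static_cast<double>(q.size());

  Eigen::Matrix3d M = Eigen::Matrix3d::Zero();
  for (size_t i = 0; i < p.size(); ++i)
    M += (p[i] - pBar) * (q[i] - qBar).transpose();  // cross-covariance

  Eigen::JacobiSVD<Eigen::Matrix3d> svd(M, Eigen::ComputeFullU | Eigen::ComputeFullV);
  R = svd.matrixV() * svd.matrixU().transpose();
  if (R.determinant() < 0) {            // guard against a reflection solution
    Eigen::Matrix3d V = svd.matrixV();
    V.col(2) *= -1.0;
    R = V * svd.matrixU().transpose();
  }
  t = qBar - R * pBar;                  // translation decoupled via centroids
}
```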
• Track local map
• Map point filtering
• (1) compute the map point projection x in the current frame.
Discard if it lays out of the image bounds.
• (2) compute the angle between the current viewing ray $\mathbf{v}$ and the map point viewing direction $\mathbf{n}$.
Discard if $\mathbf{v} \cdot \mathbf{n} < \cos 60°$
• (3) compute the distance $d$ from the map point to the camera center.
Discard if it is out of the scale invariance region of the map point, i.e. $d \notin [d_{min}, d_{max}]$
• Matching
• (4) compare the representative descriptor D of the map point with the still unmatched
ORB features in the frame and associate the map point with the best match
• Refinement
• (5) Perform the bundle adjustment for the reserved map points and keyframes
• [Tracking::TrackLocalMap]
Tracking
34
$$\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} u \\ v \\ w \end{bmatrix} \;\rightarrow\; \begin{bmatrix} u/w \\ v/w \\ 1 \end{bmatrix}$$
Projection
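A minimal sketch of this projection for a point already expressed in camera coordinates; the function name and types are illustrative.

```cpp
#include <Eigen/Dense>

// Minimal sketch of the pinhole projection used when culling map points:
// a camera-frame 3D point is mapped to pixel coordinates via the intrinsics.
Eigen::Vector2d project(const Eigen::Vector3d& Xc,
                        double fx, double fy, double cx, double cy) {
  // [u v w]^T = K [x y z]^T, then dehomogenize by w (= z here).
  return { fx * Xc.x() / Xc.z() + cx,
           fy * Xc.y() / Xc.z() + cy };
}
```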
• New keyframe decision
• (condition 1) Good relocalization
• more than MAX frames have passed from the last keyframe insertion
• (condition 2) Idle case
• local mapping is idle AND
• more than MIN frames have passed from the last keyframe insertion
• (condition 3) Visual change
• current frame tracks less than 90% of the points of the reference keyframe $K_{ref}$
• ((condition 1) || (condition 2)) && (condition 3), as sketched below
• [Tracking::NeedNewKeyFrame]
Tracking
35
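A minimal sketch of how the three conditions combine; all counters and thresholds are stand-ins for ORB-SLAM's internal bookkeeping.

```cpp
// Minimal sketch of the new-keyframe decision; MIN/MAX frame thresholds and
// the 90% ratio follow the conditions listed above.
bool needNewKeyFrame(int framesSinceLastKF, bool localMappingIdle,
                     int trackedPoints, int refKeyframePoints,
                     int minFrames, int maxFrames) {
  const bool c1 = framesSinceLastKF > maxFrames;                     // good relocalization
  const bool c2 = localMappingIdle && framesSinceLastKF > minFrames; // idle case
  const bool c3 = trackedPoints < 0.9 * refKeyframePoints;           // visual change
  return (c1 || c2) && c3;
}
```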
• Keyframe insertion
• Compute the bags of words representation
• Update the covisibility graph
• [LocalMapping::ProcessNewKeyFrame]
• New map point creation
• Finding the matches by the epipolar geometry
• Initial map point creation via triangulation
• Consistency check
• Parallax
• Positive depth in both cameras
• Reprojection error
• Scale consistency check
• [LocalMapping::CreateNewMapPoints]
Local Mapping
36
• Geometry of stereo vision (two view geometry)
Epipolar Geometry
37
(figure: the epipolar plane through camera centers c and c′, with the epipoles and epipolar lines in both images)
https://web.stanford.edu/class/cs231a/course_notes/03-epipolar-geometry.pdf
A.	Hartley,	R.I.	and	Zisserman,	Multiple	View	Geometry	in	Computer	Vision.	2003.
• Algebraic representation of epipolar geometry
• Properties of the fundamental matrix
• Point correspondence
• If $\mathbf{x}$ and $\mathbf{x}'$ are corresponding image points, then $\mathbf{x}'^{T} \mathbf{F} \mathbf{x} = 0$
• Epipolar lines
• $\mathbf{l}' = \mathbf{F}\mathbf{x}$ is the epipolar line corresponding to $\mathbf{x}$
• $\mathbf{l} = \mathbf{F}^{T}\mathbf{x}'$ is the epipolar line corresponding to $\mathbf{x}'$
Fundamental Matrix
38
• Specialization of the fundamental matrix to the case of normalized image coordinates
• Properties of the essential matrix
• In normalized image coordinates, it has the same properties as the fundamental matrix.
• $\mathbf{E} = \mathbf{K}'^{T} \mathbf{F} \mathbf{K}$
• $\mathbf{F} = \mathbf{K}'^{-T} \mathbf{E} \mathbf{K}^{-1}$
• If the calibration matrix $\mathbf{K}$ is the identity matrix, then $\mathbf{E} = \mathbf{F}$
Essential Matrix
39
(figure: the essential matrix E in normalized image coordinates)
• Any two images of the same planar surface are related by a homography
• Properties of homography matrix
• 𝐱′ = 𝐇𝐱
Homography
40
• Generally, the rays C→x and C′→x′ will not exactly intersect
• Can solve via SVD, finding a least-squares solution to a system of equations
• Procedure
• Given camera projection matrix P, P’, and a correspondences x, x’
• Cross product of x and PX should be zero. (self cross product)
• Create matrix A
• [U, S, V] = svd(A)
• X = V(:, end)
Triangulation
41
(figure: 3D point X observed as x and x′ in two views)
With $\mathbf{x} = (u, v, 1)^{T}$, $\mathbf{x}' = (u', v', 1)^{T}$ and camera matrices $P = \begin{bmatrix} \mathbf{p}_1^{T} \\ \mathbf{p}_2^{T} \\ \mathbf{p}_3^{T} \end{bmatrix}$, $P' = \begin{bmatrix} \mathbf{p}_1'^{T} \\ \mathbf{p}_2'^{T} \\ \mathbf{p}_3'^{T} \end{bmatrix}$, the self-cross-product constraint $\mathbf{x} \times (P\mathbf{X}) = \mathbf{0}$ applied to both views gives
$$A = \begin{bmatrix} u\,\mathbf{p}_3^{T} - \mathbf{p}_1^{T} \\ v\,\mathbf{p}_3^{T} - \mathbf{p}_2^{T} \\ u'\,\mathbf{p}_3'^{T} - \mathbf{p}_1'^{T} \\ v'\,\mathbf{p}_3'^{T} - \mathbf{p}_2'^{T} \end{bmatrix}, \qquad A\mathbf{X} = \mathbf{0}$$
A. Hartley, R.I. and Zisserman, Multiple View Geometry in Computer Vision, 2003.
https://en.wikipedia.org/wiki/Cross_product
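A minimal sketch of this DLT triangulation for the two-view case, using Eigen's SVD; real systems additionally normalize coordinates and check cheirality.

```cpp
#include <Eigen/Dense>

// Minimal sketch of linear (DLT) triangulation: stack two rows per view from
// the cross-product constraint x × (PX) = 0 and take the right null vector.
Eigen::Vector3d triangulateDLT(const Eigen::Matrix<double, 3, 4>& P1,
                               const Eigen::Matrix<double, 3, 4>& P2,
                               const Eigen::Vector2d& x1,
                               const Eigen::Vector2d& x2) {
  Eigen::Matrix4d A;
  A.row(0) = x1.x() * P1.row(2) - P1.row(0);
  A.row(1) = x1.y() * P1.row(2) - P1.row(1);
  A.row(2) = x2.x() * P2.row(2) - P2.row(0);
  A.row(3) = x2.y() * P2.row(2) - P2.row(1);

  // Least-squares solution: right singular vector of the smallest singular value.
  Eigen::JacobiSVD<Eigen::Matrix4d> svd(A, Eigen::ComputeFullV);
  Eigen::Vector4d X = svd.matrixV().col(3);
  return X.hnormalized();  // divide by the homogeneous coordinate
}
```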
• Local keyframe culling
• To maintain a compact reconstruction, detect redundant keyframes and delete them
• Redundancy check
• A keyframe is redundant if 90% of its map points have been seen in at least three other keyframes
• [LocalMapping::KeyFrameCulling]
• Recent map point culling
• To retain the compact map
• Association check
• The tracking must find the point in more than 25% of the frames in which it is predicted to be visible.
• If more than one keyframe has passed since map point creation, it must be observed in at least three keyframes.
• A point is removed if at any time it is observed by fewer than three keyframes.
• [LocalMapping::MapPointCulling]
Local Mapping
42
$$\frac{\#\text{ frames the match is found}}{\#\text{ frames the map point is predicted visible}} > 0.25$$
• Local bundle adjustment
• Local map points
• $K_i$: currently processed keyframe
• $K_c$: all the keyframes connected to it in the covisibility graph
• All the map points seen by those keyframes
• [Optimizer::LocalBundleAdjustment]
Local Mapping
43
• Loop candidates detection
• All those keyframes directly connected to the current keyframe are discarded
• Compute the similarity between the bag of words vector
• Current keyframe and all its neighbors in the covisibility graph
• Query the recognition DB
• Discard all those keyframes whose score is lower than minimum
• [KeyFrameDatabase::DetectLoopCandidates]
• Compute the similarity transformation
• Similarity transform between current keyframe and the loop closure candidate
• Given a number of points in two different Cartesian coordinate systems,
recovering the transformation and scale between the two systems
• [LoopClosing::ComputeSim3]
Loop Closing
44
Similarity Transform
45
• Original problem
$$\min_{R,t} \sum_{i=1}^{n} \big\| (R p_i + t) - q_i \big\|^2$$
• Centroids
Source: $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, $p_i' = p_i - \bar{p}$; Target: $\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i$, $q_i' = q_i - \bar{q}$
• Decoupling the translation
Assuming $t = \bar{q} - R\bar{p}$, exactly as in point-to-point ICP, the problem reduces to
$$\min_{R} \sum_{i=1}^{n} \big\| R p_i' - q_i' \big\|^2$$
B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” JOSA A, vol. 4, no. 4, pp. 629–642, 1987.
http://web.cs.iastate.edu/~cs577/handouts/quaternion.pdf
• Dual problem
• As in point-to-point ICP, the terms that do not depend on the rotation drop out; maximizing the cross term minimizes the entire cost
• Solving by the quaternion
• Representing the rotation by a unit quaternion $\dot{q}$ and using the property of the quaternion product, the cross term becomes the quadratic form $\dot{q}^{T} N \dot{q}$
Similarity Transform
46
• Eigenvector of the highest eigenvalue of N is the solution
• Quaternion to rotation
• Calculate the translation
• Calculate the scale
Similarity Transform
47
$$t = \bar{q} - R\bar{p}, \qquad s = \frac{\sum_{i=1}^{n} q_i' \cdot R\, p_i'}{\sum_{i=1}^{n} \| R\, p_i' \|^{2}}$$
where $N$ is the symmetric $4 \times 4$ matrix built from $M = \sum_{i} p_i'\, {q_i'}^{T}$, with entries $S_{ab} = \sum_{i} p_{ia}'\, q_{ib}'$:
$$N = \begin{bmatrix} S_{xx}+S_{yy}+S_{zz} & S_{yz}-S_{zy} & S_{zx}-S_{xz} & S_{xy}-S_{yx} \\ S_{yz}-S_{zy} & S_{xx}-S_{yy}-S_{zz} & S_{xy}+S_{yx} & S_{zx}+S_{xz} \\ S_{zx}-S_{xz} & S_{xy}+S_{yx} & -S_{xx}+S_{yy}-S_{zz} & S_{yz}+S_{zy} \\ S_{xy}-S_{yx} & S_{zx}+S_{xz} & S_{yz}+S_{zy} & -S_{xx}-S_{yy}+S_{zz} \end{bmatrix}$$
• Loop fusion
• When the loop closure detection is triggered, the loop fusion is performed.
• Keyframe and map point correction
• Insert loop edges in the covisibility graph
• Current keyframe pose is corrected with the similarity transformation that we obtained
• Fuse duplicated map points
• [LoopClosing::CorrectLoop]
• Essential graph optimization
• For all keyframes and all map points
• [Optimizer::GlobalBundleAdjustment]
Loop Closing
48
Applications
49
• AR
https://www.youtube.com/watch?v=kPwy8yA4CKM
Applications
50
• VR
https://www.youtube.com/watch?v=aVdWED6kfKc
Q&A
51
Lidar SLAM
52
SegMap
53
https://www.youtube.com/watch?v=cHfs3HLzc2Y
• Incremental method for localization in 3D point clouds based on segment matching
• URL: https://github.com/ethz-asl/segmap
• Main components
• Front-end
• Sequential factors
• Place recognition factors
• Back-end
• Pose graph optimization
SegMap
54
• The front-end is responsible for
• Sensor measurements into sequential factors
• Segmentation and description for place recognition factors
• Sequential factors
• Odometry factors
• Displacement between consecutive robot poses from IMU data
• Scan-matching factors
• Registering the current scan against a submap by point-to-plane ICP
• [SegMapper::SegMapper]
• Place recognition factors
• SegMatch: segment based loop-closure for 3D point clouds
• [SegMapper::segMatchThread]
Front-End
55
• Minimize the perpendicular distance from the source point to the tangent plane of the destination point
• Nonlinear least-squares problem, typically solved with the Levenberg-Marquardt method
Point-to-Plane ICP
56
K. Low, “Linear Least-squares Optimization for Point-to-plane ICP Surface Registration,” Chapel Hill, Univ. North Carolina, pp. 2–4, 2004.
$s_i = (s_{ix}, s_{iy}, s_{iz}, 1)^{T}$: source point
$d_i = (d_{ix}, d_{iy}, d_{iz}, 1)^{T}$: destination point
$n_i = (n_{ix}, n_{iy}, n_{iz}, 0)^{T}$: unit normal vector at $d_i$
Point-to-Plane ICP
• Transformation matrix M
• Least-squares problem:
$$M_{opt} = \arg\min_{M} \sum_{i} \big( (M \cdot s_i - d_i) \cdot n_i \big)^{2}$$
• 6-DOF parameters $(\alpha, \beta, \gamma, t_x, t_y, t_z)$
• However, $\alpha, \beta, \gamma$ enter through nonlinear trigonometric functions
• A linear approximation is needed
57
Point-to-Plane ICP
• Approximated transformation matrix $\hat{M}$ via the small-angle linearization
• Linearized expression for the i-th correspondence: one row of the form $a_i^{T} x = b_i$ with $x = (\alpha, \beta, \gamma, t_x, t_y, t_z)^{T}$
58
Point-to-Plane ICP
• Expand to N correspondences: $A x = b$
• Modified form of the general least-squares problem
• Optimum solution (see the sketch below):
$$x_{opt} = (A^{T} A)^{-1} A^{T} b$$
• Iteratively perform the SVD optimization until it converges
• [PointToPlaneWithCovErrorMinimizer::compute]
59
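A minimal sketch of one linearized point-to-plane step under the small-angle assumption, following Low's derivation. The row layout $a_i = (s_i \times n_i,\; n_i)$ and the normal-equation solve are the standard form; names are illustrative.

```cpp
#include <Eigen/Dense>
#include <vector>

// Minimal sketch of one point-to-plane step: each correspondence contributes
// one row a_i^T x = b_i with x = (alpha, beta, gamma, tx, ty, tz), solved in
// least squares as x = (A^T A)^{-1} A^T b.
Eigen::Matrix<double, 6, 1> pointToPlaneStep(
    const std::vector<Eigen::Vector3d>& s,   // source points
    const std::vector<Eigen::Vector3d>& d,   // destination points
    const std::vector<Eigen::Vector3d>& n) { // unit normals at destinations
  Eigen::MatrixXd A(s.size(), 6);
  Eigen::VectorXd b(s.size());
  for (size_t i = 0; i < s.size(); ++i) {
    A.block<1, 3>(i, 0) = s[i].cross(n[i]).transpose();  // rotation part
    A.block<1, 3>(i, 3) = n[i].transpose();              // translation part
    b(i) = n[i].dot(d[i] - s[i]);                        // signed plane distance
  }
  // Normal equations; a QR or SVD solve is more stable for ill-conditioned A.
  return (A.transpose() * A).ldlt().solve(A.transpose() * b);
}
```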
• Place recognition factors
• SegMatch: segment based loop-closure for 3D point clouds
• Four different modules
• Point cloud segmentation
• Descriptor extraction
• Segment matching
• Geometric verification
• [SegMapper::segMatchThread]
Front-End
60
• Point cloud segmentation
• Incremental region growing policy
• CanGrowTo
• If growing from a seed to a neighbor is allowed
• Check the angle between seed and candidate normals
• LinkClusters
• Link clusters if they have the same cluster id
• CanBeSeed
• If a point can be used as seed
• Check the curvature at a point
• [growRegionFromSeed]
SegMatch
61
Region	growing Cluster	merging
• Descriptor extraction
• For compressing the raw segment data and build object signatures
• Diverse segment descriptors
• Eigenvalue based
• Eigenvalues for the segment are computed and combined in a 7 dim feature vector
• Linearity, planarity, scattering, omnivariance, anisotropy, eigenentropy, change of curvature
• [EigenvalueBasedDescriptor::describe]
SegMatch
62
M. Weinmann, B. Jutzi, and C. Mallet, “Semantic 3D scene interpretation: A framework combining optimal neighborhood size selection with relevant features,” ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci., vol. II-3, pp. 181–188, 2014.
Eigenvalue-based	3D	features
• Diverse segment descriptors
• Auto-encoder based
• Input: 3D binary voxel grid of fixed dimension 32x32x16
• Descriptor extractor part
• 3D convolutional layers with max pool layers and two Fully connected layers
• Rectified linear activation function for all layers
• Output: 64x1 descriptor
• Reconstruction part
• One fully connected layer and three deconvolutional layers with a final sigmoid output
• Output: reconstructed 3D binary voxel grid
• Loss
• Classification loss: softmax cross entropy loss
• Reconstruction loss: binary cross entropy loss
SegMatch
63
R.	Dubé,	A.	Cramariuc,	D.	Dugas,	J.	Nieto,	R.	Siegwart,	and	C.	Cadena,	“SegMap:	3D	Segment	Mapping	using	Data-Driven	Descriptors,”	2018.
• Matching
• K-nearest neighbor search in the descriptor space by k-d tree
• Construction of k-d trees
SegMatch
64
(figure: example k-d tree built over points 0–9)
• Matching
• Searching k-nearest neighbors
• Ex) Query point: p1
• Time complexity: O(log n)
• [findCandidates]
SegMatch
65
(figure: k-nearest-neighbor search for query point p1 in the k-d tree above)
• Geometric verification
• Consistency graph G = (V, E)
• Vertices V = {$c_i$}: the set of correspondences $c_i$
• Edges E = {$e_{ij}$}: the set of undirected edges $e_{ij}$ connecting all consistent pairs of correspondences $(c_i, c_j)$
• Geometrically consistent
• If the difference of the Euclidean distance between the segment centroids in the local map and in the target map is less than a threshold
• Identifying a maximum geometrically consistent set == finding a maximum clique of G (a sketch of the edge construction follows)
• [Segmatch::recognize]
SegMatch
66
$$|d_l(c_i, c_j) - d_t(c_i, c_j)| \leq \epsilon$$
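A minimal sketch of building the consistency edges; the Correspondence struct is a simplification of SegMatch's internal types.

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <utility>
#include <vector>

// Minimal sketch: a correspondence pairs a segment centroid in the local map
// with one in the target map; two correspondences are consistent when their
// centroid distances agree in both maps up to a threshold eps.
struct Correspondence {
  Eigen::Vector3d localCentroid;
  Eigen::Vector3d targetCentroid;
};

std::vector<std::pair<int, int>> consistencyEdges(
    const std::vector<Correspondence>& c, double eps) {
  std::vector<std::pair<int, int>> edges;
  for (size_t i = 0; i < c.size(); ++i)
    for (size_t j = i + 1; j < c.size(); ++j) {
      const double dLocal  = (c[i].localCentroid  - c[j].localCentroid).norm();
      const double dTarget = (c[i].targetCentroid - c[j].targetCentroid).norm();
      if (std::abs(dLocal - dTarget) <= eps)
        edges.emplace_back(i, j);  // geometrically consistent pair
    }
  return edges;  // a maximum clique of this graph gives the recognized match
}
```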
• Illustration
Consistency Graph
67
(figure: correspondences c1–c12 between segments of the local map and the target map; after geometric verification only the consistent subset c2, c3, c5–c11 survives)
Back-End
• Pose graph optimization by factor graphs (iSAM library)
• Factor graphs
• Graphical models that are well suited to modeling complex estimation problems
• Variables
• Unknown random variables in the estimation problem
• Robot poses
• Factors
• Probabilistic information on those variables, derived from measurements or prior knowledge
• Odometry, loop closure constraints
• [IncrementalEstimator::estimate]
68
• Autonomous vehicles
Applications
69 https://www.youtube.com/watch?v=gEy91PGGLR0&feature=share
Q&A
70
End of First Week
71
Last Questions
72
Epipolar Line in Various Cases
73
• Motion parallel with the image plane • Forward motion • Converging cameras
http://people.scs.carleton.ca/~c_shu/Courses/comp4900d/notes/epipolar.pdf
• Since the distances between segments are preserved under rotation, the consistency check is expected to work well even with severe rotation.
• However, in the special case where the segments in the target map happen to be arranged exactly like a rotated copy of the local map, it could fail.
Consistency Graph with Severe Rotation
74
0 deg 180 deg 90 deg
Direct Method
75
LSD-SLAM
76
https://www.youtube.com/watch?v=GnuQzP3gty4
• Large scale direct (feature-less) monocular SLAM algorithm
• URL: https://github.com/tum-vision/lsd_slam
• Main components
• Tracking
• Depth map estimation
• Map optimization
• Key features
• Novel direct tracking method
• Elegant probabilistic solution to include the effect of noisy depth values into tracking
• Large-scale and real time operation on a CPU
LSD-SLAM
77
Feature-based VS Direct Method
78
Feature-based Direct	method
• Manifold
• Mathematical space that is not necessarily Euclidean on a global scale,
but can be seen as Euclidean on a local scale
• Why use the manifold for robotics or computer vision?
• The translation clearly forms Euclidean space,
while the rotational components span over the non-Euclidean 3D rotation group
• Lie group SE(3) <-> Lie algebra se(3)
Optimization on the Manifold
79
$$\boldsymbol{\xi} = \begin{bmatrix} v_x \\ v_y \\ v_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} \;\text{(linear velocity $v$, angular velocity $\omega$)}, \qquad G(\boldsymbol{\xi}) = \begin{bmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
G.	Grisetti,	R.	Kummerle,	C.	Stachniss,	and	W.	Burgard,	“A	tutorial	on	graph-based	SLAM,”	IEEE	Intell.	Transp.	Syst.	Mag.,	vol.	2,	no.	4,	pp.	31–43,	2010.
Section 10.3.3, B. Claraco, “A tutorial on SE(3) transformation parameterizations and on-manifold optimization,” 2018.
Exponential	map
Logarithm	map
• Relative 3D pose $\boldsymbol{\xi}_{ji}$
• between an existing keyframe $K_i = (I_i, D_i, V_i)$ (image, inverse depth map, and depth variance) and a new image $I_j$
• [SE3Tracker::trackFrame]
Tracking
80
(figure: the 3D projective warp function, the photometric residual, and the photometric variance)
• Minimizing the intensity error between two images
• Photometric residual: $r_p(\mathbf{p}, \boldsymbol{\xi}_{ji}) := I_i(\mathbf{p}) - I_j\big(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji})\big)$
• In order to apply Gauss-Newton minimization, the Jacobian is necessary.
• n-th update step for the parameter: $\delta\boldsymbol{\xi}^{(n)} = -\big(J^{T} J\big)^{-1} J^{T} r\big(\boldsymbol{\xi}^{(n)}\big)$
• The new estimate is obtained by composition with the computed update: $\boldsymbol{\xi}^{(n+1)} = \delta\boldsymbol{\xi}^{(n)} \circ \boldsymbol{\xi}^{(n)}$ (see the sketch below)
• The Jacobian is calculated from $I_j\big(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji})\big)$ with respect to the parameter $\boldsymbol{\xi}_{ji}$
• More specifically, $I_j\big(\pi(g(\mathbf{p}, D_i(\mathbf{p})), G(\boldsymbol{\xi}_{ji}))\big)$
• Therefore, it can be decomposed into a product of Jacobians: $J_{I_j}(\boldsymbol{\xi}_{ji}) = J_{\nabla I}\, J_{\pi}\, J_{g}\, J_{G}$
• Relationship among functions: pixel intensity $I$ ← projection $\pi$ ← rigid-body transform $g$ ← Lie algebra to Lie group map $G(\boldsymbol{\xi})$
Photometric Residual
81
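A minimal sketch of one such Gauss-Newton step, assuming the residual vector and its 6-column Jacobian have already been stacked over all pixels.

```cpp
#include <Eigen/Dense>

// Minimal sketch of one Gauss-Newton step for direct image alignment:
// given stacked photometric residuals r and their Jacobian J (one row per
// pixel, six columns for the twist), compute the increment delta_xi.
Eigen::Matrix<double, 6, 1> gaussNewtonStep(const Eigen::MatrixXd& J,
                                            const Eigen::VectorXd& r) {
  // delta_xi = -(J^T J)^{-1} J^T r
  return -(J.transpose() * J).ldlt().solve(J.transpose() * r);
  // The update is then applied as xi_new = delta_xi ∘ xi via the exponential
  // map, not by vector addition, because se(3) is not a Euclidean space.
}
```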
Chain of Jacobians
82
(figure: the individual Jacobian factors derived step by step)
Chain of Jacobians
83
Section 10.3.3, B. Claraco, “A tutorial on SE(3) transformation parameterizations and on-manifold optimization,” 2018.
Final Jacobian
84
$$J_{I_j}(\boldsymbol{\xi}_{ji}) = J_{\nabla I}\, J_{\pi}\, J_{g}\, J_{G}$$
[SE3Tracker::calculateWarpUpdate]
• Photometric residual: $r_p(\mathbf{p}, \boldsymbol{\xi}_{ji}) := I_i(\mathbf{p}) - I_j\big(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji})\big)$
• The derivative is calculated from $I_j\big(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji})\big)$ with respect to the inverse depth $D_i(\mathbf{p})$
• More specifically, $I_j\big(\pi(g(\mathbf{p}, D_i(\mathbf{p})), G(\boldsymbol{\xi}_{ji}))\big)$, so it can be decomposed into a product of derivatives
• The photometric variance combines the Gaussian image intensity noise with the inverse depth variance propagated through this derivative
• [SE3Tracker::calcWeightsAndResidual]
Photometric Variance
85
$$\frac{\partial I_j(\mathbf{p}, \boldsymbol{\xi}_{ji})}{\partial D_i(\mathbf{p})} = \frac{\partial I_j}{\partial \pi}\,\frac{\partial \pi}{\partial g}\,\frac{\partial g}{\partial D_i(\mathbf{p})} = \begin{bmatrix} \nabla I_{jx} & \nabla I_{jy} \end{bmatrix} \begin{bmatrix} \frac{1}{Z} & 0 & -\frac{X}{Z^{2}} \\ 0 & \frac{1}{Z} & -\frac{Y}{Z^{2}} \end{bmatrix} \begin{bmatrix} (T_x - X)/d \\ (T_y - Y)/d \\ (T_z - Z)/d \end{bmatrix} = \begin{bmatrix} \nabla I_{jx} & \nabla I_{jy} \end{bmatrix} \begin{bmatrix} \big((T_x - X)Z - (T_z - Z)X\big)/(Z^{2} d) \\ \big((T_y - Y)Z - (T_z - Z)Y\big)/(Z^{2} d) \end{bmatrix}$$
Chain of Derivatives
86
• Keyframe selection
• If the camera moves too far away from the existing map,
a new keyframe is created from the most recent tracked image.
• [getRefFrameScore]
Depth Map Estimation
87
(figure: tracked frames between keyframes; a new keyframe is created when the camera moves too far from the existing map)
• Depth map creation
• Depth map for the keyframe is initialized by projecting points from the previous keyframe
• Depth map is scaled to have a mean inverse depth of one
• Procedure
• Compute the epipolar line $\mathbf{l}' = \mathbf{F}\mathbf{x} = \mathbf{K}^{-T} [\mathbf{t}]_{\times} \mathbf{R} \mathbf{K}^{-1} \mathbf{x}$
• Geometric and photometric disparity error
• Check the magnitude of the image gradient along the epipolar line
• Check the angle between the image gradient and the epipolar line
• [DepthMap::makeAndCheckEPL]
• Stereo matching
• between current keyframe and the previous keyframe
• using SSD error over five equidistant points on the epipolar line
• [DepthMap::doLineStereo]
Depth Map Estimation
88
J. Engel, J. Sturm, and D. Cremers, “Semi-dense visual odometry for a monocular camera,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1449–1456.
• Constraint search
• Finding the Loop closure candidates via the appearance based method (ex. BoVW)
• Direct image alignment on sim(3)
• Finding the transformation between current keyframe and the loop closure candidates
• Tracking in Sim(3) space
• [Sim3Tracker::trackFrameSim3]
Constraint Acquisition
89
Map Optimization
• Pose graph optimization by general graphs (g2o library)
• General graphs
• Graphical models that are well suited to modeling complex estimation problems
• Nodes
• Unknown random variables in the estimation problem
• Robot poses
• Edges
• Probabilistic information on those variables, derived from measurements or prior knowledge
• Odometry, loop closure constraints
• [SlamSystem::optimizationIteration]
90
• Autonomous drone driving
Applications
91
https://www.youtube.com/watch?v=BLY3kgeZrZg
Q&A
92
Dense SLAM
93
InfiniTam
94
https://www.youtube.com/watch?v=gmfKzTIKyww
• Real-time, large-scale depth fusion and tracking framework
• URL: https://github.com/victorprad/InfiniTAM
• Main components
• Tracking
• Allocation
• Integration
• Raycasting
• Relocalisation & Loop Closure Detection
• Data structure
• Volumetric representation
• Voxel block hashing
InfiniTam
95
V.	A.	Prisacariu et	al.,	“InfiniTAM v3:	A	Framework	for	Large-Scale	3D	Reconstruction	with	Loop	Closure,”	2017.
• Volumetric representation using a hash lookup
• Data structures and operations
• Voxel
• Voxel block hash
• Hash table and hashing function
• Hash table operations
• Voxel
• A value on a regular grid in 3D space, the extended concept of 2D pixel
• Widely used for realistic rendering 3D object in computer graphics
• Data
• Truncated signed distance function (TSDF) value
• TSDF Weight
• Color value
• Color weight
Volumetric Representation
96
• Truncated signed distance function (TSDF)
• Predefined 3D volume is subdivided uniformly into a 3D grid of voxels
• These values are positive in front of the surface and negative behind it
• The zero-crossing indicates the surface of the object
Volumetric Representation
97
• Voxel block array
• Majority of data stored in the regular voxel grid is marked either as
• Free space
• Unobserved space
• Only store the surface data by efficient hashing scheme
• Grouped voxels in blocks of predefined size (ex. 8x8x8)
• Data
• Positions of the corner of the 8x8x8 voxel block
• Offset in the excess list
• Pointer to the voxel block array
Volumetric Representation
98
M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger, “Real-time 3D reconstruction at scale using voxel hashing,” ACM Trans. Graph., vol. 32, no. 6, pp. 1–11, 2013.
• Hash table
• To quickly and efficiently find the position of a certain voxel block in the voxel block array
• Contiguous array of ITMHashEntry objects
• Hashing function
• For locating entries of the hash table takes the corner coordinates of a 3D voxel block
• Hash collision case
• Use the additional unordered excess list
• Store an offset in the voxel block array
Volumetric Representation
99
• Hash table operations
• Given a target 3D voxel location in world coordinates
• Compute its corresponding voxel block location by dividing the voxel location by the voxel block size
• Call the hashing function to compute the index of the bucket in the ordered part of the hash table (see the sketch below)
• Retrieval
• Returns the voxel stored at the target location within the block addressed by the hash entry
• Insertion
• Reserves a block inside the voxel block array
Volumetric Representation
100
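A minimal sketch of the block lookup path: world voxel coordinates are floored into block coordinates and hashed into a bucket. The three primes are the ones from the voxel hashing paper by Nießner et al.; the table size and names are illustrative.

```cpp
#include <cstdint>

// Minimal sketch of the block hashing scheme used for voxel lookup.
constexpr int kBlockSize = 8;                    // 8x8x8 voxels per block
constexpr std::uint32_t kNumBuckets = 0x100000;  // hash table size (assumed)

struct BlockCoord { int x, y, z; };

BlockCoord toBlock(int vx, int vy, int vz) {
  // Integer division with floor semantics, also for negative coordinates.
  auto f = [](int v) { return (v >= 0) ? v / kBlockSize
                                       : (v - kBlockSize + 1) / kBlockSize; };
  return { f(vx), f(vy), f(vz) };
}

std::uint32_t hashBlock(const BlockCoord& b) {
  // XOR of the block coordinates scaled by three large primes, modulo the
  // number of buckets; collisions go to the unordered excess list.
  return (static_cast<std::uint32_t>(b.x) * 73856093u ^
          static_cast<std::uint32_t>(b.y) * 19349669u ^
          static_cast<std::uint32_t>(b.z) * 83492791u) % kNumBuckets;
}
```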
• To determine the pose of a new camera frame given the 3D world model
• Diverse methods
• Using only the depth
• Inspired by Point-to-Plane ICP
• [class ITMDepthTracker]
• Using only the color
• Inspired by the direct method
• [class ITMColorTracker]
• Using both data
• Utilize both approaches
• [class ITMExtendedTracker]
• Main differences of the extended tracker
• Huber-norm instead of the standard L2 norm
• Error term weighted by its depth measurement
• Tracking failure determination by SVM classifier
Tracking
101
• Huber-norm instead of the standard L2 norm
• Huber-Norm
• The squared loss has the disadvantage that it tends to be dominated by outliers
• The Huber norm is quadratic for small residuals and linear for large ones
• Huber norm in the code
Tracking
102 https://en.wikipedia.org/wiki/Huber_loss
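A minimal sketch of the Huber loss and the equivalent per-residual weight used in iteratively reweighted least squares; delta is the tunable inlier threshold.

```cpp
#include <cmath>

// Minimal sketch of the Huber norm: quadratic for small residuals,
// linear for large ones, so outliers no longer dominate the sum.
double huberLoss(double r, double delta) {
  const double a = std::abs(r);
  return (a <= delta) ? 0.5 * r * r                 // inlier: squared loss
                      : delta * (a - 0.5 * delta);  // outlier: linear growth
}

// Equivalent IRLS weight applied to each squared error term.
double huberWeight(double r, double delta) {
  const double a = std::abs(r);
  return (a <= delta) ? 1.0 : delta / a;
}
```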
• Error term weighted by its depth measurement
• The error term for each pixel of the depth image is weighted according to its depth measurement provided by the sensor.
• The reliability of a depth measurement decreases as the distance reading increases.
• Depth weight in the code
Tracking
103
K. Khoshelham and S. O. Elberink, “Accuracy and resolution of Kinect depth data for indoor mapping applications,” Sensors, 2012.
• Three main stages in allocation
• 1) backproject a line connecting 𝑑 − 𝜇 to 𝑑 + 𝜇
• 𝑑	: depth in image coordinates
• 𝜇: a fixed, tunable parameter
• This leads to a line in world coordinates, which intersects a number of voxel blocks.
• Search the hash table for each of these blocks and look for a free hash entry
• 2) allocate voxel blocks for each non zero entry in the allocation and visibility arrays
• 3) build a list of live hash entries
• [AllocateSceneFromDepth]
Allocation
104
(figure: the ray segment from d−μ to d+μ around the depth measurement d, intersecting the virtual voxel block grid)
• TSDF integration
• If a voxel is behind the surface observed in the new depth image,
the image does not contain any new information about it, and the function returns.
• If the voxel is close to or in front of the observed surface,
a corresponding observation is added to the accumulated sum.
• [IntegrateIntoScene]
Integration
105 S. Izadi, D. Kim, O. Hilliges, et al., “KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera,” in Proceedings of the 24th ACM Symposium on User Interface Software and Technology, 2011.
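A minimal sketch of this per-voxel update in the KinectFusion style, with a truncation band mu and a capped running-average weight; the field names are illustrative, not InfiniTAM's actual layout.

```cpp
#include <algorithm>

// Minimal sketch of the TSDF update: eta is the signed distance of the voxel
// to the observed depth along the ray, mu the truncation band.
struct Voxel { float tsdf = 1.f; float weight = 0.f; };

void integrateVoxel(Voxel& v, float voxelDepth, float measuredDepth,
                    float mu, float maxWeight = 100.f) {
  const float eta = measuredDepth - voxelDepth;
  if (eta < -mu) return;  // far behind the surface: no new information
  const float sdf = std::min(1.f, eta / mu);  // truncate in front of surface
  // Weighted running average of all observations of this voxel.
  v.tsdf = (v.tsdf * v.weight + sdf) / (v.weight + 1.f);
  v.weight = std::min(v.weight + 1.f, maxWeight);
}
```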
• Motivation
• Depth image is computed from the updated 3D world model given a camera pose
• Input to the tracking step in the next frame and also for visualization.
• Raycasting
• A ray is being cast from the camera up until an intersection with the surface is found
• Checking the value of the TSDF at each voxel along the ray until a zero-crossing is found
• State machine to efficiently handle the sparse volumetric space
• SEARCH_BLOCK_COARSE
• SEARCH_BLOCK_FINE
• SEARCH_SURFACE
• BEHIND_SURFACE
• [castRay]
Raycasting
106
• SEARCH_BLOCK_COARSE state
• Take steps of the size of each block, i.e. 8 voxels
• Until an actually allocated block is encountered
• SEARCH_BLOCK_FINE state
• Once the ray enters an allocated block, step back and enter this state
• The step length is now limited by the truncation band of the SDF.
• SEARCH_SURFACE state
• Once the ray enters a valid block and the values in that block indicate we are still in front of the surface,
the state is changed to SEARCH_SURFACE
• BEHIND_SURFACE state
• Until a negative value is read from the SDF
• This terminates the raycasting iteration and the exact location of the surface is now found.
Raycasting
107
(figure: ray stepping through the states SEARCH_BLOCK_COARSE → SEARCH_BLOCK_FINE → SEARCH_SURFACE → BEHIND_SURFACE)
• Keyframe-based random ferns relocaliser
• To relocalise the camera when tracking fails
• To detect loop closures when aiming to construct a globally-consistent scene
• Procedure
• Downsample and preprocess image
• Each of m code blocks is obtained by applying a random fern to I
• A fern is a set of n binary feature tests on the image, each yielding either 0 or 1
• $b^{I}_{F_k} \in B^{n}$ denotes the n-bit binary code resulting from applying fern $F_k$ to I
• $b^{I}_{C} \in B^{mn}$ denotes the result of concatenating all m such binary codes for I
• The dissimilarity measure between two different images I and J is the block-wise Hamming distance between $b^{I}_{C}$ and $b^{J}_{C}$, where each block contributes 0 if the two code blocks are identical and 1 otherwise (see the sketch below)
Relocalisation & Loop Closure Detection
108
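A minimal sketch of the block-wise Hamming distance, assuming each fern's n-bit code is packed into one integer.

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of the fern dissimilarity: each image is encoded as m code
// blocks (one n-bit code per fern); the block-wise Hamming distance counts
// the fraction of blocks that differ (0 if identical, 1 otherwise per block).
double blockHD(const std::vector<std::uint32_t>& codesI,
               const std::vector<std::uint32_t>& codesJ) {
  int differing = 0;
  for (size_t k = 0; k < codesI.size(); ++k)
    differing += (codesI[k] != codesJ[k]) ? 1 : 0;
  return static_cast<double>(differing) / codesI.size();  // in [0, 1]
}
```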
• Caching strategy for the efficient BlockHD computation
Relocalisation & Loop Closure Detection
109
(figure: caching table indexed by fern id and code id, of size [number of ferns, e.g. 500] × [number of code blocks, e.g. 16]; each cell stores the IDs of keyframes sharing that code fragment, so the similarity of the query frame to every keyframe is accumulated cheaply)
• Idea behind the relocaliser
• Encode an RGB-D image I as a set of m binary code blocks, each of length n
• To learn a lookup table from encodings of keyframe images to their known camera poses
• Harvesting keyframes
• If there is no similar keyframe in the keyframe DB, harvest a new keyframe
• Relocalization
• By finding the nearest neighbours of the encoding of the current camera input image in this table, and trying to use their recorded poses to restart tracking
• Loop closure detection
• Determine whether the current frame has been seen in the previous frames or not
• [Relocaliser::ProcessFrame]
Relocalisation & Loop Closure Detection
110
• Submap based approach
• Division of the scene into multiple rigid submaps
• Active submaps: tracked against at each frame
• Passive submaps: maintained, but not tracked against unless they become active again
Globally-Consistent Reconstruction
111
N. Fioraio, J. Taylor, A. Fitzgibbon, L. Di Stefano, and S. Izadi, “Large-scale and drift-free surface reconstruction using online subvolume registration,” 2015 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4475–4483, 2015.
• Graph representation
• Numerical solution of the equation can be obtained by using popular Gauss-Newton
Recall: Pose Graph Optimization
112
(figure: pose graph with constraint edges $e_{12}, e_{23}, e_{34}, e_{45}, e_{51}, e_{62}$; each edge compares a measurement against the current estimate)
G.	Grisetti,	R.	Kummerle,	C.	Stachniss,	and	W.	Burgard,	“A	tutorial	on	graph-based	SLAM,”	IEEE	Intell.	Transp.	Syst.	Mag.,	vol.	2,	no.	4,	pp.	31–43,	2010.
• First order Taylor expansion
• Substituting the equation
Least Squares Optimization
113
https://en.wikipedia.org/wiki/Linear_approximation
• Rewrite the function F(x)
• Minimizing the quadratic form by,
• The solution is obtained by adding the increments ∆x∗ to the initial guess
Least Squares Optimization
114
where
• The matrix H and the vector b are obtained by summing up a set of matrices and vectors, one for every constraint.
• Every constraint contributes an addend term to the system.
• The structure of this addend depends on the Jacobian of the error function.
• Since the error function of a constraint depends only on the values of two nodes, the Jacobian is non-zero only in the blocks corresponding to those two nodes.
Structure of Linearized System
115
Structure of Linearized System
116
• [runGlobalAdjustment]
• AR
Applications
117
https://www.youtube.com/watch?v=gmfKzTIKyww
Applications
118
https://www.youtube.com/watch?v=Qf0KkDEChj4
• 3D reconstruction
• Bundlefusion: http://graphics.stanford.edu/projects/bundlefusion/
Applications
119
https://www.youtube.com/watch?v=keIirXrRb1k
Q&A
120
Structure-from-Motion
121
• Reconstructing 3D structure from its projections into a series of images
taken from different viewpoints
• General SfM pipeline
Structure-from-Motion
122
Result	of	Rome	with	21K	registered	out	of	75K	images
• Feature extraction
• Features should be invariant under radiometric and geometric changes
• SfM can uniquely recognize them in multiple images.
• Ex) SIFT, SURF, ORB, FAST, BRIEF…
• [RunFeatureExtraction]
• Matching
• Matching the images that see the same scene part
• By leveraging the features as an appearance description of the images.
• Output
• a set of potentially overlapping image pairs
• their associated feature correspondences
• Naïve approach
• Test every image pair for scene overlap
• $O(N_I^2\, N_{f_i}^2)$ for $N_I$ images with $N_{f_i}$ features each
• Scalable and efficient matching is necessary.
• [RunFeatureMatching]
Correspondence Search
123
• Geometric verification
• Since matching is based solely on appearance,
it is not guaranteed that corresponding features actually map to the same scene point.
• Diverse verification methods
• Projective geometry (pinhole camera model)
• Epipolar geometry (essential matrix, fundamental matrix)
• Homography
• If a valid transformation maps a sufficient number of features between the images,
they are considered geometrically verified.
• RANSAC is required to remove outliers
• Output
• a set of geometrically verified image pairs
• Their associated inlier correspondences
• (optional) a description of their geometric relation
• Scene-graph with images as nodes and verified pairs of images as edges
Correspondence Search
124
• Image registration
• New images can be registered to the current model by solving the PnP problem
• Perspective-n-Point Problem
• 2D—3D correspondences
• To determine the position and orientation of a camera
• Given its intrinsic parameters and a set of n correspondences between 3D points and their 2D projections
• Procedure
• (1) Estimate point clouds from 2D projections
• (2) Compute the transformation between the estimated point clouds and the given point clouds
• [EPNPEstimator::ComputePose]
Incremental Reconstruction
125
V.	Lepetit,	F.	Moreno-Noguer,	and	P.	Fua,	“EPnP:	An	accurate	O(n)	solution	to	the	PnP	problem,”	Int.	J.	Comput.	Vis.,	vol.	81,	no.	2,	pp.	155–166,	2009.
• Bundle adjustment
• Joint non-linear refinement of camera parameters 𝑃u and point parameters 𝑋¿
• Minimize the reprojection error
• Levenberg-Marquardt is the method of choice for solving BA problems
• [BundleAdjuster::BundleAdjuster]
Incremental Reconstruction
126
• Previous SfM frameworks (Bundler, VisualSFM)
• Challenges
• Current state-of-the-art SfM algorithms are good, but not good enough
• They fail to produce fully satisfactory results in terms of completeness and robustness.
• Problems
• Correspondence search producing an incomplete scene graph
• Reconstruction stage failing to register images due to missing or inaccurate scene structure
• Symbiotic relationship between image registration and triangulation
• COLMAP
• Novel SfM algorithm containing following contributions to achieve the ultimate goal
• Contributions
• Scene graph augmentation
• Next best view selection maximizing the robustness and accuracy
• Robust and efficient triangulation method
• Iterative BA, re-triangulation, and outlier filtering strategy
COLMAP
127
COLMAP Demo
128
https://www.youtube.com/watch?v=Gb086k7b0wg
• Augmented geometric verification
• The number of inliers for the fundamental matrix: $N_F$
• The number of inliers for the essential matrix: $N_E$
• The number of inliers for the homography: $N_H$
• Check the ratios $N_E/N_F$, $N_H/N_E$, and $N_H/N_F$, and label the type of the two-view geometry:
• If $N_E/N_F \geq \epsilon_{EF}$ and $N_H/N_E < \epsilon_{HE}$, then “calibrated”
• If $N_E/N_F \geq \epsilon_{EF}$ and $N_H/N_E \geq \epsilon_{HE}$, then “planar or panoramic (pure rotation)”
• If $N_E/N_F < \epsilon_{EF}$ and $N_H/N_F < \epsilon_{HF}$, then “uncalibrated”
• If $N_E/N_F < \epsilon_{EF}$ and $N_H/N_F \geq \epsilon_{HF}$, then “planar or panoramic (pure rotation)”
• Seed for reconstruction
• Non-panoramic, calibrated image pairs
• Do not triangulate panoramic image pairs, to avoid degenerate points
• [EstimateInitialTwoViewGeometry]
Scene Graph Augmentation
129
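A minimal sketch of this classification logic; the threshold defaults are illustrative values in the spirit of COLMAP's options, not its exact configuration.

```cpp
// Minimal sketch of the two-view geometry labeling from the inlier counts
// of the fundamental matrix (nF), essential matrix (nE), and homography (nH).
enum class PairType { Calibrated, Uncalibrated, PlanarOrPanoramic };

PairType classifyTwoViewGeometry(int nF, int nE, int nH,
                                 double epsEF = 0.95,
                                 double epsHE = 0.8,
                                 double epsHF = 0.8) {
  const double ef = static_cast<double>(nE) / nF;  // essential vs fundamental
  if (ef >= epsEF) {  // the calibration appears valid
    const double he = static_cast<double>(nH) / nE;
    return (he < epsHE) ? PairType::Calibrated : PairType::PlanarOrPanoramic;
  }
  const double hf = static_cast<double>(nH) / nF;
  return (hf < epsHF) ? PairType::Uncalibrated : PairType::PlanarOrPanoramic;
}
```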
• Frequent problem in Internet photos
• Watermarks, timestamps, and frames (WTF)
• Incorrectly link images of different landmarks
• Two assumptions for WTFs
• (1) watermarks and frames always have the exact same appearance
• (2) all WTFs are typically close to the border of the image
• Procedure
• Estimate a translation transformation with $N_T$ inliers at the image borders
• Any image pair with $N_T/N_F \geq \epsilon_{TF}$ is considered a WTF and not inserted into the scene graph
• [DetectWatermark]
Scene Graph Augmentation
130
T. Weyand, C. Y. Tsai, and B. Leibe, “Fixing WTFs: Detecting image matches caused by watermarks, timestamps, and frames in internet photos,” in Proceedings - 2015 IEEE Winter Conference on Applications of Computer Vision, WACV 2015, 2015, pp. 1185–1192.
• Motivation
• Choosing the next best view is critical,
as every decision impacts the remaining reconstruction.
• A single bad decision may lead to a cascade of camera mis-registrations.
• Diverse strategies
• MAX_VISIBLE_POINTS_NUM
• To choose the image that sees most triangulated points
• MAX_VISIBLE_POINTS_RATIO
• Higher ratio of visible points on observations
• MIN_UNCERTAINTY
• More visible points and a more uniform distribution of points
• [IncrementalMapper::FindNextImages]
Next Best View Selection
131
• Refinement using multiple view triangulation
• [TriangulateMultiViewPoint]
• Cheirality constraint
• Positive depth with respect to the camera views
• [HasPointPositiveDepth]
• Sufficient triangulation angle
• The angle between the two viewing rays should be large enough.
• [CalculateTriangulationAngle]
Robust and Efficient Triangulation
132
• Before BA: Re-Triangulation
• To improve the completeness of the reconstruction
by continuing the tracks of points that previously failed to triangulate
• continue tracks with observations whose errors are below the filtering thresholds
• [IncrementalTriangulator::Retriangulate]
• After BA: Filtering
• Filter observations with large reprojection errors
• Enforcing a minimum triangulation angle over all pairs of viewing rays
• [Reconstruction::FilterAllPoints3D]
• Iterative refinement
• Perform RT, BA, and filtering in an iterative optimization
• until the number of filtered observations and post-BA re-triangulated points diminishes.
• [IterativeGlobalRefinement]
Bundle Adjustment
133
• Middlebury Temple
• 312 images
Demo
134
• Middlebury Dino
• 363 images
Demo
135
• Building
• 128 images
Demo
136
Q&A
137
SLAM Research & Job Trend
in CVPR 2018
138
• Computer Vision and Pattern Recognition 2018
• Salt Lake City, Utah
• June 18- 22, 2018
• 979 accepted papers
• 6512 registered attendees
CVPR 2018
139
• http://visualslam.ai/
• Deep learning for visual SLAM review
Deep Learning for Visual SLAM
140
• Facebook
• https://www.facebook.com/careers/jobs/?q=slam
Recruitment Expo
141
• Nvidia
• https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/0/refreshFacet/318c8bb6f553100021d223d9780d30be
Recruitment Expo
142
• Naver Labs
• https://recruit.naverlabs.com/labs/recruitMain
Recruitment Expo
143
• Here
• https://www.here.com/en
Recruitment Expo
144 https://www.youtube.com/watch?v=54J_ZCbeJdc
• DiDi Chuxing
• https://www.didiglobal.com/#/
Recruitment Expo
145
• Vtrus - Robotics scientist , SLAM/Computer Vision
• https://www.vtr.us/
Recruitment Expo
146
• Skydio
• https://www.skydio.com/
Recruitment Expo
147 https://www.youtube.com/watch?v=Gh5pAT1o2V8
• DroneDeploy
• https://www.dronedeploy.com/
Recruitment Expo
148 https://www.youtube.com/watch?v=NS8WLnoFqyE
Useful Resources
149
• Multiple View Geometry in Computer Vision- Richard Hartley and Andrew Zisserman
• Numerical Optimization – Jorge Nocedal and Stephen J. Wright
• Modern C++ Course 2018
• http://www.ipb.uni-bonn.de/teaching/modern-cpp/
• Learn C++ by Following Along 2018 (따라하며 배우는 C++, a Korean lecture series)
• https://www.youtube.com/playlist?list=PLNfg4W25Tapw5Yx4yuExHNybBIUk68aNz
• Photogrammetry (Computer vision)
• http://www.ipb.uni-bonn.de/photogrammetry-i-ii/
• Multiple view geometry - Prof. D. Cremers from TUM
• https://www.youtube.com/playlist?list=PLTBdjV_4f-EJn6udZ34tht9EVIW7lbeo4
Books & Free Lectures
150
• SLAM Research KR
• https://www.facebook.com/groups/slamkr/
• RoYeolMo (로열모, an open community for robotics)
• https://www.facebook.com/groups/KoreanRobotics/
• OROCA (오로카, a community for sharing robot technology built with open-source software and hardware)
• https://cafe.naver.com/openrt/10561
Research Groups in SNS
151
End of Second Week
152
Thank you
153

More Related Content

PDF
Introductory Level of SLAM Seminar
PPTX
An Introduction to ROS-Industrial
PDF
Denavit Hartenberg Algorithm
PDF
大規模ソフトウェア開発とテストの経験について
PPTX
Mozilla Hubsが拓く新世代WebVRのススメ #HubsScrum
PDF
[NEDO特別講座] OSS活用のためのライセンス解説コース
PDF
今だから聞きたい 「一番新しい xRアプリの作り方」 2020年 最新版
PPTX
見よう見まねでやってみる2D流体シミュレーション
Introductory Level of SLAM Seminar
An Introduction to ROS-Industrial
Denavit Hartenberg Algorithm
大規模ソフトウェア開発とテストの経験について
Mozilla Hubsが拓く新世代WebVRのススメ #HubsScrum
[NEDO特別講座] OSS活用のためのライセンス解説コース
今だから聞きたい 「一番新しい xRアプリの作り方」 2020年 最新版
見よう見まねでやってみる2D流体シミュレーション

What's hot (20)

PPTX
コンピュテーション式ハンズオン
PPTX
2A_ROBOT KINEMATICS.pptx
PDF
=SLAM ppt.pdf
PDF
Introduction to Mobile Robotics
PDF
hooks riverpod + state notifier + freezed でのドメイン駆動設計
PDF
AndroidでWebSocket
PPTX
Dronecodeの概要とROSの対応について
PPTX
Unityネイティブプラグインマニアクス #denatechcon
PDF
ある工場の Redmine 2022 〜ある工場の Redmine 5.0 バージョンアップ〜 ( Redmine of one plant 2022 ...
PDF
SteamVR Plugin 2.0 にアップデートした話
PDF
ARでVRアバターを表示するシステムを構築しよう
PPTX
【システムテスト自動化カンファレンス2015】 楽天の品質改善を加速する継続的システムテストパターン #stac2015
PPTX
What is a Software Module?
PPTX
AR / VR / MRの世界に、置けるUI、置けないUI、置くべきUI
PDF
유니티 - 물리엔진(Physics Engine) 개념 잡기
PDF
UnityによるHoloLensアプリケーション入門
PPTX
さるでも分かりたい9dofで作るクォータニオン姿勢
PPTX
Redmineカスタムフィールド表示改善
PPSX
10 robotic manufacturing systems
PDF
Streamlined landscape creation with new Terrain Tools- Unite Copenhagen 2019
コンピュテーション式ハンズオン
2A_ROBOT KINEMATICS.pptx
=SLAM ppt.pdf
Introduction to Mobile Robotics
hooks riverpod + state notifier + freezed でのドメイン駆動設計
AndroidでWebSocket
Dronecodeの概要とROSの対応について
Unityネイティブプラグインマニアクス #denatechcon
ある工場の Redmine 2022 〜ある工場の Redmine 5.0 バージョンアップ〜 ( Redmine of one plant 2022 ...
SteamVR Plugin 2.0 にアップデートした話
ARでVRアバターを表示するシステムを構築しよう
【システムテスト自動化カンファレンス2015】 楽天の品質改善を加速する継続的システムテストパターン #stac2015
What is a Software Module?
AR / VR / MRの世界に、置けるUI、置けないUI、置くべきUI
유니티 - 물리엔진(Physics Engine) 개념 잡기
UnityによるHoloLensアプリケーション入門
さるでも分かりたい9dofで作るクォータニオン姿勢
Redmineカスタムフィールド表示改善
10 robotic manufacturing systems
Streamlined landscape creation with new Terrain Tools- Unite Copenhagen 2019
Ad

Similar to FastCampus 2018 SLAM Workshop (20)

PPTX
Spark Technology Center IBM
PDF
3D SLAM introcution& current status
PDF
2022 COMP4010 Lecture4: AR Interaction
PPTX
object recognition for robots
PDF
“Introduction to Visual Simultaneous Localization and Mapping (VSLAM),” a Pre...
PDF
Elevation mapping using stereo vision enabled heterogeneous multi-agent robot...
PPTX
3D PRINTING - INTRODUCTION
PPTX
Efficient architecture to condensate visual information driven by attention ...
PDF
Computer-Vision based Centralized Multi-agent System on Matlab and Arduino Du...
PDF
VSlam 2017 11_20(張閎智)
PDF
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
PPTX
Photogrammetry: A Quick Primer
PPTX
Automated Video Analysis and Reporting for Construction Sites
PPT
Mainprojpresentation 150617092611-lva1-app6892
PPT
pick and place robotic arm
PPTX
Lobula Giant Movement Detector Based Embedded Vision System for Micro-robots
PPTX
slide-171212080528.pptx
PPTX
PCL (Point Cloud Library)
PPTX
“ADAS in Action (POC Autonomous Driving Vehicle Presentation)”
PPTX
Real Time Object Dectection using machine learning
Spark Technology Center IBM
3D SLAM introcution& current status
2022 COMP4010 Lecture4: AR Interaction
object recognition for robots
“Introduction to Visual Simultaneous Localization and Mapping (VSLAM),” a Pre...
Elevation mapping using stereo vision enabled heterogeneous multi-agent robot...
3D PRINTING - INTRODUCTION
Efficient architecture to condensate visual information driven by attention ...
Computer-Vision based Centralized Multi-agent System on Matlab and Arduino Du...
VSlam 2017 11_20(張閎智)
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
Photogrammetry: A Quick Primer
Automated Video Analysis and Reporting for Construction Sites
Mainprojpresentation 150617092611-lva1-app6892
pick and place robotic arm
Lobula Giant Movement Detector Based Embedded Vision System for Micro-robots
slide-171212080528.pptx
PCL (Point Cloud Library)
“ADAS in Action (POC Autonomous Driving Vehicle Presentation)”
Real Time Object Dectection using machine learning
Ad

Recently uploaded (20)

PDF
Website Design Services for Small Businesses.pdf
PDF
wealthsignaloriginal-com-DS-text-... (1).pdf
PPTX
GSA Content Generator Crack (2025 Latest)
PDF
MCP Security Tutorial - Beginner to Advanced
PPTX
Patient Appointment Booking in Odoo with online payment
PDF
iTop VPN Crack Latest Version Full Key 2025
PDF
Salesforce Agentforce AI Implementation.pdf
PPTX
chapter 5 systemdesign2008.pptx for cimputer science students
PPTX
Why Generative AI is the Future of Content, Code & Creativity?
PDF
DuckDuckGo Private Browser Premium APK for Android Crack Latest 2025
PDF
Autodesk AutoCAD Crack Free Download 2025
PDF
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
PDF
Time Tracking Features That Teams and Organizations Actually Need
PDF
Wondershare Recoverit Full Crack New Version (Latest 2025)
PDF
EaseUS PDF Editor Pro 6.2.0.2 Crack with License Key 2025
PDF
Digital Systems & Binary Numbers (comprehensive )
PDF
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
PDF
Cost to Outsource Software Development in 2025
PDF
Complete Guide to Website Development in Malaysia for SMEs
PDF
How to Make Money in the Metaverse_ Top Strategies for Beginners.pdf
Website Design Services for Small Businesses.pdf
wealthsignaloriginal-com-DS-text-... (1).pdf
GSA Content Generator Crack (2025 Latest)
MCP Security Tutorial - Beginner to Advanced
Patient Appointment Booking in Odoo with online payment
iTop VPN Crack Latest Version Full Key 2025
Salesforce Agentforce AI Implementation.pdf
chapter 5 systemdesign2008.pptx for cimputer science students
Why Generative AI is the Future of Content, Code & Creativity?
DuckDuckGo Private Browser Premium APK for Android Crack Latest 2025
Autodesk AutoCAD Crack Free Download 2025
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
Time Tracking Features That Teams and Organizations Actually Need
Wondershare Recoverit Full Crack New Version (Latest 2025)
EaseUS PDF Editor Pro 6.2.0.2 Crack with License Key 2025
Digital Systems & Binary Numbers (comprehensive )
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
Cost to Outsource Software Development in 2025
Complete Guide to Website Development in Malaysia for SMEs
How to Make Money in the Metaverse_ Top Strategies for Beginners.pdf

FastCampus 2018 SLAM Workshop

  • 11. • Loop closure detection • Detecting loop closures in a map to give additional constraints for consistent mapping Effort towards Large Scale Mapping 11
  • 12. • Sparse SLAM • Only use a small selected subset of the pixels (features) from a monocular color camera • Fast and real time on CPU, but it produces a sparse map (point clouds) • Landmark-based or feature-based representations • ORB SLAM • One of the SOTA frameworks in the sparse SLAM category • Complete SLAM system for monocular camera • Real-time on standard CPUs in a wide variety of environments • small hand-held indoor sequences • drones flying in industrial environments • cars driving around a city Modern State of the Art Systems 12
  • 13. • Dense SLAM • Use most or all of the pixels in each received frame • Or use depth images from a depth camera • It produces a dense map but GPU acceleration is necessary for the real-time operation. • Volumetric model or surfel-based representations • InfiniTam • One of the SOTA frameworks in the Dense SLAM category • Multi-platform framework for real-time, large-scale depth fusion and tracking • Densely reconstructed 3D scene Modern State of the Art Systems 13
  • 14. • Direct method (semi-dense SLAM) • Makes use of pixel intensities directly • Enables using all information in the image • It produces a semi-dense map • Higher accuracy and robustness, in particular in environments with few keypoints • LSD SLAM • Highly cited SLAM framework in the direct method SLAM category • Large-scale, consistent maps of the environment • Accurate pose estimation based on direct image alignment Modern State of the Art Systems 14
  • 15. • Lidar SLAM • Makes use of the Lidar sensor input for localization and mapping • Oriented toward autonomous driving in outdoor environments • Segmap • Unified approach for Lidar SLAM based on the extraction of segments in 3D point clouds • Real-time single- and multi-agent systems Modern State of the Art Systems 15
  • 16. Modern State of the Art Systems 16 ORB SLAM InfiniTam LSD SLAM Segmap
  • 20. • Feature-based monocular SLAM system • operates in real time, in small and large, indoor and outdoor environments • URL: https://github.com/raulmur/ORB_SLAM2 • Main components • Tracking • Local mapping • Loop closing • Data structures • Map points • Keyframes • Covisibility graph • Essential graph ORB SLAM 20 Map points & keyframes Covisibility graph Essential graph R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” IEEE Trans. Robot., vol. 31, no. 5, pp. 1147–1163, 2015.
  • 21. • Map points $p_i$ • 3D position $\mathbf{X}_{w,i}$ • Viewing direction $\mathbf{n}_i$ • Representative ORB descriptor $\mathbf{D}_i$ • Keyframes $K_i$ • Camera pose $\mathbf{T}_{iw}$ • Camera intrinsics (focal length and principal point) • All the ORB features extracted in the frame Data structures 21 Map points & keyframes
  • 22. • Covisibility graph • Undirected weighted graph • Node: keyframe • Edge: if two keyframes share observations of the same map points (at least 15) • Weight 𝜃: the number of common map points • Essential graph • Retain all the nodes but less edges • Subset of edges from the covisibility graph with high covisibility + loop closure edges Data structures 22 Covisibility graph Essential graph
  • 23. • ORB feature extraction • For tracking, mapping, and place recognition tasks • Robust to rotation and scale • Good invariance to camera auto-gain and auto-exposure, and illumination changes • Fast to extract and match allowing for real-time operation • Show good precision/recall performances in bag-of-word place recognition • [Frame::Frame] Tracking 23 E. Rublee, V. Rabaud, K. Konolige, and G. R. Bradski, “ORB: An efficient alternative to SIFT or SURF.,” ICCV, pp. 2564–2571, Jan. 2011. Example of orb features
  • 24. • Initial pose estimation • Case 1: if tracking was successful for the last frame • constant velocity motion model • Bundle adjustment in the two-view case • [Tracking::TrackWithMotionModel] Tracking 24 (figure: minimizing the reprojection error across frames t1, t2, t3) https://cs.nyu.edu/~fergus/teaching/vision/11_12_multiview.pdf
  • 25. • Initial pose estimation • Case 2: if the tracking is lost • global relocalization • Bag-of-visual-words • Converting an image into a representative descriptor • Perspective-N-Point problem • Finding 2D—3D correspondences • [Tracking::Relocalization] Tracking 25
  • 26. • Training step • A visual vocabulary is first created by clustering a large number of keypoint descriptors; the cluster centres form the visual words of the vocabulary. • Estimation step • The local keypoints of a given image are first detected and described. • Each descriptor is vector-quantized. • The histogram of the vector-quantized keypoint descriptors is used as the image descriptor. Bag-of-Visual-Words 26 D. Gálvez-López and J. D. Tardós, “Bags of binary words for fast place recognition in image sequences,” IEEE Trans. Robot., vol. 28, no. 5, pp. 1188–1197, 2012.
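A minimal NumPy/SciPy sketch of the two steps above, assuming real-valued descriptors and a flat k-means vocabulary (ORB's binary descriptors would need a Hamming-space variant, and DBoW2 actually builds a hierarchical vocabulary tree):

    import numpy as np
    from scipy.cluster.vq import kmeans2, vq

    def build_vocabulary(train_descriptors, k=256):
        # Training step: cluster many keypoint descriptors; centroids = visual words.
        words, _ = kmeans2(train_descriptors.astype(np.float64), k, minit='++')
        return words

    def describe_image(image_descriptors, words):
        # Estimation step: vector-quantize each descriptor to its nearest word,
        # then histogram the word labels to get one descriptor for the whole image.
        labels, _ = vq(image_descriptors.astype(np.float64), words)
        hist = np.bincount(labels, minlength=len(words)).astype(np.float64)
        return hist / max(hist.sum(), 1.0)

    train = np.random.rand(5000, 32)              # stand-in for keypoint descriptors
    vocab = build_vocabulary(train)
    print(describe_image(np.random.rand(300, 32), vocab).shape)  # (256,)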
  • 27. • Perspective-n-Point Problem • 2D—3D correspondences • To determine the position and orientation of a camera • Given its intrinsic parameters and • a set of n correspondences between 3D points and their 2D projections • Procedure • (1) Estimate point clouds from 2D projections • (2) Compute the transformation between the estimated point clouds and the given point clouds Perspective-N-Point Problem 27 V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” Int. J. Comput. Vis., vol. 81, no. 2, pp. 155–166, 2009.
  • 28. • Single view geometry • Mathematical relationship between the coordinates of a point in 3D space and its projection onto the image plane Pinhole Camera Model 28 R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2003.
  • 29. • (1) Estimate point clouds from 2D projections • Barycentric coordinate property • A 3D point can be represented by a weighted sum of 4 control points. • 2D projections of the point: stacking the projection constraints over all points yields a homogeneous linear system $Mx = 0$ Perspective-N-Point Problem 29 https://en.wikipedia.org/wiki/Barycentric_coordinate_system
  • 30. • (1) Estimate point clouds from 2D projections • The solution can be expressed as a linear combination of $\mathbf{v}_i$ (the null eigenvectors of $M^\top M$) • Now, we know the positions of the 4 control points. • Then, we can compute the 3D reprojected points of the 2D projection points. • (2) Compute the transformation • between the estimated point clouds and the given point clouds • Via the point-to-point iterative closest points method Perspective-N-Point Problem 30
  • 31. Iterative Closest Points (ICP) 31 • Widely used for geometric alignment of three-dimensional models • Start with two meshes and an initial guess for their relative rigid-body transform • Refine the transform by repeatedly generating pairs of corresponding points • Related works • Point to Point • Point to Plane • …
  • 32. Point-to-Point ICP 32 • Original problem: source $\{p_i\}$, target $\{q_i\}$, $\min_{R,t} \sum_{i=1}^{n} \|(R p_i + t) - q_i\|^2$ • Centroids: $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, $\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i$, with $p_i' = p_i - \bar{p}$, $q_i' = q_i - \bar{q}$ • Decoupling the translation: substituting $p_i = p_i' + \bar{p}$, $q_i = q_i' + \bar{q}$ gives $\min_{R,t} \sum_{i=1}^{n} \|R p_i' + R\bar{p} + t - q_i' - \bar{q}\|^2$; assuming $t = \bar{q} - R\bar{p}$, this reduces to $\min_{R} \sum_{i=1}^{n} \|R p_i' - q_i'\|^2$ P. J. Besl and N. D. McKay, “A Method for Registration of 3-D Shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, Jan. 1992.
  • 33. • Dual problem: expanding $\sum_i \|R p_i' - q_i'\|^2$ leaves $\sum_i (\|p_i'\|^2 + \|q_i'\|^2)$, which does not depend on the rotation, minus $2\sum_i q_i'^\top R p_i'$; maximizing this term minimizes the entire cost • Solving the cost function: with $M = \sum_{i=1}^{n} p_i' q_i'^\top$ and SVD $M = U \Sigma V^\top$, the optimum is $R = V U^\top$ and $t = \bar{q} - R\bar{p}$ Point-to-Point ICP 33 https://cs.gmu.edu/~kosecka/cs685/cs685-icp.pdf
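The closed-form solution above in a short NumPy sketch, assuming the correspondences are already known (a full ICP loop would re-pair closest points and repeat this step until convergence):

    import numpy as np

    def align_point_to_point(P, Q):
        # P, Q: (n, 3) source/target points with row-wise correspondence.
        p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
        Pc, Qc = P - p_bar, Q - q_bar            # decouple the translation
        M = Pc.T @ Qc                            # M = sum_i p'_i q'_i^T
        U, _, Vt = np.linalg.svd(M)
        R = Vt.T @ U.T                           # rotation maximizing trace(R M)
        if np.linalg.det(R) < 0:                 # guard against a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = q_bar - R @ p_bar
        return R, t

    rng = np.random.default_rng(0)
    P = rng.random((100, 3))
    c, s = np.cos(0.3), np.sin(0.3)
    R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    Q = P @ R_true.T + np.array([1.0, 2.0, 3.0])
    R, t = align_point_to_point(P, Q)
    print(np.allclose(R, R_true), np.allclose(t, [1, 2, 3]))   # True True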
  • 34. • Track local map • Map point filtering • (1) compute the map point projection $\mathbf{x}$ in the current frame via $\begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} u \\ v \\ w \end{pmatrix} \rightarrow \begin{pmatrix} u/w \\ v/w \\ 1 \end{pmatrix}$; discard if it lies outside the image bounds • (2) compute the angle between the current viewing ray $\mathbf{v}$ and the map point viewing direction $\mathbf{n}$; discard if $\mathbf{v} \cdot \mathbf{n} < \cos 60°$ • (3) compute the distance $d$ from the map point to the camera center; discard if it is out of the scale invariance region of the map point, $d \notin [d_{min}, d_{max}]$ • Matching • (4) compare the representative descriptor $\mathbf{D}$ of the map point with the still unmatched ORB features in the frame and associate the map point with the best match • Refinement • (5) perform bundle adjustment for the retained map points and keyframes • [Tracking::TrackLocalMap] Tracking 34
  • 35. • New keyframe decision • (condition 1) Good relocalization • more than MAX frames have passed from the last keyframe insertion • (condition 2) Idle case • local mapping is idle AND more than MIN frames have passed from the last keyframe insertion • (condition 3) Visual change • current frame tracks less than 90% of the points of $K_{ref}$ • ((condition 1) || (condition 2)) && (condition 3) • [Tracking::NeedNewKeyFrame] Tracking 35
  • 36. • Keyframe insertion • Compute the bags of words representation • Update the covisibility graph • [LocalMapping::ProcessNewKeyFrame] • New map point creation • Finding the matches by the epipolar geometry • Initial map point creation via triangulation • Consistency check • Parallax • Positive depth in both cameras • Reprojection error • Scale consistency check • [LocalMapping::CreateNewMapPoints] Local Mapping 36
  • 37. • Geometry of stereo vision (two-view geometry) Epipolar Geometry 37 (figure: epipolar plane, epipolar lines, and epipoles c, c′) https://web.stanford.edu/class/cs231a/course_notes/03-epipolar-geometry.pdf R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2003.
  • 38. • Algebraic representation of epipolar geometry • Properties of the fundamental matrix • Point correspondence • If $\mathbf{x}$ and $\mathbf{x}'$ are corresponding image points, then $\mathbf{x}'^\top \mathbf{F} \mathbf{x} = 0$ • Epipolar lines • $\mathbf{l}' = \mathbf{F}\mathbf{x}$ is the epipolar line corresponding to $\mathbf{x}$ • $\mathbf{l} = \mathbf{F}^\top \mathbf{x}'$ is the epipolar line corresponding to $\mathbf{x}'$ Fundamental Matrix 38
  • 39. • Specialization of the fundamental matrix to the case of normalized image coordinates • Properties of the essential matrix • In normalized image coordinates, it has the same properties as the fundamental matrix • $\mathbf{E} = \mathbf{K}'^\top \mathbf{F} \mathbf{K}$ • $\mathbf{F} = \mathbf{K}'^{-\top} \mathbf{E} \mathbf{K}^{-1}$ • If the calibration matrix $\mathbf{K}$ is the identity matrix, then $\mathbf{E} = \mathbf{F}$ Essential Matrix 39
  • 40. • Any two images of the same planar surface are related by a homography • Property of the homography matrix • $\mathbf{x}' = \mathbf{H}\mathbf{x}$ Homography 40
  • 41. • Generally, rays C→x and C′→x′ will not exactly intersect • Can solve via SVD, finding a least squares solution to a system of equations • Procedure • Given camera projection matrices $P = (\mathbf{p}^{1\top}; \mathbf{p}^{2\top}; \mathbf{p}^{3\top})$, $P' = (\mathbf{p}'^{1\top}; \mathbf{p}'^{2\top}; \mathbf{p}'^{3\top})$ and a correspondence $w\mathbf{x} = (u, v, 1)^\top$, $w'\mathbf{x}' = (u', v', 1)^\top$ • Cross product of $\mathbf{x}$ and $P\mathbf{X}$ should be zero (self cross product): $\mathbf{x} \times P\mathbf{X} = \mathbf{0}$ • Create matrix $A = \begin{pmatrix} u\mathbf{p}^{3\top} - \mathbf{p}^{1\top} \\ v\mathbf{p}^{3\top} - \mathbf{p}^{2\top} \\ u'\mathbf{p}'^{3\top} - \mathbf{p}'^{1\top} \\ v'\mathbf{p}'^{3\top} - \mathbf{p}'^{2\top} \end{pmatrix}$ • [U, S, V] = svd(A) • X = V(:, end) Triangulation 41 R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2003. https://en.wikipedia.org/wiki/Cross_product
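The same DLT construction in NumPy, with hypothetical projection matrices for two views separated by a baseline along x:

    import numpy as np

    def triangulate(P1, P2, x1, x2):
        # P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel correspondences.
        u1, v1 = x1
        u2, v2 = x2
        A = np.vstack([u1 * P1[2] - P1[0],       # rows of x cross (P X) = 0
                       v1 * P1[2] - P1[1],
                       u2 * P2[2] - P2[0],
                       v2 * P2[2] - P2[1]])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]                               # last right-singular vector
        return X[:3] / X[3]                      # dehomogenize

    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    X_true = np.array([0.5, 0.2, 5.0, 1.0])
    x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
    x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
    print(triangulate(P1, P2, x1, x2))           # ~ [0.5, 0.2, 5.0]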
  • 42. • Local keyframe culling • To maintain a compact reconstruction, detect redundant keyframes and delete them • Redundancy check • Keyframes whose map points are 90% seen in at least three other keyframes • [LocalMapping::KeyFrameCulling] • Recent map point culling • To retain the compact map • Association check • The tracking must find the point in more than 25% of the frames in which it is predicted to be visible: (# frames the matching is found) / (# frames the map point is visible) > 0.25 • If more than one keyframe has passed from map point creation, it must be observed from at least three keyframes. • The point can be removed if at any time it is observed from fewer than three keyframes. • [LocalMapping::MapPointCulling] Local Mapping 42
  • 43. • Local bundle adjustment • Local map points • K" : Currently processed keyframe • Ku : All the keyframes connected to it in the covisibility graph • All the map points seen by those keyframes • [Optimizer::LocalBundleAdjustment] Local Mapping 43
  • 44. • Loop candidates detection • All those keyframes directly connected to the current keyframe are discarded • Compute the similarity between the bag of words vector • Current keyframe and all its neighbors in the covisibility graph • Query the recognition DB • Discard all those keyframes whose score is lower than minimum • [KeyFrameDatabase::DetectLoopCandidates] • Compute the similarity transformation • Similarity transform between current keyframe and the loop closure candidate • Given a number of points in two different Cartesian coordinate systems, recovering the transformation and scale between the two systems • [LoopClosing::ComputeSim3] Loop Closing 44
  • 45. Similarity Transform 45 • Original problem: source $\{p_i\}$, target $\{q_i\}$, $\min_{R,t} \sum_{i=1}^{n} \|(R p_i + t) - q_i\|^2$ • Centroids: $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, $\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i$, with $p_i' = p_i - \bar{p}$, $q_i' = q_i - \bar{q}$ • Decoupling the translation: assuming $t = \bar{q} - R\bar{p}$, the problem reduces to $\min_R \sum_{i=1}^{n} \|R p_i' - q_i'\|^2$ B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” JOSA A, vol. 4, no. 4, pp. 629–642, Apr. 1987. http://web.cs.iastate.edu/~cs577/handouts/quaternion.pdf
  • 46. • Dual problem: the terms independent of the rotation drop out, leaving $\sum_i q_i'^\top R p_i'$ to be maximized; maximizing this term minimizes the entire cost • Solving by the quaternion: writing the rotation as a unit quaternion and using properties of the quaternion product turns the objective into a quadratic form in the quaternion Similarity Transform 46
  • 47. • The eigenvector of the highest eigenvalue of $N$ is the solution (the optimal unit quaternion) • Quaternion to rotation: convert the optimal quaternion back into the rotation matrix $R$ • With $M = \sum_i p_i' q_i'^\top$ and $S_{ab} = \sum_i p'_{ia} q'_{ib}$, $N = \begin{pmatrix} S_{xx}+S_{yy}+S_{zz} & S_{yz}-S_{zy} & S_{zx}-S_{xz} & S_{xy}-S_{yx} \\ S_{yz}-S_{zy} & S_{xx}-S_{yy}-S_{zz} & S_{xy}+S_{yx} & S_{zx}+S_{xz} \\ S_{zx}-S_{xz} & S_{xy}+S_{yx} & -S_{xx}+S_{yy}-S_{zz} & S_{yz}+S_{zy} \\ S_{xy}-S_{yx} & S_{zx}+S_{xz} & S_{yz}+S_{zy} & -S_{xx}-S_{yy}+S_{zz} \end{pmatrix}$ • Calculate the scale: $s = \left(\sum_{i=1}^{n} q_i' \cdot R p_i'\right) / \left(\sum_{i=1}^{n} \|R p_i'\|^2\right)$ • Calculate the translation: $t = \bar{q} - sR\bar{p}$ Similarity Transform 47
  • 48. • Loop fusion • When the loop closure detection is triggered, the loop fusion is performed. • Keyframe and map point correction • Insert loop edges in the covisibility graph • Current keyframe pose is corrected with the similarity transformation that we obtained • Fuse duplicated map points • [LoopClosing::CorrectLoop] • Essential graph optimization • For all keyframes and all map points • [Optimizer::GlobalBundleAdjustment] Loop Closing 48
  • 54. • Incremental method for localization in 3D point clouds based on segment matching • URL: https://github.com/ethz-asl/segmap • Main components • Front-end • Sequential factors • Place recognition factors • Back-end • Pose graph optimization SegMap 54
  • 55. • The front-end is responsible for • Sensor measurements into sequential factors • Segmentation and description for place recognition factors • Sequential factors • Odometry factors • Displacement between consecutive robot poses from IMU data • Scan-matching factors • Registering the current scan against a submap by point-to-plane ICP • [SegMapper::SegMapper] • Place recognition factors • SegMatch: segment based loop-closure for 3D point clouds • [SegMapper::segMatchThread] Front-End 55
  • 56. • Minimize the perpendicular distance from the source point to the tangent plane of the destination point • Nonlinear least squares solved using the Levenberg-Marquardt method Point-to-Plane ICP 56 K. Low, “Linear Least-squares Optimization for Point-to-plane ICP Surface Registration,” Chapel Hill, Univ. North Carolina, Feb. 2004. • Source point $s_i = (s_{ix}, s_{iy}, s_{iz}, 1)^\top$ • Destination point $d_i = (d_{ix}, d_{iy}, d_{iz}, 1)^\top$ • Unit normal vector at $d_i$: $n_i = (n_{ix}, n_{iy}, n_{iz}, 0)^\top$
  • 57. Point-to-Plane ICP • Transformation matrix M • Least squares problem: $M_{opt} = \arg\min_M \sum_i ((M \cdot s_i - d_i) \cdot n_i)^2$ • 6-DOF parameters $(\alpha, \beta, \gamma, t_x, t_y, t_z)$ • However, in the case of $\alpha, \beta, \gamma$, the objective is a nonlinear trigonometric function • Linear approximation is needed 57
  • 58. Point-to-Plane ICP • Approximated transformation matrix $\hat{M}$: using the small-angle approximation ($\sin\theta \approx \theta$, $\cos\theta \approx 1$), $\hat{M} \approx \begin{pmatrix} 1 & -\gamma & \beta & t_x \\ \gamma & 1 & -\alpha & t_y \\ -\beta & \alpha & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$ • Linearized expression for the i-th correspondence: the residual $(\hat{M} s_i - d_i) \cdot n_i$ becomes linear in $(\alpha, \beta, \gamma, t_x, t_y, t_z)$ 58
  • 59. Point-to-Plane ICP • Expand to N correspondences • Modified form of the general least squares problem $Ax = b$ • Optimum solution $x_{opt} = (A^\top A)^{-1} A^\top b$ • Iteratively perform the SVD optimization until it converges • [PointToPlaneWithCovErrorMinimizer::compute] 59
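A sketch of one linearized point-to-plane step under the small-angle approximation, with x = (α, β, γ, t_x, t_y, t_z); the construction follows the derivation above and is illustrative, not libpointmatcher's internals:

    import numpy as np

    def point_to_plane_step(S, D, N):
        # S: (n,3) source points, D: (n,3) destination points, N: (n,3) unit normals at D.
        A = np.hstack([np.cross(S, N), N])        # row i: [s_i x n_i, n_i]
        b = np.einsum('ij,ij->i', N, D - S)       # row i: n_i . (d_i - s_i)
        x, *_ = np.linalg.lstsq(A, b, rcond=None) # least squares: (A^T A)^-1 A^T b
        return x                                  # apply, re-pair points, iterate

    rng = np.random.default_rng(1)
    S = rng.random((200, 3))
    D = S + np.array([0.01, -0.02, 0.005])        # small pure translation
    N = rng.normal(size=(200, 3))
    N /= np.linalg.norm(N, axis=1, keepdims=True)
    print(point_to_plane_step(S, D, N))           # rotation ~0, t ~ (0.01, -0.02, 0.005)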
  • 60. • Place recognition factors • SegMatch: segment based loop-closure for 3D point clouds • Four different modules • Point cloud segmentation • Descriptor extraction • Segment matching • Geometric verification • [SegMapper::segMatchThread] Front-End 60
  • 61. • Point cloud segmentation • Incremental region growing policy • CanGrowTo • If growing from a seed to a neighbor is allowed • Check the angle between seed and candidate normals • LinkClusters • Link clusters if they have the same cluster id • CanBeSeed • If a point can be used as seed • Check the curvature at a point • [growRegionFromSeed] SegMatch 61 Region growing Cluster merging
  • 62. • Descriptor extraction • For compressing the raw segment data and building object signatures • Diverse segment descriptors • Eigenvalue based • Eigenvalues for the segment are computed and combined in a 7-dim feature vector • Linearity, planarity, scattering, omnivariance, anisotropy, eigenentropy, change of curvature • [EigenvalueBasedDescriptor::describe] SegMatch 62 M. Weinmann, B. Jutzi, and C. Mallet, “Semantic 3D scene interpretation: A framework combining optimal neighborhood size selection with relevant features,” ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci., vol. II-3, no. September, pp. 181–188, 2014. Eigenvalue-based 3D features
  • 63. • Diverse segment descriptors • Auto-encoder based • Input: 3D binary voxel grid of fixed dimension 32x32x16 • Descriptor extractor part • 3D convolutional layers with max pool layers and two Fully connected layers • Rectified linear activation function for all layers • Output: 64x1 descriptor • Reconstruction part • One fully connected layer and three deconvolutional layers with a final sigmoid output • Output: reconstructed 3D binary voxel grid • Loss • Classification loss: softmax cross entropy loss • Reconstruction loss: binary cross entropy loss SegMatch 63 R. Dubé, A. Cramariuc, D. Dugas, J. Nieto, R. Siegwart, and C. Cadena, “SegMap: 3D Segment Mapping using Data-Driven Descriptors,” 2018.
  • 64. • Matching • K-nearest neighbor search in the descriptor space by k-d tree • Construction of k-d trees SegMatch 64 (figure: k-d tree built over ten example points)
  • 65. • Matching • Searching k-nearest neighbors • Ex) Query point: p1 • Time complexity: O(log n) • [findCandidates] SegMatch 65 (figure: k-d tree traversal for the query point)
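The candidate search can be reproduced with SciPy's k-d tree; the 7-D random vectors below merely stand in for eigenvalue-based segment descriptors:

    import numpy as np
    from scipy.spatial import cKDTree

    target_descriptors = np.random.rand(1000, 7)   # descriptors of target-map segments
    tree = cKDTree(target_descriptors)             # O(n log n) construction

    query = np.random.rand(5, 7)                   # descriptors of local-map segments
    dist, idx = tree.query(query, k=3)             # 3 nearest neighbours, ~O(log n) each
    print(idx.shape)                               # (5, 3) candidate correspondences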
  • 66. • Geometric verification • Consistency graph G = (V, E) • Vertex set V = $\{c_i\}$, the set of correspondences $c_i$ • Edge set E = $\{e_{ij}\}$, the set of undirected edges $e_{ij}$ connecting all consistent pairs of correspondences $(c_i, c_j)$ • Geometrically consistent • If the difference of the Euclidean distance between the segment centroids in the local map and in the target map is less than a threshold: $|d_l(c_i, c_j) - d_t(c_i, c_j)| \leq \epsilon$ • Identifying a maximum geometrically consistent set == finding a maximum clique of G • [Segmatch::recognize] SegMatch 66
  • 67. • Illustration Consistency Graph 67 (figure: correspondences c1–c12 between the local and target maps; consistent pairs form the graph edges, and the maximum clique is kept)
  • 68. Back-End • Pose graph optimization by factor graphs (iSAM library) • Factor graphs • Graphical models that are well suited to modeling complex estimation problems • Variables • Unknown random variables in the estimation problem • Robot poses • Factors • Probabilistic information on those variables, derived from measurements or prior knowledge • Odometry, loop closure constraints • [IncrementalEstimator::estimate] 68
  • 69. • Autonomous vehicles Applications 69 https://www.youtube.com/watch?v=gEy91PGGLR0&feature=share
  • 71. End of First Week 71
  • 73. Epipolar Line in Various Cases 73 • Motion parallel with the image plane • Forward motion • Converging case http://people.scs.carleton.ca/~c_shu/Courses/comp4900d/notes/epipolar.pdf
  • 74. • Since the distances between segments are preserved under rotation, the approach is expected to keep working • However, in the very special case where the target-map segments are arranged exactly like a rotated copy of the local map, it could fail Consistency Graph with Severe Rotation 74 (figure panels: 0 deg, 90 deg, 180 deg)
  • 77. • Large scale direct (feature-less) monocular SLAM algorithm • URL: https://github.com/tum-vision/lsd_slam • Main components • Tracking • Depth map estimation • Map optimization • Key features • Novel direct tracking method • Elegant probabilistic solution to include the effect of noisy depth values into tracking • Large-scale and real time operation on a CPU LSD-SLAM 77
  • 78. Feature-based VS Direct Method 78 (figure: feature-based pipeline vs direct-method pipeline)
  • 79. • Manifold • Mathematical space that is not necessarily Euclidean on a global scale, but can be seen as Euclidean on a local scale • Why use the manifold for robotics or computer vision? • The translation clearly forms a Euclidean space, while the rotational components span the non-Euclidean 3D rotation group • Lie group SE(3) <-> Lie algebra se(3), connected by the exponential and logarithm maps Optimization on the Manifold 79 • Twist $\boldsymbol{\xi} = (v_x, v_y, v_z, \omega_x, \omega_y, \omega_z)^\top$ with linear velocity $v$ and angular velocity $\omega$ • $G(\boldsymbol{\xi}) = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$ G. Grisetti, R. Kummerle, C. Stachniss, and W. Burgard, “A tutorial on graph-based SLAM,” IEEE Intell. Transp. Syst. Mag., vol. 2, no. 4, pp. 31–43, 2010. Section 10.3.3, J. L. Blanco-Claraco, “A tutorial on SE(3) transformation parameterizations and on-manifold optimization,” 2018.
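A small sketch of the exponential map from se(3) to SE(3), using a generic matrix exponential for clarity (production code typically uses the closed-form Rodrigues expression instead):

    import numpy as np
    from scipy.linalg import expm

    def hat(w):
        # 3-vector -> skew-symmetric matrix [w]_x
        return np.array([[0, -w[2], w[1]],
                         [w[2], 0, -w[0]],
                         [-w[1], w[0], 0]])

    def exp_se3(xi):
        # Twist xi = (v, w) -> 4x4 rigid-body transform G(xi) = expm(xi^)
        v, w = xi[:3], xi[3:]
        T = np.zeros((4, 4))
        T[:3, :3] = hat(w)     # angular part
        T[:3, 3] = v           # linear part
        return expm(T)

    xi = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.3])  # small example twist
    print(exp_se3(xi))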
  • 80. • Relative 3D pose between an existing keyframe $K_i = (I_i, D_i, V_i)$ and a new image $I_j$ • [SE3Tracker::trackFrame] Tracking 80 (terms: 3D projective warp function, photometric residual, photometric variance)
  • 81. • Minimizing the intensity error between two images • Photometric residual $r_p(\mathbf{p}, \boldsymbol{\xi}_{ji}) := I_i(\mathbf{p}) - I_j(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji}))$ • In order to apply Gauss-Newton minimization, the Jacobian is necessary • n-th delta for the parameter: $\delta\boldsymbol{\xi}^{(n)} = -(J^\top J)^{-1} J^\top r(\boldsymbol{\xi}^{(n)})$ • The new estimate is obtained by multiplication with the computed update: $\boldsymbol{\xi}^{(n+1)} = \delta\boldsymbol{\xi}^{(n)} \circ \boldsymbol{\xi}^{(n)}$ • The Jacobian is calculated from $I_j(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji}))$ for the parameter $\boldsymbol{\xi}_{ji}$, more specifically from $I_j(\pi(g(\mathbf{p}, D_i(\mathbf{p})), G(\boldsymbol{\xi}_{ji})))$ • Therefore, it can be decomposed into a product of Jacobians: $J_{I_j}(\boldsymbol{\xi}_{ji}) = J_I J_\pi J_g J_G$ • Relationship among functions: $I$ (pixel intensity) ∘ $\pi$ (projection) ∘ $g$ (rigid body transform) ∘ $G(\boldsymbol{\xi})$ (Lie algebra to Lie group) Photometric Residual 81
  • 85. • Photometric residual $r_p(\mathbf{p}, \boldsymbol{\xi}_{ji}) := I_i(\mathbf{p}) - I_j(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji}))$ • The derivative is calculated from $I_j(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji}))$, more specifically $I_j(\pi(g(\mathbf{p}, D_i(\mathbf{p})), G(\boldsymbol{\xi}_{ji})))$, with respect to the inverse depth $D_i(\mathbf{p})$ • Therefore, it can be decomposed into a product of derivatives: $\frac{\partial I_j(\mathbf{p}, \boldsymbol{\xi}_{ji})}{\partial D_i(\mathbf{p})} = \frac{\partial I_j}{\partial \pi} \frac{\partial \pi}{\partial g} \frac{\partial g}{\partial D_i(\mathbf{p})} = \begin{pmatrix} \nabla I_{jx} & \nabla I_{jy} \end{pmatrix} \begin{pmatrix} \frac{1}{Z} & 0 & -\frac{X}{Z^2} \\ 0 & \frac{1}{Z} & -\frac{Y}{Z^2} \end{pmatrix} \begin{pmatrix} (T_x - X)/d \\ (T_y - Y)/d \\ (T_z - Z)/d \end{pmatrix} = \begin{pmatrix} \nabla I_{jx} & \nabla I_{jy} \end{pmatrix} \begin{pmatrix} ((T_x - X)Z - (T_z - Z)X)/(Z^2 d) \\ ((T_y - Y)Z - (T_z - Z)Y)/(Z^2 d) \end{pmatrix}$ • [SE3Tracker::calcWeightsAndResidual] Photometric Variance 85 (terms: Gaussian image intensity noise and inverse depth variance)
  • 86. • The remaining factor $\frac{\partial g}{\partial D_i(\mathbf{p})} = \big((T_x - X)/d, (T_y - Y)/d, (T_z - Z)/d\big)^\top$ follows from differentiating the warped 3D point with respect to the inverse depth $d = D_i(\mathbf{p})$ Chain of Derivatives 86
  • 87. • Keyframe selection • If the camera moves too far away from the existing map, a new keyframe is created from the most recent tracked image. • [getRefFrameScore] Depth Map Estimation 87 (figure: frames labelled keyframe / not a keyframe)
  • 88. • Depth map creation • Depth map for the keyframe is initialized by projecting points from the previous keyframe • Depth map is scaled to have a mean inverse depth of one • Procedure • Compute the epipolar line $\mathbf{l}' = \mathbf{F}\mathbf{x} = \mathbf{K}^{-\top} [\mathbf{t}]_\times \mathbf{R} \mathbf{K}^{-1} \mathbf{x}$ • Geometric and photometric disparity error • Check the magnitude of the image gradient along the epipolar line • Check the angle between the image gradient and the epipolar line • [DepthMap::makeAndCheckEPL] • Stereo matching • between the current keyframe and the previous keyframe • using the SSD error over five equidistant points on the epipolar line • [DepthMap::doLineStereo] Depth Map Estimation 88 J. Engel, J. Sturm, and D. Cremers, “Semi-dense visual odometry for a monocular camera,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1449–1456.
  • 89. • Constraint search • Finding the Loop closure candidates via the appearance based method (ex. BoVW) • Direct image alignment on sim(3) • Finding the transformation between current keyframe and the loop closure candidates • Tracking in Sim(3) space • [Sim3Tracker::trackFrameSim3] Constraint Acquisition 89
  • 90. Map Optimization • Pose graph optimization by general graphs (g2o library) • General graphs • Graphical models that are well suited to modeling complex estimation problems • Nodes • Unknown random variables in the estimation problem • Robot poses • Edges • Probabilistic information on those variables, derived from measurements or prior knowledge • Odometry, loop closure constraints • [SlamSystem::optimizationIteration] 90
  • 91. • Autonomous drone driving Applications 91 https://www.youtube.com/watch?v=BLY3kgeZrZg
  • 95. • Real-time, large-scale depth fusion and tracking framework • URL: https://github.com/victorprad/InfiniTAM • Main components • Tracking • Allocation • Integration • Raycasting • Relocalisation & Loop Closure Detection • Data structure • Volumetric representation • Voxel block hashing InfiniTam 95 V. A. Prisacariu et al., “InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure,” 2017.
  • 96. • Volumetric representation using a hash lookup • Data structures and operations • Voxel • Voxel block hash • Hash table and hashing function • Hash table operations • Voxel • A value on a regular grid in 3D space, the extended concept of 2D pixel • Widely used for realistic rendering 3D object in computer graphics • Data • Truncated signed distance function (TSDF) value • TSDF Weight • Color value • Color weight Volumetric Representation 96
  • 97. • Truncated signed distance function (TSDF) • Predefined 3D volume is subdivided uniformly into a 3D grid of voxels • These values are positive in front of the surface and negative behind • Zero-crossing point means the surface of the object Volumetric Representation 97
  • 98. • Voxel block array • Majority of data stored in the regular voxel grid is marked either as • Free space • Unobserved space • Only store the surface data by efficient hashing scheme • Grouped voxels in blocks of predefined size (ex. 8x8x8) • Data • Positions of the corner of the 8x8x8 voxel block • Offset in the excess list • Pointer to the voxel block array Volumetric Representation 98 M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger, “Real-time 3D reconstruction at scale using voxel hashing,” ACM Trans. Graph., vol. 32, no. 6, pp. 1–11, 2013.
  • 99. • Hash table • To quickly and efficiently find the position of a certain voxel block in the voxel block array • Contiguous array of ITMHashEntry objects • Hashing function • For locating entries of the hash table takes the corner coordinates of a 3D voxel block • Hash collision case • Use the additional unordered excess list • Store an offset in the voxel block array Volumetric Representation 99
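A sketch of the block hashing idea; the three large primes are the commonly quoted values for this spatial hash, while the table size is an assumption and the block size of 8 follows the text (exact constants in InfiniTAM may differ):

    BLOCK_SIZE = 8        # voxels per block side (8x8x8 blocks)
    TABLE_SIZE = 1 << 20  # number of hash buckets (assumption)

    def block_coords(voxel):
        # world-grid voxel coordinates -> containing voxel block coordinates
        return tuple(c // BLOCK_SIZE for c in voxel)

    def hash_block(block):
        # XOR of the block corner coordinates times large primes, modulo table size
        x, y, z = block
        return ((x * 73856093) ^ (y * 19349669) ^ (z * 83492791)) % TABLE_SIZE

    print(hash_block(block_coords((17, -3, 250))))  # bucket index for that voxel's block

On a collision, the entry would spill into the unordered excess list as described above.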
  • 100. • Hash table operations • Given a target 3D voxel location in world coordinates • Compute its corresponding voxel block location by dividing the voxel location by the size of a voxel block • Call the hashing function to compute the index of the bucket from the ordered part of the hash table • Retrieval • Returns the voxel stored at the target location within the block addressed by the hash entry • Insertion • Reserves a block inside the voxel block array Volumetric Representation 100
  • 101. • To determine the pose of a new camera frame given the 3D world model • Diverse methods • Using only the depth • Inspired by Point-to-Plane ICP • [class ITMDepthTracker] • Using only the color • Inspired by the direct method • [class ITMColorTracker] • Using both data • Utilize both approaches • [class ITMExtendedTracker] • Main differences of the extended tracker • Huber-norm instead of the standard L2 norm • Error term weighted by its depth measurement • Tracking failure determination by SVM classifier Tracking 101
  • 102. • Huber-norm instead of the standard L2 norm • Huber-Norm • The squared loss has the disadvantage that it tends to be dominated by outliers • The Huber norm is quadratic for small values of the residual and linear for large values • Huber-norm in the code Tracking 102 https://en.wikipedia.org/wiki/Huber_loss
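The Huber loss in a few lines, showing the quadratic-to-linear switch at the threshold delta:

    import numpy as np

    def huber(r, delta=1.0):
        small = np.abs(r) <= delta
        return np.where(small,
                        0.5 * r**2,                         # quadratic near zero
                        delta * (np.abs(r) - 0.5 * delta))  # linear in the tails

    r = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
    print(huber(r))   # [4.5, 0.125, 0., 0.125, 4.5] vs squared loss 12.5 at +-5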
  • 103. • Error term weighted by its depth measurement • The error term for each pixel of the depth image is weighted according to its depth meas urement provided by the sensor. • The reliability of depth measurement decreases with the increase in distance reading. • Depth weight in the code Tracking 103 K. Khoshelham and S. O. Elberink, “Accuracy and resolution of kinect depth data for indoor mapping applications,” Sensors, Jan. 2012.
  • 104. • Three main stages in allocation • 1) backproject a line connecting $d - \mu$ to $d + \mu$ • $d$: depth in image coordinates • $\mu$: a fixed, tunable parameter • This leads to a line in world coordinates, which intersects a number of voxel blocks. • Search the hash table for each of these blocks and look for a free hash entry • 2) allocate voxel blocks for each non-zero entry in the allocation and visibility arrays • 3) build a list of live hash entries • [AllocateSceneFromDepth] Allocation 104 (figure: ray segment from d−μ to d+μ through the virtual voxel block grid)
  • 105. • TSDF integration • If a voxel is behind the surface observed in the new depth image, the image does not contain any new information about it, and the function returns. • If the voxel is close to or in front of the observed surface, a corresponding observation is added to the accumulated sum. • [IntegrateIntoScene] Integration 105 S. Izadi et al., “KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera,” in Proc. ACM Symposium on User Interface Software and Technology (UIST), 2011.
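A per-voxel sketch of this update rule (KinectFusion-style running weighted average); the function and parameter names are illustrative, not InfiniTAM's API:

    MU = 0.02  # truncation band in metres (tunable)

    def integrate_voxel(tsdf, weight, voxel_depth_along_ray, measured_depth, max_w=100):
        eta = measured_depth - voxel_depth_along_ray   # signed distance to the surface
        if eta < -MU:
            return tsdf, weight                        # behind the surface: no new info
        f = min(1.0, eta / MU)                         # truncated signed distance in [-1, 1]
        new_tsdf = (tsdf * weight + f) / (weight + 1)  # running average of observations
        return new_tsdf, min(weight + 1, max_w)        # cap the weight

    print(integrate_voxel(0.0, 0, voxel_depth_along_ray=1.00, measured_depth=1.01))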
  • 106. • Motivation • Depth image is computed from the updated 3D world model given a camera pose • Input to the tracking step in the next frame and also for visualization. • Raycasting • A ray is being cast from the camera up until an intersection with the surface is found • Checking the value of the TSDF at each voxel along the ray until a zero-crossing is found • State machine to efficiently handle the sparse volumetric space • SEARCH_BLOCK_COARSE • SEARCH_BLOCK_FINE • SEARCH_SURFACE • BEHIND_SURFACE • [castRay] Raycasting 106
  • 107. • SEARCH_BLOCK_COARSE state • Take steps of the size of each block, i.e. 8 voxels • Until an actually allocated block is encountered • SEARCH_BLOCK_FINE state • Once the ray enters an allocated block, step back and enter this state • The step length is now limited by the truncation band of the SDF. • SEARCH_SURFACE state • Once the ray enters a valid block and the values in that block indicate we are still in front of the surface, the state is changed to SEARCH_SURFACE • BEHIND_SURFACE state • Until a negative value is read from the SDF • This terminates the raycasting iteration and the exact location of the surface is now found. Raycasting 107
  • 108. • Keyframe-based random ferns relocaliser • To relocalise the camera when tracking fails • To detect loop closures when aiming to construct a globally-consistent scene • Procedure • Downsample and preprocess the image I • Each of the m code blocks is obtained by applying a random fern to I • A fern is a set of n binary feature tests on the image, each yielding either 0 or 1 • $b_{F_k}^I \in B^n$ denotes the n-bit binary code resulting from applying fern $F_k$ to I • $b_C^I \in B^{mn}$ denotes the result of concatenating all m such binary codes for I • Dissimilarity measure between two different images I and J is the block-wise Hamming distance between $b_C^I$ and $b_C^J$, where each block term is 0 if the two code blocks are identical, and 1 otherwise Relocalisation & Loop Closure Detection 108
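A toy version of the fern encoding and the block-wise Hamming distance, with made-up sizes (m = 16 ferns of n = 8 tests) and simple pixel-threshold tests standing in for the real feature tests:

    import numpy as np

    rng = np.random.default_rng(0)
    M_FERNS, N_TESTS = 16, 8
    H, W = 40, 30                                   # downsampled image size (assumption)
    pix = rng.integers(0, H * W, size=(M_FERNS, N_TESTS))  # fixed random test locations
    thr = rng.random((M_FERNS, N_TESTS))                   # fixed random thresholds

    def encode(image):
        # Each test compares one pixel against a threshold -> one bit; n bits per fern.
        flat = image.ravel()
        return (flat[pix] > thr).astype(np.uint8)   # (m, n) array of 0/1 bits

    def block_hamming(code_a, code_b):
        # 0 if a whole n-bit block matches, 1 otherwise, summed over the m blocks.
        return int(np.sum(np.any(code_a != code_b, axis=1)))

    a, b = rng.random((H, W)), rng.random((H, W))
    print(block_hamming(encode(a), encode(a)), block_hamming(encode(a), encode(b)))  # 0, ~16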
  • 109. • Caching strategy for the efficient BlockHD computation Relocalisation & Loop Closure Detection 109 (figure: per-fern code tables, ex. 500 ferns and 16 code blocks, map each code id to the keyframe IDs that produced it, so a query frame's code fragments vote for similar keyframes)
  • 110. • Idea behind the relocaliser • Encode an RGB-D image I as a set of m binary code blocks, each of length n • To learn a lookup table from encodings of keyframe images to their known camera poses • Harvesting keyframes • If there is no similar keyframe in the keyframe DB, harvest a new keyframe • Relocalization • By finding the nearest neighbours of the encoding of the current camera input image in this table, and trying to use their recorded pose to restart tracking • Loop closure detection • Determine whether the current frame has been seen in the previous frames or not • [Relocaliser::ProcessFrame] Relocalisation & Loop Closure Detection 110
  • 111. • Submap based approach • Division of the scene into multiple rigid submaps • Active submaps: tracked against at each frame • Passive submaps: maintained, but not tracked against unless they become active again Globally-Consistent Reconstruction 111 N. Fioraio, J. Taylor, A. Fitzgibbon, L. Di Stefano, and S. Izadi, “Large-scale and drift-free surface reconstruction using online subvolume registration,” 2015 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4475–4483, Jan. 2015.
  • 112. • Graph representation • A numerical solution of the equation can be obtained by using the popular Gauss-Newton method Recall: Pose Graph Optimization 112 (figure: example pose graph; each edge $e_{ij}$ compares the measurement against the estimation) G. Grisetti, R. Kummerle, C. Stachniss, and W. Burgard, “A tutorial on graph-based SLAM,” IEEE Intell. Transp. Syst. Mag., vol. 2, no. 4, pp. 31–43, 2010.
  • 113. • First order Taylor expansion • Substituting the equation Least Squares Optimization 113 https://en.wikipedia.org/wiki/Linear_approximation
  • 114. • Rewrite the function F(x) as a quadratic form in the increment $\Delta x$ • The quadratic form is minimized by solving the linear system $H \Delta x^* = -b$ • The solution is obtained by adding the increment $\Delta x^*$ to the initial guess Least Squares Optimization 114
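The update rule in a generic Gauss-Newton sketch (identity information matrix, dense algebra; a real pose-graph solver additionally exploits the sparse structure described next):

    import numpy as np

    def gauss_newton(residual, jacobian, x0, iters=10):
        x = x0.astype(float)
        for _ in range(iters):
            r, J = residual(x), jacobian(x)
            H, b = J.T @ J, J.T @ r
            dx = np.linalg.solve(H, -b)   # increment minimizing the local quadratic form
            x = x + dx                    # add the increment to the current guess
        return x

    # Toy curve fit y = exp(a t) + c as a stand-in for graph error functions.
    t = np.linspace(0, 1, 50)
    y = np.exp(0.7 * t) + 0.3
    residual = lambda x: np.exp(x[0] * t) + x[1] - y
    jacobian = lambda x: np.column_stack([t * np.exp(x[0] * t), np.ones_like(t)])
    print(gauss_newton(residual, jacobian, np.array([0.0, 0.0])))  # ~ [0.7, 0.3]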
  • 115. • The Matrix H and the vector b are obtained by summing up a set of matrices and v ectors, one for every constraint. • Every constraint will contribute to the system with an addend term. • The structure of this addend depends on the Jacobian of the error function. • Since the error function of a constraint depends only on the values of two nodes, th e Jacobian has the following form. Structure of Linearized System 115
  • 116. Structure of Linearized System 116 • [runGlobalAdjustment]
  • 119. • 3D reconstruction • Bundlefusion : http://graphics.stanford.edu/projects/bundlefusion/ Applications 119 https://www.youtube.com/watch?v=keIirXrRb1k
  • 122. • Reconstructing 3D structure from its projections into a series of images taken from different viewpoints • General SfM pipeline Structure-from-Motion 122 Result of Rome with 21K registered out of 75K images
  • 123. • Feature extraction • Features should be invariant under radiometric and geometric changes • SfM can uniquely recognize them in multiple images. • Ex) SIFT, SURF, ORB, FAST, BRIEF… • [RunFeatureExtraction] • Matching • Matching the images that see the same scene part • By leveraging the features as an appearance description of the images. • Output • a set of potentially overlapping image pairs • their associated feature correspondences • Naïve approach • Test every image pair for scene overlap • $O(N_I^2 N_{f_i}^2)$ • Scalable and efficient matching is necessary. • [RunFeatureMatching] Correspondence Search 123
  • 124. • Geometric verification • Since matching is based solely on appearance, it is not guaranteed that corresponding features actually map to the same scene point. • Diverse verification methods • Projective geometry (pinhole camera model) • Epipolar geometry (essential matrix, fundamental matrix) • Homography • If a valid transformation maps a sufficient number of features between the images, they are considered geometrically verified. • RANSAC is required to remove outliers • Output • a set of geometrically verified image pairs • Their associated inlier correspondences • (optional) a description of their geometric relation • Scene-graph with images as nodes and verified pairs of images as edges Correspondence Search 124
  • 125. • Image registration • New images can be registered to the current model by solving the PnP problem • Perspective-n-Point Problem • 2D—3D correspondences • To determine the position and orientation of a camera • Given its intrinsic parameters and a set of n correspondences between 3D points and their 2D projections • Procedure • (1) Estimate point clouds from 2D projections • (2) Compute the transformation between the estimated point clouds and the given point clouds • [EPNPEstimator::ComputePose] Incremental Reconstruction 125 V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” Int. J. Comput. Vis., vol. 81, no. 2, pp. 155–166, 2009.
  • 126. • Bundle adjustment • Joint non-linear refinement of camera parameters $P_c$ and point parameters $X_k$ • Minimize the reprojection error • Levenberg-Marquardt is the method of choice for solving BA problems • [BundleAdjuster::BundleAdjuster] Incremental Reconstruction 126
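A tiny bundle-adjustment-flavoured sketch using SciPy's Levenberg-Marquardt: two cameras with identity rotation, camera 0 fixed, and camera 1's translation jointly refined with all but one anchor point by minimizing reprojection error (a toy setup, not COLMAP's parameterization):

    import numpy as np
    from scipy.optimize import least_squares

    def project(X, t):
        Xc = X - t                        # identity rotation for brevity
        return Xc[:, :2] / Xc[:, 2:3]     # normalized pinhole projection

    rng = np.random.default_rng(2)
    n = 20
    X_true = rng.uniform([-1, -1, 4], [1, 1, 6], size=(n, 3))
    t_true = np.array([1.0, 0.0, 0.0])
    obs0, obs1 = project(X_true, np.zeros(3)), project(X_true, t_true)

    def residual(params):
        t, X_free = params[:3], params[3:].reshape(n - 1, 3)
        X = np.vstack([X_true[:1], X_free])   # point 0 held fixed to pin the scale gauge
        return np.concatenate([(project(X, np.zeros(3)) - obs0).ravel(),
                               (project(X, t) - obs1).ravel()])

    x0 = np.concatenate([np.zeros(3),
                         (X_true[1:] + 0.05 * rng.normal(size=(n - 1, 3))).ravel()])
    sol = least_squares(residual, x0, method='lm')   # Levenberg-Marquardt refinement
    print(np.round(sol.x[:3], 3))                    # ~ [1, 0, 0]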
  • 127. • Previous SfM frameworks (Bundler, VisualSFM) • Challenges • The current state-of-the-art SfM algorithms are good, but not good enough • They fail to produce fully satisfactory results in terms of completeness and robustness. • Problems • Correspondence search producing an incomplete scene graph • Reconstruction stage failing to register images due to missing or inaccurate scene structure • Symbiotic relationship between image registration and triangulation • COLMAP • Novel SfM algorithm containing the following contributions to achieve the ultimate goal • Contributions • Scene graph augmentation • Next best view selection maximizing the robustness and accuracy • Robust and efficient triangulation method • Iterative BA, re-triangulation, and outlier filtering strategy COLMAP 127
  • 129. • Augmented geometric verification • The number of inliers for the fundamental matrix $N_F$ • The number of inliers for the essential matrix $N_E$ • The number of inliers for the homography $N_H$ • Check the ratios $N_E/N_F$, $N_H/N_E$, $N_H/N_F$ and label the type of the two-view geometry • If $N_E/N_F < \epsilon_{EF}$ and $N_H/N_E < \epsilon_{HE}$, then “planar or panoramic (pure rotation)” • If $N_E/N_F < \epsilon_{EF}$ and $N_H/N_E > \epsilon_{HE}$, then “calibrated” • If $N_E/N_F > \epsilon_{EF}$ and $N_H/N_F < \epsilon_{HF}$, then “planar or panoramic (pure rotation)” • If $N_E/N_F > \epsilon_{EF}$ and $N_H/N_F > \epsilon_{HF}$, then “uncalibrated” • Seed for reconstruction • Non-panoramic • Calibrated image pairs • Do not triangulate panoramic image pairs, to avoid degenerate points • [EstimateInitialTwoViewGeometry] Scene Graph Augmentation 129
  • 130. • Frequent problem in Internet photos • Watermarks, timestamps, and frames (WTFs) • Incorrectly link images of different landmarks • Two assumptions for WTFs • (1) watermarks and frames always have the exact same appearance • (2) all WTFs are typically close to the border of the image • Procedure • Estimate a translation transformation with $N_T$ inliers at the image borders • Any image pair whose border inlier ratio $N_T/N_F$ exceeds a threshold is considered a WTF and is not inserted into the scene graph • [DetectWatermark] Scene Graph Augmentation 130 T. Weyand, C. Y. Tsai, and B. Leibe, “Fixing WTFs: Detecting image matches caused by watermarks, timestamps, and frames in internet photos,” in Proc. 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), 2015, pp. 1185–1192.
  • 131. • Motivation • Choosing the next best view is critical, as every decision impacts the remaining reconstruction. • A single bad decision may lead to a cascade of camera mis-registrations. • Diverse strategies • MAX_VISIBLE_POINTS_NUM • To choose the image that sees most triangulated points • MAX_VISIBLE_POINTS_RATIO • Higher ratio of visible points on observations • MIN_UNCERTAINTY • More visible points and a more uniform distribution of points • [IncrementalMapper::FindNextImages] Next Best View Selection 131
  • 132. • Refinement using multiple view triangulation • [TriangulateMultiViewPoint] • Cheirality constraint • Positive depth with respect to the camera views • [HasPointPositiveDepth] • Sufficient triangulation angle • The angle between the two viewing rays should be sufficiently large. • [CalculateTriangulationAngle] Robust and Efficient Triangulation 132
  • 133. • Before BA: Re-Triangulation • To improve the completeness of the reconstruction by continuing the tracks of points that previously failed to triangulate • continue tracks with observations whose errors are below the filtering thresholds • [IncrementalTriangulator::Retriangulate] • After BA: Filtering • Filter observations with large reprojection errors • Enforcing a minimum triangulation angle over all pairs of viewing rays • [Reconstruction::FilterAllPoints3D] • Iterative refinement • Perform RT, BA, and filtering in an iterative optimization • until the number of filtered observations and post-BA re-triangulated points diminishes • [IterativeGlobalRefinement] Bundle Adjustment 133
  • 134. • Middlebury Temple • 312 images Demo 134
  • 135. • Middlebury Dino • 363 images Demo 135
  • 136. • Building • 128 images Demo 136
  • 138. SLAM Research & Job Trend in CVPR 2018 138
  • 139. • Computer Vision and Pattern Recognition 2018 • Salt Lake City, Utah • June 18-22, 2018 • 979 accepted papers • 6512 registered attendees CVPR 2018 139
  • 140. • http://visualslam.ai/ • Deep learning for visual SLAM review Deep Learning for Visual SLAM 140
  • 143. • Naver Labs • https://recruit.naverlabs.com/labs/recruitMain Recruitment Expo 143
  • 144. • Here • https://www.here.com/en Recruitment Expo 144 https://www.youtube.com/watch?v=54J_ZCbeJdc
  • 145. • DiDi Chuxing • https://www.didiglobal.com/#/ Recruitment Expo 145
  • 146. • Vtrus - Robotics scientist , SLAM/Computer Vision • https://www.vtr.us/ Recruitment Expo 146
  • 147. • Skydio • https://www.skydio.com/ Recruitment Expo 147 https://www.youtube.com/watch?v=Gh5pAT1o2V8
  • 148. • DroneDeploy • https://www.dronedeploy.com/ Recruitment Expo 148 https://www.youtube.com/watch?v=NS8WLnoFqyE
  • 150. • Multiple View Geometry in Computer Vision - Richard Hartley and Andrew Zisserman • Numerical Optimization - Jorge Nocedal and Stephen J. Wright • Modern C++ Course 2018 • http://www.ipb.uni-bonn.de/teaching/modern-cpp/ • 따라하며 배우는 C++ 2018 (Korean lecture series, “Learning C++ by Following Along”) • https://www.youtube.com/playlist?list=PLNfg4W25Tapw5Yx4yuExHNybBIUk68aNz • Photogrammetry (Computer vision) • http://www.ipb.uni-bonn.de/photogrammetry-i-ii/ • Multiple view geometry - Prof. D. Cremers from TUM • https://www.youtube.com/playlist?list=PLTBdjV_4f-EJn6udZ34tht9EVIW7lbeo4 Books & Free Lectures 150
  • 151. • SLAM Research KR • https://www.facebook.com/groups/slamkr/ • 로열모 (an open community for robotics) • https://www.facebook.com/groups/KoreanRobotics/ • 오로카 (OROCA, a community for sharing robot technology built with open-source software and hardware) • https://cafe.naver.com/openrt/10561 Research Groups in SNS 151
  • 152. End of Second Week 152