SLAM Workshop
Aug. 11-18, 2018
Dong-Won Shin
• Dong-Won Shin
• Gwangju Institute of Science and Technology
• PhD candidate in Computer Science Field
• SLAM Research Group KR manager
• (blog) dongwonshin.net
• (github) github.com/JustWon
• (E-mail) dongwonshin@gist.ac.kr
Speaker
2
• 1. Simultaneous Localization and Mapping
• 2. Brief review on SLAM: a historical perspective
• 3. Sparse SLAM
• 4. Lidar SLAM
• 5. Direct method
• 6. Dense SLAM
• (extra) Structure from Motion
• 7. SLAM research & job trend in CVPR 2018
• 8. Useful resources
Table of Contents
3
• What will you learn?
• What is SLAM and where can it be used?
• What are the underlying algorithms of a SLAM framework?
• How are they actually written in code?
• Who is this for?
• People who want to run Visual SLAM directly
• People who want to use Visual SLAM for their research projects
• People who want to study Visual SLAM
• Convention in This Presentation
• [Something]: code review in the code diagram
• Something: important keyword, detailed explanation from the next slide or below
Preface
4
1. Simultaneous
Localization and Mapping
5
• What is SLAM?
• Computational problem of constructing a map of an environment
while simultaneously keeping track of a robot’s location
• Application
Simultaneous Localization and Mapping
Augmented reality · Virtual reality · Robotics (indoor and outdoor)
https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
• Visual localization
• Under the inaccurate GPS
• GPS-denied environment
• Ex)
• Mapping
• Scenarios in which a prior map is not available and needs to be built.
• Map can inform path planning or provide an intuitive visualization for a human or robot.
• Ex)
Simultaneous Localization and Mapping
7
C. Cadena et al., “Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age,” IEEE Trans. Robot., vol. 32, no. 6, pp. 1309–1332, 2016.
Indoor environment · Skyscraper · Disaster area · Private room
• Do autonomous robots really need SLAM?
• Yes, since many applications implicitly or explicitly require a globally consistent map
• SLAM serves as a mechanism to compute a sufficient statistic that summarizes all past observations of the robot
• Is SLAM completely solved?
• Not yet; SLAM is such a broad topic that the question is well posed only for a given robot/environment/performance combination.
• Current SLAM algorithms can easily be induced to fail when either the motion of the robot or the environment is too challenging:
• Fast robot dynamics
• Highly dynamic environments
• Semantic fusion
• Collaborative mapping
Questions
8
Brief Review on SLAM:
a Historical Perspective
9
• Bayesian Filtering based SLAM
• The theoretical foundations and prototypes of the traditional Bayesian-filtering-based SLAM framework emerged in the 1990s.
• Ex) EKF SLAM, FastSLAM
• Visual odometry
• The process of estimating the ego-motion of a robot using
only the input of a single or multiple cameras attached to it.
• Ex) stereo VO, monocular VO
• Structure from motion
• Investigating the problem of recovering relative camera poses
and 3D structure from a set of camera images
• Off-line version of visual SLAM
Earlier Inspirations
10
• The solution to large scale map management
• 1) graph-based SLAM and loop closure detection
• 2) efficient map representation and refinement: sparse, dense, and semi-dense
• Graph based SLAM
• constructing a graph whose nodes represent robot poses or landmarks
• edge between nodes encodes a sensor measurement that constrains connected poses
• Loop closure detection
• Detecting loop closures in a map to give additional constraint for the consistent mapping
Effort towards Large Scale Mapping
11
• Sparse SLAM
• Only use a small selected subset of the pixels (features) from a monocular color camera
• Fast and real time on CPU but it produces a sparse map (point clouds)
• Landmark-based or feature-based representations
• ORB SLAM
• One of the SOTA frameworks in the sparse SLAM category
• Complete SLAM system for monocular camera
• Real-time on standard CPUs in a wide variety of environments
• small hand-held indoors
• drones flying in industrial environments
• cars driving around a city
Modern State of the Art Systems
12
• Dense SLAM
• Use most or all of the pixels in each received frame
• Or use depth images from a depth camera
• It produces a dense map but GPU acceleration is necessary for the real-time operation.
• Volumetric model or surfel-based representations
• InfiniTam
• One of the SOTA frameworks in the Dense SLAM category
• Multi-platform framework for real-time, large-scale depth fusion and tracking
• Densely reconstructed 3D scene
Modern State of the Art Systems
13
• Direct method (semi-dense SLAM)
• Make use of pixel intensities directly
• Enable using all information in the image
• It produces a semi-dense map
• Higher accuracy and robustness, in particular in environments with few keypoints
• LSD SLAM
• Highly cited SLAM framework in the direct method SLAM category
• Large-scale, consistent maps of the environment
• Accurate pose estimation based on direct image alignment
Modern State of the Art Systems
14
• Lidar SLAM
• Make use of the Lidar sensor input for the localization and mapping
• Autonomous driving purpose-oriented in outdoor environment
• Segmap
• Unified approach for Lidar SLAM based on the extraction of segments in 3D point clouds
• Real-time single- and multi-agent systems
Modern State of the Art Systems
15
Modern State of the Art Systems
16
ORB
SLAM
InfiniTam
LSD
SLAM
Segmap
Q&A
17
Sparse SLAM
18
ORB SLAM
19
https://www.youtube.com/watch?v=ufvPS5wJAx0
• Feature-based monocular SLAM system
• operates in real time, in small and large, indoor and outdoor environments
• URL: https://github.com/raulmur/ORB_SLAM2
• Main components
• Tracking
• Local mapping
• Loop closing
• Data structures
• Map points
• Keyframes
• Covisibility graph
• Essential graph
ORB SLAM
20
Map points & keyframes · Covisibility graph · Essential graph
R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” IEEE Trans. Robot., vol. 31, no. 5, pp. 1147–1163, 2015.
• Map points $p_i$
• 3D position $\mathbf{X}_{w,i}$ in the world coordinate system
• Viewing direction $\mathbf{n}_i$
• Representative ORB descriptor $\mathbf{D}_i$
• Keyframes $K_i$
• Camera pose $\mathbf{T}_{iw}$ (world-to-camera rigid-body transformation)
• Camera intrinsics (focal length and principal point)
• All the ORB features extracted in the frame
Data structures
21
Map points & keyframes
• Covisibility graph
• Undirected weighted graph
• Node: keyframe
• Edge: if two keyframes share observations of
the same map points (at least 15)
• Weight 𝜃: the number of common map points
• Essential graph
• Retain all the nodes but less edges
• Subset of edges from the covisibility graph with high covisibility
+ loop closure edges
Data structures
22
Covisibility graph
Essential	graph
• ORB feature extraction
• For tracking, mapping, and place recognition tasks
• Robust to rotation and scale
• Good invariance to camera auto-gain and auto-exposure, and illumination changes
• Fast to extract and match allowing for real-time operation
• Show good precision/recall performances in bag-of-word place recognition
• [Frame::Frame]
Tracking
23
E.	Rublee,	V.	Rabaud,	K.	Konolige,	and	G.	R.	Bradski,	“ORB:	An	efficient	alternative	to	SIFT	or	SURF.,”	ICCV,	pp.	2564–2571,	Jan.	2011.
Example	of	orb	features
• Initial pose estimation
• Case 1: if tracking was successful for the last frame
• constant velocity motion model
• Bundle adjustment in two view case
• [Tracking::TrackWithMotionModel]
Tracking
24
Minimizing	the	reprojection error
https://cs.nyu.edu/~fergus/teaching/vision/11_12_multiview.pdf
t1
t2
t3
• Initial pose estimation
• Case 2: if the tracking is lost
• global relocalization
• Bag-of-visual-words
• Converting an image into a representative descriptor
• Perspective-N-Point problem
• Finding 2D—3D correspondences
• [Tracking::Relocalization]
Tracking
25
• Training step
• A visual vocabulary is first created by clustering a large number of keypoint descriptors; the cluster centres form the visual words of the vocabulary.
• Estimation step
• The local keypoints of a given image are first detected and described.
• Each descriptor is vector-quantized.
• The histogram of the vector-quantized keypoint descriptors is used as the image
descriptor.
Bag-of-Visual-Words
26
D.	Gálvez-López and	J.	D.	Tardós,	“Bags	of	binary	words	for	fast	place	recognition	in	image	sequences,”	IEEE	Trans.	Robot.,	vol.	28,	no.	5,	pp.	1188–1197,	2012.
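To make the estimation step concrete, here is a minimal C++ sketch of the vector-quantization stage: each local descriptor is assigned to its nearest cluster centre, and the word histogram becomes the image descriptor. The flat vocabulary and float descriptors are illustrative simplifications; ORB-SLAM's DBoW2 uses binary ORB descriptors, Hamming distance, and a hierarchical vocabulary tree.

```cpp
#include <vector>
#include <limits>

// Minimal sketch: quantize local descriptors against a prebuilt visual
// vocabulary and accumulate a word histogram (the image descriptor).
using Descriptor = std::vector<float>;

std::vector<int> bowHistogram(const std::vector<Descriptor>& features,
                              const std::vector<Descriptor>& vocabulary) {
  std::vector<int> histogram(vocabulary.size(), 0);
  for (const auto& f : features) {
    int bestWord = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (size_t w = 0; w < vocabulary.size(); ++w) {
      float dist = 0.f;  // squared L2 distance to the cluster centre
      for (size_t k = 0; k < f.size(); ++k) {
        const float d = f[k] - vocabulary[w][k];
        dist += d * d;
      }
      if (dist < bestDist) { bestDist = dist; bestWord = static_cast<int>(w); }
    }
    ++histogram[bestWord];  // vector quantization: count the nearest word
  }
  return histogram;
}
```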
• Perspective-n-Point Problem
• 2D—3D correspondences
• To determine the position and orientation of a camera
• Given its intrinsic parameters and
• a set of n correspondences between 3D points and their 2D projections
• Procedure
• (1) Estimate point clouds from 2D projections
• (2) Compute the transformation between the estimated point clouds and the given point clouds
Perspective-N-Point Problem
27
V.	Lepetit,	F.	Moreno-Noguer,	and	P.	Fua,	“EPnP:	An	accurate	O(n)	solution	to	the	PnP	problem,”	Int.	J.	Comput.	Vis.,	vol.	81,	no.	2,	pp.	155–166,	2009.
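As an illustration of how such a pose can be recovered in practice, the sketch below uses OpenCV's EPnP solver inside a RANSAC loop. ORB-SLAM ships its own EPnP implementation, so this is a stand-in rather than the framework's actual code path; the function name and threshold values are illustrative.

```cpp
#include <opencv2/calib3d.hpp>
#include <vector>

// Minimal sketch: recover the camera pose from 2D-3D correspondences with
// EPnP inside RANSAC, as done during relocalization.
bool estimatePose(const std::vector<cv::Point3f>& points3d,
                  const std::vector<cv::Point2f>& points2d,
                  const cv::Mat& K,            // 3x3 intrinsic matrix
                  cv::Mat& rvec, cv::Mat& tvec) {
  std::vector<int> inliers;
  return cv::solvePnPRansac(points3d, points2d, K, cv::noArray(),
                            rvec, tvec, /*useExtrinsicGuess=*/false,
                            /*iterationsCount=*/100,
                            /*reprojectionError=*/4.0f,
                            /*confidence=*/0.99, inliers,
                            cv::SOLVEPNP_EPNP);
}
```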
• Single view geometry
• Mathematical relationship between the coordinates of a point in 3D space and its pr
ojection onto the image plane
Pinhole Camera Model
28
A.	Hartley,	R.I.	and	Zisserman,	Multiple	View	Geometry	in	Computer	Vision.	2003.
• (1) Estimate point clouds from 2D projections
• Barycentric coordinate property
• A 3D point can be represented by a weighted sum of 4 control points.
• 2D projections of the point
Perspective-N-Point Problem
29
https://en.wikipedia.org/wiki/Barycentric_coordinate_system
Stacking the 2D projection constraints of all correspondences yields a homogeneous linear system $M\mathbf{x} = \mathbf{0}$, where $\mathbf{x}$ stacks the camera-frame coordinates of the four control points.
• (1) Estimate point clouds from 2D projections
• The solution can be expressed as a linear combination of $\mathbf{v}_i$, the null-space eigenvectors of $\mathbf{M}^{T}\mathbf{M}$.
• Now, we know the positions of 4 control points.
• Then, we can compute the 3D reprojected points of the 2D projection points.
• (2) Compute the transformation
• between the estimated point clouds and the given point clouds
• Via the point-to-point iterative closest points method
Perspective-N-Point Problem
30
Iterative Closest Points (ICP)
31
• Widely used for geometric alignment of three-dimensional models
• Start with two meshes and an initial guess for their relative rigid-body transform
• Refine the transform by repeatedly generating pairs of corresponding points
• Related works
• Point to Point
• Point to Plane
• …
Point-to-Point ICP
32
• Original problem
$$\min_{R,t} \sum_{i=1}^{n} \big\| (R p_i + t) - q_i \big\|^2$$
• Centroids
Source: $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, $p_i' = p_i - \bar{p}$ (so $p_i = p_i' + \bar{p}$)
Target: $\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i$, $q_i' = q_i - \bar{q}$ (so $q_i = q_i' + \bar{q}$)
• Decoupling the translation
$$\min_{R,t} \sum_{i=1}^{n} \big\| R(p_i' + \bar{p}) + t - (q_i' + \bar{q}) \big\|^2 = \min_{R,t} \sum_{i=1}^{n} \big\| R p_i' + R\bar{p} + t - q_i' - \bar{q} \big\|^2$$
Assuming $t = \bar{q} - R\bar{p}$, the translation terms cancel and the problem reduces to
$$\min_{R} \sum_{i=1}^{n} \big\| R p_i' - q_i' \big\|^2$$
P. J. Besl and N. D. McKay, “A Method for Registration of 3-D Shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, 1992.
• Dual problem
• Expanding $\|R p_i' - q_i'\|^2$, the terms $\|p_i'\|^2$ and $\|q_i'\|^2$ do not depend on the rotation; maximizing the cross term $\sum_i {q_i'}^{T} R\, p_i'$ therefore minimizes the entire cost
• Solving the cost function
$$M = \sum_{i=1}^{n} p_i'\, {q_i'}^{T}, \qquad R = V U^{T}, \qquad t = \bar{q} - R\bar{p}$$
where $U$ and $V$ are the left and right singular vectors of $M = U \Sigma V^{T}$
Point-to-Point ICP
33
https://cs.gmu.edu/~kosecka/cs685/cs685-icp.pdf
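A minimal sketch of this closed-form alignment, assuming the correspondences are already known. Eigen is used for the SVD; the reflection guard is standard practice that the slide does not spell out.

```cpp
#include <Eigen/Dense>
#include <vector>

// Minimal sketch of the closed-form point-to-point step: center both clouds,
// accumulate M = sum p_i' q_i'^T, and recover R, t from the SVD of M.
void alignPointToPoint(const std::vector<Eigen::Vector3d>& p,
                       const std::vector<Eigen::Vector3d>& q,
                       Eigen::Matrix3d& R, Eigen::Vector3d& t) {
  Eigen::Vector3d pBar = Eigen::Vector3d::Zero(), qBar = Eigen::Vector3d::Zero();
  for (size_t i = 0; i < p.size(); ++i) { pBar += p[i]; qBar += q[i]; }
  pBar /= static_cast<double>(p.size());
  qBar /= static_cast<double>(q.size());

  Eigen::Matrix3d M = Eigen::Matrix3d::Zero();
  for (size_t i = 0; i < p.size(); ++i)
    M += (p[i] - pBar) * (q[i] - qBar).transpose();  // cross-covariance

  Eigen::JacobiSVD<Eigen::Matrix3d> svd(M, Eigen::ComputeFullU | Eigen::ComputeFullV);
  R = svd.matrixV() * svd.matrixU().transpose();
  if (R.determinant() < 0) {            // guard against a reflection solution
    Eigen::Matrix3d V = svd.matrixV();
    V.col(2) *= -1.0;
    R = V * svd.matrixU().transpose();
  }
  t = qBar - R * pBar;                  // translation decoupled via centroids
}
```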
• Track local map
• Map point filtering
• (1) compute the map point projection x in the current frame.
Discard if it lays out of the image bounds.
• (2) compute the angle between the current viewing ray $\mathbf{v}$ and the map point viewing direction $\mathbf{n}$.
Discard if $\mathbf{v} \cdot \mathbf{n} < \cos 60°$
• (3) compute the distance $d$ from the map point to the camera center.
Discard if it is out of the scale invariance region of the map point, i.e. $d \notin [d_{min}, d_{max}]$
• Matching
• (4) compare the representative descriptor D of the map point with the still unmatched
ORB features in the frame and associate the map point with the best match
• Refinement
• (5) Perform the bundle adjustment for the reserved map points and keyframes
• [Tracking::TrackLocalMap]
Tracking
34
$$\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} u \\ v \\ w \end{bmatrix} \;\rightarrow\; \begin{bmatrix} u/w \\ v/w \\ 1 \end{bmatrix}$$
Projection
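A minimal sketch of this projection for a point already expressed in camera coordinates; the function name and types are illustrative.

```cpp
#include <Eigen/Dense>

// Minimal sketch of the pinhole projection used when culling map points:
// a camera-frame 3D point is mapped to pixel coordinates via the intrinsics.
Eigen::Vector2d project(const Eigen::Vector3d& Xc,
                        double fx, double fy, double cx, double cy) {
  // [u v w]^T = K [x y z]^T, then dehomogenize by w (= z here).
  return { fx * Xc.x() / Xc.z() + cx,
           fy * Xc.y() / Xc.z() + cy };
}
```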
• New keyframe decision
• (condition 1) Good relocalization
• more than MAX frames have passed from the last keyframe insertion
• (condition 2) Idle case
• local mapping is idle AND
• more than MIN frames have passed from the last keyframe insertion
• (condition 3) Visual change
• current frame tracks less than 90% of the points of the reference keyframe $K_{ref}$
• ((condition 1) || (condition 2)) && (condition 3), as sketched below
• [Tracking::NeedNewKeyFrame]
Tracking
35
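A minimal sketch of how the three conditions combine; all counters and thresholds are stand-ins for ORB-SLAM's internal bookkeeping.

```cpp
// Minimal sketch of the new-keyframe decision; MIN/MAX frame thresholds and
// the 90% ratio follow the conditions listed above.
bool needNewKeyFrame(int framesSinceLastKF, bool localMappingIdle,
                     int trackedPoints, int refKeyframePoints,
                     int minFrames, int maxFrames) {
  const bool c1 = framesSinceLastKF > maxFrames;                     // good relocalization
  const bool c2 = localMappingIdle && framesSinceLastKF > minFrames; // idle case
  const bool c3 = trackedPoints < 0.9 * refKeyframePoints;           // visual change
  return (c1 || c2) && c3;
}
```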
• Keyframe insertion
• Compute the bags of words representation
• Update the covisibility graph
• [LocalMapping::ProcessNewKeyFrame]
• New map point creation
• Finding the matches by the epipolar geometry
• Initial map point creation via triangulation
• Consistency check
• Parallax
• Positive depth in both cameras
• Reprojection error
• Scale consistency check
• [LocalMapping::CreateNewMapPoints]
Local Mapping
36
• Geometry of stereo vision (two view geometry)
Epipolar Geometry
37
(figure: the epipolar plane through camera centers c and c′, with the epipoles and epipolar lines in both images)
https://web.stanford.edu/class/cs231a/course_notes/03-epipolar-geometry.pdf
A.	Hartley,	R.I.	and	Zisserman,	Multiple	View	Geometry	in	Computer	Vision.	2003.
• Algebraic representation of epipolar geometry
• Properties of the fundamental matrix
• Point correspondence
• If $\mathbf{x}$ and $\mathbf{x}'$ are corresponding image points, then $\mathbf{x}'^{T} \mathbf{F} \mathbf{x} = 0$
• Epipolar lines
• $\mathbf{l}' = \mathbf{F}\mathbf{x}$ is the epipolar line corresponding to $\mathbf{x}$
• $\mathbf{l} = \mathbf{F}^{T}\mathbf{x}'$ is the epipolar line corresponding to $\mathbf{x}'$
Fundamental Matrix
38
• Specialization of the fundamental matrix to the case of normalized image coordinates
• Properties of the essential matrix
• In normalized image coordinates, it has the same properties as the fundamental matrix.
• $\mathbf{E} = \mathbf{K}'^{T} \mathbf{F} \mathbf{K}$
• $\mathbf{F} = \mathbf{K}'^{-T} \mathbf{E} \mathbf{K}^{-1}$
• If the calibration matrix $\mathbf{K}$ is the identity matrix, then $\mathbf{E} = \mathbf{F}$
Essential Matrix
39
(figure: the essential matrix E in normalized image coordinates)
• Any two images of the same planar surface are related by a homography
• Properties of homography matrix
• 𝐱′ = 𝐇𝐱
Homography
40
• Generally, the rays C→x and C′→x′ will not exactly intersect
• Can solve via SVD, finding a least-squares solution to a system of equations
• Procedure
• Given camera projection matrix P, P’, and a correspondences x, x’
• Cross product of x and PX should be zero. (self cross product)
• Create matrix A
• [U, S, V] = svd(A)
• X = V(:, end)
Triangulation
41
(figure: 3D point X observed as x and x′ in two views)
With $\mathbf{x} = (u, v, 1)^{T}$, $\mathbf{x}' = (u', v', 1)^{T}$ and camera matrices $P = \begin{bmatrix} \mathbf{p}_1^{T} \\ \mathbf{p}_2^{T} \\ \mathbf{p}_3^{T} \end{bmatrix}$, $P' = \begin{bmatrix} \mathbf{p}_1'^{T} \\ \mathbf{p}_2'^{T} \\ \mathbf{p}_3'^{T} \end{bmatrix}$, the self-cross-product constraint $\mathbf{x} \times (P\mathbf{X}) = \mathbf{0}$ applied to both views gives
$$A = \begin{bmatrix} u\,\mathbf{p}_3^{T} - \mathbf{p}_1^{T} \\ v\,\mathbf{p}_3^{T} - \mathbf{p}_2^{T} \\ u'\,\mathbf{p}_3'^{T} - \mathbf{p}_1'^{T} \\ v'\,\mathbf{p}_3'^{T} - \mathbf{p}_2'^{T} \end{bmatrix}, \qquad A\mathbf{X} = \mathbf{0}$$
A. Hartley, R.I. and Zisserman, Multiple View Geometry in Computer Vision, 2003.
https://en.wikipedia.org/wiki/Cross_product
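A minimal sketch of this DLT triangulation for the two-view case, using Eigen's SVD; real systems additionally normalize coordinates and check cheirality.

```cpp
#include <Eigen/Dense>

// Minimal sketch of linear (DLT) triangulation: stack two rows per view from
// the cross-product constraint x × (PX) = 0 and take the right null vector.
Eigen::Vector3d triangulateDLT(const Eigen::Matrix<double, 3, 4>& P1,
                               const Eigen::Matrix<double, 3, 4>& P2,
                               const Eigen::Vector2d& x1,
                               const Eigen::Vector2d& x2) {
  Eigen::Matrix4d A;
  A.row(0) = x1.x() * P1.row(2) - P1.row(0);
  A.row(1) = x1.y() * P1.row(2) - P1.row(1);
  A.row(2) = x2.x() * P2.row(2) - P2.row(0);
  A.row(3) = x2.y() * P2.row(2) - P2.row(1);

  // Least-squares solution: right singular vector of the smallest singular value.
  Eigen::JacobiSVD<Eigen::Matrix4d> svd(A, Eigen::ComputeFullV);
  Eigen::Vector4d X = svd.matrixV().col(3);
  return X.hnormalized();  // divide by the homogeneous coordinate
}
```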
• Local keyframe culling
• To maintain a compact reconstruction, detect redundant keyframes and delete them
• Redundancy check
• A keyframe is redundant if 90% of its map points have been seen in at least three other keyframes
• [LocalMapping::KeyFrameCulling]
• Recent map point culling
• To retain the compact map
• Association check
• The tracking must find the point in more than 25% of the frames in which it is predicted to be visible.
• If more than one keyframe has passed since map point creation, it must be observed in at least three keyframes.
• A point is removed if at any time it is observed by fewer than three keyframes.
• [LocalMapping::MapPointCulling]
Local Mapping
42
$$\frac{\#\text{ frames the match is found}}{\#\text{ frames the map point is predicted visible}} > 0.25$$
• Local bundle adjustment
• Local map points
• $K_i$: currently processed keyframe
• $K_c$: all the keyframes connected to it in the covisibility graph
• All the map points seen by those keyframes
• [Optimizer::LocalBundleAdjustment]
Local Mapping
43
• Loop candidates detection
• All those keyframes directly connected to the current keyframe are discarded
• Compute the similarity between the bag of words vector
• Current keyframe and all its neighbors in the covisibility graph
• Query the recognition DB
• Discard all those keyframes whose score is lower than minimum
• [KeyFrameDatabase::DetectLoopCandidates]
• Compute the similarity transformation
• Similarity transform between current keyframe and the loop closure candidate
• Given a number of points in two different Cartesian coordinate systems,
recovering the transformation and scale between the two systems
• [LoopClosing::ComputeSim3]
Loop Closing
44
Similarity Transform
45
• Original problem
$$\min_{R,t} \sum_{i=1}^{n} \big\| (R p_i + t) - q_i \big\|^2$$
• Centroids
Source: $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, $p_i' = p_i - \bar{p}$; Target: $\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i$, $q_i' = q_i - \bar{q}$
• Decoupling the translation
Assuming $t = \bar{q} - R\bar{p}$, exactly as in point-to-point ICP, the problem reduces to
$$\min_{R} \sum_{i=1}^{n} \big\| R p_i' - q_i' \big\|^2$$
B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” JOSA A, vol. 4, no. 4, pp. 629–642, 1987.
http://web.cs.iastate.edu/~cs577/handouts/quaternion.pdf
• Dual problem
• As in point-to-point ICP, the terms that do not depend on the rotation drop out; maximizing the cross term minimizes the entire cost
• Solving by the quaternion
• Representing the rotation by a unit quaternion $\dot{q}$ and using the property of the quaternion product, the cross term becomes the quadratic form $\dot{q}^{T} N \dot{q}$
Similarity Transform
46
• Eigenvector of the highest eigenvalue of N is the solution
• Quaternion to rotation
• Calculate the translation
• Calculate the scale
Similarity Transform
47
$$t = \bar{q} - R\bar{p}, \qquad s = \frac{\sum_{i=1}^{n} q_i' \cdot R\, p_i'}{\sum_{i=1}^{n} \| R\, p_i' \|^{2}}$$
where $N$ is the symmetric $4 \times 4$ matrix built from $M = \sum_{i} p_i'\, {q_i'}^{T}$, with entries $S_{ab} = \sum_{i} p_{ia}'\, q_{ib}'$:
$$N = \begin{bmatrix} S_{xx}+S_{yy}+S_{zz} & S_{yz}-S_{zy} & S_{zx}-S_{xz} & S_{xy}-S_{yx} \\ S_{yz}-S_{zy} & S_{xx}-S_{yy}-S_{zz} & S_{xy}+S_{yx} & S_{zx}+S_{xz} \\ S_{zx}-S_{xz} & S_{xy}+S_{yx} & -S_{xx}+S_{yy}-S_{zz} & S_{yz}+S_{zy} \\ S_{xy}-S_{yx} & S_{zx}+S_{xz} & S_{yz}+S_{zy} & -S_{xx}-S_{yy}+S_{zz} \end{bmatrix}$$
• Loop fusion
• When the loop closure detection is triggered, the loop fusion is performed.
• Keyframe and map point correction
• Insert loop edges in the covisibility graph
• Current keyframe pose is corrected with the similarity transformation that we obtained
• Fuse duplicated map points
• [LoopClosing::CorrectLoop]
• Essential graph optimization
• For all keyframes and all map points
• [Optimizer::GlobalBundleAdjustment]
Loop Closing
48
Applications
49
• AR
https://www.youtube.com/watch?v=kPwy8yA4CKM
Applications
50
• VR
https://www.youtube.com/watch?v=aVdWED6kfKc
Q&A
51
Lidar SLAM
52
SegMap
53
https://www.youtube.com/watch?v=cHfs3HLzc2Y
• Incremental method for localization in 3D point clouds based on segment matching
• URL: https://github.com/ethz-asl/segmap
• Main components
• Front-end
• Sequential factors
• Place recognition factors
• Back-end
• Pose graph optimization
SegMap
54
• The front-end is responsible for
• Sensor measurements into sequential factors
• Segmentation and description for place recognition factors
• Sequential factors
• Odometry factors
• Displacement between consecutive robot poses from IMU data
• Scan-matching factors
• Registering the current scan against a submap by point-to-plane ICP
• [SegMapper::SegMapper]
• Place recognition factors
• SegMatch: segment based loop-closure for 3D point clouds
• [SegMapper::segMatchThread]
Front-End
55
• Minimize the perpendicular distance from the source point to the tangent plane of the destination point
• Nonlinear least-squares problem, typically solved with the Levenberg-Marquardt method
Point-to-Plane ICP
56
K. Low, “Linear Least-squares Optimization for Point-to-plane ICP Surface Registration,” Chapel Hill, Univ. North Carolina, pp. 2–4, 2004.
$s_i = (s_{ix}, s_{iy}, s_{iz}, 1)^{T}$: source point
$d_i = (d_{ix}, d_{iy}, d_{iz}, 1)^{T}$: destination point
$n_i = (n_{ix}, n_{iy}, n_{iz}, 0)^{T}$: unit normal vector at $d_i$
Point-to-Plane ICP
• Transformation matrix M
• Least-squares problem:
$$M_{opt} = \arg\min_{M} \sum_{i} \big( (M \cdot s_i - d_i) \cdot n_i \big)^{2}$$
• 6-DOF parameters $(\alpha, \beta, \gamma, t_x, t_y, t_z)$
• However, $\alpha, \beta, \gamma$ enter through nonlinear trigonometric functions
• A linear approximation is needed
57
Point-to-Plane ICP
• Approximated transformation matrix $\hat{M}$ via the small-angle linearization
• Linearized expression for the i-th correspondence: one row of the form $a_i^{T} x = b_i$ with $x = (\alpha, \beta, \gamma, t_x, t_y, t_z)^{T}$
58
Point-to-Plane ICP
• Expand to N correspondences: $A x = b$
• Modified form of the general least-squares problem
• Optimum solution (see the sketch below):
$$x_{opt} = (A^{T} A)^{-1} A^{T} b$$
• Iteratively perform the SVD optimization until it converges
• [PointToPlaneWithCovErrorMinimizer::compute]
59
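A minimal sketch of one linearized point-to-plane step under the small-angle assumption, following Low's derivation. The row layout $a_i = (s_i \times n_i,\; n_i)$ and the normal-equation solve are the standard form; names are illustrative.

```cpp
#include <Eigen/Dense>
#include <vector>

// Minimal sketch of one point-to-plane step: each correspondence contributes
// one row a_i^T x = b_i with x = (alpha, beta, gamma, tx, ty, tz), solved in
// least squares as x = (A^T A)^{-1} A^T b.
Eigen::Matrix<double, 6, 1> pointToPlaneStep(
    const std::vector<Eigen::Vector3d>& s,   // source points
    const std::vector<Eigen::Vector3d>& d,   // destination points
    const std::vector<Eigen::Vector3d>& n) { // unit normals at destinations
  Eigen::MatrixXd A(s.size(), 6);
  Eigen::VectorXd b(s.size());
  for (size_t i = 0; i < s.size(); ++i) {
    A.block<1, 3>(i, 0) = s[i].cross(n[i]).transpose();  // rotation part
    A.block<1, 3>(i, 3) = n[i].transpose();              // translation part
    b(i) = n[i].dot(d[i] - s[i]);                        // signed plane distance
  }
  // Normal equations; a QR or SVD solve is more stable for ill-conditioned A.
  return (A.transpose() * A).ldlt().solve(A.transpose() * b);
}
```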
• Place recognition factors
• SegMatch: segment based loop-closure for 3D point clouds
• Four different modules
• Point cloud segmentation
• Descriptor extraction
• Segment matching
• Geometric verification
• [SegMapper::segMatchThread]
Front-End
60
• Point cloud segmentation
• Incremental region growing policy
• CanGrowTo
• If growing from a seed to a neighbor is allowed
• Check the angle between seed and candidate normals
• LinkClusters
• Link clusters if they have the same cluster id
• CanBeSeed
• If a point can be used as seed
• Check the curvature at a point
• [growRegionFromSeed]
SegMatch
61
Region	growing Cluster	merging
• Descriptor extraction
• For compressing the raw segment data and build object signatures
• Diverse segment descriptors
• Eigenvalue based
• Eigenvalues for the segment are computed and combined in a 7 dim feature vector
• Linearity, planarity, scattering, omnivariance, anisotropy, eigenentropy, change of curvature
• [EigenvalueBasedDescriptor::describe]
SegMatch
62
M. Weinmann, B. Jutzi, and C. Mallet, “Semantic 3D scene interpretation: A framework combining optimal neighborhood size selection with relevant features,” ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci., vol. II-3, pp. 181–188, 2014.
Eigenvalue-based	3D	features
• Diverse segment descriptors
• Auto-encoder based
• Input: 3D binary voxel grid of fixed dimension 32x32x16
• Descriptor extractor part
• 3D convolutional layers with max pool layers and two Fully connected layers
• Rectified linear activation function for all layers
• Output: 64x1 descriptor
• Reconstruction part
• One fully connected layer and three deconvolutional layers with a final sigmoid output
• Output: reconstructed 3D binary voxel grid
• Loss
• Classification loss: softmax cross entropy loss
• Reconstruction loss: binary cross entropy loss
SegMatch
63
R.	Dubé,	A.	Cramariuc,	D.	Dugas,	J.	Nieto,	R.	Siegwart,	and	C.	Cadena,	“SegMap:	3D	Segment	Mapping	using	Data-Driven	Descriptors,”	2018.
• Matching
• K-nearest neighbor search in the descriptor space by k-d tree
• Construction of k-d trees
SegMatch
64
(figure: example k-d tree built over points 0–9)
• Matching
• Searching k-nearest neighbors
• Ex) Query point: p1
• Time complexity: O(log n)
• [findCandidates]
SegMatch
65
(figure: k-nearest-neighbor search for query point p1 in the k-d tree above)
• Geometric verification
• Consistency graph G = (V, E)
• Vertices V = {$c_i$}: the set of correspondences $c_i$
• Edges E = {$e_{ij}$}: the set of undirected edges $e_{ij}$ connecting all consistent pairs of correspondences $(c_i, c_j)$
• Geometrically consistent
• If the difference of the Euclidean distance between the segment centroids in the local map and in the target map is less than a threshold
• Identifying a maximum geometrically consistent set == finding a maximum clique of G (a sketch of the edge construction follows)
• [Segmatch::recognize]
SegMatch
66
$$|d_l(c_i, c_j) - d_t(c_i, c_j)| \leq \epsilon$$
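A minimal sketch of building the consistency edges; the Correspondence struct is a simplification of SegMatch's internal types.

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <utility>
#include <vector>

// Minimal sketch: a correspondence pairs a segment centroid in the local map
// with one in the target map; two correspondences are consistent when their
// centroid distances agree in both maps up to a threshold eps.
struct Correspondence {
  Eigen::Vector3d localCentroid;
  Eigen::Vector3d targetCentroid;
};

std::vector<std::pair<int, int>> consistencyEdges(
    const std::vector<Correspondence>& c, double eps) {
  std::vector<std::pair<int, int>> edges;
  for (size_t i = 0; i < c.size(); ++i)
    for (size_t j = i + 1; j < c.size(); ++j) {
      const double dLocal  = (c[i].localCentroid  - c[j].localCentroid).norm();
      const double dTarget = (c[i].targetCentroid - c[j].targetCentroid).norm();
      if (std::abs(dLocal - dTarget) <= eps)
        edges.emplace_back(i, j);  // geometrically consistent pair
    }
  return edges;  // a maximum clique of this graph gives the recognized match
}
```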
• Illustration
Consistency Graph
67
(figure: correspondences c1–c12 between segments of the local map and the target map; after geometric verification only the consistent subset c2, c3, c5–c11 survives)
Back-End
• Pose graph optimization by factor graphs (iSAM library)
• Factor graphs
• Graphical models that are well suited to modeling complex estimation problems
• Variables
• Unknown random variables in the estimation problem
• Robot poses
• Factors
• Probabilistic information on those variables, derived from measurements or prior knowledge
• Odometry, loop closure constraints
• [IncrementalEstimator::estimate]
68
• Autonomous vehicles
Applications
69 https://www.youtube.com/watch?v=gEy91PGGLR0&feature=share
Q&A
70
End of First Week
71
Last Questions
72
Epipolar Line in Various Cases
73
• Motion parallel with the image plane • Forward motion • Converging cameras
http://people.scs.carleton.ca/~c_shu/Courses/comp4900d/notes/epipolar.pdf
• Since the distances between segments are preserved under rotation, the consistency check is expected to work well even with severe rotation.
• However, in the special case where the segments in the target map happen to be arranged exactly like a rotated copy of the local map, it could fail.
Consistency Graph with Severe Rotation
74
0 deg 180 deg 90 deg
Direct Method
75
LSD-SLAM
76
https://www.youtube.com/watch?v=GnuQzP3gty4
• Large scale direct (feature-less) monocular SLAM algorithm
• URL: https://github.com/tum-vision/lsd_slam
• Main components
• Tracking
• Depth map estimation
• Map optimization
• Key features
• Novel direct tracking method
• Elegant probabilistic solution to include the effect of noisy depth values into tracking
• Large-scale and real time operation on a CPU
LSD-SLAM
77
Feature-based VS Direct Method
78
Feature-based Direct	method
• Manifold
• Mathematical space that is not necessarily Euclidean on a global scale,
but can be seen as Euclidean on a local scale
• Why use the manifold for robotics or computer vision?
• The translation clearly forms Euclidean space,
while the rotational components span over the non-Euclidean 3D rotation group
• Lie group SE(3) <-> Lie algebra se(3)
Optimization on the Manifold
79
$$\boldsymbol{\xi} = \begin{bmatrix} v_x \\ v_y \\ v_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} \;\text{(linear velocity $v$, angular velocity $\omega$)}, \qquad G(\boldsymbol{\xi}) = \begin{bmatrix} r_{11} & r_{12} & r_{13} & T_x \\ r_{21} & r_{22} & r_{23} & T_y \\ r_{31} & r_{32} & r_{33} & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
G.	Grisetti,	R.	Kummerle,	C.	Stachniss,	and	W.	Burgard,	“A	tutorial	on	graph-based	SLAM,”	IEEE	Intell.	Transp.	Syst.	Mag.,	vol.	2,	no.	4,	pp.	31–43,	2010.
Section 10.3.3, B. Claraco, “A tutorial on SE(3) transformation parameterizations and on-manifold optimization,” 2018.
Exponential	map
Logarithm	map
• Relative 3D pose $\boldsymbol{\xi}_{ji}$
• between an existing keyframe $K_i = (I_i, D_i, V_i)$ (image, inverse depth map, and depth variance) and a new image $I_j$
• [SE3Tracker::trackFrame]
Tracking
80
(figure: the 3D projective warp function, the photometric residual, and the photometric variance)
• Minimizing the intensity error between two images
• Photometric residual: $r_p(\mathbf{p}, \boldsymbol{\xi}_{ji}) := I_i(\mathbf{p}) - I_j\big(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji})\big)$
• In order to apply Gauss-Newton minimization, the Jacobian is necessary.
• n-th update step for the parameter: $\delta\boldsymbol{\xi}^{(n)} = -\big(J^{T} J\big)^{-1} J^{T} r\big(\boldsymbol{\xi}^{(n)}\big)$
• The new estimate is obtained by composition with the computed update: $\boldsymbol{\xi}^{(n+1)} = \delta\boldsymbol{\xi}^{(n)} \circ \boldsymbol{\xi}^{(n)}$ (see the sketch below)
• The Jacobian is calculated from $I_j\big(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji})\big)$ with respect to the parameter $\boldsymbol{\xi}_{ji}$
• More specifically, $I_j\big(\pi(g(\mathbf{p}, D_i(\mathbf{p})), G(\boldsymbol{\xi}_{ji}))\big)$
• Therefore, it can be decomposed into a product of Jacobians: $J_{I_j}(\boldsymbol{\xi}_{ji}) = J_{\nabla I}\, J_{\pi}\, J_{g}\, J_{G}$
• Relationship among functions: pixel intensity $I$ ← projection $\pi$ ← rigid-body transform $g$ ← Lie algebra to Lie group map $G(\boldsymbol{\xi})$
Photometric Residual
81
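A minimal sketch of one such Gauss-Newton step, assuming the residual vector and its 6-column Jacobian have already been stacked over all pixels.

```cpp
#include <Eigen/Dense>

// Minimal sketch of one Gauss-Newton step for direct image alignment:
// given stacked photometric residuals r and their Jacobian J (one row per
// pixel, six columns for the twist), compute the increment delta_xi.
Eigen::Matrix<double, 6, 1> gaussNewtonStep(const Eigen::MatrixXd& J,
                                            const Eigen::VectorXd& r) {
  // delta_xi = -(J^T J)^{-1} J^T r
  return -(J.transpose() * J).ldlt().solve(J.transpose() * r);
  // The update is then applied as xi_new = delta_xi ∘ xi via the exponential
  // map, not by vector addition, because se(3) is not a Euclidean space.
}
```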
Chain of Jacobians
82
(figure: the individual Jacobian factors derived step by step)
Chain of Jacobians
83
Section 10.3.3, B. Claraco, “A tutorial on SE(3) transformation parameterizations and on-manifold optimization,” 2018.
Final Jacobian
84
$$J_{I_j}(\boldsymbol{\xi}_{ji}) = J_{\nabla I}\, J_{\pi}\, J_{g}\, J_{G}$$
[SE3Tracker::calculateWarpUpdate]
• Photometric residual: $r_p(\mathbf{p}, \boldsymbol{\xi}_{ji}) := I_i(\mathbf{p}) - I_j\big(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji})\big)$
• The derivative is calculated from $I_j\big(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji})\big)$ with respect to the inverse depth $D_i(\mathbf{p})$
• More specifically, $I_j\big(\pi(g(\mathbf{p}, D_i(\mathbf{p})), G(\boldsymbol{\xi}_{ji}))\big)$, so it can be decomposed into a product of derivatives
• The photometric variance combines the Gaussian image intensity noise with the inverse depth variance propagated through this derivative
• [SE3Tracker::calcWeightsAndResidual]
Photometric Variance
85
$$\frac{\partial I_j(\mathbf{p}, \boldsymbol{\xi}_{ji})}{\partial D_i(\mathbf{p})} = \frac{\partial I_j}{\partial \pi}\,\frac{\partial \pi}{\partial g}\,\frac{\partial g}{\partial D_i(\mathbf{p})} = \begin{bmatrix} \nabla I_{jx} & \nabla I_{jy} \end{bmatrix} \begin{bmatrix} \frac{1}{Z} & 0 & -\frac{X}{Z^{2}} \\ 0 & \frac{1}{Z} & -\frac{Y}{Z^{2}} \end{bmatrix} \begin{bmatrix} (T_x - X)/d \\ (T_y - Y)/d \\ (T_z - Z)/d \end{bmatrix} = \begin{bmatrix} \nabla I_{jx} & \nabla I_{jy} \end{bmatrix} \begin{bmatrix} \big((T_x - X)Z - (T_z - Z)X\big)/(Z^{2} d) \\ \big((T_y - Y)Z - (T_z - Z)Y\big)/(Z^{2} d) \end{bmatrix}$$
Chain of Derivatives
86
• Keyframe selection
• If the camera moves too far away from the existing map,
a new keyframe is created from the most recent tracked image.
• [getRefFrameScore]
Depth Map Estimation
87
(figure: tracked frames between keyframes; a new keyframe is created when the camera moves too far from the existing map)
• Depth map creation
• Depth map for the keyframe is initialized by projecting points from the previous keyframe
• Depth map is scaled to have a mean inverse depth of one
• Procedure
• Compute the epipolar line $\mathbf{l}' = \mathbf{F}\mathbf{x} = \mathbf{K}^{-T} [\mathbf{t}]_{\times} \mathbf{R} \mathbf{K}^{-1} \mathbf{x}$
• Geometric and photometric disparity error
• Check the magnitude of the image gradient along the epipolar line
• Check the angle between the image gradient and the epipolar line
• [DepthMap::makeAndCheckEPL]
• Stereo matching
• between current keyframe and the previous keyframe
• using SSD error over five equidistant points on the epipolar line
• [DepthMap::doLineStereo]
Depth Map Estimation
88
J. Engel, J. Sturm, and D. Cremers, “Semi-dense visual odometry for a monocular camera,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1449–1456.
• Constraint search
• Finding the Loop closure candidates via the appearance based method (ex. BoVW)
• Direct image alignment on sim(3)
• Finding the transformation between current keyframe and the loop closure candidates
• Tracking in Sim(3) space
• [Sim3Tracker::trackFrameSim3]
Constraint Acquisition
89
Map Optimization
• Pose graph optimization by general graphs (g2o library)
• General graphs
• Graphical models that are well suited to modeling complex estimation problems
• Nodes
• Unknown random variables in the estimation problem
• Robot poses
• Edges
• Probabilistic information on those variables, derived from measurements or prior knowledge
• Odometry, loop closure constraints
• [SlamSystem::optimizationIteration]
90
• Autonomous drone driving
Applications
91
https://www.youtube.com/watch?v=BLY3kgeZrZg
Q&A
92
Dense SLAM
93
InfiniTam
94
https://www.youtube.com/watch?v=gmfKzTIKyww
• Real-time, large-scale depth fusion and tracking framework
• URL: https://github.com/victorprad/InfiniTAM
• Main components
• Tracking
• Allocation
• Integration
• Raycasting
• Relocalisation & Loop Closure Detection
• Data structure
• Volumetric representation
• Voxel block hashing
InfiniTam
95
V.	A.	Prisacariu et	al.,	“InfiniTAM v3:	A	Framework	for	Large-Scale	3D	Reconstruction	with	Loop	Closure,”	2017.
• Volumetric representation using a hash lookup
• Data structures and operations
• Voxel
• Voxel block hash
• Hash table and hashing function
• Hash table operations
• Voxel
• A value on a regular grid in 3D space, the extended concept of 2D pixel
• Widely used for realistic rendering 3D object in computer graphics
• Data
• Truncated signed distance function (TSDF) value
• TSDF Weight
• Color value
• Color weight
Volumetric Representation
96
• Truncated signed distance function (TSDF)
• Predefined 3D volume is subdivided uniformly into a 3D grid of voxels
• These values are positive in front of the surface and negative behind it
• The zero-crossing indicates the surface of the object
Volumetric Representation
97
• Voxel block array
• Majority of data stored in the regular voxel grid is marked either as
• Free space
• Unobserved space
• Only store the surface data by efficient hashing scheme
• Grouped voxels in blocks of predefined size (ex. 8x8x8)
• Data
• Positions of the corner of the 8x8x8 voxel block
• Offset in the excess list
• Pointer to the voxel block array
Volumetric Representation
98
M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger, “Real-time 3D reconstruction at scale using voxel hashing,” ACM Trans. Graph., vol. 32, no. 6, pp. 1–11, 2013.
• Hash table
• To quickly and efficiently find the position of a certain voxel block in the voxel block array
• Contiguous array of ITMHashEntry objects
• Hashing function
• For locating entries of the hash table takes the corner coordinates of a 3D voxel block
• Hash collision case
• Use the additional unordered excess list
• Store an offset in the voxel block array
Volumetric Representation
99
• Hash table operations
• Given a target 3D voxel location in world coordinates
• Compute its corresponding voxel block location by dividing the voxel location by the voxel block size
• Call the hashing function to compute the index of the bucket in the ordered part of the hash table (see the sketch below)
• Retrieval
• Returns the voxel stored at the target location within the block addressed by the hash entry
• Insertion
• Reserves a block inside the voxel block array
Volumetric Representation
100
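A minimal sketch of the block lookup path: world voxel coordinates are floored into block coordinates and hashed into a bucket. The three primes are the ones from the voxel hashing paper by Nießner et al.; the table size and names are illustrative.

```cpp
#include <cstdint>

// Minimal sketch of the block hashing scheme used for voxel lookup.
constexpr int kBlockSize = 8;                    // 8x8x8 voxels per block
constexpr std::uint32_t kNumBuckets = 0x100000;  // hash table size (assumed)

struct BlockCoord { int x, y, z; };

BlockCoord toBlock(int vx, int vy, int vz) {
  // Integer division with floor semantics, also for negative coordinates.
  auto f = [](int v) { return (v >= 0) ? v / kBlockSize
                                       : (v - kBlockSize + 1) / kBlockSize; };
  return { f(vx), f(vy), f(vz) };
}

std::uint32_t hashBlock(const BlockCoord& b) {
  // XOR of the block coordinates scaled by three large primes, modulo the
  // number of buckets; collisions go to the unordered excess list.
  return (static_cast<std::uint32_t>(b.x) * 73856093u ^
          static_cast<std::uint32_t>(b.y) * 19349669u ^
          static_cast<std::uint32_t>(b.z) * 83492791u) % kNumBuckets;
}
```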
• To determine the pose of a new camera frame given the 3D world model
• Diverse methods
• Using only the depth
• Inspired by Point-to-Plane ICP
• [class ITMDepthTracker]
• Using only the color
• Inspired by the direct method
• [class ITMColorTracker]
• Using both data
• Utilize both approaches
• [class ITMExtendedTracker]
• Main differences of the extended tracker
• Huber-norm instead of the standard L2 norm
• Error term weighted by its depth measurement
• Tracking failure determination by SVM classifier
Tracking
101
• Huber-norm instead of the standard L2 norm
• Huber-Norm
• The squared loss has the disadvantage that it tends to be dominated by outliers
• The Huber norm is quadratic for small residuals and linear for large ones
• Huber norm in the code
Tracking
102 https://en.wikipedia.org/wiki/Huber_loss
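A minimal sketch of the Huber loss and the equivalent per-residual weight used in iteratively reweighted least squares; delta is the tunable inlier threshold.

```cpp
#include <cmath>

// Minimal sketch of the Huber norm: quadratic for small residuals,
// linear for large ones, so outliers no longer dominate the sum.
double huberLoss(double r, double delta) {
  const double a = std::abs(r);
  return (a <= delta) ? 0.5 * r * r                 // inlier: squared loss
                      : delta * (a - 0.5 * delta);  // outlier: linear growth
}

// Equivalent IRLS weight applied to each squared error term.
double huberWeight(double r, double delta) {
  const double a = std::abs(r);
  return (a <= delta) ? 1.0 : delta / a;
}
```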
• Error term weighted by its depth measurement
• The error term for each pixel of the depth image is weighted according to its depth measurement provided by the sensor.
• The reliability of a depth measurement decreases as the distance reading increases.
• Depth weight in the code
Tracking
103
K. Khoshelham and S. O. Elberink, “Accuracy and resolution of Kinect depth data for indoor mapping applications,” Sensors, 2012.
• Three main stages in allocation
• 1) backproject a line connecting 𝑑 − 𝜇 to 𝑑 + 𝜇
• 𝑑	: depth in image coordinates
• 𝜇: a fixed, tunable parameter
• This leads to a line in world coordinates, which intersects a number of voxel blocks.
• Search the hash table for each of these blocks and look for a free hash entry
• 2) allocate voxel blocks for each non zero entry in the allocation and visibility arrays
• 3) build a list of live hash entries
• [AllocateSceneFromDepth]
Allocation
104
(figure: the ray segment from d−μ to d+μ around the depth measurement d, intersecting the virtual voxel block grid)
• TSDF integration
• If a voxel is behind the surface observed in the new depth image,
the image does not contain any new information about it, and the function returns.
• If the voxel is close to or in front of the observed surface,
a corresponding observation is added to the accumulated sum.
• [IntegrateIntoScene]
Integration
105 S. Izadi, D. Kim, O. Hilliges, et al., “KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera,” in Proceedings of the 24th ACM Symposium on User Interface Software and Technology, 2011.
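A minimal sketch of this per-voxel update in the KinectFusion style, with a truncation band mu and a capped running-average weight; the field names are illustrative, not InfiniTAM's actual layout.

```cpp
#include <algorithm>

// Minimal sketch of the TSDF update: eta is the signed distance of the voxel
// to the observed depth along the ray, mu the truncation band.
struct Voxel { float tsdf = 1.f; float weight = 0.f; };

void integrateVoxel(Voxel& v, float voxelDepth, float measuredDepth,
                    float mu, float maxWeight = 100.f) {
  const float eta = measuredDepth - voxelDepth;
  if (eta < -mu) return;  // far behind the surface: no new information
  const float sdf = std::min(1.f, eta / mu);  // truncate in front of surface
  // Weighted running average of all observations of this voxel.
  v.tsdf = (v.tsdf * v.weight + sdf) / (v.weight + 1.f);
  v.weight = std::min(v.weight + 1.f, maxWeight);
}
```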
• Motivation
• Depth image is computed from the updated 3D world model given a camera pose
• Input to the tracking step in the next frame and also for visualization.
• Raycasting
• A ray is being cast from the camera up until an intersection with the surface is found
• Checking the value of the TSDF at each voxel along the ray until a zero-crossing is found
• State machine to efficiently handle the sparse volumetric space
• SEARCH_BLOCK_COARSE
• SEARCH_BLOCK_FINE
• SEARCH_SURFACE
• BEHIND_SURFACE
• [castRay]
Raycasting
106
• SEARCH_BLOCK_COARSE state
• Take steps of the size of each block, i.e. 8 voxels
• Until an actually allocated block is encountered
• SEARCH_BLOCK_FINE state
• Once the ray enters an allocated block, step back and enter this state
• The step length is now limited by the truncation band of the SDF.
• SEARCH_SURFACE state
• Once the ray enters a valid block and the values in that block indicate we are still in front of the surface,
the state is changed to SEARCH_SURFACE
• BEHIND_SURFACE state
• Until a negative value is read from the SDF
• This terminates the raycasting iteration and the exact location of the surface is now found.
Raycasting
107
(figure: ray stepping through the states SEARCH_BLOCK_COARSE → SEARCH_BLOCK_FINE → SEARCH_SURFACE → BEHIND_SURFACE)
• Keyframe-based random ferns relocaliser
• To relocalise the camera when tracking fails
• To detect loop closures when aiming to construct a globally-consistent scene
• Procedure
• Downsample and preprocess image
• Each of m code blocks is obtained by applying a random fern to I
• A fern is a set of n binary feature tests on the image, each yielding either 0 or 1
• $b^{I}_{F_k} \in B^{n}$ denotes the n-bit binary code resulting from applying fern $F_k$ to I
• $b^{I}_{C} \in B^{mn}$ denotes the result of concatenating all m such binary codes for I
• The dissimilarity measure between two different images I and J is the block-wise Hamming distance between $b^{I}_{C}$ and $b^{J}_{C}$, where each block contributes 0 if the two code blocks are identical and 1 otherwise (see the sketch below)
Relocalisation & Loop Closure Detection
108
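A minimal sketch of the block-wise Hamming distance, assuming each fern's n-bit code is packed into one integer.

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of the fern dissimilarity: each image is encoded as m code
// blocks (one n-bit code per fern); the block-wise Hamming distance counts
// the fraction of blocks that differ (0 if identical, 1 otherwise per block).
double blockHD(const std::vector<std::uint32_t>& codesI,
               const std::vector<std::uint32_t>& codesJ) {
  int differing = 0;
  for (size_t k = 0; k < codesI.size(); ++k)
    differing += (codesI[k] != codesJ[k]) ? 1 : 0;
  return static_cast<double>(differing) / codesI.size();  // in [0, 1]
}
```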
• Caching strategy for the efficient BlockHD computation
Relocalisation & Loop Closure Detection
109
(figure: caching table indexed by fern id and code id, of size [number of ferns, e.g. 500] × [number of code blocks, e.g. 16]; each cell stores the IDs of keyframes sharing that code fragment, so the similarity of the query frame to every keyframe is accumulated cheaply)
• Idea behind the relocaliser
• Encode an RGB-D image I as a set of m binary code blocks, each of length n
• To learn a lookup table from encodings of keyframe images to their known camera poses
• Harvesting keyframes
• If there is no similar keyframe in the keyframe DB, harvest a new keyframe
• Relocalization
• By finding the nearest neighbours of the encoding of the current camera input image in this table, and trying to use their recorded poses to restart tracking
• Loop closure detection
• Determine whether the current frame has been seen in the previous frames or not
• [Relocaliser::ProcessFrame]
Relocalisation & Loop Closure Detection
110
• Submap based approach
• Division of the scene into multiple rigid submaps
• Active submaps: tracked against at each frame
• Passive submaps: maintained, but not tracked against unless they become active again
Globally-Consistent Reconstruction
111
N. Fioraio, J. Taylor, A. Fitzgibbon, L. Di Stefano, and S. Izadi, “Large-scale and drift-free surface reconstruction using online subvolume registration,” 2015 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4475–4483, 2015.
• Graph representation
• Numerical solution of the equation can be obtained by using popular Gauss-Newton
Recall: Pose Graph Optimization
112
(figure: pose graph with constraint edges $e_{12}, e_{23}, e_{34}, e_{45}, e_{51}, e_{62}$; each edge compares a measurement against the current estimate)
G.	Grisetti,	R.	Kummerle,	C.	Stachniss,	and	W.	Burgard,	“A	tutorial	on	graph-based	SLAM,”	IEEE	Intell.	Transp.	Syst.	Mag.,	vol.	2,	no.	4,	pp.	31–43,	2010.
• First order Taylor expansion
• Substituting the equation
Least Squares Optimization
113
https://en.wikipedia.org/wiki/Linear_approximation
• Rewrite the function F(x)
• Minimizing the quadratic form by,
• The solution is obtained by adding the increments ∆x∗ to the initial guess
Least Squares Optimization
114
where
• The matrix H and the vector b are obtained by summing up a set of matrices and vectors, one for every constraint.
• Every constraint contributes an addend term to the system.
• The structure of this addend depends on the Jacobian of the error function.
• Since the error function of a constraint depends only on the values of two nodes, the Jacobian is non-zero only in the blocks corresponding to those two nodes.
Structure of Linearized System
115
Structure of Linearized System
116
• [runGlobalAdjustment]
• AR
Applications
117
https://www.youtube.com/watch?v=gmfKzTIKyww
Applications
118
https://www.youtube.com/watch?v=Qf0KkDEChj4
• 3D reconstruction
• Bundlefusion: http://graphics.stanford.edu/projects/bundlefusion/
Applications
119
https://www.youtube.com/watch?v=keIirXrRb1k
Q&A
120
Structure-from-Motion
121
• Reconstructing 3D structure from its projections into a series of images
taken from different viewpoints
• General SfM pipeline
Structure-from-Motion
122
Result	of	Rome	with	21K	registered	out	of	75K	images
• Feature extraction
• Features should be invariant under radiometric and geometric changes
• SfM can uniquely recognize them in multiple images.
• Ex) SIFT, SURF, ORB, FAST, BRIEF…
• [RunFeatureExtraction]
• Matching
• Matching the images that see the same scene part
• By leveraging the features as an appearance description of the images.
• Output
• a set of potentially overlapping image pairs
• their associated feature correspondences
• Naïve approach
• Test every image pair for scene overlap
• $O(N_I^2\, N_{f_i}^2)$ for $N_I$ images with $N_{f_i}$ features each
• Scalable and efficient matching is necessary.
• [RunFeatureMatching]
Correspondence Search
123
• Geometric verification
• Since matching is based solely on appearance,
it is not guaranteed that corresponding features actually map to the same scene point.
• Diverse verification methods
• Projective geometry (pinhole camera model)
• Epipolar geometry (essential matrix, fundamental matrix)
• Homography
• If a valid transformation maps a sufficient number of features between the images,
they are considered geometrically verified.
• RANSAC is required to remove outliers
• Output
• a set of geometrically verified image pairs
• Their associated inlier correspondences
• (optional) a description of their geometric relation
• Scene-graph with images as nodes and verified pairs of images as edges
Correspondence Search
124
• Image registration
• New images can be registered to the current model by solving the PnP problem
• Perspective-n-Point Problem
• 2D—3D correspondences
• To determine the position and orientation of a camera
• Given its intrinsic parameters and a set of n correspondences between 3D points and their 2D projections
• Procedure
• (1) Estimate point clouds from 2D projections
• (2) Compute the transformation between the estimated point clouds and the given point clouds
• [EPNPEstimator::ComputePose]
Incremental Reconstruction
125
V.	Lepetit,	F.	Moreno-Noguer,	and	P.	Fua,	“EPnP:	An	accurate	O(n)	solution	to	the	PnP	problem,”	Int.	J.	Comput.	Vis.,	vol.	81,	no.	2,	pp.	155–166,	2009.
• Bundle adjustment
• Joint non-linear refinement of camera parameters 𝑃u and point parameters 𝑋¿
• Minimize the reprojection error
• Levenberg-Marquardt is the method of choice for solving BA problems
• [BundleAdjuster::BundleAdjuster]
Incremental Reconstruction
126
• Previous SfM frameworks (Bundler, VisualSFM)
• Challenges
• Current state-of-the-art SfM algorithms are good, but not good enough
• They fail to produce fully satisfactory results in terms of completeness and robustness.
• Problems
• Correspondence search producing an incomplete scene graph
• Reconstruction stage failing to register images due to missing or inaccurate scene structure
• Symbiotic relationship between image registration and triangulation
• COLMAP
• Novel SfM algorithm containing following contributions to achieve the ultimate goal
• Contributions
• Scene graph augmentation
• Next best view selection maximizing the robustness and accuracy
• Robust and efficient triangulation method
• Iterative BA, re-triangulation, and outlier filtering strategy
COLMAP
127
COLMAP Demo
128
https://www.youtube.com/watch?v=Gb086k7b0wg
• Augmented geometric verification
• The number of inliers for the fundamental matrix: $N_F$
• The number of inliers for the essential matrix: $N_E$
• The number of inliers for the homography: $N_H$
• Check the ratios $N_E/N_F$, $N_H/N_E$, and $N_H/N_F$, and label the type of the two-view geometry:
• If $N_E/N_F \geq \epsilon_{EF}$ and $N_H/N_E < \epsilon_{HE}$, then “calibrated”
• If $N_E/N_F \geq \epsilon_{EF}$ and $N_H/N_E \geq \epsilon_{HE}$, then “planar or panoramic (pure rotation)”
• If $N_E/N_F < \epsilon_{EF}$ and $N_H/N_F < \epsilon_{HF}$, then “uncalibrated”
• If $N_E/N_F < \epsilon_{EF}$ and $N_H/N_F \geq \epsilon_{HF}$, then “planar or panoramic (pure rotation)”
• Seed for reconstruction
• Non-panoramic, calibrated image pairs
• Do not triangulate panoramic image pairs, to avoid degenerate points
• [EstimateInitialTwoViewGeometry]
Scene Graph Augmentation
129
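A minimal sketch of this classification logic; the threshold defaults are illustrative values in the spirit of COLMAP's options, not its exact configuration.

```cpp
// Minimal sketch of the two-view geometry labeling from the inlier counts
// of the fundamental matrix (nF), essential matrix (nE), and homography (nH).
enum class PairType { Calibrated, Uncalibrated, PlanarOrPanoramic };

PairType classifyTwoViewGeometry(int nF, int nE, int nH,
                                 double epsEF = 0.95,
                                 double epsHE = 0.8,
                                 double epsHF = 0.8) {
  const double ef = static_cast<double>(nE) / nF;  // essential vs fundamental
  if (ef >= epsEF) {  // the calibration appears valid
    const double he = static_cast<double>(nH) / nE;
    return (he < epsHE) ? PairType::Calibrated : PairType::PlanarOrPanoramic;
  }
  const double hf = static_cast<double>(nH) / nF;
  return (hf < epsHF) ? PairType::Uncalibrated : PairType::PlanarOrPanoramic;
}
```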
• Frequent problem in Internet photos
• Watermarks, timestamps, and frames (WTF)
• Incorrectly link images of different landmarks
• Two assumptions for WTFs
• (1) watermarks and frames always have the exact same appearance
• (2) all WTFs are typically close to the border of the image
• Procedure
• Estimate a translation transformation with $N_T$ inliers at the image borders
• Any image pair with $N_T/N_F \geq \epsilon_{TF}$ is considered a WTF and not inserted into the scene graph
• [DetectWatermark]
Scene Graph Augmentation
130
T. Weyand, C. Y. Tsai, and B. Leibe, “Fixing WTFs: Detecting image matches caused by watermarks, timestamps, and frames in internet photos,” in Proceedings - 2015 IEEE Winter Conference on Applications of Computer Vision, WACV 2015, 2015, pp. 1185–1192.
• Motivation
• Choosing the next best view is critical,
as every decision impacts the remaining reconstruction.
• A single bad decision may lead to a cascade of camera mis-registrations.
• Diverse strategies
• MAX_VISIBLE_POINTS_NUM
• To choose the image that sees most triangulated points
• MAX_VISIBLE_POINTS_RATIO
• Higher ratio of visible points on observations
• MIN_UNCERTAINTY
• More visible points and a more uniform distribution of points
• [IncrementalMapper::FindNextImages]
Next Best View Selection
131
• Refinement using multiple view triangulation
• [TriangulateMultiViewPoint]
• Cheirality constraint
• Positive depth with respect to the camera views
• [HasPointPositiveDepth]
• Sufficient triangulation angle
• The angle between the two viewing rays should be large enough.
• [CalculateTriangulationAngle]
Robust and Efficient Triangulation
132
• Before BA: Re-Triangulation
• To improve the completeness of the reconstruction
by continuing the tracks of points that previously failed to triangulate
• continue tracks with observations whose errors are below the filtering thresholds
• [IncrementalTriangulator::Retriangulate]
• After BA: Filtering
• Filter observations with large reprojection errors
• Enforcing a minimum triangulation angle over all pairs of viewing rays
• [Reconstruction::FilterAllPoints3D]
• Iterative refinement
• Perform RT, BA, and filtering in an iterative optimization
• until the number of filtered observations and post-BA re-triangulated points diminishes.
• [IterativeGlobalRefinement]
Bundle Adjustment
133
• Middlebury Temple
• 312 images
Demo
134
• Middlebury Dino
• 363 images
Demo
135
• Building
• 128 images
Demo
136
Q&A
137
SLAM Research & Job Trend
in CVPR 2018
138
• Computer Vision and Pattern Recognition 2018
• Salt Lake City, Utah
• June 18- 22, 2018
• 979 accepted papers
• 6512 registered attendees
CVPR 2018
139
• http://visualslam.ai/
• Deep learning for visual SLAM review
Deep Learning for Visual SLAM
140
• Facebook
• https://www.facebook.com/careers/jobs/?q=slam
Recruitment Expo
141
• Nvidia
• https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/0/refreshFacet/318c8bb6f553100021d223d9780d30be
Recruitment Expo
142
• Naver Labs
• https://recruit.naverlabs.com/labs/recruitMain
Recruitment Expo
143
• Here
• https://www.here.com/en
Recruitment Expo
144 https://www.youtube.com/watch?v=54J_ZCbeJdc
• DiDi Chuxing
• https://www.didiglobal.com/#/
Recruitment Expo
145
• Vtrus - Robotics scientist , SLAM/Computer Vision
• https://www.vtr.us/
Recruitment Expo
146
• Skydio
• https://www.skydio.com/
Recruitment Expo
147 https://www.youtube.com/watch?v=Gh5pAT1o2V8
• DroneDeploy
• https://www.dronedeploy.com/
Recruitment Expo
148 https://www.youtube.com/watch?v=NS8WLnoFqyE
Useful Resources
149
• Multiple View Geometry in Computer Vision- Richard Hartley and Andrew Zisserman
• Numerical Optimization – Jorge Nocedal and Stephen J. Wright
• Modern C++ Course 2018
• http://www.ipb.uni-bonn.de/teaching/modern-cpp/
• Learn C++ by Following Along 2018 (따라하며 배우는 C++, a Korean lecture series)
• https://www.youtube.com/playlist?list=PLNfg4W25Tapw5Yx4yuExHNybBIUk68aNz
• Photogrammetry (Computer vision)
• http://www.ipb.uni-bonn.de/photogrammetry-i-ii/
• Multiple view geometry - Prof. D. Cremers from TUM
• https://www.youtube.com/playlist?list=PLTBdjV_4f-EJn6udZ34tht9EVIW7lbeo4
Books & Free Lectures
150
• SLAM Research KR
• https://www.facebook.com/groups/slamkr/
• RoYeolMo (로열모, an open community for robotics)
• https://www.facebook.com/groups/KoreanRobotics/
• OROCA (오로카, a community for sharing robot technology built with open-source software and hardware)
• https://cafe.naver.com/openrt/10561
Research Groups in SNS
151
End of Second Week
152
Thank you
153

More Related Content

PDF
Introductory Level of SLAM Seminar
PPTX
An Introduction to ROS-Industrial
PDF
Denavit Hartenberg Algorithm
PDF
大規模ソフトウェア開発とテストの経験について
PPTX
Mozilla Hubsが拓く新世代WebVRのススメ #HubsScrum
PDF
[NEDO特別講座] OSS活用のためのライセンス解説コース
PDF
今だから聞きたい 「一番新しい xRアプリの作り方」 2020年 最新版
PPTX
見よう見まねでやってみる2D流体シミュレーション
Introductory Level of SLAM Seminar
An Introduction to ROS-Industrial
Denavit Hartenberg Algorithm
大規模ソフトウェア開発とテストの経験について
Mozilla Hubsが拓く新世代WebVRのススメ #HubsScrum
[NEDO特別講座] OSS活用のためのライセンス解説コース
今だから聞きたい 「一番新しい xRアプリの作り方」 2020年 最新版
見よう見まねでやってみる2D流体シミュレーション

What's hot (20)

PPTX
コンピュテーション式ハンズオン
PPTX
2A_ROBOT KINEMATICS.pptx
PDF
=SLAM ppt.pdf
PDF
Introduction to Mobile Robotics
PDF
hooks riverpod + state notifier + freezed でのドメイン駆動設計
PDF
AndroidでWebSocket
PPTX
Dronecodeの概要とROSの対応について
PPTX
Unityネイティブプラグインマニアクス #denatechcon
PDF
ある工場の Redmine 2022 〜ある工場の Redmine 5.0 バージョンアップ〜 ( Redmine of one plant 2022 ...
PDF
SteamVR Plugin 2.0 にアップデートした話
PDF
ARでVRアバターを表示するシステムを構築しよう
PPTX
【システムテスト自動化カンファレンス2015】 楽天の品質改善を加速する継続的システムテストパターン #stac2015
PPTX
What is a Software Module?
PPTX
AR / VR / MRの世界に、置けるUI、置けないUI、置くべきUI
PDF
유니티 - 물리엔진(Physics Engine) 개념 잡기
PDF
UnityによるHoloLensアプリケーション入門
PPTX
さるでも分かりたい9dofで作るクォータニオン姿勢
PPTX
Redmineカスタムフィールド表示改善
PPSX
10 robotic manufacturing systems
PDF
Streamlined landscape creation with new Terrain Tools- Unite Copenhagen 2019
コンピュテーション式ハンズオン
2A_ROBOT KINEMATICS.pptx
=SLAM ppt.pdf
Introduction to Mobile Robotics
hooks riverpod + state notifier + freezed でのドメイン駆動設計
AndroidでWebSocket
Dronecodeの概要とROSの対応について
Unityネイティブプラグインマニアクス #denatechcon
ある工場の Redmine 2022 〜ある工場の Redmine 5.0 バージョンアップ〜 ( Redmine of one plant 2022 ...
SteamVR Plugin 2.0 にアップデートした話
ARでVRアバターを表示するシステムを構築しよう
【システムテスト自動化カンファレンス2015】 楽天の品質改善を加速する継続的システムテストパターン #stac2015
What is a Software Module?
AR / VR / MRの世界に、置けるUI、置けないUI、置くべきUI
유니티 - 물리엔진(Physics Engine) 개념 잡기
UnityによるHoloLensアプリケーション入門
さるでも分かりたい9dofで作るクォータニオン姿勢
Redmineカスタムフィールド表示改善
10 robotic manufacturing systems
Streamlined landscape creation with new Terrain Tools- Unite Copenhagen 2019
Ad

Similar to FastCampus 2018 SLAM Workshop (20)

PPTX
Spark Technology Center IBM
PDF
3D SLAM introcution& current status
PDF
2022 COMP4010 Lecture4: AR Interaction
PPTX
object recognition for robots
PDF
“Introduction to Visual Simultaneous Localization and Mapping (VSLAM),” a Pre...
PDF
Elevation mapping using stereo vision enabled heterogeneous multi-agent robot...
PPTX
3D PRINTING - INTRODUCTION
PPTX
Efficient architecture to condensate visual information driven by attention ...
PDF
Computer-Vision based Centralized Multi-agent System on Matlab and Arduino Du...
PDF
VSlam 2017 11_20(張閎智)
PDF
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
PPTX
Photogrammetry: A Quick Primer
PPTX
Automated Video Analysis and Reporting for Construction Sites
PPT
Mainprojpresentation 150617092611-lva1-app6892
PPT
pick and place robotic arm
PPTX
Lobula Giant Movement Detector Based Embedded Vision System for Micro-robots
PPTX
slide-171212080528.pptx
PPTX
PCL (Point Cloud Library)
PPTX
“ADAS in Action (POC Autonomous Driving Vehicle Presentation)”
PPTX
Real Time Object Dectection using machine learning
Spark Technology Center IBM
3D SLAM introcution& current status
2022 COMP4010 Lecture4: AR Interaction
object recognition for robots
“Introduction to Visual Simultaneous Localization and Mapping (VSLAM),” a Pre...
Elevation mapping using stereo vision enabled heterogeneous multi-agent robot...
3D PRINTING - INTRODUCTION
Efficient architecture to condensate visual information driven by attention ...
Computer-Vision based Centralized Multi-agent System on Matlab and Arduino Du...
VSlam 2017 11_20(張閎智)
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
Photogrammetry: A Quick Primer
Automated Video Analysis and Reporting for Construction Sites
Mainprojpresentation 150617092611-lva1-app6892
pick and place robotic arm
Lobula Giant Movement Detector Based Embedded Vision System for Micro-robots
slide-171212080528.pptx
PCL (Point Cloud Library)
“ADAS in Action (POC Autonomous Driving Vehicle Presentation)”
Real Time Object Dectection using machine learning
Ad

Recently uploaded (20)

PDF
Website Design Services for Small Businesses.pdf
PDF
wealthsignaloriginal-com-DS-text-... (1).pdf
PPTX
GSA Content Generator Crack (2025 Latest)
PDF
MCP Security Tutorial - Beginner to Advanced
PPTX
Patient Appointment Booking in Odoo with online payment
PDF
iTop VPN Crack Latest Version Full Key 2025
PDF
Salesforce Agentforce AI Implementation.pdf
PPTX
chapter 5 systemdesign2008.pptx for cimputer science students
PPTX
Why Generative AI is the Future of Content, Code & Creativity?
PDF
DuckDuckGo Private Browser Premium APK for Android Crack Latest 2025
PDF
Autodesk AutoCAD Crack Free Download 2025
PDF
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
PDF
Time Tracking Features That Teams and Organizations Actually Need
PDF
Wondershare Recoverit Full Crack New Version (Latest 2025)
PDF
EaseUS PDF Editor Pro 6.2.0.2 Crack with License Key 2025
PDF
Digital Systems & Binary Numbers (comprehensive )
PDF
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
PDF
Cost to Outsource Software Development in 2025
PDF
Complete Guide to Website Development in Malaysia for SMEs
PDF
How to Make Money in the Metaverse_ Top Strategies for Beginners.pdf
Website Design Services for Small Businesses.pdf
wealthsignaloriginal-com-DS-text-... (1).pdf
GSA Content Generator Crack (2025 Latest)
MCP Security Tutorial - Beginner to Advanced
Patient Appointment Booking in Odoo with online payment
iTop VPN Crack Latest Version Full Key 2025
Salesforce Agentforce AI Implementation.pdf
chapter 5 systemdesign2008.pptx for cimputer science students
Why Generative AI is the Future of Content, Code & Creativity?
DuckDuckGo Private Browser Premium APK for Android Crack Latest 2025
Autodesk AutoCAD Crack Free Download 2025
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
Time Tracking Features That Teams and Organizations Actually Need
Wondershare Recoverit Full Crack New Version (Latest 2025)
EaseUS PDF Editor Pro 6.2.0.2 Crack with License Key 2025
Digital Systems & Binary Numbers (comprehensive )
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
Cost to Outsource Software Development in 2025
Complete Guide to Website Development in Malaysia for SMEs
How to Make Money in the Metaverse_ Top Strategies for Beginners.pdf

FastCampus 2018 SLAM Workshop

  • 11. • Loop closure detection • Detecting loop closures in a map to give additional constraints for consistent mapping Effort towards Large Scale Mapping 11
  • 12. • Sparse SLAM • Only use a small selected subset of the pixels (features) from a monocular color camera • Fast and real time on CPU, but it produces a sparse map (point clouds) • Landmark-based or feature-based representations • ORB SLAM • One of the SOTA frameworks in the sparse SLAM category • Complete SLAM system for monocular camera • Real-time on standard CPUs in a wide variety of environments • small hand-held indoor sequences • drones flying in industrial environments • cars driving around a city Modern State of the Art Systems 12
  • 13. • Dense SLAM • Use most or all of the pixels in each received frame • Or use depth images from a depth camera • It produces a dense map but GPU acceleration is necessary for the real-time operation. • Volumetric model or surfel-based representations • InfiniTam • One of the SOTA frameworks in the Dense SLAM category • Multi-platform framework for real-time, large-scale depth fusion and tracking • Densely reconstructed 3D scene Modern State of the Art Systems 13
  • 14. • Direct method (semi-dense SLAM) • Makes use of pixel intensities directly • Enables using all information in the image • It produces a semi-dense map • Higher accuracy and robustness, in particular in environments with few keypoints • LSD SLAM • Highly cited SLAM framework in the direct method SLAM category • Large-scale, consistent maps of the environment • Accurate pose estimation based on direct image alignment Modern State of the Art Systems 14
  • 15. • Lidar SLAM • Makes use of the Lidar sensor input for localization and mapping • Oriented toward autonomous driving in outdoor environments • Segmap • Unified approach for Lidar SLAM based on the extraction of segments in 3D point clouds • Real-time single- and multi-agent systems Modern State of the Art Systems 15
  • 16. Modern State of the Art Systems 16 ORB SLAM InfiniTam LSD SLAM Segmap
  • 20. • Feature-based monocular SLAM system • operates in real time, in small and large, indoor and outdoor environments • URL: https://github.com/raulmur/ORB_SLAM2 • Main components • Tracking • Local mapping • Loop closing • Data structures • Map points • Keyframes • Covisibility graph • Essential graph ORB SLAM 20 Map points & keyframes Covisibility graph Essential graph R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” IEEE Trans. Robot., vol. 31, no. 5, pp. 1147–1163, 2015.
  • 21. • Map points $p_i$ • 3D position $\mathbf{X}_{w,i}$ • Viewing direction $\mathbf{n}_i$ • Representative ORB descriptor $\mathbf{D}_i$ • Keyframes $K_i$ • Camera pose $\mathbf{T}_{iw}$ • Camera intrinsics (focal length and principal point) • All the ORB features extracted in the frame Data structures 21 Map points & keyframes
  • 22. • Covisibility graph • Undirected weighted graph • Node: keyframe • Edge: if two keyframes share observations of the same map points (at least 15) • Weight 𝜃: the number of common map points • Essential graph • Retain all the nodes but less edges • Subset of edges from the covisibility graph with high covisibility + loop closure edges Data structures 22 Covisibility graph Essential graph
  • 23. • ORB feature extraction • For tracking, mapping, and place recognition tasks • Robust to rotation and scale • Good invariance to camera auto-gain and auto-exposure, and illumination changes • Fast to extract and match allowing for real-time operation • Show good precision/recall performances in bag-of-word place recognition • [Frame::Frame] Tracking 23 E. Rublee, V. Rabaud, K. Konolige, and G. R. Bradski, “ORB: An efficient alternative to SIFT or SURF.,” ICCV, pp. 2564–2571, Jan. 2011. Example of orb features
  • 24. • Initial pose estimation • Case 1: if tracking was successful for the last frame • constant velocity motion model • Bundle adjustment in the two-view case • [Tracking::TrackWithMotionModel] Tracking 24 (figure: minimizing the reprojection error across frames t1, t2, t3) https://cs.nyu.edu/~fergus/teaching/vision/11_12_multiview.pdf
  • 25. • Initial pose estimation • Case 2: if the tracking is lost • global relocalization • Bag-of-visual-words • Converting an image into a representative descriptor • Perspective-N-Point problem • Finding 2D—3D correspondences • [Tracking::Relocalization] Tracking 25
  • 26. • Training step • A visual vocabulary is first created by clustering a large number of keypoint descriptors; the cluster centres form the visual words of the vocabulary. • Estimation step • The local keypoints of a given image are first detected and described. • Each descriptor is vector-quantized. • The histogram of the vector-quantized keypoint descriptors is used as the image descriptor. Bag-of-Visual-Words 26 D. Gálvez-López and J. D. Tardós, “Bags of binary words for fast place recognition in image sequences,” IEEE Trans. Robot., vol. 28, no. 5, pp. 1188–1197, 2012.
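A minimal NumPy/SciPy sketch of the two steps above, assuming real-valued descriptors and a flat k-means vocabulary (ORB's binary descriptors would need a Hamming-space variant, and DBoW2 actually builds a hierarchical vocabulary tree):

    import numpy as np
    from scipy.cluster.vq import kmeans2, vq

    def build_vocabulary(train_descriptors, k=256):
        # Training step: cluster many keypoint descriptors; centroids = visual words.
        words, _ = kmeans2(train_descriptors.astype(np.float64), k, minit='++')
        return words

    def describe_image(image_descriptors, words):
        # Estimation step: vector-quantize each descriptor to its nearest word,
        # then histogram the word labels to get one descriptor for the whole image.
        labels, _ = vq(image_descriptors.astype(np.float64), words)
        hist = np.bincount(labels, minlength=len(words)).astype(np.float64)
        return hist / max(hist.sum(), 1.0)

    train = np.random.rand(5000, 32)              # stand-in for keypoint descriptors
    vocab = build_vocabulary(train)
    print(describe_image(np.random.rand(300, 32), vocab).shape)  # (256,)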
  • 27. • Perspective-n-Point Problem • 2D—3D correspondences • To determine the position and orientation of a camera • Given its intrinsic parameters and • a set of n correspondences between 3D points and their 2D projections • Procedure • (1) Estimate point clouds from 2D projections • (2) Compute the transformation between the estimated point clouds and the given point clouds Perspective-N-Point Problem 27 V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” Int. J. Comput. Vis., vol. 81, no. 2, pp. 155–166, 2009.
  • 28. • Single view geometry • Mathematical relationship between the coordinates of a point in 3D space and its projection onto the image plane Pinhole Camera Model 28 R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2003.
  • 29. • (1) Estimate point clouds from 2D projections • Barycentric coordinate property • A 3D point can be represented by a weighted sum of 4 control points. • 2D projections of the point: stacking the projection constraints over all points yields a homogeneous linear system $Mx = 0$ Perspective-N-Point Problem 29 https://en.wikipedia.org/wiki/Barycentric_coordinate_system
  • 30. • (1) Estimate point clouds from 2D projections • The solution can be expressed as a linear combination of $\mathbf{v}_i$ (the null eigenvectors of $M^\top M$) • Now, we know the positions of the 4 control points. • Then, we can compute the 3D reprojected points of the 2D projection points. • (2) Compute the transformation • between the estimated point clouds and the given point clouds • Via the point-to-point iterative closest points method Perspective-N-Point Problem 30
  • 31. Iterative Closest Points (ICP) 31 • Widely used for geometric alignment of three-dimensional models • Start with two meshes and an initial guess for their relative rigid-body transform • Refine the transform by repeatedly generating pairs of corresponding points • Related works • Point to Point • Point to Plane • …
  • 32. Point-to-Point ICP 32 • Original problem: source $\{p_i\}$, target $\{q_i\}$, $\min_{R,t} \sum_{i=1}^{n} \|(R p_i + t) - q_i\|^2$ • Centroids: $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, $\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i$, with $p_i' = p_i - \bar{p}$, $q_i' = q_i - \bar{q}$ • Decoupling the translation: substituting $p_i = p_i' + \bar{p}$, $q_i = q_i' + \bar{q}$ gives $\min_{R,t} \sum_{i=1}^{n} \|R p_i' + R\bar{p} + t - q_i' - \bar{q}\|^2$; assuming $t = \bar{q} - R\bar{p}$, this reduces to $\min_{R} \sum_{i=1}^{n} \|R p_i' - q_i'\|^2$ P. J. Besl and N. D. McKay, “A Method for Registration of 3-D Shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, Jan. 1992.
  • 33. • Dual problem: expanding $\sum_i \|R p_i' - q_i'\|^2$ leaves $\sum_i (\|p_i'\|^2 + \|q_i'\|^2)$, which does not depend on the rotation, minus $2\sum_i q_i'^\top R p_i'$; maximizing this term minimizes the entire cost • Solving the cost function: with $M = \sum_{i=1}^{n} p_i' q_i'^\top$ and SVD $M = U \Sigma V^\top$, the optimum is $R = V U^\top$ and $t = \bar{q} - R\bar{p}$ Point-to-Point ICP 33 https://cs.gmu.edu/~kosecka/cs685/cs685-icp.pdf
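The closed-form solution above in a short NumPy sketch, assuming the correspondences are already known (a full ICP loop would re-pair closest points and repeat this step until convergence):

    import numpy as np

    def align_point_to_point(P, Q):
        # P, Q: (n, 3) source/target points with row-wise correspondence.
        p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
        Pc, Qc = P - p_bar, Q - q_bar            # decouple the translation
        M = Pc.T @ Qc                            # M = sum_i p'_i q'_i^T
        U, _, Vt = np.linalg.svd(M)
        R = Vt.T @ U.T                           # rotation maximizing trace(R M)
        if np.linalg.det(R) < 0:                 # guard against a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = q_bar - R @ p_bar
        return R, t

    rng = np.random.default_rng(0)
    P = rng.random((100, 3))
    c, s = np.cos(0.3), np.sin(0.3)
    R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    Q = P @ R_true.T + np.array([1.0, 2.0, 3.0])
    R, t = align_point_to_point(P, Q)
    print(np.allclose(R, R_true), np.allclose(t, [1, 2, 3]))   # True True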
  • 34. • Track local map • Map point filtering • (1) compute the map point projection $\mathbf{x}$ in the current frame via $\begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} u \\ v \\ w \end{pmatrix} \rightarrow \begin{pmatrix} u/w \\ v/w \\ 1 \end{pmatrix}$; discard if it lies outside the image bounds • (2) compute the angle between the current viewing ray $\mathbf{v}$ and the map point viewing direction $\mathbf{n}$; discard if $\mathbf{v} \cdot \mathbf{n} < \cos 60°$ • (3) compute the distance $d$ from the map point to the camera center; discard if it is out of the scale invariance region of the map point, $d \notin [d_{min}, d_{max}]$ • Matching • (4) compare the representative descriptor $\mathbf{D}$ of the map point with the still unmatched ORB features in the frame and associate the map point with the best match • Refinement • (5) perform bundle adjustment for the retained map points and keyframes • [Tracking::TrackLocalMap] Tracking 34
  • 35. • New keyframe decision • (condition 1) Good relocalization • more than MAX frames have passed from the last keyframe insertion • (condition 2) Idle case • local mapping is idle AND more than MIN frames have passed from the last keyframe insertion • (condition 3) Visual change • current frame tracks less than 90% of the points of $K_{ref}$ • ((condition 1) || (condition 2)) && (condition 3) • [Tracking::NeedNewKeyFrame] Tracking 35
  • 36. • Keyframe insertion • Compute the bags of words representation • Update the covisibility graph • [LocalMapping::ProcessNewKeyFrame] • New map point creation • Finding the matches by the epipolar geometry • Initial map point creation via triangulation • Consistency check • Parallax • Positive depth in both cameras • Reprojection error • Scale consistency check • [LocalMapping::CreateNewMapPoints] Local Mapping 36
  • 37. • Geometry of stereo vision (two-view geometry) Epipolar Geometry 37 (figure: epipolar plane, epipolar lines, and epipoles c, c′) https://web.stanford.edu/class/cs231a/course_notes/03-epipolar-geometry.pdf R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2003.
  • 38. • Algebraic representation of epipolar geometry • Properties of the fundamental matrix • Point correspondence • If $\mathbf{x}$ and $\mathbf{x}'$ are corresponding image points, then $\mathbf{x}'^\top \mathbf{F} \mathbf{x} = 0$ • Epipolar lines • $\mathbf{l}' = \mathbf{F}\mathbf{x}$ is the epipolar line corresponding to $\mathbf{x}$ • $\mathbf{l} = \mathbf{F}^\top \mathbf{x}'$ is the epipolar line corresponding to $\mathbf{x}'$ Fundamental Matrix 38
  • 39. • Specialization of the fundamental matrix to the case of normalized image coordinates • Properties of the essential matrix • In normalized image coordinates, it has the same properties as the fundamental matrix • $\mathbf{E} = \mathbf{K}'^\top \mathbf{F} \mathbf{K}$ • $\mathbf{F} = \mathbf{K}'^{-\top} \mathbf{E} \mathbf{K}^{-1}$ • If the calibration matrix $\mathbf{K}$ is the identity matrix, then $\mathbf{E} = \mathbf{F}$ Essential Matrix 39
  • 40. • Any two images of the same planar surface are related by a homography • Property of the homography matrix • $\mathbf{x}' = \mathbf{H}\mathbf{x}$ Homography 40
  • 41. • Generally, rays C→x and C′→x′ will not exactly intersect • Can solve via SVD, finding a least squares solution to a system of equations • Procedure • Given camera projection matrices $P = (\mathbf{p}^{1\top}; \mathbf{p}^{2\top}; \mathbf{p}^{3\top})$, $P' = (\mathbf{p}'^{1\top}; \mathbf{p}'^{2\top}; \mathbf{p}'^{3\top})$ and a correspondence $w\mathbf{x} = (u, v, 1)^\top$, $w'\mathbf{x}' = (u', v', 1)^\top$ • Cross product of $\mathbf{x}$ and $P\mathbf{X}$ should be zero (self cross product): $\mathbf{x} \times P\mathbf{X} = \mathbf{0}$ • Create matrix $A = \begin{pmatrix} u\mathbf{p}^{3\top} - \mathbf{p}^{1\top} \\ v\mathbf{p}^{3\top} - \mathbf{p}^{2\top} \\ u'\mathbf{p}'^{3\top} - \mathbf{p}'^{1\top} \\ v'\mathbf{p}'^{3\top} - \mathbf{p}'^{2\top} \end{pmatrix}$ • [U, S, V] = svd(A) • X = V(:, end) Triangulation 41 R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2003. https://en.wikipedia.org/wiki/Cross_product
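The same DLT construction in NumPy, with hypothetical projection matrices for two views separated by a baseline along x:

    import numpy as np

    def triangulate(P1, P2, x1, x2):
        # P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel correspondences.
        u1, v1 = x1
        u2, v2 = x2
        A = np.vstack([u1 * P1[2] - P1[0],       # rows of x cross (P X) = 0
                       v1 * P1[2] - P1[1],
                       u2 * P2[2] - P2[0],
                       v2 * P2[2] - P2[1]])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]                               # last right-singular vector
        return X[:3] / X[3]                      # dehomogenize

    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    X_true = np.array([0.5, 0.2, 5.0, 1.0])
    x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
    x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
    print(triangulate(P1, P2, x1, x2))           # ~ [0.5, 0.2, 5.0]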
  • 42. • Local keyframe culling • To maintain a compact reconstruction, detect redundant keyframes and delete them • Redundancy check • Keyframes whose map points are 90% seen in at least three other keyframes • [LocalMapping::KeyFrameCulling] • Recent map point culling • To retain the compact map • Association check • The tracking must find the point in more than 25% of the frames in which it is predicted to be visible: (# frames the matching is found) / (# frames the map point is visible) > 0.25 • If more than one keyframe has passed from map point creation, it must be observed from at least three keyframes. • The point can be removed if at any time it is observed from fewer than three keyframes. • [LocalMapping::MapPointCulling] Local Mapping 42
  • 43. • Local bundle adjustment • Local map points • K" : Currently processed keyframe • Ku : All the keyframes connected to it in the covisibility graph • All the map points seen by those keyframes • [Optimizer::LocalBundleAdjustment] Local Mapping 43
  • 44. • Loop candidates detection • All those keyframes directly connected to the current keyframe are discarded • Compute the similarity between the bag of words vector • Current keyframe and all its neighbors in the covisibility graph • Query the recognition DB • Discard all those keyframes whose score is lower than minimum • [KeyFrameDatabase::DetectLoopCandidates] • Compute the similarity transformation • Similarity transform between current keyframe and the loop closure candidate • Given a number of points in two different Cartesian coordinate systems, recovering the transformation and scale between the two systems • [LoopClosing::ComputeSim3] Loop Closing 44
  • 45. Similarity Transform 45 • Original problem: source $\{p_i\}$, target $\{q_i\}$, $\min_{R,t} \sum_{i=1}^{n} \|(R p_i + t) - q_i\|^2$ • Centroids: $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, $\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i$, with $p_i' = p_i - \bar{p}$, $q_i' = q_i - \bar{q}$ • Decoupling the translation: assuming $t = \bar{q} - R\bar{p}$, the problem reduces to $\min_R \sum_{i=1}^{n} \|R p_i' - q_i'\|^2$ B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” JOSA A, vol. 4, no. 4, pp. 629–642, Apr. 1987. http://web.cs.iastate.edu/~cs577/handouts/quaternion.pdf
  • 46. • Dual problem: the terms independent of the rotation drop out, leaving $\sum_i q_i'^\top R p_i'$ to be maximized; maximizing this term minimizes the entire cost • Solving by the quaternion: writing the rotation as a unit quaternion and using properties of the quaternion product turns the objective into a quadratic form in the quaternion Similarity Transform 46
  • 47. • The eigenvector of the highest eigenvalue of $N$ is the solution (the optimal unit quaternion) • Quaternion to rotation: convert the optimal quaternion back into the rotation matrix $R$ • With $M = \sum_i p_i' q_i'^\top$ and $S_{ab} = \sum_i p'_{ia} q'_{ib}$, $N = \begin{pmatrix} S_{xx}+S_{yy}+S_{zz} & S_{yz}-S_{zy} & S_{zx}-S_{xz} & S_{xy}-S_{yx} \\ S_{yz}-S_{zy} & S_{xx}-S_{yy}-S_{zz} & S_{xy}+S_{yx} & S_{zx}+S_{xz} \\ S_{zx}-S_{xz} & S_{xy}+S_{yx} & -S_{xx}+S_{yy}-S_{zz} & S_{yz}+S_{zy} \\ S_{xy}-S_{yx} & S_{zx}+S_{xz} & S_{yz}+S_{zy} & -S_{xx}-S_{yy}+S_{zz} \end{pmatrix}$ • Calculate the scale: $s = \left(\sum_{i=1}^{n} q_i' \cdot R p_i'\right) / \left(\sum_{i=1}^{n} \|R p_i'\|^2\right)$ • Calculate the translation: $t = \bar{q} - sR\bar{p}$ Similarity Transform 47
  • 48. • Loop fusion • When the loop closure detection is triggered, the loop fusion is performed. • Keyframe and map point correction • Insert loop edges in the covisibility graph • Current keyframe pose is corrected with the similarity transformation that we obtained • Fuse duplicated map points • [LoopClosing::CorrectLoop] • Essential graph optimization • For all keyframes and all map points • [Optimizer::GlobalBundleAdjustment] Loop Closing 48
  • 54. • Incremental method for localization in 3D point clouds based on segment matching • URL: https://github.com/ethz-asl/segmap • Main components • Front-end • Sequential factors • Place recognition factors • Back-end • Pose graph optimization SegMap 54
  • 55. • The front-end is responsible for • Sensor measurements into sequential factors • Segmentation and description for place recognition factors • Sequential factors • Odometry factors • Displacement between consecutive robot poses from IMU data • Scan-matching factors • Registering the current scan against a submap by point-to-plane ICP • [SegMapper::SegMapper] • Place recognition factors • SegMatch: segment based loop-closure for 3D point clouds • [SegMapper::segMatchThread] Front-End 55
  • 56. • Minimize the perpendicular distance from the source point to the tangent plane of the destination point • Nonlinear least squares solved using the Levenberg-Marquardt method Point-to-Plane ICP 56 K. Low, “Linear Least-squares Optimization for Point-to-plane ICP Surface Registration,” Chapel Hill, Univ. North Carolina, Feb. 2004. • Source point $s_i = (s_{ix}, s_{iy}, s_{iz}, 1)^\top$ • Destination point $d_i = (d_{ix}, d_{iy}, d_{iz}, 1)^\top$ • Unit normal vector at $d_i$: $n_i = (n_{ix}, n_{iy}, n_{iz}, 0)^\top$
  • 57. Point-to-Plane ICP • Transformation matrix M • Least squares problem: $M_{opt} = \arg\min_M \sum_i ((M \cdot s_i - d_i) \cdot n_i)^2$ • 6-DOF parameters $(\alpha, \beta, \gamma, t_x, t_y, t_z)$ • However, in the case of $\alpha, \beta, \gamma$, the objective is a nonlinear trigonometric function • Linear approximation is needed 57
  • 58. Point-to-Plane ICP • Approximated transformation matrix $\hat{M}$: using the small-angle approximation ($\sin\theta \approx \theta$, $\cos\theta \approx 1$), $\hat{M} \approx \begin{pmatrix} 1 & -\gamma & \beta & t_x \\ \gamma & 1 & -\alpha & t_y \\ -\beta & \alpha & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$ • Linearized expression for the i-th correspondence: the residual $(\hat{M} s_i - d_i) \cdot n_i$ becomes linear in $(\alpha, \beta, \gamma, t_x, t_y, t_z)$ 58
  • 59. Point-to-Plane ICP • Expand to N correspondences • Modified form of the general least squares problem $Ax = b$ • Optimum solution $x_{opt} = (A^\top A)^{-1} A^\top b$ • Iteratively perform the SVD optimization until it converges • [PointToPlaneWithCovErrorMinimizer::compute] 59
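A sketch of one linearized point-to-plane step under the small-angle approximation, with x = (α, β, γ, t_x, t_y, t_z); the construction follows the derivation above and is illustrative, not libpointmatcher's internals:

    import numpy as np

    def point_to_plane_step(S, D, N):
        # S: (n,3) source points, D: (n,3) destination points, N: (n,3) unit normals at D.
        A = np.hstack([np.cross(S, N), N])        # row i: [s_i x n_i, n_i]
        b = np.einsum('ij,ij->i', N, D - S)       # row i: n_i . (d_i - s_i)
        x, *_ = np.linalg.lstsq(A, b, rcond=None) # least squares: (A^T A)^-1 A^T b
        return x                                  # apply, re-pair points, iterate

    rng = np.random.default_rng(1)
    S = rng.random((200, 3))
    D = S + np.array([0.01, -0.02, 0.005])        # small pure translation
    N = rng.normal(size=(200, 3))
    N /= np.linalg.norm(N, axis=1, keepdims=True)
    print(point_to_plane_step(S, D, N))           # rotation ~0, t ~ (0.01, -0.02, 0.005)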
  • 60. • Place recognition factors • SegMatch: segment based loop-closure for 3D point clouds • Four different modules • Point cloud segmentation • Descriptor extraction • Segment matching • Geometric verification • [SegMapper::segMatchThread] Front-End 60
  • 61. • Point cloud segmentation • Incremental region growing policy • CanGrowTo • If growing from a seed to a neighbor is allowed • Check the angle between seed and candidate normals • LinkClusters • Link clusters if they have the same cluster id • CanBeSeed • If a point can be used as seed • Check the curvature at a point • [growRegionFromSeed] SegMatch 61 Region growing Cluster merging
  • 62. • Descriptor extraction • For compressing the raw segment data and building object signatures • Diverse segment descriptors • Eigenvalue based • Eigenvalues for the segment are computed and combined in a 7-dim feature vector • Linearity, planarity, scattering, omnivariance, anisotropy, eigenentropy, change of curvature • [EigenvalueBasedDescriptor::describe] SegMatch 62 M. Weinmann, B. Jutzi, and C. Mallet, “Semantic 3D scene interpretation: A framework combining optimal neighborhood size selection with relevant features,” ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci., vol. II-3, no. September, pp. 181–188, 2014. Eigenvalue-based 3D features
  • 63. • Diverse segment descriptors • Auto-encoder based • Input: 3D binary voxel grid of fixed dimension 32x32x16 • Descriptor extractor part • 3D convolutional layers with max pool layers and two Fully connected layers • Rectified linear activation function for all layers • Output: 64x1 descriptor • Reconstruction part • One fully connected layer and three deconvolutional layers with a final sigmoid output • Output: reconstructed 3D binary voxel grid • Loss • Classification loss: softmax cross entropy loss • Reconstruction loss: binary cross entropy loss SegMatch 63 R. Dubé, A. Cramariuc, D. Dugas, J. Nieto, R. Siegwart, and C. Cadena, “SegMap: 3D Segment Mapping using Data-Driven Descriptors,” 2018.
  • 64. • Matching • K-nearest neighbor search in the descriptor space by k-d tree • Construction of k-d trees SegMatch 64 (figure: k-d tree built over ten example points)
  • 65. • Matching • Searching k-nearest neighbors • Ex) Query point: p1 • Time complexity: O(log n) • [findCandidates] SegMatch 65 (figure: k-d tree traversal for the query point)
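The candidate search can be reproduced with SciPy's k-d tree; the 7-D random vectors below merely stand in for eigenvalue-based segment descriptors:

    import numpy as np
    from scipy.spatial import cKDTree

    target_descriptors = np.random.rand(1000, 7)   # descriptors of target-map segments
    tree = cKDTree(target_descriptors)             # O(n log n) construction

    query = np.random.rand(5, 7)                   # descriptors of local-map segments
    dist, idx = tree.query(query, k=3)             # 3 nearest neighbours, ~O(log n) each
    print(idx.shape)                               # (5, 3) candidate correspondences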
  • 66. • Geometric verification • Consistency graph G = (V, E) • Vertex set V = $\{c_i\}$, the set of correspondences $c_i$ • Edge set E = $\{e_{ij}\}$, the set of undirected edges $e_{ij}$ connecting all consistent pairs of correspondences $(c_i, c_j)$ • Geometrically consistent • If the difference of the Euclidean distance between the segment centroids in the local map and in the target map is less than a threshold: $|d_l(c_i, c_j) - d_t(c_i, c_j)| \leq \epsilon$ • Identifying a maximum geometrically consistent set == finding a maximum clique of G • [Segmatch::recognize] SegMatch 66
  • 67. • Illustration Consistency Graph 67 (figure: correspondences c1–c12 between the local and target maps; consistent pairs form the graph edges, and the maximum clique is kept)
  • 68. Back-End • Pose graph optimization by factor graphs (iSAM library) • Factor graphs • Graphical models that are well suited to modeling complex estimation problems • Variables • Unknown random variables in the estimation problem • Robot poses • Factors • Probabilistic information on those variables, derived from measurements or prior knowledge • Odometry, loop closure constraints • [IncrementalEstimator::estimate] 68
  • 69. • Autonomous vehicles Applications 69 https://www.youtube.com/watch?v=gEy91PGGLR0&feature=share
  • 71. End of First Week 71
  • 73. Epipolar Line in Various Cases 73 • Motion parallel with the image plane • Forward motion • Converging case http://people.scs.carleton.ca/~c_shu/Courses/comp4900d/notes/epipolar.pdf
  • 74. • Since the distances between segments are preserved under rotation, the approach is expected to keep working • However, in the very special case where the target-map segments are arranged exactly like a rotated copy of the local map, it could fail Consistency Graph with Severe Rotation 74 (figure panels: 0 deg, 90 deg, 180 deg)
  • 77. • Large scale direct (feature-less) monocular SLAM algorithm • URL: https://github.com/tum-vision/lsd_slam • Main components • Tracking • Depth map estimation • Map optimization • Key features • Novel direct tracking method • Elegant probabilistic solution to include the effect of noisy depth values into tracking • Large-scale and real time operation on a CPU LSD-SLAM 77
  • 78. Feature-based VS Direct Method 78 (figure: feature-based pipeline vs direct-method pipeline)
  • 79. • Manifold • Mathematical space that is not necessarily Euclidean on a global scale, but can be seen as Euclidean on a local scale • Why use the manifold for robotics or computer vision? • The translation clearly forms a Euclidean space, while the rotational components span the non-Euclidean 3D rotation group • Lie group SE(3) <-> Lie algebra se(3), connected by the exponential and logarithm maps Optimization on the Manifold 79 • Twist $\boldsymbol{\xi} = (v_x, v_y, v_z, \omega_x, \omega_y, \omega_z)^\top$ with linear velocity $v$ and angular velocity $\omega$ • $G(\boldsymbol{\xi}) = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$ G. Grisetti, R. Kummerle, C. Stachniss, and W. Burgard, “A tutorial on graph-based SLAM,” IEEE Intell. Transp. Syst. Mag., vol. 2, no. 4, pp. 31–43, 2010. Section 10.3.3, J. L. Blanco-Claraco, “A tutorial on SE(3) transformation parameterizations and on-manifold optimization,” 2018.
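A small sketch of the exponential map from se(3) to SE(3), using a generic matrix exponential for clarity (production code typically uses the closed-form Rodrigues expression instead):

    import numpy as np
    from scipy.linalg import expm

    def hat(w):
        # 3-vector -> skew-symmetric matrix [w]_x
        return np.array([[0, -w[2], w[1]],
                         [w[2], 0, -w[0]],
                         [-w[1], w[0], 0]])

    def exp_se3(xi):
        # Twist xi = (v, w) -> 4x4 rigid-body transform G(xi) = expm(xi^)
        v, w = xi[:3], xi[3:]
        T = np.zeros((4, 4))
        T[:3, :3] = hat(w)     # angular part
        T[:3, 3] = v           # linear part
        return expm(T)

    xi = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.3])  # small example twist
    print(exp_se3(xi))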
  • 80. • Relative 3D pose between an existing keyframe $K_i = (I_i, D_i, V_i)$ and a new image $I_j$ • [SE3Tracker::trackFrame] Tracking 80 (terms: 3D projective warp function, photometric residual, photometric variance)
  • 81. • Minimizing the intensity error between two images • Photometric residual $r_p(\mathbf{p}, \boldsymbol{\xi}_{ji}) := I_i(\mathbf{p}) - I_j(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji}))$ • In order to apply Gauss-Newton minimization, the Jacobian is necessary • n-th delta for the parameter: $\delta\boldsymbol{\xi}^{(n)} = -(J^\top J)^{-1} J^\top r(\boldsymbol{\xi}^{(n)})$ • The new estimate is obtained by multiplication with the computed update: $\boldsymbol{\xi}^{(n+1)} = \delta\boldsymbol{\xi}^{(n)} \circ \boldsymbol{\xi}^{(n)}$ • The Jacobian is calculated from $I_j(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji}))$ for the parameter $\boldsymbol{\xi}_{ji}$, more specifically from $I_j(\pi(g(\mathbf{p}, D_i(\mathbf{p})), G(\boldsymbol{\xi}_{ji})))$ • Therefore, it can be decomposed into a product of Jacobians: $J_{I_j}(\boldsymbol{\xi}_{ji}) = J_I J_\pi J_g J_G$ • Relationship among functions: $I$ (pixel intensity) ∘ $\pi$ (projection) ∘ $g$ (rigid body transform) ∘ $G(\boldsymbol{\xi})$ (Lie algebra to Lie group) Photometric Residual 81
  • 85. • Photometric residual $r_p(\mathbf{p}, \boldsymbol{\xi}_{ji}) := I_i(\mathbf{p}) - I_j(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji}))$ • The derivative is calculated from $I_j(\omega(\mathbf{p}, D_i(\mathbf{p}), \boldsymbol{\xi}_{ji}))$, more specifically $I_j(\pi(g(\mathbf{p}, D_i(\mathbf{p})), G(\boldsymbol{\xi}_{ji})))$, with respect to the inverse depth $D_i(\mathbf{p})$ • Therefore, it can be decomposed into a product of derivatives: $\frac{\partial I_j(\mathbf{p}, \boldsymbol{\xi}_{ji})}{\partial D_i(\mathbf{p})} = \frac{\partial I_j}{\partial \pi} \frac{\partial \pi}{\partial g} \frac{\partial g}{\partial D_i(\mathbf{p})} = \begin{pmatrix} \nabla I_{jx} & \nabla I_{jy} \end{pmatrix} \begin{pmatrix} \frac{1}{Z} & 0 & -\frac{X}{Z^2} \\ 0 & \frac{1}{Z} & -\frac{Y}{Z^2} \end{pmatrix} \begin{pmatrix} (T_x - X)/d \\ (T_y - Y)/d \\ (T_z - Z)/d \end{pmatrix} = \begin{pmatrix} \nabla I_{jx} & \nabla I_{jy} \end{pmatrix} \begin{pmatrix} ((T_x - X)Z - (T_z - Z)X)/(Z^2 d) \\ ((T_y - Y)Z - (T_z - Z)Y)/(Z^2 d) \end{pmatrix}$ • [SE3Tracker::calcWeightsAndResidual] Photometric Variance 85 (terms: Gaussian image intensity noise and inverse depth variance)
  • 86. • The remaining factor $\frac{\partial g}{\partial D_i(\mathbf{p})} = \big((T_x - X)/d, (T_y - Y)/d, (T_z - Z)/d\big)^\top$ follows from differentiating the warped 3D point with respect to the inverse depth $d = D_i(\mathbf{p})$ Chain of Derivatives 86
  • 87. • Keyframe selection • If the camera moves too far away from the existing map, a new keyframe is created from the most recent tracked image. • [getRefFrameScore] Depth Map Estimation 87 (figure: frames labelled keyframe / not a keyframe)
  • 88. • Depth map creation • Depth map for the keyframe is initialized by projecting points from the previous keyframe • Depth map is scaled to have a mean inverse depth of one • Procedure • Compute the epipolar line $\mathbf{l}' = \mathbf{F}\mathbf{x} = \mathbf{K}^{-\top} [\mathbf{t}]_\times \mathbf{R} \mathbf{K}^{-1} \mathbf{x}$ • Geometric and photometric disparity error • Check the magnitude of the image gradient along the epipolar line • Check the angle between the image gradient and the epipolar line • [DepthMap::makeAndCheckEPL] • Stereo matching • between the current keyframe and the previous keyframe • using the SSD error over five equidistant points on the epipolar line • [DepthMap::doLineStereo] Depth Map Estimation 88 J. Engel, J. Sturm, and D. Cremers, “Semi-dense visual odometry for a monocular camera,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1449–1456.
  • 89. • Constraint search • Finding the Loop closure candidates via the appearance based method (ex. BoVW) • Direct image alignment on sim(3) • Finding the transformation between current keyframe and the loop closure candidates • Tracking in Sim(3) space • [Sim3Tracker::trackFrameSim3] Constraint Acquisition 89
  • 90. Map Optimization • Pose graph optimization by general graphs (g2o library) • General graphs • Graphical models that are well suited to modeling complex estimation problems • Nodes • Unknown random variables in the estimation problem • Robot poses • Edges • Probabilistic information on those variables, derived from measurements or prior knowledge • Odometry, loop closure constraints • [SlamSystem::optimizationIteration] 90
  • 91. • Autonomous drone driving Applications 91 https://www.youtube.com/watch?v=BLY3kgeZrZg
  • 95. • Real-time, large-scale depth fusion and tracking framework • URL: https://github.com/victorprad/InfiniTAM • Main components • Tracking • Allocation • Integration • Raycasting • Relocalisation & Loop Closure Detection • Data structure • Volumetric representation • Voxel block hashing InfiniTam 95 V. A. Prisacariu et al., “InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure,” 2017.
  • 96. • Volumetric representation using a hash lookup • Data structures and operations • Voxel • Voxel block hash • Hash table and hashing function • Hash table operations • Voxel • A value on a regular grid in 3D space, the extended concept of 2D pixel • Widely used for realistic rendering 3D object in computer graphics • Data • Truncated signed distance function (TSDF) value • TSDF Weight • Color value • Color weight Volumetric Representation 96
  • 97. • Truncated signed distance function (TSDF) • Predefined 3D volume is subdivided uniformly into a 3D grid of voxels • These values are positive in front of the surface and negative behind • Zero-crossing point means the surface of the object Volumetric Representation 97
  • 98. • Voxel block array • Majority of data stored in the regular voxel grid is marked either as • Free space • Unobserved space • Only store the surface data by efficient hashing scheme • Grouped voxels in blocks of predefined size (ex. 8x8x8) • Data • Positions of the corner of the 8x8x8 voxel block • Offset in the excess list • Pointer to the voxel block array Volumetric Representation 98 M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger, “Real-time 3D reconstruction at scale using voxel hashing,” ACM Trans. Graph., vol. 32, no. 6, pp. 1–11, 2013.
  • 99. • Hash table • To quickly and efficiently find the position of a certain voxel block in the voxel block array • Contiguous array of ITMHashEntry objects • Hashing function • For locating entries of the hash table takes the corner coordinates of a 3D voxel block • Hash collision case • Use the additional unordered excess list • Store an offset in the voxel block array Volumetric Representation 99
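A sketch of the block hashing idea; the three large primes are the commonly quoted values for this spatial hash, while the table size is an assumption and the block size of 8 follows the text (exact constants in InfiniTAM may differ):

    BLOCK_SIZE = 8        # voxels per block side (8x8x8 blocks)
    TABLE_SIZE = 1 << 20  # number of hash buckets (assumption)

    def block_coords(voxel):
        # world-grid voxel coordinates -> containing voxel block coordinates
        return tuple(c // BLOCK_SIZE for c in voxel)

    def hash_block(block):
        # XOR of the block corner coordinates times large primes, modulo table size
        x, y, z = block
        return ((x * 73856093) ^ (y * 19349669) ^ (z * 83492791)) % TABLE_SIZE

    print(hash_block(block_coords((17, -3, 250))))  # bucket index for that voxel's block

On a collision, the entry would spill into the unordered excess list as described above.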
  • 100. • Hash table operations • Given a target 3D voxel location in world coordinates • Compute its corresponding voxel block location by dividing the voxel location by the size of a voxel block • Call the hashing function to compute the index of the bucket from the ordered part of the hash table • Retrieval • Returns the voxel stored at the target location within the block addressed by the hash entry • Insertion • Reserves a block inside the voxel block array Volumetric Representation 100
  • 101. • To determine the pose of a new camera frame given the 3D world model • Diverse methods • Using only the depth • Inspired by Point-to-Plane ICP • [class ITMDepthTracker] • Using only the color • Inspired by the direct method • [class ITMColorTracker] • Using both data • Utilize both approaches • [class ITMExtendedTracker] • Main differences of the extended tracker • Huber-norm instead of the standard L2 norm • Error term weighted by its depth measurement • Tracking failure determination by SVM classifier Tracking 101
  • 102. • Huber-norm instead of the standard L2 norm • Huber-Norm • The squared loss has the disadvantage that it tends to be dominated by outliers • The Huber norm is quadratic for small values of the residual and linear for large values • Huber-norm in the code Tracking 102 https://en.wikipedia.org/wiki/Huber_loss
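The Huber loss in a few lines, showing the quadratic-to-linear switch at the threshold delta:

    import numpy as np

    def huber(r, delta=1.0):
        small = np.abs(r) <= delta
        return np.where(small,
                        0.5 * r**2,                         # quadratic near zero
                        delta * (np.abs(r) - 0.5 * delta))  # linear in the tails

    r = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
    print(huber(r))   # [4.5, 0.125, 0., 0.125, 4.5] vs squared loss 12.5 at +-5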
  • 103. • Error term weighted by its depth measurement • The error term for each pixel of the depth image is weighted according to its depth meas urement provided by the sensor. • The reliability of depth measurement decreases with the increase in distance reading. • Depth weight in the code Tracking 103 K. Khoshelham and S. O. Elberink, “Accuracy and resolution of kinect depth data for indoor mapping applications,” Sensors, Jan. 2012.
  • 104. • Three main stages in allocation • 1) backproject a line connecting $d - \mu$ to $d + \mu$ • $d$: depth in image coordinates • $\mu$: a fixed, tunable parameter • This leads to a line in world coordinates, which intersects a number of voxel blocks. • Search the hash table for each of these blocks and look for a free hash entry • 2) allocate voxel blocks for each non-zero entry in the allocation and visibility arrays • 3) build a list of live hash entries • [AllocateSceneFromDepth] Allocation 104 (figure: ray segment from d−μ to d+μ through the virtual voxel block grid)
  • 105. • TSDF integration • If a voxel is behind the surface observed in the new depth image, the image does not contain any new information about it, and the function returns. • If the voxel is close to or in front of the observed surface, a corresponding observation is added to the accumulated sum. • [IntegrateIntoScene] Integration 105 S. Izadi et al., “KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera,” in Proc. ACM Symposium on User Interface Software and Technology (UIST), 2011.
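A per-voxel sketch of this update rule (KinectFusion-style running weighted average); the function and parameter names are illustrative, not InfiniTAM's API:

    MU = 0.02  # truncation band in metres (tunable)

    def integrate_voxel(tsdf, weight, voxel_depth_along_ray, measured_depth, max_w=100):
        eta = measured_depth - voxel_depth_along_ray   # signed distance to the surface
        if eta < -MU:
            return tsdf, weight                        # behind the surface: no new info
        f = min(1.0, eta / MU)                         # truncated signed distance in [-1, 1]
        new_tsdf = (tsdf * weight + f) / (weight + 1)  # running average of observations
        return new_tsdf, min(weight + 1, max_w)        # cap the weight

    print(integrate_voxel(0.0, 0, voxel_depth_along_ray=1.00, measured_depth=1.01))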
  • 106. • Motivation • Depth image is computed from the updated 3D world model given a camera pose • Input to the tracking step in the next frame and also for visualization. • Raycasting • A ray is being cast from the camera up until an intersection with the surface is found • Checking the value of the TSDF at each voxel along the ray until a zero-crossing is found • State machine to efficiently handle the sparse volumetric space • SEARCH_BLOCK_COARSE • SEARCH_BLOCK_FINE • SEARCH_SURFACE • BEHIND_SURFACE • [castRay] Raycasting 106
  • 107. • SEARCH_BLOCK_COARSE state • Take steps of the size of each block, i.e. 8 voxels • Until an actually allocated block is encountered • SEARCH_BLOCK_FINE state • Once the ray enters an allocated block, step back and enter this state • The step length is now limited by the truncation band of the SDF. • SEARCH_SURFACE state • Once the ray enters a valid block and the values in that block indicate we are still in front of the surface, the state is changed to SEARCH_SURFACE • BEHIND_SURFACE state • Until a negative value is read from the SDF • This terminates the raycasting iteration and the exact location of the surface is now found. Raycasting 107
  • 108. • Keyframe-based random ferns relocaliser • To relocalise the camera when tracking fails • To detect loop closures when aiming to construct a globally-consistent scene • Procedure • Downsample and preprocess the image I • Each of the m code blocks is obtained by applying a random fern to I • A fern is a set of n binary feature tests on the image, each yielding either 0 or 1 • $b_{F_k}^I \in B^n$ denotes the n-bit binary code resulting from applying fern $F_k$ to I • $b_C^I \in B^{mn}$ denotes the result of concatenating all m such binary codes for I • Dissimilarity measure between two different images I and J is the block-wise Hamming distance between $b_C^I$ and $b_C^J$, where each block term is 0 if the two code blocks are identical, and 1 otherwise Relocalisation & Loop Closure Detection 108
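A toy version of the fern encoding and the block-wise Hamming distance, with made-up sizes (m = 16 ferns of n = 8 tests) and simple pixel-threshold tests standing in for the real feature tests:

    import numpy as np

    rng = np.random.default_rng(0)
    M_FERNS, N_TESTS = 16, 8
    H, W = 40, 30                                   # downsampled image size (assumption)
    pix = rng.integers(0, H * W, size=(M_FERNS, N_TESTS))  # fixed random test locations
    thr = rng.random((M_FERNS, N_TESTS))                   # fixed random thresholds

    def encode(image):
        # Each test compares one pixel against a threshold -> one bit; n bits per fern.
        flat = image.ravel()
        return (flat[pix] > thr).astype(np.uint8)   # (m, n) array of 0/1 bits

    def block_hamming(code_a, code_b):
        # 0 if a whole n-bit block matches, 1 otherwise, summed over the m blocks.
        return int(np.sum(np.any(code_a != code_b, axis=1)))

    a, b = rng.random((H, W)), rng.random((H, W))
    print(block_hamming(encode(a), encode(a)), block_hamming(encode(a), encode(b)))  # 0, ~16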
  • 109. • Caching strategy for the efficient BlockHD computation Relocalisation & Loop Closure Detection 109 (figure: per-fern code tables, ex. 500 ferns and 16 code blocks, map each code id to the keyframe IDs that produced it, so a query frame's code fragments vote for similar keyframes)
  • 110. • Idea behind the relocaliser • Encode an RGB-D image I as a set of m binary code blocks, each of length n • To learn a lookup table from encodings of keyframe images to their known camera poses • Harvesting keyframes • If there is no similar keyframe in the keyframe DB, harvest a new keyframe • Relocalization • By finding the nearest neighbours of the encoding of the current camera input image in this table, and trying to use their recorded pose to restart tracking • Loop closure detection • Determine whether the current frame has been seen in the previous frames or not • [Relocaliser::ProcessFrame] Relocalisation & Loop Closure Detection 110
  • 111. • Submap based approach • Division of the scene into multiple rigid submaps • Active submaps: tracked against at each frame • Passive submaps: maintained, but not tracked against unless they become active again Globally-Consistent Reconstruction 111 N. Fioraio, J. Taylor, A. Fitzgibbon, L. Di Stefano, and S. Izadi, “Large-scale and drift-free surface reconstruction using online subvolume registration,” 2015 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4475–4483, Jan. 2015.
  • 112. • Graph representation • A numerical solution of the equation can be obtained by using the popular Gauss-Newton method Recall: Pose Graph Optimization 112 (figure: example pose graph; each edge $e_{ij}$ compares the measurement against the estimation) G. Grisetti, R. Kummerle, C. Stachniss, and W. Burgard, “A tutorial on graph-based SLAM,” IEEE Intell. Transp. Syst. Mag., vol. 2, no. 4, pp. 31–43, 2010.
  • 113. • First order Taylor expansion • Substituting the equation Least Squares Optimization 113 https://en.wikipedia.org/wiki/Linear_approximation
  • 114. • Rewrite the function F(x) as a quadratic form in the increment $\Delta x$ • The quadratic form is minimized by solving the linear system $H \Delta x^* = -b$ • The solution is obtained by adding the increment $\Delta x^*$ to the initial guess Least Squares Optimization 114
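The update rule in a generic Gauss-Newton sketch (identity information matrix, dense algebra; a real pose-graph solver additionally exploits the sparse structure described next):

    import numpy as np

    def gauss_newton(residual, jacobian, x0, iters=10):
        x = x0.astype(float)
        for _ in range(iters):
            r, J = residual(x), jacobian(x)
            H, b = J.T @ J, J.T @ r
            dx = np.linalg.solve(H, -b)   # increment minimizing the local quadratic form
            x = x + dx                    # add the increment to the current guess
        return x

    # Toy curve fit y = exp(a t) + c as a stand-in for graph error functions.
    t = np.linspace(0, 1, 50)
    y = np.exp(0.7 * t) + 0.3
    residual = lambda x: np.exp(x[0] * t) + x[1] - y
    jacobian = lambda x: np.column_stack([t * np.exp(x[0] * t), np.ones_like(t)])
    print(gauss_newton(residual, jacobian, np.array([0.0, 0.0])))  # ~ [0.7, 0.3]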
  • 115. • The Matrix H and the vector b are obtained by summing up a set of matrices and v ectors, one for every constraint. • Every constraint will contribute to the system with an addend term. • The structure of this addend depends on the Jacobian of the error function. • Since the error function of a constraint depends only on the values of two nodes, th e Jacobian has the following form. Structure of Linearized System 115
  • 116. Structure of Linearized System 116 • [runGlobalAdjustment]
  • 119. • 3D reconstruction • Bundlefusion : http://graphics.stanford.edu/projects/bundlefusion/ Applications 119 https://www.youtube.com/watch?v=keIirXrRb1k
  • 122. • Reconstructing 3D structure from its projections into a series of images taken from different viewpoints • General SfM pipeline Structure-from-Motion 122 Result of Rome with 21K registered out of 75K images
  • 123. • Feature extraction • Features should be invariant under radiometric and geometric changes • SfM can uniquely recognize them in multiple images. • Ex) SIFT, SURF, ORB, FAST, BRIEF… • [RunFeatureExtraction] • Matching • Matching the images that see the same scene part • By leveraging the features as an appearance description of the images. • Output • a set of potentially overlapping image pairs • their associated feature correspondences • Naïve approach • Test every image pair for scene overlap • $O(N_I^2 N_{f_i}^2)$ • Scalable and efficient matching is necessary. • [RunFeatureMatching] Correspondence Search 123
  • 124. • Geometric verification • Since matching is based solely on appearance, it is not guaranteed that corresponding features actually map to the same scene point. • Diverse verification methods • Projective geometry (pinhole camera model) • Epipolar geometry (essential matrix, fundamental matrix) • Homography • If a valid transformation maps a sufficient number of features between the images, they are considered geometrically verified. • RANSAC is required to remove outliers • Output • a set of geometrically verified image pairs • Their associated inlier correspondences • (optional) a description of their geometric relation • Scene-graph with images as nodes and verified pairs of images as edges Correspondence Search 124
  • 125. • Image registration • New images can be registered to the current model by solving the PnP problem • Perspective-n-Point Problem • 2D—3D correspondences • To determine the position and orientation of a camera • Given its intrinsic parameters and a set of n correspondences between 3D points and their 2D projections • Procedure • (1) Estimate point clouds from 2D projections • (2) Compute the transformation between the estimated point clouds and the given point clouds • [EPNPEstimator::ComputePose] Incremental Reconstruction 125 V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” Int. J. Comput. Vis., vol. 81, no. 2, pp. 155–166, 2009.
  • 126. • Bundle adjustment • Joint non-linear refinement of camera parameters $P_c$ and point parameters $X_k$ • Minimize the reprojection error • Levenberg-Marquardt is the method of choice for solving BA problems • [BundleAdjuster::BundleAdjuster] Incremental Reconstruction 126
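A tiny bundle-adjustment-flavoured sketch using SciPy's Levenberg-Marquardt: two cameras with identity rotation, camera 0 fixed, and camera 1's translation jointly refined with all but one anchor point by minimizing reprojection error (a toy setup, not COLMAP's parameterization):

    import numpy as np
    from scipy.optimize import least_squares

    def project(X, t):
        Xc = X - t                        # identity rotation for brevity
        return Xc[:, :2] / Xc[:, 2:3]     # normalized pinhole projection

    rng = np.random.default_rng(2)
    n = 20
    X_true = rng.uniform([-1, -1, 4], [1, 1, 6], size=(n, 3))
    t_true = np.array([1.0, 0.0, 0.0])
    obs0, obs1 = project(X_true, np.zeros(3)), project(X_true, t_true)

    def residual(params):
        t, X_free = params[:3], params[3:].reshape(n - 1, 3)
        X = np.vstack([X_true[:1], X_free])   # point 0 held fixed to pin the scale gauge
        return np.concatenate([(project(X, np.zeros(3)) - obs0).ravel(),
                               (project(X, t) - obs1).ravel()])

    x0 = np.concatenate([np.zeros(3),
                         (X_true[1:] + 0.05 * rng.normal(size=(n - 1, 3))).ravel()])
    sol = least_squares(residual, x0, method='lm')   # Levenberg-Marquardt refinement
    print(np.round(sol.x[:3], 3))                    # ~ [1, 0, 0]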
  • 127. • Previous SfM frameworks (Bundler, VisualSFM) • Challenges • The current state-of-the-art SfM algorithms are good, but not good enough • They fail to produce fully satisfactory results in terms of completeness and robustness. • Problems • Correspondence search producing an incomplete scene graph • Reconstruction stage failing to register images due to missing or inaccurate scene structure • Symbiotic relationship between image registration and triangulation • COLMAP • Novel SfM algorithm containing the following contributions to achieve the ultimate goal • Contributions • Scene graph augmentation • Next best view selection maximizing the robustness and accuracy • Robust and efficient triangulation method • Iterative BA, re-triangulation, and outlier filtering strategy COLMAP 127
  • 129. • Augmented geometric verification • The number of inliers for the fundamental matrix $N_F$ • The number of inliers for the essential matrix $N_E$ • The number of inliers for the homography $N_H$ • Check the ratios $N_E/N_F$, $N_H/N_E$, $N_H/N_F$ and label the type of the two-view geometry • If $N_E/N_F < \epsilon_{EF}$ and $N_H/N_E < \epsilon_{HE}$, then “planar or panoramic (pure rotation)” • If $N_E/N_F < \epsilon_{EF}$ and $N_H/N_E > \epsilon_{HE}$, then “calibrated” • If $N_E/N_F > \epsilon_{EF}$ and $N_H/N_F < \epsilon_{HF}$, then “planar or panoramic (pure rotation)” • If $N_E/N_F > \epsilon_{EF}$ and $N_H/N_F > \epsilon_{HF}$, then “uncalibrated” • Seed for reconstruction • Non-panoramic • Calibrated image pairs • Do not triangulate panoramic image pairs, to avoid degenerate points • [EstimateInitialTwoViewGeometry] Scene Graph Augmentation 129
  • 130. • Frequent problem in Internet photos • Watermarks, timestamps, and frames (WTFs) • Incorrectly link images of different landmarks • Two assumptions for WTFs • (1) watermarks and frames always have the exact same appearance • (2) all WTFs are typically close to the border of the image • Procedure • Estimate a translation transformation with $N_T$ inliers at the image borders • Any image pair whose border inlier ratio $N_T/N_F$ exceeds a threshold is considered a WTF and is not inserted into the scene graph • [DetectWatermark] Scene Graph Augmentation 130 T. Weyand, C. Y. Tsai, and B. Leibe, “Fixing WTFs: Detecting image matches caused by watermarks, timestamps, and frames in internet photos,” in Proc. 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), 2015, pp. 1185–1192.
  • 131. • Motivation • Choosing the next best view is critical, as every decision impacts the remaining reconstruction. • A single bad decision may lead to a cascade of camera mis-registrations. • Diverse strategies • MAX_VISIBLE_POINTS_NUM • To choose the image that sees most triangulated points • MAX_VISIBLE_POINTS_RATIO • Higher ratio of visible points on observations • MIN_UNCERTAINTY • More visible points and a more uniform distribution of points • [IncrementalMapper::FindNextImages] Next Best View Selection 131
  • 132. • Refinement using multiple view triangulation • [TriangulateMultiViewPoint] • Cheirality constraint • Positive depth with respect to the camera views • [HasPointPositiveDepth] • Sufficient triangulation angle • The angle between the two viewing rays should be sufficiently large. • [CalculateTriangulationAngle] Robust and Efficient Triangulation 132
  • 133. • Before BA: Re-Triangulation • To improve the completeness of the reconstruction by continuing the tracks of points that previously failed to triangulate • continue tracks with observations whose errors are below the filtering thresholds • [IncrementalTriangulator::Retriangulate] • After BA: Filtering • Filter observations with large reprojection errors • Enforcing a minimum triangulation angle over all pairs of viewing rays • [Reconstruction::FilterAllPoints3D] • Iterative refinement • Perform RT, BA, and filtering in an iterative optimization • until the number of filtered observations and post-BA re-triangulated points diminishes • [IterativeGlobalRefinement] Bundle Adjustment 133
  • 134. • Middlebury Temple • 312 images Demo 134
  • 135. • Middlebury Dino • 363 images Demo 135
  • 136. • Building • 128 images Demo 136
  • 138. SLAM Research & Job Trend in CVPR 2018 138
  • 139. • Computer Vision and Pattern Recognition 2018 • Salt Lake City, Utah • June 18-22, 2018 • 979 accepted papers • 6512 registered attendees CVPR 2018 139
  • 140. • http://visualslam.ai/ • Deep learning for visual SLAM review Deep Learning for Visual SLAM 140
  • 143. • Naver Labs • https://recruit.naverlabs.com/labs/recruitMain Recruitment Expo 143
  • 144. • Here • https://www.here.com/en Recruitment Expo 144 https://www.youtube.com/watch?v=54J_ZCbeJdc
  • 145. • DiDi Chuxing • https://www.didiglobal.com/#/ Recruitment Expo 145
  • 146. • Vtrus - Robotics scientist , SLAM/Computer Vision • https://www.vtr.us/ Recruitment Expo 146
  • 147. • Skydio • https://www.skydio.com/ Recruitment Expo 147 https://www.youtube.com/watch?v=Gh5pAT1o2V8
  • 148. • DroneDeploy • https://www.dronedeploy.com/ Recruitment Expo 148 https://www.youtube.com/watch?v=NS8WLnoFqyE
  • 150. • Multiple View Geometry in Computer Vision - Richard Hartley and Andrew Zisserman • Numerical Optimization - Jorge Nocedal and Stephen J. Wright • Modern C++ Course 2018 • http://www.ipb.uni-bonn.de/teaching/modern-cpp/ • 따라하며 배우는 C++ 2018 (Korean lecture series, “Learning C++ by Following Along”) • https://www.youtube.com/playlist?list=PLNfg4W25Tapw5Yx4yuExHNybBIUk68aNz • Photogrammetry (Computer vision) • http://www.ipb.uni-bonn.de/photogrammetry-i-ii/ • Multiple view geometry - Prof. D. Cremers from TUM • https://www.youtube.com/playlist?list=PLTBdjV_4f-EJn6udZ34tht9EVIW7lbeo4 Books & Free Lectures 150
  • 151. • SLAM Research KR • https://www.facebook.com/groups/slamkr/ • 로열모 (an open community for robotics) • https://www.facebook.com/groups/KoreanRobotics/ • 오로카 (OROCA, a community for sharing robot technology built with open-source software and hardware) • https://cafe.naver.com/openrt/10561 Research Groups in SNS 151
  • 152. End of Second Week 152