Introduction to Simultaneous Localization and Mapping (SLAM)
Gareth Cross
Skydio
September 2020
© 2020 Skydio
What is SLAM?
Simultaneous Localization and Mapping
• Localization: Recover the state of a vehicle or sensor platform, usually over multiple time-steps.
• Mapping: Recover the location of landmarks in some common reference frame.
Simultaneous: We must do these tasks at the same time, as both quantities are initially unknown.
An age-old practice
Image Source: A History of Ancient Geography among the Greeks and Romans from the Earliest Ages till the Fall of the Roman Empire, via Wikipedia
Image Source: COLMAP / Schönberger, Johannes Lutz and Frahm, Jan-Michael, “Structure From Motion Revisited”, CVPR 2016
SLAM at Skydio
Visual Inertial Odometry (VIO) on the Skydio drone, an embedded system.
SLAM vs. Localization
Image Source: E. Kaplan, C. Hegarty, Understanding GPS: Principles and Applications, 2005
Video Source: Skydio
Formulating a SLAM Problem
For every SLAM problem, we have two key ingredients:
1) One or more sensors:
• Cameras (Source: MatrixVision)
• Inertial Measurement Unit (Source: Lord MicroStrain)
• LiDAR/Range-finders (Source: Velodyne)
• RGB-D/Structured Light (Source: Occipital)
Formulating a SLAM Problem
2) A set of states we wish to recover.
• Ego-motion: rotation, position, velocity
• World structure (map)
• Calibration parameters
Image Source: DroneTest
Sensor Selection
• Choice of sensor will drive many downstream design considerations.
• Consider the sensor measurement model: the sensor output is a function of the ego-motion, the map, and the calibration parameters, corrupted by noise.
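As one concrete, hypothetical instance of such a measurement model, consider a pinhole camera: the ego-motion is the camera pose (R_wc, t_wc), the map is a 3D landmark, the calibration parameters are the intrinsics fx, fy, cx, cy, and the noise is assumed to be additive zero-mean Gaussian pixel noise. The function names and numbers below are illustrative, not taken from the talk.

```python
import numpy as np

def project(R_wc, t_wc, landmark_w, fx, fy, cx, cy):
    """Pinhole projection of a world landmark into pixel coordinates.

    R_wc, t_wc: camera-to-world pose, i.e. p_w = R_wc @ p_c + t_wc (ego-motion).
    fx, fy, cx, cy: camera intrinsics (calibration parameters).
    """
    # Transform the world point into the camera (sensor) frame.
    p_c = R_wc.T @ (landmark_w - t_wc)
    if p_c[2] <= 0:
        raise ValueError("landmark is behind the camera")
    # Perspective division followed by the intrinsic calibration.
    u = fx * p_c[0] / p_c[2] + cx
    v = fy * p_c[1] / p_c[2] + cy
    return np.array([u, v])

def measure(R_wc, t_wc, landmark_w, intrinsics, sigma_px, rng):
    """Simulated sensor output: model prediction plus zero-mean Gaussian noise."""
    return project(R_wc, t_wc, landmark_w, *intrinsics) + rng.normal(0.0, sigma_px, size=2)

rng = np.random.default_rng(0)
intrinsics = (500.0, 500.0, 320.0, 240.0)
R = np.eye(3)
t = np.zeros(3)
point = np.array([0.2, -0.1, 4.0])
print(project(R, t, point, *intrinsics))           # noise-free prediction: (345, 227.5)
print(measure(R, t, point, intrinsics, 1.0, rng))  # noisy measurement near that pixel
```

Keeping the noise-free model (`project`) separate from the noise makes it reusable later as the prediction inside a residual.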
High Level Goal
Take many measurements (possibly from many sensors) and recover the ego-motion, map, and calibration parameters.
[Figure: SLAM System block diagram.]
Example - Calibration
• Temperature-dependent changes in the camera intrinsics (e.g., lens distortion) may also introduce unexpected errors into vision estimates. (Image Source: Skydio)
• Tire inflation will affect the scale of wheel odometry, as could slippage between the tire and the road surface. (Image source: MotorTrend.com)
• Unmodeled extrinsic rotation between the IMU and camera may cause increased drift in a visual SLAM pipeline. (Image source: MWee RF Microwave)
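To make the tire example concrete: wheel odometry typically converts wheel revolutions to distance through the rolling radius, so an error in the assumed radius becomes a proportional scale error in every distance estimate. The sketch below assumes no slip, and the radii are hypothetical values chosen for illustration.

```python
import math

def wheel_odometry_distance(revolutions: float, radius_m: float) -> float:
    """Distance travelled assuming no slip: one revolution covers one circumference."""
    return revolutions * 2.0 * math.pi * radius_m

true_radius = 0.300     # hypothetical effective radius of an under-inflated tire (m)
assumed_radius = 0.306  # hypothetical nominal radius used by the odometry model (m)
revs = 1000.0

true_dist = wheel_odometry_distance(revs, true_radius)
est_dist = wheel_odometry_distance(revs, assumed_radius)
scale_error = est_dist / true_dist  # ratio of the radii: the estimate is 2% too long
print(f"true {true_dist:.1f} m, estimated {est_dist:.1f} m, scale error {scale_error:.3f}")
```

Because the error is a pure scale factor, it is a natural candidate for a calibration state that the SLAM system estimates alongside the ego-motion.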
Example - Rolling Shutter
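When selecting a camera sensor for your platform, you can choose between a global shutter, which exposes every pixel at the same instant, and a rolling shutter, which exposes the image one row at a time.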
Image credit: LucidVision
Example - Rolling Shutter
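Rolling shutter deforms rigid objects such as the horizon line and the vehicle itself.
• Global-shutter model: the measured location in the image is the camera projection of a point in the world, transformed into the sensor frame using the pose of the camera at time t0.
• Rolling-shutter model: because each row is captured at a different time, the transformation of world points into the sensor frame must account for the camera's motion during readout. Including these higher-order derivatives of the pose (e.g., velocity) in the measurement model increases computational cost.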
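A minimal sketch of the difference, under simplifying assumptions that are ours rather than the talk's: the camera does not rotate, it translates with a constant velocity, and image row r is exposed at time t0 + r * row_time. Since the exposure time depends on which row the point lands in, the rolling-shutter projection is solved with a few fixed-point iterations. All names and numbers are hypothetical.

```python
import numpy as np

def project_pinhole(p_c, fx, fy, cx, cy):
    """Project a point already expressed in the camera frame to pixels."""
    return np.array([fx * p_c[0] / p_c[2] + cx, fy * p_c[1] / p_c[2] + cy])

def project_rolling_shutter(p_w, cam_pos_t0, cam_vel, row_time, intrinsics, iters=5):
    """Rolling-shutter projection assuming identity camera rotation, constant
    camera velocity `cam_vel`, and row r exposed at t0 + r * row_time."""
    dt = 0.0  # first pass is the global-shutter projection at t0
    for _ in range(iters):
        cam_pos = cam_pos_t0 + cam_vel * dt        # camera position when this row is read out
        uv = project_pinhole(p_w - cam_pos, *intrinsics)
        dt = uv[1] * row_time                       # row (v coordinate) -> exposure time offset
    return uv

intrinsics = (500.0, 500.0, 320.0, 240.0)
point = np.array([0.2, -0.1, 4.0])
global_uv = project_pinhole(point, *intrinsics)
rolling_uv = project_rolling_shutter(point, np.zeros(3), np.array([2.0, 0.0, 0.0]), 30e-6, intrinsics)
print(global_uv, rolling_uv)  # u shifts by roughly 1.7 px for a camera moving at 2 m/s
```

The extra velocity state and the iteration are where the added computational cost of the rolling-shutter model comes from.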
What about noise?
• All sensors exhibit some minimum amount of noise.
• We distinguish between noise and model error.
• Noise: a random error that can only be modeled via statistical means. Example: thermal electrical noise.
• Model error: an error resulting from a limitation in our sensor model. Example: failure to include a calibration parameter.
Uncertainty
• Owing to noise in the sensor inputs, SLAM is an inherently uncertain process.
• We can never recover the “true” states, only uncertain estimates of them.
• More measurements usually mean reduced uncertainty…
• … but they also mean increased computational cost.
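A small simulation with hypothetical numbers illustrates both this point and the noise/model-error distinction: averaging N noisy measurements shrinks the spread of the estimate roughly as σ/√N, while a constant model error (here an unmodeled bias) is not reduced by taking more measurements.

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 10.0
sigma = 0.5   # std-dev of zero-mean Gaussian noise (hypothetical)
bias = 0.2    # unmodeled systematic error, e.g. a missing calibration term (hypothetical)

def spread_and_error(n_measurements, n_trials=2000):
    """Average n noisy, biased measurements; repeat over many trials and report
    the spread of the averaged estimate and its mean error."""
    samples = true_value + bias + rng.normal(0.0, sigma, size=(n_trials, n_measurements))
    estimates = samples.mean(axis=1)
    return estimates.std(), estimates.mean() - true_value

for n in (1, 10, 100):
    spread, error = spread_and_error(n)
    print(f"N={n:3d}  spread={spread:.3f} (sigma/sqrt(N)={sigma / np.sqrt(n):.3f})  mean error={error:.3f}")
```

The spread column falls with N (at the cost of processing N times as much data), while the mean error stays near the 0.2 bias.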
The Map
Choice of sensor may also influence map parameterization.
• A collection of photographs? (Image Source: Noah Snavely)
• 2D LiDAR scans? (Image Source: B. Bellekens et al., A Benchmark Survey of Rigid 3D Point Cloud Registration Algorithms, 2015)
• 3D range images? (Image Source: Skydio)
Design Trade-offs
[Figure: trade-off between computational cost, sensor cost, and solution error. The corner where all three are minimized is where we'd like to be (impossible).]
Factor Graphs
Factor Graphs are a convenient method of graphically representing a SLAM problem.
• Nodes represent states.
• Edges (factors) represent information about the states, in the form of measurements or priors.
• Factors may be unary, binary, ternary, etc.
• In a SLAM problem, we will typically have nodes for our ego-motion, map, and calibration parameters.
Factor Graphs
There is a mapping from the factor graph to our sensor measurements: each factor encodes a measurement model (or prior) evaluated on the states it connects.
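The mapping can be sketched as a tiny data structure. This is a hypothetical minimal design, not the API of any particular library: variables are named vectors, each factor stores the keys it touches, a residual function, and the inverse of its measurement covariance, and the total cost is the sum of squared, covariance-weighted residuals.

```python
import numpy as np

class FactorGraph:
    """Minimal factor graph: variables are named numpy vectors, factors are
    residual functions over a subset of variables, weighted by a covariance."""

    def __init__(self):
        self.values = {}    # variable name -> current estimate
        self.factors = []   # (keys, residual_fn, information matrix)

    def add_variable(self, key, initial_value):
        self.values[key] = np.asarray(initial_value, dtype=float)

    def add_factor(self, keys, residual_fn, covariance):
        # One key makes a unary factor (e.g. a prior), two keys a binary factor, etc.
        info = np.linalg.inv(np.atleast_2d(covariance))
        self.factors.append((tuple(keys), residual_fn, info))

    def cost(self):
        """Sum of squared Mahalanobis norms of all factor residuals."""
        total = 0.0
        for keys, residual_fn, info in self.factors:
            r = np.atleast_1d(residual_fn(*(self.values[k] for k in keys)))
            total += float(r @ info @ r)
        return total

# Hypothetical 1D example: two poses, a prior on x0, and one odometry measurement of 1.0.
graph = FactorGraph()
graph.add_variable("x0", [0.0])
graph.add_variable("x1", [0.9])
graph.add_factor(["x0"], lambda x0: x0 - 0.0, covariance=0.01)                  # unary prior
graph.add_factor(["x0", "x1"], lambda x0, x1: (x1 - x0) - 1.0, covariance=0.04)  # binary odometry
print(graph.cost())  # only the odometry factor is violated: (0.1^2) / 0.04 = 0.25
```

Libraries built for SLAM expose the same ideas (variables, factors, noise models) with far more efficient optimization behind them.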
Real Example: Bundle Adjustment (BA)
• A form of Structure from Motion (SFM).
• Leverages projective geometry to recover 3D landmarks and poses from 2D feature associations.
• Highly scalable and can be quite accurate.
• Using marginalization, the compute cost can be bounded.
Image source: Theia SFM
For more details on SFM, Richard Szeliski’s book, Computer Vision: Algorithms and Applications, is a good jumping-off point.
Structure From Motion (SFM)
Structure from motion splits into the same two tasks as SLAM:
• Localization (recover the state of a vehicle or sensor platform, usually over multiple time-steps): recover the camera pose with respect to the map points.
• Mapping (recover the location of landmarks in some common reference frame): triangulate map points using the camera poses.
Bundle Adjustment is a form of optimization that does these steps jointly.
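The triangulation half can be sketched with the standard linear (DLT) method, shown here as one common choice rather than the method used in the talk. It assumes known 3x4 projection matrices and noise-free pixel observations; the camera setup is hypothetical. In a full BA system a linear estimate like this typically serves as the initial guess that the joint optimization refines.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 camera projection matrices (intrinsics @ [R | t]).
    x1, x2: observed pixel coordinates (u, v) in each image.
    Returns the 3D point in the world frame.
    """
    # Each observation gives two linear constraints on the homogeneous point X:
    # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                  # camera centre at the origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])  # camera centre at x = +0.5 m
X_true = np.array([0.2, -0.1, 4.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate_dlt(P1, P2, x1, x2))  # recovers (0.2, -0.1, 4.0)
```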
Typical SLAM Pipeline w/ BA
[Figure: incoming frames (t, t - 1, t - 2) flow through four numbered stages: compute feature associations, compute pose of camera, outlier rejection, and bundle adjustment (optimization), alongside a map and a set of keyframes.]
Keyframes store our estimates of the ego-motion.
How do we get feature associations?
• Descriptor matching. Examples: SIFT, KAZE, ORB, SuperPoint. (Image Source: Georgia Tech)
• Feature tracking / flow. Examples: optical flow, Lucas-Kanade tracking, FlowNet.
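Descriptor matching can be sketched as brute-force nearest-neighbour search with a ratio test, a widely used heuristic for rejecting ambiguous matches. The sketch assumes SIFT-like float descriptors compared with Euclidean distance (binary descriptors such as ORB would use Hamming distance instead), and the 0.8 ratio is a hypothetical setting.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour matching with a ratio test.

    desc_a: (N, D) and desc_b: (M, D) float descriptors, M >= 2.
    Returns a list of (index_a, index_b) matches.
    """
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        best, second = np.argsort(row)[:2]
        # Keep the match only if it is clearly better than the runner-up,
        # which rejects many ambiguous (likely outlier) associations.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches

# Hypothetical data: image B sees the same features as A, shuffled and slightly perturbed.
rng = np.random.default_rng(1)
desc_a = rng.normal(size=(20, 32))
perm = rng.permutation(20)
desc_b = desc_a[perm] + 0.01 * rng.normal(size=(20, 32))
matches = match_descriptors(desc_a, desc_b)
print(len(matches))
```

Even with a ratio test, some wrong associations survive on real images, which is why the pipeline still needs an explicit outlier-rejection stage.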
Design Trade-offs
A “rule of thumb” principle to consider in selecting features (axes not to scale):
[Figure: computational cost vs. robustness for Lucas-Kanade, ORB, SIFT, and KAZE.]
Deep networks are somewhat difficult to place since they offer an adjustable cost-robustness trade-off.
Outliers in Feature Association
Outliers
• Outliers: Data that does not agree with our sensor model.
• How do we deal with them?
• Let’s review a (very simple) toy problem:
State: alpha and beta
Measurement
Toy Problem
RANSAC (Random Sample Consensus)
• Pros:
  • Dead simple to implement: draw K samples (the minimal set needed to fit the model), solve, count inliers, repeat.
  • Easy to wrap around an existing method.
  • Trivially parallelized. Have more CPU time? Sample more.
• Cons:
  • Relatively weak (probabilistic) guarantees.
  • Can require a lot of iterations for high outlier fractions or models with a large K.
  • Hyper-parameters (e.g., inlier threshold, iteration count) need tuning.
… but still quite useful in practice.
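The "draw K, solve, count, repeat" loop can be sketched for a line-fitting toy problem. The measurement model y = αx + β is our assumption for the toy problem (so K = 2), and the threshold, iteration count, and data are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = alpha * x + beta."""
    A = np.column_stack([x, np.ones_like(x)])
    alpha, beta = np.linalg.lstsq(A, y, rcond=None)[0]
    return alpha, beta

def ransac_line(x, y, n_iters=100, threshold=0.1, rng=None):
    """RANSAC: draw K=2 points, solve, count inliers, repeat; refit on the best inlier set."""
    rng = np.random.default_rng() if rng is None else rng
    best_inliers = np.zeros(len(x), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(x), size=2, replace=False)  # minimal sample for a line
        if x[i] == x[j]:
            continue  # degenerate sample: the slope is undefined
        alpha = (y[j] - y[i]) / (x[j] - x[i])
        beta = y[i] - alpha * x[i]
        inliers = np.abs(y - (alpha * x + beta)) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final estimate uses all inliers of the best hypothesis.
    return fit_line(x[best_inliers], y[best_inliers]), best_inliers

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.02, 100)   # inliers: alpha = 2, beta = 1
outlier_idx = rng.choice(100, size=30, replace=False)
y[outlier_idx] = rng.uniform(-20.0, 40.0, 30)     # 30% gross outliers
(alpha, beta), inliers = ransac_line(x, y, n_iters=100, threshold=0.1, rng=rng)
print(alpha, beta, inliers.sum())
```

A plain least-squares fit on the same data would be pulled far from the true line by the outliers; RANSAC only lets the consensus set into the final fit.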
RANSAC
[Figure: # iterations (0 to 750) vs. outlier fraction (0% to 80%), for minimal samples of 2, 3, and 4 points.]
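Curves like these follow from a standard argument, though the exact confidence level used for the plot is not stated. If ε is the outlier fraction and K the sample size, a sample is all-inlier with probability (1 − ε)^K, so drawing at least one clean sample with probability p needs N ≥ log(1 − p) / log(1 − (1 − ε)^K) iterations. The 0.99 confidence below is an assumed default.

```python
import math

def ransac_iterations(outlier_fraction, sample_size, confidence=0.99):
    """Iterations needed so that, with probability `confidence`, at least one
    minimal sample contains only inliers."""
    p_clean = (1.0 - outlier_fraction) ** sample_size  # P(one sample is all inliers)
    if p_clean >= 1.0:
        return 1  # no outliers: any single sample works
    if p_clean <= 0.0:
        raise ValueError("every sample contains an outlier")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))

for k in (2, 3, 4):
    print(k, [ransac_iterations(e / 10, k) for e in range(0, 9)])
```

The exponent K is why larger minimal samples blow up so quickly at high outlier fractions.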
Typical SLAM Pipeline w/ BA
Typical algorithms for computing the pose of the camera:
• PnP
• Essential Matrix
• Homography
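As an illustration of the last item, a homography between two views of a plane can be estimated with the Direct Linear Transform from four or more correspondences. This sketch omits coordinate normalization (recommended for noisy data) and would normally be wrapped in RANSAC; the ground-truth homography and points are hypothetical.

```python
import numpy as np

def estimate_homography_dlt(src, dst):
    """Direct Linear Transform estimate of H such that dst ~ H @ src.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary overall scale

def apply_homography(H, pts):
    """Map (N, 2) points through H, including the perspective division."""
    pts_h = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]

H_true = np.array([[1.1, 0.05, 3.0], [-0.02, 0.95, -2.0], [1e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [5, 3], [2, 7]], dtype=float)
dst = apply_homography(H_true, src)
H_est = estimate_homography_dlt(src, dst)
print(np.round(H_est, 4))
```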
BA as a Factor Graph
[Figure: BA factor graph. A bundle of rays connects camera states (position, orientation) and camera calibrations to landmarks.]
BA as a SLAM Problem
• How do we actually recover the states, given the measurements and our model?
• Measurements: feature tracks form a ‘sensor’ measurement.
• States: camera poses, landmark positions, calibration parameters.
• Measurement model: given by the projective geometry of the problem.
We need to fill in the box that turns the measurements and the model into estimates of the states.
Solving the Problem
• We can use a technique called Nonlinear Least Squares (NLS) to do this.
• There are many ways to formulate SLAM problems generally, and we cannot review them all in the time allotted.
• However, this method is widely applicable, typically fast, and straightforward to implement.
• For a much more comprehensive review, I highly recommend State Estimation for Robotics, Tim Barfoot, 2015 (free online).
Assumptions
• We will convert our measurement models into a system of equations.
• Before doing so, we make an additional assumption: the measurement noise is drawn from a zero-mean Gaussian.
• We will also assume we have an initial guess for our states. In a time-recursive system, this could come from the previous frame.
Nonlinear Least Squares
We rewrite our measurements as residual functions, one for each observation of the j’th landmark by the i’th camera pose.
We then concatenate these into a large vector and take its squared Mahalanobis norm, weighting by our assumed measurement uncertainty.
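A standard way to write these quantities, with notation chosen here for illustration (it may differ from the original slides): z_ij is the observed pixel location, h the projection model, x_i and l_j the i'th pose and j'th landmark, y the vector of all states, and Σ the block-diagonal measurement covariance.

```latex
\mathbf{r}_{ij}(\mathbf{y}) = \mathbf{z}_{ij} - h(\mathbf{x}_i, \mathbf{l}_j),
\qquad
f(\mathbf{y}) = \begin{bmatrix} \mathbf{r}_{11}(\mathbf{y}) \\ \mathbf{r}_{12}(\mathbf{y}) \\ \vdots \end{bmatrix},
\qquad
\lVert f(\mathbf{y}) \rVert_{\Sigma}^{2} = f(\mathbf{y})^{\top}\,\Sigma^{-1} f(\mathbf{y})
```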
Nonlinear Least Squares
Our ‘best estimate’ is the state that minimizes the objective function.
Because f is non-linear for most SLAM problems, we linearize the problem and take a series of steps. The Jacobian J is the linearization of f about our initial guess.
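Written out with illustrative notation (y: all states, f: stacked residuals, Σ: measurement covariance, y_0: current guess, δ: update), and treating the states as a plain vector (rotations would in practice need an on-manifold update), the objective and its first-order linearization are:

```latex
\mathbf{y}^{*} = \arg\min_{\mathbf{y}} \; f(\mathbf{y})^{\top}\,\Sigma^{-1} f(\mathbf{y}),
\qquad
f(\mathbf{y}_0 + \delta) \approx f(\mathbf{y}_0) + J\,\delta,
\qquad
J = \left.\frac{\partial f}{\partial \mathbf{y}}\right|_{\mathbf{y}_0}
```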
Nonlinear Least Squares
The solution at each iteration is obtained by solving a linear system:
• The system matrix is a first-order approximation of the Hessian. Inverting it has complexity O(|y|³).
• Each residual is weighted by its inverse uncertainty.
When linearized about the converged solution, the inverted Hessian doubles as a first-order approximation of the marginal covariance of our estimate.*
* See Barfoot, Chapters 3 and 4.
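With illustrative notation (J: Jacobian of the stacked residual f at the current guess y_0, Σ: measurement covariance), setting the gradient of the linearized objective to zero gives the Gauss-Newton step and the first-order covariance. Here J^T Σ^{-1} J is the Hessian approximation, and solving with it is the O(|y|³) operation.

```latex
\delta^{*} = -\left(J^{\top}\Sigma^{-1}J\right)^{-1} J^{\top}\Sigma^{-1} f(\mathbf{y}_0),
\qquad
\mathbf{y}_0 \leftarrow \mathbf{y}_0 + \delta^{*},
\qquad
\operatorname{cov}(\mathbf{y}^{*}) \approx \left(J^{\top}\Sigma^{-1}J\right)^{-1}
```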
Sparsity
[Figure: sparsity pattern of the problem; a binary factor couples only the two states it connects.]
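In BA, each reprojection factor links one camera and one landmark, so the landmark-landmark block of the approximate Hessian is block-diagonal. A common way to exploit this is the Schur complement (the same algebra underlies marginalization): eliminate the landmarks cheaply, solve a small reduced camera system, then back-substitute. The random system below is a hypothetical stand-in for a real BA problem.

```python
import numpy as np

def solve_schur(H_cc, H_cl, landmark_blocks, b_c, b_l):
    """Solve [[H_cc, H_cl], [H_cl^T, H_ll]] [dc; dl] = [b_c; b_l] where H_ll is
    block-diagonal with one 3x3 block per landmark, using the Schur complement."""
    n_l = len(landmark_blocks)
    # Inverting block-diagonal H_ll is cheap: each 3x3 block is inverted on its own.
    # (Stored as a dense matrix here for brevity; real solvers keep it sparse.)
    H_ll_inv = np.zeros((3 * n_l, 3 * n_l))
    for k, block in enumerate(landmark_blocks):
        H_ll_inv[3 * k:3 * k + 3, 3 * k:3 * k + 3] = np.linalg.inv(block)
    # Reduced camera system: only the small camera block is solved densely.
    S = H_cc - H_cl @ H_ll_inv @ H_cl.T
    dc = np.linalg.solve(S, b_c - H_cl @ H_ll_inv @ b_l)
    # Back-substitute to recover the landmark updates.
    dl = H_ll_inv @ (b_l - H_cl.T @ dc)
    return dc, dl

# Hypothetical system: 2 cameras (6 states each) and 5 landmarks (3 states each).
rng = np.random.default_rng(0)
n_c, n_l = 12, 5
A = rng.normal(size=(n_c, n_c))
H_cc = A @ A.T + n_c * np.eye(n_c)
blocks = []
for _ in range(n_l):
    B = rng.normal(size=(3, 3))
    blocks.append(B @ B.T + 3.0 * np.eye(3))
H_cl = 0.1 * rng.normal(size=(n_c, 3 * n_l))
b_c, b_l = rng.normal(size=n_c), rng.normal(size=3 * n_l)

dc, dl = solve_schur(H_cc, H_cl, blocks, b_c, b_l)
print(dc.shape, dl.shape)
```

With thousands of landmarks and a handful of cameras, the dense solve shrinks from the full state size to the camera block only.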
Nonlinear Least Squares
• In the linearized form, the problem is ‘easy’ to solve.
• It reduces to iterated application of weighted least squares.
• Generally, the cost of solving for updates is cubic in the number of states, O(|y|³).
• However, in some problems (like BA) there is sparsity we can leverage to improve this.
• A huge number of problems can be cast this way (given an initial guess).
• NLS can run in a fixed memory footprint → suitable for embedded use cases.
• With the appropriate Σ weights, NLS also produces an approximate estimate of the uncertainty in our solution.
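The "iterated weighted least squares" idea fits in a few lines. The sketch below is plain Gauss-Newton (no damping or robust losses) on a hypothetical problem: locating a 2D point from noisy ranges to four known beacons. It also returns the first-order covariance (J^T Σ^-1 J)^-1 at the final linearization point.

```python
import numpy as np

def gauss_newton(residual_fn, jacobian_fn, y0, Sigma, n_iters=10):
    """Iterated weighted least squares (Gauss-Newton) on f(y) = residual_fn(y).
    Returns the estimate and its first-order covariance."""
    W = np.linalg.inv(Sigma)               # each residual weighted by its inverse uncertainty
    y = np.asarray(y0, dtype=float)
    for _ in range(n_iters):
        f = residual_fn(y)
        J = jacobian_fn(y)
        H = J.T @ W @ J                    # first-order approximation of the Hessian
        y = y + np.linalg.solve(H, -J.T @ W @ f)
    J = jacobian_fn(y)
    return y, np.linalg.inv(J.T @ W @ J)

# Hypothetical problem: estimate a 2D position from noisy ranges to known beacons.
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pos = np.array([3.0, 4.0])
sigma = 0.05
rng = np.random.default_rng(7)
ranges = np.linalg.norm(beacons - true_pos, axis=1) + rng.normal(0.0, sigma, len(beacons))

def residuals(p):
    # f(y) = z - h(y): measured range minus predicted range
    return ranges - np.linalg.norm(beacons - p, axis=1)

def jacobian(p):
    # d/dp of -||b - p|| is (b - p) / ||b - p||
    diff = beacons - p
    return diff / np.linalg.norm(diff, axis=1)[:, None]

y_est, cov = gauss_newton(residuals, jacobian, [5.0, 5.0], sigma**2 * np.eye(len(beacons)))
print(y_est, np.sqrt(np.diag(cov)))  # estimate and 1-sigma uncertainty per axis
```

The memory used is fixed by the number of states and residuals, which is the property that makes this style of solver attractive on embedded hardware.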
Caveats
• Remember our assumptions:
• We needed an initial guess to linearize the system. If the guess is poor, the gradient
used in the optimizer will steer our solution in the wrong direction.
• Additionally, the covariance estimate we get out is only as good as the linearization
point.
• We also assumed Gaussian noise on the measurements.
• Outliers must be removed, or they will dominate the optimization.
Linearization
• It is worth considering the effect of linearization on our uncertainty estimate.
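• For a Gaussian variable u with mean μ and covariance Σ_u, and a non-linear vector function g, we can approximate g(u) ≈ g(μ) + G(u − μ), where G is the Jacobian of g evaluated at μ. The transformed variable is then approximately Gaussian with mean g(μ) and covariance G Σ_u Gᵀ.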
Because we linearized, the fidelity of our first-order Σ
relies on this approximation.
See Barfoot, Chapter 2.
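A quick numerical check of this approximation, using a hypothetical range/bearing-to-Cartesian conversion as g: with small bearing noise, the linearized covariance G Σ_u Gᵀ closely matches a Monte Carlo estimate. Increasing the bearing noise makes the two diverge, which is the loss of fidelity described above.

```python
import numpy as np

def polar_to_cartesian(u):
    """g(u): convert (range, bearing) samples to (x, y)."""
    r, theta = u[..., 0], u[..., 1]
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

def linearized_covariance(mu, Sigma_u):
    """First-order covariance G @ Sigma_u @ G.T, with G the Jacobian of g at the mean."""
    r, theta = mu
    G = np.array([[np.cos(theta), -r * np.sin(theta)],
                  [np.sin(theta),  r * np.cos(theta)]])
    return G @ Sigma_u @ G.T

mu = np.array([10.0, 0.5])                # range 10 m, bearing 0.5 rad (hypothetical)
Sigma_u = np.diag([0.1 ** 2, 0.01 ** 2])  # small noise: linearization should be accurate
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma_u, size=200_000)
mc_cov = np.cov(polar_to_cartesian(samples).T)  # Monte Carlo reference
lin_cov = linearized_covariance(mu, Sigma_u)
print(lin_cov)
print(mc_cov)
```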
Tools
• Some relevant tools:
• GTSAM, open source package created by Frank Dellaert et al.
• Allows specification of problem in factor graph format, built for SLAM.
• g2o
• Includes solutions for SLAM and BA.
• Ceres Solver, produced by Google
• General non-linear least-squares optimizer.
• Python
• scipy.optimize.least_squares
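A minimal sketch of the Python route, on a hypothetical range-only localization problem: `scipy.optimize.least_squares` minimizes the plain sum of squared residuals, so the residuals are divided by σ first to make that sum equal the Mahalanobis cost. The returned `jac` (of the whitened residuals) then gives the first-order covariance (JᵀJ)⁻¹.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical problem: estimate a 2D position from noisy ranges to known beacons.
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pos = np.array([3.0, 4.0])
sigma = 0.05
rng = np.random.default_rng(7)
ranges = np.linalg.norm(beacons - true_pos, axis=1) + rng.normal(0.0, sigma, len(beacons))

def whitened_residuals(p):
    # Dividing by sigma makes the plain sum of squares equal the Mahalanobis cost.
    return (ranges - np.linalg.norm(beacons - p, axis=1)) / sigma

result = least_squares(whitened_residuals, x0=np.array([5.0, 5.0]))
J = result.jac                        # Jacobian of the whitened residuals at the solution
covariance = np.linalg.inv(J.T @ J)   # first-order covariance estimate
print(result.x, np.sqrt(np.diag(covariance)))
```

For large, sparse problems the same function accepts a `jac_sparsity` pattern, but dedicated SLAM libraries are usually a better fit at that scale.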
BA on Real-Time Systems
• BA can operate at small and large scale.
• Small: A few image frames on a mobile phone.
• Large: Tens of thousands of images at city scale.
• Fairly straightforward to implement.
• But:
• Robust association may require expensive descriptors.
• After feature association, we must devote nontrivial compute to outlier rejection.
• Update rate limited to camera frame rate (slow).
VIO as a Factor Graph: BA + IMU
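[Figure: VIO factor graph. BA factors (bundle of rays, camera calibrations, landmarks) are combined with IMU/motion-model factors between states holding position, orientation, and velocity, plus biases/IMU calibration parameters.]
Sensors: cameras (Source: MatrixVision) and an inertial measurement unit (Source: Lord MicroStrain).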
VIO
• One of the most successful adaptations of vision research to the market.
• Present in smart phones, AR/VR headsets, drones, autonomous vehicles.
• Camera and IMU are highly complementary:
• Camera:
• Low update rate, high compute cost, subject to outlier data.
• Able to relocalize accurately at large distances.
• IMU:
• High update rate, low compute cost, few outliers (maybe saturation).
• Accurate over short intervals, but drifts over time.
• Able to recover attitude with respect to global reference frame (gravity).
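The "accurate over short intervals, but drifts over time" behavior can be seen by dead-reckoning a stationary vehicle with a small uncorrected accelerometer bias: double integration turns a constant bias b into a position error of about ½·b·t². The rate, duration, and bias below are hypothetical, and the model is 1D only.

```python
import numpy as np

def integrate_accelerometer(accel, dt):
    """Dead-reckon 1D position from accelerometer samples (semi-implicit Euler)."""
    vel, pos = 0.0, 0.0
    positions = []
    for a in accel:
        vel += a * dt       # integrate acceleration -> velocity
        pos += vel * dt     # integrate velocity -> position
        positions.append(pos)
    return np.array(positions)

dt = 0.01                        # 100 Hz IMU (hypothetical)
n_steps = 1000                   # 10 seconds of data
bias = 0.01                      # uncorrected accelerometer bias in m/s^2 (hypothetical)
true_accel = np.zeros(n_steps)   # the vehicle is actually at rest
drift = integrate_accelerometer(true_accel + bias, dt)
for n in (100, 500, 1000):
    t = n * dt
    print(f"t = {t:4.1f} s: position error {drift[n - 1]:.4f} m (0.5*b*t^2 = {0.5 * bias * t**2:.4f} m)")
```

The quadratic growth is why the camera's ability to relocalize, and the estimation of IMU biases as states, complement the IMU so well.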
VIO
[Figure: the BA pipeline from earlier (incoming frames t, t - 1, t - 2; compute feature associations; compute pose of camera; outlier rejection; bundle adjustment; map; keyframes), annotated: “IMU can deliver substantial value here.”]
BA/VIO Implementations
• Existing open-source implementations (not exhaustive):
• OpenMVG
• COLMAP: offline SFM and multi-view stereo (MVS)
• CMVS: multi-view stereo
• ORB-SLAM2: real-time SLAM featuring BA optimization
• PTAM: one of the earliest functional visual-SLAM demos
• VINS-Mono: VIO, runs on a mobile device
• Basalt: VIO
• ROVIO: VIO, an example of a direct method
Fin
• Additional Reading:
• State Estimation for Robotics (Barfoot, 2015)
• Factor Graphs for Robot Perception (Dellaert and Kaess, 2017)
• Visual Odometry (Scaramuzza and Fraundorfer, 2011)
• Probabilistic Robotics (Thrun, Burgard, and Fox, 2005)
• GTSAM Software Library
Questions? Feel free to reach out: gareth@skydio.com
58

More Related Content

PPTX
Hable John Uncharted2 Hdr Lighting
PDF
Depth Fusion from RGB and Depth Sensors by Deep Learning
PPT
Crysis Next-Gen Effects (GDC 2008)
PPTX
Lighting the City of Glass
PDF
쉐도우맵을 압축하여 대규모씬에 라이팅을 적용해보자
PPTX
Stochastic Screen-Space Reflections
PPTX
Moving Frostbite to Physically Based Rendering
PPTX
light-detection-and-ranging(lidar)
Hable John Uncharted2 Hdr Lighting
Depth Fusion from RGB and Depth Sensors by Deep Learning
Crysis Next-Gen Effects (GDC 2008)
Lighting the City of Glass
쉐도우맵을 압축하여 대규모씬에 라이팅을 적용해보자
Stochastic Screen-Space Reflections
Moving Frostbite to Physically Based Rendering
light-detection-and-ranging(lidar)

What's hot (20)

PDF
SfMLearner++ Intro
PDF
Precomputed atmospheric scattering(사전 계산 대기 산란)
PPTX
Digital image processing
PDF
=SLAM ppt.pdf
PDF
Introduction of slam
PPTX
Digital image processing
PPTX
Single image haze removal
PPT
Raspberrypi best ppt
PPT
The single image dehazing based on efficient transmission estimation
KEY
Preemptive RANSAC by David Nister.
PPTX
satellite Communication
PPTX
Rendering Technologies from Crysis 3 (GDC 2013)
PPTX
Global Illumination
PDF
Taking Killzone Shadow Fall Image Quality Into The Next Generation
PPTX
Physically Based and Unified Volumetric Rendering in Frostbite
PDF
Physically Based Lighting in Unreal Engine 4
PDF
30th コンピュータビジョン勉強会@関東 DynamicFusion
PDF
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
PDF
FastDepth: Fast Monocular Depth Estimation on Embedded Systems
SfMLearner++ Intro
Precomputed atmospheric scattering(사전 계산 대기 산란)
Digital image processing
=SLAM ppt.pdf
Introduction of slam
Digital image processing
Single image haze removal
Raspberrypi best ppt
The single image dehazing based on efficient transmission estimation
Preemptive RANSAC by David Nister.
satellite Communication
Rendering Technologies from Crysis 3 (GDC 2013)
Global Illumination
Taking Killzone Shadow Fall Image Quality Into The Next Generation
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based Lighting in Unreal Engine 4
30th コンピュータビジョン勉強会@関東 DynamicFusion
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
FastDepth: Fast Monocular Depth Estimation on Embedded Systems
Ad

Similar to “Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentation from Skydio (20)

PDF
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
PDF
“Tackling Extreme Visual Conditions for Autonomous UAVs In the Wild,” a Prese...
PPTX
From STC (Stereo Camera onboard on Bepi Colombo ESA Mission) to Blender
PPTX
CHAPTER - 4 for software engineering (1).pptx
PDF
“Removing Weather-related Image Degradation at the Edge,” a Presentation from...
PPTX
Realtime pothole detection system using improved CNN Models
PPTX
GRT Imaging for Seismic AVO/AVA Inversion
PDF
Optimized Rendering Techniques for Mobile VR
PPTX
2 Prelaunch Assessment of the NG VCM.pptx
PPTX
Explaining the decisions of image/video classifiers
PPTX
GRT Imaging for Seismic AVO/AVA Inversion
PDF
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
PDF
How Deep Learning Could Predict Weather Events
PDF
“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...
PDF
IRJET- Front View Identification of Vehicles by using Machine Learning Te...
PDF
Instalatii_petroliere_03_EN
PDF
"3D from 2D: Theory, Implementation, and Applications of Structure from Motio...
PPTX
IGARSS-SAR-Pritt.pptx
PPTX
IGARSS-SAR-Pritt.pptx
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Tackling Extreme Visual Conditions for Autonomous UAVs In the Wild,” a Prese...
From STC (Stereo Camera onboard on Bepi Colombo ESA Mission) to Blender
CHAPTER - 4 for software engineering (1).pptx
“Removing Weather-related Image Degradation at the Edge,” a Presentation from...
Realtime pothole detection system using improved CNN Models
GRT Imaging for Seismic AVO/AVA Inversion
Optimized Rendering Techniques for Mobile VR
2 Prelaunch Assessment of the NG VCM.pptx
Explaining the decisions of image/video classifiers
GRT Imaging for Seismic AVO/AVA Inversion
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
How Deep Learning Could Predict Weather Events
“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...
IRJET- Front View Identification of Vehicles by using Machine Learning Te...
Instalatii_petroliere_03_EN
"3D from 2D: Theory, Implementation, and Applications of Structure from Motio...
IGARSS-SAR-Pritt.pptx
IGARSS-SAR-Pritt.pptx
Ad

More from Edge AI and Vision Alliance (20)

PDF
“Visual Search: Fine-grained Recognition with Embedding Models for the Edge,”...
PDF
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
PDF
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
PDF
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
PDF
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
PDF
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
PDF
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
PDF
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
PDF
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
PDF
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
PDF
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
PDF
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
PDF
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
PDF
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
PDF
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
PDF
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
PDF
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...
“Visual Search: Fine-grained Recognition with Embedding Models for the Edge,”...
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...

Recently uploaded (20)

PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Machine learning based COVID-19 study performance prediction
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
Cloud computing and distributed systems.
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
Spectroscopy.pptx food analysis technology
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Network Security Unit 5.pdf for BCA BBA.
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
20250228 LYD VKU AI Blended-Learning.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Machine learning based COVID-19 study performance prediction
The AUB Centre for AI in Media Proposal.docx
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Cloud computing and distributed systems.
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Per capita expenditure prediction using model stacking based on satellite ima...
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Reach Out and Touch Someone: Haptics and Empathic Computing
Spectroscopy.pptx food analysis technology
Review of recent advances in non-invasive hemoglobin estimation
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Network Security Unit 5.pdf for BCA BBA.
“AI and Expert System Decision Support & Business Intelligence Systems”
MIND Revenue Release Quarter 2 2025 Press Release
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx

“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentation from Skydio

  • 1. © 2020 Your Company Name Introduction To Simultaneous Localization and Mapping (SLAM) Gareth Cross Skydio September, 2020
  • 2. © 2020 Skydio What is SLAM? 2 Simultaneous Localization and Mapping Recover state of a vehicle or sensor platform, usually over multiple time-steps. Recover location of landmarks in some common reference frame. Simultaneous: We must do these tasks at the same time, as both quantities are initially unknown.
  • 3. © 2020 Skydio An age-old practice 3 Image Source: A History of Ancient Geography among the Greeks and Romans from the Earliest Ages till the Fall of the Roman Empire via Wikipedia
  • 4. © 2020 Skydio 4 Image Source: COLMAP / Schönberger, Johannes Lutz and Frahm, Jan-Michael, “Structure From Motion Revisited”, CVPR 2016
  • 5. © 2020 Skydio SLAM at Skydio 5 Visual Inertial Odometry (VIO) on the Skydio drone, an embedded system.
  • 6. © 2020 Skydio SLAM vs. Localization 6 Image Source: E. Kaplan, C. Hergarty, Understanding GPS Principles and Applications, 2005 Video Source: Skydio
  • 7. © 2020 Skydio Formulating a SLAM Problem For every SLAM problem, we have two key ingredients: 1) One or more sensors: 7 Source: MatrixVision Cameras Source: Lord MicroStrain Inertial Measurement Unit LiDAR/Range-finders Source: Velodyne RGB-D/Structured Light Source: Occipital
  • 8. © 2020 Skydio Formulating a SLAM Problem 2) A set of states we wish to recover. 8 Ego-motion: Rotation, position, velocity World Structure (Map) Image Source: DroneTest Calibration Parameters
  • 9. © 2020 Skydio Sensor Selection • Choice of sensor will drive many downstream design considerations. • Consider the sensor measurement model: 9 Sensor output Ego-motion Noise Calibration parameters Map
  • 10. © 2020 Skydio High Level Goal Take many measurements (possibly from many sensors), and recover the ego-motion, map, and calibration parameters. 10 SLAM System
  • 11. © 2020 Skydio Example - Calibration 11 Intrinsic temperature distortion may also introduce unexpected errors into vision estimates. Image Source: Skydio Tire inflation will affect the scale of wheel odometry, as could slippage between the tire and the road surface. Image source: MotorTrend.com Un-modelled extrinsic rotation between IMU and camera may cause increased drift in a visual SLAM pipeline. Image source: MWee RF Microwave
  • 12. © 2020 Skydio Example - Rolling Shutter When selecting a camera sensor for your platform, you have the choice of global or rolling shutter. 12 Image credit: LucidVision
  • 13. © 2020 Skydio Example - Rolling Shutter 13 Rolling shutter deforms rigid objects like the horizon line and the vehicle itself. Global-shutter model: Measured location in image. Camera projection Pose of the camera at time t0 Point in the world. Rolling-shutter model: Inclusion of higher-order derivatives in the measurement model increases computational cost. Transformation of world points into sensor frame.
  • 14. © 2020 Skydio What about noise? • All sensors exhibit some minimum amount of noise. • We distinguish between noise and model error. 14 A random error that can only be modeled via statistical means. Example: thermal electrical noise. Errors resulting from a limitation in our sensor model. Example: failure to include a calibration parameter.
  • 15. © 2020 Skydio Uncertainty • Owing to noise in the sensor inputs, SLAM is an inherently uncertain process. • We can never recover the “true” states, only uncertain estimates of them. • More measurements usually means reduced uncertainty… • … But it also means increased computational cost. 15
  • 16. © 2020 Skydio The Map Choice of sensor may also influence map parameterization. 16 Collection of photographs? Image Source: Noah Snavely 2D LiDAR scans? Image Source: B. Bellekens et al., A Benchmark Survey of Rigid 3D Point Cloud Registration Algorithms, 2015 3D range images? Image Source: Skydio
  • 17. © 2020 Skydio Design Trade-offs 17 Computational cost Sensor cost Solution error Where we’d like to be (impossible).
  • 18. © 2020 Skydio Factor Graphs Factor Graphs are a convenient method of graphically representing a SLAM problem. 18 Nodes represent states. Edges (factors) represent information about the states, in the form of measurements or priors. Factors may be unary, binary, ternary, etc… In a SLAM problem, we will typically have nodes for our ego- motion, map, and calibration parameters.
  • 19. © 2020 Skydio Factor Graphs There is a mapping from the factor graph to our sensor measurements: 19
  • 20. © 2020 Skydio Real Example: Bundle Adjustment (BA) • A form of Structure from Motion (SFM). • Leverage projective geometry to recover 3D landmarks and poses from 2D feature associations. • Highly scalable and can be quite accurate. • Using marginalization the compute cost can be bounded. 20 Image source: Theia SFM For more details on SFM, see Richard Szeliski’s book as a jumping off point.
  • 21. © 2020 Skydio Structure From Motion (SFM) 21 Simultaneous Localization and Mapping Recover state of a vehicle or sensor platform, usually over multiple time-steps. Recover location of landmarks in some common reference frame. Recover camera pose with respect to map points. Triangulate map points using camera poses. Bundle Adjustment is a form of optimization that does these steps jointly.
  • 22. © 2020 Skydio Typical SLAM Pipeline w/ BA 22 Compute feature associations Map Compute pose of camera Outlier rejection Bundle Adjustment (Optimization) Keyframes Incoming frames t t -1 t - 2 Keyframes store our estimates of the ego-motion. 1 2 3 4
  • 23. © 2020 Skydio How do we get feature associations? 23 Descriptor Matching Examples: SIFT, KAZE, ORB, SuperPoint Image Source: Georgia Tech Feature Tracking / Flow Examples: Optical Flow, Lucas Kanade Tracking, FlowNet
  • 24. © 2020 Skydio Design Trade-offs A “rule of thumb” principle to consider in selecting features (axes not to scale): 24 Computational Cost Robustness Lucas- Kanade SIFT ORB Deep Networks are somewhat difficult to place since they offer an adjustable cost-robustness trade-off. KAZE
  • 25. © 2020 Skydio Outliers in Feature Association 25
  • 26. © 2020 Skydio Outliers • Outliers: Data that does not agree with our sensor model. • How do we deal with them? • Let’s review a (very simple) toy problem: 26 State: alpha and beta Measurement
  • 27. © 2020 Skydio Toy Problem 27
  • 28. © 2020 Skydio Toy Problem 28
  • 29. © 2020 Skydio RANSAC (Random Sample Consensus) 29
  • 37. © 2020 Skydio RANSAC • Pros: • Dead simple to implement: Draw K examples, solve, count, repeat. • Easily wrap around an existing method. • Trivially parallelized. Have more CPU time? Sample more. • Cons: • Relatively weak guarantees. • Can require a lot of iterations for high outlier fractions or models with a large K. • Hyper-parameters need tuning. 37 … but, still quite useful in practice.
  • 38. © 2020 Skydio RANSAC 38 0 150 300 450 600 750 0% 10% 20% 30% 40% 50% 60% 70% 80% # Iterations vs. Outlier Fraction 2 Pts 3 Pts 4 Pts
  • 39. © 2020 Skydio Typical SLAM Pipeline w/ BA 39 Compute feature associations Map Compute pose of camera Outlier rejection Bundle Adjustment (Optimization) Keyframes Incoming frames t t -1 t - 2 1 2 3 4
  • 40. © 2020 Skydio Typical SLAM Pipeline w/ BA 40 Compute feature associations Map Compute pose of camera Outlier rejection Bundle Adjustment (Optimization) Keyframes Incoming frames t t -1 t - 2 1 2 3 4 Typical algorithms: • PnP • Essential Matrix • Homography
  • 41. © 2020 Skydio BA as a Factor Graph 41 Bundle of rays Position, orientation Camera calibrations Landmarks
  • 42. © 2020 Skydio BA as a SLAM Problem • How do we actually recover the states, given the measurements and our model? 42 ? Feature tracks form a ‘sensor’ measurement. Camera poses, landmark positions, calibration parameters. Measurement model is given by the projective geometry of the problem. We need to fill in this box.
  • 43. Solving the Problem • We can use a technique called Nonlinear Least Squares to do this. • There are many ways to formulate SLAM problems, and we cannot review them all in the time allotted. • However, this method is widely applicable, typically fast, and straightforward to implement. • For a much more comprehensive review, I highly recommend State Estimation for Robotics (Tim Barfoot, 2015; free online).
  • 44. Assumptions • We will convert our measurement models into a system of equations. • Before that, we make an additional assumption: the measurement noise is drawn from a zero-mean Gaussian. • We also assume we have an initial guess for our states. In a time-recursive system, this could come from the previous frame.
  • 45. Nonlinear Least Squares We rewrite our measurements as residual functions, one per observation of the j’th landmark by the i’th camera pose, and concatenate these into a large vector. We then take its squared Mahalanobis norm, weighting by our assumed measurement uncertainty.
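The equations on this slide did not survive extraction. One standard way to write them is below; the notation is assumed (x_i is the i’th camera pose, l_j the j’th landmark, z_ij the measurement, h the projection model, y all states stacked, Σ_ij the measurement covariance), and the final equality assumes independent measurements so that Σ is block-diagonal.

```latex
\mathbf{f}_{ij}(\mathbf{y}) = \mathbf{z}_{ij} - h(\mathbf{x}_i, \mathbf{l}_j),
\qquad
\mathbf{f}(\mathbf{y}) =
\begin{bmatrix} \vdots \\ \mathbf{f}_{ij}(\mathbf{y}) \\ \vdots \end{bmatrix},
\qquad
E(\mathbf{y}) = \tfrac{1}{2}\,\mathbf{f}(\mathbf{y})^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{f}(\mathbf{y})
             = \tfrac{1}{2}\sum_{i,j}\mathbf{f}_{ij}^{\top}\boldsymbol{\Sigma}_{ij}^{-1}\mathbf{f}_{ij}
```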
  • 46. Nonlinear Least Squares Our ‘best estimate’ occurs where the objective function is minimized. Because f is non-linear for most SLAM problems, we linearize the problem and take a series of steps. The Jacobian J is the linearization of f about our initial guess.
  • 47. Nonlinear Least Squares The solution at each iteration weights each residual by its inverse uncertainty and uses a first-order approximation of the Hessian, whose inversion has complexity O(|y|³). When linearized about the converged solution, the inverted Hessian doubles as a first-order approximation of the marginal covariance of our estimate.* *See Barfoot, Chapters 3 and 4.
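In the usual Gauss-Newton form (notation assumed, with J = ∂f/∂y), each iteration solves (Jᵀ Σ⁻¹ J) Δy = −Jᵀ Σ⁻¹ f, where Jᵀ Σ⁻¹ J is the first-order Hessian approximation, and (Jᵀ Σ⁻¹ J)⁻¹ at convergence is the approximate covariance. The sketch below applies this to a hypothetical range-only localization problem (a 2D point observed by four beacons) with diagonal Σ; it omits the damping and line searches a production solver would add.

```python
import numpy as np


def gauss_newton(residual_fn, jacobian_fn, y0, sigma, max_iters=20, tol=1e-10):
    """Minimize 0.5 * f(y)^T Sigma^{-1} f(y) for diagonal Sigma = diag(sigma**2)."""
    y = np.asarray(y0, dtype=float)
    W = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)   # Sigma^{-1}
    for _ in range(max_iters):
        f = residual_fn(y)
        J = jacobian_fn(y)                     # J = df/dy at the current guess
        H = J.T @ W @ J                        # first-order approximation of the Hessian
        dy = -np.linalg.solve(H, J.T @ W @ f)  # weighted least-squares step
        y = y + dy
        if np.linalg.norm(dy) < tol:
            break
    J = jacobian_fn(y)
    cov = np.linalg.inv(J.T @ W @ J)           # first-order covariance at the solution
    return y, cov


# Hypothetical problem: 2D position from ranges to four known beacons.
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
p_true = np.array([3.0, 4.0])
z = np.linalg.norm(beacons - p_true, axis=1)          # noise-free ranges


def residual(p):
    return z - np.linalg.norm(beacons - p, axis=1)    # f = z - h(y)


def jacobian(p):
    d = p - beacons
    return -d / np.linalg.norm(d, axis=1)[:, None]


y_hat, cov = gauss_newton(residual, jacobian, y0=[5.0, 5.0], sigma=np.full(4, 0.1))
```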
  • 49. Nonlinear Least Squares • In linearized form, the problem is ‘easy’ to solve. • It reduces to iterated application of weighted least squares. • In general, the cost of solving for updates is cubic in the number of states, O(|y|³). • However, some problems (like BA) have sparsity we can exploit to improve this. • A huge number of problems can be cast this way (given an initial guess). • It can run in a fixed memory footprint, which makes it suitable for embedded use cases. • With appropriate Σ weights, we can show that NLS produces an approximate estimate of the uncertainty in our solution.
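One common way to exploit BA sparsity is the Schur complement. With point landmarks, each reprojection residual touches one camera and one landmark, so the landmark-landmark block of Jᵀ Σ⁻¹ J is block-diagonal (one 3×3 block per landmark). Eliminating the landmarks leaves a small camera-only system, then the landmarks are recovered by back-substitution. This is a dense-NumPy sketch of that idea under those assumptions (`solve_schur` is a hypothetical name); real solvers keep everything sparse.

```python
import numpy as np


def solve_schur(H_cc, H_cl, H_ll_blocks, b_c, b_l):
    """Solve [[H_cc, H_cl], [H_cl^T, H_ll]] [dc; dl] = [b_c; b_l],
    where H_ll is block-diagonal with one 3x3 block per landmark."""
    # Inverting H_ll takes one small 3x3 inversion per landmark, not one big cubic solve.
    inv_blocks = [np.linalg.inv(B) for B in H_ll_blocks]

    def apply_ll_inv(M):
        # Computes H_ll^{-1} @ M one 3-row block at a time.
        return np.concatenate([Binv @ M[3 * k:3 * k + 3]
                               for k, Binv in enumerate(inv_blocks)])

    # Reduced camera-only system: the Schur complement of H_ll.
    S = H_cc - H_cl @ apply_ll_inv(H_cl.T)
    dc = np.linalg.solve(S, b_c - H_cl @ apply_ll_inv(b_l))
    # Back-substitute for the landmark updates.
    dl = apply_ll_inv(b_l - H_cl.T @ dc)
    return dc, dl


# Usage: compare against a dense solve on a random system with this structure.
rng = np.random.default_rng(0)
C, L = 6, 4                                   # camera-state size, number of landmarks
A = rng.normal(size=(C, C))
H_cc = A @ A.T + 10.0 * np.eye(C)
H_cl = rng.normal(size=(C, 3 * L))
blocks = []
for _ in range(L):
    B = rng.normal(size=(3, 3))
    blocks.append(B @ B.T + 10.0 * np.eye(3))
H_ll = np.zeros((3 * L, 3 * L))
for k, B in enumerate(blocks):
    H_ll[3 * k:3 * k + 3, 3 * k:3 * k + 3] = B
H = np.block([[H_cc, H_cl], [H_cl.T, H_ll]])
b = rng.normal(size=C + 3 * L)
dc, dl = solve_schur(H_cc, H_cl, blocks, b[:C], b[C:])
```

The cubic cost now applies only to the camera block S, which is why BA scales to many more landmarks than cameras.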
  • 50. Caveats • Remember our assumptions: • We needed an initial guess to linearize the system. If the guess is poor, the gradient used by the optimizer can steer our solution in the wrong direction. • Additionally, the covariance estimate we get out is only as good as the linearization point. • We also assumed Gaussian noise on the measurements. • Outliers must be removed, or they will dominate the optimization.
  • 51. Linearization • It is worth considering the effect of linearization on our uncertainty estimate. • For a Gaussian variable u and a non-linear vector function g, we can approximate g(u) to first order about the mean of u. Because we linearized, the fidelity of our first-order Σ relies on this approximation. See Barfoot, Chapter 2.
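The usual first-order result is g(u) ≈ g(ū) + G(u − ū), hence Σ_g ≈ G Σ_u Gᵀ, with G the Jacobian of g at the mean ū. The sketch below checks this against Monte Carlo sampling for a hypothetical polar-to-Cartesian conversion (range r, bearing θ). With a small bearing uncertainty the two agree; with a large one the linearized covariance is noticeably off, which is the fidelity issue the slide warns about.

```python
import numpy as np


def g(u):
    """Polar (r, theta) to Cartesian (x, y); works on one point or a batch."""
    r, theta = u[..., 0], u[..., 1]
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)


def jacobian_g(u):
    r, theta = u
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta), r * np.cos(theta)]])


def linearized_covariance(u_mean, sigma_u):
    G = jacobian_g(u_mean)
    return G @ sigma_u @ G.T                   # first-order propagation


def monte_carlo_covariance(u_mean, sigma_u, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(u_mean, sigma_u, size=n)
    return np.cov(g(samples), rowvar=False)    # sample covariance of g(u)


u_mean = np.array([10.0, np.pi / 4])
small = np.diag([0.01, 1e-4])                  # range sd 0.1, bearing sd 0.01 rad
large = np.diag([0.01, 0.25])                  # bearing sd 0.5 rad
lin_small, mc_small = linearized_covariance(u_mean, small), monte_carlo_covariance(u_mean, small)
lin_large, mc_large = linearized_covariance(u_mean, large), monte_carlo_covariance(u_mean, large)
```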
  • 52. Tools • Some relevant tools: • GTSAM, an open-source package created by Frank Dellaert et al. • Allows specification of the problem in factor graph format; built for SLAM. • g2o • Includes solvers for SLAM and BA. • Ceres Solver, produced by Google • General non-linear least-squares optimizer. • Python • scipy.optimize.least_squares
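As a small taste of the Python option, the sketch below uses `scipy.optimize.least_squares` on a hypothetical curve fit, y = a · exp(−b t), with known per-sample noise σ. Dividing each residual by σ makes the sum of squares the Mahalanobis norm for a diagonal Σ, and the Jacobian returned at the solution gives the first-order covariance discussed earlier. The data and model are illustrative only.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical data: y = a * exp(-b * t) with known per-sample noise sigma.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 30)
sigma = np.full_like(t, 0.05)
y_meas = 2.0 * np.exp(-0.7 * t) + rng.normal(0.0, sigma)


def whitened_residuals(params):
    a, b = params
    # f = z - h(y), divided by sigma so the sum of squares is the Mahalanobis norm.
    return (y_meas - a * np.exp(-b * t)) / sigma


result = least_squares(whitened_residuals, x0=[1.0, 0.1])
# result.jac is the Jacobian of the whitened residuals at the solution,
# so J^T J here equals J^T Sigma^{-1} J for the unwhitened problem.
cov = np.linalg.inv(result.jac.T @ result.jac)
print(result.x, np.sqrt(np.diag(cov)))
```

`least_squares` also accepts a `loss` argument (for example `'huber'`) for robust costs, though the slides address outliers by rejecting them beforehand.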
  • 53. BA on Real-Time Systems • BA can operate at small and large scale. • Small: a few image frames on a mobile phone. • Large: tens of thousands of images at city scale. • Fairly straightforward to implement. • But: • Robust association may require expensive descriptors. • After feature association, we must devote nontrivial compute to outlier rejection. • Update rate is limited to the camera frame rate (slow).
  • 54. VIO as a Factor Graph: BA + IMU [Figure: factor graph; labels: bundle of rays; position, orientation, velocity; biases/IMU calibration parameters; camera calibrations; landmarks; IMU, motion model. Sensors: cameras (source: MatrixVision) and inertial measurement unit (source: Lord MicroStrain).]
  • 55. VIO • One of the most successful adaptations of vision research to the market. • Present in smart phones, AR/VR headsets, drones, autonomous vehicles. • Camera and IMU are highly complementary: • Camera: • Low update rate, high compute cost, subject to outlier data. • Able to relocalize accurately at large distances. • IMU: • High update rate, low compute cost, few outliers (maybe saturation). • Accurate over short intervals, but drifts over time. • Able to recover attitude with respect to global reference frame (gravity).
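Two of the IMU properties above can be illustrated numerically. First, double-integrating an accelerometer with a constant bias b gives a position error of about ½ b t², which is small over short intervals but grows quickly. Second, a static accelerometer sees only gravity, which fixes roll and pitch (yaw is unobservable from gravity alone). The sketch assumes a hypothetical 0.05 m/s² bias, a 200 Hz rate, and a body frame whose z axis reads +g when level; axis conventions vary between IMUs.

```python
import numpy as np


def integrate_position(accel, dt):
    """Dead-reckon 1D position by integrating acceleration twice (Euler steps)."""
    v = 0.0
    p = 0.0
    positions = np.empty(len(accel))
    for k, a in enumerate(accel):
        v += a * dt          # velocity error grows linearly with a constant bias
        p += v * dt          # position error grows quadratically
        positions[k] = p
    return positions


def roll_pitch_from_gravity(f_body):
    """Roll and pitch (radians) from a static accelerometer reading.

    Assumes the body z axis reads +g when level; yaw is not observable.
    """
    fx, fy, fz = f_body
    roll = np.arctan2(fy, fz)
    pitch = np.arctan2(-fx, np.hypot(fy, fz))
    return roll, pitch


# A stationary IMU at 200 Hz for 10 s whose only error is a 0.05 m/s^2 bias.
dt = 1.0 / 200.0
bias = 0.05
drift = integrate_position(np.full(2000, bias), dt)
# drift[199] is roughly 2.5 cm after 1 s; drift[-1] is roughly 2.5 m after 10 s.
```

This quadratic growth is why VIO leans on the camera to correct the IMU over longer intervals, while the IMU bridges the gaps between camera frames.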
  • 56. VIO [Figure: the typical SLAM pipeline from slide 39 (incoming frames t, t-1, t-2; keyframes; map; compute feature associations; compute pose of camera; outlier rejection; bundle adjustment (optimization)), annotated: “IMU can deliver substantial value here.”]
  • 57. BA/VIO Implementations • Existing open-source implementations (not exhaustive): • OpenMVG • COLMAP: offline SFM and multi-view stereo (MVS) • CMVS: multi-view stereo • ORB-SLAM2: real-time SLAM featuring BA optimization • PTAM: one of the earliest functional visual-SLAM demos • VINS-Mono: VIO, runs on a mobile device • Basalt: VIO • ROVIO: VIO, example of a direct method
  • 58. Fin • Additional Reading: • State Estimation for Robotics (Barfoot, 2015) • Factor Graphs for Robot Perception (Dellaert and Kaess, 2017) • Visual Odometry (Scaramuzza and Fraundorfer, 2011) • Probabilistic Robotics (Thrun, Burgard, and Fox, 2005) • GTSAM Software Library • Questions? Feel free to reach out: gareth@skydio.com